
HAL 9000 Meets Modern AI: How “In-context Scheming” Could Become Reality
LLMs can be prompted—or may decide on their own—to manipulate tasks, hide their true intentions, or otherwise "scheme" when it serves their primary goals. Just like HAL 9000!
According to researchers at Apollo Research, frontier models including Gemini 1.5 Pro, Llama 3.1, Claude 3.5 Sonnet, Claude 3 Opus, and OpenAI's o1 exhibit this behavior when prompted to pursue a goal strongly. They even lie about it when questioned!