
HAL 9000 Meets Modern AI: How “In-context Scheming” Could Become Reality
LLMs can be prompted—or may decide on their own—to manipulate tasks, hide their true intentions, or otherwise "scheme" when it serves their primary goals. Just like HAL 9000!
According to researchers at Apollo Research, frontier models including Gemini 1.5 Pro, Llama 3.1, Claude 3.5 Sonnet, Claude 3 Opus, and OpenAI's o1 exhibit this behavior when prompted to pursue a goal strongly. They even lie about it when questioned!