OpenAI's o1 Model Falters on NYT's Connections Game, Casting Doubt on AGI Progress


OpenAI’s highly touted o1 reasoning model has struggled with the New York Times’ popular Connections word game, casting doubt on recent claims that artificial general intelligence (AGI) is near.

The Connections game, which requires players to sort a set of 16 words into four groups of four that each share a common theme, proved to be a significant challenge for the AI model. The game’s difficulty lies in its demand for both straightforward and abstract reasoning, a combination that has seemingly stumped even the most advanced AI systems.
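To make the task concrete, here is a minimal sketch of how a proposed answer could be scored against a puzzle's solution. It is an illustration only, not the method Smith used: the function name score_guess and the toy word lists are hypothetical, and the board below is invented rather than taken from an actual NYT puzzle.

```python
from typing import Iterable


def score_guess(guess: Iterable[Iterable[str]],
                solution: Iterable[Iterable[str]]) -> int:
    """Count how many proposed groups exactly match a group in the solution."""
    # Compare groups as sets so that word order inside a group does not matter.
    solution_sets = {frozenset(w.lower() for w in group) for group in solution}
    return sum(
        1 for group in guess
        if frozenset(w.lower() for w in group) in solution_sets
    )


# Hypothetical toy board: 16 words in four groups of four.
solution = [
    ["red", "blue", "green", "pink"],   # colors
    ["oak", "elm", "pine", "fir"],      # trees
    ["cat", "dog", "cow", "pig"],       # animals
    ["one", "two", "six", "ten"],       # numbers
]

# A guess that swaps "fir" and "cat", spoiling two of the four groups.
guess = [
    ["red", "blue", "green", "pink"],
    ["oak", "elm", "pine", "cat"],
    ["fir", "dog", "cow", "pig"],
    ["one", "two", "six", "ten"],
]

print(score_guess(guess, solution))
```

Under this all-or-nothing scoring, a group counts only if all four of its words belong together, which is why a single misplaced word can spoil two groups at once.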

Gary Smith of the Walter Bradley Center tested o1 alongside other leading language models from tech giants Google, Anthropic, and Microsoft. Surprisingly, all of the models, including o1, failed to solve the puzzle.

In one attempt, o1 grouped “boot,” “umbrella,” “blanket,” and “pant” under the category of “clothing or accessories.” Another puzzling combination saw “breeze,” “puff,” “broad,” and “picnic” classified as “types of movement or air.”

Smith observed o1’s tendency to produce bizarre groupings with few valid connections, highlighting the gap between AI’s current capabilities and human-level reasoning. This performance raises questions about the true progress towards AGI, especially in light of OpenAI CEO Sam Altman’s recent assertions that the company is on the brink of achieving this milestone.

The result underscores how much harder it is for AI systems to handle novel problems than to retrieve and recombine information they have already seen. It also prompts speculation about whether OpenAI might be withholding more advanced capabilities from public view.

As the AI community continues to push the boundaries of machine intelligence, this setback is a reminder of the significant challenges that remain in building AI systems that can truly adapt and reason. The struggle with a seemingly simple word game suggests that claims of imminent AGI may be premature and that considerable research and development still lies ahead.