Apple has published a groundbreaking research paper taking aim at the so-called ‘reasoning’ capabilities of leading large language models created by OpenAI, Anthropic, and Google. The paper contends that these models’ reasoning skills are essentially an ‘illusion of thinking’, challenging the industry’s prevailing claims.
This stance is particularly noteworthy because Apple, often seen as trailing in the AI race, is now engaging in a critical dialogue that questions the very foundations of current AI advances. It could signal a shifting perspective within the company, and perhaps a more cautious, sceptical approach towards AI technologies.
Large language models, including OpenAI’s well-known ChatGPT and Google’s Gemini, have generally been celebrated for their increasingly sophisticated ability to mimic human-like responses.
These technologies are often seen as being at the forefront of AI development, capable of holding conversations, translating languages, and even generating creative content. However, Apple’s research suggests that these models may not possess genuine understanding or reasoning, relying instead on pattern recognition and prediction drawn from enormous training datasets.
The implications of Apple’s findings could ripple through the tech industry, prompting a re-evaluation of how AI capabilities are marketed and understood by both developers and users. As AI continues to integrate into various aspects of daily life and business operations, this paper might serve as a critical reminder of the limitations and potential misrepresentations of current AI technologies.
Apple’s paper draws historical parallels to earlier moments in AI history, such as the collapse of early expert systems in the 1980s.
At that time, AI was overhyped for its ability to simulate domain-specific reasoning, only to fall short due to brittleness and a lack of true contextual understanding. Much like those systems, current large language models appear capable on the surface but often fail when subjected to deeper tests of logic or multi-step reasoning.
Apple’s critique revives an old caution: that surface-level fluency should not be mistaken for cognitive depth.
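The kind of test that exposes this gap can be made concrete with puzzles whose correct answers are mechanically checkable. The sketch below is purely illustrative and not drawn from Apple’s paper itself: it validates a model’s proposed move list for the classic Tower of Hanoi puzzle, so that a fluent-sounding but invalid answer fails a simple rules check.

```python
# Illustrative sketch (not from Apple's paper): verify a multi-step "reasoning"
# answer against the actual rules of Tower of Hanoi, rather than trusting how
# plausible the text sounds.

def check_hanoi_solution(n_disks: int, moves: list[tuple[int, int]]) -> bool:
    """Return True if `moves` legally transfers all disks from peg 0 to peg 2.

    Each move is (source_peg, target_peg); pegs are numbered 0-2, and a larger
    disk may never be placed on a smaller one.
    """
    pegs = [list(range(n_disks, 0, -1)), [], []]  # peg 0 starts with disks n..1 (top = smallest)
    for src, dst in moves:
        if not pegs[src]:
            return False                      # illegal: moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                      # illegal: larger disk on top of a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))  # success only if everything ends on peg 2


# The optimal 3-disk solution takes 2**3 - 1 = 7 moves and passes the check.
optimal_3 = [(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]
assert check_hanoi_solution(3, optimal_3)
```

Because the puzzle’s length grows with the number of disks, the same checker can be applied to a model’s parsed answer at increasing sizes, showing where multi-step performance breaks down rather than relying on surface fluency.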
This intervention may also reflect a broader strategic shift, with Apple signalling a more philosophy-driven, cautious approach to AI development. While other tech giants rush to integrate generative models into products at scale,
Apple’s approach suggests a preference for grounding AI tools in clear functionality rather than speculative intelligence. If this stance shapes its future product development, Apple could position itself not as a laggard, but as a deliberate alternative, emphasising trust, reliability and transparent limits in AI use.
This could resonate particularly well with sectors such as healthcare, education and finance, where AI’s missteps can carry real-world consequences.