17 June 2025

Marcus on AI: “A knockout blow for LLMs?”

Apple has a new paper; it’s pretty devastating to LLMs, a powerful followup to one from many of the same authors last year.


The new Apple paper adds to the force of Rao’s critique (and my own) by showing that even the latest of these new-fangled “reasoning models” still — even having scaled beyond o1 — fail to reason beyond the distribution reliably, on a whole bunch of classic problems, like the Tower of Hanoi. For anyone hoping that “reasoning” or “inference time compute” would get LLMs back on track, and take away the pain of multiple failures at getting pure scaling to yield something worthy of the name GPT-5, this is bad news.


If you can’t use a billion dollar AI system to solve a problem that Herb Simon (one of the actual “godfathers of AI”, current hype aside) solved with AI in 1957, and that first semester AI students solve routinely, the chances that models like Claude or o3 are going to reach AGI seem truly remote.

Gary Marcus

Nothing terribly surprising about this conclusion. As the author mentions in his newsletter, this is a known limitation of neural network architectures going back decades: they perform well enough within the bounds of their training data, but can break down in unpredictable ways when applied to tasks outside their training range. And so the relentless drive to replace good, old-fashioned deterministic algorithms, which are also far more power- and compute-efficient, with LLMs is a recipe for ballooning costs and uncomfortable failures.
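To make the contrast concrete, here is a minimal sketch of the classic recursive Tower of Hanoi solution, the kind of deterministic algorithm the paper's puzzles are built on and that Simon's era already handled; the function name and the simple move list are my own illustration, not taken from the paper or from Marcus:

def hanoi(n, source, target, spare, moves):
    """Append the optimal move sequence for n disks to the moves list."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way onto the spare peg.
    hanoi(n - 1, source, spare, target, moves)
    # Move the largest remaining disk directly to the target peg.
    moves.append((source, target))
    # Move the n-1 parked disks from the spare peg onto the target peg.
    hanoi(n - 1, spare, target, source, moves)

moves = []
hanoi(8, "A", "C", "B", moves)  # 8 disks, pegs A, B, C
print(len(moves))               # 255 moves, always 2**n - 1

A dozen lines of Python solve every instance exactly, in microseconds and with provably optimal move counts, which is precisely the kind of task the paper reports reasoning models collapsing on as the disk count grows.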

Figure 6 from Apple's paper "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity": accuracy and thinking tokens vs. problem complexity for reasoning models across puzzle environments. As complexity increases, reasoning models initially spend more tokens while accuracy declines gradually, until a critical point where reasoning collapses: performance drops sharply and reasoning effort decreases.

The timing of its release feels a little suspicious, though, considering Apple held its annual developer conference last week. Publishing a paper that highlights the flaws and shortcomings of LLMs while the company actively avoids mentioning its own AI assistant Siri makes it seem like Apple is laying out justifications for its slowness in the AI race, after it shamelessly rebranded the term as ‘Apple Intelligence’ last year. Regardless of their ultimate usefulness, the stock market seems to eat up LLM hype for the time being, and being seen as lagging behind could dent Apple’s huge valuation over the medium term.

More remarkable, though, is that Sam Altman felt the need to address the paper’s findings on his blog, with the same overblown assertions that humanity is close to building digital superintelligence. Between Altman and Musk, it is high time people realized that what most founders declare publicly is closely tied to their business interests: they promote narratives that benefit them and downplay facts that would diminish their prospects.
