17 June 2025

Marcus on AI: “A knockout blow for LLMs?”

Apple has a new paper; it’s pretty devastating to LLMs, a powerful followup to one from many of the same authors last year.


The new Apple paper adds to the force of Rao’s critique (and my own) by showing that even the latest of these new-fangled “reasoning models” still — even having scaled beyond o1 — fail to reason beyond the distribution reliably, on a whole bunch of classic problems, like the Tower of Hanoi. For anyone hoping that “reasoning” or “inference time compute” would get LLMs back on track, and take away the pain of multiple failures at getting pure scaling to yield something worthy of the name GPT-5, this is bad news.


If you can’t use a billion dollar AI system to solve a problem that Herb Simon (one of the actual “godfathers of AI”, current hype aside) solved with AI in 1957, and that first semester AI students solve routinely, the chances that models like Claude or o3 are going to reach AGI seem truly remote.

Gary Marcus

Nothing terribly surprising about this conclusion. As the author notes in the newsletter, this is a known limitation of the LLM architecture going back decades: neural networks perform well enough within the bounds of their training data, but can break down in unpredictable ways when applied to tasks outside that range. So the relentless drive to replace good, old-fashioned deterministic algorithms with LLMs, when those algorithms are also more power- and compute-efficient, is a recipe for ballooning costs and uncomfortable failures.
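
To make the contrast concrete, here is the kind of deterministic Tower of Hanoi solver a first-semester AI student writes, as a minimal Python sketch (the function name, peg labels, and disc count are my own illustration, not anything from the Apple paper). It produces the provably optimal 2^n - 1 move sequence with no training data and no possibility of going "out of distribution":

```python
def hanoi(n: int, source: str, target: str, spare: str) -> list[tuple[str, str]]:
    """Return the optimal move sequence for n discs: exactly 2^n - 1 moves."""
    if n == 0:
        return []
    moves = hanoi(n - 1, source, spare, target)   # park the n-1 smaller discs on the spare peg
    moves.append((source, target))                # move the largest disc to the target
    moves += hanoi(n - 1, spare, target, source)  # restack the smaller discs on top of it
    return moves

if __name__ == "__main__":
    solution = hanoi(8, "A", "C", "B")
    print(len(solution))  # 255 moves: deterministic, optimal, essentially free to compute
```

The recursion is the entire algorithm. There is nothing to scale, nothing to prompt, and nothing to fail unpredictably, which is exactly the point being made about handing such tasks to a billion-dollar LLM.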