Over the past three years, the AI/LLM boom and its associated capex supercycle have had a profound effect on the economy, financial markets, and society (though not on corporate productivity, as we’ll discuss). However, as unrestrained AI optimism and its financial market manifestations have continued to mushroom, evidence has been quietly mounting that many of the boom’s foundational assumptions are starting to buckle. The divergence between the two has become so stark that there is a strong case to be made that the boom has now morphed into a fully-fledged bubble, and of a size that makes the dot.com bubble look positively quaint.
There is no denying that the capabilities of LLMs are amazing; I now use ChatGPT on a daily basis. Initially the hype seemed justified, which is why I refrained at first from viewing it as a bubble, despite my contrarian instincts. The capabilities of LLMs seemed impressive and were advancing at a rapid clip, with “scaling laws” promising more to come – perhaps all the way to AGI and beyond. The societal changes and productivity benefits that would accrue from such an advance would be profound.
But recent developments
are casting significant doubt on that trajectory, and people’s rabidly bullish
expectations appear increasingly decoupled from the evidence. As I will discuss,
the so-called “scaling laws”, which are a critical foundational assumption,
appear to be faltering, while the promised productivity benefits have thus
far utterly failed to materialize. LLMs increasingly appear to have structural limitations in their reasoning capabilities and an incurable propensity to hallucinate, which further iterations of LLM architectures may not be able to correct.
But these “inconvenient
truths” are being roundly ignored. Undeterred by recent setbacks, the industry
is plowing ahead with another order of magnitude increase in compute investment
to levels fast approaching half a trillion dollars a year, while stock market
and AI start-up valuations have continued to levitate. At this stage, a boom with
rational underpinnings appears to have morphed into an irrationally exuberant
bubble, and its bursting could have major consequences, both for the economy and for financial markets.
Before proceeding
to my argument, a caveat is in order. AI is a rapidly evolving field, so any “etched
in time” views will always come with a significant risk of looking foolishly wrong
in hindsight. In situations of high uncertainty and rapid change like this, it
is best to keep one’s views fluid and adaptable to new evidence. But right now
there seems to be such a growing divergence between recent technology trends and
broader market perceptions that I feel compelled to write something.
To the
extent I am right, the implications will be truly massive. This really is a $10tr+ question. The current AI capex supercycle is acting as a major economic
stimulus at a time when other parts of the economy are exhibiting signs of
weakness, and has supercharged the stock market even more given the large index
weighting and high multiples of its (now many) constituents. And it’s not just
the usual Mag-7 suspects: MSFT, GOOGL, AMZN, and META (expensive but not egregious);
TSLA (outrageously egregious as, besides Elon-fever, it has come to be seen as an AI play – previously autonomous vehicles/robotaxis, and now the promise of AI robots, notwithstanding its failure to execute on the former); and NVDA
and AVGO (great companies but egregious if one does not believe the current
level of AI capex to be sustainable).
It also includes
all the chip supply chain companies (TSMC, and memory makers SK Hynix, Samsung,
Micron) and semi-equipment companies (ASML, Lam, Applied Materials, KLA). While multiples here are not outrageous, in an AI capex bust their earnings will fall precipitously. Far more egregious are perceived AI beneficiaries like Palantir (now sporting
a US$430bn market cap on just US$4bn of revenue), filtering down to companies
you wouldn’t expect to be AI plays like Axon Enterprises – a US$56bn market cap
taser and body camera company trading at 100x earnings because investors believe it will be able to apply AI to its copious data. The ranks of the anointed also now
include Oracle, which a few months ago added US$200bn in market cap in a single
day on signs it is becoming a major AI cloud computing play.
Moreover, Chinese large cap tech companies have also recently surged – notably Alibaba, which has come to be seen as an AI play instead of merely a structurally challenged e-commerce company in an increasingly saturated and price-war prone market with poor capital allocation to boot. Such is the level of AI mania at present that news that the company would spend US$50bn on AI data center capex (buybacks and dividends will have to wait) has been rewarded with
a 50% pop, adding US$150bn in market cap. AI is also now estimated to comprise a low-single-digit (and rapidly growing) percentage of US electricity demand, so even utilities have rallied and are seen as AI-adjacent
plays. The AI boom's tentacles are long and varied. When you add all these up you are
talking about a material minority of the global market cap.
In short,
the sustainability of the current AI capex supercycle is absolutely critical to
the market (and economic) outlook, and with an asymmetry – valuations largely
assume the boom will continue unabated, whereas if the market is wrong, many
valuations could fall 50-90%. The stakes are huge and of a scale that, in a
bear scenario, could lead to a market rout of historical proportions, with the “LLM
craze” taking its place alongside the dot.com bubble, GFC, and covid as the notable
historic market events of this century.
This doesn’t
necessarily mean, by the way, that in the future AI won’t be transformative. The
internet turned out to be everything the bulls hoped and dreamed it would be in
the 1990s, and more. But we still had a huge bubble that burst in 2000-03. You
don’t have to be pessimistic about the long-term outlook for AI to believe there
is a huge mismatch between current expectations and the realistic medium-term outlook.
And economic realities also matter, not just technology trends.
Moreover, though the pace is rapidly slowing, there will likely continue to be at least some progress in AI/LLMs. The pertinent question is whether it will be enough to justify the prodigious economic costs. There
is already a huge mismatch between AI costs/capex and revenues, and it
appears likely to rapidly worsen from here. If that indeed proves to be
the case, a financial reckoning is only a matter of time.
From
justified hype to bubble; scaling laws “buckle”
Like most
bubbles, the LLM mania started out with solid foundations, but is now being
taken to morbid, irrational excess. The initial LLM hype was justified. Not
only did the output of early LLM models (particularly ChatGPT-3) wow and amaze,
but they seemed to be improving at a rapid clip and adhering to so-called “scaling
laws”. The latter was a belief/assumption/empirical observation that all that was
needed to generate more performant models was to (after scraping all data from
the internet) bring orders of magnitude more compute to bear, increase the
number of parameters, and train the models for longer, and boom, you got a
major increase in performance.
This
optimism was reinforced when scaling laws held up for ChatGPT-4 – an order of
magnitude more compute yielded a huge gain in performance exactly in line with scaling
law predictions. People were giddy with excitement and confidently extrapolated
the gains, with scaling laws seemingly implying a fast-track to AGI.
To the
extent these assumptions held, the hype was justified. AGI being just a few
years away – this was a very big deal indeed. This is what I believe caused
people such as Eric Schmidt to argue AI had actually been “under-hyped”. It was
going to be revolutionary in so many ways. Moreover, along the way it came to be believed that LLMs were not just statistically predicting text, but were also
developing an internal model of the world – imbibing not just tokenized words
but the underlying meaning behind them. After all, the best way to predict text
may be to actually understand the meaning the words represent. I also
subscribed to this view for a while, and it did seem to point to the genuine emergence of artificial intelligence. The future looked interesting indeed.
Houston,
we have a problem…
The problem is that recent evidence is calling into question many of these foundational assumptions. The most significant development is that the “scaling law” appears to be breaking down – more compute is no longer delivering proportionately meaningful gains in model performance. Indeed, it is even possible that future models will start to get worse on account of AI “pollution” of the training data set (discussed more below).
Moreover, evidence is also emerging that LLMs
have fundamental limitations in their capacity to reason, and in contrast to early speculative optimism, it appears they do not
in fact have internal models of the world and are instead simply
sophisticated imitation engines. Unreliable outputs, or “hallucinations”, are proving persistent, and may in fact be an incurable feature of LLM architectures rather than merely a temporary nuisance. To the extent this proves to be the case, LLMs may be a dead end, and genuine breakthroughs in AI/AGI may require us to go “back to the drawing board” with RL and/or entirely new and more targeted architectures – potentially a much tougher grind that sets us back decades relative to prior expectations.
Most notably, since ChatGPT-4, which was a major improvement over v3, scaling laws have started to break down. ChatGPT-5, which took more than 2yrs to develop and was released late (a possible indication OpenAI encountered problems with it behind closed doors), was a major disappointment, at best delivering only marginal improvements (some users even believe it inferior to ChatGPT-4) despite another order of magnitude of compute being brought to bear. More broadly, the pace of gains in foundation models appears to be rapidly decelerating, and the performance of competing models with divergent access to computing resources is asymptotically converging, instead of the largest players pulling further and further ahead (which you’d expect in a scaling-law world).
Emphasis is also shifting from pre-training to post-training, including using reinforcement learning from human feedback to tweak/improve outputs. If pre-training were still yielding huge “scaling law” gains, they would not be bothering with post-training – the effort would not be worth it, as it would be dwarfed by pre-training gains. Often these tweaks are focused on narrowly boosting performance on various model benchmark tests, and so are fairly cosmetic in nature. This shift in emphasis is further circumstantial evidence that scaling gains are drying up.
The scaling
law is of course not a law, but simply an empirical observation, assumption,
and hope. No such natural law exists that promises smarter LLMs in exchange for
more compute, and the assumption of infinite scaling rather than diminishing
returns was arguably always questionable, as observers like Gary Marcus have
long argued. If you have the same pool of data (most of the internet has
already been scraped), is it really reasonable to believe that each additional
order of magnitude of compute will yield the same exponential improvements,
rather than rapidly attenuating ones?
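For the technically inclined, here is a minimal sketch of why a power-law scaling relationship implies attenuating returns. The constants are purely illustrative (they are not fitted to any real frontier model); the point is the shape of the curve – each additional order of magnitude of compute buys a smaller absolute improvement, while the bill grows tenfold.

```python
# Minimal sketch: why a power-law "scaling law" implies rapidly attenuating returns.
# The constants below are purely illustrative and NOT fitted to any real model;
# only the shape of the curve matters for the argument.

def illustrative_loss(compute, base_loss=2.0, irreducible=1.6, exponent=0.05):
    """Toy loss curve: loss falls as a power of compute towards an irreducible floor."""
    return irreducible + (base_loss - irreducible) * compute ** (-exponent)

previous = illustrative_loss(1.0)
for order_of_magnitude in range(1, 6):
    compute = 10.0 ** order_of_magnitude  # each step is 10x the compute (and roughly 10x the cost)
    loss = illustrative_loss(compute)
    print(f"10^{order_of_magnitude}x compute: loss {loss:.3f} "
          f"(gain vs previous step: {previous - loss:.3f})")
    previous = loss
```

On this kind of curve, the cost per unit of improvement explodes with each leg of scaling – which is precisely the dynamic that makes infinite extrapolation look questionable.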
In addition
to scaling issues, there is also growing evidence that LLMs have fundamental
limitations in their reasoning capabilities. A recent influential study “The Illusion of Thinking” scrutinized chain-of-thought
LLMs and concluded that AI reasoning is more illusory than real. This is fundamentally why “hallucinations” have not gone away, and it looks increasingly likely they are an inherent feature of LLM architectures.
Moreover,
the idea LLMs have been developing an internal model of the world is also being
questioned. This is evident in the fact that, despite all the sophisticated output
they can generate, they still make very rudimentary mistakes. Cal Newport
highlighted that GPTs will sometimes suggest illegal chess moves, which demonstrates a fundamental lack of any “world model” of chess, while Gary Marcus noted that diffusion models often give you the
wrong quantity of (for instance) tennis racquets when requested. This is consistent
with the models having no understanding of their generated output. LLMs are imitation
models, not reasoning models; sophisticated text guessing engines. They provide the illusion of intelligence, but
don’t actually understand anything. They can’t even differentiate between when
they are being trained and when they are being deployed.
Indeed, at
a more fundamental level it can be argued that LLMs are not intelligent at all.
In this excellent deep discussion on the true nature of intelligence (see also
this presentation here), Rich Sutton argues persuasively that LLMs are not
genuinely intelligent because they are not capable of learning and adapting
through goal-driven interaction with the real world. Unlike intelligent agents
like humans, they do not act and learn simultaneously through active contact with their environment – they rely on training using second-hand, human-acquired data expressed in text, and in deployment they are in stasis, acquiring no new knowledge and not adapting
to new experience. They therefore have a fundamental inability to iteratively learn
from the world they find themselves in.
Sutton believes the fundamental architecture of LLMs makes the acquisition of genuine intelligence impossible; the best they can do is synthesize (plagiarize?) existing human-generated input and reproduce it with some inherent degree of unreliability. Moreover,
they are incapable of generating genuinely new insights; if they were built
in 1900 for instance, they would be unable to come up with the theory
of relativity de novo before Einstein, as they have no capacity to interact with and adaptively
learn from the real world. If he is right about this, LLMs could prove to be a fundamental dead end in the pursuit of AI.
Another issue not being talked about nearly enough is that as more content on the internet becomes AI generated, the pool of training data will increasingly become “polluted”.
AI was initially trained on 100% human input, and while that input is varied in
its quality, it at least represents the thoughtful reflections of human agents
acting in the real world. AI content, however, contains random LLM-generated hallucinated noise, and as LLM adoption grows and more and more of the scrapable online
content itself becomes AI generated, AI “pollution” will worsen, conceivably to
the point where the quality of models may actually start to decline (this is speculative,
but a real possibility).
This Kurzgesagt
video discusses this dynamic well. Their efforts to use AI to facilitate the generation of content on brown dwarfs led them to encounter many fake, AI-hallucinated studies/facts, which their meticulous source checking uncovered. Subsequently,
other less rigorous channels have included the fake AI content in their videos,
which have acquired hundreds of thousands of views. In the future, AI models
will use videos with false AI hallucinated facts as authoritative training
sources, and they have no architectural way of correcting the errors as they
have no internal model of the world or intrinsic capacity to directly interact with the world or reason, only to
assimilate what they are fed. This potentially portends a major future problem,
as LLMs will be unable to distinguish between AI generated content and
human-generated content.
Consuming LLM output is to some extent akin to reading the output of a bad journalist. If you
know little of the underlying facts, the output seems credible and impressive.
But the more you know about actual events, the more factual inaccuracies are
apparent to you. LLM output is superficially impressive if you lack underlying domain
knowledge, but the more you know, the more the limitations of the LLM output
are apparent.
Such is the degree to which faith and hype, rather than first-principles thinking, prevail today that “AI” has now become a synonym for “LLM”. But LLMs are not artificial intelligence; they are a very particular neural network algorithm that yields a sophisticated text-guessing engine. That can be useful in some contexts, but it increasingly appears as though LLMs have fundamental limitations: they will not be a path to AGI; they will not create genuine scientific breakthroughs; and they will not trigger a productivity miracle.
Where
are the productivity benefits?
This brings
me to the next point – where are the vaunted productivity benefits? At the same
time as gains in LLM capabilities have started to rapidly attenuate (despite
the economic cost of those improvements rapidly mushrooming), more and more
studies have been emerging pointing to a conspicuous lack of productivity
benefits from LLM adoption (this ColdFusion video, from a techno-optimist no less, is a useful primer). A highly quoted recent MIT study found that as many
as 95% of companies that have tried to internally adopt AI have seen no productivity
benefits. Surprisingly, there has even been a study showing that coders using LLMs were less efficient. Cal Newport attributes this to less deep work and reduced developer focus, in addition to the inevitable need to debug mistakes in the GPT-generated code.
Indeed, there is even evidence it may be reducing corporate productivity through “workslop”, which leads to the introduction of (sometimes well hidden) errors that require human effort to discover and correct. There are also doubtless inefficiencies associated with lower-level employees being forced to scramble to meet management dictates to introduce “AI” into workflows – benefits which LLMs are currently fundamentally unable to deliver.
At the dawn
of the internet, it was widely believed it would be an unmitigated good for
human productivity by providing the world with easy and instantaneous access to information.
While we have seen some of this, we have also seen people reduce focus and sink
copious amounts of time into social media and online gaming. On balance, it is questionable whether there have been any net productivity benefits. In a perfect world LLMs could boost knowledge acquisition, but it is probably just as likely they make people lazier; corrode deep work and focus; and make people more susceptible to misinformation by obfuscating the delineation of genuinely authoritative sources, and by reducing the consumption of primary sources in favour of secondary, LLM-generated ones.
Personally,
I think in niche situations (such as for myself), LLMs can be a productivity
booster by speeding up knowledge acquisition, and my instinct is that it probably
can aid the best (or at least the most motivated) coders. However, LLMs are only useful as a supervised tool, not
a wholesale human replacement. They are most useful for people where “approximately
right” answers are good enough (such as stock research), and where a knowledgeable
operator with critical thinking can probe and question answers and independently
verify important claims. An LLM is best viewed like a knowledgeable human interlocutor: you can learn a lot, but you also don’t/shouldn’t trust that everything they say is 100% correct; you must triangulate it and interrogate answers that are
unsatisfying.
To that end,
LLMs are a useful resource for certain high-performing people in niche occupations,
but they will not cut it for most enterprise applications where 100% accuracy and
reliability are required to automate fundamental business processes, as LLMs are
unable to deliver that degree of dependability. All the talk of AI replacing humans and yielding a productivity bonanza was premised on the assumption that scaling laws would hold and carry LLMs to AGI and hallucinations would fade, but that naïve extrapolation is inconsistent with what is currently
happening.
LLMs seemed
(and perhaps were) too good to be true – the idea that we could get to AGI simply by hooking up more GPUs & letting ’em run. But these “inconvenient truths” mean
achieving genuine AI and its associated productivity benefits might turn out to
be much more difficult, if not intractable – like autonomous driving which has
been “just around the corner” for 15 years (and we are still only at L2 vs L5
for full autonomy).
The last
shot on goal
Notwithstanding the above, the AI industry is doubling down in the face of faltering scaling laws and ramping capex meaningfully further. We have one more order of
magnitude to go, and we are evidently going to try it. OpenAI’s Stargate project
aims to bring as much as half a trillion dollars of compute to bear on its next
models. This ongoing commitment is one reason AI stocks have surged over the
past month despite tangible evidence of a bubble piling up: in the short to medium term the boom will roll on, and pick-and-shovel providers will continue to make bank.
But this latest
order of magnitude capex boost to increase LLM performance will be the last.
The next leg to US$5tr is not affordable even for the hyperscalers. Datacenters (including non-AI) already consumed 4.4% of US electricity in 2023 and will have grown sharply since then, with a growing share of that energy use coming from AI training (the US DoE projects 6.7-12% of all US electricity demand coming from datacenters by 2028). Let’s call it circa 2% for AI specifically at present. It can’t go to 20% and then 200% (granted, more performant chips will help, but even with Nvidia’s heady gains in chip performance – which will also attenuate with time – power consumption and capex have been mushrooming).
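To make the arithmetic explicit, here is a rough back-of-the-envelope sketch. The ~2% starting share for AI comes from the estimate above; the assumed 3x performance-per-watt gain per chip generation is my own illustrative assumption, not a quoted figure.

```python
# Back-of-the-envelope sketch of the power constraint on further 10x compute scaling.
# The ~2% starting share comes from the rough estimate in the text; the 3x
# performance-per-watt gain per chip generation is my own illustrative assumption.

ai_share_of_us_electricity = 0.02   # rough current share of US electricity used by AI (from the text)
compute_multiplier = 10.0           # each "next leg" is another order of magnitude of compute
perf_per_watt_gain = 3.0            # assumed chip-efficiency improvement per generation (illustrative)

for leg in range(1, 4):
    # Power scales roughly with compute divided by efficiency gains.
    ai_share_of_us_electricity *= compute_multiplier / perf_per_watt_gain
    print(f"After scaling leg {leg}: ~{ai_share_of_us_electricity:.0%} of US electricity")
# Even with generous efficiency assumptions, a couple more 10x legs exhaust the grid.
```

Under these (deliberately generous) assumptions, each further order of magnitude still more than triples AI’s share of the grid, which is why the scaling runway is physically short.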
If the next leg of capex fails to yield meaningful gains, the breakdown in scaling laws and the inherent limitations of LLMs will be undeniable. The most expensive frontier models will be perhaps two orders of magnitude uneconomic, forcing a pivot to more energy efficient, but less powerful models (a la DeepSeek), with radically scaled-down commercial potential. There will then be an inevitable collapse in capex (especially with the underlying chipsets themselves becoming more performant and quickly obsoleting legacy chips), and the industry will be left with copious amounts of overcapacity (housed mostly by the big cloud companies, as well as certain foundational model companies such as OpenAI, that are shelling out hundreds of billions of dollars annually for LLM GPU training clusters).
In this scenario,
the share prices of many AI companies will likely fall 90%+; Nvidia will most
likely fall 60-80%; chip supply chain companies will probably fall 50%; and the
hyperscale cloud compute players, who will be facing revenue pressures and
massively increased AI-kit depreciation charges, could also fall 30-50%, as cloud
earnings and growth crater and aggressive future growth expectations are
curtailed. And AI startups will fail in large numbers. OpenAI will likely survive
given that it is Microsoft backed and ChatGPT will still be useful and capable
of commanding a number of consumer subscriptions, but in a vastly diminished
form and at a significantly lower valuation.
But all of
this raises the question: why, given the evident deterioration in scaling laws, is the industry still ramping capex like crazy? Do they know something we don’t? It might have as much to do with them desperately trying to find a solution to floundering scaling laws as with continuing unwavering belief in them. A “multi-shot”
approach (e.g. get 100 answers instead of 1 and aggregate them, and/or think
longer) is one possibility – another shot on goal they hope yields the targeted
performance improvements.
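For clarity on what that entails, here is a minimal sketch of the aggregation step in such a multi-shot approach – sample many answers and take the consensus. The ask_llm function is a hypothetical stand-in for a real model call, not any particular vendor’s API.

```python
# Minimal sketch of a "multi-shot" approach: sample many answers and take the consensus.
# `ask_llm` is a hypothetical stand-in for a real model call (with temperature > 0 so
# repeated calls can disagree); the aggregation logic is the point, not the API.
from collections import Counter
import random

def ask_llm(prompt: str) -> str:
    # Placeholder: pretend the model usually, but not always, returns the right answer.
    return random.choice(["42", "42", "42", "41", "43"])

def multi_shot_answer(prompt: str, samples: int = 100) -> str:
    """Sample the model many times and return the majority answer."""
    votes = Counter(ask_llm(prompt) for _ in range(samples))
    answer, count = votes.most_common(1)[0]
    print(f"Consensus answer: {answer} ({count}/{samples} votes)")
    return answer

multi_shot_answer("What is 6 x 7?")
```

Note what this does to the economics: whatever reliability it buys comes from multiplying inference cost roughly by the number of samples, which compounds rather than relieves the cost problem.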
But the
truth is, foundational model companies and other AI start-ups have already gone
all in on LLM scaling laws and have no alternative/plan B. Most AI/foundation
model companies are losing prodigious amounts of money and need to keep the scaling
dream alive to keep raising capital. Given the whole AI ecosystem (other than the pick-and-shovel cloud providers, who are making money for now) is losing money, admitting defeat would cut off funding and put them all on a fast track to bankruptcy.
Moreover,
people do not want to risk taking a contrarian stance and being wrong. What if LLMs do scale to AGI? If you don’t invest, you’ll be left behind and look foolish for dropping the ball – it might cost a CEO his job. In this interview, Mark Zuckerberg used the rationalization “we can afford it, and we can’t afford to risk missing out” to justify spending several hundred billion dollars on LLM
training kit. Meanwhile, the cloud companies are simply responding to strong
compute demand from the loss-making AI complex. They need to invest to meet
that demand or else cede share in what could be a major future growth driver
for the cloud computing business. But what if the demand dries up and they are left
with hundreds of billions of underutilized kit? They are willing to take that
risk, because they can afford it, and because they can’t afford to risk missing
out.
But what about Jensen, who remains unremittingly bullish? Indeed, he is now taking major principal stakes in AI companies/customers, including a US$100bn investment in OpenAI (which he considers a sure bet). I have long been a huge admirer of Jensen Huang, and
have followed Nvidia for 10 years (though never owned the stock). I have found
him highly competent, incredibly articulate, down to earth, and “no bullshit”. Despite
my contrarian instincts, for a long while I refrained from calling (or
believing) NVDA to be a bubble stock, as there was previously real substance to
the LLM boom while scaling was holding.
NVDA is a phenomenal company with
tremendous executional capabilities and a robust CUDA developer ecosystem, and their relentless pace
of innovation and execution makes it pragmatically impossible for anyone to
catch them – so long as the environment remains fast moving. All of the best people want to work there, to work with the best and brightest and have Nvidia
on their CV. Jensen has single-handedly built one of the world’s best companies and is a living legend.
But Jensen’s most recent interview (here) was the first time it struck me that he may have allowed himself to be swept up in the hype; it’s been a giddy
ride to date so can you blame him? Perhaps he is right? Or perhaps he is just
really good at running a chip company and has succumbed to hubris – he is,
after all, human (barely). Time will tell.
What to
do about it?
I don’t envy
large institutional managers. Like in the dot.com bubble, if you bet against it
too early, and/or are ultimately proven wrong, it can lead to disastrous
career-ending underperformance. However, if you are right and can weather
the near-term performance pressures, the bubble can create an opportunity for career-defining
outperformance by dodging the fallout. The current set-up appears analogous. Fortunately, as a niche boutique manager, I can just choose not to play the game, but large institutional
managers have a tougher hand to play (though in reality most will just market
weight).
That being
said, now is a slightly easier moment, because stock valuations for the big AI
tech companies are already at giddy levels that embed years of very robust growth,
while the fundamentals have already started to deteriorate. That doesn’t mean that, if you avoid the stocks, you can’t still underperform catastrophically as valuations go from high to absolute lunacy (as in some cases they already have), but it is nevertheless less risky to bet against LLM mania
now than it was a year or two ago. Still, markets can do crazy things for long
periods of time – good luck to you, you’ll need it!
For the
rest of us, the best opportunities may lie in purchasing stocks that have been
sold off on excessive AI disruption fears (I don’t like shorting for reasons
discussed in prior posts, and long dated put options are likely expensive atm).
Gartner is one such example, though the stock is only modestly cheap atm, and
the market multiple overall will likely decline in a major AI-led bear market.
I have identified one candidate in France but I’m not willing to share it at this
point as I’m still buying.
What if I’m
wrong? Then I lose the opportunity to buy very expensive stocks that probably will not generate above-average returns in the long term from current levels even if the LLM boom continues unabated; and even if they do, those returns will have come attendant with very considerable ex ante risks. That is an opportunity I’m more than comfortable passing up.
That being said, I must reiterate the caveat I outlined earlier. Aside from this being a complex and fast-moving area where even career pros have widely differing opinions, I am not a tech/AI expert and have spent only perhaps 1-2% of my time on it, though my views lean heavily on experts – particular credit goes to Rich Sutton, Cal Newport (especially this interview), and Gary Marcus for the bear case; on the bull case I have gained the most insight from Jensen Huang and Geoffrey Hinton. I
could be wrong.
It will be
interesting to see how events develop. A bust is more likely to occur in 2027-28
than 2026 as we won’t see the outcome of recent scaling efforts until then, though it is always possible cracks in the veneer appear earlier and markets
succumb in advance (and any associated reduction in available funding for AI start-ups will have a reflexive impact on compute demand). But don’t bet the farm – at the
end of the day, “who the fuck knows” is the safest conclusion to reach on such
a complex, fast evolving issue.
LT3000