Over the past few years, I’ve seen dozens of articles claiming that we’ve reached the top of the AI growth curve and that things are headed downhill soon.
Critics say we’ve hit a plateau due to limitations in Deep Neural Networks (DNNs), Large Language Models (LLMs), Large Multi-modal Models (LMMs), or Generative AI (GenAI).
They claim that we’ve hit one of several bottlenecks, like running out of compute (GPUs), high-quality training data, or energy to train more powerful models.
On the other hand, AI proponents say that we’re still at the beginning of an exponential AI growth curve.
They claim that we’re just getting started with AI and that the rate of progress is accelerating.
So, who’s correct? Have we hit the peak of the current AI hype cycle, or is AI just getting started?
Obviously, there are a lot of opinions about this. However, I’m interested in the data. So, to help you better understand what’s coming, here’s what the cutting-edge research tells us about the trajectory of AI.
In 2020, researchers at OpenAI and Johns Hopkins University published a paper titled Scaling Laws for Neural Language Models. In this paper, they measured the performance of Large Language Models (LLMs) while scaling various resources used by the models (e.g., compute, training data, and model parameters).
They discovered predictable power-law relationships between these resources and model capabilities. Their findings showed that larger models, trained on larger datasets, with more compute, consistently deliver better performance. Most importantly, they saw no end in sight for these performance gains.
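To make that concrete, here is a minimal sketch of what a power-law scaling curve looks like. The functional form mirrors what the paper reports (loss falling as a power of training compute), but the constants below are illustrative stand-ins rather than the paper's fitted values.

```python
import numpy as np

# Illustrative power-law scaling curve: loss falls as a power of compute.
# L(C) = (C_c / C) ** alpha  -- constants are made up for illustration,
# not the fitted values from Kaplan et al. (2020).
C_c = 3.1e8   # hypothetical "critical compute" constant (PF-days)
alpha = 0.05  # hypothetical power-law exponent

def loss(compute_pf_days):
    """Predicted test loss for a given training compute budget."""
    return (C_c / compute_pf_days) ** alpha

for compute in [1e3, 1e4, 1e5, 1e6]:
    print(f"compute = {compute:.0e} PF-days -> predicted loss = {loss(compute):.3f}")
```

The curve keeps dropping as compute grows, which is the "no end in sight" point: nothing in the formula itself flattens out, the improvements just get smaller.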
This realization – that scaling up models leads to better performance without any foreseeable limitations – led to companies like Microsoft, Google, and OpenAI investing billions in training the next generation of LLMs. These investments gave us the LLMs we have today, like ChatGPT, GPT-4, Gemini, Llama, etc.
Two weeks ago, OpenAI released its latest model, "o1" (formerly known as "Strawberry"). It uses a new approach to improve its reasoning capabilities, reportedly combining process supervision, Monte Carlo tree search (MCTS), and synthetically generated training data.
It works like this: rather than only checking whether the final answer is correct (outcome supervision), you check whether each step of the solution is correct (process supervision). At each step, you branch off and explore several candidate next steps, keeping the ones that are verified, until you reach a correct answer. Then, you use the verified steps as training data for a new model. Essentially, the old LLM trains a new, more powerful LLM.
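OpenAI hasn't published the exact algorithm, so treat the following as a rough sketch of the loop described above. The functions `propose_step`, `verify_step`, and `is_solved` are hypothetical stand-ins for an LLM proposer and a process-supervision verifier, and the search here is a simple greedy branch-and-filter rather than full MCTS.

```python
import random

def search_solution(problem, propose_step, verify_step, is_solved,
                    branching=4, max_depth=10):
    """Search over reasoning steps using a per-step verifier (process
    supervision) instead of only checking the final answer."""
    steps = []
    for _ in range(max_depth):
        # Branch: sample several candidate next steps from the model.
        candidates = [propose_step(problem, steps) for _ in range(branching)]
        # Keep only the candidates the verifier judges correct.
        good = [c for c in candidates if verify_step(problem, steps, c)]
        if not good:
            return None  # dead end; a fuller tree search would backtrack here
        steps.append(random.choice(good))
        if is_solved(problem, steps):
            return steps  # a verified chain of reasoning steps
    return None

def build_training_data(problems, **fns):
    """Collect verified reasoning traces to train the next, stronger model on."""
    traces = []
    for problem in problems:
        steps = search_solution(problem, **fns)
        if steps is not None:
            traces.append({"problem": problem, "steps": steps})
    return traces
```

The key idea is the last function: the verified traces become the training set for the successor model, which is the "old LLM trains a new LLM" loop.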
OpenAI also measured how performance increases with compute for both training (learning) and inference (answering), and found the same kind of relationship between scaling resources and performance. Once again, these trends showed no obvious end in sight, for either training or inference.
So, why is there no end in sight (yet)? Are there any theoretical limits? Well, so far, most of the research appears to say no.
First, DNNs are, in theory, universal function approximators: given sufficient depth and width, they can model any continuous function to a desired degree of accuracy. This suggests that simply scaling DNNs up can keep improving their performance. In practice, certain architectures (like Transformers) are also far more effective on high-entropy problems (like natural language).
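As a toy illustration of that idea (not a proof), here is a sketch using scikit-learn: a small feed-forward network fits an arbitrary continuous function, and widening it typically drives the error down. The target function and layer sizes are arbitrary choices on my part.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy illustration: a wider MLP approximates a continuous function more closely.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(2 * X[:, 0]) + 0.5 * np.cos(5 * X[:, 0])  # arbitrary continuous target

X_test = np.linspace(-3, 3, 500).reshape(-1, 1)
y_test = np.sin(2 * X_test[:, 0]) + 0.5 * np.cos(5 * X_test[:, 0])

for width in [4, 32, 256]:
    mlp = MLPRegressor(hidden_layer_sizes=(width, width), max_iter=2000, random_state=0)
    mlp.fit(X, y)
    err = np.mean((mlp.predict(X_test) - y_test) ** 2)
    print(f"hidden width {width:>3}: test MSE = {err:.4f}")
```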
Second, manifold learning tells us that complex, high-dimensional data (like language and images) tend to lie on lower-dimensional surfaces (called manifolds) within a much higher-dimensional space. As we scale DNNs, LLMs, and LMMs, they capture increasingly complex and nuanced structure in these manifolds. This suggests that scaling the models leads to continuing improvements with no obvious limits.
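Here is a crude way to see that idea in code: data that nominally lives in 100 dimensions, but is generated from a single underlying degree of freedom, is captured almost entirely by a handful of principal components. (PCA only captures linear structure, so this understates what a nonlinear model can exploit; the setup is entirely made up for illustration.)

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy illustration of the manifold idea: data that looks 100-dimensional is
# actually driven by one underlying parameter t.
rng = np.random.default_rng(0)
t = rng.uniform(0, 4 * np.pi, size=(5000, 1))            # 1 underlying degree of freedom
basis = rng.normal(size=(3, 100))                         # random embedding into 100-D
X = np.hstack([np.sin(t), np.cos(t), 0.1 * t]) @ basis    # a curve living in 100-D space

pca = PCA().fit(X)
needed = np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.99) + 1
print(f"components needed for 99% of the variance: {needed} (out of 100)")
```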
Third, it has been mathematically proven that constant-depth transformers using chain-of-thought (CoT) reasoning can solve any problem in the complexity class P (problems solvable in polynomial time), provided they are allowed to generate enough intermediate reasoning tokens. In effect, the intermediate tokens let the model break the problem into smaller and smaller subproblems, so the network itself doesn't need to be arbitrarily deep.
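As a toy analogy for what those intermediate tokens buy you (not the formal construction from the proof), here is a sketch that adds two long numbers by emitting one small, checkable step per digit. The task becomes a sequence of easy subproblems once the model is allowed to "think out loud" instead of producing the answer in one shot.

```python
def add_with_reasoning(a: str, b: str):
    """Add two numbers digit by digit, emitting an intermediate 'reasoning
    token' for each step -- a toy analogue of chain-of-thought."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits, trace = 0, [], []
    for da, db in zip(reversed(a), reversed(b)):
        carry_in = carry
        s = int(da) + int(db) + carry_in
        carry, digit = divmod(s, 10)
        digits.append(str(digit))
        trace.append(f"{da} + {db} + {carry_in} = {s}: write {digit}, carry {carry}")
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), trace

answer, steps = add_with_reasoning("478", "964")
print(answer)            # 1442
print("\n".join(steps))  # the per-digit "reasoning tokens"
```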
There must be some physical bottleneck, though, right? Yes, we’ve encountered several internal and external bottlenecks in recent years. However, so far, we’ve been able to overcome them all.
Internally, we had limitations with LLMs understanding the context of what they’re reading/observing, but we overcame them with the attention mechanism in Transformers. Next, we had limitations with LLMs’ ability to reason, but we’ve overcome that with CoT reasoning. Then, we had a context-length limitation, but we now have LLMs with context windows of over a million tokens — with larger ones coming.
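For readers who haven't seen it, here is a minimal NumPy sketch of the scaled dot-product attention mechanism mentioned above, stripped of batching, masking, multiple heads, and learned projections.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: each query attends over all keys and
    returns a weighted average of the values: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # context-aware mixture of values

# Tiny example: 4 tokens, embedding dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)
```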
Externally, we had a bottleneck with compute resources (i.e., limited GPUs), but as of 2024, the top AI companies are no longer compute-constrained. We had a training data bottleneck (i.e., we ran out of high-quality text on the internet to train on), but we are now training on high-quality synthetically generated data. And, we currently have an energy bottleneck, but we’ll get to that shortly.
So, can we just keep scaling these models up in size and end up with Artificial General Intelligence (AGI) and, eventually, Artificial Super Intelligence (ASI)? Probably not. There are a few things to consider.
First, even if we can continue scaling LLMs, the gains follow a power law: each additional order of magnitude of time, money, and energy buys a smaller absolute improvement in performance. In other words, we get continuously diminishing marginal returns for every extra unit of resources we put into training these models, which is probably not sustainable in the long run.
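Flipping the earlier illustrative power law around makes the point starkly: under those made-up constants, every further fixed-percentage improvement in loss requires roughly eight times more compute than the previous one.

```python
# Flip the illustrative power law L(C) = (C_c / C) ** alpha around and ask:
# how much more compute does a fixed improvement cost? Cutting the loss by a
# factor f requires f ** (1 / alpha) times more compute.
# (alpha is the same made-up exponent used in the earlier sketch.)
alpha = 0.05
f = 1 / 0.9  # target: a 10% reduction in loss

extra_compute = f ** (1 / alpha)
print(f"Each further 10% loss reduction needs ~{extra_compute:.0f}x more compute")
```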
However, we now have Small Language Models (SLMs) that are much more cost-effective and energy-efficient. Also, adding multi-modality helps: combining text, images, audio, video, etc., creates LMMs that solve more complex problems and generalize better to new ones than text-only LLMs.
But, there are still a few key components that are likely missing. For example, LLMs possess only a rudimentary world model (i.e., a model of the world solely as described in text). So, we will likely need a more robust world model (like JEPA). In addition, LLMs lack embodiment (i.e., they have no physical bodies) to learn real-world physics. And, they currently can’t dynamically adapt or self-improve.
While we haven’t yet overcome these limitations, we might not actually need to. Rather, we just need to build an autonomous AI research agent that’s as intelligent as an average AI researcher. Then, we can clone this agent to build an army of automated AI researchers – which will solve these problems for us.
These agents would read existing research papers, propose new hypotheses, and run experiments to test their hypotheses. They would collect data, analyze the results, and publish their research for other AI agents to peer review and build upon. It would be a virtuous feedback loop of automated AI research. A million agents, all researching in parallel like this, would lead to a new scientific revolution.
Automated AI researchers would likely outperform human researchers in several ways. First, they can conduct and review research much faster than we can. They can effectively coordinate to avoid duplication of work. They can share their results in a GitHub-like repository – much faster than our current scientific journal process. And, they wouldn’t have any issues trusting one another — they are clones.
This army of AI researchers would also be able to recursively self-improve. As they overcome each limitation, they would be able to train their successors using those improvements. As a result, in a short period of time, they would likely be able to solve all of the outstanding limitations of AI – maybe in just a few years.
So, what real-world evidence do we have that any of this is happening? Quite a lot, actually.
Recently, it was reported that Microsoft and OpenAI plan to invest $100 billion in building the next generation of AI systems. This includes investments in new data centers and nuclear power, aimed at overcoming the energy bottleneck we discussed earlier. Nuclear energy stocks are booming as a result.
So, we've gone from an initial $1 billion investment from Microsoft into OpenAI in 2019, to a $10 billion investment in 2023, and now $100 billion in 2024. AI experts are already predicting the first $1 trillion investment to build the next generation of AI by 2030.
In addition, OpenAI is signaling that it may become a for-profit company. In 2019, it transitioned from a non-profit to a capped-profit structure; now it may restructure again and perhaps eventually go public. It is currently seeking additional funding at a valuation of $150 billion, reportedly contingent on the new for-profit structure. Combined with the recent wave of high-profile departures from OpenAI, things are changing rapidly.
Finally, last week, Sam Altman stated that he believes we’ll achieve Artificial Super Intelligence (ASI) in just a few years. Other top AI experts have also publicly stated similar short-timeline predictions. Many now predict we’ll achieve AGI before 2030. So, it appears that the people closest to cutting-edge AI research are convinced we are within just a few years of AGI – with ASI occurring shortly after.
From what I’m seeing, this isn’t just Microsoft and OpenAI. Other AI companies like Google, Meta, and Anthropic are all making similar investments. If these companies weren’t confident they could scale these models further without limitations, they would almost certainly not be willing to take such a huge bet. In addition, the U.S. government now seems convinced that we need to scale up these models too.
So, have we hit “Peak AI” yet? Everything I’m seeing suggests that the answer is a resounding no! It appears we’re just getting started, and things are about to start moving even faster. Unfortunately, I don’t think we, as individuals, organizations, and a society, are prepared for what is coming. We have a lot of work to do to prepare for AGI/ASI, and we need to start preparing now!
To help you prepare for AI, please be sure to check out all of my articles, videos, and online courses.