In the era of transformer-based language models, one idea has stood out above all others: the belief that these models will improve purely as a result of scaling. It is a controversial theory, as many believe that progress is instead a matter of architecture.

Will intelligence emerge as a result of increasing size and complexity? Or do we need to go back to the building blocks to develop genuine intelligence?

We will examine an essay by Gwern Branwen that presents the concept, along with a video summarizing it.


Essence of the Scaling Hypothesis

When neural networks were first introduced, the idea of using a network of artificial neurons to solve problems was revolutionary. However, it was not until the introduction of the transformer architecture that they truly began to shine.

Many believed that harder problems would demand ever more sophisticated architectures. But what if the key were simply scaling up the size of the network? What if intelligence emerges naturally as a result of scale?
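One way to make this concrete is the empirical scaling-law picture: published fits (for example, Kaplan et al., 2020) describe test loss as a smooth power law in parameter count, with no architectural term at all. The sketch below assumes that power-law form; the constants are illustrative placeholders in the spirit of those papers, not fitted values.

```python
# A minimal sketch of a parameter-count scaling law, L(N) = (N_c / N) ** alpha,
# assuming the power-law form reported in the scaling-laws literature.
# N_c and alpha below are illustrative placeholders, not fitted values.

N_C = 8.8e13   # hypothetical normalizing constant (in parameters)
ALPHA = 0.076  # hypothetical power-law exponent

def predicted_loss(n_params: float) -> float:
    """Predicted test loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> loss ~ {predicted_loss(n):.3f}")
```

Under a curve like this, loss falls smoothly and predictably with size alone. That is the quantitative heart of the hypothesis: the only variable in the formula is the parameter count.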


Holes in the Idea

While scaling up a network has generally led to improved performance, this does not hold unconditionally. Many other factors, such as the quantity and quality of training data and the available compute, affect performance, and simply adding parameters is not always the solution.
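One concrete illustration of this limit comes from compute-optimal analyses (for example, Hoffmann et al., 2022, the "Chinchilla" result), which model loss as a function of both parameter count and training tokens. The sketch below assumes that two-term form with placeholder constants; it shows that, at a fixed compute budget, the largest model is not the best one.

```python
# A sketch of a two-term scaling law, L(N, D) = E + A / N**alpha + B / D**beta,
# in the style of Hoffmann et al. (2022). All constants are illustrative
# placeholders, not fitted values.

E, A, B = 1.69, 406.4, 410.7  # illustrative constants
ALPHA, BETA = 0.34, 0.28      # illustrative exponents

def loss(n_params: float, n_tokens: float) -> float:
    """Loss as a joint function of model size N and training tokens D."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Fix a compute budget (roughly C ~ 6 * N * D FLOPs) and vary model size:
# tokens must shrink as parameters grow, so loss eventually worsens.
C = 1e21  # illustrative FLOP budget
for n in (1e8, 1e9, 1e10, 1e11):
    d = C / (6 * n)  # tokens affordable at this model size
    print(f"N={n:.0e}, D={d:.0e} -> loss ~ {loss(n, d):.3f}")
```

Running this, the loss bottoms out at an intermediate model size and rises again for the largest one: under a fixed budget, an oversized, undertrained model loses to a smaller one trained on more data.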

Moreover, despite all the work that has gone into understanding the human brain, we still understand very little about the underlying nature of intelligence and consciousness. It is possible that we are only scratching the surface of what neural networks can do, and that there is still much to learn.

Or perhaps the key lies in a different approach altogether, with neural networks serving only as a stepping stone on the path to understanding the true nature of intelligence.