Nobel Prize to the Statistical Physics of artificial neural networks
Why this is about physics, and couldn't be otherwise
A few minutes ago, the Royal Swedish Academy of Sciences announced the 2024 Nobel Prize in Physics. The awardees are John J. Hopfield (Princeton University, NJ, USA) and Geoffrey E. Hinton (University of Toronto, Canada):
“for foundational discoveries and inventions that enable machine learning with artificial neural networks”
Roughly speaking, we could summarize their contributions as statistical physics of artificial neural networks or the statistical physics foundations of machine learning.
Artificial neural networks, loosely inspired by the brain's structure, consist of nodes connected by links that play the role of synapses, and they are trained by adjusting the strength of these connections based on input data. Hopfield developed a network that recreates patterns through energy minimization, while Hinton expanded on this with the Boltzmann machine, a statistical-physics-based system that learns to recognize and generate patterns, providing groundwork for modern machine learning.
Their models are deeply connected to physical principles like energy minimization, showing how ideas from the physical world (especially from spin glass theory) can inspire machines to learn. But how??
A statistical physics of machine learning?
Statistical physics and machine learning might sound like separate domains, but both deal with complex systems made up of many interacting components (I will further explore this point in future posts).
In physics, we study how particles, or more generally a system's units, interact and evolve toward stable states. In machine learning, we study how artificial neurons interact to learn patterns directly from data.
The connection lies in energy minimization (see this paper for a recent review): both approaches define an energy function that describes the stability of a system, and minimizing this function yields configurations that correspond to useful patterns or memories.
Hopfield networks
Hopfield introduced his network in 1982, leveraging ideas from statistical mechanics to model memory. His model is a recurrent neural network in which each neuron interacts with every other neuron: the system's dynamics are described by an energy function that resembles the Ising model from physics:

E = -\frac{1}{2}\sum_{i \neq j} w_{ij}\, s_i s_j - \sum_i h_i s_i
where s_i is a number (either -1 or +1) representing the binary state of neuron i (analogous to a spin in the Ising model), w_{ij} is the weight of the connection between neurons i and j, and h_i is a bias term applied to each neuron (analogous to an external field).
The Hopfield network minimizes this energy as it processes input, much as a system of interacting particles governed by the Ising model evolves toward a state of minimal energy. The network iterates through neuron states to recall a stored memory even when the input is noisy or incomplete; this retrieval process is akin to a physical system settling into its lowest-energy configuration, a hallmark of energy-based models.
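To make the mechanics concrete, here is a minimal NumPy sketch of a Hopfield-style network, assuming the standard Hebbian storage rule and zero bias terms; the helper names (train_hopfield, energy, recall) are mine for illustration, not taken from Hopfield's paper or any particular library.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian rule: accumulate outer products of the +/-1 patterns to store."""
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:
        w += np.outer(p, p)
    np.fill_diagonal(w, 0)                 # no self-connections
    return w / patterns.shape[0]

def energy(w, s):
    """Ising-like energy of state s (bias terms h_i set to zero for simplicity)."""
    return -0.5 * s @ w @ s

def recall(w, s, steps=200, seed=0):
    """Asynchronous updates: each flip can only lower (or keep) the energy."""
    s = s.copy()
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if w[i] @ s >= 0 else -1  # align neuron i with its local field
    return s

# Store one pattern, corrupt two of its neurons, then let the dynamics relax.
pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])
w = train_hopfield(pattern[None, :])
noisy = pattern.copy()
noisy[:2] *= -1
recovered = recall(w, noisy)
print("energy before:", energy(w, noisy), "after:", energy(w, recovered))
print("pattern recovered:", np.array_equal(recovered, pattern))
```

Because each asynchronous flip can only lower (or keep) the energy, the dynamics settle into a stored pattern, which is exactly the retrieval behavior described above.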
Boltzmann machines
Hinton extended the idea of energy minimization in machine learning with the Boltzmann machine in 1985. While Hopfield networks recall memories, Boltzmann machines take this a step further by directly learning patterns from data. Like Hopfield networks, they use an energy function of the same Ising-like form:

E = -\frac{1}{2}\sum_{i \neq j} w_{ij}\, s_i s_j - \sum_i h_i s_i
However, Boltzmann machines incorporate stochastic behavior through the Boltzmann distribution:

P(\mathbf{s}) = \frac{e^{-E(\mathbf{s})/T}}{Z}

where P(\mathbf{s}) is the probability of the system being in state \mathbf{s} = (s_1, s_2, \dots), Z is the partition function that correctly normalizes the probabilities, and T is a "temperature" parameter controlling randomness.
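For a toy system this distribution can be computed exactly by enumerating all states; the couplings and biases below are arbitrary values chosen only for illustration.

```python
import numpy as np
from itertools import product

# Enumerate all states of a 3-neuron system, compute each energy,
# and normalize exp(-E/T) by the partition function Z.
w = np.array([[ 0.0, 1.0, -0.5],
              [ 1.0, 0.0,  0.3],
              [-0.5, 0.3,  0.0]])   # symmetric couplings, zero diagonal
h = np.array([0.1, -0.2, 0.0])      # biases ("external fields")
T = 1.0                             # temperature

states = [np.array(s) for s in product([-1, 1], repeat=3)]
energies = np.array([-0.5 * s @ w @ s - h @ s for s in states])
Z = np.sum(np.exp(-energies / T))   # partition function
probs = np.exp(-energies / T) / Z   # Boltzmann distribution

for s, E, p in zip(states, energies, probs):
    print(s, f"E={E:+.2f}", f"P={p:.3f}")
```

Low-energy states receive the largest probabilities, and raising T flattens the distribution toward uniform.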
Unlike Hopfield networks, Boltzmann machines introduce randomness in the neuron states, which allows the network to explore different configurations and helps it avoid getting stuck in local minima of the energy function. In this way, Boltzmann machines are directly inspired by thermodynamics: just as particles in a gas can temporarily occupy higher-energy states due to thermal fluctuations, neurons in a Boltzmann machine have a small probability of being in a "wrong" state. This stochastic behavior allows the network to learn more robust patterns from data.
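Here is a minimal sketch of that stochastic update rule, essentially one Gibbs-sampling sweep over ±1 units with symmetric weights; this is illustrative code under those assumptions, not the original formulation.

```python
import numpy as np

def gibbs_step(w, h, s, T, rng):
    """One sweep of stochastic neuron updates at temperature T."""
    for i in range(len(s)):
        # Energy difference E(s_i = -1) - E(s_i = +1): how strongly +1 is favored.
        gap = 2.0 * (w[i] @ s + h[i])
        p_plus = 1.0 / (1.0 + np.exp(-gap / T))   # Boltzmann probability of s_i = +1
        s[i] = 1 if rng.random() < p_plus else -1
    return s

rng = np.random.default_rng(1)
w = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # a positive coupling favors aligned neurons
h = np.zeros(2)

for T in (5.0, 0.5):                # high vs. low "temperature"
    s = np.array([1, -1])
    aligned = 0
    for _ in range(2000):
        s = gibbs_step(w, h, s, T, rng)
        aligned += int(s[0] == s[1])
    print(f"T={T}: fraction of sweeps with aligned neurons = {aligned / 2000:.2f}")
```

At high temperature the two coupled neurons are frequently misaligned (higher-energy configurations are visited), while at low temperature they almost always align, mirroring the thermal-fluctuation analogy above.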
There are other distinctive features of this approach, but I refer to the mentioned papers for details.
Overall comment
The intersection of statistical physics and machine learning, embodied in models like Hopfield networks and Boltzmann machines, demonstrates how physical principles like energy minimization can be used to build powerful computational models. It is remarkable that these “simple” models have provided a fertile ground for many modern advances in machine learning, showing that the laws governing physical systems also apply to the world of artificial intelligence.
One may agree or disagree with the committee's decision, but it is significant that these approaches, at the intersection of theoretical and computational physics, are now widely recognized as part of the field. Ten years ago, such recognition would have been unlikely.