What's going on with assembly theory?
Claims, controversial claims and merits after 60 years of complexity science
If you find Complexity Thoughts valuable, click on the Like button, leave a comment, repost on Substack or share this post. It is the only feedback I can get for this free service.
The frequency and quality of this newsletter rely on social interactions. Thank you!
A few days ago, a new paper entitled “Assembly theory explains and quantifies selection and evolution” appeared in Nature, igniting online discussions and debates. Unfortunately, it also ignited some vehement reactions from academic communities, outraged by the content of the paper, from its title to its (lack of) context and its interpretations. In the worst cases, some authors have been personally attacked, as in the case of Lee Cronin (one of the senior authors, together with Sara Imari Walker).
In this post I will try to provide a neutral overview of the ongoing debate:
starting from some background material;
moving to the ongoing debate around the paper;
commenting on the current state of academic practices.
I have shared my thoughts with some friends and colleagues, and I report here the replies from Ricard Solé, Sara Imari Walker and Hector Zenil, providing, respectively, neutral, positive and negative perspectives on the debate.
This is not a hate-post: even though hate posts are popular and drive more engagement, the goal of this space is — and will always be — to discuss novel research in complexity science, one paper at a time.
Background
Imagine putting an extraordinary number of (very simple) interacting molecules inside your sterile laboratory glassware and keeping them in some peculiar thermodynamic conditions, let’s say far from equilibrium, for a reasonable amount of time. How likely is it that those molecules start to extract energy from their environment and use it to create and maintain (i.e., self-organize) novel structures and processes, such as amino acids or nucleotides, or some form of primitive metabolism?
To rephrase crudely: what is the probability that complex organic molecules first, and complex structures like proteins later, emerge from the glassware as building blocks of a primordial living system?
Answering this question is far from easy without some simplifying assumptions. For instance, one could assume fully independent probabilities for each molecule and show that a specific arrangement into increasingly ordered structures would not be expected within a time much larger than the current age of the universe. So far so good: there are no experiments, to date, observing the emergence of living organisms under such an experimental setup.
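To make the flavor of that independence argument concrete, here is a minimal back-of-the-envelope sketch in Python (the numbers are purely illustrative assumptions of mine, not a model of real chemistry): even an absurdly fast "assembler" running for the entire age of the universe would essentially never produce one specific 100-residue chain by independent random draws.

```python
# Back-of-the-envelope sketch of the independence argument
# (illustrative numbers of my own choosing, not a model of real chemistry).
p_specific_chain = 20.0 ** -100        # one specific 100-residue chain,
                                       # i.i.d. draws from 20 amino acids
trials_per_second = 1e20               # an absurdly generous assembly rate
age_of_universe_s = 4.35e17            # ~13.8 billion years, in seconds

expected_successes = p_specific_chain * trials_per_second * age_of_universe_s
print(f"{expected_successes:.1e}")     # ~3.4e-93: effectively never
```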
In this post, let’s skip a review of current theories and debates (Prebiotic Chemistry, the RNA World, Hydrothermal Vents, ...) about the origin of life and, instead, let’s focus on one of their common goals: explaining the transition from an abiotic (non-living) state to a biotic (living) state. Finding the ultimate explanation for this process is one of the most formidable scientific challenges ever posed.
On a related matter, let’s imagine that a rover like NASA’s Curiosity finds some complex molecules on Mars: another extraordinarily important task would be to determine whether those molecules are the remnants of some evolutionary process (possibly leading to the emergence of living organisms at some point in history). This is one of the formidable goals of astrobiology (see NASA’s program on this, if curious).
The common rationale behind the two examples described above is that one wants to find quantitative signatures of evolution, by measuring the complexity of observed products (e.g., complex molecules) under some hypothesis about the underlying mechanisms governing their formation.
One framework to rule them all: assembly theory
I heard about assembly theory (AT from now on) for the first time during a closed-door workshop about the origin of life, organized at the Santa Fe Institute by Chris Kempes and Ricard Solé. I was a curious listener, not giving a talk but allowed to ask questions. Following up, I invited Sara Imari Walker for a talk in my lab, where she discussed, among other interesting results, some of the concepts that can be found in this 2021 paper.
The proposal is simple and elegant: consider (i) a set of fundamental building blocks (like colored squares or alphabet letters in the figure below), and (ii) a set of rules/mechanisms to assemble those units into new products, which can later be used as additional building blocks. The formation of complex structures depends on specific assembly pathways: the final product therefore strictly depends on the initial conditions and on the “story” of the assembling building blocks. The smallest number of joining operations needed to create the final product is defined as the assembly index.
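To make the definition concrete, here is a toy sketch in Python for strings rather than molecules (an illustration of mine, assuming that joining means concatenation; this is not the authors' molecular-graph algorithm): a brute-force breadth-first search over assembly pools that returns the minimal number of joining operations, allowing intermediates to be reused.

```python
def assembly_index(target: str) -> int:
    """Toy assembly index of a string: the minimal number of joining
    (concatenation) operations needed to build `target` from its single
    characters, where intermediate products can be reused."""
    basics = frozenset(target)          # the fundamental building blocks
    if target in basics:                # single-character target
        return 0
    frontier = {basics}
    steps = 0
    while frontier:
        steps += 1
        next_frontier = set()
        for pool in frontier:
            for a in pool:
                for b in pool:
                    product = a + b
                    # only substrings of the target can lie on a minimal pathway
                    if product in target:
                        if product == target:
                            return steps
                        next_frontier.add(pool | {product})
        frontier = next_frontier

print(assembly_index("AAAAAAAA"))  # 3: AA, AAAA, AAAAAAAA (copies are cheap)
print(assembly_index("BANANA"))    # 4: BA, NA, NANA, BANANA
```

Note how reusing copies makes repetitive strings cheap to assemble: this is exactly the intuition behind low assembly indices for highly regular structures.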
The authors of that study have shown that one can use AT to analyze molecular assembly (MA), finding experimental correlations between mass spectrometry data and MA, and applying MA analysis to mixtures. Using their words:
Our system avoids the potential of false negatives and allows us to search the universe agnostically for evidence of what life does rather than attempting to define what life is
Of course, this is a rather exciting outcome, well aligned with Seth Lloyd’s prescriptions for measuring complexity: (1) How hard is it to describe? (2) How hard is it to create? (3) What is its degree of organization?
Some days ago, a new paper about AT appeared in Nature with a rather ambitious title: “Assembly theory explains and quantifies selection and evolution”. It has been accompanied by a News & Views piece, very well (and maybe strategically) written by George F. R. Ellis, with a captivating title: “How purposeless physics underlies purposeful life”. Sara Imari Walker, one of the authors, commented:
If you cut through the noise, the primary epistemological debate seems centered on whether solving the origin of life requires new physics (call them ideas or theories if you prefer), or not. The "not" includes whether current concepts from any discipline (evolutionary theory, physics, complexity science etc) can solve it. Many of us working on origins of life, both foundationally and experimentally, see there really is a gap and current paradigms are inadequate. Assembly theory is uniquely poised to solve this problem and has many other exciting and deep ideas associated with it, as one might expect from a theory that could solve the origin of life. I am incredibly excited because these are ideas we can test, and it means we might hope to solve the problem of life's origins soon. — Sara Imari Walker
The debate
AT has attracted some attention in the last couple of years. In a piece by Philip Ball entitled “A New Idea for How to Assemble Life”, which appeared in the popular Quanta Magazine, it is reported:
What’s missing from such previous complexity measures, Cronin said, is any sense of the history of the complex object — the measures don’t distinguish between an enzyme and a random polypeptide
In May 2023, another piece entitled “Time is an object” appeared in Aeon — published in association with the Santa Fe Institute, an Aeon Strategic Partner. In the piece, it is reported:
Assembly theory explains evolved objects, such as complex molecules, biospheres, and computers
as well as:
If the theory holds, its most radical philosophical implication is that time exists as a material property of the complex objects created by evolution. That is, just as Einstein radicalised our notion of time by unifying it with space, assembly theory points to a radically new conception of time by unifying it with matter
Taken together with the abstract of the recent Nature paper, one could argue that AT has solved the problem of characterizing the origin of life (as well as many other problems in complexity science) by means of a breakthrough approach unifying time with matter, comparable in spirit to Einstein’s unification of time with space.
It is not surprising that the theory has attracted the attention of many other scientists: the stakes are quite high. Let’s follow a chronological order.
In his 2022 Medium post “The 8 fallacies of assembly theory”, still being updated today, Hector Zenil reports:
[…] almost five years before Assembly Theory we demonstrated how to separate organic from nonorganic compounds without reinventing the wheel or making bold unjustified claims, as the authors of Assembly Theory do
leveraging a quantitative result whose details can be found in a preprint paper with his collaborators:
The 2018 paper mentioned by Zenil can be found here.
According to Zenil and colleagues, the above figure shows that — by analyzing the same experimental data — it is possible to reproduce and even outperform the results obtained from molecular assembly (AT applied to molecules, roughly speaking), challenging the claim of the AT authors that their measure of complexity is the only (or the best) one for such a task.
Organic molecules, however, have little to do with life, a pile of coal is organic. The fact that the authors of AT were not required to test their algorithm against anything else and that the authors of AT did not compare it to anything else, is astonishing and a failure of both the authors and scientific publishing. — Hector Zenil
In his blog, Zenil goes into detail while arguing that complexity science has already developed several tools to quantify the kind of complexity that AT targets, such as (variable window length) block Shannon entropy, Huffman coding, LZW, Kolmogorov-Chaitin algorithmic complexity, Solomonoff’s algorithmic probability, Bennett’s logical depth, resource-bounded Kolmogorov complexity, and the Block Decomposition Method. Furthermore, the results of Zenil and colleagues show that:
[…] despite the claims of experimental data, the assembly measure is driven mostly or only by InChI codes which had already been reported before to discriminate organic from inorganic compounds by other indexes.
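As a toy illustration of the broader point (a sketch of mine, not Zenil's Block Decomposition Method, which is considerably more refined): off-the-shelf compressors already assign far shorter descriptions to copy-rich sequences than to irregular ones, which is the kind of repetition-counting that AT performs.

```python
import random
import string
import zlib

random.seed(42)
repetitive = "AB" * 128                                       # nested, copy-rich
irregular = "".join(random.choices(string.ascii_uppercase, k=256))

# zlib's LZ77-style compressor exploits repeated blocks, so the compressed
# size is a crude, computable upper bound on Kolmogorov complexity.
print(len(zlib.compress(repetitive.encode())))  # small: repeats are cheap
print(len(zlib.compress(irregular.encode())))   # much closer to the raw 256 bytes
```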
I asked Hector Zenil to comment on the connection between AT and algorithmic complexity:
I find Assembly Theory (AT) to have multiple serious problems, among them, the theory mirrors algorithmic complexity to the point that feels borderline plagiarism. However, the methods do not correspond to the beautiful theory and they ended up implementing a Huffman coding scheme algorithm that was invented in the 1960s for the purpose of data compression. However, the authors of AT did it in the wrong way and their method therefore does not correspond to their intended method either (let alone the theory taken) as it does not count correctly as we have shown. — Hector Zenil
It should also be remarked that the fact that “molecular units of remarkable simplicity self-assemble in solution to give single-molecule thin two-dimensional supramolecular polymers of defined boundaries” seems to be well known in organic chemistry and chemical biology (e.g., see here). Similarly, the concept of molecular self-assembly (not AT) has been widely used since the ’90s to explain the emergent complexity of simple organic salts, two-dimensional DNA crystals and the synthesis of nanostructures. In a 2012 paper, Ke et al. proposed a new approach (a LEGO-like model) to 3D assembly that builds upon the modular assembly of 2D DNA tiles.
Breaking with the traditional action-reaction paradigm, another 2022 paper proposed a combination of processes with reciprocal and non-reciprocal interactions leading to non-equilibrium dynamical transitions between structures, which “can be implemented at different scales, from nucleic acids and peptides to proteins and colloids”. Again, from Zenil:
It is also hard to believe that an algorithm that simply counts repetitions (in this case, nested exact copies of molecules) can define life in any meaningful way. In fact, we already know it does not, even when it is widely known life is heavily hierarchical (nested). That was perhaps the first try to define life decades ago before scientists realised that this is the main type of generative mechanism of crystals and not only life. — Hector Zenil
All this to say that — in the relevant fields — assembling fundamental units to build complex structures is quite a powerful idea that has been circulating for more than three decades. Furthermore, there is a strong link between these results, their underlying theories and the theory of computation. Such a relation is fundamental and requires a connection to existing approaches, as stressed by Zenil:
The authors also ignore resource-bounded algorithmic complexity that are computable and not completely trivial, so it is false that being computable is their exclusive feature or, as we have argued, even a good one. Uncomputable or semi-computable means one can always improve and make the tool more precise as it is open not closed or trivial. — Hector Zenil
More recently, the debate moved online. Note that I am not going to cover messages with personal attacks on any of the authors: I am against that type of approach (more on this in the last section).
As a starting point, there is an interesting exchange between Kasper Kepp and Sara Imari Walker about selection and evolution, in the comments section of the paper (the exchange continued in an intricate web of replies on X/Twitter).
A balanced and interesting piece has been posted by Johannes Jäger, followed by this piece by Philip Ball supporting the idea that “Assembly theory might point to new directions in understanding molecular complexity”. I recommend reading Jäger’s post before going ahead, since he also partially analyzes the relation between AT and physical time, concluding:
how can the time represented by assembly indices be a fundamental physical property of the universe, if it is crucially tied to the way the model is constructed?
I agree with Jäger that, at a fundamental level, the link with physical time is ill-defined and that much work is needed before this can be framed as a scientific claim.
Meanwhile, on social media platforms:
Ian Johnston argues that the new AT paper has value, although some parts (the title, specific sentences in the abstract, introduction and discussion) are not fully justified or can even be misleading. He remarks that a complexity measure quantifying the simplest possible assembly process needed to build a molecule had already been introduced and later used to predict patterns of structural symmetry and complexity that arise in (biotic) evolution, while a connection between assembly history and (biotic) evolutionary history appeared in 2008.
Sergi Valverde points out that a minimal model for hyperbolic evolutionary dynamics — modeling the combinatorial growth of innovations — was proposed by Solé et al. in 2016. He also writes that very similar assembly ideas were already used in the ’70s by Atari developers to build games, and that other related ideas (such as tinkering and cultural production using the Polya urn model) have been proposed.
Palli Thordarson argues that the new AT paper is mostly a “chemistry paper” and does little to create a “bridge between physics and biology”, while also commenting favorably on some points.
Carl Bergstrom writes that there is some cool science hidden in the paper, but “The main text of the paper is terribly written. Terribly.” and “Nature failed both the authors and its readers by publishing it in its present form.”
In a nutshell, many others also pointed out the lack of a broader context, with seminal papers barely cited or not cited at all: work by Pross, Lanier and Williams, the thermodynamics hypothesis and multilevel learning, Nowak and Ohtsuki’s prevolution dynamics (proposing the simplest possible population dynamics that can produce information and complexity), Dyson’s theory, Kauffman’s adjacent possible theory (and subsequent mathematical definitions, see this paper and references therein), and Ulanowicz’s ascendancy theory (quantifying the ability of an ecosystem to use its organization and size to face disturbances), to mention some emblematic examples.
It is interesting to note that different scientific communities, as well as different scientists within the same community (such as that of complexity science), all agree on core ideas involving assembly and evolution (broadly speaking). It is even more interesting to find some of those ideas already in “The architecture of complexity”, a 1962 paper by Herbert Simon (Nobel Memorial Prize in Economics, 1978):
Complex systems will evolve from simple systems much more rapidly if there are stable intermediate forms than if there are not. The resulting complex forms in the former case will be hierarchic… Among possible complex forms, hierarchies are the ones that have the time to evolve. — Herbert Simon
I asked Ricard Solé to comment on the connection with prior research and the potential of AT:
A general theoretical framework that seeks to understand how selection and evolution emerge should be able to account for some well-known case studies. This would include the generation of chemical diversity in Miller's experiment (which generates amino acids but is also a lot of chemical garbage) or artificial chemistries, where we understand the generative rules and the kinds of complexity that emerge. Moreover, a kinetic model of combinatorial growth might fail to provide relevant insight concerning the qualitative transitions that gave way to codes and error thresholds or protocells, that combine several key features (metabolism, information and container). AT might provide a mathematical description for some of these phenomena but needs to be expanded in other directions to adequately address many open questions regarding life's early evolution. — Ricard Solé
The academic arena
The freedom to break the status quo with a novel theory is what makes academia an exciting place. A place that should be firmly sustained by polite (sometimes colorful) scientific exchange based on quantitative arguments. There should be no space for personal attacks: one might not like a paper or its conclusions, and even argue against the paper or write a new paper against it. The exchange between Mandelbrot and Simon is famous, and it is beautifully summarized in one of Peter Dodds’ lectures:
What is ongoing for AT, and has already happened for other theories such as Integrated Information Theory — recently labeled as pseudoscience in a preprint by a group of domain scientists — calls for our attention.
On the one hand, as scientists we cannot be distracted by personal feelings and opinions about the author(s) of a paper: we should keep calm and do our job in the best possible way. Personal attacks are not part of the game.
On the other hand, as scientists we must implement the best scientific practices — such as acknowledging existing work and avoiding sensationalism — and must be open to constructive criticism.
Our primary aim should be to work together to enhance human understanding of nature through collaborative efforts. However, this fundamental objective is hindered by current incentives that encourage publishing extensively in prestigious journals (for personal reasons, to attract more funding, and so on). This circumstance fosters behavior resembling that of a competitive market, and it is essential to strive to prevent the commercialization of scientific knowledge.
AT, like any other new theory out there, will have to demonstrate how useful it is and in which domains. It cannot dispense with being tested against controlled experiments.
Despite the media marketing on this matter, it is not a theory of everything, and it cannot be one, because of its focused design: establishing to what extent a product (like a complex molecule) is the result of evolutionary forces.