Exploring evolution (in life) through information
Which pathway to the emergence of biological complexity?
Dall-e 3 representation of this issue’s content
This exploration was triggered by an interesting collaboration (I will not spoil the details for the moment) and it is part of a series of papers I have recently read about evolutionary dynamics. Hopefully, I will soon be able to share more information about why I did so.
In the meantime, I am happy to share some of the ideas that I have found really interesting on this topic, without pretending (or even planning) to write a comprehensive introduction to it. It’s just too vast, and I find it very difficult even to outline a map of all the theories out there.
The major evolutionary transitions
This is also the title of an influential paper by Szathmáry and Maynard Smith in 1995.
Since it can’t be summarized better than its abstract, here we go:
There is no theoretical reason to expect evolutionary lineages to increase in complexity with time, and no empirical evidence that they do so. Nevertheless, eukaryotic cells are more complex than prokaryotic ones, animals and plants are more complex than protists, and so on. This increase in complexity may have been achieved as a result of a series of major evolutionary transitions. These involved changes in the way information is stored and transmitted.
Nowadays the idea that information plays a crucial role in biological systems is often reported as revolutionary and modern, while in fact nearly 30 years ago Szathmáry and Maynard Smith were writing:
A central idea in contemporary biology is that of information. Developmental biology can be seen as the study of how information in the genome is translated into adult structure, and evolutionary biology of how the information came to be there in the first place. Our excuse for writing an article concerning topics as diverse as the origins of genes, of cells and of language is that all are concerned with the storage and transmission of information. The article is more an agenda for future research than a summary of what is known. But there is sufficient formal similarity between the various transitions to hold out the hope that progress in understanding any one of them will help to illuminate others.
Again Maynard Smith, in a paper published in 2000, wrote:
In biology, the use of informational terms implies intentionality, in that both the form of the signal, and the response to it, have evolved by selection. Where an engineer sees design, a biologist sees natural selection
thus establishing the fundamental role of information in evolution, but not in the way that it is understood by an engineer. This reminds me of the famous paper by François Jacob, where he argues that evolution acts like a tinkerer, assembling what’s available at a given time, instead of acting like an engineer who drives design by specific goals.
The 1995 paper by Szathmáry and Maynard Smith does an outstanding job of identifying and describing the major evolutionary transitions resulting from this tinkering. To summarize them in just one concept (see papers below), we will be referring to “Megatrajectories” from now on.
Megatrajectories represent large-scale and long-term evolutionary trends encompassing significant changes across various levels of biological organization, from genetic and phenotypic alterations to ecosystem and macroevolutionary shifts. Taken together, they provide a framework for mapping the processes that are thought to guide the evolution of life on Earth.
They correspond to major evolutionary trends that are observed across extensive temporal and spatial scales: not just isolated events or changes, but patterns that are discernible across different lineages and geological timescales.
One such pattern is the increase in biological complexity, from single-celled organisms to complex multicellular life forms.
[Ed → I know: we could argue about how to measure such complexity, but for now let’s agree on this statement]
Another general pattern is diversification, with life showing a quantifiable trend toward increasing diversity. Diversification is not always a gradual process but can occur in fits and starts. This is the theory of punctuated equilibrium proposed by Eldredge and Gould, according to which species remain relatively stable for long periods (stasis), interspersed with brief, intense periods of rapid evolution and speciation following environmental changes or mass extinctions.
Figure from here.
Another distinctive pattern is coevolution, a process where two or more species influence each other's evolution. Over long periods, these interactions can lead to significant evolutionary changes across ecosystems or even the entire biosphere (more about this later).
OK, but what drives such megatrajectories? Without pretending to be exhaustive, we can surely consider the following “forces”:
Natural selection: although it acts on variation within populations over short time scales, its effects accumulate across vast timescales, leading to the emergence of traits that enhance survival and reproduction. Organisms that are better adapted to their environment tend to survive and produce more offspring. Natural selection can also drive directionality, i.e., favor the traits that enhance survival and reproduction in changing environments. Over millions of years, this can lead to major evolutionary shifts, such as the transition from aquatic to terrestrial life forms: a change that can trigger a rapid increase in species diversity (adaptive radiations).
Genetic drift: random changes in allele frequencies that can lead to significant evolutionary changes over long periods (especially in small populations). While it is considered a microevolutionary process, over long timescales genetic drift can contribute to macroevolutionary trends as well, by means of the founder effect (the genetic traits of the first colonizers of a new habitat disproportionately influence future populations) and population bottlenecks.
There is ample evidence for the existence of such megatrajectories, from the fossil record to comparative genomics (for some introductory material, see this website).
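To make the role of population size in genetic drift more concrete, here is a minimal sketch of a neutral Wright-Fisher model. The model choice, the function name `wright_fisher` and all parameter values are my own assumptions for illustration, not something taken from the papers discussed here. With no selection at all, the allele frequency in a small population wanders until the allele is either lost or fixed, while a large population barely moves over the same number of generations.

```python
import random

def wright_fisher(pop_size, init_freq, generations, rng):
    """Neutral drift of one allele in a haploid population of constant size."""
    count = round(init_freq * pop_size)
    trajectory = [count / pop_size]
    for _generation in range(generations):
        freq = count / pop_size
        # Binomial sampling: each offspring inherits the allele with probability freq.
        count = sum(1 for _child in range(pop_size) if rng.random() < freq)
        trajectory.append(count / pop_size)
    return trajectory

rng = random.Random(42)  # fixed seed so the run is reproducible
small = wright_fisher(pop_size=20, init_freq=0.5, generations=2000, rng=rng)
large = wright_fisher(pop_size=2000, init_freq=0.5, generations=50, rng=rng)

print("small population, final allele frequency:", small[-1])
print("large population, final allele frequency:", large[-1])
```

A founder event or a bottleneck corresponds, in this toy picture, to temporarily running the model with a very small `pop_size`.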
At this point, there is enough material to go ahead with the following paper.
Issues of directionality in the history of life can be framed in terms of six major evolutionary steps, or megatrajectories (cf. Maynard Smith and Szathmáry 1995): (1) evolution from the origin of life to the last common ancestor of extant organisms, (2) the metabolic diversification of bacteria and archaea, (3) evolution of eukaryotic cells, (4) multicellularity, (5) the invasion of the land and (6) technological intelligence. Within each megatrajectory, overall diversification conforms to a pattern of increasing variance bounded by a right wall as well as one on the left. However, the expanding envelope of forms and physiologies also reflects—at least in part—directional evolution within clades. Each megatrajectory has introduced fundamentally new evolutionary entities that garner resources in new ways, resulting in an unambiguously directional pattern of increasing ecological complexity marked by expanding ecospace utilization. The sequential addition of megatrajectories adheres to logical rules of ecosystem function, providing a blueprint for evolution that may have been followed to varying degrees wherever life has arisen.
Hierarchies
The Linnaean hierarchy is a biological classification system established by Carl Linnaeus that organizes living organisms into a nested hierarchy based on shared physical and genetic characteristics. This system classifies life forms using a series of ranked categories, from broad to specific, including kingdom, phylum, class, order, family, genus, and species.
In contrast, the Vernadskyan hierarchy, influenced by the ideas of Vladimir Vernadsky (who formalized the “biosphere” concept introduced by Eduard Suess in 1875), focuses on the biosphere and emphasizes the interplay between living organisms and their geochemical environment. According to this view, energy and matter flow within the biosphere, and life influences Earth's geological and chemical processes.
Therefore, while the Linnaean hierarchy categorizes organisms based on their evolutionary relationships and morphological similarities, the Vernadskyan hierarchy addresses the ecological and geochemical roles organisms play within the Earth's system, thus providing a broader ecological and environmental context.
In the paper below, the authors propose the Bretskyan hierarchy: an eco-genealogical framework integrating both the genealogical aspects of the Linnaean hierarchy and the ecological-economic aspects of the Vernadskyan hierarchy. Unlike the Linnaean hierarchy, which classifies organisms based on their evolutionary relationships and morphological characteristics, and the Vernadskyan hierarchy, which focuses on the ecological interactions within the biosphere, the Bretskyan hierarchy considers communities of organisms that are defined by both their genetic relationships and their ecological interactions.
Roughly speaking, the Bretskyan hierarchy looks at groups of interacting organisms that are connected through their genetic lineage (as in the Linnaean system) and their ecological roles (similar to the Vernadskyan approach). These communities are polyphyletic, i.e., they consist of species with different ancestral lines, but are integrated into functional units called holobionts at lower levels. At larger scales, these communities form geobiomes, large ecosystems shaped by and interacting with the planet's geophysical processes.
[Ed → If you are wondering if all of this is somehow related to the Gaia hypothesis and symbiogenesis: yes, it is]
My understanding is that the main difference between the Bretskyan hierarchy and the other two lies in its hybrid nature: it doesn't just categorize organisms or examine their ecological functions but blends these perspectives to look at how groups of organisms are interconnected across different scales and how they evolve together within their specific environments.
The Bretskyan hierarchy, multiscale allopatry, and geobiomes—on the nature of evolutionary things
The process of evolution and the structures it produces are best understood in the light of hierarchy theory. The biota traditionally is described by either the genealogical Linnaean hierarchy or economic hierarchies of communities or ecosystems. Here we describe the Bretskyan hierarchy—a hybrid eco-genealogical hierarchy that consists of nested sets of different-sized, usually polyphyletic communities of interacting individuals separated from other such communities in space and time at multiple scales. The Bretskyan hierarchy consists of elements that have both genealogical and economic properties and functions—situated between, and connecting the elements of, the economic hierarchies (Vernadskyan) and the genealogical (Linnaean) hierarchy. The described hierarchy at lower tiers is populated by holobionts, individuals composed of multiple polyphyletic lineages integrated by functional interactions or biotically fabricated structures, such as membranes. At larger spatial tiers and longer time scales, the members of the Bretskyan hierarchy are of a more diffuse nature, partially due to the small size and relatively short duration of us as observers of larger and longer-lasting structures, here described as geobiomes. Their individuality is externally forced and directly tied to the spatial and temporal physical structures of our planet. These are sub-bioprovinces and bioprovinces—large and effectively isolated spatiotemporal structures of biota integrated internally by coevolution and individuated externally by a hierarchy of barriers. Gaia is here understood as the largest eco-genealogical individual compartmentalized by the outer space of the Earth and integrated at long time scales by biotic interactions and plate tectonic mixing of biota. 
The existence of a hierarchy of barriers and multilevel allopatry suggests that geographic isolation takes part not only in individuating species lineages, but also in producing coherent complexes of separate lineages forming bioprovinces at multiple space and time scales. The sizes, configurations, and durations of Bretskyan units are directly tied to geodynamics, demonstrating the central role of the physical planet in the processes of individuation and merging of geobiomes and the control of coevolution, and all its ramifications, at multiple space and time scales. The Bretskyan hierarchy also allows the integration of previously unconnected themes—“egalitarian” major transitions in individuality (e.g., eukaryogenesis) and some of the megatrajectories in the history of life—into a single theoretical framework of spatial and temporal scaling of eco-genealogy. The pervasive scaling of geodynamical processes and the direct connection of geodynamics to the dynamics of Bretskyan units allows us to formulate conjectures on the scales and limits of spatial and temporal contingency and competitiveness of biotas in evolution.
Back to the role of information
We started this short journey from the pioneering paper by Szathmáry and Maynard Smith, which gave information a prominent role in biology, especially for understanding evolutionary transitions.
To this aim, the natural question is whether it is possible to relate, in formal terms, the concept of information to major transitions or at least to some basic principle in evolutionary biology. The following paper is a first step in this direction:
Natural selection maximizes Fisher information
In biology, information flows from the environment to the genome by the process of natural selection. However, it has not been clear precisely what sort of information metric properly describes natural selection. Here, I show that Fisher information arises as the intrinsic metric of natural selection and evolutionary dynamics. Maximizing the amount of Fisher information about the environment captured by the population leads to Fisher's fundamental theorem of natural selection, the most profound statement about how natural selection influences evolutionary dynamics. I also show a relation between Fisher information and Shannon information (entropy) that may help to unify the correspondence between information and dynamics. Finally, I discuss possible connections between the fundamental role of Fisher information in statistics, biology and other fields of science.
Fisher information, denoted I, can quantify how much information a trait (or set of traits) provides about the environment, which in turn influences the organism's fitness. Mathematically, for a trait X that depends on an environmental parameter \theta (Ed → inline latex is difficult in Substack), we can define Fisher information as
$$ I(\theta) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log p(X \mid \theta) \right)^{2} \right] $$
where p is the probability of trait X given the environmental parameter, and the expectation is taken over X.
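As a small worked example, Fisher information is the expected squared derivative of log p with respect to the parameter, and this can be evaluated numerically for a toy case. The sketch below assumes a binary trait X that takes the value 1 with probability theta (a Bernoulli model, which is my choice for illustration and not from Frank's paper). The helper names `bernoulli_prob` and `fisher_information` are hypothetical. For this model the closed form is 1 / (theta (1 - theta)), so the two numbers should agree.

```python
import math

def bernoulli_prob(x, theta):
    """Probability of a binary trait value x given the parameter theta."""
    return theta if x == 1 else 1.0 - theta

def fisher_information(prob, outcomes, theta, step=1e-5):
    """Expected squared score, with the derivative of log p taken by central differences."""
    total = 0.0
    for x in outcomes:
        score = (math.log(prob(x, theta + step)) - math.log(prob(x, theta - step))) / (2 * step)
        total += prob(x, theta) * score ** 2
    return total

theta = 0.3
numeric = fisher_information(bernoulli_prob, (0, 1), theta)
closed_form = 1.0 / (theta * (1.0 - theta))

print("numerical Fisher information:", numeric)
print("closed form 1/(theta(1-theta)):", closed_form)
```

Note how the information grows as theta approaches 0 or 1: when the trait is almost deterministic, each observation pins down the parameter more tightly.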
In the paper, Frank argues that natural selection tends to increase Fisher information in populations, because populations that encode more information about their environment can adapt more effectively, thus leading to higher fitness levels.
Can this be related to Fisher's fundamental theorem of natural selection? That theorem was formulated by Fisher in his book The Genetical Theory of Natural Selection and states that:
The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time
Frank shows that the process of maximizing Fisher information (instead of the more popular Shannon information) through natural selection can lead to an increase in the genetic variance, as traits that are more informative about the environment are also selected for.
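The link between variance and fitness gain can be checked in a deliberately simple setting. The sketch below is not Frank's derivation: it assumes a discrete-time, asexual population with constant fitness values per type (all numbers are made up for illustration), where every bit of fitness variance is heritable. In that setting one round of selection raises mean fitness by exactly the variance in fitness divided by the mean fitness, which is the discrete-time analogue of the statement quoted above.

```python
def mean_fitness(freqs, fitness):
    return sum(p * w for p, w in zip(freqs, fitness))

def fitness_variance(freqs, fitness):
    mean_w = mean_fitness(freqs, fitness)
    return sum(p * (w - mean_w) ** 2 for p, w in zip(freqs, fitness))

def selection_step(freqs, fitness):
    """One generation of selection: each type is reweighted by its relative fitness."""
    mean_w = mean_fitness(freqs, fitness)
    return [p * w / mean_w for p, w in zip(freqs, fitness)]

freqs = [0.5, 0.3, 0.2]     # hypothetical type frequencies
fitness = [1.0, 1.2, 1.5]   # hypothetical constant fitness values

next_freqs = selection_step(freqs, fitness)
gain = mean_fitness(next_freqs, fitness) - mean_fitness(freqs, fitness)
predicted = fitness_variance(freqs, fitness) / mean_fitness(freqs, fitness)

print("observed gain in mean fitness:", gain)
print("variance / mean fitness:      ", predicted)
```

When all types have the same fitness the variance is zero and selection has nothing to act on, so the mean fitness stays put.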
I think this is really interesting when related to the famous Cramér-Rao inequality, which bounds the variance of any unbiased estimator from below by the inverse of the Fisher information. Roughly speaking, higher Fisher information allows more precise or reliable estimates.
Therefore, in the context of evolutionary biology the same concept could be applied, at least metaphorically: the traits of organisms can be thought of as "estimators" of the environmental conditions or parameters they adapt to. Higher Fisher information would suggest that the organism's traits provide more precise information about the environment, enabling more effective adaptation and, consequently, a potential increase in fitness. Thus, from this perspective, natural selection might be seen as favoring organisms that achieve lower bounds on the "estimation error" of environmental parameters, optimizing their adaptive traits. Fascinating, isn’t it?
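For readers who like to see the Cramér-Rao bound at work, here is a quick numerical check in the plain statistical setting (no biology involved). It assumes Gaussian observations with known standard deviation sigma and an unknown mean. In that case the Fisher information of n independent samples is n / sigma^2, and the sample mean is an unbiased estimator that attains the bound. All names and values are chosen for illustration only.

```python
import random
import statistics

def sample_mean_variance(true_mean, sigma, n_samples, n_trials, rng):
    """Empirical variance of the sample mean over many repeated experiments."""
    estimates = []
    for _trial in range(n_trials):
        draws = [rng.gauss(true_mean, sigma) for _draw in range(n_samples)]
        estimates.append(sum(draws) / n_samples)
    return statistics.variance(estimates)

rng = random.Random(0)  # fixed seed so the run is reproducible
sigma, n_samples = 2.0, 25

fisher_info = n_samples / sigma ** 2   # information about the mean in n Gaussian samples
cramer_rao_bound = 1.0 / fisher_info   # lower bound on the variance of unbiased estimators
empirical = sample_mean_variance(true_mean=1.0, sigma=sigma,
                                 n_samples=n_samples, n_trials=5000, rng=rng)

print("Cramér-Rao bound:            ", cramer_rao_bound)
print("variance of the sample mean: ", empirical)
```

In the evolutionary metaphor, the "estimator" would be the trait and the "samples" the organism's experience of its environment, but that mapping is an analogy rather than something this snippet demonstrates.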
OK, the above is a potential explanation for the how: to increase their fitness, we might agree that organisms want to maximize the Fisher information they have about the environment. Or we can say that natural selection actually “selects” the organisms that maximize Fisher information, if we want to see natural selection as a kind of invisible hand and avoid causal claims.
The natural question is then: how can an organism maximize such information? After all, we can reasonably assume that the “entities” involved in the underlying evolutionary processes have no idea of how to calculate Fisher information, roughly speaking.
A potential answer is provided by another work, suggesting that fitness should increase at criticality:
However, why is a living system fitter when it is critical? Living systems need to perceive and respond to environmental cues and to interact with other similar entities. Indeed, biological systems constantly try to encapsulate the essential features of the huge variety of detailed information from their surrounding complex and changing environment into manageable internal representations, and they use these as a basis for their actions and responses. The successful construction of these representations, which extract, summarize, and integrate relevant information (11), provides a crucial competitive advantage, which can eventually make the difference between survival and extinction.
Following the insights of a 2011 paper by Mastromatteo and Marsili, they interpret Fisher information as a generalized susceptibility
[Ed → in statistical mechanics it is used to study the response to an applied field]
that measures the response of a biological system to parameter variations.
From physics, it is well known that susceptibility peaks at critical points. Therefore, thanks to the formal connection between susceptibility and Fisher information, Fisher information must also peak at criticality.
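The peak of Fisher information near a critical point can be seen in a toy physical model. The sketch below uses a fully connected (Curie-Weiss) Ising model, which is my own choice for illustration and not the model used in the papers cited here. For any distribution of the form p proportional to exp(-beta E), the Fisher information about beta equals the variance of the energy E, so it can be computed exactly by summing over the possible magnetization values. In the infinite-size limit this model has its critical point at beta = 1. In a finite system the peak is rounded and slightly shifted.

```python
import math

def energy_fisher_information(n_spins, beta):
    """Exact Fisher information about beta, i.e. Var(E), in a Curie-Weiss model."""
    log_weights, energies = [], []
    for n_up in range(n_spins + 1):
        magnetization = 2 * n_up - n_spins
        energy = -magnetization ** 2 / (2.0 * n_spins)
        # Log of the number of spin configurations with this magnetization.
        log_multiplicity = (math.lgamma(n_spins + 1)
                            - math.lgamma(n_up + 1)
                            - math.lgamma(n_spins - n_up + 1))
        log_weights.append(log_multiplicity - beta * energy)
        energies.append(energy)
    shift = max(log_weights)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(lw - shift) for lw in log_weights]
    norm = sum(weights)
    mean_energy = sum(w * e for w, e in zip(weights, energies)) / norm
    return sum(w * (e - mean_energy) ** 2 for w, e in zip(weights, energies)) / norm

n_spins = 200                               # hypothetical system size
betas = [0.05 * i for i in range(1, 61)]    # scan beta from 0.05 to 3.0
infos = [energy_fisher_information(n_spins, b) for b in betas]
peak_info, peak_beta = max(zip(infos, betas))

print(f"Fisher information about beta peaks near beta = {peak_beta:.2f}")
```

Far from the critical region, on either side, the system is much less sensitive to small changes in beta, which is the sense in which a critical system "responds" most strongly to parameter variations.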
The last step (which is speculative, but plausible and fascinating at this stage) is to connect this result to Frank’s paper and see that natural selection should operate at criticality. Again, any causal claim here is rather difficult.
However, we should go deeper into this connection and to this aim we need to explore, at least, the statistical physics of evolution. This is not impossible: in a future post we could inspect the development of a thermodynamic theory of evolution that starts from the 1970s, with a paper from Prigogine, Nicolis & Babloyantz.
While waiting for that post, I am looking for more papers exploring the link between evolution and information: if you have some foundational works to suggest, please share them in the comments below or get in touch by email.
"There is no theoretical reason to expect evolutionary lineages to increase in complexity with time, and no empirical evidence that they do so."
Am baffled by this sentence : it means evolution is not science.
If there is no evidence and not even a theoretical expectation, then what do we have left ??
Very interesting post!
We recently proposed a new information-based narrative for the origin and evolution of life, from information processing (computation) to information storage (memory) and information transmission (communication), all possible with chemical automata. In principle we submitted this work to ALIFE 2024, but you can check it here:
https://arxiv.org/abs/2404.04374