Complexity Thoughts: Issue #55

Unraveling complexity: building knowledge, one paper at a time

Feb 14, 2025

If you find Complexity Thoughts interesting, follow us! Click on the Like button, leave a comment, repost on Substack or share this post. It is the only feedback I can get for this free service. The frequency and quality of this newsletter rely on social interactions. Thank you!

→ Don’t miss the podcast version of this post: click on “Spotify/Apple Podcast” above!

From the Lab

Coarse-graining network flow through statistical physics and machine learning

A big question in network science is whether and how we can coarse-grain a network while preserving its macroscopic information flow. Traditional renormalization methods, like the Geometric Renormalization Group (GRG) and the Laplacian Renormalization Group (LRG), focus on some measure of structural invariance. The former exploits the network's latent geometry, while the latter filters out fast diffusion modes, retaining only the slowest ones. But what if we want to preserve the entire flow dynamics across all scales, without enforcing structural constraints? This is exactly what we tackled in our latest work, led by two brilliant PhD students from my Lab, using a combination of statistical physics and machine learning (ML).

At the heart of our approach is the density matrix framework, which describes the ensemble of diffusion or more general nonlinear processes happening on a network. By framing the problem in terms of a partition function, we gain access to macroscopic descriptors like network entropy and free energy, key indicators of how diverse and how fast information propagation is. I wrote a post dedicated to this a while ago:

Less is more, more is different and sparse is better: but why?

Manlio De Domenico

February 2, 2024


Our goal?

→ Find an optimal coarse-graining that minimizes deviations in the partition function.

Since solving this analytically is intractable, we turned to ML to learn the best compression rules. Our Network Flow Compression (NFC) model finds functionally redundant nodes and groups them, without the need for predefined structural rules.
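To make these descriptors concrete, here is a minimal sketch for plain diffusion. It assumes the combinatorial Laplacian L, the partition function Z(τ) = Tr exp(-τL), the density matrix ρ = exp(-τL)/Z, the entropy S = -Tr(ρ log ρ) and the free energy F = -log Z / τ. The `merge_nodes` helper is a hypothetical, naive grouping rule that I use only to show what a deviation in Z looks like. It is not the NFC model, which learns the grouping with a graph neural network.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def flow_descriptors(adj, tau):
    """Return (Z, S, F) for diffusion at time scale tau."""
    eigenvalues = np.linalg.eigvalsh(laplacian(adj))
    weights = np.exp(-tau * eigenvalues)
    Z = weights.sum()                    # partition function Tr exp(-tau L)
    p = weights / Z                      # eigenvalues of the density matrix
    p = p[p > 0]                         # guard against log(0) after underflow
    S = float(-(p * np.log(p)).sum())    # von Neumann entropy
    F = float(-np.log(Z) / tau)          # free energy
    return float(Z), S, F

def merge_nodes(adj, groups):
    """Hypothetical naive coarse-graining: sum edge weights between groups."""
    adj = np.asarray(adj, dtype=float)
    membership = np.zeros((len(groups), adj.shape[0]))
    for g, members in enumerate(groups):
        membership[g, members] = 1.0
    merged = membership @ adj @ membership.T
    np.fill_diagonal(merged, 0.0)        # drop self-loops created by merging
    return merged

# Toy example: a 4-node path compressed into 2 supernodes.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
coarse = merge_nodes(path, [[0, 1], [2, 3]])

for tau in (0.1, 1.0, 10.0):
    Z_full, S_full, F_full = flow_descriptors(path, tau)
    Z_coarse, S_coarse, F_coarse = flow_descriptors(coarse, tau)
    print(f"tau={tau}: Z={Z_full:.3f} vs {Z_coarse:.3f}, "
          f"S={S_full:.3f} vs {S_coarse:.3f}")
```

Scanning τ from small to large is what "all scales" means in practice: small τ probes local flow, large τ probes the slowest, network-wide modes.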

This is where NFC and LRG diverge. While LRG follows a classical renormalization philosophy, discarding fast modes and focusing on scaling properties, NFC does not attempt to uncover self-similarity or structural invariance. Instead, it ensures that functional roles are preserved, meaning nodes with similar contributions to information flow may be merged even if they aren’t directly connected. This makes NFC a fundamentally different tool: rather than a renormalization method, it is a functional coarse-graining approach, providing a new multiscale perspective on network dynamics.

Our results show that NFC outperforms structural coarse-graining methods in preserving information flow across both synthetic and real-world networks, because it is explicitly designed to do so. This opens up exciting applications in neuroscience (brain networks), epidemiology (disease spread) and many other domains. While NFC sacrifices some interpretability compared to traditional RG techniques, the unexpectedly strong performance makes it an exciting tool for studying complex systems at scale. This is just the beginning: I can't wait to see how far this framework can go!

Information dynamics plays a crucial role in complex systems, from cells to societies. Recent advances in statistical physics have made it possible to capture key network properties, such as flow diversity and signal speed, using entropy and free energy. However, large system sizes pose computational challenges. We use graph neural networks to identify suitable groups of components for coarse-graining a network and achieve a low computational complexity, suitable for practical application. Our approach preserves information flow even under significant compression, as shown through theoretical analysis and experiments on synthetic and empirical networks. We find that the model merges nodes with similar structural properties, suggesting they perform redundant roles in information transmission. This method enables low-complexity compression for extremely large networks, offering a multiscale perspective that preserves information flow in biological, social, and technological networks better than existing methods mostly focused on network structure.

Evolution

Reconstructing Early Microbial Life

For more than 3.5 billion years, life experienced dramatic environmental extremes on Earth. These include shifts from oxygen-less to overoxygenated atmospheres and cycling between hothouse conditions and global glaciations. Meanwhile, an ecological revolution took place. Earth evolved from one dominated by microbial life to one containing the plants and animals that are most familiar today. Many key cellular features evolved early in the history of life, collectively defining the nature of our biosphere and underpinning human survival. Recent advances in molecular biology and bioinformatics have greatly improved our understanding of microbial evolution across deep time. However, the incorporation of molecular genetics, population biology, and evolutionary biology approaches into the study of Precambrian biota remains a significant challenge. This review synthesizes our current knowledge of early microbial life with an emphasis on ancient metabolisms. It also outlines the foundations of an emerging interdisciplinary area that integrates microbiology, paleobiology, and evolutionary synthetic biology to reconstruct ancient biological innovations.

The global loss of avian functional and phylogenetic diversity from anthropogenic extinctions

“Human activities have been a leading cause of species extinctions, either directly or indirectly, for millennia. Matthews et al. investigated how extinctions have affected global bird diversity, specifically in terms of birds’ traits and evolutionary history (see the Perspective by Kemp). About 5% of known bird species have gone extinct over the past 130,000 years, and these species are more distinct in terms of their traits and lineages than would be expected by chance, especially those that went extinct before 1500 CE. Species, functional, and phylogenetic diversity losses are greatest on islands. Projected future extinctions are predicted to cause even more severe effects on avian functional and phylogenetic diversity, emphasizing a need for conservation efforts, especially on islands.” —Bianca Lopez

Humans have been driving a global erosion of species richness for millennia, but the consequences of past extinctions for other dimensions of biodiversity—functional and phylogenetic diversity—are poorly understood. In this work, we show that, since the Late Pleistocene, the extinction of 610 bird species has caused a disproportionate loss of the global avian functional space along with ~3 billion years of unique evolutionary history. For island endemics, proportional losses have been even greater. Projected future extinctions of more than 1000 species over the next two centuries will incur further substantial reductions in functional and phylogenetic diversity. These results highlight the severe consequences of the ongoing biodiversity crisis and the urgent need to identify the ecological functions being lost through extinction.

Ecosystems

Incorporating Heterogeneous Interactions for Ecological Biodiversity

Understanding the behaviors of ecological systems is challenging given their multifaceted complexity. To proceed, theoretical models such as Lotka-Volterra dynamics with random interactions have been investigated by the dynamical mean-field theory to provide insights into underlying principles such as how biodiversity and stability depend on the randomness in interaction strength. Yet the fully connected structures assumed in these previous studies are not realistic, as revealed by a vast amount of empirical data. We derive a generic formula for the abundance distribution under an arbitrary distribution of degree, the number of interacting neighbors, which leads to degree-dependent abundance patterns of species. Notably, in contrast to the fully interacting systems, the number of surviving species can be reduced as the community becomes cooperative in heterogeneous interaction structures. Our study, therefore, demonstrates that properly taking into account heterogeneity in the interspecific interaction structure is indispensable to understanding the diversity in large ecosystems, and our general theoretical framework can apply to a much wider range of interacting many-body systems.

Biological Systems

Why is this relevant? Because the immune system is one of the most complex and fascinating systems, together with the human brain and the Italian bureaucracy.

“Mapping the amino acid sequence of a particular T cell receptor (TCR) to its antigen specificity is a holy grail of systems immunology”

A key challenge in molecular biology is to decipher the mapping of protein sequence to function. To perform this mapping requires the identification of sequence features most informative about function. Here, we quantify the amount of information (in bits) that T cell receptor (TCR) sequence features provide about antigen specificity. We identify informative features by their degree of conservation among antigen-specific receptors relative to null expectations. We find that TCR specificity synergistically depends on the hypervariable regions of both receptor chains, with a degree of synergy that strongly depends on the ligand. Using a coincidence-based approach to measuring information enables us to directly bound the accuracy with which TCR specificity can be predicted from partial matches to reference sequences. We anticipate that our statistical framework will be of use for developing machine learning models for TCR specificity prediction and for optimizing TCRs for cell therapies. The proposed coincidence-based information measures might find further applications in bounding the performance of pairwise classifiers in other fields.
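As a rough illustration of what a coincidence-based measure can look like, the sketch below compares how often two randomly drawn receptors share the same feature in an antigen-specific set and in a background set, and reports the log-ratio in bits. This is an assumption about the general idea (it corresponds to a difference of Rényi entropies of order 2 and uses exact matches only), not the estimator used in the paper. The function names and toy sequences are made up for the example.

```python
from collections import Counter
from math import log2

def coincidence_probability(sequences):
    """Probability that two distinct sequences drawn from the sample are identical."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)
    coincident_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return coincident_pairs / total_pairs

def coincidence_information_bits(specific, background):
    """Log-ratio of coincidence probabilities, in bits (assumed definition)."""
    return log2(coincidence_probability(specific)
                / coincidence_probability(background))

# Toy, made-up feature strings; real inputs would be TCR sequence features.
specific = ["CASS", "CASS", "CASS", "CAST"]
background = ["CASS", "CAST", "CASR", "CASQ", "CASS", "CAST", "CASR", "CASQ"]

print(coincidence_information_bits(specific, background))
```

A feature that is much more conserved among antigen-specific receptors than in the background yields a large positive value. A feature that is equally diverse in both yields a value close to zero.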

Neuroscience

Geometric Scaling Law in Real Neuronal Networks

We investigate the synapse-resolution connectomes of fruit flies across different developmental stages, revealing a consistent scaling law in neuronal connection probability relative to spatial distance. This power-law behavior significantly differs from the exponential distance rule previously observed in coarse-grained brain networks. We demonstrate that the geometric scaling law carries functional significance, aligning with the maximum entropy of information communication and the functional criticality balancing integration and segregation. Perturbing either the empirical probability model’s parameters or its type results in the loss of these advantageous properties. Furthermore, we derive an explicit quantitative predictor for neuronal connectivity, incorporating only interneuronal distance and neurons’ in and out degrees. Our findings establish a direct link between brain geometry and topology, shedding light on how the brain operates optimally within its confined space.

A Drosophila computational brain model reveals sensorimotor processing

The recent assembly of the adult Drosophila melanogaster central brain connectome, containing more than 125,000 neurons and 50 million synaptic connections, provides a template for examining sensory processing throughout the brain¹,². Here we create a leaky integrate-and-fire computational model of the entire Drosophila brain, on the basis of neural connectivity and neurotransmitter identity³, to study circuit properties of feeding and grooming behaviours. We show that activation of sugar-sensing or water-sensing gustatory neurons in the computational model accurately predicts neurons that respond to tastes and are required for feeding initiation⁴. In addition, using the model to activate neurons in the feeding region of the Drosophila brain predicts those that elicit motor neuron firing⁵—a testable hypothesis that we validate by optogenetic activation and behavioural studies. Activating different classes of gustatory neurons in the model makes accurate predictions of how several taste modalities interact, providing circuit-level insight into aversive and appetitive taste processing. Additionally, we applied this model to mechanosensory circuits and found that computational activation of mechanosensory neurons predicts activation of a small set of neurons comprising the antennal grooming circuit, and accurately describes the circuit response upon activation of different mechanosensory subtypes⁶,⁷,⁸,⁹,¹⁰. Our results demonstrate that modelling brain circuits using only synapse-level connectivity and predicted neurotransmitter identity generates experimentally testable hypotheses and can describe complete sensorimotor transformations.
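For readers unfamiliar with leaky integrate-and-fire (LIF) models, here is a minimal sketch of the idea on a three-neuron toy circuit. The units are dimensionless, and all parameter values, the weight matrix and the `simulate_lif` function are hypothetical choices for illustration. The signs of the weights stand in for excitatory versus inhibitory neurotransmitters. This is not the model from the paper.

```python
import numpy as np

def simulate_lif(weights, stimulated, steps=2000, dt=0.1, tau=10.0,
                 v_thresh=1.0, v_reset=0.0, drive=2.0):
    """Euler simulation of a small LIF network; returns spike counts per neuron.

    weights[i, j] is the jump in neuron j's potential when neuron i spikes
    (positive = excitatory, negative = inhibitory).
    """
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    v = np.zeros(n)                      # membrane potentials, resting at 0
    spikes = np.zeros(n, dtype=bool)     # spikes emitted in the previous step
    counts = np.zeros(n, dtype=int)
    external = np.zeros(n)
    external[stimulated] = drive         # constant input to stimulated neurons

    for _ in range(steps):
        synaptic = weights.T @ spikes.astype(float)   # kicks from last spikes
        v += dt / tau * (external - v) + synaptic     # leak + drive + synapses
        spikes = v >= v_thresh
        counts += spikes
        v[spikes] = v_reset                           # reset after a spike
    return counts

# Neuron 0 excites neuron 1 and inhibits neuron 2.
weights = np.array([[0.0, 0.6, -0.6],
                    [0.0, 0.0,  0.0],
                    [0.0, 0.0,  0.0]])
counts = simulate_lif(weights, stimulated=[0])
print(counts)
```

"Activating" a sensory population in such a model simply means adding an external drive to those neurons and reading out which downstream neurons start firing, as neuron 1 does here while neuron 2 stays silent.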

Mapping the structure-function relationship along macroscale gradients in the human brain

Functional coactivation between human brain regions is partly explained by white matter connections; however, how the structure-function relationship varies by function remains unclear. Here, we reference large data repositories to compute maps of structure-function correspondence across hundreds of specific functions and brain regions. We use natural language processing to accurately predict structure-function correspondence for specific functions and to identify macroscale gradients across the brain that correlate with structure-function correspondence as well as cortical thickness. Our findings suggest structure-function correspondence unfolds along a sensory-fugal organizational axis, with higher correspondence in primary sensory and motor cortex for perceptual and motor functions, and lower correspondence in association cortex for cognitive functions. Our study bridges neuroscience and natural language to describe how structure-function coupling varies by region and function in the brain, offering insight into the diversity and evolution of neural network properties.

Frequency-Dependent Covariance Reveals Critical Spatiotemporal Patterns of Synchronized Activity in the Human Brain

This paper builds on the recent work of Hu and Sompolinsky, which may be worth reading first for better context.

“A key question in theoretical neuroscience is the relation between the connectivity structure and the collective dynamics of a network of neurons”

Recent analyses, leveraging advanced theoretical techniques and high-quality data from thousands of simultaneously recorded neurons across regions in the brain, compellingly support the hypothesis that neural dynamics operate near the edge of instability. However, these and related analyses often fail to capture the intricate temporal structure of brain activity, as they primarily rely on time-integrated measurements across neurons. Here, we present a novel framework designed to explore signatures of criticality across diverse frequency bands and construct a much more comprehensive description of brain activity. Furthermore, we introduce a method for projecting brain activity onto a basis of spatiotemporal patterns, facilitating time-dependent dimensionality reduction. Applying this framework to a magnetoencephalography dataset, we observe significant differences in criticality signatures, effective dimensionality, and spatiotemporal activity patterns between healthy subjects and individuals with Parkinson’s disease, highlighting its potential impact.

Human behavior

Misinformation exploits outrage to spread online

We tested a hypothesis that misinformation exploits outrage to spread online, examining generalizability across multiple platforms, time periods, and classifications of misinformation. Outrage is highly engaging and need not be accurate to achieve its communicative goals, making it an attractive signal to embed in misinformation. In eight studies that used US data from Facebook (1,063,298 links) and Twitter (44,529 tweets, 24,007 users) and two behavioral experiments (1475 participants), we show that (i) misinformation sources evoke more outrage than do trustworthy sources; (ii) outrage facilitates the sharing of misinformation at least as strongly as sharing of trustworthy news; and (iii) users are more willing to share outrage-evoking misinformation without reading it first. Consequently, outrage-evoking misinformation may be difficult to mitigate with interventions that assume users want to share accurate information.

The rise and transformation of Bronze Age pastoralists in the Caucasus

The Caucasus and surrounding areas, with their rich metal resources, became a crucible of the Bronze Age¹ and the birthplace of the earliest steppe pastoralist societies². Yet, despite this region having a large influence on the subsequent development of Europe and Asia, questions remain regarding its hunter-gatherer past and its formation of expansionist mobile steppe societies³,⁴,⁵. Here we present new genome-wide data for 131 individuals from 38 archaeological sites spanning 6,000 years. We find a strong genetic differentiation between populations north and south of the Caucasus mountains during the Mesolithic, with Eastern hunter-gatherer ancestry⁴,⁶ in the north, and a distinct Caucasus hunter-gatherer ancestry⁷ with increasing East Anatolian farmer admixture in the south. During the subsequent Eneolithic period, we observe the formation of the characteristic West Eurasian steppe ancestry and heightened interaction between the mountain and steppe regions, facilitated by technological developments of the Maykop cultural complex⁸. By contrast, the peak of pastoralist activities and territorial expansions during the Early and Middle Bronze Age is characterized by long-term genetic stability. The Late Bronze Age marks another period of gene flow from multiple distinct sources that coincides with a decline of steppe cultures, followed by a transformation and absorption of the steppe ancestry into highland populations.

Generalized contact matrices allow integrating socioeconomic variables into epidemic models

Variables related to socioeconomic status (SES), including income, ethnicity, and education, shape contact structures and affect the spread of infectious diseases. However, these factors are often overlooked in epidemic models, which typically stratify social contacts by age and interaction contexts. Here, we introduce and study generalized contact matrices that stratify contacts across multiple dimensions. We demonstrate a lower-bound theorem proving that disregarding additional dimensions, besides age and context, might lead to an underestimation of the basic reproductive number. By using SES variables in both synthetic and empirical data, we illustrate how generalized contact matrices enhance epidemic models, capturing variations in behaviors such as heterogeneous levels of adherence to nonpharmaceutical interventions among demographic groups. Moreover, we highlight the importance of integrating SES traits into epidemic models, as neglecting them might lead to substantial misrepresentation of epidemic outcomes and dynamics. Our research contributes to the efforts aiming at incorporating socioeconomic and other dimensions into epidemic modeling.
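The intuition behind the lower bound can be sketched numerically. The example assumes a simple SIR-type model in which the basic reproductive number is (β/γ) times the spectral radius of the per-capita contact matrix, and it stratifies by a single hypothetical SES dimension with two groups. All numbers, and the helper `r0_from_contacts`, are made up for illustration.

```python
import numpy as np

def r0_from_contacts(contacts, transmissibility, recovery_rate):
    """R0 as (beta / gamma) times the spectral radius of the contact matrix."""
    spectral_radius = np.max(np.abs(np.linalg.eigvals(contacts)))
    return float(transmissibility / recovery_rate * spectral_radius)

# Hypothetical population split into two SES groups.
group_sizes = np.array([800.0, 200.0])

# total_contacts[i, j]: total daily contacts between groups i and j.
# It is symmetric because every contact involves both parties.
total_contacts = np.array([[4000.0,  400.0],
                           [ 400.0, 3000.0]])

# Stratified view: average contacts a member of group i has with group j.
per_capita = total_contacts / group_sizes[:, None]

# Aggregated view: one homogeneous group with the population-average contacts.
mean_contacts = np.array([[total_contacts.sum() / group_sizes.sum()]])

beta, gamma = 0.05, 0.2   # assumed transmissibility and recovery rate
r0_stratified = r0_from_contacts(per_capita, beta, gamma)
r0_aggregated = r0_from_contacts(mean_contacts, beta, gamma)
print(r0_stratified, r0_aggregated)
```

With symmetric total contacts, the aggregated value is a Rayleigh quotient of the same problem whose largest eigenvalue gives the stratified value, so averaging over groups can only lower the estimate. The small but highly connected group dominates the stratified estimate here.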

Machine learning mathematical models for incidence estimation during pandemics

Accurate estimates of the incidence of infectious diseases are key for the control of epidemics. However, healthcare systems are often unable to test the population exhaustively, especially when asymptomatic and paucisymptomatic cases are widespread; this leads to significant and systematic under-reporting of the real incidence. Here, we propose a machine learning approach to estimate the incidence of a pandemic in real-time, using reported cases and the overall test rate. In particular, we use Bayesian symbolic regression to automatically learn the closed-form mathematical models that most parsimoniously describe incidence. We develop and validate our models using COVID-19 incidence values for nine different countries, confirming their ability to accurately predict daily incidence. Remarkably, despite the differences in epidemic trajectories and dynamics across countries, we find that a single model for all countries offers a more parsimonious description and is more predictive of actual incidence compared to separate models for each country. Our results show the potential to accurately model incidence in real-time using closed-form mathematical models, providing a valuable tool for public health decision-makers.
