Imagine you’re trying to understand how different parts of the brain communicate based on signals recorded over time. Or perhaps you’re studying how genes interact within a cell by observing their activity levels. In both cases, you have multidimensional time series data from multiple sources (e.g., neurons or genes), and you want to figure out which ones are directly connected or influence each other.
This problem, reconstructing a network from observed data, is known as an inverse problem, and it is a formidable one. The scope of this post is to make evident how current approaches to the inverse problem are:
full of hidden assumptions, which often remain unspecified in the papers;
sometimes given a new name, to make the procedure sound better than what it actually is (which is described in the following);
advertised as “model-free” or “data-driven”, as if they were able to overcome the fundamental issues that, in science, affect any inverse problem.
Therefore, in the following, you will not find the solution to your inverse problem. Instead, you will be able to recognize if your solution aligns with what you expect.
The general problem
Often we start with arrays of data, like time series: a table where each row usually represents a node (like a neuron or a gene), and each column represents a point in time (or some other dimension/feature, whatever you like). This gives us a matrix of data for N nodes and T time points, which in the following we denote as $X$ (so that the entry $X_{it}$ is the state of node $i$ at time $t$).
We can safely name this matrix “the observation”.
Our goal is to determine the network’s structure, represented by the (unknown) adjacency matrix $A$,
which tells us how nodes are connected (if the entry $A_{ij}$, corresponding to the pair of nodes $i$ and $j$, is strictly positive there is a connection; if it is zero, the two nodes are disconnected). For simplicity, let’s focus only on binary matrices, representing unweighted networks.
Since the observation has been generated by some dynamical process, there is also another object that is unknown: the mechanisms. Often, the mechanisms can be represented as equations for the dynamics, or procedures in an agent-based model, or whatever set of rules describes how the state of your system changes over time. I want to be general enough on this point, since we will not work with dynamics here. Let’s indicate this set of rules as $\mathcal{M}$.
Using Bayes’ theorem, we can express the probability of a certain network structure and dynamics given the data:

$$P(A, \mathcal{M} \mid X) = \frac{P(X \mid A, \mathcal{M})\, P(A, \mathcal{M})}{P(X)}$$
That is: the posterior (the left-hand side) is proportional to the product of the likelihood (the first term in the numerator) and the prior (the second term in the numerator). For any inferential purpose we usually neglect the denominator, since it does not depend on the parameters and therefore does not alter the procedure to find a solution to our problem.
This is the most general statement that we can make about our inverse problem: given some observational data, our best aim is to infer the structural parameters (i.e., the entries of the adjacency matrix) and the parameters of the dynamics (i.e., the unknown coefficients in our set of differential equations, or the values to assign to our agent-based model, and so forth).
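To fix ideas, here is a minimal Python sketch of the objects involved; the log_likelihood and log_prior arguments, and the object M standing for the mechanisms, are hypothetical placeholders for whatever models you decide to adopt, not something the data provides for free.

```python
import numpy as np

# Hypothetical sizes: N nodes observed for T time steps.
N, T = 50, 1000
rng = np.random.default_rng(42)

X = rng.standard_normal((N, T))          # the observation: one row per node
A = rng.integers(0, 2, size=(N, N))      # a candidate binary adjacency matrix
A = np.triu(A, 1) + np.triu(A, 1).T      # symmetric, no self-loops

def log_posterior(A, M, X, log_likelihood, log_prior):
    """Unnormalized log-posterior:
    log P(A, M | X) = log P(X | A, M) + log P(A, M) + const."""
    return log_likelihood(X, A, M) + log_prior(A, M)
```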
What is the first “issue”? Well, it seems that you need to actually specify a model for both the structure and the dynamics, otherwise you cannot compute the likelihood term. In fact, I have deliberately neglected to mention that, by design, we can never know the exact process that generated the data: we can only make hypotheses (i.e., models for the structure and the dynamics) and evaluate to what extent the data supports them.
Your model can depend, or not, on parameters. Usually it depends on a lot of parameters. For instance, with a very simplistic assumption about the structure, like the number of edges, you have one parameter (the probability of wiring two nodes). If you assume that your nodes have a given degree sequence, then you have as many parameters as the number of nodes, and you are in fact working in the framework of a configuration model. We can go on for a long time, just by adding hypotheses. A typical and very powerful model is the stochastic block model (and its variants accounting for important details): it comes with yet more parameters, and you might want to visit the blog of my friend and colleague Tiago Peixoto to know more about this (see also the sketch below). Let’s just agree that, in general, both the structure and the dynamics depend on a set of parameters, and that the goal of our inferential process is to find those parameters by searching the space of their possible values and picking the set that maximizes the posterior. That will be our best description of the observational data given our hypotheses about structure and dynamics.
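As a rough illustration of how the number of structural parameters grows with the richness of the hypothesis, here is a sketch using networkx generators (the specific numbers are arbitrary): the Erdős–Rényi model has a single wiring probability, the configuration model has one degree per node, and the stochastic block model has block memberships plus a block-to-block probability matrix.

```python
import networkx as nx

N = 100

# Erdos-Renyi: a single parameter, the wiring probability p.
g_er = nx.gnp_random_graph(N, p=0.05)

# Configuration model: one parameter per node (the degree sequence).
g_cm = nx.configuration_model([3] * N)

# Stochastic block model: block sizes plus a block-to-block probability matrix.
sizes = [50, 50]
probs = [[0.10, 0.01],
         [0.01, 0.10]]
g_sbm = nx.stochastic_block_model(sizes, probs)
```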
The above is not really an issue; it is how science should proceed: by observing phenomena, collecting data, making hypotheses, and validating them. In fact, the above framework is the mathematical counterpart of the scientific method: if you have your hypotheses depending on some parameters, and I have my hypotheses depending on other parameters, we both employ the above procedure and we compare our posteriors to understand whose hypotheses explain the data best.
Instead, there are other problems: the space of parameters is usually too large, so we need some simplifying hypotheses to go ahead. And here is where our story starts: on the one hand, this is an active research field; on the other hand, I want to discuss what the widespread habits are and how to understand what a lot of papers actually show.
Assumption 1: focusing on the network structure only
The very first assumption is that the dynamics can be decoupled from the structure, i.e., the structure is independent of the dynamics:

$$P(A, \mathcal{M} \mid X) = P(A \mid X)\, P(\mathcal{M} \mid X)$$
This assumption makes the problem more manageable, since it allows us to effectively map our inferential problem into two independent inferential problems, one for the structure and one for the dynamics. Since the dynamics is largely unknown and very complex, this assumption allows one to drop anything related to the dynamics and focus just on the structure.
Be careful, there is no free lunch. This corresponds to believing that the detailed dynamics of how nodes influence each other over time, like the exact equations governing neuron firing, are either known or not crucial for our current purpose.
The result is that our problem reduces to maximizing just the posterior

$$P(A \mid X) \propto P(X \mid A)\, P(A)$$
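Spelled out (a schematic rewriting under the decoupling assumption, not a derivation), the joint maximization splits into two independent ones:

```latex
% Schematic: under the decoupling assumption the joint maximization splits
\hat{A},\ \hat{\mathcal{M}}
  = \arg\max_{A,\,\mathcal{M}} P(A, \mathcal{M} \mid X)
  = \Big( \arg\max_{A} P(A \mid X),\ \arg\max_{\mathcal{M}} P(\mathcal{M} \mid X) \Big)
```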
Assumption(s) 2: pairwise interactions and independence
Well, if we want to reconstruct an adjacency matrix, we are implicitly assuming that interactions are pairwise or, alternatively, that interactions might be higher-order (2-body, 3-body, etc.) but we deliberately focus only on the 2-body ones.
We need another assumption, however: “whether nodes i and j are connected doesn't directly affect whether other pairs are connected”. This is a strong assumption of independence, which allows us to break down the probabilities into products over all pairs of nodes:

$$P(X \mid A) = \prod_{i<j} P(X_i, X_j \mid A_{ij})$$

where $X_i$ denotes the time series observed at node $i$.
This means the probability of observing the data is the product of the probabilities of observing each pair’s data, given whether they are connected. Accordingly, the prior reads

$$P(A) = \prod_{i<j} P(A_{ij})$$
and the updated belief (the other name for our posterior probability) becomes

$$P(A \mid X) \propto \prod_{i<j} P(X_i, X_j \mid A_{ij})\, P(A_{ij})$$
Cool, isn’t it?
We have literally broken our original huge inverse problem into the product of several approximated small problems, at the price of significant assumptions.
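A minimal sketch of what this buys us, in Python: the log-posterior becomes a sum of independent pairwise terms, so each entry of the adjacency matrix can be assessed pair by pair. The pair_log_likelihood and edge_log_prior functions are hypothetical placeholders that you still have to specify; nothing here is model-free.

```python
from itertools import combinations

def factorized_log_posterior(A, X, pair_log_likelihood, edge_log_prior):
    """Log-posterior under Assumptions 1-2: a sum of independent pairwise terms."""
    total = 0.0
    for i, j in combinations(range(X.shape[0]), 2):
        total += pair_log_likelihood(X[i], X[j], A[i, j]) + edge_log_prior(A[i, j])
    return total

# Since the terms are independent, each A_ij can be chosen separately by
# comparing the two options A_ij = 0 and A_ij = 1 for that pair alone.
```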
Assumption 3: uniform prior and correlation-based likelihood
Finally, we need to further simplify the problem by assuming that we have no specific prior knowledge, therefore the prior is uniform:

$$P(A_{ij}) = \text{const}$$
Since the prior is constant, the posterior probability simplifies to:

$$P(A \mid X) \propto \prod_{i<j} P(X_i, X_j \mid A_{ij})$$
And we are close to the point: we are left just with a product of likelihoods and, without a specific justification for this choice, we assume that a measure of correlation (or a function of it, which by the way does not come from a generative model) between the data corresponding to nodes $i$ and $j$ is a valid proxy for whether they are connected or not. Therefore, we set:

$$P(X_i, X_j \mid A_{ij} = 1) \propto f(\rho_{ij})$$

where $\rho_{ij}$ is some measure of correlation between the signals of $i$ and $j$, and $f$ is some increasing function of it.
This means that the likelihood of a connection between nodes i and j increases with the correlation of their signals.
Note that I didn’t specify which measure of correlation: you can use your favorite one (cross-correlation, spectral coherence, Granger causality, transfer entropy, mutual information, convergent cross mapping, … there are literally hundreds). You might think that we are done: we have mapped our original, hard problem into a set of small problems, one per pair of nodes, each easy to solve, and we end up with a network.
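For concreteness, this is essentially the recipe implied above, sketched in Python with Pearson correlation and an arbitrary threshold; both choices are mine for illustration, and neither comes from a generative model.

```python
import numpy as np

def correlation_network(X, threshold=0.5):
    """Binary 'network' obtained by thresholding the absolute Pearson correlation.

    X is an N x T array of signals; the threshold is arbitrary, which is
    precisely part of the problem discussed here.
    """
    R = np.corrcoef(X)                      # N x N correlation matrix
    A = (np.abs(R) > threshold).astype(int)
    np.fill_diagonal(A, 0)                  # no self-loops
    return A
```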
The problem is that the result can in no way be interpreted as a proxy for the original network, since there are too many simplifying assumptions and since, overall, the very last step is hard to support. Still, using whatever correlation measure you can think of, the literature is full of papers adopting this procedure (often without making it explicit, as we did here).
One could argue that the goal of correlation networks is not to reconstruct the structural network; in that case the above discussion is correct, but the aim is different.
“If we assume that correlation networks have non-structural meaning then we have no problems!” — One who argues
Well, I partially agree with the above sentence: if we could all agree that correlation networks are not structural and have nothing to say about the underlying structure, it would be fine. You might be interested in reading what it could mean to work with correlation patterns here.
But if those networks are not structural, are they… functional? And here we go with the thousands of papers naming “functional connectivity” what, at the end of the day, is a covariance matrix. There are studies trying to understand which structural features can be inferred from a correlation matrix, but they are not enough.
One of the most worrying things is that studies based on correlation networks assume that they can analyze those graphs as if they were standard graphs. If, at best, we are estimating a correlation (or a function of it that gives us some estimate of the probability of observing that specific correlation), how is it possible that your node $i$ in the correlation network has degree 3? Or degree 26?
Literally, those numbers have no meaning, since the links attached to that node are statistical correlations and therefore have some probability of existing or not. In such networks, at best we can only claim that a node has a whole distribution of possible degrees, with mean 3 and dispersion… whatever.
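To make the point concrete, here is a sketch (reusing the thresholding recipe above, with a naive resampling over time windows; both choices are purely illustrative) that returns, for a single node, a distribution of degrees rather than a single number:

```python
import numpy as np

def degree_distribution(X, node, threshold=0.5, n_resamples=200, window=100, seed=0):
    """Recompute the correlation network on resampled time windows and collect
    the degree of `node` each time: the outcome is a distribution, not a number."""
    rng = np.random.default_rng(seed)
    N, T = X.shape
    degrees = []
    for _ in range(n_resamples):
        start = rng.integers(0, T - window)
        Xw = X[:, start:start + window]
        R = np.corrcoef(Xw)
        A = (np.abs(R) > threshold).astype(int)
        np.fill_diagonal(A, 0)
        degrees.append(A[node].sum())
    return np.array(degrees)   # report its mean and dispersion, not a single degree
```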
If a single degree for a single node makes no sense, imagine what the interpretation of centrality measures could be or, even worse, of modularity maximization. Modularity comes with some well-known problems even in the case of standard networks: in the case of correlation networks the outcomes are, literally, unexplainable.
There is no ultimate solution to this inverse problem, but there is also no free lunch: before using methods that are easy to import in Python or R or MATLAB, we should really understand what the basic assumptions behind them are, and we should make an effort to avoid overselling our results. If you are interested in reading a recent perspective on this matter, you can check one of our latest papers, or you can visit Tiago’s blog, especially this post where, in the case of community detection, he shows how every method is inferential when the model is bad enough.