Survival of the flattest

Most of you are probably familiar with the problem of overfitting in machine learning. It occurs when a model learns literally everything in the training data, the noise as well as the signal, to the point that its performance suffers. The model becomes so specialized on the training data that it loses the ability to generalize. It is obvious that overfitting is a problem, but would you really care about it if all the data you could ever get on your problem were available to you at the very beginning? The model already describes it perfectly anyway. Of course this is not a practical scenario, but my point is that there is a reason to give up on making the model fit the training data perfectly: the new data that will come later. If you want your model to generalize in the future, you have to settle for the trade-off of being imperfect but robust rather than perfect but fragile.

This is an easy trade-off to observe. As an example, if you plan to stay in the same place forever, there is no harm in overfitting your skills to your environment. Say you plan to work at the same company for your entire life. It is completely fine to become perfect at what you are doing, at the cost of not learning anything else (keep in mind that you have limited time and energy to spend). If you change your job though, which always has a certain probability, there is a high chance that you will be quite useless at the very beginning. Do not forget that you are also in competition with the other employees at your new company, which is what makes having been perfect before so costly. This immediately raises an optimization question: how much should you invest in exploration, at the cost of less exploitation, given the rate of change in your environment?

There is a nice demonstration of this problem in the world of RNA viruses, which cause diseases like AIDS, Ebola, polio, measles, hepatitis C, and influenza. One way hosts protect themselves from a virus is to develop antibodies against it. Antibodies lock onto the outer surface proteins of a virus (which are essential for entering the host cell during invasion) and prevent it from entering host cells. This creates a selection pressure on the virus, imposed by the host's immune response. So the way a virus adapts to its environment is to change its outer surface proteins to evade the immune system [1]. But the immune system adapts too. When a new virus infects the host - new meaning that it appears different from the viruses that have infected the host before - the host has no pre-existing immunity to it in the form of antibodies. In response to this new pathogen, the adaptive immune system produces a variety of new antibodies until it finds one that fits the surface of the foreign virus, and keeps it in memory for the next invasion (that is also the reason why vaccination works). In the end, it is a race of "who adapts first", and if the virus changes fast enough, it can escape immune recognition [2].

At this point it becomes clearer why RNA viruses are more interesting than DNA viruses. DNA-based life forms are known to support sophisticated mechanisms for error correction, whereas RNA viruses lack such molecular machinery, and their replication is rather inaccurate [2]. Although the route to adaptation is through mutation, a given mutation is more likely to do harm than good. Mutations are often deleterious, meaning that a change at a single nucleotide can easily render the new sequence unable to replicate at all. For a population at classical mutation-selection balance, one would expect natural selection to wipe out these unfit mutants while increasing the frequency of the wild type. The problem is that this equilibrium cannot be maintained above a certain mutation rate, simply because the number of mutant offspring generated in the population may exceed the number of offspring identical to the wild type [3]. Beyond that point, selection acts differently: the population attains another mutation-selection balance with an abundant assemblage of mutant genotypes and a rare wild type. Such a heterogeneous group of genotypes is called a quasispecies.

The concept of a quasispecies is interesting because in this case the good old "survival of the fittest" no longer works. Fitness is measured by the replication rate: if a strain can replicate faster, it will consume more of the scarce resources, which means the others get less and eventually decrease in frequency. Darwin's "survival of the fittest" is then equivalent to "survival of the fastest", suggesting that faster replicators outcompete slower ones. In the case of a quasispecies, where the mutation rate is very high, genotypes that are robust against deleterious mutations may be favoured instead, even at the cost of slower replication. Fast-replicating strains that produce low-fitness offspring can be outcompeted by slow-replicating strains that produce neutral offspring [3]. This concept is called "survival of the flattest".

To make this clearer, imagine a fitness landscape similar to the one in Figure 1. The highest peak (pink) belongs to the strain that replicates the fastest, i.e., the fit quasispecies. Any mutant close to the fit quasispecies in genotypic space has low fitness, which is why this peak is also very sharp. There is also a broad plateau, lower than the pink peak, covering the genotypic space of the flat quasispecies.



Figure 1. Fitness landscape for fit and flat quasispecies.


Since the flat quasispecies is more robust to mutations, back mutations are possible. A reversion doesn't really matter here, because there is no fitness gradient for the strains to climb; they just mutate back and forth on a flat plateau. From a network perspective, this corresponds to a region of genotypic space that is highly connected, shown as the light green islands in Figure 2. On the other hand, it is unlikely for a very low-fitness strain (dark blue / purple nodes) to mutate back to the fit quasispecies (light pink node). It is also unlikely for a flat strain to mutate into a low-fitness one, because most of its mutations will be neutral, and it will keep diffusing through its highly connected flat quasispecies island.



Figure 2. The network demonstrating the fit and the flat quasispecies.


The mutation rate at which the quasispecies takes over and the population enters a drift phase is called the error threshold. Beyond this mutation rate, genomic information can no longer be propagated. Apparently, RNA viruses live right at this threshold - which is pretty cool :)

Quasispecies mean field model

I would like to cover both the analytical and the simulation-based way of calculating this error threshold. If you just want to see some pretty plots, skip this section and jump to the next one :)

For the analytical calculation, we will use the quasispecies mean field model [3], which is

\begin{align} \frac{\partial x_0}{\partial t} &= f_{x}^{(0)}x_0-f_{x}^{(0)}\mu x_0- x_0\mathbf{\Phi(x,y)},\\ \frac{\partial x_1}{\partial t} &= f_{x}^{(0)}\mu x_0+f_{x}^{(1)}x_1-x_1\mathbf{\Phi(x,y)},\\ \frac{\partial y}{\partial t} &= f_{y}y- y\mathbf{\Phi(x,y)}. \end{align}

Here, \(x_0\) is the fittest (master) strain, the light pink peak in Figure 1, with self-replication rate \(f_{x}^{(0)}\). \(x_1\) represents the unfortunate mutants of the fit strain, which have very low fitness with replication rate \(f_{x}^{(1)}\), and the flat quasispecies is represented by \(y\), with replication rate \(f_{y}\). Recall that a reversion from \(y\) to \(x_1\), or from \(x_1\) to \(x_0\), was quite unlikely; that is why there are no transitions between these populations in the equations above. Since \(f_{x}^{(1)}\) is the replication rate of the pool of unfit mutants, \( f_{x}^{(0)} \gg f_{x}^{(1)} \) holds. Similarly, the flat quasispecies is fitter than the unfit mutants, \( f_{y} > f_{x}^{(1)} \), but less fit than the master strain, \( f_{x}^{(0)} > f_{y} \). The term \(\mathbf{\Phi(x,y)} = f_{x}^{(0)}x_0+f_{x}^{(1)}x_1+f_{y}y\) is simply the outflow that keeps the population size constant. Finally, \(\mu\) denotes the mutation rate.
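If you prefer to see the dynamics rather than derive them, here is a minimal sketch (assuming Python with numpy and scipy, which the post itself does not prescribe) that integrates the three equations for two mutation rates; the rates \(f_{x}^{(0)}=1\), \(f_{x}^{(1)}=0.0025\), \(f_{y}=0.25\) are the ones used later in this post.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Replication rates: f0 >> f1 and f0 > fy > f1
f0, f1, fy = 1.0, 0.0025, 0.25

def quasispecies(t, z, mu):
    x0, x1, y = z
    phi = f0 * x0 + f1 * x1 + fy * y          # outflow keeping the population constant
    dx0 = f0 * x0 - f0 * mu * x0 - x0 * phi   # master strain
    dx1 = f0 * mu * x0 + f1 * x1 - x1 * phi   # unfit mutants fed by the master strain
    dy = fy * y - y * phi                     # flat quasispecies
    return [dx0, dx1, dy]

# Start with a population dominated by the master strain
z0 = [0.99, 0.005, 0.005]
for mu in (0.5, 0.9):                         # below and above mu_c = 1 - fy/f0 = 0.75
    sol = solve_ivp(quasispecies, (0, 200), z0, args=(mu,))
    print(f"mu = {mu}: x0, x1, y -> {sol.y[:, -1].round(3)}")
```

For the low mutation rate the master strain and its mutant cloud persist, while for the high one the flat quasispecies \(y\) takes over, which is exactly the transition derived analytically below.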

Let's jump to the interesting part, the fixed point that corresponds to the survival-of-the-flattest scenario, which is \(\{x_0,x_1,y\} = \{0,0,1\}\). Recall that this point is stable if all the eigenvalues of the Jacobian matrix are smaller than \(0\). The Jacobian matrix of this system is

\[ J(x_0,x_1,y)= \left[ {\begin{array}{ccc} f_{x}^{(0)}(1-\mu)-2f_{x}^{(0)}x_0-f_{y}y-f_{x}^{(1)}x_1 & -f_{x}^{(1)}x_0 &-f_{y}x_0\\ f_{x}^{(0)}\mu-f_{x}^{(0)}x_1 & f_{x}^{(1)}-f_{x}^{(0)}x_0-f_{y}y-2f_{x}^{(1)}x_1 & -f_{y}x_1\\ -f_{x}^{(0)}y & -f_{x}^{(1)}y &f_{y}-f_{x}^{(0)}x_0-f_{x}^{(1)}x_1-2f_{y}y \end{array} } \right]. \]


Evaluated at \(x_0=x_1=0\) and \(y=1\), this matrix becomes

\[ J(0,0,1)= \left[ {\begin{array}{ccc} f_{x}^{(0)}(1-\mu)-f_{y} & 0 & 0\\ f_{x}^{(0)}\mu & f_{x}^{(1)}-f_{y} & 0\\ -f_{x}^{(0)} & -f_{x}^{(1)} &-f_{y} \end{array} } \right], \]

which has the eigenvalues \(\lambda_1 = f_{x}^{(0)}(1-\mu)-f_{y}\), \(\lambda_2 = f_{x}^{(1)}-f_{y}\), and \(\lambda_3 = -f_{y}\). We already know \(\lambda_2 < 0\) and \(\lambda_3 < 0\) because of the way we defined the replication rates (unfit mutants replicate slower than the flat quasispecies, and all replication rates are larger than \(0\)). So for stability, \(\lambda_1 = f_{x}^{(0)}(1-\mu)-f_{y} < 0\) has to be satisfied, which gives the critical mutation rate

\begin{align} \mu_c=1-\frac{f_{y}}{f_{x}^{(0)}}. \end{align}

This means that for any \(\mu > \mu_c\), the fixed point \(\{x_0,x_1,y\} = \{0,0,1\}\) is stable, and survival of the flattest takes place.
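As a quick numerical sanity check of this stability condition, the sketch below (again Python with numpy; the helper name `jacobian_at_flat` is mine) evaluates \(J(0,0,1)\) slightly below and above \(\mu_c\) and checks whether all eigenvalues are negative.

```python
import numpy as np

f0, f1, fy = 1.0, 0.0025, 0.25   # replication rates used later in the post
mu_c = 1 - fy / f0               # critical mutation rate, 0.75 here

def jacobian_at_flat(mu):
    """Jacobian J(0, 0, 1) of the mean field model at the 'all flat' fixed point."""
    return np.array([
        [f0 * (1 - mu) - fy, 0.0,     0.0],
        [f0 * mu,            f1 - fy, 0.0],
        [-f0,                -f1,     -fy],
    ])

for mu in (mu_c - 0.05, mu_c + 0.05):
    eigvals = np.linalg.eigvals(jacobian_at_flat(mu))
    stable = np.all(eigvals.real < 0)
    print(f"mu = {mu:.2f}: eigenvalues = {np.round(eigvals, 4)}, stable = {stable}")
```

Only \(\lambda_1\) changes sign: it is positive just below \(\mu_c\) and negative just above it, so the all-flat fixed point becomes stable exactly at the threshold.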

Bit string model

The bit string model [3] is used to simulate the evolution of strains in a very simplified way. An RNA molecule is normally composed of four different nucleotides, and mapping a genotypic change to a fitness value is a very difficult task. The bit string model simplifies this whole process by replacing the "nucleotides" with bits, so that a strain is composed of only \(1\)'s and \(0\)'s. Mutation is modeled by simply flipping bits, and we get to define the fitness value for each strain (as an example, you could assign a fitness value depending on the number of \(1\)'s in the binary strain).
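As a small illustration of this representation (a Python/numpy sketch; the fitness function counting \(1\)'s is just the toy example mentioned above, not the one used in the simulations below):

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(strain, mu_b):
    """Return a copy of the strain with each bit flipped independently with probability mu_b."""
    flips = rng.random(strain.size) < mu_b
    return np.where(flips, 1 - strain, strain)

def toy_fitness(strain):
    """Toy example fitness: the fraction of 1's in the strain."""
    return strain.mean()

strain = np.ones(16, dtype=int)          # a strain of 16 bits, all 1's
mutant = mutate(strain, mu_b=0.1)
print(mutant, toy_fitness(mutant))
```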

Let us denote a binary strain by \(\mathbf{S}_{i} = \{S_{i}^{1}, S_{i}^{2}, \cdots , S_{i}^{v} \}\), where \(v\) is the number of bits per strain. For the simulations, we first need to define a master (the fittest) strain, which is an arbitrary choice. Say the master strain is a vector of \(1\)'s only, such that \begin{align} \mathbf{S}_{0} = [1 1 1 1 1 1 1 1 1 ... 1]. \end{align} Recall that the unfit mutants and the master strain are very close to each other in genotypic space, which in the bit string model translates to a small Hamming distance between them. We can define the unfit mutants as the strains within Hamming distance \(h\) of the master strain (excluding the master strain itself), where \(h>0\). Any strain with a Hamming distance larger than \(h\) belongs to the flat quasispecies. The replication rates of the master strain, the unfit mutants, and the flat quasispecies are denoted \(f_{x}^{(0)}\), \(f_{x}^{(1)}\), and \(f_{y}\), respectively (the same notation as in the mean field model). We also assume that the population size (the total number of strains), \(N\), is constant, meaning that we replace a random strain with a copy of the strain we want to replicate. Since the master strain is the fittest in the population, it is replicated with probability \(1\), i.e., \(f_{x}^{(0)}=1\). Mutation takes place during replication, where each bit of the copy is flipped with probability \(\mu_b\). This process is repeated for a finite number of generations. Pseudocode of this simulation is given below.

Start with an initial population of master strains, \(\mathbf{S}_{1:N} = [1,1,1,\ldots,1]\)
for number of generations do
    for number of strains do
        Randomly pick a strain \(\mathbf{S}_i\) to replicate
        Randomly pick a strain \(\mathbf{S}_{j \neq i}\) to replace
        \(\mathbf{S}' \leftarrow\) copy of \(\mathbf{S}_i\) with each bit flipped with probability \(\mu_b\)
        if \(\mathbf{S}_i\) is the master strain then
            \(\mathbf{S}_j \leftarrow \mathbf{S}'\) with probability \(1\)
        else if \(\mathbf{S}_i\) is an unfit mutant then
            \(\mathbf{S}_j \leftarrow \mathbf{S}'\) with probability \(f_{x}^{(1)}\)
        else if \(\mathbf{S}_i\) is a member of the flat quasispecies then
            \(\mathbf{S}_j \leftarrow \mathbf{S}'\) with probability \(f_{y}\)
        end if
    end for
end for
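To make the loop concrete, here is a minimal sketch of it in Python with numpy (function and variable names are mine, and not necessarily how the linked project implements it):

```python
import numpy as np

rng = np.random.default_rng(42)

def replication_probability(strain, h, f_unfit, f_flat):
    """Replication probability based on the Hamming distance to the all-ones master strain."""
    d = strain.size - strain.sum()        # Hamming distance to [1, 1, ..., 1]
    if d == 0:
        return 1.0                        # master strain
    if d <= h:
        return f_unfit                    # unfit mutant, close to the master strain
    return f_flat                         # member of the flat quasispecies

def simulate(N=100, v=16, h=5, f_unfit=0.0025, f_flat=0.25, mu_b=0.02, generations=500):
    population = np.ones((N, v), dtype=int)        # start with N copies of the master strain
    for _ in range(generations):
        for _ in range(N):
            i = rng.integers(N)                    # strain to replicate
            j = rng.integers(N)                    # strain to replace
            while j == i:
                j = rng.integers(N)
            if rng.random() < replication_probability(population[i], h, f_unfit, f_flat):
                offspring = population[i].copy()
                flips = rng.random(v) < mu_b       # mutate the copy, bit by bit
                offspring[flips] = 1 - offspring[flips]
                population[j] = offspring
    return population

pop = simulate(mu_b=0.02)
distances = pop.shape[1] - pop.sum(axis=1)
print(np.bincount(distances, minlength=17))        # number of strains at each Hamming distance
```

Tracking these Hamming-distance counts at every generation, for several values of \(\mu_b\), is what produces plots like Figure 3.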

For the simulations, I used \(N=100\), \(v=16\), \(h=5\), \(f_{x}^{(1)} = 0.0025\), and \(f_{y} = 0.25\) to be consistent with the references in this post (and to check my results 😅). Simulations were performed for four different mutation rates, and the results are given in Figure 3. The legends indicate the Hamming distance of the current strain to the master strain. You can see that for low \(\mu_b\), the master strain is still high in frequency, dominating the population. As \(\mu_b\) increases, it begins to decrease in frequency. For \(\mu_b=0.09\), the master strain and its unfit mutants get lost, and the quasispecies effect takes over.

Figure 3. Simulation of the bit string model for different \(\mu_b\).

Recall that the critical mutation rate derived from the quasispecies mean field model was \(\mu_c=1-f_{y}\,/\,f_{x}^{(0)}\). This means that if we run the bit string model for a range of mutation rates, we should see a change in the strain frequencies around \(\mu_c = 1-0.25/1 = 0.75\). But this rate is the probability that a strain is copied with at least one error, whereas \(\mu_b\) is the mutation rate per bit. Since the probability that at least one bit is copied with an error is \(1-(1-\mu_b)^v\), the critical mutation rate per bit is \(1-(1-\mu_c)^{1/v} = 1-(1-0.75)^{1/16} \approx 0.0830\).
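The conversion is a one-liner if you want to check the arithmetic:

```python
mu_c = 1 - 0.25 / 1.0                 # per-genome critical mutation rate from the mean field model
mu_b_c = 1 - (1 - mu_c) ** (1 / 16)   # per-bit critical mutation rate for v = 16
print(round(mu_b_c, 4))               # ~0.083
```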

Strain frequencies for different values of \(\mu_b\) are given in Figure 4. The gray dashed line is indeed the critical mutation rate per bit (\(0.0830\)) calculated just above. Since \(h=5\), below the critical mutation rate the master strain and its mutants within Hamming distance \(5\) are high in frequency, whereas beyond \(\mu_b=0.0830\) strains at Hamming distance greater than \(5\) begin to increase in frequency. You can clearly observe the first-order phase transition between selection for replication speed and selection for mutational robustness.
Figure 4. Strain frequencies for different values of \(\mu_b\)


The question that came to my mind while working on this is the following: what if there is also uncertainty in the mutation rate? In all the models above, the mutation rate is constant across generations. Say the environment is fluctuating, meaning that the mutation rate itself follows a probability distribution over time. What would be the threshold for robustness to be favoured over replication rate? I will probably try to figure this out; in the meantime, let me know if you find the answer :)

P.S.: Here is the link to the Github project. Hopefully, I will be more organized from now on :D