Papers in computational evolutionary biology

Epistasis and multi-peaked fitness landscapes


Epistasis is the dependence of the fitness effect of a gene on other genes. It is thought to be a ubiquitous phenomenon: there is no reason to expect that a single gene has a single, clearly defined function which it can perform regardless of the genetic background it is a part of. There are different kinds of epistasis. Here, the authors consider reciprocal sign epistasis: a situation where a mutation at one locus can be either deleterious or adaptive depending on the state of another locus, and the same is true of the latter locus as well, with the former now controlling the fitness effect of its mutation.

The result reported in the paper is that reciprocal sign epistasis is a feature of any fitness landscape with two or more peaks. The argument (the proof, in fact) is devastatingly simple: take a path between two peaks, and consider the two mutations leading into and out of its fitness minimum. If flipping the order of these mutations preserves the location of the minimum, the two loci involved exhibit reciprocal sign epistasis by definition; otherwise proceed to the new minimum and repeat. This procedure necessarily terminates, because every new minimum has a higher fitness value than the previous one, and yet is bounded above by the fitness of the lower of the two peaks.
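The definition can be made concrete in a few lines of code. Below is a minimal sketch (my own illustrative formalisation, not the paper's notation) that checks two loci of a binary genotype for reciprocal sign epistasis, using the classic two-locus, two-peak landscape as an example:

```python
def flip(g, k):
    """Return genotype g (a tuple of 0s and 1s) with locus k mutated."""
    return g[:k] + (1 - g[k],) + g[k + 1:]

def reciprocal_sign_epistasis(f, g, i, j):
    """Check whether loci i and j show reciprocal sign epistasis at
    background g under fitness function f: the sign of the fitness
    effect of mutating each locus depends on the state of the other."""
    def effect(bg, k):                       # fitness effect of mutating locus k
        return f(flip(bg, k)) - f(bg)
    sign_i = effect(g, i) * effect(flip(g, j), i) < 0
    sign_j = effect(g, j) * effect(flip(g, i), j) < 0
    return sign_i and sign_j

# Two-peak landscape on two loci: 00 and 11 are peaks, 01 and 10 are
# valleys, so loci 0 and 1 exhibit reciprocal sign epistasis.
fitness = {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 1.1}
print(reciprocal_sign_epistasis(lambda g: fitness[g], (0, 0), 0, 1))  # True
```

On a purely additive landscape the same check returns False, since each mutation then has the same sign of effect in every background.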

The authors proceed to argue that no such local property can characterise (i.e. form a necessary and sufficient criterion for) multi-peaked fitness landscapes. The paper is short, well-written, and contains virtually no math. It is good to see that such simple yet profound insights are still out there to be had.

Poelwijk, F., Tănase-Nicola, S., Kiviet, D., & Tans, S. (2011). Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes. Journal of Theoretical Biology, 272(1), 141–144. DOI: 10.1016/j.jtbi.2010.12.015

Written by evopapers

April 14, 2011 at 09:54

Posted in Uncategorized


Internal coarse-graining of molecular systems


Feret J, Danos V, Krivine J, Harmer R, & Fontana W (2009). Internal coarse-graining of molecular systems. Proceedings of the National Academy of Sciences of the United States of America, 106(16), 6453–8. PMID: 19346467. PNAS page, Supporting Information.


Models of molecular dynamics suffer from combinatorial explosion: the phenomenon of an exponential number of combinations arising from a small set of basic entities. A protein with 10 phosphorylation sites, for example, can exist in 2^10 = 1024 distinct forms (states); if any two of these can form a complex, then the number of distinct molecular species rises to 525312. For a modeller tasked with building a mathematical description of such a system, combinatorial explosion is a major problem, for it prohibits explicit representation of every species and, more importantly, makes straightforward models (i.e. one equation per species) computationally intractable. On the other hand, a simple system like the one described above can reasonably be expected to admit a simple model capturing its essential features. How to build it, then?
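The arithmetic behind these numbers is easy to reproduce. The sketch below uses the dimer-counting convention that reproduces the figure in the text (ordered pairs of monomer states, halved by the symmetry of the complex; other conventions give slightly different totals):

```python
# Species count for a protein with 10 binary phosphorylation sites.
n_sites = 10
monomers = 2 ** n_sites          # 1024 distinct monomer states
dimers = monomers ** 2 // 2      # ordered pairs of states, halved by symmetry
species = monomers + dimers
print(monomers, species)         # 1024 525312
```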

One solution is to use rule-based languages, where instead of modelling molecular species, one builds parametrised models of the biochemical reactions the species engage in. The key idea is that most of the technical differences between species do not matter for their ability to take part in a particular interaction, and hence there are substantially fewer interaction patterns (a.k.a. rules) than there are species, each pattern being applicable in a large chunk of the species space. In this way rule-based modelling avoids the combinatorial explosion as far as specification of the system is concerned. The execution cost, however, is often still prohibitive.

Feret et al. offer an ingenious method of reducing the computational cost of the analysis of rule-based models. It is based on the simple observation that while an external human observer may distinguish between two different species, the dynamical system itself may be unable to do so. To quote from the paper (emphasis added):

…an experimental technique might differentiate between SOS recruited to the membrane via GRB2 bound to SHC bound to the EGF receptor and SOS recruited via GRB2 bound to the EGF receptor directly. However, from the perspective of the EGF signalling system, such a difference might not be observable for lack of an endogenous interaction through which it could become consequential. The endogenous units of the dynamics may differ from the exogenous units of the analysis.

The natural consequence of this observation is that one can use the information contained in the rules to infer which species are indistinguishable in the above sense and provide just one equation per cluster of indistinguishable species (called a fragment in the paper). This is exactly what the authors do, and the results for their benchmark models of the EGFR pathway are very encouraging. In the case of the simpler model (39 rules), there are ten times fewer fragments than species; in the case of the bigger model (71 rules), the method yields a staggering million-million-fold (10^12) dimensional reduction.

It is important to realise that the notion of dynamical indistinguishability of species is not merely a technical device for model reduction. It captures a property that is essential to the evolution and dynamical stability of molecular systems, and does so from the semantic rather than syntactic perspective (i.e. by focussing on the equivalence of dynamics rather than the equivalence of model descriptions). As such, it is worth investigating in much greater detail. Another important point is that the method is not a statistical heuristic that may fail for special cases. All species lumped together in a fragment are provably indistinguishable from each other. The only sub-optimality is the possibility that two species are in fact dynamically indistinguishable, but the method separates them anyway. These issues are discussed at length in the supporting information, linked above.

Finally, a word of warning: the authors use and develop sophisticated mathematics and computer science, not molecular (or even theoretical) biology. Readers without a quantitative background may struggle to follow the paper.

(Full disclosure: one of the authors is going to act as an examiner of my Ph.D. thesis.)

Written by evopapers

October 18, 2010 at 16:23

Posted in other


Waddington’s canalization revisited


Mark L. Siegal and Aviv Bergman. Waddington’s canalization revisited: Developmental stability and evolution. PNAS 99(16):10528–10532, 2002. PNAS page pdf

Siegal and Bergman build on the earlier work of A. Wagner (reviewed below), who showed that canalisation in (models of) gene networks may evolve as a by-product of stabilising selection. Recall that in Wagner’s model, a regulatory gene network was represented as a matrix, and the phenotype as the stable state of the deterministic, discrete-time dynamical process it encodes.

This setup is retained in the present paper but, crucially, situations where the network does not have (or rather: appears not to have) a stable state are considered as well. This allows the authors to decouple the effect of stabilising selection from that of selection for the existence of the steady state of the network. The result is that, perhaps surprisingly, canalisation can be accounted for by the latter mechanism alone, and therefore is an intrinsic property of stable complex networks regardless of whether their evolution is driven by natural selection.

Siegal’s and Bergman’s model has a number of parameters, most notably the interconnectedness of the network (defined as the number of non-zero entries in the matrix). It turns out that highly connected networks display low initial canalisation, but evolve it rapidly and to a greater extent than relatively sparse ones.

Written by evopapers

July 29, 2010 at 12:00

Posted in other


Does evolutionary plasticity evolve?


Andreas Wagner. Does evolutionary plasticity evolve? Evolution 50(3), 1996. pdf

The focus is on epigenetic buffering of mutations, the phenomenon called here (perhaps unfortunately) evolutionary plasticity. With the help of a simple computational model of regulatory networks, Wagner shows that plasticity can increase when the network’s stable state is put under stabilising selection. This is an indication that stabilising selection alone can explain the canalisation observed in real regulatory networks.

A regulatory network is modelled as a discrete-time dynamical system, which in turn is encoded as a real matrix. The matrix together with an initial state determines the steady state (if any), which is treated as a phenotype. Matrices “evolve” through recombination (swapping rows between pairs of different matrices), mutation (random alteration of entries) and stabilising selection (deviations from the target steady state are punished). Epigenetic stability of such networks was assessed before and after 400 rounds of evolution, and found to have increased significantly in the process. In addition, the evolved networks converge to their stable states much faster.
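The core of such a model fits in a few lines. Here is a minimal sketch of a Wagner-style network (the network size, the use of a hard sign threshold, the tie-breaking rule and the iteration cap are my assumptions, not details taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                  # number of genes

def develop(W, s0, max_steps=100):
    """Iterate s(t+1) = sign(W @ s(t)); return the steady state
    reached, or None if none is found within max_steps."""
    s = s0.copy()
    for _ in range(max_steps):
        s_next = np.sign(W @ s)
        s_next[s_next == 0] = 1         # break ties deterministically
        if np.array_equal(s_next, s):
            return s                    # fixed point = phenotype
        s = s_next
    return None                         # no steady state found

W = rng.normal(size=(N, N))             # regulatory interaction matrix
s0 = rng.choice([-1.0, 1.0], size=N)    # initial expression state
print(develop(W, s0))
```

Mutation then amounts to perturbing entries of W, and epigenetic stability to asking how often the perturbed network still develops to the same phenotype.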

Apart from the valuable scientific findings, the paper is notable for the diligence with which Wagner (now heading a successful lab in Zurich) sets up and carries out his experiments. For example, networks and their stable states are chosen independently; and stability is assessed with respect to the original mutation constructs as well as an additional one, which was not used during the simulated evolution. While this is perhaps no more than good practice, it is still good to see these measures taken.

Written by evopapers

April 28, 2010 at 13:18

Posted in other


Curvature in Metabolic Scaling


Tom Kolokotrones, Van M. Savage, Eric J. Deeds and Walter Fontana Curvature in Metabolic Scaling Nature 464:753-756, 2010. Nature page

This paper is not about evolution, but it is short, recent, published in Nature and comes from the Fontana lab, so there is definitely no harm in reviewing it. It deals with metabolic scaling, that is, the relationship between an organism’s metabolic rate and its body mass. Experimental measurements seem to indicate that the metabolic rate is proportional to the body mass raised to a fixed power. The actual value of the exponent was first thought to be 2/3, and then 3/4; the latter was also derived by West et al. from an involved theoretical model of the vascular system [1].

Kolokotrones et al. took a large dataset and showed that a more complex expression involving two exponents is a much better fit than a simple power law. When plotted on a log-log scale, the graph of this function is a slightly convex curve, rather than the straight line resulting from a pure power law; hence the title of the paper. Of course, by introducing a new degree of freedom you will always get a better fit, but the improvement in this case is considerable, and, crucially, the curve can be approximated in different regions by pure power laws with the well-established exponents. This shows that, in essence, both the 2/3 and 3/4 hypotheses were correct.
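A curved fit of this kind amounts to adding a quadratic term in log-log space. The sketch below (synthetic data with made-up coefficients, not the paper's dataset) shows how the quadratic fit nests and improves on a pure power law:

```python
import numpy as np

rng = np.random.default_rng(1)
mass = 10 ** rng.uniform(0, 6, size=200)           # synthetic body masses
log_m = np.log10(mass)
log_b = (0.1 + 0.7 * log_m + 0.02 * log_m ** 2     # convex in log-log space
         + rng.normal(scale=0.05, size=log_m.size))

lin = np.polyfit(log_m, log_b, 1)    # pure power law: straight line in logs
quad = np.polyfit(log_m, log_b, 2)   # curved scaling: quadratic in logs

resid = lambda c: np.sum((np.polyval(c, log_m) - log_b) ** 2)
print(resid(lin), resid(quad))       # quadratic fits markedly better
```

The local slope of the quadratic fit varies with mass, which is how a single curved relationship can look like a 2/3 power law in one mass range and a 3/4 power law in another.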

A mechanistic explanation for the 3/4 theory was provided by West’s model, and so the authors set out to modify it to get a two-exponent formula instead. Apparently this is possible by postulating a different moment of transition between the pulsatile and smooth blood flow dynamics. More details can be found in the Supplementary Information, if you’re interested (I am not).

Now, it is possible that the curved fit does not represent any underlying biological principle. As mentioned above, the curve can be approximated by two or more power laws acting on different parts of the data. It is conceivable that the relationship is in fact a pure power law, but evolutionarily distant families of mammals (the study is on mammals) evolved—for whatever reasons—different exponents. Through phylogenetic analysis, Kolokotrones et al. show that this is not the case, and that curvature is observed in subsets of the data corresponding to closely related species. Other factors, such as habitat and food type, were also excluded, suggesting that there is an underlying mechanistic principle at work.

[1] West, G. B., Brown, J. H. & Enquist, B. J. A general model for the origin of allometric scaling laws in biology. Science 276, 122–126 (1997).

Written by evopapers

April 7, 2010 at 17:44

Posted in other



Evolvability

Marc Kirschner and John Gerhart. Evolvability. Proc. Natl. Acad. Sci. USA 95(15), 1998. PNAS page pdf

This hugely influential paper attempts to uncover the high-level features of complex biological architectures that facilitate their phenotypic variation. The authors analyse several examples of highly conserved mechanisms they call “core processes” and argue that

the conservation of these core processes for the past 530 million years is related less to the processes’ own constraint, embedment and optimization than to the deconstraint they provide for phenotypic variation of other processes, on the basis of which they are continually coselected.

Now, the obvious interpretation of evolutionary conservation is that the conserved process plays an important role in a crucial function of the organism and/or confers a significant fitness advantage. Kirschner and Gerhart’s suggestion that this advantage is in fact evolvability itself comes (in general) dangerously close to invoking group selection, and they acknowledge as much. I do not feel (yet) competent to comment on this, so I will instead review the excellent observations that the authors make about the high-level organisational principles that contribute to evolvability.

Versatile proteins are pretty much what it says on the tin: proteins that are not very specific, but admit a range of targets. The example given in the paper is that of calmodulin, a prominent player in various calcium-based signalling pathways. Calmodulin usually inhibits the function of the protein it binds to, but because the range of targets it recognises is so broad, the inhibited agent can be an inhibitor itself, or maybe an activator, etc. As a result, calmodulin has great value as a universal negation gate in many different regulatory contexts. Dually, because of the low general specificity, a random regulator protein is presumably just a few mutations away from responding to calmodulin and the emergence of a new regulatory connection. Thus the versatility of calmodulin facilitates phenotypic variation of a regulatory network.

Weak linkage means that “the activity of a process depends minimally on other components or processes”. This is a fuzzy concept to me. Judging by the examples given in the paper, it is yet another face of the flexibility and versatility covered in the previous paragraph, and it is unclear to me why the two should be treated separately, other than perhaps the fact that weak linkage refers not so much to the individual components of the system as to the way they are put together. The authors discuss weak linkage in eukaryotic transcription, and this is perhaps what the paper is best known for: bringing to the fore the evolution of regulation (as opposed to the evolution of structural genes).

Exploratory processes perform their function relying as little as possible on the particulars of their client/target processes. One example given in the paper is the microtubule cytoskeleton helping to separate chromosomes before cell division: the tubules grow in random directions, but stabilise only when they find a chromosome. In this way, the skeleton is built correctly regardless of the initial positions of the chromosomes, cell size and shape, etc. These parameters are thus free to change, and this is why the exploratory formation of the cytoskeleton facilitates phenotypic variation. Another example is the immune system, which randomly generates antibodies until the right one is recognised. The authors also refer to this design principle as “epigenetic variation and selection”.

Compartmentation is just an ugly word for modularity, except that the modules/compartments may be genomic (different genes for different things), temporal (i.e. processes happen in stages), spatial, or even target-spatial, i.e. the same process is independently deployed and regulated in different regions of a cell or tissue (example: Drosophila bristle development). This kind of architecture facilitates phenotypic variation because breakdown of one module does not necessarily entail the breakdown of the whole system. A computer scientist would probably advocate the value of interfacing and information hiding at this point.

Written by evopapers

April 3, 2010 at 19:20

Posted in classics


An end to endless forms


Elhanan Borenstein and David C. Krakauer An end to endless forms: Epistasis, phenotype distribution bias and non-uniform evolution. PLoS Comp. Bio. 4(10), 2008. pdf

The paper analyses a simple model of development: the space 2^n of binary vectors (genotypes) mapped to the space 2^k of binary vectors (phenotypes; k >= n) by a linear transformation coupled with a Heaviside step function. More precisely, a genotype g is mapped to its corresponding phenotype p by the formula

p = H(D(g))

where D is an n×k matrix whose entries belong to {-1,0,1}, and H is applied component-wise, with H(x) zero when x<0 and 1 when x>=0.

The model recreates the well-known result of the RNA folding studies [1]: the development map is highly degenerate, i.e. there are many genotypes mapped to the same phenotype, and the distribution of degeneracy levels follows a power law. However, unlike the RNA folding framework, this model considers phenotypes which are not images of any genotype. It is therefore possible to talk about the fraction of realised phenotypes (called visible phenotypes in the paper). As could be expected, it turns out that this fraction is very low, even when measured against 2^n rather than 2^k. The authors vary various properties of their model, such as the sparseness of D, but the results remain reasonably robust. The last part of the paper explores the dynamics of neutral evolution in such models, the main result being that an increase in the size of D reveals (in absolute, not relative, terms) more phenotypes, but instead of founding new islands of visible phenotypes, they seem to chart preexisting ones with more and more resolution.
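For small n the visible-phenotype fraction can be estimated by brute force. The sketch below uses my own conventions ({0,1} genotypes, and a k×n matrix so that D acts on column vectors, where the text writes the matrix as n×k), so the specific numbers are illustrative only:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, k = 8, 10                              # genotype and phenotype lengths
D = rng.choice([-1, 0, 1], size=(k, n))   # development matrix

H = lambda x: (x >= 0).astype(int)        # component-wise Heaviside step

# develop every one of the 2^n genotypes and collect distinct phenotypes
phenotypes = {tuple(H(D @ np.array(g)))
              for g in product([0, 1], repeat=n)}

# fraction of visible phenotypes, against both 2^n and 2^k
print(len(phenotypes), len(phenotypes) / 2 ** n, len(phenotypes) / 2 ** k)
```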

This is a very well written, engaging and important paper. It validates the theoretical evo-devo work on RNA, but the setting used is more general and thus provides more general explanations of the causes and properties of the degeneracy of the genotype-phenotype mapping. It would be interesting to see an analysis of the neutral spaces of these models, or, more generally, of what an evolutionarily meaningful distance function on the development matrices induces on the morphospace.

[1] P. Schuster, W. Fontana, P. F. Stadler and I. L. Hofacker. From sequences to shapes and back: a case study in RNA secondary structures. Proc. Biol. Sci. 255:279–284, 1994.


Written by evopapers

March 27, 2010 at 00:08

Posted in other
