Papers in computational evolutionary biology


Internal coarse-graining of molecular systems


Feret J, Danos V, Krivine J, Harmer R, & Fontana W (2009). Internal coarse-graining of molecular systems. Proceedings of the National Academy of Sciences of the United States of America, 106 (16), 6453-8 PMID: 19346467, PNAS page, Supporting Information.


Models of molecular dynamics suffer from combinatorial explosion: a small set of basic entities gives rise to an exponential number of combinations. A protein with 10 phosphorylation sites, for example, can exist in 2^10 = 1024 distinct forms (states); if any two of these can form a complex, then the number of distinct molecular species rises to roughly 525,000. For a modeller tasked with building a mathematical description of such a system, combinatorial explosion is a major problem: it prohibits explicit representation of every species and, more importantly, makes straightforward models (i.e. one equation per species) computationally intractable. On the other hand, a simple system like the one described above can reasonably be expected to admit a simple model capturing its essential features. How, then, does one build it?
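The arithmetic is easy to check with a short sketch. The exact total depends on a counting convention that is not spelled out here (whether a complex of two identical forms counts, and whether uncomplexed proteins are included), so the function below, whose name and dictionary keys are purely illustrative, reports the pieces separately. Whichever way they are combined, the total lands a little above half a million.

```python
from math import comb

def species_counts(n_sites: int) -> dict:
    """Count forms of a protein with n_sites binary (phosphorylated or not) sites."""
    states = 2 ** n_sites            # each site is independently on or off
    hetero = comb(states, 2)         # unordered pairs of two *different* forms
    homo = states                    # pairs of two *identical* forms
    return {
        "states": states,
        "heterodimers": hetero,
        "homodimers": homo,
        "monomers_plus_heterodimers": states + hetero,
        "monomers_plus_all_dimers": states + hetero + homo,
    }

if __name__ == "__main__":
    for name, value in species_counts(10).items():
        print(f"{name}: {value}")
```

Adding a single extra site doubles the number of states and roughly quadruples the number of possible pairs, which is why writing one equation per species stops being practical so quickly.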

One solution is to use rule-based languages, where instead of modelling molecular species, one builds parametrised models of the biochemical reactions the species engage in. The key idea is that most of the differences between species do not affect their ability to take part in a particular interaction. Hence there are substantially fewer interaction patterns (a.k.a. rules) than there are species, with each pattern applying to a large chunk of the species space. In this way rule-based modelling avoids the combinatorial explosion as far as the specification of the system is concerned. The execution cost, however, is often still prohibitive.
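A toy illustration of the gap between specification and execution, assuming the 10-site protein from the earlier example and one hypothetical rule per site ("if site i is unphosphorylated, phosphorylate it, whatever the other sites look like"). The dictionary-based rule format below is invented for this sketch and is not the syntax of any actual rule-based language.

```python
from itertools import product

N_SITES = 10

def all_species(n):
    """Every phosphorylation state as a tuple of 0/1 flags, one per site."""
    return list(product((0, 1), repeat=n))

def make_rules(n):
    """One rule per site; the rule mentions only the site it acts on."""
    return [{"site": i, "requires": 0, "sets": 1} for i in range(n)]

def matches(rule, species):
    """A rule applies to any species whose relevant site is in the required state."""
    return species[rule["site"]] == rule["requires"]

def apply_rule(rule, species):
    """Return the product species: same tuple with the rule's site rewritten."""
    i = rule["site"]
    return species[:i] + (rule["sets"],) + species[i + 1:]

def expand(rules, species_list):
    """Ground the rules into concrete (reactant, product) reactions."""
    return [(s, apply_rule(r, s)) for r in rules for s in species_list if matches(r, s)]

species = all_species(N_SITES)
rules = make_rules(N_SITES)
reactions = expand(rules, species)
print(len(rules), len(species), len(reactions))  # 10 1024 5120
```

Ten rules are enough to write the model down, but grounding them still produces thousands of concrete reactions over a thousand species. That expansion is the execution cost that remains after the specification problem is solved.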

Feret et al. offer an ingenious method of reducing the computational cost of analysing rule-based models. It is based on the simple observation that while an external human observer may distinguish between two different species, the dynamical system itself may be unable to do so. To quote from the paper (emphasis added):

…an experimental technique might differentiate between SOS recruited to the membrane via GRB2 bound to SHC bound to the EGF receptor and SOS recruited via GRB2 bound to the EGF receptor directly. However, from the perspective of the EGF signalling system, such a difference might not be observable for lack of an endogenous interaction through which it could become consequential. The endogenous units of the dynamics may differ from the exogenous units of the analysis.

The natural consequence of this observation is that one can use the information contained in the rules to infer which species are indistinguishable in the above sense and provide just one equation per cluster of indistinguishable species (called a fragment in the paper). This is exactly what the authors do, and the results for their benchmark model of the EGFR pathway are very encouraging. In the case of the simpler model (39 rules), there are 10 times fewer fragments than species; in the case of the bigger model (71 rules), the method yields a staggering million-million-fold (10^12) reduction in dimension.
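To get a feel for why one equation per fragment can be exact rather than approximate, here is a minimal hand-built sketch. It is not the authors' algorithm, which derives fragments automatically from the rules. It assumes a toy protein whose sites are phosphorylated and dephosphorylated independently, with made-up rate constants `k_on` and `k_off`, and takes as fragment i the total concentration of all species with site i phosphorylated. The check is that summing the full species-level derivatives gives the same numbers as a closed set of equations written in terms of the fragments alone.

```python
from itertools import product
import random

def species_rhs(x, k_on, k_off):
    """Full model: dx/dt for every species (one equation per state tuple)."""
    dx = {s: 0.0 for s in x}
    for s in x:
        for i, bit in enumerate(s):
            # Flip site i: phosphorylate if off, dephosphorylate if on.
            target = s[:i] + (1 - bit,) + s[i + 1:]
            flux = (k_on[i] if bit == 0 else k_off[i]) * x[s]
            dx[s] -= flux
            dx[target] += flux
    return dx

def fragments(x, n):
    """Fragment i = summed value over all species with site i phosphorylated."""
    return [sum(v for s, v in x.items() if s[i] == 1) for i in range(n)]

def fragment_rhs(p, total, k_on, k_off):
    """Reduced model: one closed equation per fragment."""
    return [k_on[i] * (total - p[i]) - k_off[i] * p[i] for i in range(len(p))]

def max_discrepancy(n=4, seed=0):
    """Largest gap between lumping the full derivatives and the reduced model."""
    rng = random.Random(seed)
    x = {s: rng.random() for s in product((0, 1), repeat=n)}
    k_on = [rng.random() for _ in range(n)]
    k_off = [rng.random() for _ in range(n)]
    lumped = fragments(species_rhs(x, k_on, k_off), n)
    reduced = fragment_rhs(fragments(x, n), sum(x.values()), k_on, k_off)
    return max(abs(a - b) for a, b in zip(lumped, reduced))

if __name__ == "__main__":
    print(max_discrepancy())  # zero up to floating-point rounding
```

In this toy, n fragment equations stand in for 2^n species equations, and the agreement holds for any concentrations and rates because the dynamics never look at combinations of sites. The paper's contribution is to find such clusters when the independence is far less obvious.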

It is important to realise that the notion of dynamical indistinguishability of species is not merely a technical device for model reduction. It captures a property that is essential to the evolution and dynamical stability of molecular systems, and does so from a semantic rather than a syntactic perspective (i.e. by focussing on the equivalence of dynamics rather than the equivalence of model descriptions). As such, it is worth investigating in much greater detail. Another important point is that the method is not a statistical heuristic that may fail in special cases. All species lumped together in a fragment are provably indistinguishable from each other. The only sub-optimality is that the method may keep apart two species that are in fact dynamically indistinguishable. These issues are discussed at length in the supporting information, linked above.

Finally, a word of warning: the authors use and develop sophisticated mathematics and computer science, not molecular (nor even theoretical) biology. Readers without a quantitative background may struggle to follow the paper.

(Full disclosure: one of the authors is going to act as an examiner of my Ph.D. thesis.)

Written by evopapers

October 18, 2010 at 16:23

Posted in other
