Source: Moal Quraishi blog, Dec 2020
The “pure” problem of going from a single sequence to structure is the problem that’s been closest to my heart for over a decade, so it’s painful to say it, but it is the truth. It is similar to how the first mathematical proof of a result garners the most interest and accolades, even if subsequent complementary proofs are interesting in their own right.
My interest in the field has always had a practical bent: structure not for its own sake but in service to biology. For this vision to become reality we need data, structural data, which has always been very hard to come by. AF2 is profoundly transformative because it may do for structure what DNA sequencing did for genomics: make it possible. Every question in biology, from the molecular to the cellular to the organismal to the evolutionary, can now be posed and framed in terms of structural hypotheses. We’ve done this with sequence for at least a couple of decades and it has come to define every facet of the biological sciences.
Now we get to do it all over again with structure. And while the structure → function dogma never fully rang true with me, it’s certainly the case that having structure is better than having sequence for determining function.
We can speculate. First is the question of function derived from structure. There have been numerous efforts to predict protein function and do so in nuanced ways that reflect their multifunctional reality. Thus far these efforts have largely relied on sequence; now all can be redone using structure.
Especially in prokaryotic biology, where vast swaths of bacterial proteomes are still entirely uncharacterized, this alone may transform our ability to understand and one day engineer them.
The most exciting opportunity of all: the prospect of building a structural systems biology.
In almost all forms of systems biology practiced today, from the careful and quantitative modeling of the dynamics of a small cohort of proteins to the quasi-qualitative systems-wide models that rely on highly simplified representations, structure rarely plays a role.
This is unfortunate because structure is the common currency through which everything in biology gets integrated, both in terms of macromolecular chemistries, i.e., proteins, nucleic acids, lipids, etc., and in terms of the cell’s functional domains, i.e., its information processing circuitry, its morphology, and its motility.
A structural systems biology would take this seriously, deriving the rate constants of enzymatic and metabolic reactions, protein-protein binding affinities, and protein-DNA interactions all from structural models.
We don’t yet know how much easier, if at all, it will be to predict these types of quantities from structure than from sequence—we need to put the dogma of “structure determines function” to the test. Even if the dogma were to fail in some instances, which it almost certainly will, partial success will open up new avenues.
DeepMind likes well-defined problems with clear objectives and metrics. Science is almost never this way, but protein structure prediction actually fit the bill perfectly. There is literally a leaderboard every two years. Scientific problems with this feature are likely to attract DeepMind’s attention.
Third, DeepMind does have a core competency and it is machine learning. By the late 2010s protein structure prediction had turned into an almost exclusively machine learning problem. It required some domain expertise, and the AF2 team composition reflected that a bit, but by and large the hard problems were machine learning ones. This suggests that problems in which machine learning is not the core nut to crack are also less likely to attract DeepMind’s attention.
Fourth, given point three, any problem that machine learning can tackle must have a lot of data, and representative data that cover a large swath of the problem space.