Thursday, July 14, 2011

Bioinformatics: the more we look, the more we find

I'm at a bioinformatics summer course in Poznan, Poland this week.  There are lecturers and students from many places; Poznan is a very fine setting, and the program, organized by Woitech Makalowski and Elizabeta Makalowska, is a very fine one.  The topics cover many areas of the information sciences that try to deal with the huge amount of information being revealed about many different species of animals, plants and bacteria (among others) as a result of high-throughput, automated DNA sequencing and other techniques of similar power.

The old models are falling rapidly.  We now know very clearly that genomes are more than 20-some thousand protein-coding units strung together along an otherwise inert DNA sequence.  Instead, many more functional elements are being discovered, even if their functions are only partly known.  DNA is copied into RNA, and the RNA has many different uses and is itself processed in many different ways.

The bottom line is that, beyond the relatively few and straightforward causal functions once attributed to DNA, there is now an expanding array of newly found functions.  Some of these are clear, major, and easy to characterize.  But much of the evidence is for things that only seem to have some function (for example, the same elements are found in similar regions of the genome in multiple species, suggesting that they have been conserved by natural selection).  For most of this, bioinformatics provides statistical evidence from reams of data, and some confirmatory experiments support the database findings, but what the function is, how important it is, and how variable it is are still largely unknown.

Part of the problem is that this ever-expanding complexity, of types we had not at all anticipated, makes general evolutionary sense but makes the idea of prediction from genes to traits much more problematic than we had hoped.  The complexity makes evolutionary sense if you are willing to abandon the hyper-simplistic idea that gene makes protein makes trait, and that selection definitively removes the bad and advances the good versions of the trait.  Instead, selection is clearly very tolerant of variation, there is redundancy all over the place, and each element of a genome has a function that depends on what's in the rest of the organism's genome and on its living environment.

This suggests a much less causally definitive view of life than the general theory does, and also reveals how little most people--even most biologists--are aware of when it comes to the complexity of genomes and their relation to the traits we care about.

There are no solid answers beyond simply noting how complex and internally variable organisms are, and it is not even clear what kinds of answers will be needed for better explanations.  It is clear that most people will propose purely technological answers: generate more data, write more sophisticated computer programs to process it... and hope some deeper truths, if there are any, will emerge from the results.

Maybe it will work that way.  But except for a subset of things that follow the simple theories we thought applied to all of life, promises of quick or simple answers don't seem in the offing, no matter how much we may hunger for them.  So, too much to say in any detail in a blog post, but very stimulating and humbling to hear about even more sources of complexity than I was aware of!
