Friday, December 19, 2014

Survivorship bias and genetics

I was a mathematics major as an undergraduate.  However, neither then nor since have I been anything one could call a mathematician.  Still, I hope I learned something about trying to think logically about life, even if I never do equations.  That interest led me to read a new book I was told about, How Not to be Wrong: the Power of Mathematical Thinking, by Jordan Ellenberg (2014, NY, Penguin Press).

This is a popular rather than technical book, but it shows in interesting and serious ways how mathematical thinking can lead to improved understanding of the real world.  I think it has relevance to an important area in current evolutionary and biomedical or agricultural genetics.  So I thought I'd write a post about it.

Survivorship bias
Ellenberg begins his book with an illustration of how abstract logical thinking can solve important real-world problems in subtle ways.  In WWII a mathematics research group was asked by the Army to help decide where to put armor plating on fighter aircraft.  The planes were returning to base with scattered bullet holes from enemy fire, and the idea was to put protective plating where it would do the most good without adding cumbersome, mileage-eating weight.  One of the group's mathematicians, Abraham Wald, suggested putting the plating where the bullet holes weren't.  This seemed strange until he explained that the holes being observed hadn't done much damage: bullets hitting elsewhere had brought planes down, and those planes were never observed because they never returned to base.  The engine compartment was the case in point: a shot to the engine was fatal to the aircraft, but shots to the wings and body much less so.
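To make the logic concrete, here is a minimal Python sketch of the bullet-hole problem.  The four aircraft regions, the chance that one hit in each region brings a plane down, and the number of hits per plane are all made-up assumptions for illustration, not Wald's data.  Bullets land evenly over the plane, yet the returning planes, the only ones anyone gets to inspect, show far fewer engine hits than actually occurred.

```python
import random

# Illustrative assumptions only (not historical data): four regions, and the
# chance that a single hit in each region brings the plane down.
REGIONS = ["engine", "fuselage", "wings", "tail"]
P_DOWNED_IF_HIT = {"engine": 0.8, "fuselage": 0.05, "wings": 0.05, "tail": 0.1}


def simulate(n_planes=10_000, hits_per_plane=3, seed=1):
    """Return (hits on all planes, hits on planes that made it back)."""
    rng = random.Random(seed)
    all_hits = {r: 0 for r in REGIONS}
    observed_hits = {r: 0 for r in REGIONS}
    for _ in range(n_planes):
        # Bullets land uniformly at random over the four regions.
        hits = [rng.choice(REGIONS) for _ in range(hits_per_plane)]
        downed = any(rng.random() < P_DOWNED_IF_HIT[r] for r in hits)
        for r in hits:
            all_hits[r] += 1
            if not downed:  # only returning planes can be inspected
                observed_hits[r] += 1
    return all_hits, observed_hits


def shares(counts):
    """Convert raw hit counts into each region's share of the total."""
    total = sum(counts.values())
    return {r: counts[r] / total for r in counts}


all_hits, observed_hits = simulate()
true_share, seen_share = shares(all_hits), shares(observed_hits)
for r in REGIONS:
    print(f"{r:9s} share of all hits {true_share[r]:.2f}, "
          f"share of hits seen on returning planes {seen_share[r]:.2f}")
```

Counting only the planes that came back is exactly the biased sample: the engine looks like the safest place on the aircraft precisely because engine hits were the ones that kept planes from returning.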

This is a case of survivorship bias.  It can apply widely, and evolution and genetic causation provide instances where it seems likely to be a useful principle.  As geneticists we ask which genes, through their variation, cause variation in adaptive or biomedically interesting outcomes.  This is what genome mapping in its various forms is intended to identify.

Ironically, it seems, when we do experiments testing genetic mechanisms by, say, knocking out a gene, or when we observe the major gene-usage switches that occur as some part of an embryo's body is forming, we can identify specific genes that seem to be very important.

Several pieces of evidence can suggest they are important.  One is the finding that the same gene is used in similar roles in very distantly related species (often even between humans and flies, or more distant species): its usage has been conserved.  Second, there is usually far less variation within or between species in such genes than in what we believe to be non-functional or marginally functional parts of genomes, which suggests that variation hasn't been tolerated by natural selection.  Third, many congenital diseases in plants and animals, including humans, have proven to be due to the effects of variants, often newly arisen mutations, in a specific gene.  Most cases of diseases like Cystic Fibrosis, Phenylketonuria, Muscular Dystrophy, or Tay-Sachs Disease are of this sort.  Some congenital traits, like eye or skin color, are also due to inheriting specific variants in relatively few genes.

Such findings at least indirectly fueled the fervor for mapping every trait one can define, with grand promises of discovering the genes 'for' the trait.  Conscientious investigators justified expensive mapping efforts by showing that their trait of interest had substantial heritability, for example because trait values were substantially correlated among close relatives, in the patterns genetic models predict.  However, for most traits, like diabetes, cancer, heart disease, or behavioral characteristics, the hoped-for findings of major genes have been few and far between.

Despite a welter of PR spin to the contrary, instead of dramatic findings of the expected (and promised) sort, mapping studies found that the traits were affected by variation in tens, hundreds, or even thousands of different parts of the genome.  Even taken all together, these typically accounted for only a fraction--usually a small fraction--of the estimated heritability.
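A back-of-the-envelope sketch of that gap.  Under a simple additive model, a two-allele locus with allele frequency p and per-allele effect beta (measured in phenotypic standard deviations) explains about 2p(1-p)beta² of the trait variance.  The allele frequencies, effects, and the family-based heritability of 0.8 below are all hypothetical, and the sketch assumes the loci act independently; the point is only how little a hundred small effects add up to.

```python
# Hypothetical mapping "hits": (allele frequency p, per-allele effect beta in
# phenotypic standard deviations). Five made-up loci, repeated to give 100.
hits = [(0.30, 0.05), (0.45, 0.04), (0.12, 0.06), (0.25, 0.03), (0.50, 0.02)] * 20
family_h2 = 0.8  # hypothetical heritability estimated from relatives


def variance_explained(p, beta):
    """Share of phenotypic variance (total = 1) from one additive biallelic
    locus in Hardy-Weinberg proportions: 2 p (1 - p) beta^2."""
    return 2 * p * (1 - p) * beta ** 2


explained = sum(variance_explained(p, beta) for p, beta in hits)
print(f"{len(hits)} hits explain {explained:.3f} of the variance, "
      f"{explained / family_h2:.0%} of the estimated heritability")
```

With these made-up numbers, a hundred genuine 'hits' together account for well under a tenth of the heritability estimated from relatives.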

What is this 'missing heritability'?

Evolutionary survivorship bias
A central theoretical idea is that a fundamental genomic criterion for showing biological function is sequence conservation.  Most evolution is purifying: what has been put together over billions of years is risky to change.  So most mutations in clearly functional areas of DNA are either neutral or deleterious.  As a result, more variation accumulates in non- or weakly functional DNA than in important genes.

What that means is that the variation we see today misses variation that existed in the past and was removed, and hence is not a representative sample of all the variation that arises.  The idea can be that most of the variation we see in genomes is of little functional importance.  As a result, the tendency is to assume that non-conserved areas of the genome are non-functional.  This may be true, but it may be that our belief that conservation equals function is a corollary of a belief in strong Darwinian natural selection in molding traits.  In fact, most genomic variation is not of the highly conserved sort, but our analysis and explanation of functional genomics is biased by our predilection for ignoring less-conserved variation.

This can be seen as a kind of survivorship bias, in that we assume that the sequence of non-conserved genome areas doesn't survive unchanged for very long--isn't conserved--because it has no function.  That's a kind of circular reasoning, and it has been highly contentious, for example, in interpreting the ENCODE project's objective of identifying all functional elements in the genome, and in questions about whether selectively neutral variation exists at all.  The same conceptual bias leads to reconstructions of adaptive evolutionary history that center on the conserved genes, as if they were the only genes involved.  Finally, genes that were important in a trait's evolution to its current state may no longer be involved, and hence are not considered, because their role did not survive to be identified today.

Biomedical survivorship bias
The same sort of bias in ascertaining the spectrum of causal variation exists on the shorter, lifetime scale of biomedical genetics.  There is a big discrepancy between, on the one hand, the clearly key roles and deeply conserved nature of genes identified in experimental and developmental genetics and, on the other, the general lack of 'hits' in those genes when genomewide mapping is done on the traits they affect.

How can a gene be central to the development of the basis of a trait, and yet not be found in mapping to identify variation that causes failures of the trait?  Indeed, the basic finding of GWAS and most other mapping approaches is that the tens or hundreds or thousands of genome 'hits' have individually trivial effects.

The answer may lie in survivorship bias.  Like a bullet to the engine of a fighter, most variation in the main genes, those whose sequence is more highly conserved, is lethal to the embryo or manifests as pathology so clear that it never becomes the subject of case-control or other sorts of Big Data mapping.  In other words, genome mapping may be systematically, even inevitably, constrained to find small effects!  That's exactly the opposite of what's been promised, and the reason is that the promises were, psychologically or strategically, based on extrapolating from findings of strong, single-gene effects causing severe pediatric disease--a legacy of Mendel's carefully chosen two-state traits.
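A toy version of this argument, with every parameter an assumption rather than an estimate: new mutations get effect sizes from an exponential distribution with mean 1, and selection removes each one with a probability that rises steeply with its effect (here a mutation survives with probability exp(-4 × effect)).  The variants left for a case-control study to sample are then overwhelmingly the small-effect ones, even though large effects were common among the mutations that arose.

```python
import math
import random


def survivorship_filter(n_mutations=100_000, strength=4.0, seed=3):
    """Return (effects of all new mutations, effects of those that survive).

    Assumed model: effect sizes are exponential with mean 1, and a mutation
    survives selection with probability exp(-strength * effect), so large
    effects are almost always purged before anyone can sample them.
    """
    rng = random.Random(seed)
    arising = [rng.expovariate(1.0) for _ in range(n_mutations)]
    surviving = [e for e in arising if rng.random() < math.exp(-strength * e)]
    return arising, surviving


def mean(xs):
    return sum(xs) / len(xs)


def share_large(xs, cutoff=1.0):
    """Fraction of effects larger than the cutoff."""
    return sum(e > cutoff for e in xs) / len(xs)


arising, surviving = survivorship_filter()
print(f"mean effect: arising {mean(arising):.2f}, surviving {mean(surviving):.2f}")
print(f"share with effect > 1: arising {share_large(arising):.3f}, "
      f"surviving {share_large(surviving):.4f}")
```

Under these assumptions the survivors' effects are again exponentially distributed, but with mean 1/(1 + strength), which is 0.2 here: the filter, not the underlying biology of mutation, guarantees that mapping finds small effects.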

To the extent this is a correct understanding, genomewide mapping as it's now being done is, from an evolutionary genomic perspective, necessarily rainbow-chasing.  Indeed, a possibility is that most adaptive evolution is itself also due to the effects of minor variants, not major ones.  Once the constraining interaction of the major genetic factors is in place, mostly what can nudge organisms in this direction or that, whether adaptively or in relation to complex, non-congenital disease, is the assembled effect of individually very minor variants.  In turn, that could be why slow gradualism was so obviously the way evolution worked to Darwin, and why it generally still seems that way today.

Survivorship bias is a kind of misunderstanding of statistics and sampling that careful reasoning can illuminate.  It is so easy to collect biased samples, and so hard to do otherwise, and consequently so easy to make convenient but erroneous inferences.  Science is a complex business and it's an unending challenge to do it right--even to know when we are doing it right!

Thursday, December 18, 2014

(Other) lessons of the Broad Street pump: understanding causation isn't so easy

The iconic John Snow, often referred to as the "father of epidemiology," is commonly credited with discovering the cause of cholera after his careful, empirical examination of the 1854 outbreak of the devastating disease in the Soho neighborhood of London.  But I think it's only with hindsight that we can say this, and I think it's not quite right.

Snow was nothing if not a detail man.  A physician, he was very much an empiricist, experimenting and observing to test his ideas about health and disease like no one else of his time.  He had developed his waterborne theory of cholera some time before the 1854 epidemic, writing about it in detail in 1849.  The 1854 outbreak, very near his home, was an ideal circumstance for him to try to confirm his theory.

Modified from Snow's map in The Ghost Map; Johnson, 2006

Soon after the outbreak began, Snow began interviewing anyone with a family or household member who had died of the disease to determine the source of their drinking water.  Every case had drunk water from the Broad Street pump.  And, he confirmed that the worst symptoms were intestinal, not respiratory, which meant to him that the cause was something people had ingested rather than inhaled.  He found that there had been no cases among the 70 workers in the Broad Street brewery, because they were all given free beer, and never drank water at all.  From the information he collected, he drew his famous map of the neighborhood which showed that cases clustered around the Broad Street pump.  He concluded the pump was the source of the contaminated water that was making people ill.

He then enlisted the aid of a previously skeptical ally, and eventually convinced an even more skeptical local council to remove the handle from the pump -- to the disgust of many local residents who thought this was a cockamamie idea.  Not long after the removal of the handle, the epidemic was over.  But even Snow recognized that the epidemic had already begun to abate by the time the handle was removed.  That piece of the story is often lost, however; perhaps from the vantage point of 160 years on, when we know that Snow was right, the removal makes a nice tidy ending.

But did Snow identify the cause of cholera?  No, not in the way we would accept today.  We would say he had strong circumstantial evidence, but we'd require the causal organism.  There were multiple competing theories for the cause at the time.  An excellent history of the epidemic, The Ghost Map: the Story of London's Most Terrifying Epidemic, and How it Changed Science, Cities and the Modern World by Steven Johnson (2006), tells the story in detail.  Johnson writes that an editorial in the Times of London in 1849 considered the possible causes of cholera:
• “A … theory that supposes the poison to be an emanation from the earth”
• An “electric theory” based on atmospheric conditions
• The ozonic theory -- a deficiency of ozone in the air
• “Putrescent yeast, emanations of sewers, graveyards, etc.”
• Cholera was spread by microscopic animalcules or fungi, though this theory “failed to include all the observed phenomena.”
Source: The Ghost Map, Steven Johnson, 2006, Riverhead Books
Note that the idea that cholera was spread by "microscopic animalcules or fungi" was deemed empirically deficient by the editors of the Times, and it certainly was, as no organism associated with the disease had yet been identified.  In 1854 Snow himself looked at water from the Broad Street pump under his microscope and saw nothing of note.

And Snow wasn't the only one with empirical, observed evidence for the cause of cholera.  Indeed, each of the alternatives put forth by the Times was entirely plausible, given the state of knowledge at the time.  Miasmatists were empiricists too: epidemics were localized in poor areas, where the air and the filthy water smelled bad; there were more cases in cities and fewer in the hills; and no living organism had been found to suggest they were wrong.  What both Snow and the miasmatists had was circumstantial evidence, correlations, and belief in their preferred theory.  And, at the time, there was no definitive way to choose between them.

My point here is not to doubt Snow's theory, of course, but to suggest that although we now know that he was right, that was much less obvious at the time.  Indeed, it wasn't really until the organism that causes cholera, Vibrio cholerae, was discovered by Robert Koch in 1883 that Snow's story could be considered conclusive.  (Actually, the organism was first seen in 1854 by the Italian anatomist Filippo Pacini, but this was not well known at the time.  If it had been, would Snow have had an easier time convincing people that he was right?  I think the germ theory of disease had to get going in earnest before that could have happened, so I think probably not.)

What killed the miasma theory?  One blow was the rise of the germ theory, and the discovery of organisms that caused disease, one after another.  (Though, is the miasma theory in fact dead?  Still today there is some thought that dirty air causes asthma!)

But determining the cause of infectious diseases has its own problems.  It wasn't, and isn't, as simple as seeing live organisms under a microscope.  Robert Koch was a German physician and microbiologist who discovered a number of causal microbes.  He won the Nobel Prize in Physiology or Medicine in 1905 for his work on tuberculosis.  He proposed a set of postulates, first published in 1890, that were meant to be useful in confirming microbial causes of infectious disease.

The Koch Postulates

1. The microorganism must be found in abundance in all organisms suffering from the disease, but should not be found in healthy organisms.

2. The microorganism must be isolated from a diseased organism and grown in pure culture.

3. The cultured microorganism should cause disease when introduced into a healthy organism.

4. The microorganism must be re-isolated from the inoculated, diseased experimental host and identified as being identical to the original specific causative agent.

Unfortunately, and Koch knew this too, many microbes don't meet these criteria.  There can be asymptomatic carriers of cholera and other diseases; many microbes can't be grown in culture, and so on.  So, when a microbe behaves properly, following the postulates, all is good but when it doesn't, as with, say, HIV, controversy can ensue (see Duesberg).

Another blow to the miasma theory was the birth of a statistical basis for establishing causation.  The American philosopher, logician, and mathematician C.S. Peirce formulated the idea of randomized experiments in the late 1800s, after which they began to be used in psychology and education.

Randomized experiments were popularized in other fields by R.A. Fisher in his 1925 book, Statistical Methods for Research Workers.  This book also introduced additional elements of experimental design, which epidemiology adopted.
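Fisher's insight was that random assignment itself supplies the yardstick for judging an observed difference.  Here is a minimal sketch of that logic as a randomization (permutation) test; the function names are our own, the outcome values in the usage example are invented for illustration, and real trials use more elaborate designs and analyses.

```python
import random


def randomize(units, seed=None):
    """Randomly split units into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(treated, control, n_perm=10_000, seed=0):
    """Two-sided randomization p-value for a difference in group means.

    Re-randomizes the pooled outcomes many times and asks how often a
    difference at least as large as the observed one arises by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_t]) - mean(pooled[n_t:])) >= observed:
            as_extreme += 1
    # The +1 terms count the observed assignment itself, so p is never zero.
    return (as_extreme + 1) / (n_perm + 1)


# Assign twelve hypothetical units at random before any outcome is seen.
treat_units, control_units = randomize(range(12), seed=42)
print("treatment:", sorted(treat_units), "control:", sorted(control_units))

# Invented outcome measurements for six treated and six control units.
treated = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
control = [10.2, 10.9, 11.1, 10.5, 10.8, 11.0]
print(f"p = {permutation_test(treated, control):.4f}")
```

Because the assignment was random, the shuffled assignments are exactly the differences chance alone could have produced; no model of the disease process is needed to compute the p-value.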

In 1937, epidemiologist and statistician Austin Bradford Hill published Principles of Medical Statistics for use in epidemiology.  Meanwhile, the development of population genetics, which Ken has been writing about this week, the Modern Evolutionary Synthesis (which showed that Mendelian genetics is consistent with gradual evolution), and other discoveries in genetics laid the foundation for approaches to looking for the genetic basis of traits and diseases.

Recognizing that attributing causes of disease needed a more formal approach, Bradford Hill suggested a set of criteria that he thought were at least useful to consider.  The "Hill Criteria," which he published in 1965, are still in use today.
• Strength: The larger the association, the more likely that it is causal.
• Consistency: Findings should be consistent between observers in different places.
• Specificity: The more specific an association between a factor and an effect is, the greater the probability of a causal relationship.
• Temporality: The effect has to occur after the cause.
• Biological gradient: Greater exposure should generally lead to greater incidence of the effect.
• Plausibility: Must make sense.
• Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect.
• Experiment: “Occasionally it is possible to appeal to experimental evidence.”
• Analogy: The effects of similar factors may be considered.
Source: AB Hill, “The Environment and Disease: Association or Causation?,” Proceedings of the Royal Society of Medicine, 58 (1965), 295-300.
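Two of the criteria, strength and biological gradient, reduce to simple arithmetic on cohort counts.  The sketch below computes relative risks against the unexposed group and checks for a monotone dose-response; the exposure levels and case counts are hypothetical, and passing these checks does not by itself establish causation.

```python
def risk(cases, at_risk):
    """Proportion of people at risk who became cases."""
    return cases / at_risk


# Hypothetical cohort: (exposure level, cases, people at risk). Level 0 = unexposed.
cohort = [(0, 20, 10_000), (1, 45, 10_000), (2, 90, 10_000), (3, 160, 10_000)]

baseline = risk(cohort[0][1], cohort[0][2])
risks = [risk(cases, n) for _, cases, n in cohort]
relative_risks = [r / baseline for r in risks]  # "strength" of association
for (level, _, _), rr in zip(cohort, relative_risks):
    print(f"exposure level {level}: relative risk {rr:.2f}")

# "Biological gradient": does risk rise with every step up in exposure?
monotone = all(lower < higher for lower, higher in zip(risks, risks[1:]))
print("monotone dose-response:", monotone)
```

A confounder that rises with exposure (say, poverty alongside bad-smelling air) would produce exactly the same numbers, which is why such calculations could support the miasmatists as readily as Snow.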
Again, the author himself recognized, in the paper proposing the criteria, that only one of them is actually a requirement for causation: the cause has to precede the effect.  The others are either vague or just 'would be nice', or in many ways are highly or even purely subjective.  So, when they work, great, and we attribute our conclusions to their application; but when they don't, it's not clear whether a possible factor isn't a cause, or the criteria aren't adequate for determining it, or our sample is inadequate, or there is some other, perhaps unknowable, problem.

A set of "molecular Koch postulates" was devised in the 1980s to determine the role of a gene in the virulence of a microbe, but they, too, have their failings, for similar reasons.

And statistical criteria have become the standard for determining causation, but we know that p-values are arbitrary (see Jim Wood's MT post, "Let's abandon significance tests", on this), that statistics are only as good as the studies that generate them, that studies are prone to biases, missing data, and the like, and that results can be difficult to replicate even when studies are state-of-the-art.  David Colquhoun has written a lot on this.
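To see why p < 0.05 can be a weak guide, consider the kind of false-discovery arithmetic Colquhoun uses.  The inputs here are illustrative assumptions: if only 10% of tested hypotheses involve a real effect, studies have 80% power, and significance is declared at p < 0.05, then out of 1,000 tests we expect 80 true positives and 45 false positives, so about 36% of 'significant' findings are false.

```python
def false_discovery_rate(prior_real, power, alpha):
    """Expected share of 'significant' results that are false positives.

    prior_real: fraction of tested hypotheses where a real effect exists
    power:      chance a study detects a real effect
    alpha:      significance threshold (chance a null effect looks 'significant')
    """
    true_positives = prior_real * power
    false_positives = (1 - prior_real) * alpha
    return false_positives / (true_positives + false_positives)


fdr = false_discovery_rate(prior_real=0.1, power=0.8, alpha=0.05)
print(f"false discovery rate: {fdr:.0%}")  # prints "false discovery rate: 36%"
```

The prior fraction of real effects is rarely known, which is part of why the same p-value can mean very different things in different fields.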

Why go on about this?
We write frequently here on MT about how important it is to think about how we know what we know.  If we don't, we can get very close to religious territory, where knowledge is based on belief, not observation.  Indeed, even in science, to some of us, every trait is genetically determined, or we've got our favorite cause of obesity, or autism, or diabetes.  The ease with which we might choose to understand cause and effect without questioning how we know reflects two things -- one, belief is alive and well as a way to determine cause, and two, we often don't have demonstrably better ways to do it.

So, we don't know if sugar is the cause of the obesity epidemic or fat, or just overeating; we don't know whether breast feeding or bottle is the cause of the asthma epidemic; whether genes or environmental risk factors are the most important cause of type 2 diabetes, or which ones, and so on.  A lot of work in genetics is still based on the assumption that traits are simple, even though we know the kinds of traits that are likely to have simple explanations (the low-hanging fruit) and we know that they are rare.  We know the kinds of traits that are complex, and that aren't going to have easy explanations of the kind often suggested, and yet 'gene for' thinking is still prevalent in the popular press, and even among scientists.

Ludwik Fleck, a Polish physician and biologist, in 1935 published a book, Genesis and Development of a Scientific Fact, that is now properly recognized as the precursor to Thomas Kuhn's Structure of Scientific Revolutions.  Fleck wrote about "thought collectives" in science, his idea that facts in science are driven by context.  We follow the herd, until in fact the thought collective becomes a thought constraint.

Fleck writes of the development of the Wassermann test for syphilis, meant to determine who had the disease; instead, the thought collective at the time led the test result to define the disease.  It's an excellent short book and well worth reading, but Ken wrote an even shorter column on Fleck, also worth reading if you're interested in Fleck and in the sociology that is an important part of the way science actually operates.

A modern equivalent would be the common de facto practice of defining a genetic disease by genotype: if a patient has one of the known genetic variants associated with the disease in other patients, he or she has the disease, but if not, he or she doesn't, even though we know that there can be many pathways to a given phenotype (our post last week on phenogenetic drift describes one reason for this).  Such definition, if everyone is aware of its nature, can guide therapy in useful ways; that is, some genotype-defined subset of a broader disease category may respond to a particular kind of drug.  But the changeable landscape of definition based on assumed causal process is an important part of the elusiveness of many conditions, like autism and many others.  Too often the assumption that the outcome is 'genetic' defines, steers, or determines the concept of the trait itself.  That can distract, and we think regularly does distract, from more realistic approaches to what is currently the very elusive nature of many traits, normal and otherwise, in animals and plants.

Understanding causation is a fundamental issue in science, but the difficulties are often overlooked in the rush to publish.  To the detriment of the science.

Wednesday, December 17, 2014

Are we still doing 'beanbag' eu(genetics)? Part III. Culpably ignored nuances?

Part I of this series was about the particulate view of genes and their role in evolution and the determination of traits that are here because they were screened by evolution.  Many view all traits as being in this category, and genetic determinism of those traits to be very strong and specific.  But the data are less clear by far than the commitment to that idea.

Ernst Mayr criticized the one-gene-at-a-time focus of much of population genetics as 'beanbag' genetics.  Mayr said that this was wrong for reasons we mentioned in Part I.  As we discussed there, JBS Haldane, one of the grand ol' men who developed population genetics, wrote in defense of the field, in response to Mayr's criticism.

Haldane was a highly educated, thoughtful, perceptive British biologist whose life was nuanced in many ways that make telling a clear-cut story difficult.  He was brilliant and exceedingly skilled.  But he was also a product of his times, as are we all.  In the early 20th century he became a Marxist, as did many other British aristocrats, accepting all that implies about what determines the structure of human society.  Marxism was materialist, but it was about the improvability of individuals: an egalitarian view that held that position in a class-based society was due to class, not inherent inferiority of the lower classes, and thus that social inequity could--indeed would--be erased by the processes of history.  At the time, the Soviet Union seemed a Great Hope to many in heavily unfair imperial Britain.  A belief in that essential malleability was one reason that the Soviet agronomist Lysenko rejected Mendelian/Darwinian models of genetics in favor of a more Lamarckian mode of inheritance, by which plants could be conditioned to have desired properties that would then be inherited.  That proved in many ways to be a disaster for the Soviet Union.

Nonetheless, Haldane, who was a leading popularizer of science in his day, published a collection of reprinted essays in 1932 entitled The Inequality of Man.  Ironic for a Marxist, but he was not simplistic.  He dealt with, and accepted, the idea of eugenics in those essays, and that was largely what the title referred to.  He acknowledged the major role of environment in making people what they turned out to be.  But he stressed that genetics was part of human makeup, too.  Rather than offering a more balanced treatment, at points he lapsed into the aristocratic view about intelligence and, in that sense, inherent societal worth.  The upper classes were what they were because of their abilities, and were under-reproducing compared to the lower classes.  He even wrote of society not having the guts to kill its lesser citizens: despite warning about too much stress on inherency, in one article he wrote:
"The danger to democracy to-day lies not in the recognition of a plain biological fact [of inherent inequality] but in a lack of will in certain countries to kill persons who obstruct the declared wishes of the majority of the people."  Further, "The only clear task of eugenics is to prevent the inevitably inefficient one per cent of the population from being born, and to encourage the breeding of persons of exceptional ability where that ability is known to be hereditary."  There should not be a democracy except of a better minority.
There is a mix of views in Haldane's chapters, ranging from the autocratic extreme to something more humane and nuanced.  He discusses social class, race, and intelligence as related to achievement, and even within Europe he makes distinctions about intelligence between (guess who!) northern and southern Europeans.  But he also promotes improved opportunities and acknowledges that we don't know the nature or extent of hereditary control of traits like intelligence.  In these popularized articles on many sociocultural issues, he is a softened genetic determinist.  Perhaps this could be a Marxist 'from each according to his abilities, to each according to his needs' view; but that was always paternalistic when pronounced from on high.  Haldane, like many scientists who are given a public forum, strays far and wide beyond what he knows best, and more than eighty years on we can see his only too human opinions.  Life is complicated!

In any case, though the rhetoric has generally changed, we see roughly the same spectrum of views today, and that spectrum rests, in many ways implicitly, on a beanbag model of inheritance.  In Haldane's day, identifying the genes that cause the traits of interest was technically impossible.  Now, in the excitement of 'omic technologies, the beanbagger approach is more explicit, noting this or that genetic variant that causes some socially relevant behavioral trait.  This viewpoint is widespread, despite occasional caveats about complexity, and even though many labs are working on more integrative approaches to that complexity.

The difficulties
These are not simple issues.  People are different in physical, metabolic, and behavioral ways and clearly genetic variation is involved.  Depending on one's social politics, that can be a central or an uncomfortable fact.  But let us assume, for the moment and for argument's sake, that all the genetic determinism that has been proposed were perfectly true.  Then what?

The idea in the writings of various authors, from the past and today, is essentially about what 'we' should do to mold society this way or that.  But who is that 'we'?  They're the professors, politicians, and so on, who in positions of influence make the judgments about what 'we' as a society 'need' to do.  'We' want more intelligence and less addiction and crime (as defined by 'us', of course; usually 'we' aren't talking about white-collar crime).  'We' decide what would be 'good' for society and what should be discouraged.  And there is always the temptation to attribute inherent causation to these differences.

So, for example, we decide what to do with (or to?) those of higher and lesser inborn intelligence.  This is rather indisputably arrogant and presumptuous, isn't it?  Or, perhaps, one can ask whether it is any different from what has gone on heretofore.

If the minority of the privileged have the power to decide on societal action, it is rather moot whether the criteria used to justify that action are presumed genotypic ones or just the arbitrary wielding of power.  Does it matter whether Divine Right or 'good genes' is credited with the power of the elite, and the subservience of the rest?  The powers-that-be define the value judgments.

Genotypes may have more, or less, determinative roles than is widely being claimed these days.  Eugenics was a particular kind of social control that had regularly dreadful, indeed lethal, consequences for many people for various reasons.  But whether that was any worse than religious or other political dominance is an open question.

Does it matter if it's an ISIS member who chops your head off because of your religion, or a Nazi who gasses you because of your ethnicity, or a physician who decides what genotypes need to be screened prenatally and eliminated, or who gets educational resources?

We have our own personal view, which is that the data generally do not support the making of such decisions based on genotypes and their presumed predictive value--and decisions related to those genetic variants that really do have such value should only be made privately, rather than by public policy.  But the public pays for the treatment of genetic disease, so at what point is coercion within the scope of such an idea?

It is not clear whether these issues really ever get 'solved', or whether rational, measured discussion is even possible.  But it does seem clear that questions about how genes control, or don't control, the traits in organisms are worth understanding, rather than action being taken on vague assumptions about inherent causality before the questions are even answered.

Tuesday, December 16, 2014

Are we still doing 'beanbag' eu(genetics)? Part II. History's unlearned lessons?

Yesterday we discussed some of the ways in which particularized views of genomic control and evolution were controversial and that, despite much more knowledge now than when the issues first surfaced nearly a century ago, they are still with us in largely unchanged form--even if with massive amounts of data and lots of chest-thumping about how modern our current view is.

One consequence of viewing genomic causation as highly deterministic and specific is the belief that once a person is conceived, his or her genome essentially predicts his or her life, so that, in particular, we can (a) work preventive miracles in regard to disease, and (b) think of designing the traits we would like to engineer in our offspring.  But the issues of genomic determinism are not at all new.

A new paper by Donald Forsdyke, a PeerJ preprint ("The relative roles of politics and science: William Bateson, black slavery, eugenics and speciation"), shows how controversies about genomic causation began in the late 1800s, not long after Darwin's Origin of Species was published.  The insufficiently circumspect appropriation of Darwinian ideas that followed foreshadowed the horrors of human abuse that would later be committed, under the rubric of eugenics, in the name of genetics and evolution.  And those issues, like the ones we discussed yesterday, are very much still with us.  Now, in principle at least, we have a chance to learn from history rather than repeat it.  But the signs that such a benign outcome is likely are not very favorable.

William Bateson (1861-1926) was a leading biologist in the formative decades after Darwin's ideas of evolution and Mendel's of genetic inheritance were swirling in scientific circles.  Mendel showed how stable, discrete traits were heritable, and essentially determined--to wit, the presence or absence of traits in his peas.  Mendelian inheritance was 'rediscovered' in 1900 and seemed to provide a sound idea of inheritance.  However, discrete Mendelian traits seemed at the time to be inherited without change.  Such stability and discreteness were inconsistent with the apparent nature of adaptive evolution that Darwin had suggested.  Darwin's idea was that traits vary infinitesimally among individuals and selection very gradually moves the resulting traits in a population to adapt to environmental change.

What was 'eugenics'?
The idea that society is composed of the ordinary and their betters is not new.  It has long been part of the rhetorical, religious, and material ways in which a ruling minority justifies its position and dominance over the masses.  That the upper classes' indulgence in pursuits like gluttony, debauchery, and warfare might endanger them, and hence society as a whole, which would suffer the loss of those who perished in such endeavors, was a concern even to the classical Greek philosophers (well, they mainly worried about the warfare part).

In the years after Darwin's ideas were published, these concerns about what was good and bad in human nature, or who were the good or bad individuals in society, took on the panache of science, replaying, one might say, similar judgments made by invoking the will of God--the Divine Right of the upper classes, the inherent inferiority of non-Europeans that justified slavery, and so on.  If evolution generated the truly-better, and that means genetically better, then the loss of the social elite to disease or warfare deprived society and its future patrimony of the best genes in the gene pool.  And in any case, it was seen as a problem that the masses outnumber the elites, yet have inferior genes.  Or, to be a bit more charitable, it was thought that it was possible to distinguish between individuals who really were lesser--inherently criminal, drug-abusers, amoral, slovenly and the like--vs those who were better.  The latter were the intellectual, scientific, military and other such leaders.

Now that Darwin had shown us how evolution worked, and that its workings were all about Nature making mortal value judgments (survival of the fittest), modern science could be used to further Nature's plan, speed it up, and ensure that bad luck didn't thwart that plan.  The effort to use science for human and evolution's betterment was called eugenics.  The key factor, of course, was reproductive success.  Thus, if differences in individual character could be discerned, we could impose incentives to enhance the reproductive efforts of the better and lead the lessers to voluntarily restrain their own proliferation; that was called positive eugenics.  If this didn't work, we could screen the population for those with better and lesser inherent qualities, and use social mechanisms to impose restricted reproduction on the latter.  This was called negative eugenics.

The idea that genes specify who we are, even through the fog of culture, environment, and experience, has great appeal.  It's simple.  It leads to effective prediction.  It can be built into policy.  The eugenics movement was an application of Darwinian thinking that assumed many simplistic and/or unverifiable ideas about what Nature 'wanted' and about which genotypes (and, of course, the traits they specified) were good and which weren't.  These assumptions led textbook authors and research institutes to declare such things, and this in turn reflected and/or led to policy imposed by society on its citizens.  We know what happened in nearly a century of the imposition of 'science' to manufacture such ends.

The temptation was to think of traits as single-gene, or clearly 'genetic'.  Traits that are complex, as we know many behavioral traits and common diseases are, can't be attributed to single genes or even a small list of additive contributors (though they can be modeled that way in statistical studies), because the many components of complex genotypes recombine and shuffle among individuals and across generations.  For that reason, thinking of these traits as due to a bean or two from the beanbag is misleading and, essentially, erroneous relative to the underlying causal principles (themselves not yet very well understood).

Whether or not one holds a eugenic view of this sort, the policies enacted in the name of eugenic 'science'--even if that was just a rationalization for what politicians would do anyway--led to some of the worst horrors in human history, both to individuals and to whole groups.  The lesson was learned, and led to a prevalent environmentalism after WWII, in which explicit eugenics was itself basically 'blacklisted'.  But memory is short, and the hubris of scientists powerful.  Eugenics, in various new forms, is back.

Neo-eugenics: modernizing a ghastly idea (oh, no harm of course!)
Many readers may have seen the 19th-century-like OpEd in the recent NYTimes ("The downside of resilience," Jay Belsky), that advocated screening all children for a couple of markers of their personality and response to education.  Of course, it was all couched in terms of salubrious value to society--eugenics started out and was often proclaimed that way.  'We' just have to test 'them' (all school children) to find those who need special help to respond to schooling as well as others (another value judgment that the 'we' make about the 'them'), and then we can devote extra resources to those with sub-par performance.  Sure!  That is about as naive as believing in Santa Claus.  What history shows will likely happen is that the well-off will argue, with demagogues in politics at their side, that this is a waste of resources, which should be devoted instead to those like us, who will deliver for society's betterment.  If you think it will be otherwise, then you should go straight to the Mall and tell Santa what you want for Christmas.

In a sense, from our point of view, it doesn't matter if these sub-par-performance traits are 'genetic' or not.  The point is we have zero serious need to 'diagnose' them by genotyping.  We know that even for the vast majority of diseases, actual phenotypes are better predictors by far than genotypes.  So, if a trait is harmful, as a disease or some other limitation, we need only observe the trait itself or its prodrome--the signs it is coming.  We can learn to identify these things earlier, but at least we only 'treat' those who actually have the trait.

Since such traits are at most partly, and usually only slightly, genetic, we have no real need to do the genotyping afterwards, either.  Such traits might need therapy the way any disease needs therapy, but genotyping adds little knowledge.  A 'beanbag' approach or conceptualization makes policy decisions seem easy but in fact makes much of the inference at best inaccurate to an unknown (perhaps unknowable) degree.

The reason for restraint is that, as history clearly shows, social engineering to protect those in power and influence is typically detrimental to those with 'undesirable' traits.  The value judgments are sometimes, if not often, based on irrelevant correlates (such as 'race').  But the consequences are that some group of 'we' decides what to do for (or to) some group of 'them'.

As the Forsdyke paper shows, these issues are not new, and even Bateson himself (who coined the term 'genetics') warned a century ago about the lack of knowledge of the complexities of biological trait determination, and about the tendency towards unjustified eugenics.  Forsdyke quotes Bateson from a 1905 piece of his in The Speaker:
What ... will happen when ... enlightenment actually comes to pass and the facts of heredity are ... commonly known? One thing is certain: mankind will begin to interfere; perhaps not in England, but in some country more ready to break with the past and eager for ‘national efficiency.’ ... Ignorance of the remoter consequences of interference has never long postponed such experiments. When power is discovered man always turns to it. The science of heredity will soon provide power on a stupendous scale; and in some country, at some time, not, perhaps, far distant, that power will be applied to control the composition of a nation. Whether the institution of such control will ultimately be good or bad for that nation, or for humanity at large, is a separate question.
                   W. Bateson, ‘Heredity in the physiology of nations.’ The Speaker, 14th Oct (1905).

Bateson had many different ideas on genetics, but in a sense his approach was rather beanbag in nature, thinking of the 'gene' as an independent causal agent (though he wrote so long before actual genes or their particulate nature were known that 'beanbag genetics' really can't apply to him).  He was not convinced that Mendelian factors could even account for evolution.

Things are complex and so were the commenters during the eugenics era. Even JBS Haldane, about whom we wrote yesterday in Part I of this series, was a mix of viewpoints. He was a founder of the genetically based area of population genetics that became and still generally is viewed as 'the' formal theory of evolution. It rests on genetic determinism to a great extent, and the idea that what is here is because the relevant causal agents--the genetic 'beanbags'--were closely scrutinized and favored by natural selection. That idea, which certainly has much truth behind it, makes it complex when it comes to judgments about human traits that affected the eugenicists then (and their descendants today). We'll deal with that in Part III.

Bateson warned us about the issues before the major abuses that led to the disasters of the mid-century--human experimentation, Nazi genocide, forced sterilizations and institutionalization. Of course, in the hubris of the new genetics and evolutionary theories of the time, nobody listened to warnings. Too many listened to pompous, self-assured scientific 'experts' who were, of course, always speaking for the public good. And we all know what happened.

However, it is also fair to say that in the absence of relevant information, many different views were circulating at that time, as the new ideas and discoveries were being assessed.  One must be careful not to give too much hindsight-based credit to views that now seem to have been prophetic.  Nonetheless, when a cycle seems to be repeating, it is proper to note how history unfolded even within living memory.

Are these statements too cautious?  Is there no chance of a return of the last century?  Maybe.  Of course a 'return' will have its own form, its own rhetoric, and its own consequences.  Some may be quite good (such as effective genetic prenatal counseling for clearly known devastating genetic disorders).  But in science as in politics, religion, or other areas of human affairs, the hubris and excitement of such success have historically led to excesses.  There are many things we are not allowed to do, such as falsely shout Fire! in a crowded theater.  The restrictions on science can always be revised, and where risks may exceed benefit, the work should just not be done: there are plenty of less ambiguous ways to invest in science for human betterment.

The time to take care of your horse is before it leaves the barn.

Monday, December 15, 2014

Are we still doing 'beanbag' eu(genetics)? Part I. Some history

Way back in 1964, a famous paper was published in Perspectives in Biology and Medicine (vol 7: 343-359, and reprinted in the International Journal of Epidemiology in 2008; 'A defense of beanbag genetics').  The author was one John Burdon Sanderson Haldane, better known as JBS Haldane.  Along with RA Fisher and Sewall Wright (and later, Motoo Kimura, James Crow and an expanding array of others), Haldane helped found and then develop the field of population genetics.

[Photo: JBS Haldane (1892-1964)]
Population genetics is the theory of change in genetic variation ('gene frequencies') in populations over time.  Essentially, one main thread of population genetics follows the fate of new mutations in DNA over time, and in that sense is centered around a variant--called an 'allele'--that arises by mutation in a single 'gene'.  It can model the change in that variant's frequency because of chance, population dynamics, natural selection and so on.  It can also model what happens with several such variants.  However, this theory is mute about what the variant actually does in the organism.
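For readers who want to see what 'following a variant's frequency' looks like in practice, here is a minimal sketch of the textbook Wright-Fisher model, one standard model of this kind (our choice of illustration, not something taken from Haldane's paper).  The function name `wright_fisher`, the population size `n`, and the simple selection coefficient `s` are assumptions made for illustration.  Notice what the code does not contain: anything about what the variant actually does in the organism.  Its 'fitness' is just a number.

```python
import random


def wright_fisher(p0, n, s=0.0, generations=100, seed=None):
    """Follow one allele's frequency under a simple Wright-Fisher model (illustrative sketch).

    p0: starting frequency of the allele (0.0 to 1.0)
    n: number of gene copies sampled each generation (assumed constant)
    s: assumed selective advantage of the allele; 0.0 means pure chance (drift)
    Returns a list of frequencies, one per generation, including the start.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: the allele's expected share is weighted by (1 + s).
        p_sel = p * (1 + s) / (1 + p * s)
        # Chance: the next generation is a random sample of n gene copies.
        count = sum(1 for _ in range(n) if rng.random() < p_sel)
        p = count / n
        # Frequencies of 0.0 (lost) and 1.0 (fixed) stay put automatically.
        trajectory.append(p)
    return trajectory


# A new, slightly favored variant starting at 10% in a population of 100 copies.
traj = wright_fisher(p0=0.1, n=100, s=0.05, generations=200, seed=1)
print(traj[-1])
```

Rerunning with different seeds shows how much of the variant's fate is down to chance rather than to `s`, which is exactly the kind of question this branch of theory is good at.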

In a sense, population genetics is a particulate, molecular theory of change over time (that is, of evolution) that is largely divorced from real biology.  If biological traits are caused by genes, then trait variation must also be affected by genetic variants; yet while the theory is a valid way to follow frequency dynamics, it seems to have judged it irrelevant to consider the organisms themselves.  Nonetheless, in the 1930's, the panache of its mathematical rigor and, one might say, its fashionability as a molecular (and hence 'real' science) focus led to population genetics being proclaimed 'the' formal genetic theory of evolution--and, really, more than that: the basic underlying assumption was, and has remained, that evolution is fundamentally a genetic phenomenon.  The impression was essentially given that the rest was incidental window-dressing.

Naturally, some biologists objected to this palace coup by a few mathematically skilled theoreticians; this is a natural resentment perhaps, since many biologists chose that field because they were innumerate, as was Darwin.  But as importantly, because of the very particulate nature of the theory, relative to the real world, a leading spokesperson for evolutionary biology, Ernst Mayr, denigrated the theory as 'beanbag' genetics: the reduction of real organisms to a set of independent causal particles, the individual genes.  Instead, Mayr insisted, organisms and their evolution were more integrative, interaction-based phenomena.

In his 1964 paper, Haldane objected to this negative caricature of population genetics.  Essentially, he said that the theory allowed many ideas about evolution to be tested at least approximately, and could account in principle for a broad range of evolutionary phenomena.  He discussed the relationships between more nuanced aspects of genetics--interactions among genes, for example, that Mayr stressed--and the theory.

Nonetheless, while population genetics is very useful for putting some plausibility brackets around interpretations of genetic data from populations, it is still largely a one-gene-(or one linkage group of genes)-at-a-time theory; that is, it doesn't concern itself with actual traits or how they are manifest.  Indeed, leading developmental geneticists have, rightfully in our view, complained that this self-proclaimed theory of evolution omits the way actual organisms are assembled and evolve, and the role genes play in that.  The evolution of development (or 'EvoDevo') has become a major field of research, which, thanks to many advances in genetic experimental technology and model systems, has been able to relate developmental genetics to the evolution of the genes and the systems they're part of.

The other half of the 'bicameral' brain
There has long been a second thread of the theory, often called 'quantitative' genetics.  It deals with quantitative traits affected by large numbers of genes that are not specifically identified; it can predict aspects of such traits in populations over time, but does not attempt to enumerate the individual genetic contributors.  These are called 'polygenic' traits (other similar terms are sometimes used), and are the target of many genomewide mapping efforts, about which we have written many times.
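The quantitative-genetics view can be sketched in a few lines.  The toy model below is a hypothetical illustration, assuming equal, independent, purely additive allele effects plus Gaussian environmental noise (assumptions that real traits rarely satisfy); the function name `polygenic_trait` and all parameter values are ours.  No individual locus is ever identified, yet the population's mean and variance are predictable: mean 2 x 100 x 0.5 x 0.1 = 10, genetic variance 200 x 0.1^2 x 0.25 = 0.5, plus an environmental variance of 1.0.

```python
import random
import statistics


def polygenic_trait(n_loci=100, freq=0.5, effect=0.1, env_sd=1.0, rng=None):
    """One individual's trait value under a hypothetical additive polygenic model.

    Each of n_loci loci carries two allele copies; each copy independently adds
    `effect` to the trait with probability `freq`. Environmental noise is
    assumed to be Gaussian with standard deviation env_sd.
    """
    if rng is None:
        rng = random.Random()
    genetic = sum(effect for _ in range(2 * n_loci) if rng.random() < freq)
    return genetic + rng.gauss(0.0, env_sd)


rng = random.Random(42)
values = [polygenic_trait(rng=rng) for _ in range(5000)]

# The population-level summaries come out close to the predicted 10 and 1.5,
# even though no individual gene was ever named.
print(round(statistics.mean(values), 2), round(statistics.variance(values), 2))
```

In this toy setting only about a third of the trait variance (0.5 of 1.5) is genetic, which is one way to see why knowing the 'beans' predicts individuals poorly even when population averages behave nicely.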

In fact, both these strains of thought go all the way back to around 1900 when Mendel's work was rediscovered, and then competed with Darwinian gradualistic ideas about evolution and genetics.  The competition involved squabbles between the Mendelians and what were called the 'biometricians', or quantitative geneticists.  Since that time, what these combined areas of theory and investigation have shown is that there is a spectrum of genetic causal effects.  Variation in traits that are generally very rare in their population is often due to variation in single genes--any number of diseases, usually severe and with very early onset, are in this category.  These behave in a classical 'Mendelian' way, just like Mendel's pea-traits did.  But most traits and most common, later-onset disorders are in the complex polygenic category.  Human thinking often tends either to focus on qualitative 'things', or on quantitative 'measures', and the difference between particulate and quantitative evolutionary genetics reflects that.

You may not be old enough to remember the phrase 'beanbag genetics', but it symbolized the naturalist's view that whole organisms or even ecosystems need to be studied as interacting entities, rather than trying to understand evolution by particularizing things down to individual genetic variants, even if the latter are an essential part of the story.  That sort of reductionism was missing the point.  But have we long ago learned that lesson?

Where are we today?
In fact, today's Big Data GWAS-y world is conceptually still largely wedded to beanbag genetics.  It is still driven by a reductionistic approach that essentially believes that by enumerating the individual beans in each person's genome, that person's entire nature can be understood or even predicted from the moment of conception.  Is this too much of a simplification or overstatement?  Is there a reason other than molecule-worship that the stress is so heavily on individual, particulate entities like 'genes', even though we know the genome is far from so clearly discretized in function?  Look past the caveats and denials offered by the Big Data empire to what they are mainly doing, look at how they bury or pass over their caveats, and judge for yourself.

Some researchers are working to study 'systems', such as molecular interaction networks.  This is a recognition of the problem posed by hyper-reductionism.  It is a step in a good direction, but even the systems approach largely seems beanbag in nature, treating complex traits as if they were a beanbag of internally interacting systems that can be enumerated and treated as units.  Network interactions are obviously relevant and involved in biological organisms, but it is not so clear, to us at least, that that path will be the best one to understand complex traits sufficiently well.  At least, systems approaches force us to consider interactions among components as fundamental to life.

There is an important sociocultural problem associated with beanbag genetics, besides the fact that we're still thinking in essentially the same way as 50 or even 100 years ago despite vastly more knowledge.  The problem goes beyond the promises to use the enumerative causal approach to develop 'personalized genomic medicine', which sounds so laudable.  Based on what we know today, those promises are highly exaggerated and misleading, even if they will work for clearly causal genomic 'beans' and even if every lesser finding will be trumpeted as a justification for the effort.  But one of the most immediate consequences is that these efforts eat up lots and lots of funds that could be spent in other ways, already known, that could yield vastly greater improvements in public health (health is, after all, the promise being made).

However, beanbag thinking casts another, far more ominous shadow that also goes back to the early days of genetics, and that will be the subject of Part II of this series.

Thursday, December 11, 2014

Phenogenetic drift

Many people think that biological traits are due to specific genes and that variation in a trait is due to variation in that gene.  So it would follow that if a gene variant becomes more frequent, the trait variant it codes for becomes more frequent.  We get this idea from Mendel, but it explains only a small fraction of traits; most traits are due to many genes, and this fundamentally changes the specificity of the genotype-phenotype connection.  To complicate this further, a house may be made of bricks, but the bricks can all be swapped for different bricks, and you've still got the same house.  The same holds for genes and traits.

It may sound mystical to suggest that biology is not 'molecular' at its core the way physics and chemistry are.  How can it not be?  Life consists of molecules undergoing biochemical reactions that must follow physical laws.  We think we understand those laws, but if not, and someone were to discover that they needed revising, life would follow whatever revisions to those laws we came up with.  But no one seems to think that currently.  Which is not to imply that our understanding of how life works--how those molecular principles apply--is necessarily correct.

In particular, genomes are molecular entities, and the prevailing theory about life is that genomes and their variation drive life and its properties and variation.  Under a rather shallow interpretation of Darwinism, genomes bearing specific genotypes that succeed by proliferating themselves because of their functional success, and hence their specific sequence details, are represented with increasing frequency over time.

But suppose it is not a genome per se that is especially conserved by evolution.  Suppose the trait, the ephemeral phenotype that is refreshed each generation by new embryos and persistent over time, is really what we need to understand.  A phenotype, an individual, is an 'emergent' result of genotypes that is, at present at least, only very imperfectly predictable from its genotype.  Since we know that similar phenotypes can be generated by a variety of genotypes, individual genes would then be 'only' the meandering spoor left by the process of evolution by phenotype.  Over time a very different set of genotypes might generate a favored phenotype, compared to the genotypes that did so at some time in the past.  This phenomenon is called phenogenetic drift.
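A toy calculation shows how many genotypes can sit behind a single phenotype.  In the hypothetical additive model below (our illustration, with an assumed function name `genotypes_per_phenotype`), each of 10 loci carries 0, 1 or 2 copies of a 'plus' allele and the trait value is simply the total count of plus alleles.  Real genotype-phenotype maps involve interactions and environment and are far messier; this only illustrates the many-to-one mapping that makes phenogenetic drift possible.

```python
from collections import Counter
from itertools import product


def genotypes_per_phenotype(n_loci):
    """Map each trait value to the number of distinct genotypes producing it.

    Hypothetical additive model: each locus carries 0, 1 or 2 copies of a
    'plus' allele, and the trait value is the total number of plus alleles.
    """
    counts = Counter()
    for genotype in product((0, 1, 2), repeat=n_loci):
        counts[sum(genotype)] += 1
    return counts


counts = genotypes_per_phenotype(10)
print(counts[10])  # → 8953 distinct genotypes share the middle trait value
```

Selection acting only on that middle trait value would treat all 8953 of those genotypes as equivalent, so in this toy model a population could wander among them over time without the phenotype changing at all.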

Perhaps biology has hidden behind the Modern Synthesis, and the idea that all the action is in gene frequencies, for too long. Life is ultimately about phenotypes, the result of interactions by large numbers of genes and other molecular factors, and a better theoretical basis for understanding the dual evolution of phenotype and genotype--the tempo and mode of phenogenetic drift--is needed.

Biology struggled for much of the twentieth century to achieve respectability in the pantheon of science, and by mid-century found its 'atoms' in genes.  If atoms are 'it', biology had it made!  But evolution is less specific and determinative than prevailing, if elegant, molecular and evolutionary theory suggest.  Evolution works by phenotypes, whole organisms that reproduce or don't, not genotypes.  A phenotype may be a vague and ephemeral notion that is a poor excuse for an atom, but it may be the basic 'unit' of biology nonetheless, and one we should strive to understand, on its own terms, with the many new methods that now exist.

This is yet another reason to be more circumspect about genomic determinism than many currently are.  It is why the way a particular person's genotype shapes his or her phenotype may be unique to that person.  It's why Darwin's necessary focus on the whole organism rather than just its 'gemmules' was insightful, despite his totally wrong theory of inheritance, and even in an era of rapid and fashionable excitement about the nature of molecules, and specifically genes, as the fundamental physical particles in Nature.

There are probabilistic aspects to gene action, but it may be more important to realize that all genomic functional units are susceptible to variation through mutation, and it is the interaction of countless such units that generates traits, many of whose actual values depend on the environmental factors that interact with DNA and its products.  It is far easier to think of genomes as consisting of beads on a string than as 'mere' contributors to emergent interactions, but the latter is closer to the truth, as countless experiments have by now clearly shown.  But, if there is such a thing as a good theory of 'emergence', we don't yet have it, and clearly don't know how to apply it to the eons of ad hoc events, mutational and selectional, that have generated what is here today, not to mention the same sorts of 'slippage' that intervene between a person's genome, the DNA sequence he or she is born with, and that person's traits.

Phenogenetic drift, which includes the undeniable equivalence among many different genotypes in terms of the trait values with which they are associated, shows why when we use reductionism to dig below the level of the organism or its traits, to its genome, we pass through the very organizational phenomenon we are trying to understand.

Wednesday, December 10, 2014

The "disturbance business"

The Reith Lectures, podcast and broadcast by the BBC (Radio 4), are an annual series of talks initiated in 1948 by the BBC's first director general, John Reith.  Each year an influential thinker is invited to give four separate lectures on a given topic.  This year's lectures are by Atul Gawande, a surgeon, Harvard Medical School professor, and a prolific writer of New Yorker pieces and popular OpEds about medicine.  He is persuasive, readable, and thoughtful.

Gawande's lectures are still being aired, with one still to be given, but we wanted to comment on their spirit, as described by Dr Gawande in an introductory interview in the first podcast.  We think that MT readers who are interested in his important subject, the status of medicine today, would enjoy listening to them, too.

Gawande says that he wants to dig into the complexities of reality, and what we really face, pulling back the veil to see if we can't find a way through these complexities.  As he puts it, he wants to be in the "disturbance business."  He wants to disturb "all of us".  This view is consistent with what he has written in those of his books and essays with which we're familiar.

"Disturbance" is a good term, one that obviously appeals to us, because in a sense we, too, in our very modest way, have been in the disturbance business.  We recently retired from running a molecular genetics lab, though we are still doing at least some research and (hopefully) original writing.  More importantly, for several decades we've been part of the system we write about and we believe we at least have some relevant thoughts about its complexities, and about how the system works and how this relates to its objective, the search for truth itself and its application to human betterment.

To some, now and always, any critique of a current way of doing things is seen as just cheap talk by cranks taking pot-shots without consequences, threatening a comfortable system.  Or maybe these cranks are simply wrong about what is current in the given field.  This is not an unusual reaction, not even in science, which flatteringly believes itself to be 'objective'.  But science these days is a large industry with many interests both intellectually legitimate and materially vested, and any field that becomes well-endowed and institutionalized will have aspects that deserve examination.  Response to this kind of critique is one significant way, other than huge abuse and collapse, that such systems are driven to change--whether in science, politics, economics, religion, or even more abstract academic fields.

Done right, the "disturbance business" is a necessary part of any field, though of course one always has to wonder the extent to which anyone can unrestrainedly see, much less really or openly dig into the problems in his or her own area.  Critics might be outsiders or losers, venting sour grapes or settling one score or another.  This sort of ulterior motive is obviously part of the human story, as groups, viewpoints, and so on vie for resources or attention.  We grow into our careers to become dependent on ways we know and where we have influence and the like, and defensive reaction to challenge is only natural.

At the same time, there must be criticism, in the proper 'evaluative' sense of the term, and it can be done by people without a personal ax to grind.  Critiques rarely have much effect unless they identify legitimate issues that can then resonate with newcomers, supporters, or even established people in any field.  In the case of evolution and genetics, the science is as tribal and polarized as any field of human endeavor that controls a lot of resources can be expected to be.

Atul Gawande's critiques are often softballs, not very assertive or penetrating--he is, after all, a Harvard professor, but that is at least partly a matter of style.  It can be contrasted with, for example, some of the most strident atheism voiced by some evolutionary biologists.  They can be rather abusive in their assertion of the rectitude of their view.

But in genetics and evolution, if you read back into its not very long history, you can see that many of the same issues, perspectives (and vehemence) being debated now were already the subject of heated debate at the inception of these fields.  The fact is that we still don't have a clear handle on genetic variation, ways it evolves, its nature of causal determinism, the role of probability, its status as 'information' that can be used predictively, and so on.  And it is also a fact that very large amounts of money are being spent in pursuing goals with uncertain payoff, but with minimal, often quickly elided, caveats about the problems.

A common argument is that those in the disturbance business ought to just shut up about the problems and show everybody what to do instead, since criticism is cheap and real new answers not so.  It is argued that we're doing our best and that's all we can do.  After all, genetics isn't as costly as nuclear submarines or as dangerous an issue as, say, whether, how, or how legally we indulge in torturing prisoners during warfare.  Genetics and evolutionary biology are garnering large amounts of resources, but not as much as, say, is being spent on Mars or other space missions with essentially no public payoff beyond some jobs and exciting news stories.  So leave the genetics alone!

But that kind of dismissal of critiques is too self-serving.  With limited resources, the issue is not whether this or that investigator gets a big grant and maybe doesn't really find much but maintains a job and jobs for his/her lab staff--and, down the line, for people who make DNA sequencers, and DNA extraction kits, and all the chemicals and everything else required to run a genetics lab.  The issue is that there could be other ways of spending the same resources that could have, at least in the short term, immensely more positive impact than what is being done instead.

We think that evolutionary biology and genetics pose questions every bit as interesting as the origin of the cosmos, and of importance to human edification.  But we also think not enough hard attention is being paid to the many unanswered questions, compared to the attention being paid to the current technologies and careerism.  Just running expensive machines because we've got them, and because that's what we know how to do, isn't really answering the questions or, perhaps more accurately, diverts attention from even asking the right sorts of questions.

So, there is a need for the disturbance business--so long as it can keep from being its own sort of self-perpetuating system.