Population Genetics
From SkepticWiki
Introduction
Population genetics, as the name suggests, is the study of the genetics of populations, and of how we should expect the genetic makeup of populations to change in response to natural selection and genetic drift. Its methods include the construction of mathematical models or computer models designed to reflect the operation of Mendelian genetics; and the accumulation of quantitative data about the genetics of real populations.
In this article we shall outline the sort of methods required for the study of population genetics. For the results of applying these methods, we refer the reader to the articles linked to below.
If you are not familiar with the concepts and terminology of Mendelian genetics, we suggest that you first read our article on Mendelian Genetics; otherwise, you will find the rest of this article bafflingly incomprehensible.
Methods: Mathematics and computation
Rather than attempt to describe the methods of population genetics in the abstract, we shall show how they apply to a simple example.
Suppose we have two alleles of a gene, A and B, such that the relative fitness of the genotypes AA, AB and BB is in the ratio 12 : 10 : 11.
This notion of relative fitness is a useful one, and we should explain it. When we say, for example, that the relative fitness of the genotypes AA and AB is in the ratio 12 : 10, what we mean is that the genetic contribution of an average AA zygote (newly fertilized cell) to the next generation, and the genetic contribution of an average AB zygote, are in the ratio 12 : 10.
So, for example, if the average AB zygote grows up to produce 1.5 offspring (remember that this is an average!) then the average AA zygote will grow up to be the parent of 1.8 children, since the ratio 1.8 : 1.5 is the same as 12 : 10.
In this case, we have chosen the relative fitnesses of AA, AB and BB to be in the ratio 12 : 10 : 11. If we found such a case in nature, this would be rather surprising, since it models a case where the B allele is harmful, but where a double dose of the B allele is less harmful than a single copy of the allele. The unlikelihood of such a case does not prevent us from finding out what would happen should it ever arise.
One method of understanding the situation is to use our common sense to perform a qualitative analysis. When, in our example, the proportion of A alleles is small, an A allele is likely to find itself in an AB genotype, whereas a B allele is more likely to find itself in a superior BB genotype, so natural selection would favor B alleles. If, on the other hand, the proportion of A alleles is large, then a B allele is most likely to find itself part of an AB genotype, whereas an A allele is most likely to find itself in a superior AA genotype. So we should expect that when the proportion of A alleles in the gene pool is large, natural selection will tend to increase the proportion of A, whereas when the proportion of A is small, natural selection will tend to decrease it.
For some purposes, this level of insight is sufficient, but we may wish for a more quantitative result: for example, we might want to know just how "large" the proportion of the A allele has to be in the gene pool before natural selection will favor it.
There are various ways that we can tackle such questions. One is to use a computer to simulate a population of these creatures mating and reproducing at random, with the odds of their survival and reproduction reflecting the relative fitnesses, and with reproduction reflecting Mendel's laws. In more sophisticated simulations, involving more than one gene, we might go further and simulate the existence of chromosomes, recombination, and linkage.
Such programs are fairly easy to write for anyone with a basic knowledge of computer programming. They have one advantage over some of the more rigorous mathematical methods in use, in that they incorporate genetic drift (the random element in changes in the gene pool).
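As an illustration, here is a minimal sketch of such a simulation for the 12 : 10 : 11 example. It is one possible design rather than a definitive one: the names `FITNESS`, `gamete`, `next_generation` and `simulate` are our own, the population size is held constant, generations do not overlap, and selection is modeled by choosing parents with probability proportional to their relative fitness (which, as a simplification, allows an individual to be drawn as both parents).

```python
import random

# Relative fitnesses from the running example, keyed by the number of
# A alleles in the genotype: AA = 2, AB = 1, BB = 0.
FITNESS = {2: 12, 1: 10, 0: 11}


def gamete(genotype, rng):
    """Return 1 if the parent passes on an A allele, 0 if it passes on B."""
    if genotype == 1:              # heterozygote: either allele, equal odds
        return rng.randrange(2)
    return genotype // 2           # AA always passes A, BB always passes B


def next_generation(population, rng):
    """Draw parents in proportion to fitness; each contributes one gamete."""
    n = len(population)
    weights = [FITNESS[g] for g in population]
    mothers = rng.choices(population, weights=weights, k=n)
    fathers = rng.choices(population, weights=weights, k=n)
    return [gamete(m, rng) + gamete(f, rng) for m, f in zip(mothers, fathers)]


def simulate(size, a0, generations, seed=None):
    """Return the proportion of A in each generation, starting near a0."""
    rng = random.Random(seed)
    # Build the first generation by random union of gametes (assumption).
    population = [(rng.random() < a0) + (rng.random() < a0) for _ in range(size)]
    history = [sum(population) / (2 * size)]
    for _ in range(generations):
        population = next_generation(population, rng)
        history.append(sum(population) / (2 * size))
    return history
```

A call such as `simulate(500, 0.5, 100, seed=1)` returns one possible history of the proportion of A; a different seed gives a different history, which is the effect of genetic drift.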
However, such programs also have disadvantages. It may not be practical, for considerations of time and computer memory, to simulate populations of a realistic size: no computer could accurately simulate the E. coli population of a single human gut. Also, such simulations usually need to be run again and again in order to build up a picture of the typical, average way in which the gene pool will change: in that sense, the fact that such models incorporate genetic drift is a weakness as well as a strength. Finally, using such a model may tell you what will happen to a population under certain circumstances, but in doing so it gives no insight into why the population should behave in such a way; to find that out it may be necessary to do the mathematics.
Here is how we may do so in our hypothetical case. Remember that the relative fitness of the genotypes AA, AB and BB is in the ratio 12 : 10 : 11. Let the proportion of the A allele in the gene pool be $a$, and the proportion of the B allele be $b$, at the point where the population is about to reproduce. Assume that mating is random, rather than, for example, showing a tendency for AA types to seek out other AA types as mates. Then using Mendel's laws, we find that the AA, AB and BB zygotes will be in the ratio $a^2 : 2ab : b^2$.
Before they can reproduce, they must undergo selection as specified by the relative fitnesses. So by the time they reproduce, which is the point in the life cycle at which we started, the genotypes AA, AB and BB will be in the ratio $12a^2 : 20ab : 11b^2$.
We can express these as proportions of the population by dividing through by the sum of these figures. That is, the proportion of individuals with the AA genotype will be $12a^2/(12a^2+20ab+11b^2)$; the proportion with the AB genotype will be $20ab/(12a^2+20ab+11b^2)$; and the proportion with the BB genotype will be $11b^2/(12a^2+20ab+11b^2)$.
We can now easily find the proportion of allele A in the gene pool, since this will just be the proportion of AA genotypes plus half the proportion of AB genotypes, i.e. $(12a^2+10ab)/(12a^2+20ab+11b^2)$.
As every allele is either A or B, it follows that we can write $1 - a$ for $b$ in this expression. So we may write the equation:
- New value for $a$ = $\dfrac{12a^2+10a(1-a)}{12a^2+20a(1-a)+11(1-a)^2}$.
This equation allows us to take the value of a at the point of reproduction, and find the value that a will have exactly one generation later. Clearly we can iterate this process to find out the value of a in subsequent generations.
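The iteration can be sketched in a few lines of Python. The function names `next_a` and `iterate` are our own, and the fitness values are those of the running example; this is an illustration of the procedure, not a general-purpose tool.

```python
def next_a(a):
    """Proportion of A one generation later, for fitnesses 12 : 10 : 11."""
    b = 1.0 - a
    w_aa, w_ab, w_bb = 12.0, 10.0, 11.0
    # Genotype ratios after selection: 12a^2 : 20ab : 11b^2
    total = w_aa * a * a + 2.0 * w_ab * a * b + w_bb * b * b
    # All of the AA share plus half of the AB share carries allele A.
    return (w_aa * a * a + w_ab * a * b) / total


def iterate(a0, generations):
    """Return the values of a for generations 0, 1, ..., generations."""
    trajectory = [a0]
    for _ in range(generations):
        trajectory.append(next_a(trajectory[-1]))
    return trajectory
```

For example, `iterate(0.5, 50)` returns the starting value followed by the next fifty values of a.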
There are a couple of things we can do with this equation. One is to use the mathematics associated with difference equations to try and come up with some explicit function which takes as arguments $a_0$ (the starting value of $a$) and $n$ (the number of generations that have passed); and which returns $a_n$ (the value of $a$ after $n$ generations have passed).
The other thing we can do with it is, as we have already suggested, to find out what would happen by starting off with $a = a_0$ and iterating the equation given above. Getting a computer program to do this is trivial; and doing it with paper, a pencil, and a calculator, though laborious, is not actually difficult. The graph to the right shows the results of this procedure with various different starting values of $a$.
These methods have one obvious disadvantage. They totally ignore the effects of genetic drift, behaving instead as though the statistical laws that govern genetics are deterministic. Another way of looking at this is to say that it corresponds to the situation where the population is infinitely large! For the Law of Large Numbers guarantees that the larger the population, the more closely we should expect the results obtained to be in line with Mendel's statistical laws. By the same token, we should expect the deterministic equations to be reasonably close to reality whenever the population is reasonably large.
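Within this deterministic model, we can also answer the question raised earlier of how large the proportion of A must be before selection favors it. The following working is a sketch; the symbol $a'$ for the new value of $a$ is our own notation. Expanding the equation for the new value of $a$ and subtracting $a$ gives the change per generation:

```latex
\begin{align*}
a' &= \frac{12a^2 + 10a(1-a)}{12a^2 + 20a(1-a) + 11(1-a)^2}
    = \frac{2a^2 + 10a}{3a^2 - 2a + 11}, \\[4pt]
a' - a &= \frac{2a^2 + 10a - a\,(3a^2 - 2a + 11)}{3a^2 - 2a + 11}
        = \frac{a\,(1-a)\,(3a-1)}{3a^2 - 2a + 11}.
\end{align*}
```

The denominator is positive for every $a$, so the sign of the change is the sign of $3a - 1$ whenever $0 < a < 1$. The proportion of A therefore rises when $a > 1/3$, falls when $a < 1/3$, and stays put at $a = 0$, $a = 1/3$ and $a = 1$.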
However, we should add a caveat. Consider the very simple case in which a gene has two alleles, V and W, such that neither has an advantage over the other (so that the variation is said to be neutral); and suppose that the W allele makes up 2% of the genes present in the gene pool, with the other 98% being V.
What will happen? Well, according to the model with an infinite population, the proportion of W alleles will stay at 2% for ever, since there is no selective pressure to change this. In the real world, however, where there is genetic drift, it can be shown (as we show in our article on Genetic Drift) that the W allele has a 2% chance of becoming the only allele of the gene present in the population, driving the V allele extinct, and a 98% chance of going extinct itself, leaving the V allele to hold the field.
Note that the infinite population model gives us the mean average of what we should see in the real world (since 2% = 2% × 100% + 98% × 0%); but it represents an outcome which itself can never ever happen in reality, since in real life one or the other allele must inevitably go extinct in the end.
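The 2% figure can be checked roughly by simulation. The sketch below is an illustration under assumptions of our own: it tracks a fixed number of gene copies rather than diploid individuals, forms each generation by drawing that many copies at random (with replacement) from the previous one, and uses a small gene pool of 50 copies with a single W copy so that W starts at 2%. The function names are hypothetical.

```python
import random


def w_fixes(copies, w_count, rng):
    """Run one neutral population until V or W is lost; True if W remains."""
    while 0 < w_count < copies:
        p = w_count / copies
        # Each copy in the next generation is W with probability p.
        w_count = sum(rng.random() < p for _ in range(copies))
    return w_count == copies


def estimate_fixation(copies=50, w_count=1, trials=2000, seed=0):
    """Estimate the chance that W eventually replaces V entirely."""
    rng = random.Random(seed)
    fixed = sum(w_fixes(copies, w_count, rng) for _ in range(trials))
    return fixed / trials
```

With W starting at 2% of the gene pool, `estimate_fixation()` should return a value close to 0.02, though the exact figure depends on the seed and the number of trials.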
So before using an infinite population model, we need to consider whether we care about the effects of genetic drift, and how it will affect our model if we overlook them.
This brings us on to another method. The mathematics of probability is, after all, well understood. Why not set up a set of equations that take into account the probabilistic nature of evolution?
We shall use such reasoning in our analysis of neutral variation and genetic drift. When it comes to tackling more difficult questions, the formulae (which, by the way, bear a sort of family resemblance to the physical formulae describing the diffusion of gases) become difficult to work with.
In general, even if we overlook the probabilistic element and go back to the infinite population model, in complicated cases involving several alleles of several genes, it is one thing to write down the equations describing how the population changes over time, and another thing to solve those equations. In such cases, we can fall back on the other two methods we have mentioned: simulating a population, or iterating difference equations.
So far, we have only mentioned reasoning from the relative fitness of genotypes to the behavior of the gene pool. It is possible to reverse this reasoning. For example, suppose we have a gene with two alleles, A and B, and we know that the genotype AA has a selective advantage over BB. Suppose further that natural selection does not remove the B allele from the gene pool. Then we can deduce that AB must have a selective advantage over AA, since otherwise the B allele would be flushed out of the gene pool. Furthermore, if we know the proportion of B alleles in the gene pool, we can use this to calculate the relative fitness of AB with respect to AA and BB: the relevant mathematics will be given in the article on Heterozygote Advantage.
Methods: Natural History
Use of these mathematical techniques can give us theoretical insight even when we lack good data: it is possible, for example, to show what would happen in the case of heterozygote advantage without knowing whether or not such a thing exists in nature.
However, in order to get quantitative predictions out of our equations, we need to put accurate data in. For example, in pigeons there are two alleles of the gene that makes the protein transferrin, called $Tf^A$ and $Tf^B$. When pigeon eggs are exposed to microbial infections, 46% of the $Tf^A Tf^A$ eggs survive, as against 64% of $Tf^A Tf^B$ eggs and 52% of $Tf^B Tf^B$ eggs.
We might wish to know whether these alleles have any effect on the fitness of adults. We could reason as follows. Suppose that these alleles did not have any effect on adult fitness. Then we would know the exact relative fitnesses of the genotypes $Tf^A Tf^A$, $Tf^A Tf^B$, and $Tf^B Tf^B$, which must be in the ratio 46 : 64 : 52, the survival rates per hundred of the eggs. We could then use the equations developed in the article on heterozygote advantage to calculate what the frequencies of $Tf^A$ and $Tf^B$ should be in the pigeon population.
And then we could compare this figure with reality, by taking genetic samples from a statistically representative number of pigeons, and seeing if it matches our prediction. If it does, then this is consistent with the hypothesis that the effect of the variation in the gene is mainly on the eggs; but if, for example, we found that there was much more $Tf^A$ and much less $Tf^B$ in the gene pool than predicted, then we could deduce that in some way the $Tf^B$ allele imposes a selective disadvantage on pigeons once they've hatched, and we could then go back into the field to find out in what way this allele is a handicap.
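To make the prediction concrete, here is a small sketch of the calculation. It assumes that the equations in question reduce to the standard infinite-population result for one gene with two alleles and heterozygote advantage, in which the stable frequency of the first allele is (w_AB - w_BB) / (2 w_AB - w_AA - w_BB). The function name `equilibrium_frequency` is our own, and the fitnesses are taken as whole numbers so that the answer is an exact fraction.

```python
from fractions import Fraction


def equilibrium_frequency(w_aa, w_ab, w_bb):
    """Stable frequency of the first allele under heterozygote advantage.

    Arguments are integer relative fitnesses of the two homozygotes and
    the heterozygote; the heterozygote must be the fittest genotype.
    """
    if not (w_ab > w_aa and w_ab > w_bb):
        raise ValueError("no stable mixture without heterozygote advantage")
    return Fraction(w_ab - w_bb, 2 * w_ab - w_aa - w_bb)


# Egg survival rates per hundred, used as relative fitnesses (assumes the
# alleles have no effect on adult fitness, as in the text).
freq_a = equilibrium_frequency(46, 64, 52)
print(freq_a, 1 - freq_a)  # prints: 2/5 3/5
```

Under these assumptions the predicted gene pool is 40% of the first allele and 60% of the second; it is this figure that would be compared with the sampled frequencies.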
It is clear that this reasoning would be futile in the absence of accurate data, and that much of the work in population biology must involve the collection of such data. The exact methods used to accumulate data will of course depend on what it is we need to measure.
Articles on population genetics
The SkepticWiki contains a number of articles on subjects in population genetics.

