Genetic Code
From SkepticWiki
Introduction
The genetic code is the set of rules relating codons in messenger RNA to amino acids in proteins during the process of translation.
The reader not familiar with these terms should consult our article on DNA, especially the section on translation; for those readers who merely need a brief reminder, we summarize the process of translation in the next section.
Transcription and translation: a recap
Genetic information is transcribed from DNA to messenger RNA (mRNA), where it takes the form of a single chain of nucleotides, with each nucleotide having a side chain consisting of a base of either adenine (A), cytosine (C), guanine (G) or uracil (U). For the purposes of this article, you may simply think of messenger RNA as consisting of a long string of the letters A, C, G and U. RNA has two distinguishable ends, the 5' end and the 3' end, and the sequence of bases is written from the 5' end to the 3' end of the strand.
These strands of messenger RNA are then translated into polypeptides by particles called ribosomes. A ribosome moves along the strand of mRNA from the 5' end to the 3' end, reading the bases of the nucleotides three at a time, each group of three bases being called a codon. It interprets each codon as an instruction to tack an amino acid onto the polypeptide that it is manufacturing, with the identity of the amino acid (of which there are twenty) being determined by the codon. So, for example, when the ribosome reads the codon CAG, it tacks the amino acid glutamine onto the end of the polypeptide and moves three bases further towards the 3' end of the mRNA. There it may encounter, say, the codon GUA, in which case it tacks on the amino acid valine and again moves three bases further towards the 3' end, and so forth.
The genetic code, then, is the set of rules relating the codons to the amino acids, i.e. specifying that CAG codes for glutamine, GUA codes for valine, and so forth.
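As a minimal sketch of the reading process just described, the following Python splits an mRNA string into codons starting from the 5' end and looks each one up. Only the two assignments named above are included; the function name `read_codons` and the six-base example message are illustrative assumptions.

```python
# Partial code table: only the two assignments named in the text.
PARTIAL_CODE = {"CAG": "glutamine", "GUA": "valine"}

def read_codons(mrna):
    """Split an mRNA string (written 5' to 3') into successive codons."""
    # Step three bases at a time; an incomplete trailing group is dropped.
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]

# Hypothetical six-base message, used purely for illustration.
message = "CAGGUA"
codons = read_codons(message)
amino_acids = [PARTIAL_CODE[c] for c in codons]
print(codons)       # ['CAG', 'GUA']
print(amino_acids)  # ['glutamine', 'valine']
```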
The standard genetic code
To the right we present the standard genetic code as it is normally written in tabular form. To find the translation of some codon of mRNA, written in the 5' → 3' direction (let us say, for example, AUG), take the first base, in this case A, to determine which of the four major rows to look at: here the A tells you that the amino acid lies in the row containing isoleucine, methionine, threonine, asparagine, lysine, serine, and arginine. The second base, in this case U, tells you which column to look at, in this case the first one: so the amino acid is either isoleucine or methionine. Finally, the third base, G, tells you which row-within-a-row to look at: AUG codes for methionine, whereas a final U, C, or A would have indicated isoleucine.
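The same three-step lookup can be sketched in Python. This assumes the conventional base order U, C, A, G for rows and columns, and writes the standard code in one-letter amino acid symbols with `*` for a stop codon; the names `BASES`, `AMINO_ACIDS` and `lookup` are illustrative.

```python
BASES = "UCAG"
# Standard code in one-letter symbols, '*' = stop. The string is ordered
# with the first base varying slowest and the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def lookup(codon):
    """Return the one-letter amino acid (or '*') for a three-base mRNA codon."""
    first, second, third = (BASES.index(b) for b in codon)
    # First base picks the major row (16 entries), second base the column
    # (4 entries within that row), third base the row-within-a-row.
    return AMINO_ACIDS[16 * first + 4 * second + third]

print(lookup("AUG"))                                # M (methionine)
print(lookup("AUU"), lookup("AUC"), lookup("AUA"))  # I I I (isoleucine)
```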
In this table we have chosen to show the relationship between the bases of mRNA and the amino acids; we could instead have shown a table relating the bases of the coding strand of DNA to the amino acids; this would look exactly the same, except that it would have T wherever the table for mRNA has U.
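In code, the relationship between the two tables amounts to a single substitution. A small sketch, with a hypothetical function name and example sequence:

```python
def coding_strand_to_mrna(dna):
    """Rewrite a DNA coding-strand sequence as the corresponding mRNA sequence."""
    # Same 5'-to-3' order; thymine (T) is simply written as uracil (U).
    return dna.upper().replace("T", "U")

print(coding_strand_to_mrna("ATGCAGGTATAA"))  # AUGCAGGUAUAA
```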
The colors in our table indicate the more fundamental chemical properties of the side chains of the amino acids. Those in pink are hydrophobic: they avoid water. The others are hydrophilic: they are attracted to water; of these, we have colored the amino acids with polar side chains blue, those with acidic side chains green, and those with basic side chains purple.
Three of the codons, UAA, UGA, and UAG, don't code for any amino acid, but instead function as stop codons: that is, when the ribosome reaches such a codon on the mRNA, it will take this as an instruction to finish translating. These codons are sometimes called "nonsense codons", but this is really a misnomer: they mean "stop".
The codon AUG, besides coding for the amino acid methionine, also doubles up as a start codon: until the ribosome reaches the first instance of AUG on the mRNA, it doesn't begin polypeptide synthesis.
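Putting the start and stop signals together, translation can be sketched as follows. This follows the simplified description given here (begin at the first AUG, end at the first stop codon in the same reading frame); the function name `translate` and the example message are assumptions for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard code in one-letter symbols, '*' = stop (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is synthesized
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: finish translating
        peptide.append(amino_acid)
    return "".join(peptide)

# Hypothetical message: leader GG, then AUG CAG GUA UAA, then trailer CC.
print(translate("GGAUGCAGGUAUAACC"))  # MQV
```

The output reads methionine, glutamine, valine: the leading GG is skipped, and nothing after UAA is translated.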
Variations on this standard genetic code exist, as will be discussed below: for this reason we have used the term "standard genetic code", rather than "universal genetic code", a phrase you will often see used.
Redundancy
As there are only 20 amino acids to be coded for, plus a stop signal, and there are 64 codons, several codons may code for the same thing, as you can see in the table of the genetic code given above. Indeed, only methionine and tryptophan have a single codon each, while arginine, leucine, and serine have six each.
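These counts can be checked directly by tallying how many codons map to each amino acid. The sketch below writes the standard code in one-letter symbols (`*` for stop); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code in one-letter symbols, '*' = stop (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons assigned to each amino acid (and to the stop signal).
degeneracy = Counter(CODE.values())

unique = sorted(aa for aa, n in degeneracy.items() if n == 1)
sixfold = sorted(aa for aa, n in degeneracy.items() if n == 6)
print(unique)           # ['M', 'W']       methionine, tryptophan
print(sixfold)          # ['L', 'R', 'S']  leucine, arginine, serine
print(degeneracy["*"])  # 3                stop codons
```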
Looking at the table, you can see that the assignment of amino acids to codons is by no means random. Most noticeably, changes in the third base of each codon often have no effect, especially if the change of base is between A and G (the two purine bases of RNA) or between C and U (the two pyrimidines).
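To put a number on this, the sketch below swaps the third base of every codon for the other base of the same chemical type (A with G, C with U) and counts how often the meaning is unchanged. The names `SAME_TYPE`, `silent` and `changed` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard code in one-letter symbols, '*' = stop (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Purine <-> purine and pyrimidine <-> pyrimidine swaps.
SAME_TYPE = {"A": "G", "G": "A", "C": "U", "U": "C"}

# Codons whose meaning survives such a swap in the third position.
silent = [c for c in CODE if CODE[c[:2] + SAME_TYPE[c[2]]] == CODE[c]]
changed = sorted(set(CODE) - set(silent))

print(len(silent), "of", len(CODE))  # 60 of 64
print(changed)                       # ['AUA', 'AUG', 'UGA', 'UGG']
```

The only exceptions are the pairs AUA/AUG (isoleucine versus methionine) and UGA/UGG (stop versus tryptophan).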
You can see from the table above how the various chemical types of amino acid also cluster together. Consider, for example, the codon GUC, coding for valine. There are nine single nucleotide substitutions (mutations changing a single base) that might affect this codon, of which three result in a codon that still codes for valine, and a further five result in a codon that at least codes for a hydrophobic amino acid. The result of such an arrangement of the genetic code is to reduce the potential of mutations to disrupt the functioning of proteins.
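The GUC example can be reproduced by enumerating all nine single-base substitutions. The hydrophobic grouping used below (G, A, V, L, I, M, F, W, P) is an assumption: it is one common classification of nonpolar side chains, chosen here because it matches the count given above, and other schemes draw the boundary differently.

```python
from itertools import product

BASES = "UCAG"
# Standard code in one-letter symbols, '*' = stop (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed set of hydrophobic (nonpolar) amino acids; classifications vary.
HYDROPHOBIC = set("GAVLIMFWP")

def neighbours(codon):
    """All nine codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]

results = [CODE[n] for n in neighbours("GUC")]
still_valine = sum(aa == "V" for aa in results)
other_hydrophobic = sum(aa in HYDROPHOBIC and aa != "V" for aa in results)
print(len(results), still_valine, other_hydrophobic)  # 9 3 5
```

Under this grouping the single remaining substitution, GAC, codes for aspartic acid, which is acidic.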
Variation and evolution of the genetic code
There are, as we have noted, variations in the genetic code. At first it is hard to see how such variations could evolve.
There is no difficulty in seeing how a mutation could occur that would change the genetic code. The molecules that physically instantiate the genetic code by translating mRNA into proteins, such as transfer RNA and aminoacyl tRNA synthetase, are themselves encoded in the DNA of the organism, and a mutation affecting those stretches of DNA can change the genetic code.
What is, however, surprising, is that any organism so affected could survive. Consider: if the genetic code is changed so that, for example, CAC codes for arginine instead of histidine, then this will affect thousands of proteins: in every protein-coding section of the organism's DNA, wherever the codon CAC appears, there will, correspondingly, be a protein which has arginine where previously, ancestrally, there was histidine. It would only take one of these thousands of changes to protein chemistry to be lethal (or, indeed, the combined effect of several of them) and the organism would not be viable.
It would seem incredible, then, that such variations could thrive. And as so often in biology, we find that the Argument from Incredulity is completely, utterly wrong. For the fact is that many such viable variations in the genetic code have been observed arising in laboratory populations.
The best-studied among these variations are the so-called amber suppressor strains of the bacterium E. coli, in which the codon UAG, which is a stop codon in the wild type, codes for an amino acid instead (which particular amino acid varies from strain to strain). One clue as to how they survive this is that the change need not be all-or-nothing: some such variations have only a weak effect, leaving the organism translating UAG as a stop codon most of the time, so that it produces the original ancestral form of the protein most of the time. So there can be intermediate stages between two genetic codes, where there is ambiguity between the two.
So such mutations can certainly arise and be viable; indeed, given the large populations of bacteria such as E. coli, this must happen all the time. Still, it would seem unlikely that such variations would be favored by natural selection, and with E. coli we find that they are not: however often amber suppressor variants arise, the wild type of E. coli sticks with the standard genetic code and translates UAG as a stop codon.
For a new version of the genetic code to become the standard for a taxon, the variation needs to be beneficial, or at least neutral. There is at least one set of circumstances one can envisage, where such a change might be beneficial. Consider the way viruses operate: they cannot translate their own genes into proteins, but rather rely on hijacking the cellular machinery of their host cell to do it for them. The result is that a change to the genetic code of an organism will cause problems for viruses trying to infect it, since the genes of the virus would be adapted to translation using the old genetic code. Under such circumstances, if the increased viral immunity was of greater value to the organism than the probable deterioration in the functioning of its own proteins, then such a variation in the genetic code would be favored by natural selection. Once such a change has taken place, natural selection would then favor adaptation of the genome of the species to the new genetic code.
A list of the known variations in the genetic code can be found here. It is interesting to note that the great majority of variations occur in the codes used by mitochondria and plastids, organelles descended from symbiotic bacteria, which live inside complex cells such as those found in plants and animals. These have few genes of their own, and so might be supposed to be better able to tolerate changes to their genetic code.
Outside of these organelles, almost all the changes involve the recruitment of a stop codon to code for an amino acid, as in the amber suppressor strains described above, or they involve some codon besides AUG also functioning as a start codon. However, in one related group of species of yeast (which is a relatively advanced organism, being a eukaryote) CUG codes for serine instead of leucine[1]; doubtless other examples remain as yet undiscovered.
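A variant code of this kind can be represented as the standard table with a small number of overrides. The sketch below models only the yeast reassignment mentioned above (CUG read as serine); the names `STANDARD`, `YEAST_VARIANT` and `differences` are illustrative, and the variant table is not meant as a complete description of any particular species.

```python
from itertools import product

BASES = "UCAG"
# Standard code in one-letter symbols, '*' = stop (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Variant described in the text: CUG is read as serine (S), not leucine (L).
YEAST_VARIANT = dict(STANDARD, CUG="S")

def differences(a, b):
    """Codons whose meaning differs between two codes, as (a, b) pairs."""
    return {codon: (a[codon], b[codon]) for codon in a if a[codon] != b[codon]}

print(differences(STANDARD, YEAST_VARIANT))  # {'CUG': ('L', 'S')}
```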
The genetic code and common descent
The fact that there is a standard genetic code, with very few variations, that is used by all species, suggests an argument that life had only one common ancestor.
The genetic code, it is argued, is to some extent arbitrary. It is true that the redundancy of the code has convenient features, as we have noted above, but there would be many other ways of selecting a code that also had these useful properties.
So, it is argued, if creatures now living are not descended from a common ancestor, then the same genetic code must have originated twice. Since the code is arbitrary, this could not be a result of convergent evolution, but would also have to involve a massive coincidence. The more probable explanation, then, is common descent from a single common ancestor.
The minor variations in the genetic code from species to species do not particularly invalidate this argument so long as it is possible for such variations to evolve (which they can, as we have seen in the previous section). For the massive similarities that remain after we have acknowledged the existence of these minor variations are still too great to be reasonably accounted for by coincidence alone.
We should note that this argument for a common ancestor is only relevant to people who are already convinced, on other grounds, that life evolved from (in Darwin's words) "a few forms or one", and wish to know whether it was one form or a few: it is an argument from and not for evolution.

