The Changing Face of Pseudogenes

Pseudogenes are DNA sequences that resemble protein-coding genes but are not transcribed into messenger RNA (mRNA) in a way that could then be translated into a functional protein. Many have suggested that pseudogenes are simply molecular fossils that illustrate and provide evidence for evolutionary history. Implicit in this argument is the assumption that pseudogenes are genetic relics that have lost the original protein-coding function possessed by some ancestral creature. In support of this argument, evolutionary scientists point to the fact that pseudogenes are scattered throughout the genomes of all higher species (animals and plants) and, in particular, that many similar pseudogenes are found in all primates.

For example, identical beta-globin pseudogenes have been found in both humans and chimpanzees, and Professor Kenneth Miller uses this fact in his book "Only a Theory" as an argument for the common ancestry of the two species [1]. In his own words: “Hemoglobin is the oxygen-carrying protein that makes blood red. A single molecule of hemoglobin consists of two copies of a molecule called alpha-globin and two of another called beta-globin. The genes for beta-globin, of which there are five functional copies, are found on human chromosome 16. Right in the middle of this hardworking group of genes, however, is a broken one, which geneticists call a pseudogene. Its DNA base sequence is nearly identical to that of its neighbors, so it's easy to recognize it as beta-globin, but it contains a series of errors in its sequence that keeps it from working. One of them prevents the gene from ever being copied into RNA (an essential step for a gene to work), one would prevent any RNA that did get made from directing the synthesis of a protein, and the remaining four would completely disrupt any protein that somehow managed to get produced anyway … Our genome was originally designed with six working copies of the beta-globin genes, and therefore the loss of one of them from such molecular errors is no big deal. Therefore all humans alive today are descended from individuals in which those mistakes first cropped up. Fair enough. But guess what: We're not the only organisms with a set of beta-globin genes, and we’re not the only ones with a pseudogene right in the middle of the set … Gorillas and chimpanzees have them too and they are arranged in exactly the same way … [with] exactly the same set of molecular errors.” [Only a Theory, pages 101-102].

Biologists have identified two distinct types of pseudogene, which are often described as "processed" and "unprocessed". As a general rule, processed pseudogenes are located on different chromosomes from their corresponding functional protein-coding gene. Most biologists believe that they were created by the retro-transposition of mRNA transcripts from the functional parent gene. The evidence for this is the fact that processed pseudogenes lack introns. Introns are sequences that are scattered throughout the DNA of a protein-coding gene and are transcribed into the initial (precursor) mRNA. Unlike exons, however, introns are edited out of the mRNA by specific enzymes before protein translation begins. The function of introns remains unclear, although there is increasing evidence that they contain vital information and are involved in parent gene regulation. Remarkably but very appropriately, this has been shown to be the case for the beta-globin gene [2].

Processed pseudogenes also lack the regulatory sequences that control the expression of protein-coding genes. These include promoters and the specific sequences (such as enhancers and silencers) to which activating and inhibitory proteins bind. These regulatory sequences are usually found “upstream” of the protein-coding gene (before the start sequence). Processed pseudogenes also have poly-adenine tails, which are characteristic of the terminal end of a messenger RNA. In addition, the pseudogenes are usually flanked by repeat sequences of DNA, which are characteristic of mobile genomic elements.

All of this evidence strongly suggests that the processed pseudogene is derived from an mRNA that has been reverse transcribed back into DNA and inserted at a new location in the genome. This mechanism is somewhat similar to the incorporation of viral RNA into the host genome at specific sites that are also characterised by regions of repetitive DNA. A retrovirus (e.g. HIV) carries its own reverse transcriptase enzyme to override the host’s own genetic machinery.

Processed pseudogenes may be complete or incomplete copies (i.e. fragments). Unprocessed pseudogenes, however, are usually found in close proximity to their corresponding protein-coding gene, generally on the same chromosome. As a general rule, and unlike processed pseudogenes, they possess introns and the associated upstream regulatory sequences. Nevertheless, it is believed that the expression of these “genes” is prevented by point mutations, deletions and/or insertions that disrupt the “correct” nucleotide sequence. These genetic changes may lead to premature termination or may introduce “frameshifts” that render the message apparently meaningless.
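The effect of a frameshift can be illustrated with a short Python sketch. It is only a toy model: the 36-base sequence and the names `translate`, `intact` and `mutant` are illustrative assumptions, and real transcripts are spliced and regulated in ways this ignores. The sketch translates a coding sequence codon by codon using the standard genetic code, then deletes a single base to show how the reading frame shifts and a premature stop codon appears.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding strand from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation terminates here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy open reading frame (assumed for illustration): 11 codons plus a stop codon.
intact = "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCTAA"

# Delete the fourth base; every downstream codon is now read out of frame.
mutant = intact[:3] + intact[4:]

print(translate(intact))  # MVHLTPEEKSA
print(translate(mutant))  # MCT (the shifted frame runs into a TGA stop codon)
```

In this example the single deletion changes the second and third amino acids and truncates the product after three residues, which is the kind of "apparently meaningless" message described above.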

It is suggested that unprocessed pseudogenes might arise by gene duplication. Inevitably, the duplicated gene would be in close proximity to the parent but would then be free to accumulate random mutations without actually harming the organism, as it would still possess the original functional copy. The beta-globin pseudogene mentioned by Miller could be an example of an unprocessed pseudogene: it has been suggested that it was initially produced by the duplication of the gamma-A-globin gene because of the high degree of homology (sequence similarity) between the two genetic sequences [3]. In primates, the beta-globin pseudogene has no start codon (AUG) and contains several stop codons. Thus it is assumed that no translatable mRNA is produced and no protein made. This scenario, however, may not be the complete picture. For example, the goat embryo apparently retains a beta-globin pseudogene which has been shown to be functional in vitro [4].
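The "degree of homology" between a gene and a suspected duplicate is commonly summarised as percent identity. The following Python sketch shows one simple way to compute it, assuming the two sequences have already been aligned to equal length with "-" marking gaps. The function name `percent_identity` and the two short sequences are hypothetical; they are not taken from the globin study cited above, which would have relied on a proper alignment of much longer sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical bases between two pre-aligned sequences.

    Positions where either sequence has a gap ("-") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: one substitution and one gap.
gene_fragment = "ATGGTGCACC"
pseudogene_fragment = "ATGGTACA-C"

print(round(percent_identity(gene_fragment, pseudogene_fragment), 1))  # 88.9
```

Skipping gapped columns is one convention among several; counting gaps as mismatches would give a lower figure for the same pair.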

Are some pseudogenes functional?

Conservation of similar genetic sequences between species may indicate that pseudogenes (or any other non-protein coding sequences) possess an important biological function, even though we may not yet know what that function is. For example, Miller is very keen to point out the similarity between the beta-globin pseudogenes in primates when he says:

“Gorillas and chimpanzees have them too and they are arranged in exactly the same way … [with] exactly the same set of molecular errors.” [Only a Theory, pages 101-102 with emphasis added].

This is a bit like “shooting yourself in the foot”! The very fact that the beta-globin pseudogene appears to be conserved in humans, chimpanzees and gorillas speaks eloquently of the fact that this DNA has some important biological function. Genetic sequences are conserved and maintained when any mutation would render them non-functional (or less functional) and when any loss of activity would damage the organism’s prospects of survival. Such sequences are said to be under purifying (or stabilising) selection, which means that deleterious mutations are removed from the gene pool, restricting genetic diversity. It is probably the most common mechanism of action for natural selection and leads to the maintenance of genetic integrity. It is certainly not the driving force behind evolutionary change.

According to the recent review by Sasidharan and Gerstein: “Although pseudogenes have generally been considered as evolutionary 'dead-ends', a large proportion of these sequences seem to be under some form of purifying selection - whereby natural selection eliminates deleterious mutations from the population - and genetic elements under selection have some use” [5].

In the case of the beta-globin pseudogene, Wanapirak et al. have reported amazing conservation in the fine structure of the DNA with identical super-helical twists in the human, mouse, bovine, rabbit and chicken genomes [6]. It needs to be remembered that maintenance of the genetic integrity of these structures is biochemically costly. It takes energy to duplicate DNA. The replicating machinery in the cell has built-in proof reading and excising enzymes that constantly check for mutation and damage. Numerous repair mechanisms have been identified to correct genetic damage and to excise incorrect sequences [7]. Thus, genetic conservation is one clear indication that pseudogenes may have important roles but what other evidence is there?

Balakirev and Ayala have published two recent reviews (although one is in Russian) on the potential functions of pseudogenes [8, 9]. In these articles, they describe examples of pseudogenes that are involved in gene expression, gene regulation, and the generation of genetic (antibody, antigenic, and other) diversity. In their own words: “Pseudogenes are involved in gene conversion or recombination with functional genes. Pseudogenes exhibit evolutionary conservation of gene sequence, reduced nucleotide variability, excess synonymous over non-synonymous nucleotide polymorphism, and other features that are expected in genes or DNA sequences that have functional roles.”
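The distinction between synonymous and non-synonymous differences mentioned in the quotation can be made concrete with a small Python sketch. It compares two aligned, in-frame coding sequences codon by codon and counts how many differing codons leave the amino acid unchanged (synonymous) and how many alter it (non-synonymous). This is a crude per-codon tally under stated assumptions, not the normalised rate comparison used in published analyses; the function name `classify_codon_changes` and the 12-base sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(reference, variant):
    """Return (synonymous, non_synonymous) counts of differing codons.

    Both sequences must be in frame, ungapped and of equal length.
    """
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must be equal-length multiples of three")
    synonymous = 0
    non_synonymous = 0
    for i in range(0, len(reference), 3):
        ref_codon = reference[i:i + 3]
        var_codon = variant[i:i + 3]
        if ref_codon == var_codon:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[var_codon]:
            synonymous += 1      # same amino acid, so the protein is unchanged
        else:
            non_synonymous += 1  # amino acid (or stop signal) changed
    return synonymous, non_synonymous


# Hypothetical sequences: GTG->GTT keeps valine, CTG->CCG changes leucine to proline.
reference = "ATGGTGCACCTG"
variant = "ATGGTTCACCCG"

print(classify_codon_changes(reference, variant))  # (1, 1)
```

An excess of synonymous over non-synonymous differences across many sites is the pattern the reviewers describe as expected for sequences whose encoded product is being preserved.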

Scientific publications describing functional pseudogenes are now appearing at very regular intervals. For example, in 2007, Lin et al. described the activity of one embryonic stem (ES) cell gene that appears to be a pseudogene in the adult mouse [10]. They reported that ES cell-specific expression of Oct4 maintains the pluripotency and versatility of stem cells until they begin to differentiate into other tissues. They concluded that Oct4 may be a functional pseudogene with a unique and specific role in ES cells. It is important to note that all cells will carry this particular gene in their DNA but the gene will be permanently switched off (and not expressed) in anything other than pluripotent ES cells. Maybe the once functional gene in ES cells has become a pseudogene in differentiated tissue.

Another piece of evidence suggesting that pseudogenes might be functional at specific times in life comes from the work of Choi et al. This research group extracted RNA from the ovarian tissues of 23 pre-menopausal patients and measured cyclin D2 and pseudogene cyclin D2 mRNA production in vitro. They showed that the expression of mRNA from the gene and its pseudogene correlated with the age of the subject. In particular, they concluded that the expression of the pseudogene in the human ovary increases with age and suggested that this expression might be used as a novel marker for decreased ovarian function associated with the aging process [11].

One of the great surprises of the large-scale study of 1% of the human genome (ENCODE) was the remarkable and unexpected finding that vast regions of non-coding DNA (formerly known as junk) were transcribed into RNA [12]. This included a significant number of pseudogenes [13].

One function suggested for pseudogene-derived RNA is parent gene regulation by interference. For example, Oliver Tam and his colleagues have very recently demonstrated that a subset of pseudogenes could generate endogenous small interfering RNAs (endo-siRNAs) in mouse oocytes (eggs) [14]. These endo-siRNAs are often processed from double-stranded RNAs that could be formed by the hybridisation of the usual mRNA transcript from a protein-coding gene to RNA transcribed from the complementary anti-sense DNA sequence of the pseudogene. Another role of regulatory RNA is to target mRNA for degradation and also to control the movement of mobile genetic elements, which are commonly known as transposons [15]. No doubt there will be many more published examples of pseudogene functionality in the days to come.
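The hybridisation step described above depends on base complementarity: an anti-sense transcript can pair with an mRNA when it is the reverse complement of the mRNA sequence. The Python sketch below illustrates that check for perfect Watson-Crick pairing only. The function names and the nine-base sequence are hypothetical, and the sketch ignores G-U wobble pairs, partial matches and everything else about how endo-siRNAs are actually processed.

```python
# Watson-Crick pairing for RNA: A-U and G-C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence, read 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def can_form_duplex(sense, antisense):
    """True if the two strands are perfectly complementary along their full length."""
    return antisense.upper() == reverse_complement(sense)


# Hypothetical sense fragment from a gene's mRNA.
sense = "AUGGUGCAC"
antisense = reverse_complement(sense)

print(antisense)                             # GUGCACCAU
print(can_form_duplex(sense, antisense))     # True
print(can_form_duplex(sense, "GUGCACCAA"))   # False: one mismatch at the last base
```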


The non-protein coding genome was once described by the now redundant term “Junk DNA”. Nevertheless, it is becoming increasingly apparent that non-protein coding DNA, including the pseudogenes, may perform important biological roles. Thus, it has been somewhat premature to suggest that pseudogenes are simply genetic fossils. This is not to say, however, that there will never be an example of a pseudogene that is a defunct copy of a protein-coding gene which has lost its activity due to mutational damage. Eventually, it may be necessary to redefine the term “pseudogene” to distinguish those genes that are truly broken from those genomic elements that possess important roles in gene regulation.


1. Miller, K.R., Only a Theory: Evolution and the Battle for America's Soul. 2008, New York: Viking, 244 pages.

2. Zhang, J., et al., Intron function in the nonsense-mediated decay of beta-globin mRNA: indications that pre-mRNA splicing in the nucleus can influence mRNA translation in the cytoplasm. RNA 1998 4:801-15.

3. Harris, S., et al., The primate psi beta 1 gene. An ancient beta-globin pseudogene. J Mol Biol 1984 180:785-801.

4. Shapiro, S.G. and J.B. Lingrel, Identification of a recently evolved goat embryonic beta-globin pseudogene which retains transcriptional activity in vitro. Mol Cell Biol 1984 4:2120-7.

5. Sasidharan, R. and M. Gerstein, Genomics: protein fossils live on as RNA. Nature 2008 453:729-31.

6. Wanapirak, C., et al., Conservation of DNA bend sites with identical superhelical twists among the human, mouse, bovine, rabbit and chicken beta-globin genes. DNA Res 2000 7:253-9.

7. Altieri, F., et al., DNA damage and repair: from molecular mechanisms to health implications. Antioxid Redox Signal 2008 10:891-937.

8. Balakirev, E.S. and F.J. Ayala, Pseudogenes: are they "junk" or functional DNA? Annu Rev Genet 2003 37:123-51.

9. Balakirev, E.S. and F.J. Ayala, [Pseudogenes: structure conservation, expression, and functions]. Zh Obshch Biol 2004 65:306-21.

10. Lin, H., et al., Stem cell regulatory function mediated by expression of a novel mouse Oct4 pseudogene. Biochem Biophys Res Commun 2007 355:111-6.

11. Choi, D., et al., The expression of pseudogene cyclin D2 mRNA in the human ovary may be a novel marker for decreased ovarian function associated with the aging process. J Assist Reprod Genet 2001 18:110-3.

12. Birney, E., et al., Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007 447:799-816.

13. Zheng, D., et al., Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res 2007 17:839-51.

14. Tam, O.H., et al., Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 2008 453:534-8.

15. Durand-Dubief, M., et al., The Argonaute protein TbAGO1 contributes to large and mini-chromosome segregation and is required for control of RIME retroposons and RHS pseudogene-associated transcripts. Mol Biochem Parasitol 2007 156:144-53.