ENCODE Excitement

Friday, 07 September 2012 09:54

If a week is a long time in politics, five years is an eternity in molecular biology. On 6th July 2007, we posted a news blog on the TiS website that was entitled “Regulatory DNA: Junk no Longer”.

The article introduced our readers to the ENCODE (Encyclopedia Of DNA Elements) project, which was established in order to estimate how much of the genome is actually functional. For many years, we have known about protein-coding genes. These are the regions within our DNA that hold information that is translated into protein structure. This complex process is described in detail in chapter three of the TiS publication “Origins: Examining the Evidence”. This book is available here. However, protein-coding genes comprise at most 2% of total genomic DNA. The question for molecular biologists was whether the rest was junk without any function whatsoever.

Because of the complexity of this project, the initial research of the ENCODE project concentrated on just 1% of the total DNA. The findings were published in the journal Nature in June 2007. This article is still freely available here. As we described in our blog at the time, there were surprising findings. Firstly, it was discovered that the vast majority of the DNA in this 1% was transcribed into RNA. This seemed to indicate that even non-protein-coding DNA was functional. The second surprising finding was the lack of conservation between species when the human non-protein-coding regions were compared with those of 22 other mammals. These data were published in Genome Research. The paper is freely available here. We concluded this blog as follows:

A picture is gradually emerging of levels of encrypted genomic information far more complex than that originally considered. Maybe the term “junk DNA” is now nothing more than junk itself. We will keep you posted.

Yes, five years is an eternity in molecular biology, but this week 30 different academic papers have been published in the journals Nature, Genome Research and Genome Biology. These papers more than justify the predictions made in 2007. The latest estimate of functionality is in the region of 80% of the genome. This may still be an underestimate; only time will tell. Many of these papers are extremely technical, but there are some excellent overview articles. One entitled "Genomics: ENCODE explained" can be found here. In particular, we encourage you to look at the Nature video entitled “Voices of Encode”. In this video, ENCODE's lead coordinator, Ewan Birney, and Nature editor Magdalena Skipper talk about what they have learnt about the human genome. You can view this video here.

The implications of ENCODE are profound. We now know that the genome contains encrypted information of such complexity that it is virtually impossible to comprehend. The vast majority of this information is involved in the highly ordered processing and expression of those few percent of protein-coding genes, which is both species-specific and cell-specific within each species. As cells multiply and differentiate into different tissues, genes are rapidly turned off and turned on in a process that is obviously under very tight control. We still know very little of these highly complex mechanisms, but what we do know is that many diseases are caused by disruption of these highly ordered processes. It is this finding that excites many of the researchers. As Barroso has stated in the article entitled "Genomics: ENCODE explained":

The ENCODE project provides a detailed map of additional functional non-coding units in the human genome, including some that have cell-type-specific activity. In fact, the catalogue contains many more functional non-coding regions than genes. These data show that results are typically enriched for variants that lie within such non-coding functional units, sometimes in a cell-type-specific manner that is consistent with certain traits, suggesting that many of these regions could be causally linked to disease.

Even though five years is a long time in molecular biology, we predict that it may be many decades before much of this information becomes accessible. However, we may be increasingly confident that the superficial paradigm of Neo-Darwinism, with its total dependence on random mutation and natural selection, is totally inadequate to explain the nature and diversity of Life.