A supercomputer designed to root out genes that have been conserved over millions of years of evolution has discovered 300 new human genes.
The Human Genome Project may have sequenced the billions of bases that make up our genetic code, but the list of 20,000 protein-producing genes identified so far remains incomplete. Adam Siepel at Cornell University and his colleagues, there and at other institutions, set out to find the missing genes by looking at genetic code conserved through evolution.
The discovery of 300 new genes is significant to our understanding of the human body and may reveal new targets for drug research, even if it doesn't drastically change the overall total.
The idea of using evolution to find missing genes is novel, however, and the researchers claim the results show that a large number of genes could still be missed by current biological methods.
These methods are very effective at finding genes that are widely expressed but may miss those that are found only in certain tissues or at early stages of embryonic development, according to Siepel, who is an assistant professor of biological statistics and computational biology at Cornell.
"What's exciting is using evolution to identify these genes," he said. "Evolution has been doing this experiment for millions of years. The computer is our microscope to observe the results."
The team began with publicly available sequence alignments in which stretches of genetic code are very similar across two or more species. Using large computer clusters, including an 850-node cluster, the researchers ran three different algorithms to compare these alignments between human, mouse, rat and chicken in various combinations.
The computers were then used to predict how a gene sequence might subtly change during evolution without preventing the protein product from doing its job; the protein-coding part of the genome changes in a different manner from the non-coding regions. The algorithms then searched for matches to the predicted sequences. Previously discovered genes predicted by the model were eliminated, and the remainder were tested in the laboratory.
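One way to picture how protein-coding sequence changes differently from non-coding sequence is to count "silent" changes. Because several DNA triplets (codons) encode the same amino acid, a coding region can accumulate base changes that leave the protein untouched. The Python sketch below is only an illustration of that idea, not the researchers' algorithms: the function name `codon_changes` and the short example sequences are invented for this example, and it assumes two gap-free, in-frame aligned sequences and the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_changes(seq_a, seq_b):
    """Count codon differences between two aligned coding sequences.

    Returns (silent, altering): codons that differ but encode the same
    amino acid, and codons that differ and encode different amino acids.
    Assumes both sequences are upper-case, gap-free and in frame.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be the same length, a multiple of 3")

    silent = altering = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codon, nothing to count
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1    # DNA changed, protein unchanged
        else:
            altering += 1  # DNA change alters the protein
    return silent, altering


# Invented example sequences (hypothetical, not real genes).
species_1 = "ATGGCTCTGAAA"
species_2 = "ATGGCCCTAAAG"
print(codon_changes(species_1, species_2))
```

A stretch in which most differences between species are silent looks more like protein-coding sequence under selection than a stretch where changes fall without regard to codon boundaries. Real gene finders use far richer statistical models of this signal.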
The researchers managed to work out the function of some of the protein products of the new genes through comparison with databases of known proteins, revealing roles in motor activity, cell adhesion, connective tissue and central nervous system development.
The study has been published in the online version of the journal Genome Research.