Research Project

Structural and Functional Annotation of the Human Genome for Disease Study

Lead Investigator(s): Robert Hegele
Funding: $21.5 M
Institution: University of Western Ontario
Start Date: October 1, 2006
End Date: June 30, 2011

Website: http://www.sfahgds-gc.ca/

Summary

With the human genome sequence now complete, a necessary next step is to characterize, or annotate, the genome by identifying the structures and functions of genes and by defining levels of variation in genome structure and gene products. Enhancing the human genome sequence with this additional biological information may yield new insights into a wide range of human diseases, including breast cancer, diabetes, and heart disease.

Thoroughly annotating the human genome is the goal of this project led by Dr. Robert Hegele, an endocrinologist and Scientific Director of the London Regional Genomics Centre at the Robarts Research Institute. Collaborating with Drs. Stephen Scherer (SickKids), Ben Blencowe, Brendan Frey, and Tim Hughes (University of Toronto), Dr. Hegele aims to characterize the widespread and clinically relevant large-scale genomic variations (copy-number changes, deletions, duplications, insertions, and rearrangements), profile the range of gene product variants arising from alternative splicing events, and identify previously unknown genes and other functional elements.

The project team will then apply the annotated map, with its rich trove of new biological information, to unravel the genetic basis of human disease. The data from the project will be made freely available on the internet to accelerate biomedical discovery, including the diagnosis and treatment of common diseases.

This project includes integrated GE3LS research on the meaning and understandings of terms used in genomics research.

Significant Outcomes to Date

  • The Database of Genomic Variants has been developed to catalogue structural variation in the human genome. It currently has approximately 50,000 entries and is visited over 12,000 times per month.
  • The project has developed a novel tool named eFISH (electronic fluorescence in situ hybridization), a BLAST-based program that facilitates the choice of appropriate clones for FISH and CGH experiments, as well as interpretation of results in which genomic DNA probes are used in hybridization-based experiments.
  • The most comprehensive and detailed map of variation in the human genome to date has been established. For further information, see Conrad et al. Nature 2009 (full citation below).
  • In partnership with the J. Craig Venter Institute, the project has characterized the first complete diploid human genome. For further information, see Levy et al. PLoS Biol. 2007 (full citation below).
  • The project has developed new tools for splice detection: an algorithm to identify tissue-specific splicing patterns from high-throughput data; an algorithm to identify novel and known cis elements and genomic features relevant for tissue-specific splice patterns; an algorithm to combine these features into a computationally defined regulatory code that can predict tissue-specific splice patterns directly from sequence; and a computational technique called the Generative model for Multi-path Exon Splicing Analysis (GenMESA) to address the concern of falsely detecting splice junctions.
  • The project has identified approximately 10,000 new splice junction sequences, many of which represent novel tissue-specific alternative splicing events.  For further information, see Pan et al. Nat Genet 2008 (full citation below).
  • Through a genome-wide computational and expression profiling strategy, the project has identified a tissue- and vertebrate-restricted splicing factor, the neural-specific serine/arginine-related protein of 100 kDa (nSR100), which the team has shown regulates an extensive network of brain-specific alternative exons enriched in genes that function in neural cell differentiation. For further information, see Calarco et al. Cell 2009 (full citation below).
  • A new human disease, its phenotypic features, and its genetic basis have been discovered. The causative mutation for the novel human multisystemic neonatal syndrome, which was given the name ‘endocrine-cerebro-osteodysplasia’ (ECO), was identified through SNP array screens and high-throughput resequencing in DNA samples from infants in an Old Order Amish pedigree from Ontario. The findings from this study also suggest that the gene affected by this mutation (the gene encoding intestinal cell kinase, ICK) plays a key role in the development of multiple organ systems. For further information, see Lahiry et al. Am J Hum Genet 2009 (full citation below).
  • The project has developed a ‘splicing code’ algorithm, which uses combinations of hundreds of RNA features to predict tissue-dependent changes in alternative splicing.  For further information, see Barash et al. Nature 2010 (full citation below).

Selected High-Impact Publications
Barash Y, Calarco JA, Gao W, Pan Q, Wang X, Shai O, Blencowe BJ, Frey BJ. 2010. Deciphering the Splicing Code. Nature. 465(7294):53-9.  

Luco RF, Pan Q, Tominaga K, Blencowe BJ, Pereira-Smith OM, Misteli T. 2010. Regulation of alternative splicing by histone modifications. Science. 327(5968):996-1000.

Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J; The Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW, Hurles ME. 2009. Origins and functional impact of copy number variation in the human genome. Nature. 464(7289):704-12.

Ali-Khan SE, Daar AS, Shuman C, Ray PN, Scherer SW. 2009. Whole genome scanning: resolving clinical diagnosis and management amidst complex data. Pediatr Res. 66(4):357-63.

Calarco JA, Superina S, O'Hanlon D, Gabut M, Raj B, Pan Q, Skalska U, Clarke L, Gelinas D, van der Kooy D, Zhen M, Ciruna B, Blencowe BJ. 2009. Regulation of vertebrate nervous system alternative splicing and development by an SR-related protein. Cell. 138:898-910.

Hegele RA. 2009. Plasma lipoproteins: genetic influences and clinical implications. Nat Rev Genet. 10:109.

Lahiry P, Wang J, Robinson JF, Turowec JP, Litchfield DW, Lanktree MB, Gloor GB, Puffenberger EG, Strauss KA, Martens MB, Ramsay DA, Rupar CA, Siu V, Hegele RA. 2009. A multiplex human syndrome implicates a key role for intestinal cell kinase in development of central nervous, skeletal, and endocrine systems. Am J Hum Genet. 84:134-147.

Cook EH Jr, Scherer SW. 2008. Copy-number variations associated with neuropsychiatric conditions. Nature. 455:919-923.

Pan Q, et al. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 40:1413.

Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC.  2007. The diploid genome sequence of an individual human. PLoS Biol. 5:e254.

Scherer SW, et al. 2007. Challenges and standards in integrating surveys of structural variation. Nat Genet. 39:S7-15.

Frey BJ, Dueck D. 2007. Clustering by passing messages between data points. Science. 315:972-976.

Blencowe BJ. 2006. Alternative splicing: new insights from global analyses. Cell. 126:37-47.

Daar AS, Scherer SW, Hegele RA. 2006. Implications of copy-number variation in the human genome: a time for questions. Nat Rev Genet. 7:414.