Research Project

Software Tools to Simplify Gene Function Prediction

Lead Investigator(s): 
Quaid Morris and Gary Bader
Funding: 
$1.9 M
Institution: 
University of Toronto
Start Date: 
April 1, 2008
End Date: 
March 31, 2010

Summary

In the late 1990s, Google brought the power of web search to the average internet user by introducing a simple, intuitive interface backed by powerful analysis software and a massive data warehouse. This research project will bring the genomics revolution to the average biologist by extending existing software prototypes to build the ‘Google of Biology’: a software system that lets people use all existing genomics and proteomics data to answer specific biological questions about gene function.

Until now, this revolution has been both a blessing and a curse. While it is now possible to collect data about every gene in the genome, the public repositories that store these data have grown exponentially in size and complexity, making the data difficult for biologists to use without expertise in statistics and computation. Yet computational analyses have revealed that these data may offer valuable insights into how cells work and how biological systems fail in disease. Since few biologists have such expertise, effective navigation tools are needed to prevent the repetition of earlier efforts, which wastes part of the massive international investment in genomics and proteomics data.

The project’s research team is building computer software to accelerate genomic and proteomic research. This advance is essential: the average biologist cannot participate in and rarely benefits from advanced genomics technology, and collecting and analysing genome-scale datasets is expensive and time-consuming. Surprisingly, both problems have similar solutions. For the typical biologist, the team will build a user-friendly website through which he or she can predict gene function using all available data from genomics and proteomics. For the functional genomics researcher, it will construct a decision-support system for evaluating data-collection strategies by comparing the quality and information content of preliminary data with those of published data on gene function, genomics, and proteomics. Both tools rely on the same underlying software architecture, which the project team will make freely available over the web. To ensure that these new software tools prove effective among diverse biological users, the team will develop and disseminate extensive user-training resources.

The team will develop a series of web-crawling software agents to automatically build and maintain a data warehouse of genomics and proteomics data that will support the website and decision-support system. Both tools will also harness an advanced network-visualization interface that will help users browse and understand the results of their queries.

Through the ‘Google of Biology,’ this project is increasing the value of genomics and proteomics data by making them accessible to all biological researchers. Although writing and testing the software will take two years, the automatically updating website and decision-support system will require minimal upkeep, so the benefits of the work will be lasting.

Significant Outcomes to Date

  • The GeneMANIA website developed by this project integrates several powerful bioinformatics tools for both scientific and lay audiences.