ProHits Next Generation: A flexible system for tracking, analyzing and reporting functional proteomics data

Overview

Human cells are built from tens of thousands of different proteins that perform most of the activities necessary for life. To gain insight into the causes of disease and develop new treatments, researchers need to understand how proteins interact with and modify one another. Mass spectrometry is now being used to identify proteins, their modifications and their interactions.

Drs. Anne-Claude Gingras, Mike Tyers and team aim to develop innovative ways to analyze mass spectrometry data and to extract more information from it about protein interactions and modifications. Their research will improve the analysis of protein interactions and increase understanding of the effects of disease states and drug treatments.

MedSavant: An integrative framework for clinical and research analysis of human genomes

Overview

Physicians will soon be able to use a patient’s whole-genome sequence to assess that person’s risk of developing a disease, thereby improving clinical decision-making. This promises significant medical and economic benefits, including early detection and treatment of high-risk patients and eliminating the need for multiple separate genetic tests.

Integrating whole-genome sequencing into clinical practice requires software that allows clinicians to identify relevant genetic variants in patients. Drs. Michael Brudno, Gary Bader and team aim to improve health care by developing widely shared software that will prioritize genetic variants in patients who may require medical attention.

Genomic Epidemiology Application Ontology (GenEpiO)

Overview

Infectious disease outbreaks have significant impacts on human health, agri-food production, animal health and the economy. Ineffective public health responses can allow outbreaks of diseases such as Zika and food-borne illnesses to spread, with enormous health impacts and high economic costs. DNA sequencing provides the complete “fingerprint” of a microbe, enabling unprecedented tracing of how infectious diseases spread. When outbreaks become global, however (for example, SARS or antimicrobial-resistant microbes), data need to be shared securely and efficiently across public health organizations. Unfortunately, data are often held in institution-specific formats, making such sharing difficult, time-consuming and costly.

Drs. William Hsiao (UBC), Andrew G. McArthur (McMaster University) and Fiona Brinkman (Simon Fraser University) will improve the integration and sharing of infectious disease and antimicrobial resistance information across public health agencies through the Genomic Epidemiology Application Ontology (GenEpiO). The platform will enable public health workers to share outbreak-related information faster and to perform more powerful analyses, helping to reduce the health and economic impact of disease outbreaks.

Rapid prediction of antimicrobial resistance from metagenomic samples: Data, models, and methods

Overview

Antimicrobials (antibiotics) have been central to combating infectious disease for nearly a century. However, their effectiveness is declining as antimicrobial resistance (AMR) increases. There is an increasingly urgent need to know more about AMR, both to better understand its consequences and to monitor its presence in the environment, the agri-food industry, individual patients and the wider population. Analyzing the genomes of resistant microorganisms is essential, but doing so one organism at a time is slow and costly. Metagenomics allows microbes to be genetically profiled as a community, but the resulting datasets are huge and contain much irrelevant data. Currently, no software is designed specifically to predict AMR profiles directly from metagenomic data; such software would enable more rapid AMR profiling and help prioritize candidate genes for further research.

Drs. Robert Beiko of Dalhousie University, Andrew G. McArthur of McMaster University, and Fiona Brinkman of Simon Fraser University are leading a project to develop new software and database tools that will provide a near-instantaneous picture of the antimicrobial-resistant organisms in a sample, aiding AMR research and the response to AMR threats that affect both agri-food production and public health.

Rapid, accessible genome assembly using long read sequencing

Overview

DNA sequencing technology has progressed from sequencing single reference genomes at great cost and over long periods of time to the current era of inexpensive, high-throughput short-read sequencing. The emerging “third generation” of sequencing technology, built around portable instruments that will decentralize sequencing, offers the prospect of putting long-read genome sequencing in the hands of more researchers and enabling new applications.

Dr. Jared Simpson of the University of Toronto is developing robust, efficient and easy-to-use genome assembly software that matches the capabilities of these emerging sequencing instruments. The software will target biologists and other end users of sequencing who lack substantial bioinformatics expertise.

ePlants pipeline and navigator for accessing and integrating multi-level ‘omics data for 15 agronomically important species for hypothesis generation

Overview

In the past five years alone, huge amounts of data have been generated for 15 plant species important to Canada, including poplar, maize, rice, barley, wheat, soybeans and tomatoes. Using these data efficiently will be key to improving and managing these crops in order to feed, shelter and power a world of 9 billion people by 2050.

The ePlant Framework, developed under a previous Genome Canada grant, lets researchers easily see where and when a gene is “active” and whether there are natural genetic variants that might allow it to do its “job” better. Because it is currently populated with data for only one species, it now needs data from more species. Lead researcher Dr. Nicholas Provart (University of Toronto) plans to develop an ePlant Pipeline that will simplify the creation of any ePlant from genome or exome sequence data. The ePlant Navigator will permit cross-cultivar and cross-species comparisons, supporting robust hypothesis generation. Easy access to these data sets will enable researchers to explore genetic diversity, gene expression and other data for genes important to crop improvement.

Kamphir: A versatile framework to fit models to phylogenetic tree shapes

Overview

Phylodynamics is a new and rapidly growing field that combines epidemiology and computational biology to combat infectious disease outbreaks. The field stems from the concept of phylogeny, in which a tree represents how different populations (of virus infections, for example) are related through a series of common ancestors. The genetic similarities among populations are used to reconstruct these ancestral relationships back in time. This approach is particularly useful for viruses, which evolve so quickly that each infection becomes genetically unique within weeks or months of being transmitted from the previous host. Consequently, the virus phylogeny can be used to estimate how the infections spread through the host population. Phylodynamics has already had an enormous impact on our understanding of outbreaks of HIV, hepatitis C virus and Ebolavirus. Further progress is stymied, however, by overly simple models that cannot accommodate large data sets.

Dr. Art F.Y. Poon of Western University, Ontario, is developing a completely new approach to phylodynamics that adapts a method from pattern recognition to enable computers to “see” the features shared by different tree shapes. This approach will be able to accommodate more realistic models and larger data sets than was previously possible, improving global public health initiatives for infectious disease management and eradication.

Dockstore: A platform for sharing cloud-agnostic tools with the research community

Overview

An unintended consequence of the growth of genomics has been the proliferation of massive datasets, making analysis increasingly difficult. A further problem is the lack of standardization in how analysis tools are packaged, described and executed across computing environments. Drs. Vincent Ferretti and Lincoln Stein of the Ontario Institute for Cancer Research, in collaboration with Dr. Brian O’Connor of the University of California, Santa Cruz, have developed a web application called the Dockstore, which addresses the challenge of encapsulating and sharing bioinformatics tools so that they can be moved between computing environments.

The researchers are now adding key features to continue enhancing and evolving the Dockstore. They will also integrate bioinformatics tools and workflows from the Global Alliance for Genomics and Health (GA4GH) for redistribution to the wider research community, and will work with collaborators to help them register their high-quality tools in the Dockstore. Finally, the researchers will work with other projects to enable the sharing of tools across genomic repositories. These activities will drive increased use of the Dockstore, thereby increasing tool sharing among scientists in fields as diverse as agriculture, energy and human health.

Consolidated epigenetic landscape for congenital, developmental and childhood disorders

Overview

Epigenetics is the study of changes in gene expression that do not alter the DNA sequence itself; these changes can be influenced by both genetic and external factors, such as environmental exposures or lifestyle choices of parents or grandparents. Epigenetic disruptions play a key role in disease. Finding epigenetic biomarkers, however, is made difficult by the complexity of epigenetic signaling in cells and tissues, and by the fact that many different genetic disorders, such as pediatric developmental disorders, can produce similar clinical symptoms. Despite the wealth of data being generated by new technologies, there is a dearth of diagnostic tools that can consolidate epigenetic data collected by diverse groups using different experimental platforms. Such tools are essential for relating molecular patterns to clinical presentation.

Drs. Michael Brudno and Rosanna Weksberg of Toronto’s Hospital for Sick Children are developing a novel web-based resource for analyzing epigenetic datasets together with complete clinical information, focusing on developmental disorders such as intellectual disability and autism. Their system will provide rich context for exploring epigenetic dysregulation in a growing number of childhood epigenetic diseases.

Enhanced and automated visualization of complex data

Overview

Modern genomics research generates massive amounts of data, but these data sets are too big and complex to be useful on their own. Researchers must first analyze and interpret the data to turn them into meaningful information. This information can then be used to help solve real-world problems, such as developing new tools or strategies to better diagnose and treat patients, increasing crop yields or monitoring the environment. Increasingly, the end user’s ability to interpret the data is the key factor preventing researchers from delivering these much-needed solutions more quickly.

Dr. Paul C. Boutros of the Ontario Institute for Cancer Research is leading a team that is developing ways to make “big data” results easier to understand by improving how they are visualized and interpreted. The team will create interactive visualization tools that integrate tightly with databases scientists already use routinely, and will use crowdsourcing to capture the best visualization ideas from a broad community of scientists, graphic designers and citizen scientists. By building on the human brain’s ability to interpret images, the project will make the conclusions drawn from biological data more accessible and accelerate the pace of biological discovery and innovation.