Cancer Genome Collaboratory

Ontario Institute for Cancer Research, Toronto, Ontario

What the facility does

Provides cloud-based compute and storage resources for research on the extensive genomic holdings of the International Cancer Genome Consortium.

Areas of expertise

The Cancer Genome Collaboratory is an academic compute cloud built by the Ontario Institute for Cancer Research. It allows researchers to run complex analyses across large International Cancer Genome Consortium (ICGC) cancer genome datasets. It also hosts the world’s largest and most comprehensive cancer genome dataset, the Pan-Cancer Analysis of Whole Genomes (PCAWG).

The Collaboratory is also open to other research work that would benefit from a cloud computing environment.

The Collaboratory is home to the data holdings of the International Cancer Genome Consortium, a global collaboration of more than 70 projects across 40 countries/jurisdictions, created to sequence the genomes of 25,000 tumours and their matched normal tissues across 50 major cancer types. The Collaboratory is also home to the data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, which uniformly analyzed more than 2,600 cancer whole genomes from the ICGC. Collaboratory users have fast and easy access to these unique datasets.

Alternatively, Collaboratory users can upload their own datasets for analysis.

Using the Collaboratory’s computational facilities, researchers can run complex data mining and analysis operations across large datasets, such as the ICGC and PCAWG datasets. Using advanced metadata tagging, provenance tracking, and workflow management software, researchers can execute complex analytic pipelines, create reproducible traces of each computational step, and share methods and results. Instead of spending weeks to months downloading hundreds of terabytes of data from a central repository before computations can begin, researchers can upload their analytic software to the Collaboratory cloud, run it, and download the computed results in a secure fashion.
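The "weeks to months" figure above can be checked with a rough back-of-the-envelope estimate. The sketch below is illustrative only: the 500 TB dataset size, the link speeds, and the function name transfer_days are assumptions chosen for the example, not figures published by the Collaboratory.

```python
def transfer_days(size_terabytes: float, bandwidth_gbps: float,
                  efficiency: float = 1.0) -> float:
    """Estimate the days needed to move a dataset over a network link.

    size_terabytes: dataset size in decimal terabytes (1 TB = 1e12 bytes).
    bandwidth_gbps: nominal link speed in gigabits per second.
    efficiency:     fraction of the nominal speed actually sustained (0-1].
    """
    if size_terabytes < 0:
        raise ValueError("size_terabytes must be non-negative")
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    total_bits = size_terabytes * 1e12 * 8             # bytes -> bits
    bits_per_second = bandwidth_gbps * 1e9 * efficiency
    seconds = total_bits / bits_per_second
    return seconds / 86_400                             # seconds per day


if __name__ == "__main__":
    # Hypothetical scenario: 500 TB pulled over three different link speeds.
    for gbps in (0.1, 1.0, 10.0):
        print(f"{gbps:>5} Gbps: {transfer_days(500, gbps):7.1f} days")
```

Under these assumptions, a fully saturated 1 Gbps link needs roughly 46 days to move 500 TB, and real-world throughput is usually lower. This is the motivation for moving the analysis software to the data rather than the data to the software.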

Research services

  • Self-service virtualized compute environment (OpenStack cloud)
  • Virtual machine specifications customized for whole genome analysis
  • Software-defined networking with self-service firewall rules
  • High-performance internet connectivity
  • Object storage accessible through S3 and Swift APIs
  • Self-service disk space provisioning
  • Web, API and CLI interfaces
  • Self-service image repository of popular GNU/Linux distributions pre-loaded with bioinformatics tools
  • Access to International Cancer Genome Consortium (ICGC) and Pan-Cancer Analysis of Whole Genomes (PCAWG) data through high-bandwidth networking
  • Registry of bioinformatics Docker containers
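Because the object storage exposes an S3-compatible API, a standard S3 client can in principle be pointed at it. The following is a minimal sketch rather than the Collaboratory's documented procedure: the endpoint URL, bucket name, and prefix are placeholders, the helper names make_client and iter_objects are hypothetical, and it assumes the third-party boto3 library is installed with credentials configured in the usual way (environment variables or a credentials file).

```python
from typing import Any, Dict, Iterator, Tuple


def make_client(endpoint_url: str) -> Any:
    """Create an S3 client for an S3-compatible (non-AWS) endpoint."""
    import boto3  # third-party; imported lazily so iter_objects has no hard dependency
    return boto3.client("s3", endpoint_url=endpoint_url)


def iter_objects(client: Any, bucket: str,
                 prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (key, size_in_bytes) for each object under prefix.

    Follows S3 pagination by passing the continuation token back
    until the service reports that the listing is no longer truncated.
    """
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


if __name__ == "__main__":
    # Placeholder values: substitute the endpoint and bucket issued to your project.
    s3 = make_client("https://object-store.example.org")
    for key, size in iter_objects(s3, "my-project-bucket", prefix="results/"):
        print(f"{size:>14,d}  {key}")
```

Keeping the listing logic separate from client creation means the same helper works with any object that implements list_objects_v2, which also makes it easy to test without network access.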

Sectors of application
  • Healthcare and social services
  • Life sciences, pharmaceuticals and medical equipment
Specialized lab | Equipment | Function
Collaboratory Compute Cloud | Compute cloud infrastructure | Scalable self-service compute, storage and networking infrastructure.
Collaboratory Dockstore Workflows | Dockstore platform | Packaged workflows for structural variation calling, somatic calling, CNVs, indels, copy number and BAM alignment.
  • Center for Data Intensive Science (University of Chicago)
  • Global Alliance for Genomics and Health
  • McGill University
  • The University of British Columbia
OICR’s Cancer Genome Collaboratory wins 2018 OpenStack Superuser award for contributions to the cancer research community.…
The Ontario Institute for Cancer Research (OICR) wins the Superuser Award at the OpenStack Summit Vancouver.
OpenStack Summit 2018.
Ruckus Case Study - Transforming cancer research with a new network model.