Cancer Genome Collaboratory

University of Toronto, Toronto, Ontario
What the facility does

Provides cloud-based compute and storage resources for research on the extensive genomic holdings of the International Cancer Genome Consortium

Area(s) of Expertise

The Cancer Genome Collaboratory is an academic research cloud being built by the Ontario Institute for Cancer Research. This cloud-based compute resource enables research on the world’s largest and most comprehensive cancer genome dataset.

Using the Collaboratory’s facilities, researchers can run complex data mining and analysis operations across a large repository of cancer genome sequences and their associated donor clinical information. Using advanced metadata tagging, provenance tracking, and workflow management software, researchers can execute complex analytic pipelines, create reproducible traces of each computational step, and share methods and results. Instead of spending weeks to months downloading hundreds of terabytes of data from a central repository before computations can begin, researchers can upload their analytic software to the Collaboratory cloud, run it, and download the computed results in a secure fashion.
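As a rough illustration of why moving the analysis to the data saves time, the sketch below estimates how long a bulk download would take. The 500 TB dataset size, the link speeds, and the function name `transfer_days` are illustrative assumptions, not figures published by the Collaboratory.

```python
def transfer_days(terabytes: float, gigabits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Estimate days needed to move `terabytes` over a link.

    `efficiency` is the assumed fraction of nominal bandwidth that is
    actually sustained (1.0 = ideal, 0.5 = half the nominal rate).
    Decimal units are used: 1 TB = 1e12 bytes, 1 Gbit/s = 1e9 bits/s.
    """
    if gigabits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (gigabits_per_second * 1e9 * efficiency)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical 500 TB download at three assumed sustained link speeds.
    for gbps in (0.1, 1, 10):
        print(f"{gbps:>5} Gbit/s: {transfer_days(500, gbps):8.1f} days")
```

Under these assumptions, 500 TB over a fully sustained 1 Gbit/s link takes about 46 days, which is consistent with the "weeks to months" described above. Real transfers usually sustain less than the nominal rate, so the `efficiency` argument can be lowered to model that.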

The Collaboratory is home to the data holdings of the International Cancer Genome Consortium, a global collaboration of more than 70 projects and 40 countries/jurisdictions, created to sequence the genomes of 25,000 tumours and their matched normal tissues across 50 major cancer types. Collaboratory users have fast and easy access to this unique dataset.

Research Services

  • Self-service virtualized compute environment (OpenStack cloud)
  • Virtual machine specifications customized for whole genome analysis
  • Software-defined networking with self-service firewall rules
  • High-performance Internet connectivity
  • Object storage accessible through S3 and Swift APIs
  • Self-service disk space provisioning
  • Web, API and CLI interfaces
  • Self-service image repository of popular GNU/Linux distributions pre-loaded with bioinformatics tools
  • Access to ICGC data through high-bandwidth networking
  • Registry of bioinformatics Docker containers (
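The S3-compatible object storage mentioned above could, in principle, be browsed with any standard S3 client. The following is a minimal sketch using the third-party `boto3` library; the endpoint URL, bucket name, prefix, and credentials are placeholders, and the source does not state how buckets are laid out or whether ICGC data is exposed this way, so treat every name here as an assumption.

```python
def make_client(endpoint_url, access_key, secret_key):
    """Build an S3 client pointed at a non-AWS, S3-compatible endpoint."""
    # boto3 is third-party; it is imported lazily so that list_objects()
    # below can be used and tested without it installed.
    import boto3
    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def list_objects(client, bucket, prefix=""):
    """Yield (key, size_in_bytes) for each object under `prefix`.

    Follows S3 pagination: each response holds a limited number of keys,
    and a continuation token is passed back until the listing is complete.
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


# Hypothetical usage (endpoint, bucket, and credentials are placeholders):
#   client = make_client("https://object-store.example.org", "KEY", "SECRET")
#   for key, size in list_objects(client, "my-results", prefix="run-01/"):
#       print(key, size)
```

Passing the client in as an argument keeps the listing logic independent of any particular endpoint, so the same function should also work against other S3-compatible stores.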

Sectors of Application
  • Healthcare and social services
  • Life sciences, pharmaceuticals and medical equipment

Name of specialized lab: Collaboratory Compute Cloud
Name of equipment in use: Compute cloud infrastructure
Description of function: Scalable self-service compute, storage and networking infrastructure. Access to Compute Canada's HPC resources.

Name of specialized lab: Collaboratory Dockstore Workflows
Description of function: Packaged workflows for structural variation calling, somatic variant calling, indel calling, copy number variant (CNV) calling and BAM alignment


  • Center for Data Intensive Science (University of Chicago)
  • Global Alliance for Genomics and Health