Data Analysis Tools

Gathering childhood cancer cohorts into a single database and analysis environment with user-friendly tools

Childhood Cancer Data Initiative Ideas

To accelerate childhood cancer research, scientists need tools for accessing and analyzing genomic data that are easy for bench scientists to use yet powerful enough for bioinformaticians to test ideas. Gathering childhood cancer cohorts into a single database and analysis environment with user-friendly tools will facilitate this. The system needs secure access controls and user roles. The childhood cancer genomic data portal also needs scalable storage and computing resources so that it can support both routine and sophisticated analysis and grow with the user community. Several genomic data types are in common use, and the database should support this multiomic data and allow analyses that combine data types. Multiomic analysis is proving to be a powerful method for understanding cancer.

The genomic data needs to be integrated with relevant genomic reference databases that hold information on phenotypes, genes, and variants, including databases of genetic disease information, population genetic data, and the other data biologists need to refine and understand the results of analyses. The reference data should ideally be kept separate from the genomic data to facilitate regular updates, and the system should allow researchers to dynamically link these reference sources to the genomic data. The system should also include a genome browser as part of a suite of visualization tools.
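
The separation of reference data from genomic data described above can be sketched in a few lines. This is only an illustration, not GORdb code: the record layout, the `annotate` helper, and every value (sample IDs, positions, `GENE_A`, `GENE_B`, frequencies) are invented placeholders. It shows variant calls being linked to a reference table at query time, so that a newer reference release changes the annotations without touching the stored cohort data.

```python
# Illustrative sketch only: record layouts and all values are invented.
from typing import Dict, Iterable, List, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def annotate(variants: Iterable[dict], reference: Dict[VariantKey, dict]) -> List[dict]:
    """Left-join variant calls to a reference table at query time.

    The variant records are never modified, so swapping in a newer
    reference table changes the annotations without reloading the cohort.
    """
    empty = {"gene": None, "pop_freq": None}
    annotated = []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        # Variants missing from the reference keep an empty annotation.
        annotated.append({**v, **reference.get(key, empty)})
    return annotated


# Hypothetical cohort data, stored once.
cohort_variants = [
    {"sample": "S1", "chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"},
    {"sample": "S2", "chrom": "chr2", "pos": 2000, "ref": "C", "alt": "T"},
]

# Two hypothetical releases of the same reference source.
reference_v1 = {("chr1", 1000, "A", "G"): {"gene": "GENE_A", "pop_freq": 0.01}}
reference_v2 = {**reference_v1, ("chr2", 2000, "C", "T"): {"gene": "GENE_B", "pop_freq": 0.20}}

rows_v1 = annotate(cohort_variants, reference_v1)
rows_v2 = annotate(cohort_variants, reference_v2)
```

With the first release the second variant comes back unannotated; with the second release it picks up an annotation, and `cohort_variants` itself is unchanged in both cases.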

How will your idea make a difference

Childhood cancer cohorts should have secondary analysis done with a common set of pipelines. This makes the datasets comparable, so researchers can compare results between cohorts and combine them for additional statistical power. The cohorts should then be hosted in a scalable, cloud-based genomic database, for example GORdb, that gives researchers responsive, real-time access to the information for exploration, analysis, and collaboration from anywhere in the world. The GORdb system has the most commonly used analysis tools built in and can be extended with third-party analysis programs. The system should also provide APIs so that advanced users can access the database for advanced analysis and build interacting tools, multiplying the utility of the childhood cancer cohorts. Finally, the genomic analysis environment should include rich clinical and sample information so that users can both build and refine sample sets for analysis and incorporate this non-genomic data in analyses.
Notes:
-AWS cloud-based hosting of the genomic and clinical data.

-Make the data available through GORdb, a scalable, cloud-based genomic database, to allow real-time analysis by NCI researchers across the globe.

-Easy for biologists to use.

-Adaptive AWS compute resources scale to the number of users and analysis load.

-Run the datasets through a standard set of pipelines and integrate them into a single GORdb instance to allow comparisons between different cohorts and cancer types.

-Collect multiomic data for these datasets into a unified GORdb instance to facilitate rich data analysis of these childhood cancer cohorts. Multiomic data allows connections between gene expression, methylation state, non-coding RNAs, and genomic alterations to be identified.

-Make clinical, sample, and longitudinal data available to researchers for sample selection, defining sub-cohorts, and integration of clinical and genomic data in analyses. WCNC has a sample.
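
As a rough illustration of the last note, the sketch below defines a sub-cohort from clinical fields and then uses it to summarize genomic calls. It is a toy example in plain Python, not a GORdb query: the field names (`cohort`, `diagnosis`, `age_at_dx`), the helpers `select_samples` and `mutated_sample_counts`, and all records are hypothetical. It also assumes the calls from both cohorts came out of the same pipeline, which is what makes pooling them meaningful.

```python
# Illustrative sketch only: field names, helpers, and records are invented.
from collections import Counter

# Hypothetical clinical/sample table spanning two source cohorts.
clinical = [
    {"sample": "S1", "cohort": "cohort_A", "diagnosis": "type_X", "age_at_dx": 4},
    {"sample": "S2", "cohort": "cohort_B", "diagnosis": "type_X", "age_at_dx": 11},
    {"sample": "S3", "cohort": "cohort_B", "diagnosis": "type_Y", "age_at_dx": 6},
]

# Hypothetical per-sample variant calls, assumed to come from one common pipeline.
variant_calls = [
    {"sample": "S1", "gene": "GENE_A"},
    {"sample": "S2", "gene": "GENE_A"},
    {"sample": "S2", "gene": "GENE_B"},
    {"sample": "S3", "gene": "GENE_A"},
]


def select_samples(records, **criteria):
    """Return the IDs of samples whose clinical fields match every criterion."""
    return {
        r["sample"]
        for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    }


def mutated_sample_counts(calls, samples):
    """Count, per gene, how many of the selected samples carry at least one call."""
    seen = {(c["gene"], c["sample"]) for c in calls if c["sample"] in samples}
    return Counter(gene for gene, _ in seen)


# Define a sub-cohort by diagnosis, pooling samples across source cohorts.
sub_cohort = select_samples(clinical, diagnosis="type_X")
counts = mutated_sample_counts(variant_calls, sub_cohort)
```

Here the sub-cohort pools samples from both source cohorts, and the per-gene counts are restricted to those samples.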

Idea No. 983
