Large amounts of data currently exist for childhood cancer. The largest body of it is the data generated by the Children's Oncology Group (COG). For data to be usable, it must be formatted and structured appropriately. This starts with a common data model. Adopting something similar to the data model used by PCORnet and applying it to childhood cancer data (COG and others) would maximize its utility.
The initiative aims to build a connected data infrastructure to enable sharing of childhood cancer data from multiple sources. Many factors need to be considered before developing the data infrastructure, starting with a review of existing data repositories and the development of policies associated with data access, long-term patient follow-up, collecting and storing sensitive patient data, and returning research results to patients.
Answer one or more of the following questions in your response:
- What gaps are there in existing childhood cancer data repositories?
- What are the opportunities for linking existing and new data repositories?
There should be stringent standards in place, starting with data creation, to ensure that only high quality data (physical and digital) is submitted and stored. The Children's Brain Tumor Tissue Consortium has developed a set of data standards for each type of data to shepherd the collection and submission process so that researchers know they can count on a reliable resource, which speeds the research process. Data... more »
Swifty Foundation encourages CCDI to build on the successes of CBTTC rather than beginning at ground zero. In 2014, just three years after its launch, CBTTC became the first and largest clinically annotated biospecimens repository with real-time queryability. With the launch of CAVATICA, its genomics analytic platform, it became the first brain tumor consortium to solve cloud-based, global WGS analysis, and was recognized... more »
A lot of work is ongoing to integrate genomics data with clinical data. However, if we want to strive for a more complete and comprehensive data infrastructure, it's important to target other -omics data. We need to think through the best ways to incorporate radiomics and imaging data into this data infrastructure. This is a critical component of cancer studies and is not yet well implemented.
Clinical data on patient care at the NIH Clinical Center is collected in CRIS, but data needs to be collected separately for research purposes. Data can be exported from CRIS to BTRIS, but it isn't clear whether all types of data are accessible (whether this is a technical or resource problem). I suggest that the NIH Clinical Center be included as a site to develop Natural Language Processing of electronic medical records,... more »
An overarching theme of the CCDI Symposium was the "need for an infrastructure to enable federation among disparate pediatric data repositories" and "thinking big". Our big idea is to create a National Virtual Childhood Cancer Registry which includes every childhood cancer patient in the nation. This can be created now, employing emerging, demonstrably successful technologies to connect existing and populated national... more »
Siloed data repositories are often not interoperable; therefore, it is important to think beyond a traditional data repository and consider other types of data infrastructure solutions that can accommodate diverse data. APIs and principles set forth by communities such as the Data Biosphere and GA4GH are a few examples of how researchers are utilizing technologies to link data repositories. NCI should further define requirements... more »
One of the most pressing needs to advance childhood cancer research is a comprehensive, modern infrastructure for storing, integrating and sharing all data types collected longitudinally from cancer patients, especially for the integration of patient-reported outcomes (PROs) with clinical, environmental and genomic data. While there are efforts to amass large multiomic datasets, such as the St. Jude Cloud and Treehouse... more »
Web-based resources are available offering information on pre-clinical, clinical, genomic and theoretical aspects of cancer, including not only the comprehensive cancer projects such as the International Cancer Genome Consortium (ICGC) or The Cancer Genome Atlas (TCGA), but also lesser-known and more specialized projects on pediatric diseases such as the Pediatric Cancer Genome Project (PCGP). However, in the case of data on childhood... more »