An Interactive Resource to Generate and Provide Integrated Knowledge of the Human Pancreas
Contact PI: Kyle Gaulton, PhD, University of California San Diego (U24 DK138512)
Jason Flannick, PhD, The Broad Institute, Inc.
Noel Burtt, The Broad Institute, Inc.
Anna Gloyn, PhD, Stanford University
Benjamin Voight, PhD, University of Pennsylvania
Start Date: February 1, 2024
Abstract
Type 1 diabetes (T1D) is characterized by autoimmune destruction of insulin-producing beta cells in pancreatic islets, and there is currently no prevention or cure. The processes driving T1D initiation and progression in the pancreas are poorly understood, and a better understanding of them is critical to gain new insight into disease mechanisms and to identify novel biomarkers and therapeutic targets. A wealth of data has been generated from human pancreas donors in initiatives supported by the Human Islet Research Network (HIRN), and these data can be used to address open questions in T1D. Yet they are currently both underutilized and in formats inaccessible to many researchers, which limits the insights that can be drawn from them. To address this gap, we have assembled a team of highly accomplished researchers to create a pancreatic knowledge base, PanKbase, leveraging our expertise in computational biology and data science (Gaulton, Flannick, Voight), type 1 diabetes (Rich, Atkinson, Anderson, Gaulton, MacDonald), islet biology (Gloyn, MacDonald), immunology (Anderson, Atkinson), knowledge base engineering (Flannick, Gaulton), engagement and outreach (Burtt, Westley), and rigor and reproducibility (Grethe, Martone).
In Aim 1 of this proposed project, we will develop a database, based on our existing CMDGA platform, that comprehensively aggregates and harmonizes data from HIRN repositories and other repositories containing human pancreas data. We will further derive high-quality summary resources from these harmonized data that are of value to the community. In Aim 2 we will implement an analytics library of tools that address impactful questions in T1D by performing statistical modeling and machine learning on data and resources, and that extrapolate knowledge learned from these data to large, independent datasets. These tools will be provided as workflows in multiple formats, including GitHub repositories, Jupyter notebooks, WDL pipelines, and precomputed results. In Aim 3 we will create an open science platform, based on the HuGeAMP and Terra platforms and accessible to any user, that provides user-friendly interfaces to customizable workflows that address impactful questions, an analysis sandbox to run advanced workflows and pipelines, and APIs and code repositories for full customization. In Aim 4 we will draw on our expertise in coordinating consortia to form collaborations and working groups with investigators leading HIRN repositories, both to develop standards and to pair experts with data scientists so that domain knowledge is integrated into resource creation and machine learning applications. In Aim 5 we will establish engagement and outreach programs for the broader research community to improve data, workflows, and user experience in PanKbase and to identify new key questions in the field, applying our extensive expertise in developing outreach for AMP-CMD and the digital platform the (sugar)science. Together, this proposal will provide a knowledge base of the pancreas that enables researchers across all expertise levels to extract novel insights and relationships from data in HIRN and other pancreas donor repositories, expediting the development of biomarkers, therapies, and prevention strategies for T1D.