Eric Ravussin, Ph.D. | Louisiana State University | Improving FAIR-ness and TRUST-worthiness of the Pennington/Louisiana NORC Biorepository In 2018 the Pennington/Louisiana Nutrition Obesity Research Center (NORC; P30 DK072476-16) established a repository of human subjects’ data and biospecimens from nutrition and obesity research. Currently, the repository includes data and biospecimens from 213 studies and 13,787 unique participants (68% women and 38% with obesity), funded by the National Institutes of Health, Department of Defense, United States Department of Agriculture, American Heart Association, American Diabetes Association, and other government and non-profit organizations. In September 2020, an online portal (https://my.pbrc.edu/NORC/NORCRepository/Landing) was opened to allow users to independently search the catalog of available data. As we work to increase usage, it is imperative that we align with the FAIR and TRUST principles and ensure that we can appropriately track usage, utility, and impact. In response to NOT-OD-21-089, we have developed a comprehensive but conservative one-year project to achieve these goals. The overarching objective of this administrative supplement is to improve the “FAIR”-ness and “TRUST”-worthiness of the Pennington/Louisiana NORC Biorepository and its online portal. In aim 1, we will improve “FAIR”-ness by adding existing data, expanding metadata, and establishing metrics for tracking usage. In aim 2, we will improve “TRUST”-worthiness by promoting and demonstrating the methods used for data collection. Finally, in aim 3, we will explore the possibility of certification. This repository provides unique data on nutrition and obesity that seeks to benefit researchers across the country for years to come. | https://my.pbrc.edu/NORC/NORCRepository/Landing |
Chris Rorden, Ph.D. | University of South Carolina | Public Sharing of the Aphasia Recovery Cohort The Center for the Study of Aphasia Recovery (C-STAR, P50-DC014664) explores recovery from language impairments following stroke. The center acquires a broad range of magnetic resonance imaging (MRI) modalities as well as behavioral measures from stroke patients experiencing language impairments. Like all modern large-scale NIH grants, C-STAR has a resource sharing plan that guides dissemination of curated data (following an embargo). However, in addition to these data, the C-STAR team has been acquiring images from people with aphasia using our Siemens 3T MRI scanner since 2006. This trove of data, which we refer to as the ‘Aphasia Recovery Cohort’ (ARC), is the product of both internally supported and NIH-funded awards that did not require resource sharing plans, and it includes data from 250 stroke survivors scanned during 5776 unique sessions. In addition to the imaging sessions, ARC patients participated in numerous treatment and assessment sessions, providing a rich range of behavioral measures. The current proposal seeks to curate these data to provide an anonymized public database and search tool (ARCquery). This will allow data scientists from around the world to apply their expertise to this archival dataset, providing new insights into brain function as well as identifying predictors of recovery. This repository will provide a Findable, Accessible, Interoperable, and Re-usable (FAIR) dataset. | https://github.com/rordenlab/AphasiaResearchCohortQuery |
Molly A. Bogue, Ph.D. | The Jackson Laboratory, Bar Harbor, Maine, USA | Mouse Phenome Database: Making it More FAIR-compliant and TRUST-worthy The Mouse Phenome Database (MPD; https://phenome.jax.org) is a widely accessed NIH-supported Biomedical Data Repository focused on primary mouse phenotype data from genetic studies of complex traits in strains and populations. For over 20 years, MPD has been a community resource developed at The Jackson Laboratory, an independent non-profit research institute that has disseminated mouse genetic data and resources to the biomedical community since its founding. MPD, listed in the Trans-NIH Biomedical Informatics Coordinating Committee registry, accepts submissions of relevant data from the community in order to store, organize, validate, archive, preserve, and distribute the core data from phenotyping experiments to end users in an increasingly FAIR-compliant manner. We have encountered challenges and opportunities in meeting evolving requirements for enhancing MPD’s FAIR capabilities and in meeting TRUST-worthy standards. Our overarching goal is to make MPD more FAIR-compliant and TRUST-worthy while providing better metrics to evaluate the usage, utility, and impact of data in MPD. Our specific tasks for this Supplement are to: 1) implement database changes to support emerging metadata standards for the description of primary experimental data; 2) refine our API so that it exposes data to external systems using emerging metadata standards; 3) refine the user experience and self-curation of data so that submissions meet emerging metadata standards and data submission is intuitive; and 4) develop traceability methods for data and document users’ workflows to enhance reproducibility, tracking, and reporting of data and analytic tool usage. By addressing these challenges together, we will improve data exposure and utilization globally through integration with a modernized informatics infrastructure. Funding provided by NIH DA028420. | |
Nadine Martin, Ph.D., CCC-SLP | Temple University | Translation and Clinical Implementation of a Test of Language and Short-term Memory (STM) in Aphasia: The CORE-APHASIA Collaboratory: Advancing Robust Data Science & Sharing (CARDS) Aphasia is an impairment of language, affecting the production or comprehension of speech and the ability to read or write, that results from stroke, head trauma, or other neurological conditions. Research to improve health-related quality of life for individuals with aphasia has produced rich datasets and knowledge, but this work relies on inefficient data structures that limit research efficiency and leave much of what could be learned from existing datasets untapped. This proposed project will align our existing data platform for aphasia research, CORE-APHASIA, with the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User Focus, Sustainability, and Technology) principles to improve standardization, interoperability, and shareability, resulting in a modernized CORE-APHASIA data resource and platform. Our proposed methods and approach follow the guidance and processes outlined in the NIH Data Science Strategic Plan. | https://hope2.monqcle.com/ |
Vikash Gilja, Ph.D. | University of California, San Diego | Data Repository for “CRCNS: Avian Model for Neural Activity Driven Speech Prostheses” Understanding the physical, computational, and theoretical bases of human vocal communication (speech) is crucial to better understanding voice, speech, and language diseases and disorders, and to improving their diagnosis, treatment, and prevention. Meeting this challenge requires knowledge of the neural and sensorimotor mechanisms of vocal motor control. Our project directly investigates the neural and sensorimotor mechanisms involved in the production of complex, natural vocal communication signals. Our results will directly enhance brain-computer interface technology for communication and will accelerate the development of prostheses and other assistive/augmentative technologies for individuals with communication deficits due to injury or disease. We will develop a vocal prosthetic that translates neural signals in cortical sensorimotor and vocal-motor control regions directly into vocal communication signals output in real time. Building on successes using non-human primates for brain-computer interfaces for general motor control, the prosthetic will be developed in songbirds, whose acoustically rich, learned vocalizations share many features with human speech. Because the songbird vocal apparatus is functionally and anatomically similar to the human larynx, and the cortical regions that control it are closely analogous to speech motor-control areas of the human brain, songbirds offer an ideal model for the proposed studies. Beyond the application of our work to human voice and speech, development of the vocal prosthetic will enable novel speech-relevant studies in the songbird model that can reveal fundamental mechanisms of vocal learning and production. 
As a critical component of the project, we are collecting a large dataset of simultaneously recorded neural activity from implanted multielectrode arrays (e.g., Neuropixels), along with vocalizations and additional behavioral data. These multimodal data are collected over multi-hour sessions during which behaviors are spontaneous and heterogeneous. To enable effective dissemination of these data to the research community, our team will, in alignment with the FAIR and TRUST principles, develop a comprehensive data schema that meets community standards, build software tools to enable broad data reuse, develop a queryable data repository, and provide detailed tutorials. These efforts will contribute to existing, active open-source projects used by the neuroscience community, including Neurodata Without Borders (NWB:N). We believe that investing in the development of this data repository will significantly augment the impact of the data produced by our studies. Additionally, the software engineering tools we develop will have a broader impact on data-intensive neuroscience studies of complex behaviors, including and beyond speech and vocalization. | |
Joshua Orvis, M.S. | Institute for Genome Sciences, University of Maryland School of Medicine | Advancing FAIRness and TRUST in the gEAR portal Discovery in the biological sciences increasingly relies on high-throughput multi-omic data. The advent of single-cell (sc) transcriptomics has further revolutionized research in the ear field, given the intricate structure of the inner ear organs, which consist of numerous distinct cell types that function in concert to support hearing and balance. For biologists not trained in bioinformatics, the value of the wealth of existing and future multi-omic, multi-modality, and multi-species data is limited by the data's size and relative inaccessibility. The gEAR, gene Expression Analysis Resources (umgear.org), is a cloud-based ‘one-stop shop’ where inner ear-related multi-omic data can be viewed and analyzed by biologists without requiring programming skills (Orvis et al., 2021). The gEAR has become an invaluable resource for the ear research field, with over 1,300 users, 850 datasets (over 160 of which are organized in thematic profiles in the public domain), and thousands of monthly visits. The gEAR proposal titled “The gEAR portal - Advancing Data Sharing, Analysis and Discovery for Hearing and Balance Research” was recently funded by a 5-year R01 from the National Institute on Deafness and Other Communication Disorders. Furthermore, the gEAR has been cloned to support the BRAIN Initiative and the neuroscience community through NeMOAnalytics.org. While the gEAR portal and the work planned under the parent R01 address many of the field's needs, they only partially comply with the important principles of FAIR-ness and TRUST-worthiness. This matters because appropriate compliance is necessary not only for portal sustainability but also for enhancing the user experience and portal reliability. 
We have surveyed the gEAR’s capabilities and proposed work plan against the principles of FAIR-ness and TRUST-worthiness and identified three main deficiencies, as well as several other easily resolved opportunities for improvement. The three main areas for improvement are: (1) use of standardized metadata and ontologies; (2) data security; and (3) application of unique dataset identifiers. With this administrative supplement, we will address these shortcomings with a focused 12-month work plan and a dedicated team of data scientists who will work collaboratively to accomplish these goals. | www.umgear.org |
Brian MacWhinney, Ph.D. | Carnegie Mellon University | Improve the Compliance of the CHILDES Project Database with the FAIR and TRUST Principles The goal of this supplement is to improve the compliance of the CHILDES Project database with the FAIR and TRUST principles. To this end, we will pursue the following Specific Aims: Specific Aim #1: Findability/Transparency: Metadata Improvement and Policy Documentation; Specific Aim #2: Accessibility/User Community: Metrics Extraction, Publication Tracking, and Surveys of User Needs; Specific Aim #3: Interoperability/Technology: Interoperability with PLAY, CLARIN-FCS, CoNLL, and OpenNeuro Formats; Specific Aim #4: Reusability/Sustainability/Responsibility: Version Tracking and Data Lineage, and Sustainability through Diversification. | https://talkbank.org |
Carl Kesselman, Ph.D., M.Eng. | University of Southern California | FaceBase Data Hub: Enhancing TRUST-worthiness of the FaceBase Research Data Hub Craniofacial dysmorphologies are among the most common birth defects, and in recognition of this, the major goal of the FaceBase III project is to advance research by creating comprehensive datasets of craniofacial development and dysmorphologies and disseminating these datasets to the wider NIH research community. By design, FaceBase is constructed so that its data are FAIR: Findable, Accessible, Interoperable, and Reusable. However, as the size and scope of FaceBase data grow, the impact of the resource, both within the craniofacial research community and in other communities that can leverage FaceBase data, could be enhanced if the repository also followed the more recently articulated principles of Transparency, Responsibility, User focus, Sustainability, and Technology, or TRUST. While FaceBase already complies with some elements of TRUST, gaps remain that would prevent FaceBase from receiving TRUST certification. The goal of this project is to evolve FaceBase so that TRUST certification can be obtained, thereby expanding the usefulness of FaceBase for the NIH research community. | https://www.facebase.org |
Linda Brzustowicz, M.D., FAPA | Rutgers University | Enhancing Alignment of the NRGR with FAIR and TRUST Principles The goal of the NIMH Repository and Genomics Resource (NRGR) is to further the understanding of the genetic and environmental etiologies of mental disorders. The NRGR receives raw biosamples, such as blood, from NIMH-supported research projects. The NRGR processes these samples to DNA, RNA, cDNA, or cell lines, which can then be used for genomic analyses. The NRGR also receives, curates, and harmonizes clinical/phenotypic data for each subject. Results of genomic analyses on samples in the NRGR are either deposited directly in the NRGR or linked to a deposit in another public repository. After a proprietary period, the clinical data, genomic data, DNA, RNA, cDNA, and cell lines are made available to all NIMH-approved researchers through a secure web portal. This sharing of uniformly processed biological samples and curated clinical and genomic data from many cohorts leverages the NIMH investment in genetic studies. It provides critical research power by making a very large body of data available for studying the genetic bases of individual mental disorders. Since October 1998, >250K subject samples have been submitted to the NRGR, and >615K DNA samples and >15K cell lines have been distributed. There have been >1,700 distributions of clinical and genomic data to >1,000 investigators, resulting in >1,000 publications using NRGR samples and data. This supplement request is to strengthen aspects of the NRGR to enhance alignment with the FAIR and TRUST principles. We will pursue two project areas: the first will strengthen our system security by pursuing FISMA compliance and implementing a more robust backup/failover system, and the second will enhance broad data use by developing a new web portal that facilitates easy access to item-level responses in detailed clinical datasets. 
Both projects are extensions of work proposed in the parent grant application and do not alter the original scope of the approved application. | https://www.nimhgenetics.org/ |
Dalane Kitzman, M.D. | Wake Forest University School of Medicine | Enhancing an Integrated Data Bank for Aging Studies The presentation focuses on the plan for improving the data repository, Integrated Aging Studies Databank and Registry (IASDR). Initially developed by the Wake Forest Pepper Older Americans Independence Centers (OAIC), and currently hosted by the OAIC Coordinating Center (CC), the IASDR is a repository designed for sharing data across the NIH-funded national network of OAICs, which currently includes 14 geographically dispersed OAICs across the U.S. These centers form a major network of aging research resources, both locally and nationally. The CC aims to foster collaborations that produce synergy within the OAIC network and the aging research community, and to develop the younger generation of aging researchers. The IASDR provides an important shared collection for achieving these goals. As a domain-specific repository, the existing IASDR contains multiple data sets generated from aging studies conducted at the Wake Forest OAIC (WF-OAIC), as well as a biospecimen bank for samples collected from some of the included studies. The IASDR functions as a biomedical data repository: investigators from the OAIC network can follow a two-step process to access data and biospecimens in the repository: (1) use the link (https://www.peppercenter.org/public/dspIASDR.cfm) to explore the registry (metadata) component of the IASDR and identify data sets and/or biospecimen samples, and (2) request the identified data and/or biospecimens. However, there are important gaps when the IASDR is evaluated against modern repository requirements. Historically, the infrastructure underlying the IASDR was developed after the inception of the CC in early 2000 by a team of CC computer scientists and programmers, and the development process did not necessarily adhere to contemporary biomedical data repository principles such as the FAIR and TRUST principles. 
Thus, from the perspective of the desirable attributes of a data repository (e.g., data discoverability, interoperability, and reuse), the current IASDR has deficits. These include data that are not sufficiently described with rich metadata (FAIR F2 and R1) and a lack of clear and accessible data usage license information (FAIR R1.1). Importantly, users face barriers in accessing key information for decision support. For example, summary statistics (such as means, standard deviations, and frequencies for variables) and information on associations between variables, which are often critical in helping users decide whether to pursue the actual data, are currently absent from the IASDR metadata. In the presentation, we will discuss how we plan to address the gaps and limitations of the IASDR related to its “FAIR”-ness and “TRUST”-worthiness along two specific directions: (1) developing metadata features for the IASDR that efficiently provide users with key information (both visual and textual) for decision support, using both traditional statistical methods and machine learning methods to derive the necessary information; and (2) enhancing the IASDR along the Desirable Characteristics of Repositories (DCR) 5 criteria. We will evaluate the results of the enhancement and seek certification of the IASDR repository. | https://www.peppercenter.org/public/dspIASDR.cfm |
Carol Bult, Ph.D. | The Alliance of Genome Resources and the Mouse Genome Database, The Jackson Laboratory | Aligning the Alliance of Genome Resources with FAIR and TRUST principles The Alliance of Genome Resources (Alliance) is a ‘knowledge commons’ for expertly curated information about model organism genetics, genomics, gene function, phenotype, and models of human disease. The community model organism databases for six major model organisms (mouse, rat, zebrafish, fly, worm, and yeast) and the Gene Ontology Consortium (GOC) are the founding members of the Alliance. The Alliance consortium provides unique and transformative support for comparative genomics through common user interfaces designed for both human readability and computational accessibility. The Alliance mission is to develop and maintain sustainable genome information resources that facilitate the use of diverse model organisms in understanding the genetic and genomic basis of human biology, health, and disease. The expertly curated annotations and data from the Alliance are also critical for research focused on developing machine learning approaches for data-driven predictive biology. The resource development standards of the Alliance reflect the characteristics expected of a modern data ecosystem that operates under the data management principles of Findability, Accessibility, Interoperability, and Reuse (FAIR) in support of rigor and reproducibility. These standards include unique persistent identifiers, well-structured metadata, expert curation, methods for quality assurance, open access, common data formats, provenance, and clear use guidelines. 
To strengthen the position of the Alliance as a community resource, we will (1) address any issues related to the FAIRness of the data sources that contribute to the Alliance noted in a FAIRShake assessment conducted under the auspices of the NIH Common Fund Data Ecosystem Project, (2) complete the Alliance’s application to obtain Core Trust Seal certification, and (3) update the information about the Alliance’s data and annotation licensing terms hosted on the (Re)usable Data Project (RDP). Finally, we will work with the broader community of data resource developers to establish common standards and guidelines for evaluating the usage and impact of community resources. | http://informatics.jax.org https://alliancegenome.org |
Jeffrey Grethe, Ph.D. | University of California, San Francisco | Open Data Commons for Traumatic Brain Injury (ODC-TBI) Trauma to the spinal cord and brain collectively affects over 2.5 million individuals in the US, with an annual economic impact of $80 billion in medical and loss-of-productivity costs. Despite the prevalence of neurotrauma, the precise pathophysiological features predicting recovery remain poorly understood. This gap in knowledge limits scientific reproducibility, therapeutic translation, and biomarker discovery for precision medicine applications. Part of the problem is that CNS trauma is intrinsically complex, involving heterogeneous damage to the most complex organ system in the body. Spinal cord injury (SCI) and traumatic brain injury (TBI) result in multifaceted syndromes spanning heterogeneous data sources and multiple scales of analysis. In addition, these injuries often occur at multiple sites within the CNS, with graded severities producing heterogeneous injuries with diverse outcome trajectories. Making sense of this complexity requires pooling data across multiple injury severities and types, and grappling with complex correlations among diverse scales of analysis spanning the molecular, anatomical, physiological, and behavioral levels. Large-scale data resources and big-data tools have the potential to help. By pooling and harmonizing diverse data at a granular level, it becomes possible to make neurotrauma data “Findable, Accessible, Interoperable, and Reusable” (FAIR). FAIR neurotrauma data can then be harnessed using modern analytics to uncover novel relationships between functional recovery and underlying pathophysiology, directing discovery of biomarkers and optimizing outcome assessments for translation. The TOP-NT award is harmonizing preclinical data across animal models to build a novel, federated preclinical neurotrauma repository: the Open Data Commons for Traumatic Brain Injury. 
This supplement will work on improving the FAIRness and TRUSTworthiness of the ODC-TBI through the following activities: 1) Develop the necessary elements within the ODC-TBI to better support the TRUST principles; 2) Develop the necessary elements within the ODC-TBI to improve the FAIRness of data within the ODC-TBI; 3) Enhance the documentation on the ODC-TBI in support of FAIRness and TRUSTworthiness; and 4) In collaboration with project partners, take part in and contribute to planned workshops. | https://odc-tbi.org |
Darrell Hurt, Ph.D. | National Institute of Allergy and Infectious Diseases, National Institutes of Health | Increased Interconnectivity for Database of Antimicrobial Activity and Structure of Peptides (DBAASP) Leverage standards, APIs, and modern cloud architecture to connect unique, manually curated antimicrobial peptide data to large, international standard repositories. The Database of Antimicrobial Activity and Structure of Peptides (DBAASP) is a leading source of data on antimicrobial peptides and their activity against pathogens. The project is a joint effort between the NIAID Office of Cyber Infrastructure and Computational Biology (OCICB) and the Laboratory of Bioinformatics of the Ivane Beritashvili Center of Experimental Biomedicine (IBCEB) in Tbilisi, Georgia. This supplemental funding will enable several enhancements to better align the database with FAIR and TRUST principles. Data will become easier to find with the creation of a new identifier system for entries that can be easily referenced by outside resources and publications. External identifiers from UniProt and possibly other major repositories will also be added to matching protein entries to increase cross-site interoperability. Additional enhancements to front- and back-end technologies will provide a more responsive user interface on both desktop and mobile platforms, as well as a more powerful API and improved site security. We expect these FAIR and TRUST improvements to provide a more fruitful user experience and to help DBAASP remain a valuable resource for the research community for many years to come. | https://dbaasp.org |
Meghan McCarthy, Ph.D. | National Institute of Allergy and Infectious Diseases, National Institutes of Health | Making 3D Data “FAIR” with NIH 3D Improving information architecture and data management of digital 3D models related to bioscience and medicine for use with 3D printing and advanced visualization technologies. The NIH 3D Print Exchange (3DPX; https://3dprint.nih.gov) is a community-driven portal launched by NIAID in 2014 to support sharing of digital designs for 3D printing of models related to bioscience and medicine. The repository includes over 10,000 entries and over 100,000 3D model files. In addition, we host free web tools that convert raw data from molecular structures, microscopy images, and medical imaging data into 3D-printable formats. A major rebuild of the application is already underway and will relaunch in 2022 as “NIH 3D.” The expanded scope includes additional tools, file formats, and optimizations that will encourage increased sharing and creation of 3D models for web-based visualization and virtual and augmented reality in addition to 3D printing. This supplemental funding will be used in support of our objectives to better align with FAIR and TRUST principles: (1) identify and implement taxonomies and ontologies to improve organization and findability of models with domain-specific standards; (2) incorporate digital object identifiers and citation download tools; (3) improve the technical implementation and interface for tracking design versions to ensure that entries can be updated while also maintaining a record of changes that are visible to the user. | |
Quan Chen, Ph.D. | National Institute of Allergy and Infectious Diseases, National Institutes of Health | Project: ‘FHIR’ing up ImmPort: Improving Interoperability of ImmPort Data To improve the interoperability of ImmPort shared data with other clinical resources, we propose to make FHIR-formatted (https://hl7.org/FHIR/index.html) data available from ImmPort. As a first step toward enhancing interoperability of ImmPort shared data, we plan to develop and publish documentation describing attribute mappings between FHIR resources and the ImmPort data model. This work will focus initially on existing data in ImmPort, with consideration given to growth in new datasets from FHIR data providers. This information creates a knowledge foundation on which subsequent development and implementation efforts will build to strengthen the interoperability of ImmPort data. Next, we plan to execute the data transformation process for applicable ImmPort shared data. The transformation process will focus on studies with clinical data content, including COVID-19-related datasets that have already been shared by ImmPort. Providing ImmPort clinical data in FHIR format will allow ImmPort shared data to be connected with other FHIR data resources, increasing its interoperability. | https://www.immport.org/home |
Rebecca Rodriguez, Ph.D. | National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health | Applying FAIR and TRUST Principles for Enhanced Resource Sharing and Sustainable and Reliable Repository Operations The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository (NIDDK-CR) was established in 2003 with the intent of expanding the usefulness of the resources generated by multi-center clinical studies by providing access to a wider research community after each study ends. It has continually expanded its portfolio of studies, data, and biospecimens over time and currently includes 157 studies with over 200 data packages and 138 biospecimen collections with over 14.5M samples, including genetic samples from over 30 studies. Its visibility to the research community, and by extension the user base of researchers requesting data and biospecimens, has increased accordingly. While the NIDDK-CR continues to make significant system and process improvements, both the NIDDK-CR and its users can benefit from additional steps that streamline these processes, improve data and specimen annotation, and encourage wider recognition of its achievements by the scientific community. In addition, technology advances and informatics capabilities developed since the NIDDK-CR’s inception have made it possible to increase the efficiency of NIDDK-CR resource promotion. Such improvements offer an opportunity to expand the research community’s awareness of the NIDDK-CR and increase the potential for reuse of NIDDK-CR resources in novel research. Through the ODSS supplement, we propose securing Core Trust Seal (CTS) certification for the NIDDK-CR. 
CTS certification will demonstrate the quality, transparency, and trustworthiness of NIDDK-CR’s processes to data stakeholders, achieving the following goals:
- Implement the relevant portions of the Desirable Characteristics of Repositories (NOT-OD-21-016) with the aim of strengthening adoption of the FAIR and TRUST principles by pursuing metadata standardization requirements and improved data integrity and validation processes, as well as mapping to and implementation of schema.org.
- Become involved in efforts to help adopt, enhance, or contribute to community-based metrics standards or best practices for metrics that evaluate the usage, utility, and impact of the data repository throughout its life cycle, by outlining a CTS roadmap and obtaining CTS Certification.
- Contribute to community-based standards and best practices for metrics that evaluate the usage and impact of the data repository through plans to implement Common Data Elements (CDEs) and standard metadata, and use advanced technology to increase impact by developing plans for using standard terminology and CDEs for asset annotation.
We are confident that improving the NIDDK-CR system to the level of CTS certification will improve NIDDK-CR resources’ usability and continue to improve alignment with ODSS’ mission to modernize data repositories within the NIH data ecosystem. | https://repository.niddk.nih.gov/home/ |
Jennifer Fostel, Ph.D. | National Institute of Environmental Health Sciences, National Institutes of Health | Ensuring FAIR and TRUST for High-dimensional Environmental Study Data In this project, we will work with an external contractor, BioTeam Inc., with the following objectives: BioTeam Inc. will assist NIEHS in developing a framework that will inform the desired future state of NIEHS databases by using the Chemical Effects in Biological Systems (CEBS) database/knowledgebase as a model for sustainable management of environmental health data in a distributed data ecosystem. BioTeam Inc. will provide advice about the planned future state of CEBS. CEBS contains data produced by the Division of the National Toxicology Program (DNTP) over the past 45 years and is moving toward an integrated Data Warehouse that permits cross-cutting questions to be asked. DNTP is also identifying high-value datasets to host in public CEBS Data Marts for direct user query. In addition, DNTP has collected high-dimensional expression data from some subjects and currently houses these in the SRA and GEO repositories, and it may apply the same model to microbiome and metabolomics data in a CEBS data ecosystem such as Gen3. BioTeam Inc. will advise NIEHS on the following: sustainability and governance considerations; FAIR data sharing and TRUSTworthy repositories; interoperability with NIH data systems; common standards for measuring data use and utility; approaches to modeling storage and personnel costs and to enhancing and partially automating curation; and the complexity of storing high-dimensional data in government repositories while integrating the data for analysis. Finally, BioTeam Inc. will suggest how these solutions might differ if applied to a data system that accepts environmental health data from the public in addition to the data provided by the DNTP. | https://cebs.niehs.nih.gov/cebs/ |