BD2K Program Management Working Groups

The potential and challenges of Big Data in biomedical research cut across many domains. The program management working groups bring together individuals interested in supporting research into solutions that accelerate the translational impact of biomedical data science and ensure a sound foundation for the future of biomedicine as a digital enterprise. These groups gather information to support BD2K programmatic directions, analyze and interpret that information, propose and plan funding opportunities, shepherd grants through the review and funding process, and manage funded programs.

BD2K Centers


BD2K has been established to develop new approaches, methods, software, tools, and related resources and to provide training to advance data science in the context of biomedical research. BD2K Centers are large-scale efforts that take on challenges that are not feasible to address with a standard R01 grant. The twelve centers form a Consortium in which each center collaborates with the others, as well as with other domestic and international efforts in data science. The BD2K Centers work in areas spanning data science, producing tools and resources, from early-stage to mature, for the biomedical research community. The results and products of the BD2K Centers should be useful and generalizable to meet the needs of the broad biomedical research community.


LINCS-BD2K Perturbation Data Coordination and Integration Center

The LINCS-BD2K Center is funded to provide Big Data-based research, method and tool development, and data science training activities for the NIH Common Fund-supported Library of Integrated Network-Based Cellular Signatures (LINCS) project. This project is producing massive quantities of biological information about the responses of cells and tissues to perturbation by drug and small molecule treatments. In addition to collaborating with the other LINCS program grantees, the LINCS-BD2K Center is a member of the BD2K Centers Consortium.

BD2K Centers of Excellence for Big Data Computing

The 11 BD2K Centers of Excellence for Big Data Computing are large-scale projects that aim to develop new approaches, methods, software tools, and related resources, as well as to provide training to advance Big Data science in the context of biomedical research. The Centers form a consortium which also includes the LINCS-BD2K Center and the other BD2K awardees. This enables them to perform collaborative projects in addition to their individual research.


Mark Guyer, consultant,
Lisa Brooks, Program Director, NHGRI,
Vinay Pai, Program Director, NIBIB,



Identify actionable steps that NIH can take (alone and with others) to facilitate research use of clinical data by funded investigators. The WG will develop proposals for BD2K activities that can create the knowledge, infrastructure, and tools needed to support improved use of clinical data in research, and will maintain regular surveillance of relevant initiatives being conducted by groups outside BD2K to identify unmet needs that might be filled by BD2K. Clinical data includes data collected in clinical research (e.g., in interventional and observational studies), in clinical care (e.g., as recorded in Electronic Health Records (EHRs)), and in patient registries. It also includes newer forms of personal health data, such as Personal Health Records (PHRs), direct-to-consumer tests, mHealth apps, “smart” devices, environmental data, and social media data.


Enabling Research Use of Clinical Data

This workshop convened a multidisciplinary group of stakeholders with interest and involvement in the use of clinical data for research. Recommendations from this workshop fell into the following conceptual areas: Improve Access; Increase the Quality and Quantity of clinical data available for research; Spur innovation in analytical methods and tools for research involving clinical data; and Facilitate effective uptake and use of resulting research findings to improve health and health care.

EHR Data Methodologies in Clinical Research: Perspectives from the Field 

This workshop convened a small number of experts to address methods for optimizing the robustness and use of data from Electronic Health Records (EHRs) for a variety of clinical research purposes that fall within NIH’s domain.


Leslie Derr,
Jerry Sheehan



A shared and interoperable environment intended to facilitate access and catalyze the use, reuse, interoperability and discoverability of shared digital research objects.


A number of pilots are currently being developed to test the Commons concept.
Credits business model: distribution of the Commons credits and payment for cloud computing services.
Projects to test the Commons framework will be forthcoming.


Vivien Bonazzi,
George Komatsoulis

Resource Indexing

Software Discovery and Sustainability


Approaches for Discovering, Citing, and Tracking Biomedical Software


Software Discovery Workshop

The workshop was organized around three major sessions: Finding and Tracking Software; Software Citation and Other Incentives; and Software Reproducibility.

The Software Discovery workshop explored the challenges and opportunities associated with citing, tracking, and sharing biomedical software. The aim was to understand approaches for making software easier to locate via computer-readable metadata, digital identifiers, and other innovative methods. In addition, the workshop focused on identifying the needs of biomedical software users and developers as they seek to find, cite, and use these tools in biomedical research. Potential barriers and incentives to the adoption and use of these different discovery, citation, and tracking methods were discussed.


Ishwar Chandramouliswaran,
Vivien Bonazzi,

Data Discovery Index


The Data Discovery Index Working Group supports the BD2K effort to enhance the discoverability, access, interoperability and re-use of biomedical big data through the formation of an index that will operate in the Commons as a key element of the NIH Digital Enterprise.


Biological and HealthCare Data Discovery and Indexing Ecosystem (bioCADDIE)

bioCADDIE seeks to develop a prototype Data Discovery Index (DDI) that will enable finding, accessing, and citing biomedical big data. bioCADDIE has a Community Engagement mandate to work with the broader biomedical community to better identify data and other digital objects, so that researchers can find shared data in ways that allow for extracting maximal knowledge. bioCADDIE is the recipient of a grant under the Data Discovery Index Coordination Consortium FOA.


Ronald Margolis,
Jennie Larkin,


Community-Based Data Standards


The Community-based Data and Metadata Standards Workgroup recognizes that implementation of high-quality data and metadata standards is essential for promoting data access and reuse and for fully capitalizing on the explosion of biomedical ‘big data’. The workgroup aims to 1) establish an internal NIH framework of policies, governance, administrative procedures, and funding to routinely support community-based data and metadata standards efforts; 2) use that framework to provide catalytic extramural research support for particularly opportune efforts under BD2K that are broadly relevant to NIH research; and 3) integrate the framework for supporting community-based standards efforts into other BD2K activities to identify and capitalize on potential synergisms.


Workshop of Community-based Data and Metadata Standards Development: Best Practices to Support Healthy Development and Maximize Impact

This workshop will bring together various stakeholders in standards development to recognize common technical, social and financial pain points and possible NIH assistance mechanisms to better support data standards development.


Cindy Lawler,

NIH Standards Information Resource


The BD2K NSIR Working Group is working towards establishing a coordination and information center focused on biomedical data and metadata standards, which will bring together information about the diverse standards relevant to biomedical research. The NSIR will work closely with national and international standards bodies and resources to be complementary to other ongoing efforts. In the future, grant seekers creating data management plans will be pointed to the NSIR for guidance on which standards may be appropriate for their research. Standards referenced in the NSIR will link to other relevant BD2K and NIH resources.


Sherri de Coronado, NCI;



With growing datasets and flat research budgets around the world, the challenge of sustaining repositories is straining the capacity of current funding approaches to fulfill the need. It is essential to develop new, innovative business models. The Sustainability Working Group explores economic, technical, policy and administrative approaches toward enhancing long-term sustainability of biomedical data repositories.


Request for Information:

Supplements to Support Interoperability of NIH-Funded Biomedical Data Repositories


Allen Dearry,

Targeted Software Development


The Targeted Software Development Working Group supports the development of innovative analytical methods and software tools with the objective of addressing critical current and emerging needs of the biomedical research community for using, managing, and analyzing the larger and more complex data sets inherent to biomedical Big Data.


Development of Software and Analysis Methods for Biomedical Big Data in Targeted Areas of High Need (U01)

BD2K Think Tank: Game Developers and Biomedical Researchers

This think tank explored opportunities for, and began to address the challenges of, collaboration between these two communities: how they currently work together, exchange data science and visualization expertise, and develop games for enabling and performing biomedical research that addresses important science and health issues that affect everyone.


David Miller,



The primary functions of this working group are to 1) identify training needs, 2) develop initiatives to solicit applications to meet those needs, 3) counsel potential applicants, 4) make recommendations regarding applications once peer-reviewed, 5) manage the award process and progress, and 6) evaluate how well the initiatives respond to the BD2K training goals. This working group has developed a suite of 10 BD2K Training FOAs to help meet the challenges of biomedical Big Data for diverse audiences.


Training of Biomedical Data Scientists

Through predoctoral training programs and career development awards, biomedical data scientists are being trained to develop new methods and tools for biomedical Big Data.  These trainees gain knowledge and skills in the computational, quantitative (math and statistics), and biomedical sciences.

Short courses and Open Educational Resources in Data Management and Data Science

To expose trainees to basic introductory material in data management and data science, the development of open educational resources (such as MOOCs, modules, etc.) is supported through the R25 mechanism.  To complement online didactic learning, grants are being awarded to support in-person courses and research experiences, with a particular focus on enhancing diversity.


Erica Rosemond, Program Director, NIMH, Co-Chair, BD2K Training Subcommittee,

This page last reviewed on September 5, 2018