With the biomedical sciences transitioning to a digital enterprise, the need for data science (computational and quantitative) skills in research is becoming increasingly apparent. While a few individuals may possess all the skills needed to appropriately analyze the large amounts of data being generated, most projects will require teams of specialists – both data science and biomedical – as equal collaborators. These teams need individuals with enough overlap in knowledge and skills to be able to communicate effectively with one another. Examples of such teams exist, but they are few in number.
The NIH is interested in fostering the development of individuals and teams with the right skills to tackle the biomedical problems of the future. Such development often requires resources such as money, time, and personal commitment. Funding calls are incentives that address money and, to the extent that time can be reallocated by “buying out” other commitments, time. A personal commitment must come from internal motivation, which is harder to address. Biomedical researchers might be motivated by the knowledge that solving interesting biomedical problems requires working in a collaborative biomedical data science team. Data scientists might be motivated by the opportunity to contribute to a lofty mission that benefits all of humanity, and they may be incentivized to work with a particular team by being treated as equal intellectual collaborators.
Creating an environment in which biomedical data science teams can thrive requires embracing not just what data science can do for science but also how to get there: how to build a team and how to function well as a team. These added dimensions are among the issues addressed by the Science of Team Science (SciTS). SciTS studies processes that improve team-based research and produces practical resources, such as toolkits [www.teamsciencetoolkit.cancer.gov], that facilitate team formation. These toolkits address issues such as finding potential collaborators, developing a shared language, setting expectations, and measuring outcomes. We focus on the first two, since they are general rather than project-specific.
To form a team, the collaborators first need to find one another. This sounds simple, but data scientists and biomedical scientists alike often highlight it as a problem. Data science can help solve this problem by matching biomedical scientists with data scientists as potential collaborators. Data science already helps people find potential soulmates on sites such as match.com and eHarmony.com; why not apply the same kinds of algorithms to help find potential teammates? Such pre-screening of potential team members could be based on customizable criteria such as scientific expertise, physical location, and goals, both for the individual and for the team. Final selection, of course, is highly personal and requires an investment of time and energy, but data science can help screen candidates and suggest potential teams based on the information provided. Although the usefulness of the technology depends heavily on the quality of the data and information supplied, such algorithms can improve the efficiency with which biomedical data science teams form.
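As a rough illustration of pre-screening on customizable criteria, the Python sketch below ranks candidate collaborators by a weighted sum of per-criterion scores. It is not the algorithm used by match.com or eHarmony.com, which is not described here; the `Profile` fields, the Jaccard overlap for goals, the same-place test for location, and the example weights are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical researcher profile; the field names are illustrative only."""
    name: str
    expertise: set[str]
    location: str
    goals: set[str]
    # Skills this person hopes a collaborator will bring.
    seeking: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score(seeker: Profile, candidate: Profile, weights: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each of which lies in [0, 1]."""
    # Fraction of the sought-after skills that the candidate has.
    expertise = (len(seeker.seeking & candidate.expertise) / len(seeker.seeking)
                 if seeker.seeking else 0.0)
    # Crude assumption: location counts only when both are in the same place.
    location = 1.0 if seeker.location == candidate.location else 0.0
    goals = jaccard(seeker.goals, candidate.goals)
    return (weights["expertise"] * expertise
            + weights["location"] * location
            + weights["goals"] * goals)


def shortlist(seeker: Profile, candidates: list[Profile],
              weights: dict[str, float], k: int = 3) -> list[str]:
    """Names of the k highest-scoring candidates, best first."""
    ranked = sorted(candidates, key=lambda c: score(seeker, c, weights),
                    reverse=True)
    return [c.name for c in ranked[:k]]


if __name__ == "__main__":
    biologist = Profile("A", {"genomics"}, "Bethesda", {"cancer", "drug response"},
                        seeking={"machine learning", "statistics"})
    candidates = [
        Profile("B", {"machine learning", "statistics"}, "Boston", {"cancer"}),
        Profile("C", {"statistics"}, "Bethesda", {"cancer", "drug response"}),
        Profile("D", {"web design"}, "Bethesda", {"e-commerce"}),
    ]
    weights = {"expertise": 0.6, "location": 0.1, "goals": 0.3}
    print(shortlist(biologist, candidates, weights, k=2))  # → ['B', 'C']
```

Passing the weights in as a parameter is what keeps the criteria customizable: a team that plans to work entirely remotely could, for instance, set the location weight to zero. The output is only a shortlist; final selection remains a personal decision.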
The collaborators, each a specialist in some area, may have complementary areas of expertise with little or no overlap. Well-functioning teams, however, require some overlap in knowledge among the specialists. With it, team members can speak the same language, understand the interface between their own work and that of their teammates, and ultimately interpret results correctly, recognizing the complexities and subtleties inherent in the data, its analysis, and the science behind the data. If this overlap does not already exist, individual scientists, whether biomedical, computational, or quantitative, need to acquire it.
A wide variety of short courses exist for acquiring a basic working knowledge of scientific topics, including NIH-funded and other courses, offered both in person and online. With so many courses available, finding the ones that are right for a given individual is a challenge. Some work in this area has already begun, for example through TESS, ELIXIR’s portal for discovering training materials. Personalizing the discovery of training materials is an area where data science can help. Retailers such as Amazon continuously recommend new books and products to their customers. Training courses can likewise be treated as products: a course could be recommended based on an individual’s current knowledge, what he or she wants to learn, and how well the material fits his or her learning style. Recommendation engine technology could be adapted to all types of biomedical training materials, provided each is described with a minimal set of appropriate metadata. One day, such technology will help biomedical data science teams gain overlapping expertise and a shared language.
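To make the recommendation idea concrete, the Python sketch below ranks courses for a learner using only a few metadata fields per course. It is a simple content-based approach chosen for illustration, not how Amazon or TESS actually recommend items; the `Course` and `Learner` fields, the use of delivery format as a stand-in for learning style, and the `format_bonus` value are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    """Hypothetical minimal metadata for one training resource."""
    title: str
    teaches: frozenset[str]
    prerequisites: frozenset[str]
    format: str  # assumed proxy for learning style, e.g. "online", "in-person"


@dataclass(frozen=True)
class Learner:
    knows: frozenset[str]
    wants: frozenset[str]
    preferred_format: str


def relevance(learner: Learner, course: Course, format_bonus: float = 0.2) -> float:
    """Score a course for a learner; -1.0 marks a course they are not ready for."""
    if not course.prerequisites <= learner.knows:
        return -1.0
    gap = learner.wants - learner.knows            # topics still to be learned
    covered = course.teaches & gap                 # part of the gap this course fills
    if not covered:
        return 0.0
    score = len(covered) / len(gap)
    if course.format == learner.preferred_format:
        score += format_bonus
    return score


def recommend(learner: Learner, courses: list[Course], k: int = 3) -> list[str]:
    """Titles of up to k useful courses, best first."""
    scored = [(relevance(learner, c), c.title) for c in courses]
    scored = [s for s in scored if s[0] > 0]       # drop useless or premature courses
    scored.sort(key=lambda s: s[0], reverse=True)
    return [title for _, title in scored[:k]]


if __name__ == "__main__":
    data_scientist = Learner(
        knows=frozenset({"python", "statistics"}),
        wants=frozenset({"genomics basics", "sequencing technologies", "python"}),
        preferred_format="online",
    )
    catalog = [
        Course("Intro to Genomics", frozenset({"genomics basics"}),
               frozenset(), "online"),
        Course("NGS Data Analysis",
               frozenset({"sequencing technologies", "genomics basics"}),
               frozenset({"python"}), "in-person"),
        Course("Advanced Variant Calling", frozenset({"variant calling"}),
               frozenset({"sequencing technologies"}), "online"),
        Course("Python 101", frozenset({"python"}), frozenset(), "online"),
    ]
    print(recommend(data_scientist, catalog))
    # → ['NGS Data Analysis', 'Intro to Genomics']
```

Checking prerequisites before ranking keeps the sketch from suggesting material the learner is not yet ready for, and scoring against the remaining knowledge gap means courses on topics already mastered drop out. A practical system would also need a shared topic vocabulary so that metadata from different course providers can be compared, which this sketch simply assumes.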
Applying data science can help attain both goals: finding potential collaborators and developing a shared language among them. The ultimate goal, however, is not to apply data science for its own sake but to facilitate the development of collaborative teams of data scientists and biomedical scientists that can tackle the data-driven problems that will move biomedical science forward. Some collaborative teams already exist at the interface of biomedical science and parts of data science, for example in computational biology and systems biology. As data grow in size and complexity, this interface will also grow, and biomedical data science will become the norm rather than the exception. Data science can be used to improve the process of forming collaborative biomedical data science teams, and these teams in turn will play a key role in driving biomedical science forward.