For more than 25 years, researchers in disciplines such as high-energy physics and mathematics have been able to access the very latest research findings in the online repository known as arXiv (pronounced “archive”).
Here, researchers deposit their preprints—complete and public drafts of scientific documents, not yet certified by peer review—to:
- ensure their findings are quickly and widely disseminated;
- establish priority of their discoveries; and
- invite feedback and discussion to help improve the work.
Unfortunately, despite these benefits, researchers in the life sciences have been slow to share preprints. While arXiv holds over 1.2 million articles, the number of preprints shared in the life and biomedical sciences is estimated to be fewer than 25,000. However, this figure masks significant growth over the past two years, aided by the work of ASAPbio, a scientist-driven initiative to promote the productive use of preprints in the life sciences.
As adoption of preprints has grown, so has the number of platforms that researchers can use to post them. These include arXiv's Quantitative Biology section, PeerJ Preprints, bioRxiv, and SocArXiv. We anticipate this space will become even more crowded, with the emergence of new discipline-specific preprint servers (e.g., ChemRxiv) and publisher-led services.
While it is encouraging that preprints are becoming a recognized part of the scholarly communications ecosystem, the growing number of platforms makes it harder for researchers to discover relevant content and to know, for example, which preprints have undergone initial screening to weed out ethically questionable or unscientific content.
Call to action
To address these issues, the National Institutes of Health (NIH) is working with an international group of research funders to explore the value and feasibility of establishing a Central Service for preprints.
This is a unique opportunity to encourage sharing of preprints in the life sciences and to support the development of core infrastructure that ensures the benefits of preprints are fully realized.
ASAPbio has published a Request for Applications (RFA) to identify potential suppliers to build a Central Service for Preprints in the Life Sciences.
The service would seek to aggregate content from multiple sources, such as the preprint servers listed above, and provide new ways for researchers and machines to search, access and reuse this content.
In addition to NIH, the consortium includes:
- Alfred P. Sloan Foundation
- Canadian Institutes of Health Research
- Department of Biotechnology (India)
- European Research Council
- Helmsley Trust
- Howard Hughes Medical Institute (HHMI)
- Laura and John Arnold Foundation
- Medical Research Council (MRC)
- Simons Foundation
To help clarify our intent, the consortium has set out a series of principles that such a service would need to adopt. These cover issues such as governance, licensing, access, usability and sustainability.
Benefits of a Central Service
Just as PubMed gives researchers an easy way to navigate the peer-reviewed biomedical literature, irrespective of where it was published, we want the Central Service to do the same for preprints, guiding researchers to the most relevant research.
Unlike PubMed, which indexes citations and abstracts, the Central Service will hold the full text of the preprints, enabling a much richer suite of services for the research community.
This could include integrating preprints with the underlying research databases (e.g., nucleotide sequence and chemical compound databases) so that researchers can move seamlessly from the literature to the data and vice versa. As these value-added services develop, they will in turn encourage more researchers to post their preprints.
As funders like Wellcome, HHMI, MRC and others develop policies encouraging researchers to post preprints, it will be increasingly important to know whether these preprints meet agreed standards for screening, ethics and metadata.
The Central Service will only ingest content from third-party preprint servers that meet agreed quality thresholds. In short, a Central Service ID will guarantee that the preprint meets an agreed standard.
We envisage an independent, research-led board will define these standards and direct development of the Central Service.
Uncovering new knowledge
In addition to serving researchers directly, the Central Service will provide machine access to the preprints through an open application programming interface (API).
Because ingested content will be converted to a common format, most likely XML, computers will be able to crawl and mine the material. This will help uncover new associations, stimulating discovery and opening up novel avenues of research and innovation.
Preservation and sustainability
The Central Service will act as a long-term archive, ensuring copies of ingested preprints are available in perpetuity.
Over time, we expect most preprints to be published as peer-reviewed articles. But for those who lack access to the published literature, or lack access in a form that supports machine access and reuse, a long-term and stable archive of preprints will be critical.
In recent years there has been a growing focus on the need for funders to work together to develop and sustain key community resources of this type. We anticipate that international funding partners would adopt a joint funding model for the Central Service if:
- demand from the community remains high and
- a suitable supplier can be identified to deliver the service in line with the principles and at a reasonable cost.