DataScience@NIH

Driving Discovery Through Data

A Primer on the Certifications of a Trusted Digital Repository (TDR)
04.20.17

Data sharing and reuse are rapidly gaining ground. But researchers remain understandably cautious about the quality and trustworthiness of data collected by others.

To move forward with the grand visions for Big Data, we must overcome this suspicion and find ways to ensure the data we store and make available are trustworthy.

But what does “trustworthiness” mean in this context?

Trustworthiness encompasses both the quality of the data and sustainable, reliable access to it. Together, these enhance scientific reproducibility by ensuring the data are selected, collected, organized, and stored using agreed-upon, established criteria.

But what are those criteria?

The European Framework for Audit and Certification proposed three levels of certification for a Trusted Digital Repository (TDR). Each level has different requirements to address different needs. The three certification levels are Core, Extended, and Formal, also referred to as Bronze, Silver, and Gold.

The levels are not meant to convey a hierarchy or superiority/inferiority as much as to satisfy the minimum standards for different types of data, such as basic research data vs. human health data vs. financial transaction data.

The major assessment areas are the same for all three levels and include:

  • Organization
  • Management of intellectual entities and representations
  • Infrastructure
  • Security

The differences reside in how the audit is performed and the number of factors evaluated in that audit.

The following table summarizes the three levels and how their certifications differ:


| Level | Core | Extended | Formal |
|---|---|---|---|
| Organization(s) | WDS: ICSU World Data System; DSA: Data Seal of Approval | DIN: German Institute for Standardization | ISO: International Organization for Standardization |
| No. of Requirements | 16 | 34 | 100+ |
| Audit Process | Self-assessment + independent peer review (2) | Self-assessment + independent peer review (2) | ISO certified audit with accredited auditors |
| Certification Cost | Free | €500 | $10,000 |
| Designation | World Data System logos or Data Seal of Approval | nestor Seal for Trustworthy Digital Archives | TBD |
| Certification Lifespan | 3 years | Indefinite | 3 years |
| No. of Certified Repositories | 130+ (WDS, DSA) | | Coming Soon |

Ultimately, the choice of certification depends on how much a repository is willing to invest in its perceived prestige and good operational practices. Certification costs include both the certification fees themselves—which range from free of charge to $10,000 per year—and the time or personnel costs spent on preparation. The latter can be substantial because certifying a repository can take months, depending on its maturity and audit readiness.

A certification’s lifespan also varies. The Extended level of certification is valid indefinitely, but will need to be updated to stay relevant, while the Core and Formal levels are valid for three years each. Changes in technology and user needs can also drive re-certification.
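As a rough illustration of the lifespan rules described here, the following Python sketch records the three levels and estimates when a certification would come up for renewal. It is only a sketch under stated assumptions: the names `CertificationLevel`, `LEVELS`, and `renewal_due` are hypothetical, the requirement counts and lifespans are taken from this post, and the actual renewal rules of each certifying body may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CertificationLevel:
    """One certification level as summarized in this post (hypothetical type)."""
    name: str
    requirements: str               # kept as text because Formal is listed as "100+"
    lifespan_years: Optional[int]   # None means the certification is valid indefinitely


# Figures as reported in the post; treat them as illustrative, not authoritative.
LEVELS = {
    "core": CertificationLevel("Core", "16", 3),
    "extended": CertificationLevel("Extended", "34", None),
    "formal": CertificationLevel("Formal", "100+", 3),
}


def renewal_due(level_key: str, certified_on: date) -> Optional[date]:
    """Return the date a certification lapses, or None if it is indefinite."""
    level = LEVELS[level_key]
    if level.lifespan_years is None:
        return None
    target_year = certified_on.year + level.lifespan_years
    try:
        return certified_on.replace(year=target_year)
    except ValueError:
        # Certified on Feb 29 and the target year is not a leap year.
        return certified_on.replace(year=target_year, day=28)


if __name__ == "__main__":
    for key in LEVELS:
        print(key, renewal_due(key, date(2017, 4, 20)))
```

Returning `None` for the Extended level reflects the point that it has no fixed expiry, although, as noted, a repository would still need to update it to stay relevant.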

Currently, due to the time needed to train auditors, the ISO has yet to certify a repository at the Formal level. 

A fourth option, outside the European framework, arose from the Center for Research Libraries (CRL), a consortium of academic libraries in the United States and Canada. The CRL’s approach to certifying data repositories uses the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC). TRAC is based on the concepts of the Open Archival Information System (OAIS; ISO 14721:2002) and is the precursor checklist to ISO 16363:2012. While the CRL is not an accredited certifying body, this group has used TRAC to audit digital repositories since 2005.

Where does this leave us?

Two decades in, we are still early enough in the Big Data era that what constitutes “trustworthiness,” and how to identify and evaluate its attributes (as the certification options above attempt to do), are still evolving. The validity and helpfulness of certifications themselves are also under debate, as some experts highlight the potential for them to be misused or misunderstood.

As the conversations continue, NIH is committed to working with the community to determine the best ways to validate and certify data. Last year, NIH issued a Request for Information (RFI) on Metrics to Assess Value of Biomedical Digital Repositories, and the 2016 BD2K All-Hands Meeting focused on the topic during a session on sustainability. Based on stakeholder feedback, we know there is a clear and immediate need to evaluate the value and performance of data repositories. TDR certifications offer an interesting and potentially useful avenue to address that.    

-------

Acknowledgments:

The author would like to thank Allen Dearry (NIEHS), Susan Gregurick (NIGMS), and Gabriel Rosenfeld (NIAID) for insightful feedback, edits, and stimulating discussions. David Giaretta, Mustapha Mokrane, Christian Keitel, and Marie Waltz helped review and edit the factual content of the certification descriptions for ISO, WDS/DSA, DIN, and TRAC, respectively.

About the Author:

Dr. Dawei Lin is the Associate Director for Bioinformatics and the Senior Advisor to the Director of the Division of Allergy, Immunology, and Transplantation (DAIT) at the National Institute of Allergy and Infectious Diseases (NIAID). In this capacity, Dr. Lin leads a Data Science Group in developing and administering scientific and infrastructure programs, including ImmPort, Informatics Methodology and Secondary Analyses for Immunology Data, and the Statistical and Clinical Coordinating Center (SACCC) for DAIT-sponsored clinical research. This Data Science Group also uses Big Data-based approaches in grant portfolio analysis and advises DAIT on data sharing issues and policy. In his trans-NIH efforts, Dr. Lin is the Program Officer for bioCADDIE, NIH's Big Data to Knowledge (BD2K) Data Discovery Index initiative, and is actively participating in discussions regarding sustainability issues for data repositories.

Comments

"The levels are not meant to convey a hierarchy or superiority/inferiority as much as to satisfy the minimum standards for different types of data, such as basic research data vs. human health data vs. financial transaction data."

The European Framework for Audit and Certification states, "The framework will consist of a sequence of three levels, in increasing trustworthiness." The Framework also indicates that Formal certification can be achieved via full external audit and certification to DIN 31644.

Hi Mark, thank you for pointing out the original definition. I have heard different opinions about the evolving landscape of data repository certifications. It seems that the three levels have evolved into independent entities, which are administered by different organizations. There is little evidence that those organizations are coordinating to form a continuous ladder of trustworthiness. Therefore, I used a more pragmatic definition and hoped to reflect the current status. However, there might be a better way to describe the purposes and usefulness of certifications. In addition, things continue to change. For example, Mustapha Mokrane of WDS told me at the recent RDA meeting that WDS and DSA are going to merge into one certification in the near future, instead of the two that exist right now. The purpose of this post is to solicit input and to stimulate discussion on this important topic. Thank you for starting the conversation.

The fourth option, TRAC, arose from a cultural heritage community-wide effort co-chaired by Robin Dale, then from the Research Libraries Group (RLG) and myself, representing the National Archives and Records Administration (NARA). In July 2006 CRL acquired RLG, thereby becoming a co-host of the effort. Since NARA, as a federal agency, does not copyright its products, and RLG no longer existed, CRL copyrighted TRAC. CRL played no role in the development of TRAC beyond employing Robin Dale and allowing her to complete the written report. TRAC is one aspect of the cultural heritage community's effort, beginning in the 1990s, to develop a universal standard for evaluating digital preservation efforts that culminated in ISO 16363 as mentioned.

Hi Bruce, thank you for filling in the historical information about the development of TRAC. Do you have any documents to recommend for reference?

As the director of a data repository (Inter-university Consortium for Political and Social Research), I became a strong supporter of TDR certification. ICPSR was certified by both DSA and WDS (before they merged), and ICPSR was audited in 2006 as part of the development of TRAC, which became the ISO standard. On one hand, TDR certification was a way to demonstrate our commitment to ICPSR's research community. Data producers, users, and sponsors (like NIH) have a right to expect that their data will be FAIR and persistent over time. On the other hand, the certification process promotes self-awareness and organization development. Data repositories are often created by researchers without formal training in archival science. The certification process helps a repository to develop its policies and practices in a focused and coherent way. For example, when ICPSR applied for data security authorization, we already had many policies required by FISMA because of our compliance with TRAC.

I would also emphasize the importance of TDR certification as a demonstration of sustainability. Digital objects are easily lost if they are not continuously managed. In my opinion, all data repositories should have a succession plan that details how the data will be preserved if the repository is forced to close. One model for succession is in Article 9 of the Data-PASS MOU, which has been signed by 8 social science data repositories. (See http://data-pass.org/sites/default/files/Data-PASS_MoU_201504.pdf)

Hi George,
Thank you for sharing your perspectives. Ingrid Dillo pointed out a report to me about the benefits of DSA certification based on a survey of ~50 repositories in 2016. http://www.ncdd.nl/wp-content/uploads/2016/10/201611_DE_Houdbaar_Report_...

Hi
As Mark points out, the original aim of the European Framework was a hierarchy of increasing trustworthiness. This was guided by a part of the EU which was funding projects about digital preservation. The aim was to avoid having completely unrelated certification systems, which would make inter-comparisons very difficult. That was a worthy aim, but as Dawei says, that European Framework really does not exist, in that the certification systems have been evolving and, specifically, the funders are no longer pushing integration.

Therefore perhaps it would have been better not to mention that Framework at all, and yet clearly there is a hierarchy in terms of the number and depth of the requirements and the process behind each of the 3 (by the way I agree with Bruce's comment about TRAC, it was always meant to be a step along the way to ISO 16363). Therefore I can understand Dawei's use of that Framework.

As is often joked, the great thing about standards is that there are so many to choose from! But the important truth is that there is a choice, both for the users and the creators of the standards.

I chair the group that created ISO 16363, and we made a choice many years ago that we should go with the process which underpins so many of the things we depend on for our safety, from food to environment to technology and so on, i.e. the ISO process and specifically ISO 17021 as the way to audit. It seemed to us that digitally encoded information was so important that it needed the same level of trustworthiness as all those other tangible things.

Repository managers, and their funders, have a choice which will be determined by many factors; I hope that they make the right decisions otherwise society may be significantly poorer in future.

One last point in the context of Big Data; the 3 different systems you discuss now have a clearer common link to OAIS (ISO 14721), in that understandability and usability are included in the requirements but I am sure that ISO 16363 has the stronger, clearer and deeper roots in that standard, which was aimed at the preservation of all types of digital objects, including scientific data.

Hi David,
Thank you for sharing your perspectives. I increasingly hear people say that the three-tier structure was the original intent but is not the focus of current certification development.

Response from the DSA and WDS: the CoreTrustSeal Board.

The WDS-DSA Common Board (now the CoreTrustSeal Board), which is the successor to the Data Seal of Approval and World Data System certifications, thanks Dr. Dawei Lin for his ‘Primer on the Certifications of a Trusted Digital Repository (TDR)’ and would like to share a few thoughts in response.

We understand that many long standing institutions which steward data, including research data, are trusted by their data depositors, data users and beyond. The certification process is intended to go beyond these established trust relationships to confer a more formalised notion of ‘trustworthiness’ against some set of agreed criteria.

Dr Lin’s post notes, accurately, that certification options are evolving. He also notes community questions about the validity, helpfulness and potential for misunderstanding and misuse of TDR standards. DSA-WDS agree that openness, flexibility of options for a range of repository types, and a commitment to evolve with the data management community are critical to consistency and improvement. As all these TDR approaches encompass the organisation as a whole (infrastructure, digital object management, security etc.) the preparation of responses provides an excellent opportunity for internal communications and cooperation within applicants. Both the DSA and the WDS have experience of applicants reporting how new and interesting conversations were started and avenues for improvement were opened by the initial self-assessment process and the subsequent peer review.

The CoreTrustSeal process requires that final, successful applications are public, providing an accessible route for sharing peer experiences and evidence in a standardised form. Over and above the certifications themselves, we see the requirements as a way to drive improved internal communications, business information management and sharing of good practice between peer repositories. The CoreTrustSeal Board (and all its members) are actors in this space; we must be responsive to the needs of data repositories, depositors and users as we improve our certification standards, processes and our community as a whole.

One example of active community change, as mentioned in the post, is the ‘grand vision of big data’. Both the OAIS itself and the TDR standards will need to demonstrate that they remain valid in the face of these new and novel forms of data which offer new research opportunities. We face an increasingly wide number of actors in the data/big data space, with a complex range of services, partnerships, research infrastructures and outsourcing agreements to support the variety of high volume, high velocity data in play. How does the notion of certifying a traditional ‘archive’ need to change when each actor may undertake a subset of traditional repository functions?

Whether an archive, a temporary data steward (such as a research project), or a service (such as a multiple redundant bit-level storage provider) seeks certification will depend on a mixture of local practice and funder/customer expectations.

From the CoreTrustSeal perspective we would expect that most of the evidence required for Core certification is required to run a consistent, sustainable repository operation. On the one hand, if you don’t maintain procedures and records while deploying the right skills to manage appropriately secured data, then you have bigger problems than being ‘uncertified’. On the other hand, we’ve seen the DSA and WDS criteria used as the starting point for defining (and seeking funding for) better practice. The initial costs of preparing such evidence have benefits beyond achieving TDR status and the maintenance of such evidence simply forms part of ongoing business management. The incremental cost of updating evidence statements against 16 Requirements every three years is likely to be minimal in comparison. We know that adapting internal documentation for public consumption can be a challenge, but adjusting to this change of audience is a ‘one off’ rather than a recurring cost to the applicant and supports effective communication with depositors, users and funders.

Being prepared to make public statements about practice is critical to Core certification as the applicant must eventually present their claims before all their peers, as well as before the reviewers. This is important in a process which involves re-certification only every three years without the ongoing internal and external audits which form part of the ISO process.

Since the post was originally published the Primary Trustworthy Digital Repository Authorisation Body Ltd (PTAB at iso16363.org) has been accredited to perform certifications against ISO16363 by NABCB in India. These will be undertaken in line with the standard requirements set out in ISO17021 and further clarified for repositories in ISO16919. These processes and the ISO16363 metrics themselves can now be validated and evaluated in the real world.

Though some adoption is indicated by the (similar, in terms of metrics) TRAC audits to date, an appetite for the level of investment required by formal audit to ISO16363 has yet to be demonstrated. In any case it seems unlikely that all the data (research and otherwise) that we value will be held in ISO16363 certified repositories in the immediate future. With this in mind the CoreTrustSeal Board sees value to the community in offering Core certification.

None of the TDR options are specific to a particular discipline or type of data (WDS and DSA originally came from the ‘hard’ science and social science/humanities sectors respectively). Stakeholder perception, legal obligations, appetite for financial investment, and size of organisation may be better indicators of which certification choice a repository makes. For organisations with large budgets the financial and administrative implications of ISO certification may be offset by the benefits of rigorous and repeated external review and the perceived reputation of ISO as a standards body. Repositories holding high security level data or holding sensitive personal data may take the ISO16363 route, though ISO27001 for information security (as referenced by ISO16363) may be more mission-relevant in these cases. There is no undertaking for organisations audited against ISO16363 to make their evidence public so there is no comparable contribution to the community knowledge base.

David Giaretta (the driving force behind the OAIS, ISO16363 and the PTAB) expresses some doubts about the notion of a stepped framework of increasing trustworthiness in his comments below the original post.

Of course, the existence of any perceived hierarchy depends on standards which serve a need and are in demand. It is ultimately an individual and community choice. Several current DSA Board members’ organisations undertook test audits against ISO16363 when the standard was being finalised and the quality and potential value of the standard as a formal certification approach is not in doubt. OAIS remains the common community reference point for all the referenced standards so any effort to undertake the Core certification route will remain of value when approaching Extended and Formal Certification.
