TY - JOUR
T1 - Identifiers for the 21st century
T2 - How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
AU - McMurry, Julie A.
AU - Juty, Nick
AU - Blomberg, Niklas
AU - Burdett, Tony
AU - Conlin, Tom
AU - Conte, Nathalie
AU - Courtot, Melanie
AU - Deck, John
AU - Dumontier, Michel
AU - Fellows, Donal K.
AU - Gonzalez-Beltran, Alejandra
AU - Gormanns, Philipp
AU - Grethe, Jeffrey
AU - Hastings, Janna
AU - Heriche, Jean-Karim
AU - Hermjakob, Henning
AU - Ison, Jon C.
AU - Jimenez, Rafael C.
AU - Jupp, Simon
AU - Kunze, John
AU - Laibe, Camille
AU - Le Novere, Nicolas
AU - Malone, James
AU - Martin, Maria Jesus
AU - McEntyre, Johanna R.
AU - Morris, Chris
AU - Muilu, Juha
AU - Mueller, Wolfgang
AU - Rocca-Serra, Philippe
AU - Sansone, Susanna-Assunta
AU - Sariyar, Murat
AU - Snoep, Jacky L.
AU - Soiland-Reyes, Stian
AU - Stanford, Natalie J.
AU - Swainston, Neil
AU - Washington, Nicole
AU - Williams, Alan R.
AU - Wimalaratne, Sarala M.
AU - Winfree, Lilly M.
AU - Wolstencroft, Katherine
AU - Goble, Carole
AU - Mungall, Christopher J.
AU - Haendel, Melissa A.
AU - Parkinson, Helen
PY - 2017/6/29
Y1 - 2017/6/29
AB - In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
KW - GENE NAME ERRORS
KW - ONTOLOGIES
KW - COMMUNITY
U2 - 10.1371/journal.pbio.2001414
DO - 10.1371/journal.pbio.2001414
M3 - Comment/Letter to the editor
C2 - 28662064
SN - 1545-7885
VL - 15
JO - PLOS BIOLOGY
JF - PLOS BIOLOGY
IS - 6
M1 - 2001414
ER -