TY - JOUR
T1 - Recommendations for the creation of benchmark datasets for reproducible artificial intelligence in radiology
AU - Sourlos, Nikos
AU - Vliegenthart, Rozemarijn
AU - Santinha, Joao
AU - Klontzas, Michail E.
AU - Cuocolo, Renato
AU - Huisman, Merel
AU - van Ooijen, Peter
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
AB - Various healthcare domains have witnessed successful preliminary implementation of artificial intelligence (AI) solutions, including radiology, though limited generalizability hinders their widespread adoption. Currently, most research groups and industry have limited access to the data needed for external validation studies. The creation and accessibility of benchmark datasets to validate such solutions represent a critical step towards generalizability, for which an array of aspects ranging from preprocessing to regulatory issues and biostatistical principles come into play. In this article, the authors provide recommendations for the creation of benchmark datasets in radiology, explain current limitations in this realm, and explore potential new approaches. Clinical relevance statement: Benchmark datasets, facilitating validation of AI software performance, can contribute to the adoption of AI in clinical practice. Key Points: Benchmark datasets are essential for the validation of AI software performance. Factors like image quality and representativeness of cases should be considered. Benchmark datasets can help adoption by increasing the trustworthiness and robustness of AI.
KW - Artificial intelligence (AI) software
KW - Benchmark dataset
KW - Bias
KW - Radiology
KW - Validation
UR - http://www.scopus.com/inward/record.url?scp=85206351465&partnerID=8YFLogxK
DO - 10.1186/s13244-024-01833-2
M3 - Article
AN - SCOPUS:85206351465
SN - 1869-4101
VL - 15
JO - Insights into Imaging
JF - Insights into Imaging
M1 - 248
ER -