TY - JOUR
T1 - Development and external validation of automated detection, classification, and localization of ankle fractures
T2 - inside the black box of a convolutional neural network (CNN)
AU - Machine Learning Consortium
AU - Prijs, Jasper
AU - Liao, Zhibin
AU - To, Minh Son
AU - Verjans, Johan
AU - Jutte, Paul C.
AU - Stirler, Vincent
AU - Olczak, Jakub
AU - Gordon, Max
AU - Guss, Daniel
AU - DiGiovanni, Christopher W.
AU - Jaarsma, Ruurd L.
AU - IJpma, Frank F.A.
AU - Doornberg, Job N.
AU - Aksakal, Kaan
AU - Barvelink, Britt
AU - Beuker, Benn
AU - Bultra, Anne Eva
AU - Oliviera, Luisa e Carmo
AU - Colaris, Joost
AU - de Klerk, Huub
AU - Duckworth, Andrew
AU - ten Duis, Kaj
AU - Fennema, Eelco
AU - Harbers, Jorrit
AU - Hendrickx, Ran
AU - Heng, Merilyn
AU - Hoeksema, Sanne
AU - Hogervorst, Mike
AU - Jadav, Bhavin
AU - Jiang, Julie
AU - Karhade, Aditya
AU - Kerkhoffs, Gino
AU - Kuipers, Joost
AU - Laane, Charlotte
AU - Langerhuizen, David
AU - Lubberts, Bart
AU - Mallee, Wouter
AU - Mhmud, Haras
AU - El Moumni, Mostafa
AU - Nieboer, Patrick
AU - Oude Nijhuis, Koen
AU - van Ooijen, Peter
AU - Oosterhoff, Jacobien
AU - Rawat, Jai
AU - Ring, David
AU - Schilstra, Sanne
AU - Schwab, Joseph
AU - Sprague, Sheila
AU - de Vries, Jean Paul
AU - Wendt, Klaus
N1 - Funding Information:
One author (JP) certifies that he has received an amount less than USD 15,000 from the Michael van Vloten Foundation (Rotterdam, The Netherlands), an amount less than USD 10,000 from ZonMw (Den Haag, The Netherlands), and an amount less than USD 10,000 from the Prins Bernhard Cultuur Fonds (Amsterdam, The Netherlands). One author (JND) certifies that he has received an unrestricted Postdoc Research Grant from the Marti-Keuning-Eckhardt Foundation.
Publisher Copyright:
© 2022, The Author(s).
PY - 2023
Y1 - 2023
AB - Purpose: Convolutional neural networks (CNNs) are increasingly being developed for automated fracture detection in orthopaedic trauma surgery. Studies to date, however, are limited to providing classification based on the entire image—and only produce heatmaps for approximate fracture localization instead of delineating exact fracture morphology. Therefore, we aimed to answer (1) what is the performance of a CNN that detects, classifies, localizes, and segments an ankle fracture, and (2) would this be externally valid? Methods: The training set included 326 isolated fibula fractures and 423 non-fracture radiographs. The Detectron2 implementation of the Mask R-CNN was trained with labelled and annotated radiographs. The internal validation (or ‘test set’) and external validation sets consisted of 300 and 334 radiographs, respectively. Consensus agreement between three experienced fellowship-trained trauma surgeons was defined as the ground truth label. Diagnostic accuracy and area under the receiver operating characteristic curve (AUC) were used to assess classification performance. The Intersection over Union (IoU) was used to quantify accuracy of the segmentation predictions by the CNN, where a value of 0.5 is generally considered an adequate segmentation. Results: The final CNN was able to classify fibula fractures according to four classes (Danis-Weber A, B, C and No Fracture) with AUC values ranging from 0.93 to 0.99. Diagnostic accuracy was 89% on the test set with average sensitivity of 89% and specificity of 96%. External validity was 89–90% accurate on a set of radiographs from a different hospital. Accuracies/AUCs observed were 100/0.99 for the ‘No Fracture’ class, 92/0.99 for ‘Weber B’, 88/0.93 for ‘Weber C’, and 76/0.97 for ‘Weber A’. For the fracture bounding box prediction by the CNN, a mean IoU of 0.65 (SD ± 0.16) was observed. The fracture segmentation predictions by the CNN resulted in a mean IoU of 0.47 (SD ± 0.17). Conclusions: This study presents a look into the ‘black box’ of CNNs and represents the first automated delineation (segmentation) of fracture lines on (ankle) radiographs. The AUC values presented in this paper indicate good discriminatory capability of the CNN and substantiate further study of CNNs in detecting and classifying ankle fractures. Level of evidence: II, Diagnostic imaging study.
KW - Ankle
KW - Artificial Intelligence
KW - CNN
KW - Lateral Malleolus
DO - 10.1007/s00068-022-02136-1
M3 - Article
AN - SCOPUS:85142065660
SN - 1863-9933
SP - 1057
EP - 1069
JO - European Journal of Trauma and Emergency Surgery
JF - European Journal of Trauma and Emergency Surgery
ER -