OBJECTIVES: To evaluate the performance of a novel convolutional neural network (CNN) for the classification of typical perifissural nodules (PFN).
METHODS: Chest CT data from two centers in the UK and The Netherlands (1668 unique nodules, 1260 individuals) were collected. Pulmonary nodules were classified into subtypes, including "typical PFNs" on-site, and were reviewed by a central clinician. The dataset was divided into a training/cross-validation set of 1557 nodules (1103 individuals) and a test set of 196 nodules (158 individuals). For the test set, three radiologically trained readers classified the nodules into three nodule categories: typical PFN, atypical PFN, and non-PFN. The consensus of the three readers was used as reference to evaluate the performance of the PFN-CNN. Typical PFNs were considered as positive results, and atypical PFNs and non-PFNs were grouped as negative results. PFN-CNN performance was evaluated using the ROC curve, confusion matrix, and Cohen's kappa.
RESULTS: Internal validation yielded a mean AUC of 91.9% (95% CI 90.6-92.9) with 78.7% sensitivity and 90.4% specificity. For the test set, the reader consensus rated 45/196 (23%) of nodules as typical PFN. The classifier-reader agreement (k = 0.62-0.75) was similar to the inter-reader agreement (k = 0.64-0.79). Area under the ROC curve was 95.8% (95% CI 93.3-98.4), with a sensitivity of 95.6% (95% CI 84.9-99.5), and specificity of 88.1% (95% CI 81.8-92.8).
CONCLUSION: The PFN-CNN showed excellent performance in classifying typical PFNs. Its agreement with radiologically trained readers is within the range of inter-reader agreement. Thus, the CNN-based system has potential in clinical and screening settings to rule out perifissural nodules and increase reader efficiency.
KEY POINTS: • Agreement between the PFN-CNN and radiologically trained readers is within the range of inter-reader agreement. • The CNN model for the classification of typical PFNs achieved an AUC of 95.8% (95% CI 93.3-98.4) with 95.6% (95% CI 84.9-99.5) sensitivity and 88.1% (95% CI 81.8-92.8) specificity compared to the consensus of three readers.
|Status||Published - jun.-2021|