TY - JOUR
T1 - Subspace corrected relevance learning with application in neuroimaging
AU - van Veen, Rick
AU - Tamboli, Neha Rajendra Bari
AU - Lövdal, Sofie
AU - Meles, Sanne K.
AU - Renken, Remco J.
AU - de Vries, Gert Jan
AU - Arnaldi, Dario
AU - Morbelli, Silvia
AU - Clavero, Pedro
AU - Obeso, José A.
AU - Rodriguez-Oroz, Maria C.
AU - Leenders, Klaus L.
AU - Villmann, Thomas
AU - Biehl, Michael
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2024/3
Y1 - 2024/3
N2 - In machine learning, data often comes from different sources, but combining them can introduce extraneous variation that affects both generalization and interpretability. For example, we investigate the classification of neurodegenerative diseases using FDG-PET data collected from multiple neuroimaging centers. However, data collected at different centers introduces unwanted variation due to differences in scanners, scanning protocols, and processing methods. To address this issue, we propose a two-step approach to limit the influence of center-dependent variation on the classification of healthy controls and early vs. late-stage Parkinson's disease patients. First, we train a Generalized Matrix Learning Vector Quantization (GMLVQ) model on healthy control data to identify a “relevance space” that distinguishes between centers. Second, we use this space to construct a correction matrix that restricts the training of a second GMLVQ system on the diagnostic problem. We evaluate the effectiveness of this approach on real-world multi-center datasets and on a simulated artificial dataset. Our results demonstrate that the approach produces machine learning systems with reduced bias, which are more specific because information related to center differences is eliminated during training, and with more informative relevance profiles that can be interpreted by medical experts. This method can be adapted to similar problems outside the neuroimaging domain, as long as an appropriate “relevance space” can be identified to construct the correction matrix.
AB - In machine learning, data often comes from different sources, but combining them can introduce extraneous variation that affects both generalization and interpretability. For example, we investigate the classification of neurodegenerative diseases using FDG-PET data collected from multiple neuroimaging centers. However, data collected at different centers introduces unwanted variation due to differences in scanners, scanning protocols, and processing methods. To address this issue, we propose a two-step approach to limit the influence of center-dependent variation on the classification of healthy controls and early vs. late-stage Parkinson's disease patients. First, we train a Generalized Matrix Learning Vector Quantization (GMLVQ) model on healthy control data to identify a “relevance space” that distinguishes between centers. Second, we use this space to construct a correction matrix that restricts the training of a second GMLVQ system on the diagnostic problem. We evaluate the effectiveness of this approach on real-world multi-center datasets and on a simulated artificial dataset. Our results demonstrate that the approach produces machine learning systems with reduced bias, which are more specific because information related to center differences is eliminated during training, and with more informative relevance profiles that can be interpreted by medical experts. This method can be adapted to similar problems outside the neuroimaging domain, as long as an appropriate “relevance space” can be identified to construct the correction matrix.
KW - Generalized Matrix Learning Vector Quantization (GMLVQ)
KW - Learning vector quantization
KW - Multi-source data
KW - Neuroimaging
KW - Relevance learning
UR - http://www.scopus.com/inward/record.url?scp=85183826493&partnerID=8YFLogxK
U2 - 10.1016/j.artmed.2024.102786
DO - 10.1016/j.artmed.2024.102786
M3 - Article
AN - SCOPUS:85183826493
SN - 0933-3657
VL - 149
JO - Artificial Intelligence in Medicine
JF - Artificial Intelligence in Medicine
M1 - 102786
ER -