This thesis presents a set of intrinsically interpretable machine learning models applied to real-world medical datasets, a synthetic dataset, and a publicly available dataset from the UCI repository; these datasets pose the challenges of heterogeneous measurements, imbalanced classes, and systematic missingness. The interpretability of the presented classifiers is expressed in terms of (1) the classifier's confidence in assigning a class label to a presented sample (instead of only a crisp label), (2) straightforward visualization of the decision boundaries the classifier learns for a given problem, (3) implicit computation of feature relevance, and (4) extraction of the typical profile(s) of each learned class (prototypes). The newly introduced classifiers are nearest prototype classifiers (NPCs) belonging to the family of Learning Vector Quantization (LVQ). The thesis first presents angle-dissimilarity-based variants of Generalized Relevance LVQ (GRLVQ), Generalized Matrix Relevance LVQ (GMLVQ), Local metric tensor LVQ (LGMLVQ), and Localized Limited Rank Metric LVQ (LLiRAM LVQ). Next, probabilistic variants of GMLVQ and angle GMLVQ are presented. These models not only perform comparably to Random Forests but also support the extraction of medical knowledge from the datasets on which they are trained. Finally, the thesis introduces a geodesic averaging technique that combines the power of ensembling with the interpretability of the LVQ models.
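The minimal Python sketch below illustrates the nearest-prototype ideas summarized above: a GMLVQ-style squared distance (x - w)ᵀ ΩᵀΩ (x - w), an angle-based dissimilarity, feature relevances read from the diagonal of Λ = ΩᵀΩ, and a soft per-class confidence. It is not the thesis's implementation. The example prototypes and matrix Ω, the simple form 1 - cos θ for the angle dissimilarity, and the softmax confidence with parameter `beta` are illustrative assumptions, and training (the updates of prototypes and Ω) is omitted.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(omega: Matrix, x: Vector) -> List[float]:
    """Map x into the space defined by the linear transform Omega (Omega @ x)."""
    return [sum(o * xi for o, xi in zip(row, x)) for row in omega]


def gmlvq_distance(x: Vector, w: Vector, omega: Matrix) -> float:
    """Squared distance (x - w)^T Omega^T Omega (x - w)."""
    diff = [a - b for a, b in zip(x, w)]
    return sum(p * p for p in project(omega, diff))


def angle_dissimilarity(x: Vector, w: Vector, omega: Matrix) -> float:
    """Assumed simple form: 1 - cos of the angle between Omega x and Omega w."""
    px, pw = project(omega, x), project(omega, w)
    norm = math.sqrt(sum(a * a for a in px)) * math.sqrt(sum(b * b for b in pw))
    if norm == 0.0:
        raise ValueError("angle is undefined for a zero projection")
    return 1.0 - sum(a * b for a, b in zip(px, pw)) / norm


def feature_relevances(omega: Matrix) -> List[float]:
    """Diagonal of Lambda = Omega^T Omega, normalized to sum to one."""
    n_features = len(omega[0])
    diag = [sum(row[j] ** 2 for row in omega) for j in range(n_features)]
    total = sum(diag)
    return [d / total for d in diag]


Dissimilarity = Callable[[Vector, Vector, Matrix], float]


def classify(
    x: Vector,
    prototypes: Sequence[Vector],
    labels: Sequence[str],
    omega: Matrix,
    dissimilarity: Dissimilarity,
    beta: float = 1.0,
) -> Tuple[str, Dict[str, float]]:
    """Return the label of the nearest prototype and a soft confidence per class."""
    # Keep the smallest dissimilarity per class (a class may own several prototypes).
    best: Dict[str, float] = {}
    for w, label in zip(prototypes, labels):
        d = dissimilarity(x, w, omega)
        if label not in best or d < best[label]:
            best[label] = d
    winner = min(best, key=best.get)
    # Hypothetical confidence: softmax over -beta * d, shifted by the minimum for stability.
    d_min = best[winner]
    weights = {label: math.exp(-beta * (d - d_min)) for label, d in best.items()}
    z = sum(weights.values())
    return winner, {label: wt / z for label, wt in weights.items()}


if __name__ == "__main__":
    omega = [[2.0, 0.0], [0.0, 1.0]]
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    labels = ["A", "B"]
    sample = [0.9, 0.2]
    print(feature_relevances(omega))
    for dis in (gmlvq_distance, angle_dissimilarity):
        print(dis.__name__, classify(sample, prototypes, labels, omega, dis))
```

With Ω = [[2, 0], [0, 1]], the first feature receives relevance 0.8 and the second 0.2, and the sample [0.9, 0.2] is assigned to class A under both dissimilarities. The angle variant ignores the length of the projected vectors and compares only their directions, which is one way to make a classifier less sensitive to differences in scale between samples.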
| Detail | Value |
|---|---|
| Qualification | Doctor of Philosophy |
| Place of Publication | [Groningen] |
| Publication status | Published - 2021 |