Data-Driven Supervised Learning for Life Science Data

Maximilian Münch, Christoph Raab, Michael Biehl, Frank-Michael Schleif

Research output: Contribution to journal › Article › Academic › peer-review



Life science data are often encoded in non-standard ways, such as alphanumeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures, such as alignment functions, have been employed with great success on such data. However, the vast majority of more complex data analysis algorithms require fixed-length vectorial input, which demands substantial preprocessing of life science data, and data-driven measures are widely ignored in favor of simple encodings. These preprocessing steps are neither always easy to perform nor particularly effective, and they risk a loss of information and interpretability. We present strategies and concepts for employing data-driven similarity measures in the life sciences and for other complex biological systems. In particular, we show how to use data-driven similarity measures effectively in standard learning algorithms.
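
To make the idea of using a similarity measure directly in a standard learner concrete, the following is a minimal sketch and not the method from the article. It assumes a normalized edit distance as a simple stand-in for an alignment function and a k-nearest-neighbor vote as the standard learning algorithm; the function names, the toy sequences, and the labels are invented for illustration. It shows that variable-length sequences can be classified without first being forced into fixed-length vectors.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (stand-in for an alignment score)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def knn_predict(query, train_seqs, train_labels, k=3):
    """Majority vote among the k training sequences most similar to the query."""
    ranked = sorted(zip(train_seqs, train_labels),
                    key=lambda pair: similarity(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: sequences of different lengths with made-up class labels.
train_seqs = ["ACGTACGT", "ACGTACG", "ACGAACGT", "TTGGCC", "TTGGCCA", "TTGCCA"]
train_labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict("ACGTTCGT", train_seqs, train_labels, k=3)
print(prediction)
```

A neighbor-based learner is used here because it needs only pairwise similarities. Kernel-based learners typically require a positive semi-definite similarity matrix, which an alignment-style measure does not guarantee, so they would need an extra correction step that this sketch leaves out.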

Original language: English
Journal: Frontiers in Applied Mathematics and Statistics
Publication status: Published - 6-Nov-2020
