Data-Driven Supervised Learning for Life Science Data

Maximilian Münch, Christoph Raab, Michael Biehl, Frank-Michael Schleif

Research output: Article (Academic, peer-reviewed)

4 Citations (Scopus)
42 Downloads (Pure)


Life science data are often encoded in non-standard ways, for example as alpha-numeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures, such as alignment functions, have been employed with great success. However, the vast majority of more complex data analysis algorithms require fixed-length vectorial input, so life science data must undergo substantial preprocessing. Data-driven measures are widely ignored in favor of simple encodings. These preprocessing steps are neither always easy to perform nor always effective, and they can cause a loss of information and interpretability. We present strategies and concepts for employing data-driven similarity measures in the life science context and in other complex biological systems. In particular, we show how to use data-driven similarity measures effectively in standard learning algorithms.
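The abstract does not describe a specific method. The sketch below shows one generic way a data-driven similarity can feed a standard learning algorithm: variable-length sequences are compared with a normalised edit distance, which stands in here for an alignment function. Each sequence is then represented by its vector of similarities to a fixed set of landmark sequences (a "similarity-space" representation), and those fixed-length vectors go to a simple nearest-centroid classifier. This is an illustrative assumption, not the authors' approach. The helpers `levenshtein`, `similarity`, `embed` and `nearest_centroid`, as well as the toy sequences and labels, are hypothetical.

```python
from typing import Dict, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (unit costs) via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for an alignment score: 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def embed(seqs: Sequence[str], landmarks: Sequence[str]) -> List[List[float]]:
    """Map variable-length sequences to fixed-length similarity vectors."""
    return [[similarity(s, lm) for lm in landmarks] for s in seqs]


def nearest_centroid(train_x: List[List[float]], train_y: Sequence[str],
                     query: List[float]) -> str:
    """Standard vectorial classifier applied in the similarity space."""
    centroids: Dict[str, List[float]] = {}
    for c in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(u: List[float], v: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(u, v))

    return min(centroids, key=lambda c: sq_dist(query, centroids[c]))


# Toy usage: the training sequences themselves serve as landmarks.
train = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTGGGGC"]
labels = ["A", "A", "B", "B"]
X = embed(train, train)
query_vec = embed(["ACGTACG"], train)[0]
prediction = nearest_centroid(X, labels, query_vec)
```

Any vectorial learner could replace the nearest-centroid step. Using training sequences as landmarks is only one choice. Note that a raw similarity matrix need not be positive semi-definite, so kernel methods would require additional care.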

Original language: English
Journal: Frontiers in Applied Mathematics and Statistics
Status: Published - 6 Nov 2020
