TY - JOUR
T1 - Data-Driven Supervised Learning for Life Science Data
AU - Münch, Maximilian
AU - Raab, Christoph
AU - Biehl, Michael
AU - Schleif, Frank-Michael
PY - 2020/11/6
Y1 - 2020/11/6
N2 - Life science data are often encoded in a non-standard way by means of alpha-numeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures like alignment functions have been employed with great success. The vast majority of more complex data analysis algorithms require fixed-length vectorial input data, asking for substantial preprocessing of life science data. Data-driven measures are widely ignored in favor of simple encodings. These preprocessing steps are not always easy to perform nor particularly effective, with a potential loss of information and interpretability. We present some strategies and concepts of how to employ data-driven similarity measures in the life science context and other complex biological systems. In particular, we show how to use data-driven similarity measures effectively in standard learning algorithms.
UR - https://www.mendeley.com/catalogue/3559dbf1-e57a-3f04-9937-010a33deab4e/
U2 - 10.3389/fams.2020.553000
DO - 10.3389/fams.2020.553000
M3 - Article
SN - 2297-4687
VL - 6
JO - Frontiers in Applied Mathematics and Statistics
JF - Frontiers in Applied Mathematics and Statistics
ER -