Abstract
Dimensionality reduction methods, also known as projections, are often used to explore multidimensional data in machine learning, data science, and information visualization. However, several such methods, such as the well-known t-distributed stochastic neighbor embedding and its variants, are computationally expensive for large datasets, suffer from stability problems, and cannot directly handle out-of-sample data. We propose a learning approach to construct such projections. We train a deep neural network on a sample set drawn from a given data universe and its corresponding two-dimensional projections, computed with any user-chosen technique. Next, we use the network to infer projections of any dataset from the same universe. Our approach generates projections with characteristics similar to the learned ones, is computationally two to four orders of magnitude faster than existing projection methods, has no complex-to-set user parameters, handles out-of-sample data in a stable manner, and can be used to learn any projection technique. We demonstrate our proposal on several real-world high-dimensional datasets from machine learning.
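The pipeline the abstract describes (project a sample with any technique, train a network on the resulting pairs, then infer projections for unseen data) can be sketched as follows. This is a minimal illustration under assumed library choices, not the authors' implementation: scikit-learn's `TSNE` stands in for the user-chosen projection technique and `MLPRegressor` for the deep neural network.

```python
# Hedged sketch of the learned-projection idea: fit a regressor on
# (high-dimensional sample, 2-D projection) pairs produced by t-SNE,
# then infer projections for out-of-sample points in a single forward pass.
# TSNE and MLPRegressor are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor

X = load_digits().data.astype(np.float32)
train, test = X[:500], X[500:600]  # sample set vs. out-of-sample data

# 1. Project the training sample with any user-chosen technique (here t-SNE).
Y_train = TSNE(n_components=2, random_state=0).fit_transform(train)

# 2. Learn the mapping from high-dimensional space to 2-D.
net = MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=500,
                   random_state=0)
net.fit(train, Y_train)

# 3. Infer projections of unseen data from the same universe.
Y_test = net.predict(test)
print(Y_test.shape)  # (100, 2)
```

Because inference is a single forward pass, projecting new data avoids rerunning the expensive projection algorithm, which is the source of the reported speedup and the stable out-of-sample behavior.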
| Original language | English |
|---|---|
| Pages (from-to) | 247-269 |
| Number of pages | 23 |
| Journal | Information Visualization |
| Volume | 19 |
| Issue number | 3 |
| Early online date | 18-May-2020 |
| DOIs | |
| Publication status | Published - Jul-2020 |
Keywords
- dimensionality reduction
- machine learning
- multidimensional projections
- nonlinear dimensionality reduction
- eigenmaps