Runtime Prediction of Filter Unsupervised Feature Selection Methods

Teun van der Weij, Venustiano Soancatl Aguilar, Saúl Solorio-Fernández

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

In recent years, the speed and quality of data analysis have been hindered by an increase in data size, an increase in data dimensionality, and the expensive task of data labeling. Much research has been conducted in the field of Unsupervised Feature Selection (UFS) to counteract this hindrance. Specifically, filter UFS methods are popular due to their simplicity and efficiency in counteracting performance problems in unlabeled data analysis. However, this popularity has resulted in a great variety of filter UFS methods, each with its own advantages and disadvantages, making it hard to choose an appropriate method for a particular problem. Unfortunately, an inappropriate method choice can lead to a decrease in research or project quality, and it can render data analysis infeasible due to time constraints. Importantly, terminating a method’s analysis before completion means in most cases that no partial results are obtained either. Previous works on the evaluation of filter UFS methods focused mainly on assessing clustering and classification performance. Although such assessments are very useful, choosing an appropriate method often requires knowledge of the method’s runtime as well. In this paper, we study the runtimes of six popular filter UFS methods using synthetic and real-world datasets. Runtime prediction models were trained on 114 synthetic datasets and tested on 29 real-world datasets. The models showed good performance for four of the six methods. Finally, we present general runtime guidelines for each method. To the best of our knowledge, this is the first paper that investigates methods’ runtimes in this fashion.
Original language: English
Pages (from-to): 138-150
Number of pages: 13
Journal: Research in Computing Science
Volume: 150
Issue number: 8
Publication status: Published - 2022