Abstract
As the dimensionality of datasets used in predictive modelling continues to grow, feature selection becomes increasingly important. Datasets with complex feature interactions and high levels of redundancy still present a challenge to existing feature selection methods. We propose a novel feature selection framework that relies on boosting, i.e. sample re-weighting, to select sets of informative features in classification problems. The method builds on the feature rankings derived from fast and scalable tree-boosting models such as XGBoost. We compare the proposed method to standard feature selection algorithms on 9 benchmark datasets. We show that the proposed approach reaches higher accuracy with fewer features on most of the tested datasets, and that the selected features have lower redundancy.
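The abstract outlines the idea at a high level: repeatedly train a tree-boosting model, harvest its feature ranking, and re-weight samples between rounds. The exact re-weighting rule and stopping criterion are not given here, so the sketch below is only an illustration of that loop; the function name `boosted_feature_selection`, the AdaBoost-style doubling of misclassified-sample weights, and the `n_rounds`/`k_per_round` parameters are assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch (not the paper's method): iterative feature selection
# driven by XGBoost feature rankings and sample re-weighting.
import numpy as np
from xgboost import XGBClassifier

def boosted_feature_selection(X, y, n_rounds=5, k_per_round=3):
    """X: (n_samples, n_features) array; y: integer-encoded class labels.
    Returns a sorted list of selected feature indices."""
    n_samples = X.shape[0]
    weights = np.full(n_samples, 1.0 / n_samples)  # start uniform
    selected = set()
    for _ in range(n_rounds):
        model = XGBClassifier(n_estimators=100)
        model.fit(X, y, sample_weight=weights)
        # Rank features by the model's importance scores and keep the
        # top-k features not already selected.
        ranking = np.argsort(model.feature_importances_)[::-1]
        new = [int(f) for f in ranking if f not in selected][:k_per_round]
        selected.update(new)
        # Assumed re-weighting rule: up-weight misclassified samples so
        # the next round focuses on the harder points, then renormalize.
        mistakes = model.predict(X) != y
        weights[mistakes] *= 2.0
        weights /= weights.sum()
    return sorted(selected)
```

Any scheme of this shape trades off the two ingredients the abstract names: the ranking quality of the underlying boosted trees and the diversity induced by re-weighting, which is what discourages redundant features from being picked in consecutive rounds.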
| Original language | English |
|---|---|
| Article number | 115895 |
| Journal | Expert Systems with Applications |
| Volume | 187 |
| Early online date | 16-Sept-2021 |
| DOIs | |
| Publication status | Published - Jan-2022 |
Keywords
- Feature selection
- Boosting
- Ensemble learning
- XGBoost