TY - JOUR
T1 - Early detection of violating Mobile Apps
T2 - A data-driven predictive model approach
AU - Mohsen, Fadi
AU - Karastoyanova, Dimka
AU - Azzopardi, George
N1 - Publisher Copyright: © 2022 The Authors
PY - 2022/12
Y1 - 2022/12
N2 - Mobile app stores are the key distributors of mobile applications. They regularly apply vetting processes to the deployed apps. Yet some of these vetting processes might be inadequate or applied too late, and the late removal of apps might have unpleasant consequences for developers and users alike. Therefore, in this work, we propose a data-driven predictive approach that determines whether a given app will be removed or accepted. It also indicates the relevance of each feature, which helps stakeholders interpret the predictions. In turn, our approach can support developers in improving their apps and help users download the apps that are less likely to be removed. We focus on the Google Play Store and compile a new data set of 870,515 applications, 56% of which have been removed from the market. Our proposed approach is a bootstrap aggregation (bagging) of multiple XGBoost classifiers. We propose two models: a user-centered model using 47 features, and a developer-centered model using 37 features that are available before an app is published. We achieve the following areas under the ROC curve (AUCs) on the test set: user-centered = 0.792, developer-centered = 0.762.
AB - Mobile app stores are the key distributors of mobile applications. They regularly apply vetting processes to the deployed apps. Yet some of these vetting processes might be inadequate or applied too late, and the late removal of apps might have unpleasant consequences for developers and users alike. Therefore, in this work, we propose a data-driven predictive approach that determines whether a given app will be removed or accepted. It also indicates the relevance of each feature, which helps stakeholders interpret the predictions. In turn, our approach can support developers in improving their apps and help users download the apps that are less likely to be removed. We focus on the Google Play Store and compile a new data set of 870,515 applications, 56% of which have been removed from the market. Our proposed approach is a bootstrap aggregation (bagging) of multiple XGBoost classifiers. We propose two models: a user-centered model using 47 features, and a developer-centered model using 37 features that are available before an app is published. We achieve the following areas under the ROC curve (AUCs) on the test set: user-centered = 0.792, developer-centered = 0.762.
KW - Actions
KW - Android
KW - App stores
KW - Broadcast receivers
KW - Mobile apps
KW - Permissions
KW - Predictive analysis
KW - Privacy
KW - Third-party apps
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85140021687&partnerID=8YFLogxK
U2 - 10.1016/j.sasc.2022.200045
DO - 10.1016/j.sasc.2022.200045
M3 - Article
AN - SCOPUS:85140021687
VL - 4
JO - Systems and Soft Computing
JF - Systems and Soft Computing
SN - 2772-9419
M1 - 200045
ER -