TY - GEN
T1 - Pattern Recognition and Context Prediction of COVID-19 cases in European Countries
AU - Tosayeva, Arzu
AU - Birihanu, Ermiyas
AU - Tashu, Tsegaye Misikir
N1 - Publisher Copyright: © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
PY - 2023
Y1 - 2023
AB - The global impact of the COVID-19 pandemic has been significant, which requires data analysis to understand trends and patterns. However, this endeavor is challenging due to the complex transmission dynamics and diverse factors that influence the virus's spread. The data associated with COVID-19 is extensive and constantly evolving, and extracting meaningful insights from it is difficult. Therefore, the objective of this study is to analyze the impact of COVID-19 in various European countries, to identify common patterns, and to make predictions within the relevant context. To accomplish this, we used clustering techniques to reveal patterns in COVID-19 cases among European countries. The implementation involved cluster analysis to estimate labels based on cluster size and density while considering relevant background information. Subsequently, a classification model was applied to the labeled dataset. Using the K-Prototypes algorithm and leveraging the Silhouette score for identification, we determined the optimal number of clusters. These clusters were then combined based on density, and the degree of sparsity was assessed. As a result, two clusters emerged: one labeled as "low chance of infection" and the other as "high chance of infection." Using these results, we implemented a classification algorithm, achieving an accuracy rate of 90%. For this study, we gathered data from five different sources, consolidating them into a single dataset. Our findings demonstrate that combining COVID-19 datasets with diverse features enables trend analysis, while the use of clustering algorithms facilitates successful label identification in unsupervised learning scenarios involving unlabeled data. The density and size of clusters prove valuable in estimating labels, enhancing our overall understanding of the data. Our code is publicly available.
KW - Context prediction
KW - COVID-19
KW - Label estimation
KW - Pattern recognition
UR - http://www.scopus.com/inward/record.url?scp=85175856284&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85175856284
T3 - CEUR Workshop Proceedings
SP - 50
EP - 57
BT - 23rd Conference Information Technologies - Applications and Theory, ITAT 2023
PB - CEUR Workshop Proceedings
T2 - 23rd Conference Information Technologies - Applications and Theory, ITAT 2023
Y2 - 22 September 2023 through 26 September 2023
ER -