TY - GEN
T1 - Automatic Identification of Assumptions from the Hibernate Developer Mailing List
AU - Li, Ruiyin
AU - Liang, Peng
AU - Yang, Chen
AU - Digkas, Georgios
AU - Chatzigeorgiou, Alexander
AU - Xiong, Zhuang
N1 - Funding Information:
This work is partially sponsored by the National Key R&D Program of China with Grant No. 2018YFB1402800. The authors gratefully acknowledge the financial support from the China Scholarship Council.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - During the software development life cycle, assumptions are an important type of software development knowledge that can be extracted from textual artifacts. Analyzing assumptions can help stakeholders, for example, to comprehend software design and thereby facilitate software maintenance. Manual identification of assumptions by stakeholders is time-consuming, especially when analyzing a large dataset of textual artifacts. One promising way to address this problem is to use automatic techniques for assumption identification. In this study, we conducted an experiment to evaluate the performance of existing machine learning classification algorithms for automatic assumption identification, using a dataset extracted from the Hibernate developer mailing list. The dataset is composed of 400 'Assumption' sentences and 400 'Non-Assumption' sentences. Seven classifiers using different machine learning algorithms were selected and evaluated. The experimental results show that the SVM algorithm achieved the best performance (a precision of 0.829, a recall of 0.812, and an F1-score of 0.819). Additionally, according to the ROC curves and the corresponding AUC values, the SVM-based classifier performed better than the other classifiers for the binary classification of assumptions.
KW - Assumption
KW - Automatic Identification
KW - Hibernate
KW - Mailing List
KW - Open Source Software
UR - http://www.scopus.com/inward/record.url?scp=85078098471&partnerID=8YFLogxK
U2 - 10.1109/APSEC48747.2019.00060
DO - 10.1109/APSEC48747.2019.00060
M3 - Conference contribution
AN - SCOPUS:85078098471
T3 - Proceedings - Asia-Pacific Software Engineering Conference, APSEC
SP - 394
EP - 401
BT - Proceedings - 2019 26th Asia-Pacific Software Engineering Conference, APSEC 2019
PB - IEEE Computer Society
T2 - 26th Asia-Pacific Software Engineering Conference, APSEC 2019
Y2 - 2 December 2019 through 5 December 2019
ER -