Automatic Identification of Assumptions from the Hibernate Developer Mailing List

Ruiyin Li, Peng Liang*, Chen Yang, Georgios Digkas, Alexander Chatzigeorgiou, Zhuang Xiong

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

During the software development life cycle, assumptions are an important type of software development knowledge that can be extracted from textual artifacts. Analyzing assumptions can help stakeholders, for example, to comprehend software design and thereby facilitate software maintenance. Manual identification of assumptions by stakeholders is time-consuming, especially when analyzing a large dataset of textual artifacts. One promising way to address this problem is to use automatic techniques for assumption identification. In this study, we conducted an experiment to evaluate the performance of existing machine learning classification algorithms for automatic assumption identification, using a dataset extracted from the Hibernate developer mailing list. The dataset is composed of 400 'Assumption' sentences and 400 'Non-Assumption' sentences. Seven classifiers using different machine learning algorithms were selected and evaluated. The experiment results show that the SVM algorithm achieved the best performance (with a precision of 0.829, a recall of 0.812, and an F1-score of 0.819). Additionally, according to the ROC curves and related AUC values, the SVM-based classifier performed better than the other classifiers for the binary classification of assumptions.
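For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how precision, recall, F1-score, and AUC are commonly computed for a binary 'Assumption' vs. 'Non-Assumption' classifier. It is not the authors' evaluation code: the names `BinaryCounts`, `precision`, `recall`, `f1`, and `auc` are hypothetical, and the counts and scores in the usage lines are made up for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix counts, with 'Assumption' as the positive class."""
    tp: int  # Assumption sentences predicted as Assumption
    fp: int  # Non-Assumption sentences predicted as Assumption
    fn: int  # Assumption sentences predicted as Non-Assumption
    tn: int  # Non-Assumption sentences predicted as Non-Assumption


def precision(c: BinaryCounts) -> float:
    """Share of predicted Assumption sentences that really are assumptions."""
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c: BinaryCounts) -> float:
    """Share of true Assumption sentences that the classifier found."""
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


def f1(c: BinaryCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a random positive scores higher than a random negative (ties = 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Illustrative counts only (hypothetical, not from the paper).
    counts = BinaryCounts(tp=80, fp=20, fn=20, tn=80)
    print(precision(counts), recall(counts), f1(counts))
    # Illustrative classifier scores only (hypothetical).
    print(auc([0.9, 0.7, 0.6], [0.4, 0.65, 0.2]))
```

The pairwise AUC computation is quadratic in the number of sentences, which is acceptable for a dataset of a few hundred sentences per class; a sort-based implementation would be preferable for larger datasets.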

Original language: English
Title of host publication: Proceedings - 2019 26th Asia-Pacific Software Engineering Conference, APSEC 2019
Publisher: IEEE Computer Society
Pages: 394-401
Number of pages: 8
ISBN (Electronic): 9781728146485
Publication status: Published - Dec-2019
Event: 26th Asia-Pacific Software Engineering Conference, APSEC 2019 - Putrajaya, Malaysia
Duration: 2-Dec-2019 to 5-Dec-2019

Publication series

Name: Proceedings - Asia-Pacific Software Engineering Conference, APSEC
Volume: 2019-December
ISSN (Print): 1530-1362

Conference

Conference: 26th Asia-Pacific Software Engineering Conference, APSEC 2019
Country/Territory: Malaysia
City: Putrajaya
Period: 02/12/2019 to 05/12/2019

Keywords

  • Assumption
  • Automatic Identification
  • Hibernate
  • Mailing List
  • Open Source Software
