TY - GEN
T1 - Extracting software modules as communities
AU - Sas, Cezar
AU - Capiluppi, Andrea
N1 - Publisher Copyright: Copyright 2020 for this paper by its authors.
PY - 2020
Y1 - 2020
AB - Component Based Software Engineering (CBSE) is a development discipline based on the availability of software components that are described and indexed for internal or external, present or future, reuse. Although reusable components are expected to be designed from scratch, doing so is often time-consuming and expensive. An alternative is to extract such components from pre-existing object-oriented (OO) software. In this work, we compare two community detection algorithms for extracting components from existing software. Treating 'components' as 'communities', we evaluate how independent, yet cohesive, the components extracted by community detection algorithms are. Using a small sample of 3 Java systems, we show how components can be extracted based on structural information. Furthermore, we consolidate the extracted components using semantic information to ensure their cohesion. We use three document representation techniques to evaluate the internal cohesion of components. The results show that both algorithms perform well, each with its own strengths. Leiden extracts less cohesive but better separated and better clustered components that depend less on similar ones. Infomap, on the other hand, creates more cohesive, slightly overlapping clusters that are more likely to depend on other semantically similar components.
KW - Community Detection
KW - Components Identification
KW - Components Semantic Analysis
UR - http://www.scopus.com/inward/record.url?scp=85111386975&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85111386975
T3 - CEUR Workshop Proceedings
SP - Code 170546
BT - BENEVOL 2020
A2 - Papadakis, Mike
A2 - Cordy, Maxime
PB - CEUR-WS.org
T2 - 19th Belgium-Netherlands Software Evolution Workshop, BENEVOL 2020
Y2 - 3 December 2020 through 4 December 2020
ER -