Lexical content as a cooperation aide: A study based on Java software

Andrea Capiluppi*, Nemitari Ajienka

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review


Collaborative development is a paradigm shift in software development. Loosely coupled developers coordinate their work via distributed versioning systems (SVN, Git, and others), code reviews and priority-led bug tracking systems. This development approach allows many different developers to contribute source code to the same source artifact. This article focuses on the lexical content of the source code produced in a collaborative environment. The lexical content is described as the ‘dictionary’ of the key terms contained within a source artifact. We posit that the lexical content of a Java class will grow as more developers add content to the same class. We analyse the 100 top-ranked GitHub applications (at the time of the sampling) written in Java. Each of their classes is reduced to its lexical content, and we record its size (in LOCs) as well as the number of different developers who contributed to its source code. Our results show that (i) the lexical content of Java classes is bounded in size, (ii) more developers make the size of the lexical content larger, and (iii) the lexical content of a system's classes might increase with more developers, but this depends on the system's application domain. The implications for practitioners are two-fold: (i) classes with a large set of lexical content should be split into multiple classes, to minimize the need for further maintenance; and (ii) classes developed by many developers should adhere to specific guidelines so that their lexical content does not increase boundlessly. We tested our results in a tailored case study, which confirmed our findings: larger-than-threshold class corpora tend to deteriorate class cohesion.
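To make the notion of a class's ‘dictionary’ concrete, the following is a minimal sketch of how a Java source file could be reduced to a set of distinct terms. It is not the extraction pipeline used in the article, which the abstract does not detail. The class name `LexicalContent`, the method `extract`, the camelCase splitting rule and the short stop-word list are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: not the authors' tool, only an illustration of the idea.
final class LexicalContent {
    // A small, incomplete subset of Java reserved words treated as noise (assumption).
    private static final Set<String> STOP_WORDS = Set.of(
        "public", "private", "protected", "class", "static", "final", "void",
        "int", "new", "return", "this", "if", "else", "for", "while",
        "import", "package");

    /** Returns the sorted set of distinct lower-case terms found in the source text. */
    static Set<String> extract(String source) {
        Set<String> terms = new TreeSet<>();
        // Anything that is not a letter (digits, punctuation, underscores) separates tokens.
        for (String token : source.split("[^A-Za-z]+")) {
            // Break camelCase and PascalCase identifiers at lower-to-upper boundaries.
            for (String part : token.split("(?<=[a-z])(?=[A-Z])")) {
                String term = part.toLowerCase();
                // Skip empty or single-letter fragments and reserved words.
                if (term.length() > 1 && !STOP_WORDS.contains(term)) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String source = "public class OrderService { private int orderCount; "
                      + "void addOrder() { orderCount++; } }";
        Set<String> terms = extract(source);
        // The size of this set is one possible measure of the class's lexical content.
        System.out.println(terms + " size=" + terms.size());
    }
}
```

For the toy class above, the sketch yields four terms (add, count, order, service). Under this reading, each new identifier that introduces a previously unseen word enlarges the dictionary, which is how contributions from additional developers could increase a class's lexical content.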
Original language: English
Article number: 110543
Journal: Journal of Systems and Software
Publication status: Published - Jun 2020
Externally published: Yes


Keywords
  • Clustering
  • Distributed development
  • Information retrieval (IR)
  • Lexical content
  • Object-oriented (OO)
  • Open-source software
