Replication package of Tracing the Lifecycle of Architecture Technical Debt in Software Systems: A Dependency Approach

Dataset

Description

Description of this study:

Architectural technical debt (ATD) represents trade-offs in software architecture that accelerate initial development but create long-term maintenance challenges. ATD, in particular when self-admitted (SATD), impacts the foundational structure of software, making it difficult to detect and resolve. This study investigates the lifecycle of ATD, focusing on how it affects i) the connectivity between classes, and ii) the frequency of file modifications. We aim to understand how ATD evolves from introduction to repayment, and its implications for software architectures.

Our empirical approach was applied to a dataset of SATD from various software artifacts, such as commit messages and issue tracking systems. We isolated ATD instances, filtered for architectural indicators, and calculated dependencies at different lifecycle stages using FAN-IN and FAN-OUT metrics. Statistical analyses, including the Mann-Whitney U test and Cliff’s delta, were used to assess the significance and effect size of connectivity and dependency changes over time.

We observed that ATD repayment increased class connectivity, with FAN-IN increasing by 57% on average and FAN-OUT by 26.7%, suggesting a shift toward centralization and increased architectural complexity post-repayment. Moreover, ATD files were modified less frequently than Non-ATD files, with changes accumulated in high-dependency portions of the code.

Our study shows that resolving ATD improves software quality in the short term, but can make the architecture more complex by centralizing dependencies. Also, although dependency metrics (like FAN-IN and FAN-OUT) can be helpful for understanding the impact of ATD, they should be combined with other measures to capture other effects of ATD on software maintainability.
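As an illustration of the FAN-IN and FAN-OUT metrics mentioned in the description, the following is a minimal sketch that counts incoming and outgoing dependencies per file from a directed edge list. It is not the package's own implementation: the function name fan_in_fan_out, the edge representation, and the file names are assumptions made for this example only.

```python
from collections import defaultdict


def fan_in_fan_out(edges):
    """Return {node: (fan_in, fan_out)} for a directed dependency edge list.

    Each edge (src, dst) is read as "src depends on dst".
    Duplicate edges and self-dependencies are ignored.
    """
    incoming = defaultdict(set)  # dst -> set of files that depend on it
    outgoing = defaultdict(set)  # src -> set of files it depends on
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src == dst:
            continue
        outgoing[src].add(dst)
        incoming[dst].add(src)
    return {n: (len(incoming[n]), len(outgoing[n])) for n in sorted(nodes)}


# Hypothetical dependency edges, not taken from the dataset.
example_edges = [
    ("A.java", "B.java"),
    ("A.java", "C.java"),
    ("B.java", "C.java"),
    ("D.java", "C.java"),
    ("A.java", "B.java"),  # duplicate edge, counted once
]

metrics = fan_in_fan_out(example_edges)
for name, (fan_in, fan_out) in metrics.items():
    print(f"{name}: FAN-IN={fan_in}, FAN-OUT={fan_out}")
```

Computing these counts once at the introduction commit and once at the payment commit gives the paired values that the study compares across lifecycle stages.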
Contents

Dataset

atd_final.csv: A CSV file listing the 18 ATD items from the trackers of five Apache open-source projects. For these 18 ATD items, we identified 5,135 files affected in the introduction phase and 3,553 files in the payment phase.
non_atd_final.csv: A CSV file listing the 18 Non-ATD items from the trackers of five Apache open-source projects. For these 18 Non-ATD items, we identified 753 files in the initial commit phase and 621 files in the recorded phase.

Source code

commit_hash_retrieval_with_date.py: Script to extract commit hashes within a time range from a Git repository.
to_get_all_changes_updated.py: Script for retrieving and listing all changes or updates in files between introduction and payment.
und_count_file_dependencies.py: Script for counting file dependencies using Understand by SciTools.
golang-dependency.py: Script to analyze and extract file dependencies in a Go (Golang) project.
erlang-dependency.py: Script to analyze and extract file dependencies in an Erlang project.
file_path.csv: CSV file containing file paths of ATD-affected files.
calculate_sloc.py: Script to calculate the Source Lines of Code (SLOC) for given ATD-affected files.
lizard-cyclomatic-complexity.py: Script for calculating the cyclomatic complexity of source code using the Lizard library.
data_distribution_log1p.py: Script for analyzing and visualizing data distributions using a log(1+x) transformation.
calculate_mean_median_min_max.py: Script to calculate summary statistics (mean, median, minimum, and maximum) from a dataset.
mann-whitney-u-test.py: Script to perform the Mann-Whitney U test, a non-parametric test for comparing two independent samples.
cliffs_delta.py: Script for computing Cliff’s delta, a measure of effect size between two groups.
boxplot_number_of_changes.py: Script for generating a boxplot to visualize the number of changes in files.
partial-spearmans-correlation.py: Script for calculating partial Spearman’s correlation coefficients between introduction and payment.
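To make the two statistics named above concrete, here is a minimal pure-Python sketch of the Mann-Whitney U statistic and Cliff's delta. It is not the code in mann-whitney-u-test.py or cliffs_delta.py: the function names and sample values are hypothetical, and it returns only the U statistic, not a p-value (a library routine such as scipy.stats.mannwhitneyu would normally supply that).

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: count of (x, y) pairs with x > y, ties count 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical FAN-IN values at two lifecycle stages, not taken from the dataset.
fan_in_introduction = [1, 2, 2, 3]
fan_in_payment = [2, 3, 4, 5]

u = mann_whitney_u(fan_in_payment, fan_in_introduction)
delta = cliffs_delta(fan_in_payment, fan_in_introduction)
n_pairs = len(fan_in_payment) * len(fan_in_introduction)

print("U =", u)
print("Cliff's delta =", delta)
# The two statistics are linked by delta = 2U / (n1 * n2) - 1.
print("2U/(n1*n2) - 1 =", 2 * u / n_pairs - 1)
```

A delta near +1 means values in the first group are almost always larger than those in the second, a delta near -1 means the opposite, and a delta near 0 means the two groups overlap heavily.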
Date made available: 20-Jan-2025
Publisher: ZENODO
