TY - JOUR
T1 - A systematic mapping study on graph machine learning for static source code analysis
AU - Maarleveld, Jesse
AU - Guo, Jiapan
AU - Feitosa, Daniel
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/7
Y1 - 2025/7
AB - Context: In recent years, graph machine learning and particularly graph neural networks have seen successful and widespread applications in many fields, including static source code analysis. Such machine learning techniques enable learning on rich information networks capable of representing different relations and entities. However, there have been no comprehensive studies investigating the use of graph machine learning for static source code analysis. There is no complete systematic picture of what techniques may be considered tried and tested, and where opportunities for future improvements can still be found. Objective: The main goal of this study is to provide a broad overview of the state of the art of static source code analysis using graph machine learning. Methods: A systematic mapping was performed covering 4499 studies, presenting a final selection of 323 primary studies. Results: Among the selected studies, seven major sub-domains were identified. The use and combinations of artefacts, different graph representations, different features, and different machine learning models used were collected and categorised. Conclusions: The use of graph learning, and in particular graph neural networks, has increased significantly since 2018. Although a wide variety of methods is used, across every dimension we investigated (artefacts, graphs, features, models), we found small sets of technologies which are used in the vast majority of studies. Future opportunities lie in exploring under-explored domains more thoroughly, exploring the use of additional artefacts alongside source code, and paying more attention to interpretability and explainability.
KW - Graph machine learning
KW - Graph neural networks
KW - Static source code analysis
KW - Systematic mapping study
UR - http://www.scopus.com/inward/record.url?scp=105001290883&partnerID=8YFLogxK
U2 - 10.1016/j.infsof.2025.107722
DO - 10.1016/j.infsof.2025.107722
M3 - Review article
AN - SCOPUS:105001290883
SN - 0950-5849
VL - 183
JO - Information and Software Technology
JF - Information and Software Technology
M1 - 107722
ER -