Publications
2025
- Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough? Antonio Vitale, Antonio Mastropaolo, Rocco Oliveto, and 2 more authors. Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC 2025), Jan 2025. To Appear.
Automated code summarization is a long-standing goal for code comprehension. The task consists of automatically generating documentation for a given method. Deep Learning (DL)-based approaches have proven beneficial for various software engineering (SE) tasks, including this one. Most state-of-the-art datasets for code summarization are automatically mined from GitHub and, thus, might contain erroneous or sub-optimal examples. Previous work showed that using a simple rule-based approach for removing noisy instances allows for a tangible reduction of the training set size without reducing the effectiveness of the trained models. Motivated by this finding, we conjecture that it is possible to further reduce the dataset size by removing instances affected by other kinds of issues. In this paper, we explore the extent to which code-comment coherence, a specific quality attribute of code summaries, can be used to optimize code summarization datasets. Specifically, we hypothesize that removing incoherent code-comment pairs might positively impact the effectiveness of the models. To this end, we rely on SIDE, a recently introduced metric for code-summary coherence. We examine multiple selectivity levels of training instances from two state-of-the-art datasets (TL-CodeSum and Funcom) and evaluate the resulting models on three manually curated test sets. The results show that even halving the training set sizes does not significantly affect the model’s ability to generate summaries. However, when comparing the most restrictive selection strategy with a simpler one that randomly selects the training instances, we observe that the resulting accuracy of the model does not change either. This result suggests that (i) current datasets contain many irrelevant examples, and (ii) different quality attributes should be explored for optimizing code summarization datasets.
@article{vitale2025optimizing,
  title = {Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough?},
  author = {Vitale, Antonio and Mastropaolo, Antonio and Oliveto, Rocco and Di Penta, Massimiliano and Scalabrino, Simone},
  journal = {Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC 2025)},
  year = {2025},
  note = {To Appear -- ICPC 2025},
  month = jan,
  keywords = {Software engineering, Artificial Intelligence, Code Summarization, Optimization, Datasets, LLMs}
}
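The selection strategy studied in this paper boils down to scoring each training pair for code-comment coherence and keeping only the top fraction. The Python sketch below illustrates that score-and-filter step; note that `side_score` here is a crude token-overlap stand-in (the real SIDE metric is a learned model, not reproduced here), and the 50% keep ratio simply mirrors the halving experiment described in the abstract.

```python
# Sketch of coherence-based dataset filtering. `side_score` is a
# placeholder: the real SIDE metric is a learned model, not token overlap.
from typing import Callable, List, Tuple

def side_score(code: str, comment: str) -> float:
    """Stand-in coherence score in [0, 1] (crude token overlap)."""
    code_tokens, comment_tokens = set(code.split()), set(comment.split())
    return len(code_tokens & comment_tokens) / max(len(comment_tokens), 1)

def filter_by_coherence(
    pairs: List[Tuple[str, str]],
    scorer: Callable[[str, str], float],
    keep_ratio: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep the `keep_ratio` most coherent (code, comment) pairs."""
    ranked = sorted(pairs, key=lambda p: scorer(p[0], p[1]), reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]

# Example: keep the more coherent half of a toy training set.
pairs = [
    ("def add(a, b): return a + b", "Add two numbers and return the sum."),
    ("def add(a, b): return a + b", "Close the database connection."),
]
print(filter_by_coherence(pairs, side_score, keep_ratio=0.5))
```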
- Resource-Efficient & Effective Code Summarization. Saima Afrin, Joseph Call, Khai-Nguyen Nguyen, and 2 more authors. Proceedings of the 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE 2025), Jan 2025. To Appear.
Code Language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of large models, ranging from millions to trillions of parameters (e.g., GPT-4). However, as models grow in scale, sustainability concerns emerge: they are extremely resource-intensive, highlighting the need for efficient, environmentally conscious solutions. GreenAI techniques, such as QLoRA (Quantized Low-Rank Adaptation), offer a promising path for dealing with large models’ sustainability, as they enable resource-efficient model fine-tuning. Previous research has shown the effectiveness of QLoRA in code-related tasks, particularly those involving natural language inputs and code as the target output (NL-to-Code), such as code generation. However, no studies have explored its application to tasks that are fundamentally similar but operate in the opposite direction, such as code summarization. This leaves a gap in understanding how well QLoRA generalizes to Code-to-NL tasks, which are equally important for supporting developers in understanding and maintaining code. To address this gap, we investigate the extent to which QLoRA’s capabilities in NL-to-Code tasks can be leveraged and transferred to code summarization, a representative Code-to-NL task. Our study evaluates two state-of-the-art CLMs (CodeLlama and DeepSeek-Coder) on two programming languages, Python and Java, tasking the models with generating descriptions for code methods. The results align with prior findings on QLoRA for source code generation, showing that QLoRA enables efficient fine-tuning of CLMs for code summarization.
@article{afrin2025resource,
  title = {Resource-Efficient \& Effective Code Summarization},
  author = {Afrin, Saima and Call, Joseph and Nguyen, Khai-Nguyen and Chaparro, Oscar and Mastropaolo, Antonio},
  journal = {Proceedings of the 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE 2025)},
  year = {2025},
  note = {To Appear},
  month = jan,
  keywords = {Software engineering, Artificial Intelligence, Code Summarization, Optimization, Datasets, LLMs}
}
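For readers unfamiliar with the technique, the sketch below shows what a QLoRA setup for one of the studied models might look like using the Hugging Face transformers, peft, and bitsandbytes libraries. The model ID is a public CodeLlama checkpoint; the adapter hyperparameters are illustrative placeholders, not the configuration used in the paper.

```python
# Minimal QLoRA setup sketch (hyperparameters are illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "codellama/CodeLlama-7b-hf"

# Load the base model in 4-bit NF4 precision (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach low-rank adapters; only these small matrices are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,  # adapter rank (placeholder value)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Training then proceeds with a standard causal language modeling objective over (code, summary) pairs; because only the adapter weights are updated, the memory and energy footprint stays far below full fine-tuning, which is what makes the approach resource-efficient.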
- Toward Neurosymbolic Program Comprehension. Alejandro Velasco, Aya Garryyeva, David N. Palacio, and 2 more authors. Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC-ERA 2025), Jan 2025. To Appear.
Recent advancements in Large Language Models (LLMs) have paved the way for Large Code Models (LCMs), enabling automation in complex software engineering tasks, such as code generation, software testing, and program comprehension, among others. Tools like GitHub Copilot and ChatGPT have shown substantial benefits in supporting developers across various practices. However, the ambition to scale these models to trillion-parameter sizes, exemplified by GPT-4, poses significant challenges that limit the usage of Artificial Intelligence (AI)-based systems powered by large Deep Learning (DL) models. These include rising computational demands for training and deployment and issues related to trustworthiness, bias, and interpretability. Such factors can make managing these models impractical for many organizations, while their “black-box” nature undermines key aspects, including transparency and accountability. In this paper, we question the prevailing assumption that increasing model parameters is always the optimal path forward, provided there is sufficient new data to learn additional patterns. In particular, we advocate for a Neurosymbolic research direction that combines the strengths of existing DL techniques (e.g., LLMs) with traditional symbolic methods, renowned for their reliability, speed, and determinism. To this end, we outline the core features and present preliminary results for our envisioned approach, aimed at establishing the first Neurosymbolic Program Comprehension (NsPC) framework to aid in identifying defective code components.
@article{velasco2025toward,
  title = {Toward Neurosymbolic Program Comprehension},
  author = {Velasco, Alejandro and Garryyeva, Aya and Palacio, David N. and Mastropaolo, Antonio and Poshyvanyk, Denys},
  journal = {Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC-ERA 2025)},
  year = {2025},
  note = {To Appear},
  month = jan,
  keywords = {Software engineering, Artificial Intelligence, Program Comprehension, Neurosymbolic}
}
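To make the envisioned combination more concrete, here is a minimal, purely illustrative Python sketch of how a symbolic component (deterministic AST rules) and a neural component (a stubbed model score) could be fused to flag potentially defective code. This is not the NsPC framework itself, only the general shape of a neurosymbolic decision; the rule, the stub, and the threshold are all hypothetical.

```python
# Illustrative-only neurosymbolic defect flagger: a deterministic AST rule
# is combined with a (stubbed) neural suspiciousness score.
import ast

def symbolic_flags(source: str) -> list[str]:
    """Deterministic, rule-based checks over the AST (symbolic component)."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        # Example rule: a bare `except:` silently swallows all errors.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            flags.append(f"bare except at line {node.lineno}")
    return flags

def neural_suspiciousness(source: str) -> float:
    """Stand-in for an LCM-based defect probability in [0, 1]."""
    return 0.5  # replace with a real model call

def is_defective(source: str, threshold: float = 0.7) -> bool:
    """Symbolic evidence overrides; otherwise defer to the neural score."""
    if symbolic_flags(source):
        return True
    return neural_suspiciousness(source) >= threshold

# Example: the symbolic rule fires regardless of the neural score.
print(is_defective("try:\n    x = 1\nexcept:\n    pass\n"))  # True
```

The appeal of this shape, as the abstract argues, is that the symbolic half is fast, cheap, and deterministic, while the neural half covers patterns no hand-written rule anticipates.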
2024
- From Triumph to Uncertainty: The Journey of Software Engineering in the AI Era. Antonio Mastropaolo, Camilo Escobar-Velásquez, and Mario Linares-Vásquez. ACM Trans. Softw. Eng. Methodol., Dec 2024. Just Accepted.
Over the last ten years, the realm of Artificial Intelligence (AI) has experienced an explosion of revolutionary breakthroughs, transforming what seemed like a far-off dream into a reality that is now deeply embedded in our everyday lives. AI’s widespread impact is revolutionizing virtually all aspects of human life, and software engineering (SE) is no exception. As we explore this changing landscape, we are faced with questions about what the future holds for SE and how AI will reshape the roles, duties, and methodologies within the field. The introduction of these groundbreaking technologies highlights the inevitable shift towards a new paradigm, suggesting a future where AI’s capabilities may redefine the boundaries of SE, potentially even more than human input. In this paper, we aim to outline the key elements that, based on our expertise, are vital for the smooth integration of AI into SE, all while preserving the intrinsic human creativity that has been the driving force behind the field. First, we provide a brief description of the evolution of SE and AI. Afterward, we delve into the intricate interplay between AI-driven automation and human innovation, exploring how these two components can work together to advance SE practices toward new methods and standards.
@article{10.1145/3709360,
  author = {Mastropaolo, Antonio and Escobar-Vel\'{a}squez, Camilo and Linares-V\'{a}squez, Mario},
  title = {From Triumph to Uncertainty: The Journey of Software Engineering in the AI Era},
  journal = {ACM Trans. Softw. Eng. Methodol.},
  year = {2024},
  month = dec,
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  issn = {1049-331X},
  note = {Just Accepted},
  keywords = {Software engineering, Artificial Intelligence, History, AI4SE, LLM4Code}
}