Publications
2025
- Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough? Antonio Vitale, Antonio Mastropaolo, Rocco Oliveto, and 2 more authors. Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC 2025), Jan 2025. To Appear.
Automated code summarization is a long-standing goal for code comprehension. The task consists of automatically generating documentation for a given method. Deep Learning (DL)-based approaches have proven beneficial for various software engineering (SE) tasks, including this one. Most state-of-the-art datasets for code summarization are automatically mined from GitHub and, thus, might contain erroneous or sub-optimal examples. Previous work showed that using a simple rule-based approach for removing noisy instances allows for a tangible reduction of the training set size without reducing the effectiveness of the trained models. Motivated by this finding, we conjecture that it is possible to further reduce the dataset size by removing instances affected by other kinds of issues. In this paper, we explore the extent to which code-comment coherence, a specific quality attribute of code summaries, can be used to optimize code summarization datasets. Specifically, we hypothesize that removing incoherent code-comment pairs might positively impact the effectiveness of the models. To this end, we rely on SIDE, a recently introduced metric for code-summary coherence. We examine multiple selectivity levels of training instances from two state-of-the-art datasets (TL-CodeSum and Funcom) and evaluate the resulting models on three manually curated test sets. The results show that even halving the training set sizes does not significantly affect the model’s ability to generate summaries. However, when comparing the most restrictive selection strategy with a simpler one that randomly selects the training instances, we observe that the resulting accuracy of the model does not change either. This result suggests that (i) current datasets contain many irrelevant examples, and (ii) different quality attributes should be explored for optimizing code summarization datasets.
@article{vitale2025optimizing,
  title = {Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough?},
  author = {Vitale, Antonio and Mastropaolo, Antonio and Oliveto, Rocco and Di Penta, Massimiliano and Scalabrino, Simone},
  journal = {Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC 2025)},
  year = {2025},
  note = {To Appear -- ICPC 2025},
  month = jan,
  keywords = {Software engineering, Artificial Intelligence, Code Summarization, Optimization, Datasets, LLMs}
}
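The selection strategy studied in this paper boils down to scoring each training pair for code-comment coherence and keeping only the top fraction. The Python sketch below illustrates that score-and-filter step; note that `side_score` here is a crude token-overlap stand-in (the real SIDE metric is a learned model, not reproduced here), and the 50% keep ratio simply mirrors the halving experiment described in the abstract.

```python
# Sketch of coherence-based dataset filtering. `side_score` is a
# placeholder: the real SIDE metric is a learned model, not token overlap.
from typing import Callable, List, Tuple

def side_score(code: str, comment: str) -> float:
    """Stand-in coherence score in [0, 1] (crude token overlap)."""
    code_tokens, comment_tokens = set(code.split()), set(comment.split())
    return len(code_tokens & comment_tokens) / max(len(comment_tokens), 1)

def filter_by_coherence(
    pairs: List[Tuple[str, str]],
    scorer: Callable[[str, str], float],
    keep_ratio: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep the `keep_ratio` most coherent (code, comment) pairs."""
    ranked = sorted(pairs, key=lambda p: scorer(p[0], p[1]), reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]

# Example: keep the more coherent half of a toy training set.
pairs = [
    ("def add(a, b): return a + b", "Add two numbers and return the sum."),
    ("def add(a, b): return a + b", "Close the database connection."),
]
print(filter_by_coherence(pairs, side_score, keep_ratio=0.5))
```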
- Resource-Efficient & Effective Code Summarization. Saima Afrin, Joseph Call, Khai-Nguyen Nguyen, and 2 more authors. Proceedings of the 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE 2025), Jan 2025. To Appear.
Code Language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of large models, ranging from millions to trillions of parameters (e.g., GPT-4). However, as models grow in scale, sustainability concerns emerge: they are extremely resource-intensive, highlighting the need for efficient, environmentally conscious solutions. GreenAI techniques, such as QLoRA (Quantized Low-Rank Adaptation), offer a promising path for dealing with large models’ sustainability, as they enable resource-efficient model fine-tuning. Previous research has shown the effectiveness of QLoRA in code-related tasks, particularly those involving natural language inputs and code as the target output (NL-to-Code), such as code generation. However, no studies have explored its application to tasks that are fundamentally similar but operate in the opposite direction, such as code summarization. This leaves a gap in understanding how well QLoRA generalizes to Code-to-NL tasks, which are equally important for supporting developers in understanding and maintaining code. To address this gap, we investigate the extent to which QLoRA’s capabilities in NL-to-Code tasks can be leveraged and transferred to code summarization, a representative Code-to-NL task. Our study evaluates two state-of-the-art CLMs (CodeLlama and DeepSeek-Coder) on two programming languages, Python and Java, tasking the models with generating descriptions for code methods. The results align with prior findings on QLoRA for source code generation, showing that QLoRA enables efficient fine-tuning of CLMs for code summarization.
@article{afrin2025resource,
  title = {Resource-Efficient \& Effective Code Summarization},
  author = {Afrin, Saima and Call, Joseph and Nguyen, Khai-Nguyen and Chaparro, Oscar and Mastropaolo, Antonio},
  journal = {Proceedings of the 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE 2025)},
  year = {2025},
  note = {To Appear},
  month = jan,
  keywords = {Software engineering, Artificial Intelligence, Code Summarization, Optimization, Datasets, LLMs}
}
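For readers unfamiliar with the technique, the sketch below shows what a QLoRA setup for one of the studied models might look like using the Hugging Face transformers, peft, and bitsandbytes libraries. The model ID is a public CodeLlama checkpoint; the adapter hyperparameters are illustrative placeholders, not the configuration used in the paper.

```python
# Minimal QLoRA setup sketch (hyperparameters are illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "codellama/CodeLlama-7b-hf"

# Load the base model in 4-bit NF4 precision (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach low-rank adapters; only these small matrices are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,  # adapter rank (placeholder value)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Training then proceeds with a standard causal language modeling objective over (code, summary) pairs; because only the adapter weights are updated, the memory and energy footprint stays far below full fine-tuning, which is what makes the approach resource-efficient.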
- Toward Neurosymbolic Program Comprehension. Alejandro Velasco, Aya Garryyeva, David N. Palacio, and 2 more authors. Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC-ERA 2025), Jan 2025. To Appear.
Recent advancements in Large Language Models (LLMs) have paved the way for Large Code Models (LCMs), enabling automation in complex software engineering tasks, such as code generation, software testing, and program comprehension, among others. Tools like GitHub Copilot and ChatGPT have shown substantial benefits in supporting developers across various practices. However, the ambition to scale these models to trillion-parameter sizes, exemplified by GPT-4, poses significant challenges that limit the usage of Artificial Intelligence (AI)-based systems powered by large Deep Learning (DL) models. These include rising computational demands for training and deployment and issues related to trustworthiness, bias, and interpretability. Such factors can make managing these models impractical for many organizations, while their “black-box” nature undermines key aspects, including transparency and accountability. In this paper, we question the prevailing assumption that increasing model parameters is always the optimal path forward, provided there is sufficient new data to learn additional patterns. In particular, we advocate for a Neurosymbolic research direction that combines the strengths of existing DL techniques (e.g., LLMs) with traditional symbolic methods, renowned for their reliability, speed, and determinism. To this end, we outline the core features and present preliminary results for our envisioned approach, aimed at establishing the first Neurosymbolic Program Comprehension (NsPC) framework to aid in identifying defective code components.
@article{velasco2025toward,
  title = {Toward Neurosymbolic Program Comprehension},
  author = {Velasco, Alejandro and Garryyeva, Aya and Palacio, David N. and Mastropaolo, Antonio and Poshyvanyk, Denys},
  journal = {Proceedings of the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC-ERA 2025)},
  year = {2025},
  note = {To Appear},
  month = jan,
  keywords = {Software engineering, Artificial Intelligence, Program Comprehension, Neurosymbolic}
}
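To make the envisioned combination more concrete, here is a minimal, purely illustrative Python sketch of how a symbolic component (deterministic AST rules) and a neural component (a stubbed model score) could be fused to flag potentially defective code. This is not the NsPC framework itself, only the general shape of a neurosymbolic decision; the rule, the stub, and the threshold are all hypothetical.

```python
# Illustrative-only neurosymbolic defect flagger: a deterministic AST rule
# is combined with a (stubbed) neural suspiciousness score.
import ast

def symbolic_flags(source: str) -> list[str]:
    """Deterministic, rule-based checks over the AST (symbolic component)."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        # Example rule: a bare `except:` silently swallows all errors.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            flags.append(f"bare except at line {node.lineno}")
    return flags

def neural_suspiciousness(source: str) -> float:
    """Stand-in for an LCM-based defect probability in [0, 1]."""
    return 0.5  # replace with a real model call

def is_defective(source: str, threshold: float = 0.7) -> bool:
    """Symbolic evidence overrides; otherwise defer to the neural score."""
    if symbolic_flags(source):
        return True
    return neural_suspiciousness(source) >= threshold

# Example: the symbolic rule fires regardless of the neural score.
print(is_defective("try:\n    x = 1\nexcept:\n    pass\n"))  # True
```

The appeal of this shape, as the abstract argues, is that the symbolic half is fast, cheap, and deterministic, while the neural half covers patterns no hand-written rule anticipates.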
2024
- From Triumph to Uncertainty: The Journey of Software Engineering in the AI Era. Antonio Mastropaolo, Camilo Escobar-Velásquez, and Mario Linares-Vásquez. ACM Trans. Softw. Eng. Methodol., Dec 2024. Just Accepted.
Over the last ten years, the realm of Artificial Intelligence (AI) has experienced an explosion of revolutionary breakthroughs, transforming what seemed like a far-off dream into a reality that is now deeply embedded in our everyday lives. AI’s widespread impact is revolutionizing virtually all aspects of human life, and software engineering (SE) is no exception. As we explore this changing landscape, we are faced with questions about what the future holds for SE and how AI will reshape the roles, duties, and methodologies within the field. The introduction of these groundbreaking technologies highlights the inevitable shift towards a new paradigm, suggesting a future where AI’s capabilities may redefine the boundaries of SE, potentially even more than human input. In this paper, we aim to outline the key elements that, based on our expertise, are vital for the smooth integration of AI into SE, all while preserving the intrinsic human creativity that has been the driving force behind the field. First, we provide a brief description of the evolution of SE and AI. Afterward, we delve into the intricate interplay between AI-driven automation and human innovation, exploring how these two components can work together to advance SE practices toward new methods and standards.
@article{10.1145/3709360,
  author = {Mastropaolo, Antonio and Escobar-Vel\'{a}squez, Camilo and Linares-V\'{a}squez, Mario},
  title = {From Triumph to Uncertainty: The Journey of Software Engineering in the AI Era},
  journal = {ACM Trans. Softw. Eng. Methodol.},
  year = {2024},
  month = dec,
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  issn = {1049-331X},
  note = {Just Accepted},
  keywords = {Software engineering, Artificial Intelligence, History, AI4SE, LLM4Code}
}