On the use of textual feature extraction techniques to support the automated detection of refactoring documentation

Abstract

Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little has been done to understand how developers document their refactoring activities. Therefore, there is a recent trend toward detecting developers' documentation of refactoring by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as a binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitations of the manual process proposed by existing studies by exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to document refactoring activities in the source code. The experiments are carried out with different data sizes and numbers of bits. As per our results, the combination of Chi-Squared with Bayes Point Machine (BPM) and the combination of Fisher Score with Bayes Point Machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96 and an FScore of 0.96.
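The pipeline the abstract describes (commit messages turned into textual features, a Chi-Squared feature-selection step, then a binary classifier) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the toy commit messages and labels are invented for illustration, the function names (`tokenize`, `chi_squared_scores`, `train_naive_bayes`, `predict`) are hypothetical, plain bag-of-words tokens stand in for whatever feature representation the study used, and a multinomial naive Bayes classifier is swapped in for the Bayes Point Machine.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase a commit message and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", message.lower())


def chi_squared_scores(docs, labels):
    """Chi-squared statistic of each term's 2x2 (presence x label) table."""
    n, n_pos = len(docs), sum(labels)
    n_neg = n - n_pos
    doc_sets = [set(d) for d in docs]
    scores = {}
    for term in set().union(*doc_sets):
        a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == 1)
        b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == 0)
        c, d = n_pos - a, n_neg - b  # term absent in positive / negative docs
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def train_naive_bayes(docs, labels, vocab):
    """Multinomial naive Bayes with Laplace smoothing over the selected terms."""
    counts = {0: Counter(), 1: Counter()}
    for tokens, y in zip(docs, labels):
        counts[y].update(t for t in tokens if t in vocab)
    model = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)
        prior = math.log(labels.count(y) / len(labels))
        likelihood = {t: math.log((counts[y][t] + 1) / total) for t in vocab}
        model[y] = (prior, likelihood)
    return model


def predict(model, tokens):
    """Return 1 for 'documents a refactoring', 0 otherwise (ties go to 0)."""
    def log_posterior(y):
        prior, likelihood = model[y]
        return prior + sum(likelihood[t] for t in tokens if t in likelihood)
    return max((0, 1), key=log_posterior)


# Invented toy data (assumption): 1 = message documents a refactoring.
messages = [
    "Refactor user service and extract validation method",
    "Refactor parser: rename variable and move class",
    "Extract method to remove duplication in parser",
    "Refactoring login module, rename helper method",
    "Fix null pointer crash on startup",
    "Add new endpoint for export feature",
    "Fix typo in error message",
    "Add test for crash on startup",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

docs = [tokenize(m) for m in messages]
scores = chi_squared_scores(docs, labels)
# Keep the 12 highest-scoring terms; sort ties alphabetically for determinism.
selected = set(sorted(scores, key=lambda t: (-scores[t], t))[:12])
model = train_naive_bayes(docs, labels, selected)

print(predict(model, tokenize("Rename helper and extract method")))
print(predict(model, tokenize("Fix crash on export endpoint")))
```

The selection step discards terms whose presence is statistically independent of the label (for example, a word that appears equally often in both classes scores zero), which is the role Chi-Squared plays before classification. A real experiment would need a held-out test split and a far larger labeled dataset than this toy example.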

Publication
Innovations in Systems and Software Engineering
Licelot Marmolejos
Cybersecurity Associate
Eman Abdullah Alomar
Assistant Professor of Software Engineering Department
Mohamed Wiem Mkaouer
Assistant Professor of Software Engineering

Research interests: software refactoring and quality.

Christian D. Newman
Assistant Professor

Research interests: software refactoring and quality.

Ali Ouni
Associate Professor

Research interests: software refactoring and quality.