Title:

CODE SMELL DETECTION IN PYTHON SYSTEMS WITH MACHINE LEARNING

Author(s):

ALEXANDRE CESAR BRANDAO DE ANDRADE

Contributor(s):

JULIANA ALVES PEREIRA - Advisor

Cataloged:

25/MAR/2026

Language(s):

PORTUGUESE - BRAZIL

Type:

TEXT

Subtype:

SENIOR PROJECT

Notes:

[en] All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.

Reference(s):

[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75818@1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75818@2

DOI:

https://doi.org/10.17771/PUCRio.acad.75818

Abstract:

The internal quality of code directly affects the maintenance and evolution of software systems, and code smells are one of the main indicators that this quality is degrading. This work investigates the use of Machine Learning models to detect, from static code metrics, code smells as identified by PySmell in Python projects. Six smells were considered: Large Class, Long Method, Long Parameter List, Long Message Chain, Long Scope Chaining, and Long Base Class List. Starting from an expert-labeled dataset, a pipeline was built that integrates automatic metric extraction with the training of several supervised algorithms. The results indicate that Machine Learning models can reproduce the labels of most of the evaluated code smells with high fidelity. However, smells that depend on call structure remain difficult to capture with the considered metrics alone. Complexity and size metrics are generally sufficient for smells based on size and methods, whereas structural metrics become decisive for smells that depend on inheritance and on relationships among classes and calls. Finally, the study shows that using static tools to label a reference dataset artificially inflates model performance: this constitutes strong data leakage and compromises the validity of conclusions in studies that do not control for the effect. The work therefore offers practical and methodological recommendations for future research on automated code smell detection in Python.
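As an illustration of the kind of static metric extraction the abstract refers to, the sketch below uses Python's standard `ast` module to compute a few size and structural metrics per class and function. It is not the pipeline from the work itself: the function names, the chosen metrics, and the threshold `HYPOTHETICAL_MAX_PARAMS` are assumptions made for this example, and the threshold is not taken from PySmell.

```python
import ast

# Hypothetical threshold, chosen only for this example (not PySmell's value).
HYPOTHETICAL_MAX_PARAMS = 5


def extract_metrics(source):
    """Return one dict of simple static metrics per class and function."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Count positional-only, regular, and keyword-only parameters,
            # plus *args and **kwargs when present.
            n_params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            n_params += (args.vararg is not None) + (args.kwarg is not None)
            rows.append({
                "kind": "function",
                "name": node.name,
                "lines": node.end_lineno - node.lineno + 1,
                "params": n_params,
            })
        elif isinstance(node, ast.ClassDef):
            rows.append({
                "kind": "class",
                "name": node.name,
                "lines": node.end_lineno - node.lineno + 1,
                "bases": len(node.bases),
            })
    return rows


def rule_based_label(row):
    """Label a function as 'Long Parameter List' with a fixed threshold."""
    return row["kind"] == "function" and row["params"] >= HYPOTHETICAL_MAX_PARAMS


SAMPLE = '''
class Report(Base, Mixin):
    def render(self, title, rows, footer=None, *extra, **options):
        return title
'''

metrics = {row["name"]: row for row in extract_metrics(SAMPLE)}
```

Here `rule_based_label` is a deterministic function of the `params` metric. If labels produced this way are used as ground truth and the same metric is given to a model as a feature, the model only has to recover the threshold, which is the kind of leakage the abstract warns about.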

Description:			File:
COMPLETE			PDF