| Title: | CODE SMELL DETECTION IN PYTHON SYSTEMS WITH MACHINE LEARNING |
| Author(s): | ALEXANDRE CESAR BRANDAO DE ANDRADE |
| Contributor(s): | JULIANA ALVES PEREIRA - Advisor |
| Cataloged: | 25/MAR/2026 | Language(s): | PORTUGUESE - BRAZIL |
| Type: | TEXT | Subtype: | SENIOR PROJECT |
| Notes: | All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio. |
| Reference(s): | [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75818@1 [en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75818@2 |
| DOI: | https://doi.org/10.17771/PUCRio.acad.75818 |
| Abstract: |
The internal quality of code directly impacts the maintenance and
evolution of software systems. Code smells are one of the main indicators of the
degradation of this quality. This work investigates the use of Machine Learning
models for detecting code smells as identified by PySmell in Python projects,
based on static code metrics. Six smells were considered: Large Class, Long
Method, Long Parameter List, Long Message Chain, Long Scope Chaining, and
Long Base Class List. From an expert-labeled dataset, a pipeline was built that
integrates automatic metric extraction and the training of several supervised
algorithms. The results indicate that Machine Learning models can reproduce
with high fidelity the labels of most of the evaluated code smells. However,
code smells that depend on call structure remain difficult to capture using
only the set of considered metrics. Complexity and size
metrics are generally sufficient for smells based on size and methods, whereas
structural metrics become decisive for smells that depend on inheritance and
relationships among classes and calls. Finally, the study shows that using
static tools to label a reference dataset leads to an artificial inflation of model
performance, characterizing strong data leakage and compromising the validity
of conclusions in scenarios that do not control for this effect. The work,
therefore, offers practical and methodological recommendations for future
research on automated code smell detection in Python.
|
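As an illustration of the metric-extraction step mentioned in the abstract, the following is a minimal sketch and not the pipeline built in this work. It assumes Python's standard `ast` module as the extractor; the function name `extract_metrics`, the choice of metrics, and the `SAMPLE` snippet are hypothetical.

```python
import ast

# Hypothetical input used only for illustration.
SAMPLE = '''
class Report(Base, Mixin):
    def render(self, title, body, footer=None, *extra, **options):
        return title

def helper(x):
    return x
'''


def extract_metrics(source: str) -> list[dict]:
    """Collect simple size and structure metrics per class and function."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            rows.append({
                "kind": "class",
                "name": node.name,
                # Size metric, relevant to a smell such as Large Class.
                "lines": node.end_lineno - node.lineno + 1,
                # Structural metric, relevant to Long Base Class List.
                "base_classes": len(node.bases),
                "methods": sum(
                    isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
                    for child in node.body
                ),
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Count every declared parameter, including *args and **kwargs.
            n_params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            n_params += int(args.vararg is not None) + int(args.kwarg is not None)
            rows.append({
                "kind": "function",
                "name": node.name,
                # Size metric, relevant to Long Method.
                "lines": node.end_lineno - node.lineno + 1,
                # Relevant to Long Parameter List.
                "parameters": n_params,
            })
    return rows


if __name__ == "__main__":
    for row in extract_metrics(SAMPLE):
        print(row)
```

Rows like these could serve as feature vectors for a supervised classifier. If the labels themselves were produced by thresholding the same metrics with a static tool, the label would be a deterministic function of the features, which is the leakage effect the abstract warns about.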