ETDs @PUC-Rio
Title: ISSUES THAT LEAD TO CODE TECHNICAL DEBT IN MACHINE LEARNING SYSTEMS
Author: RODRIGO GALDINO XIMENES
Collaborator(s): MARCOS KALINOWSKI - Advisor
TATIANA ESCOVEDO - Co-advisor
Cataloged: 10/SEP/2024 Language(s): ENGLISH - UNITED STATES
Type: TEXT Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of their authors. The data used in the document descriptions conform to the administrative systems of PUC-Rio.
Reference(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=67941&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=67941&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.67941
Abstract:
[Context] Technical debt (TD) in machine learning (ML) systems, much like its counterpart in software engineering (SE), can lead to future rework, posing risks to productivity, quality, and team morale. However, the code-related issues that lead to TD in ML systems remain largely unexplored. [Objective] This dissertation aims to identify the code-related issues that lead to TD in ML code throughout the ML life cycle and to discuss their relevance. [Method] First, the study compiled a list of potential factors that may contribute to accruing TD in ML code by examining the phases of the ML life cycle and their typical tasks. The identified issues were then refined by evaluating their prevalence and relevance in causing TD in ML code, based on feedback from industry professionals gathered in two focus group sessions. [Results] The study compiled a list of 34 potential issues contributing to TD in the source code of ML systems. Through two focus group sessions with nine participants, this list was refined into 30 issues leading to ML code-related TD, 24 of which were considered highly relevant. The data pre-processing phase was the most critical, with 14 issues considered highly relevant in potentially leading to severe ML code TD. Five issues were considered highly relevant in the model creation and training phase, and four in the data collection phase. The final list of issues is available to the community. [Conclusion] The list can help raise awareness of issues to address throughout the ML life cycle to minimize the accrual of TD, thereby improving the maintainability of ML systems.