Title: ISSUES THAT LEAD TO CODE TECHNICAL DEBT IN MACHINE LEARNING SYSTEMS
Author: RODRIGO GALDINO XIMENES
Contributor(s): MARCOS KALINOWSKI - Advisor; TATIANA ESCOVEDO - Co-advisor
Cataloged: 10/SEP/2024
Language(s): ENGLISH - UNITED STATES
Type: TEXT
Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=67941&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=67941&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.67941
Abstract:
[Context] Technical debt (TD) in machine learning (ML) systems, much
like its counterpart in software engineering (SE), can lead to
future rework, posing risks to productivity, quality, and team morale. However,
the code-related issues that lead to TD in ML systems remain largely
unexplored. [Objective] This dissertation aims to identify and discuss the
relevance of code-related issues leading to TD in ML code throughout the ML
life cycle. [Method] Initially, the study generated a list of potential factors that
may contribute to accruing TD in ML code. This list was compiled
by examining the phases of the ML life cycle along with their usual tasks.
Subsequently, the identified issues were refined by evaluating their prevalence
and relevance in causing TD in ML code. This refinement process involved
soliciting feedback from industry professionals during two focus group sessions.
[Results] The study compiled a list of 34 potential issues contributing to TD
in the source code of ML systems. Through two focus group sessions with nine
participants, this list was refined into 30 issues leading to ML code-related
TD, with 24 considered highly relevant. The data pre-processing phase was the
most critical, with 14 issues considered highly relevant in potentially leading to
severe ML code TD. Five issues were considered highly relevant in the model
creation and training phase and four in the data collection phase. The final list
of issues is available to the community. [Conclusion] The list can help raise
awareness of the issues to be addressed throughout the ML life cycle to minimize
TD accrual and thereby improve the maintainability of ML systems.