Title: CAN MACHINE LEARNING REPLACE A REVIEWER IN THE SELECTION OF STUDIES FOR SYSTEMATIC LITERATURE REVIEW UPDATES?
Author: MARCELO COSTALONGA CARDOSO
Contributor(s): MARCOS KALINOWSKI - Advisor
Cataloging date: 19/SEP/2024
Language(s): ENGLISH - UNITED STATES
Type: TEXT
Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of their authors. The data used in the descriptions of the documents conform to the systems of the PUC-Rio administration.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=68121&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=68121&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.68121
Abstract:
[Context] The importance of systematic literature reviews (SLRs) for finding and synthesizing new evidence in Software Engineering (SE) is well known, yet performing SLRs and keeping them up to date is still a major challenge. One of the most effort-intensive activities in an SLR is study selection, because of the large number of studies to be analyzed. Furthermore, to avoid bias, study selection should be conducted by more than one reviewer.

[Objective] This dissertation aims to evaluate the use of machine learning (ML) text classification models to support study selection in SLR updates and to verify whether such models can replace an additional reviewer.

[Method] We reproduced the study selection of an SLR update performed by three experienced researchers, applying the ML models to the same dataset they used. We used two supervised ML algorithms, Random Forest and Support Vector Machines, with different configurations to train the models based on the original SLR. We calculated the study selection effectiveness of the ML models in terms of precision, recall, and F-measure.
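The effectiveness measures named above can be illustrated with a minimal sketch. It assumes that the reviewers' final include decisions serve as the ground truth, that each model is reduced to the set of study IDs it would include, and that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall). The function name `selection_metrics` and the study IDs are hypothetical, and the training of the Random Forest and Support Vector Machine classifiers is not shown.

```python
from __future__ import annotations


def selection_metrics(predicted: set[str], relevant: set[str]) -> dict[str, float]:
    """Precision, recall and F-measure (assumed here to be F1) of a model's include decisions.

    `relevant` holds the IDs of studies the human reviewers included (ground truth);
    `predicted` holds the IDs the model proposed to include.
    """
    true_positives = len(predicted & relevant)  # studies both the model and reviewers included
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical example: the reviewers included 4 studies, the model proposed 6.
reviewers = {"S01", "S04", "S07", "S09"}
model = {"S01", "S02", "S04", "S05", "S08", "S11"}
metrics = selection_metrics(model, reviewers)
print({name: round(value, 3) for name, value in metrics.items()})
# → {'precision': 0.333, 'recall': 0.5, 'f_measure': 0.4}
```

With these hypothetical sets, two of the six proposed studies match the reviewers' decisions, so precision is 1/3, recall is 1/2, and the F-measure is 0.4.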
We also compared the level of similarity and agreement between the studies selected by the ML models and those selected by the original reviewers by performing a kappa analysis and a Euclidean distance analysis.

[Results] In our investigation, the ML models achieved an F-measure of 0.33 for study selection, which is insufficient for performing the task in an automated way. However, we found that such models could reduce the study selection effort by 33.9 percent without loss of evidence (keeping 100 percent recall) by discarding studies with a low probability of being included. In addition, the ML models achieved a moderate level of agreement with the reviewers, with an average kappa of 0.42.

[Conclusion] The results indicate that ML is not ready to replace human reviewers in study selection, nor can it be used to replace the need for an additional reviewer. However, there is potential for reducing the study selection effort of SLR updates.
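Two of the quantities reported in the abstract, effort reduction at 100 percent recall and kappa agreement, can be sketched as follows. This is one plausible reading, not the dissertation's code: it assumes Cohen's kappa (the abstract says only "kappa"), and it assumes effort reduction is measured retrospectively by ranking candidate studies by the model's predicted inclusion probability and discarding every study scored below the lowest-scored study the reviewers actually included. The function names, study IDs, and probabilities are hypothetical.

```python
from __future__ import annotations


def effort_reduction_at_full_recall(probabilities: dict[str, float], included: set[str]) -> float:
    """Fraction of candidate studies that can be skipped while still keeping 100% recall.

    Assumption: studies are discarded in order of increasing predicted inclusion
    probability, stopping just below the lowest-scored study the reviewers included.
    """
    if not included:
        raise ValueError("at least one included study is needed to fix the cut-off")
    threshold = min(probabilities[study] for study in included)
    discarded = [study for study, p in probabilities.items() if p < threshold]
    return len(discarded) / len(probabilities)


def cohen_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters making binary include (True) / exclude (False) decisions."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same, non-empty list of studies")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement
    p_a = sum(rater_a) / n  # share of "include" decisions by rater A
    p_b = sum(rater_b) / n  # share of "include" decisions by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:  # both raters made the same single decision for every study
        return 1.0  # kappa is undefined here; treated as perfect agreement by convention
    return (observed - expected) / (1 - expected)


probs = {"S01": 0.91, "S02": 0.40, "S03": 0.05, "S04": 0.62,
         "S05": 0.12, "S06": 0.30, "S07": 0.08, "S08": 0.55}
print(effort_reduction_at_full_recall(probs, {"S01", "S04", "S06"}))  # → 0.375

reviewer = [True, False, False, True, False, True, False, False]
model = [True, True, False, True, False, False, False, True]
print(cohen_kappa(reviewer, model))  # → 0.25
```

In this toy data, three of the eight candidates score below 0.30, the lowest score among included studies, so 37.5 percent of the screening could be skipped without losing an included study. Because that cut-off depends on labels that are only known after screening, a real update would need a threshold chosen beforehand; that step is not shown.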