Title: SUMMARIZATION OF HEALTH SCIENCE PAPERS IN PORTUGUESE
Author: DAYSON NYWTON C R DO NASCIMENTO
Contributor(s):
HELIO CORTES VIEIRA LOPES - Advisor
FERNANDO ALBERTO CORREIA DOS SANTOS JUNIOR - Co-advisor
Cataloged: 30/OCT/2023
Language(s): PORTUGUESE - BRAZIL
Type: TEXT
Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents conform to the administrative systems of PUC-Rio.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=64511&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=64511&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.64511
Abstract:
In this work, we present a study on the fine-tuning of a pre-trained Large
Language Model for abstractive summarization of long texts in Portuguese. To
do so, we built a corpus of 7,450 public Health Sciences papers in Portuguese.
We fine-tuned a pre-trained BERT model for Brazilian Portuguese (BERTimbau)
on this corpus. Under similar conditions, we also trained a second model,
based on Long Short-Term Memory (LSTM), from scratch for comparison
purposes. Our evaluation showed that the fine-tuned model achieved higher
ROUGE scores, outperforming the LSTM-based model by 30 points in F1-score.
The fine-tuned model also stood out in a qualitative evaluation performed by
assessors, to the point that its summaries were perceived as possibly written
by humans, for a specific collection of documents in the Health Sciences
domain.
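
To make the reported metric concrete, the following is a minimal sketch of a ROUGE-1 F1 computation between a reference summary and a generated one. The record does not say which ROUGE variant or tooling the thesis used, so unigram overlap, lowercasing, whitespace tokenization, and the function name `rouge1_f1` are all assumptions made for illustration only.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Illustrative ROUGE-1 F1 (assumed variant, not the thesis' own code).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption; real evaluations usually add stemming or a tokenizer.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the corpus.
    reference = "o estudo avaliou pacientes com diabetes tipo 2"
    candidate = "o estudo avaliou pacientes com diabetes"
    print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.3f}")
```

The function returns a value between 0 and 1. If the scores in the abstract are reported on a 0 to 100 scale, which is an assumption, a gap of 30 points would correspond to a difference of 0.30 in this value.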