Title: CORPUS FOR ACADEMIC DOMAIN: MODELS AND APPLICATIONS
Author: IVAN DE JESUS PEREIRA PINTO
Contributor(s): SERGIO COLCHER - Advisor
Cataloged: 16/NOV/2021
Language(s): PORTUGUESE - BRAZIL
Type: TEXT
Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of their authors. The data used in the descriptions of the documents conform to the systems of the administration of PUC-Rio.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=55901&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=55901&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.55901
Abstract:
Academic documents (e.g., theses and dissertations) capture aspects of a whole society as well as its scientific knowledge. They hold a wealth of information that computational models can explore to the benefit of society.
Machine learning models in particular have a growing need for training data that is both effective and of considerable size. Their use in natural language processing (NLP) is pervasive across many different tasks.
This work collects, builds, and analyzes the largest known academic corpus in the Portuguese language and trains models on it. Word embedding, bag-of-words, and transformer models were trained. Bert-Academico achieved the best results, with an F1-score of 77 percent in classifying theses and dissertations by great area of knowledge and 63 percent in classifying them by knowledge area.
A semantic analysis of the academic corpus is carried out through topic modeling, and an unprecedented visualization of the knowledge areas is presented. Lastly, SucupiraBot, an application that uses the trained models, is showcased.
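
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how an F1-score can be computed for a multi-class classification task. The abstract does not say which averaging scheme was used, so the macro average shown here is an assumption, and the function names, labels, and predictions are hypothetical illustrations rather than data or code from the thesis.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this label
        else:
            fp[pred] += 1    # predicted label was wrong
            fn[truth] += 1   # true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical area labels and predictions, for illustration only.
y_true = ["Engineering", "Health", "Health", "Humanities", "Engineering", "Humanities"]
y_pred = ["Engineering", "Health", "Humanities", "Humanities", "Health", "Humanities"]

if __name__ == "__main__":
    print(per_class_f1(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
```

A macro average weights every area equally regardless of how many documents it contains. A weighted or micro average would give different numbers on an imbalanced corpus, so the reported figures should be read with the thesis's own definition in mind.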