ETDs @PUC-Rio
Title: A DATA ANNOTATION APPROACH USING LARGE LANGUAGE MODELS
Author: CARLOS VINICIOS MARTINS ROCHA
Contributor(s): HELIO CORTES VIEIRA LOPES - Advisor
JONATAS DOS SANTOS GROSMAN - Co-advisor
Cataloged: 17/OCT/2024 Language(s): ENGLISH - UNITED STATES
Type: TEXT Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents conform to the systems of the administration of PUC-Rio.
Reference(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=68379&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=68379&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.68379
Abstract:
Documents are essential to economic and academic systems; however, exploring them can be complex and time-consuming. One approach to overcoming this problem is to use Visual Question Answering (VQA) models to extract information from documents through natural language prompts. VQA, like many other modeling tasks, requires annotated data for training and validation. However, creating these datasets is challenging because of the high cost involved. To address this challenge, we propose a four-step process that combines computer vision models and Large Language Models (LLMs) for VQA data annotation in financial reports. The proposed method starts by recognizing the textual structure of documents using Document Layout Analysis and Table Structure Extraction models. It then uses two distinct LLMs to generate and evaluate question-answer pairs, automating the construction and selection of the best pairs for the final dataset. To evaluate the proposed method, we generate a dataset and use it to train and evaluate specialized VQA models.
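The process described in the abstract can be pictured as a small pipeline. The Python sketch below is illustrative only and is not the thesis implementation: the mapping of the four steps to layout analysis, table extraction, generation, and evaluation is one reading of the abstract, and every name in it (`QAPair`, `annotate_page`, `min_score`, and the four callables standing in for the models) is hypothetical. The threshold-based selection is likewise an assumption, since the abstract does not say how the best pairs are chosen.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str
    score: float = 0.0  # filled in by the evaluator model


def annotate_page(
    page_image: object,
    detect_layout: Callable[[object], List[str]],    # hypothetical layout model
    extract_tables: Callable[[object], List[str]],   # hypothetical table model
    generate_pairs: Callable[[str], List[QAPair]],   # hypothetical generator LLM
    score_pair: Callable[[str, QAPair], float],      # hypothetical evaluator LLM
    min_score: float = 0.5,                          # assumed selection threshold
) -> List[QAPair]:
    """Return the question-answer pairs kept for one page, best first."""
    # Steps 1 and 2: recover text regions and table contents from the page.
    context = "\n".join(detect_layout(page_image) + extract_tables(page_image))
    # Step 3: a first LLM proposes candidate question-answer pairs.
    candidates = generate_pairs(context)
    # Step 4: a second, distinct LLM scores each candidate against the context.
    for pair in candidates:
        pair.score = score_pair(context, pair)
    kept = [pair for pair in candidates if pair.score >= min_score]
    return sorted(kept, key=lambda pair: pair.score, reverse=True)


def demo() -> List[QAPair]:
    """Run the pipeline with toy stand-ins; all values here are placeholders."""
    layout = lambda page: ["Net revenue rose in 2023."]
    tables = lambda page: ["Year | Net revenue", "2023 | 120"]
    generator = lambda ctx: [
        QAPair("What was net revenue in 2023?", "120"),
        QAPair("Who is the CEO?", "unknown"),
    ]
    # Toy evaluator: accept a pair only if its answer appears in the context.
    judge = lambda ctx, pair: 1.0 if pair.answer in ctx else 0.0
    return annotate_page(None, layout, tables, generator, judge)
```

Passing the models in as callables keeps the sketch independent of any particular vision or language model library; in `demo`, only the pair whose answer can be found in the extracted context survives the selection step.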