SENIOR PROJECTS @PUC-Rio
Title: COMPARATIVE EVALUATION OF TOOLS FOR TABLE EXTRACTION IN PDF DOCUMENTS
Author(s): PAULO DE SALDANHA DA G DE M VIANNA
Contributor(s): AUGUSTO CESAR ESPINDOLA BAFFA - Advisor
Cataloged: 25/MAR/2026 Language(s): PORTUGUESE - BRAZIL
Type: TEXT Subtype: SENIOR PROJECT
Notes: All data contained in the documents are the sole responsibility of their authors. The data used in the document descriptions conform to the administrative systems of PUC-Rio.
Reference(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75809@1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75809@2
DOI: https://doi.org/10.17771/PUCRio.acad.75809
Abstract:
This work presents a comparative evaluation of table extraction tools for Brazilian financial PDF documents. The study assessed geometric rule-based tools (Camelot, Tabula, pdfplumber), a specialized deep learning tool (IBM Docling), and a multimodal model (Google Gemini), following the four-level methodology proposed by Göbel et al. (2012): page detection, table localization, cell structure, and textual content. Experiments were conducted on reports from Real Estate Investment Funds (FIIs), which are characterized by irregular tables and merged cells. The results highlight significant differences between the approaches and reveal persistent challenges in the automated extraction of financial tables.
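At the cell-structure level, the Göbel et al. (2012) methodology compares a tool's output with the ground truth through adjacency relations: each non-empty cell is paired with its nearest non-empty neighbour to the right and below, and precision and recall are computed over the matching relations. The sketch below is a simplified illustration of that idea, not the project's code. It assumes each table is given as a list of rows of cell strings, that merged cells appear as text in one position with blanks elsewhere, and that text normalization is limited to stripping whitespace; the function names `adjacency_relations` and `score` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# A relation is (cell text, neighbour text, direction), where direction is
# "H" (nearest non-empty cell to the right) or "V" (nearest non-empty cell below).
Relation = Tuple[str, str, str]


def adjacency_relations(grid: List[List[str]]) -> Counter:
    """Collect adjacency relations as a multiset, so repeated values
    (common in financial tables) are counted rather than collapsed."""
    relations: Counter = Counter()
    for r, row in enumerate(grid):
        for c, raw in enumerate(row):
            text = raw.strip()
            if not text:
                continue  # blank cells produce no relations
            # Nearest non-empty neighbour to the right, skipping blanks.
            for c2 in range(c + 1, len(row)):
                if row[c2].strip():
                    relations[(text, row[c2].strip(), "H")] += 1
                    break
            # Nearest non-empty neighbour below, skipping blanks and
            # tolerating ragged rows.
            for r2 in range(r + 1, len(grid)):
                if c < len(grid[r2]) and grid[r2][c].strip():
                    relations[(text, grid[r2][c].strip(), "V")] += 1
                    break
    return relations


def score(truth: List[List[str]], predicted: List[List[str]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over adjacency relations."""
    gt_rel = adjacency_relations(truth)
    pr_rel = adjacency_relations(predicted)
    correct = sum((gt_rel & pr_rel).values())  # multiset intersection
    precision = correct / sum(pr_rel.values()) if pr_rel else 0.0
    recall = correct / sum(gt_rel.values()) if gt_rel else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if a tool drops the value in the lower-right cell of a two-by-two table, every relation it does report is correct, but half of the ground-truth relations are missing, so precision stays at 1.0 while recall falls to 0.5. Comparing relations rather than absolute cell positions means that a single spurious or missing row does not cascade into errors for every cell beneath it.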
Description: File:
COMPLETE PDF