Title:

COMPARATIVE EVALUATION OF TOOLS FOR TABLE EXTRACTION IN PDF DOCUMENTS

Author(s):

PAULO DE SALDANHA DA G DE M VIANNA

Contributor(s):

AUGUSTO CESAR ESPINDOLA BAFFA - Advisor

Cataloged:

25/MAR/2026

Language(s):

PORTUGUESE - BRAZIL

Type:

TEXT

Subtype:

SENIOR PROJECT

Notes:

All data contained in the documents are the sole responsibility of their authors. The data used in the document descriptions conform to the PUC-Rio administration systems.

Reference(s):

[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75809@1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=75809@2

DOI:

https://doi.org/10.17771/PUCRio.acad.75809

Abstract:

This paper presents a comparative evaluation of table extraction tools applied to Brazilian financial PDF documents. The study assessed three families of tools: geometric rule-based extractors (Camelot, Tabula, pdfplumber), a specialized deep learning tool (IBM Docling), and a multimodal model (Google Gemini). The evaluation followed the four-level methodology proposed by Göbel et al. (2012): page detection, localization, cell structure, and textual content. Experiments were conducted on Real Estate Investment Fund (FII) reports, whose tables are irregular and contain merged cells. The results highlight significant differences between the approaches and reveal persistent challenges in the automated extraction of financial tables.
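
As an illustration of the cell-structure level, one common way to score an extracted table is to compare the adjacency relations between neighboring cells against a hand-labeled ground truth. The Python sketch below assumes each table is available as a row-major list of strings; the function names and the sample fund tickers are hypothetical and are not taken from the study, whose exact scoring procedure may differ.

```python
from collections import Counter


def normalize(text):
    """Collapse whitespace so trivial spacing differences are not counted as errors."""
    return " ".join(str(text).split())


def adjacency_relations(grid):
    """Return a multiset of (cell, neighbor, direction) triples for a row-major grid.

    Only directly adjacent, non-empty cells are paired (an assumption of this sketch).
    """
    relations = Counter()
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            here = normalize(cell)
            if not here:
                continue
            # Right-hand neighbor in the same row.
            if c + 1 < len(row):
                right = normalize(row[c + 1])
                if right:
                    relations[(here, right, "horizontal")] += 1
            # Neighbor directly below, if the next row is long enough.
            if r + 1 < len(grid) and c < len(grid[r + 1]):
                below = normalize(grid[r + 1][c])
                if below:
                    relations[(here, below, "vertical")] += 1
    return relations


def score(ground_truth, extracted):
    """Return (precision, recall, f1) over the adjacency relations of two tables."""
    gt = adjacency_relations(ground_truth)
    ex = adjacency_relations(extracted)
    correct = sum((gt & ex).values())  # multiset intersection
    precision = correct / sum(ex.values()) if ex else 0.0
    recall = correct / sum(gt.values()) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the extractor fused the last row into a single cell.
truth = [["Fund", "Yield"], ["ABCD11", "0.85"], ["WXYZ11", "0.92"]]
found = [["Fund", "Yield"], ["ABCD11", "0.85"], ["WXYZ11 0.92", ""]]
precision, recall, f1 = score(truth, found)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Comparing relations rather than absolute row and column indices keeps the score tolerant of a tool that shifts the whole table by an extra empty row or column, while still penalizing fused or split cells such as the last row in the example.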

Description:			File:
COMPLETE			PDF