Title:

WEBETL: EXTRACT, TRANSFORM AND LOAD DATA FROM THE WEB

Author(s):

FELIPE SALVINI BOURRUS

Contributor(s):

MARCOS VIANNA VILLAS - Advisor

Cataloged:

01/OCT/2010

Language(s):

PORTUGUESE - BRAZIL

Type:

TEXT

Subtype:

SENIOR PROJECT

Notes:

All data contained in the documents are the sole responsibility of their authors. The data used in the descriptions of the documents conform to the systems of the administration of PUC-Rio.

Reference(s):

[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=16419@1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/TFCs/consultas/conteudo.php?strSecao=resultado&nrSeq=16419@2

DOI:

https://doi.org/10.17771/PUCRio.acad.16419

Abstract:

Domain-specific crawlers are robots that crawl the web and index information from websites (they are usually used by search engines), visiting previously chosen sites to extract the desired information. Such a crawler can interact with millions of websites in a short period of time. This project describes the architecture and implementation of a distributed crawler. It discusses bottlenecks and efficient techniques for achieving the best performance, and presents statistics obtained by the crawler.

Description:			File:
COMPLETE			PDF