Maxwell

Digital Collection

Title: TEXT MINING AT THE INTELLIGENT WEB CRAWLING PROCESS
Author: FABIO DE AZEVEDO SOARES

Soares, Fabio de Azevedo

19-

Graduate Student

Graduate Program in Electrical Engineering

PUC-Rio

Institution: PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO - PUC-RIO
Contributor(s): MARLEY MARIA BERNARDES REBUZZI VELLASCO - ADVISOR
EMMANUEL PISECES LOPES PASSOS - CO-ADVISOR
Content No.: 13212
Cataloged: 31/03/2009 Released: 31/03/2009 Language(s): PORTUGUESE - BRAZIL
Type: TEXT Subtype: THESIS
Nature: SCHOLARLY PUBLICATION
Note: All data contained in the documents are the sole responsibility of their authors. The data used in the document descriptions conform to the PUC-Rio administration systems.
Reference [pt]: https://www.maxwell.vrac.puc-rio.br/colecao.php?strSecao=resultado&nrSeq=13212&idi=1
Reference [en]: https://www.maxwell.vrac.puc-rio.br/colecao.php?strSecao=resultado&nrSeq=13212&idi=2
Reference DOI: https://doi.org/10.17771/PUCRio.acad.13212

Abstract:

This dissertation studies the application of Text Mining as part of the intelligent Web crawling process. The most common way of gathering data on the Web is through web crawlers: programs that, given an initial set of URLs (seeds), methodically visit each page, store it on disk, and extract its hyperlinks for use in subsequent visits. Searching for content in this way, however, is an expensive and exhausting task. Rather than collecting and storing every web document available, an intelligent web crawling process analyzes its available crawling options to find links that are likely to lead to content highly relevant to a topic defined a priori. In the approach proposed in this work, topics are defined not by keywords but by text documents supplied as examples. Pre-processing techniques from Text Mining, including the use of a thesaurus, then analyze the example document semantically. Based on this analysis, the resulting web crawler is guided toward its objective: retrieving information relevant to the example document. Starting from seeds or from queries to available search engines, the crawler analyzes every document retrieved from the Web exactly as in the previous step. The similarity between the retrieved document and the example is computed, and the retrieved document's hyperlinks are analyzed, queued, and later dequeued according to each one's probable degree of importance. At the end of the data-gathering process, another Text Mining technique, Document Clustering, is applied in order to select the most representative document among the collected texts. The implementation of a tool incorporating all the researched heuristics made it possible to obtain results, evaluate the performance of the developed techniques, and compare them with other means of retrieving data from the Web.
The present work shows that the use of Text Mining is a path worth exploring in the process of retrieving relevant information from the Web.
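
The queue-and-dequeue step described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the bag-of-words cosine similarity, the rule that a link inherits the score of the page it was found on, the `fetch` callback, and the in-memory `TOY_WEB` are all assumptions made for illustration, and no thesaurus or clustering step is included.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts (assumed pre-processing, no thesaurus)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_first_crawl(example_text, seeds, fetch, max_pages=10):
    """Visit pages in order of estimated relevance to the example document.

    `fetch(url)` is a hypothetical callback returning (page_text, outlinks).
    Returns a list of (url, similarity) pairs in visit order.
    """
    target = term_vector(example_text)
    # heapq is a min-heap, so priorities are stored negated; seeds go first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        score = cosine(target, term_vector(text))
        visited.append((url, score))
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # Assumption: a link is as promising as the page that cites it.
                heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical in-memory "web" standing in for real HTTP fetches.
TOY_WEB = {
    "seed": ("web directory of pages", ["sports", "mining"]),
    "sports": ("football scores and league tables", ["sports2"]),
    "mining": ("text mining and web crawling methods", ["deep"]),
    "sports2": ("more football news", []),
    "deep": ("crawling the web with text mining", []),
}


def fetch_toy(url):
    return TOY_WEB[url]


if __name__ == "__main__":
    for url, score in best_first_crawl("text mining for web crawling",
                                       ["seed"], fetch_toy):
        print(f"{url}: {score:.3f}")
```

In this toy graph the link found on the relevant "mining" page is dequeued before the links reached through the off-topic "sports" page, which is the behaviour a relevance-ordered queue is meant to produce. A real crawler would also need HTTP fetching, URL normalization, and politeness rules, none of which are shown here.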

Description	File
COVER, ACKNOWLEDGEMENTS, RESUMO, ABSTRACT, SUMMARY AND LISTS	PDF
CHAPTER 1	PDF
CHAPTER 2	PDF
CHAPTER 3	PDF
CHAPTER 4	PDF
CHAPTER 5	PDF
CHAPTER 6	PDF
CHAPTER 7	PDF
REFERENCES	PDF