Title: RANKING OF WEB PAGES BY LEARNING MULTIPLE LATENT CATEGORIES
Author: FRANCISCO BENJAMIM FILHO
Contributor(s): RUY LUIZ MILIDIU - Advisor
Cataloged: 17/MAY/2012
Language(s): PORTUGUESE - BRAZIL
Type: TEXT
Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents conform to the systems of the administration of PUC-Rio.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=19540&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=19540&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.19540
Abstract:
The rapid growth and widespread accessibility of the World Wide Web
(WWW) have led to an increase in research in the field of information
retrieval for Web pages. The WWW is an immense environment in which Web
pages resemble a huge community of elements. These elements are connected
via hyperlinks on the basis of the similarity between the content of the
pages, the popularity of a given page, the extent to which the information
provided is authoritative in a given field, and so on. In fact, when the
author of a Web page links it to another, s/he is acknowledging the
importance of the linked page to his/her information. As such, the
hyperlink structure of the WWW significantly improves search performance
beyond the use of simple text distribution statistics. To this end, the
HITS approach introduces two basic categories of Web pages, hubs and
authorities, which uncover certain hidden semantic information using the
hyperlink structure. In 2005, we proposed a first extension of HITS, called
Extended Hyperlink-Induced Topic Search (XHITS), which added two new
categories of Web pages: novelties and portals. In this thesis, we revise
XHITS, turning it into a generalization of HITS that broadens the model
from two categories to several, and we present an efficient machine
learning algorithm to calibrate the proposed model using multiple latent
categories. The findings we set out here indicate that the new learning
approach provides a more precise XHITS model. Finally, experiments with the
ClueWeb09 collection (25 TB of Web pages downloaded in 2009) demonstrated
that XHITS is capable of significantly improving Web search efficiency and
producing results comparable to those of the TREC 2009/2010 Web Track.
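
For readers unfamiliar with the hub and authority categories mentioned in the abstract, the following is a minimal sketch of the classic two-category HITS iteration. It is not the XHITS model or the learning algorithm of the thesis; the function name `hits`, the toy link graph, and the fixed iteration count are assumptions made purely for illustration.

```python
import math


def hits(links, iterations=50):
    """Classic HITS power iteration.

    `links` maps each page to the list of pages it links to
    (a hypothetical toy representation, not the thesis data format).
    Returns two dicts: hub scores and authority scores.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]

        # Hub update: sum the authority scores of the pages linked to.
        new_hub = {
            p: sum(new_auth[q] for q in links.get(p, [])) for p in pages
        }

        # Normalize both vectors to unit Euclidean length.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical toy graph: "a" and "b" act as hubs, "c" as the main authority.
toy_links = {"a": ["c", "d"], "b": ["c", "d"], "c": [], "d": ["c"]}
hub, auth = hits(toy_links)
```

In this toy graph, page "c" receives the most links from good hubs and therefore gets the highest authority score, while "a" and "b" get the highest hub scores. A generalization to more than two categories, as the abstract describes for XHITS, would need additional score vectors and update rules that are not given in this record.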