Logo PUC-Rio Logo Maxwell
ETDs @PUC-Rio
Estatística
Título: EFFICIENT WEB PAGE REFRESH POLICIES
Autor: CRISTON PEREIRA DE SOUZA
Colaborador(es): EDUARDO SANY LABER - Orientador
Catalogação: 15/JUL/2010 Língua(s): PORTUGUESE - BRAZIL
Tipo: TEXT Subtipo: THESIS
Notas: [pt] Todos os dados constantes dos documentos são de inteira responsabilidade de seus autores. Os dados utilizados nas descrições dos documentos estão em conformidade com os sistemas da administração da PUC-Rio.
[en] All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Referência(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=15893&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=15893&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.15893
Resumo:
A search engine needs to continuously revisit web pages in order to keep its local repository up-to-date. A page revisiting schedule must be defined to keep the repository up-to-date using the available resources. In order to avoid web server overload, the revisiting policy must respect a minimum amount of time between consecutive requests to the same server. This rule is called politeness constraint. Due to the large number of web pages, we consider that a revisiting policy is efficient when the mean time to schedule a revisit is sublinear on the number of pages in the repository. Therefore, when the politeness constraint is considered, there are no existing efficient policies with theoretical quality guarantees. We investigate three efficient policies that respect the politeness constraint, called MERGE, RANDOM and DELAYED. We provide approximation factors for the repository’s up-to-date level for the MERGE and RANDOM policies. Based on these approximation factors, we devise a 0.77 lower bound for the approximation factor provided by the RANDOM policy and we present a conjecture that 0.927 is a lower bound for the approximation factor provided by the MERGE policy. We evaluate these policies through simulation experiments which try to keep a repository with 14.5 million web pages up-to-date. Additional experiments based on a repository with Wikipedia’s articles concluded that the MERGE policy provides better results than a natural greedy strategy. The main conclusion of this research is that there are simple and efficient policies that can be applied to this problem, even when the politeness constraint must be respected, resulting in a small loss of repository’s up-to-date level.
Descrição: Arquivo:   
COVER, ACKNOWLEDGEMENTS, RESUMO, ABSTRACT, SUMMARY AND LISTS PDF    
CHAPTER 1 PDF    
CHAPTER 2 PDF    
CHAPTER 3 PDF    
CHAPTER 4 PDF    
CHAPTER 5 PDF    
CHAPTER 6 PDF    
CHAPTER 7 PDF    
REFERENCES PDF