ETDs @PUC-Rio
Title: A FRAMEWORK FOR THE CONSTRUCTION OF MEDIATORS OFFERING DEDUPLICATION
Author: GUSTAVO LOPES MOURAD
Contributor(s): KARIN KOOGAN BREITMAN - Advisor
Cataloged: 24/JAN/2011 Language(s): PORTUGUESE - BRAZIL
Type: TEXT Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of their authors. The data used in the descriptions of the documents conform to the systems of the administration of PUC-Rio.
Reference(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=16775&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=16775&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.16775
Abstract:
As Web applications that obtain data from different sources (mashups) grow in importance, timely solutions to the duplicate detection problem become central. Most existing techniques, however, are based on machine learning algorithms that rely heavily on relevant, manually labeled training datasets. Such solutions are inadequate for data sources on the Deep Web, where there is often little information about a source's size and volatility, and hardly any access to relevant samples for training. In this thesis we propose a strategy to aid in the extraction (scraping), duplicate detection, and integration of data obtained by querying Deep Web resources. Our approach does not require pre-defined training sets; instead, it combines a Vector Space Model classifier with similarity functions to provide a viable solution. To illustrate the approach, we present a case study in which the proposed framework was instantiated for an application in the wine industry domain.
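The abstract does not specify which classifier or similarity functions the thesis uses, so the following is only a minimal sketch of the general idea of pairing a Vector Space Model score with a string similarity function, without any training set. The function names, the equal weighting, and the 0.8 threshold are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tf_vector(text):
    """Build a sparse term-frequency vector from lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def string_similarity(x, y):
    """Character-level similarity in [0, 1], using difflib's ratio."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def is_duplicate(rec_a, rec_b, vsm_weight=0.5, threshold=0.8):
    """Blend the two scores; weight and threshold are hypothetical values."""
    vsm = cosine(tf_vector(rec_a), tf_vector(rec_b))
    sim = string_similarity(rec_a, rec_b)
    score = vsm_weight * vsm + (1 - vsm_weight) * sim
    return score >= threshold, score


if __name__ == "__main__":
    # Hypothetical wine records scraped from two different sources.
    print(is_duplicate("Chateau Margaux 2005", "chateau margaux 2005"))
    print(is_duplicate("Chateau Margaux 2005", "Penfolds Grange 1998"))
```

In a sketch like this, the token-level score tolerates reordered words while the character-level score tolerates small spelling differences, which is one plausible reason to combine the two.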
Description: File:
COVER, ACKNOWLEDGEMENTS, RESUMO, ABSTRACT, SUMMARY AND LISTS PDF    
CHAPTER 1 PDF    
CHAPTER 2 PDF    
CHAPTER 3 PDF    
CHAPTER 4 PDF    
CHAPTER 5 PDF    
CHAPTER 6 PDF    
CHAPTER 7 PDF    
REFERENCES PDF