ETDs @PUC-Rio
Title: A FAST AND SPACE-ECONOMICAL APPROACH TO WORD MOVER'S DISTANCE
Author: MATHEUS TELLES WERNER
Contributor(s): EDUARDO SANY LABER - Advisor
Cataloged: 02/APR/2020 Language(s): ENGLISH - UNITED STATES
Type: TEXT Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Reference(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=47317&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=47317&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.47317
Abstract:
The Word Mover's Distance (WMD), proposed by Kusner et al. [ICML, 2015], is a distance between documents that exploits the semantic relations among words captured by their word embeddings. This distance has proved quite effective, obtaining state-of-the-art error rates on classification tasks, but it is impractical for large collections or large documents because it requires solving a transportation problem on a complete bipartite graph for each pair of documents. Using assumptions supported by empirical properties of the distances between word embeddings, we simplify WMD to obtain a new distance whose computation requires solving a max-flow problem on a sparse graph, which can be done much faster than solving the transportation problem on a dense graph. Our experiments show that this yields a performance gain of up to three orders of magnitude over WMD while maintaining the same error rates in document classification tasks.
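To make the cost described in the abstract concrete, the following is a minimal sketch of the original WMD, not of the simplified distance proposed in the thesis. It assumes normalized bag-of-words weights and Euclidean distances between embeddings, as in Kusner et al., and solves the transportation problem on the complete bipartite graph with a plain successive-shortest-paths routine chosen here only for brevity. The function names (`euclidean`, `wmd`) and the toy embeddings are hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def wmd(doc_a, doc_b, emb):
    """WMD between two token lists (illustrative, small inputs only).

    Each distinct word carries mass equal to its relative frequency.
    Masses are scaled to integers so that the flow arithmetic is exact.
    """
    ca, cb = Counter(doc_a), Counter(doc_b)
    wa, wb = list(ca), list(cb)
    n, m = len(wa), len(wb)
    la, lb = len(doc_a), len(doc_b)
    scale = la * lb // math.gcd(la, lb)
    supply = [ca[w] * (scale // la) for w in wa]
    demand = [cb[w] * (scale // lb) for w in wb]
    # Complete bipartite graph: one cost per (word of A, word of B) pair.
    cost = [[euclidean(emb[s], emb[t]) for t in wb] for s in wa]
    flow = [[0] * m for _ in range(n)]
    tol = 1e-12  # guards against relaxations caused by rounding noise

    while any(supply):
        # Bellman-Ford on the residual graph.
        # Nodes 0..n-1 are words of A, nodes n..n+m-1 are words of B.
        dist = [0.0 if supply[i] else math.inf for i in range(n)]
        dist += [math.inf] * m
        pred = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    c = cost[i][j]
                    if dist[i] + c < dist[n + j] - tol:  # forward edge
                        dist[n + j], pred[n + j] = dist[i] + c, i
                        changed = True
                    if flow[i][j] and dist[n + j] - c < dist[i] - tol:
                        dist[i], pred[i] = dist[n + j] - c, n + j  # backward edge
                        changed = True
            if not changed:
                break
        # Cheapest word of B that still has unmet demand.
        t = min((j for j in range(m) if demand[j]), key=lambda j: dist[n + j])
        path, v = [], n + t
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        s = v  # word of A with remaining mass where the path starts
        push = min(supply[s], demand[t])
        for u, w in path:
            if u >= n:  # backward edge: limited by the flow it cancels
                push = min(push, flow[w][u - n])
        for u, w in path:
            if u < n:
                flow[u][w - n] += push
            else:
                flow[w][u - n] -= push
        supply[s] -= push
        demand[t] -= push

    total = sum(flow[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total / scale


# Toy 2-D embeddings (made up for illustration).
toy = {"x": (0.0, 0.0), "y": (3.0, 4.0)}
d = wmd(["x", "x"], ["x", "y"], toy)  # half of the mass travels from x to y
```

Every pair of words contributes an edge, so the graph grows with the product of the two vocabulary sizes, and this problem has to be solved once per document pair. That is the bottleneck the abstract refers to.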