Title: | A FAST AND SPACE-ECONOMICAL APPROACH TO WORD MOVER'S DISTANCE |
Author: | MATHEUS TELLES WERNER |
Contributor(s): | EDUARDO SANY LABER - Advisor |
Cataloged: | 02/APR/2020 | Language(s): | ENGLISH - UNITED STATES |
Type: | TEXT | Subtype: | THESIS |
Notes: | All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio. |
Reference(s): | [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=47317&idi=1 [en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=47317&idi=2 |
DOI: | https://doi.org/10.17771/PUCRio.acad.47317 |
Abstract: |
The Word Mover's Distance (WMD), proposed by Kusner et al. [ICML, 2015],
is a distance between documents that takes advantage of the semantic
relations among words captured by their word embeddings. This distance
has proved quite effective, obtaining state-of-the-art error rates on
classification tasks, but it is impractical for large collections or
documents because it requires solving a transportation problem on a
complete bipartite graph for each pair of documents. Using assumptions
that are supported by empirical properties of the distances between word
embeddings, we simplify WMD to obtain a new distance whose computation
requires solving a max-flow problem on a sparse graph, which can be
solved much faster than the transportation problem on a dense graph. Our
experiments show that we can obtain a performance gain of up to three
orders of magnitude over WMD while maintaining the same error rates in
document classification tasks.
|
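For reference, the transportation problem mentioned in the abstract can be written as follows. This is a sketch of the standard WMD formulation from Kusner et al., not a formula taken from this record. The symbols are assumptions introduced here for illustration: d and d' are the normalized bag-of-words vectors of the two documents over a vocabulary of n words, x_i is the embedding of word i, and T_{ij} is the amount of mass moved from word i of the first document to word j of the second.

```latex
% Sketch of the standard WMD linear program (notation assumed, see lead-in).
\begin{aligned}
\mathrm{WMD}(d, d') \;=\; \min_{T \ge 0} \quad
  & \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij}\, c(i, j) \\
\text{subject to} \quad
  & \sum_{j=1}^{n} T_{ij} = d_i   \qquad \forall i \in \{1, \dots, n\}, \\
  & \sum_{i=1}^{n} T_{ij} = d'_j  \qquad \forall j \in \{1, \dots, n\}, \\
\text{where} \quad
  & c(i, j) = \lVert x_i - x_j \rVert_2 .
\end{aligned}
```

Every word of one document may send mass to every word of the other, so the underlying bipartite graph is complete; this is the dense structure that the abstract contrasts with the sparse graph of the proposed distance.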