Maxwell Para Simples Indexação

Título

[en] APPROXIMATE NEAREST NEIGHBOR SEARCH FOR THE KULLBACK-LEIBLER DIVERGENCE

Título

[pt] BUSCA APROXIMADA DE VIZINHOS MAIS PRÓXIMOS PARA DIVERGÊNCIA DE KULLBACK-LEIBLER

Autor

[pt] DANIEL ALEJANDRO MESEJO-LEON

Vocabulário

[pt] DIVERGENCIA KULLBACK-LEIBER

Vocabulário

[pt] ARVORES DE BREGMAN

Vocabulário

[pt] HASH SENSIVEL A LOCALIDADE

Vocabulário

[pt] INDICES INVERTIDOS

Vocabulário

[pt] BUSCA DE VIZINHOS MAIS PROXIMOS

Vocabulário

[en] KULLBACK-LEIBLER DIVERGENCE

Vocabulário

[en] BREGMAN BALL TREE

Vocabulário

[en] LOCALITY SENSITIVE HASHING

Vocabulário

[en] INVERTED INDEX

Vocabulário

[en] NEAREST NEIGHBOR SEARCH

Resumo

[pt] Em uma série de aplicações, os pontos de dados podem ser representados como distribuições de probabilidade. Por exemplo, os documentos podem ser representados como modelos de tópicos, as imagens podem ser representadas como histogramas e também a música pode ser representada como uma distribuição de probabilidade. Neste trabalho, abordamos o problema do Vizinho Próximo Aproximado onde os pontos são distribuições de probabilidade e a função de distância é a divergência de Kullback-Leibler (KL). Mostramos como acelerar as estruturas de dados existentes, como a Bregman Ball Tree, em teoria, colocando a divergência KL como um produto interno. No lado prático, investigamos o uso de duas técnicas de indexação muito populares: Índice Invertido e Locality Sensitive Hashing. Os experimentos realizados em 6 conjuntos de dados do mundo real mostraram que o Índice Invertido é melhor do que LSH e Bregman Ball Tree, em termos de consultas por segundo e precisão.

Resumo

[en] In a number of applications, data points can be represented as probability distributions. For instance, documents can be represented as topic models, images can be represented as histograms and also music can be represented as a probability distribution. In this work, we address the problem of the Approximate Nearest Neighbor where the points are probability distributions and the distance function is the Kullback-Leibler (KL) divergence. We show how to accelerate existing data structures such as the Bregman Ball Tree, by posing the KL divergence as an inner product embedding. On the practical side we investigated the use of two, very popular, indexing techniques: Inverted Index and Locality Sensitive Hashing. Experiments performed on 6 real world data-sets showed the Inverted Index performs better than LSH and Bregman Ball Tree, in terms of queries per second and precision.

Orientador(es)

EDUARDO SANY LABER

Banca

HELIO CORTES VIEIRA LOPES

Banca

ALEXANDRE ROBERTO RENTERIA

Banca

EDUARDO SANY LABER

Catalogação

2018-03-19

Apresentação

2018-01-09

Tipo

[pt] TEXTO

Formato

application/pdf

Idioma(s)

INGLÊS

Referência [pt]

https://www.maxwell.vrac.puc-rio.br/colecao.php?strSecao=resultado&nrSeq=33305@1

Referência [en]

https://www.maxwell.vrac.puc-rio.br/colecao.php?strSecao=resultado&nrSeq=33305@2

Referência DOI

https://doi.org/10.17771/PUCRio.acad.33305

Arquivos do conteúdo

NA ÍNTEGRA PDF