Title: | THEORETICAL AND EXPERIMENTAL RESULTS IN INFORMATION-THEORETIC CLUSTERING |
Author: | LUCAS SAADI MURTINHO |
Contributor(s): | EDUARDO SANY LABER - Advisor |
Cataloged: | 21/SEP/2020 | Language(s): | ENGLISH - UNITED STATES |
Type: | TEXT | Subtype: | THESIS |
Notes: | All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio. |
Reference(s): | [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=49518&idi=1 [en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=49518&idi=2 |
DOI: | https://doi.org/10.17771/PUCRio.acad.49518 |
Abstract: |
We present theoretical and experimental results related to the problem
of clustering a set of vectors (which can be interpreted as probability
distributions) with the goal of minimizing a weighted impurity measure
of the resulting partition. The problem of clustering while minimizing the
weighted Gini impurity of the partition is shown to be NP-complete and
APX-hard, via a connection with the geometrical k-means problem. We
also analyze a family of algorithms for information-theoretic clustering that
rely on the dominant (largest) component of the vectors to be clustered.
These algorithms are shown to be very fast compared to the state of the art,
while achieving comparable results in terms of the weighted entropy of the
resulting partition.
|
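As a rough illustration of the quantities named in the abstract, the sketch below computes a weighted Gini impurity and a weighted entropy for a partition, and groups vectors by their dominant (largest) component. The definitions used here (impurity of a cluster taken as the total mass of its summed vector times the impurity of the normalized sum) and all function names are assumptions made for this example; the grouping function is a simplified stand-in and not the algorithms analyzed in the thesis.

```python
import math
from collections import defaultdict


def _column_sums(cluster):
    # Sum the vectors of a cluster component by component.
    return [sum(col) for col in zip(*cluster)]


def weighted_gini(cluster):
    # Assumed definition: total mass times (1 - sum of squared shares).
    total = _column_sums(cluster)
    mass = sum(total)
    if mass == 0:
        return 0.0
    return mass * (1.0 - sum((x / mass) ** 2 for x in total))


def weighted_entropy(cluster):
    # Assumed definition: total mass times the Shannon entropy (base 2)
    # of the normalized sum, written as sum_i x_i * log2(mass / x_i).
    total = _column_sums(cluster)
    mass = sum(total)
    if mass == 0:
        return 0.0
    return sum(x * math.log2(mass / x) for x in total if x > 0)


def partition_impurity(clusters, impurity):
    # Impurity of a partition: sum of the weighted impurities of its clusters.
    return sum(impurity(c) for c in clusters)


def group_by_dominant_component(vectors):
    # Hypothetical simplification: one group per index of the largest component.
    groups = defaultdict(list)
    for v in vectors:
        dominant = max(range(len(v)), key=lambda i: v[i])
        groups[dominant].append(v)
    return [groups[i] for i in sorted(groups)]


vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.3, 0.7)]
clusters = group_by_dominant_component(vectors)
grouped_gini = partition_impurity(clusters, weighted_gini)
single_gini = partition_impurity([vectors], weighted_gini)
```

In this toy input, grouping by dominant component separates the vectors concentrated on the first coordinate from those concentrated on the second, and the resulting partition has lower weighted Gini impurity than keeping all vectors in a single cluster.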