Logo PUC-Rio Logo Maxwell
ETDs @PUC-Rio
Estatística
Título: CLUSTERING UNDER CONSTRAINTS: EXPLAINABILITY VIA DECISION TREES AND SEPARABILITY WITH MINIMUM SIZE
Autor: LUCAS SAADI MURTINHO
Colaborador(es): EDUARDO SANY LABER - Orientador
Catalogação: 18/MAR/2025 Língua(s): ENGLISH - UNITED STATES
Tipo: TEXT Subtipo: THESIS
Notas: [pt] Todos os dados constantes dos documentos são de inteira responsabilidade de seus autores. Os dados utilizados nas descrições dos documentos estão em conformidade com os sistemas da administração da PUC-Rio.
[en] All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Referência(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=69655&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=69655&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.69655
Resumo:
We investigate two methods of clustering with constraints on the partitions being generated: explainable clustering, in which the partition must be induced by a binary decision tree (i.e., by cuts that are parallel to the axes); and minimum-size clustering, in which all clusters must have at least a predetermined number of elements. For explainable clustering, we present theoretical algorithms and bounds for the k-centers, k-medians, k-means, and minimum-spacing cost functions. We also introduce three practical algorithms for the popular k-means cost function: ExGreedy, which presents results generally better than comparable algorithms in the literature; ExShallow, with a penalty term related to the depth of the tree that induces the partition, allowing for a trade-off between performance (reducing the cost function) and explainability (generating shallower trees); and ExBisection, to our knowledge the first explainable clustering algorithm based on decision trees for the k-means cost function that builds an explainable partition from scratch (i.e., without starting from an unrestricted partition). For minimum-size clustering, our focus is on inter-clustering measures. We show that Single-Linkage, the algorithm that maximizes the minimum spacing, also maximizes the minimum-spanning-tree cost of a graph induced by the partition it generates; however, it is also prone to generating small clusters, which motivates the search for algorithms that perform well for these cost functions without suffering from this tendency. We introduce one approximation algorithm for each cost function, and present the results of experiments showing that they produce partitions that perform better than the popular k-means algorithm for these instances of the clustering task.
Descrição: Arquivo:   
COMPLETE PDF