ETDs

Estatística

Título:

A MODEL-BASED FRAMEWORK FOR SEMI-SUPERVISED CLUSTERING AND COMMUNITY DETECTION

Autor:

DANIEL LEMES GRIBEL

Colaborador(es):

THIBAUT VICTOR GASTON VIDAL - Orientador
MICHEL GENDREAU - Coorientador

Catalogação:

09/SET/2021

Língua(s):

ENGLISH - UNITED STATES

Tipo:

TEXT

Subtipo:

THESIS

Notas:

[pt] Todos os dados constantes dos documentos são de inteira responsabilidade de seus autores. Os dados utilizados nas descrições dos documentos estão em conformidade com os sistemas da administração da PUC-Rio.
[en] All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.

Referência(s):

[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=54595&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=54595&idi=2

DOI:

https://doi.org/10.17771/PUCRio.acad.54595

Resumo:

In model-based clustering, we aim to separate data samples into meaningful groups by optimizing the fit of some observed data to a mathematical model. The recent adoption of model-based clustering has allowed practitioners to model complex patterns in data and explore a wide range of applications. This thesis investigates model-driven approaches for community detection and semisupervised clustering by adopting a maximum-likelihood perspective. We first focus on exploiting constrained optimization techniques to present a new model for community detection with stochastic block models (SBMs). We show that the proposed constrained formulation reveals communities structurally different from those obtained with classical community detection models. We then study a setting where inaccurate annotations are provided as must-link and cannot-link relations, and propose a novel semi-supervised clustering model. Our experimental analysis shows that incorporating partial supervision and appropriately encoding prior user knowledge significantly enhance clustering performance. Finally, we examine the problem of semi-supervised clustering in the presence of unreliable class labels. We focus on the case where groups of untrustworthy annotators deliberately misclassify data samples and propose a model to handle such incorrect statements.

Descrição:			Arquivo:
COMPLETE			PDF