Title: DECISION DIAGRAMS FOR CLASSIFICATION: NEW CONSTRUCTIVE APPROACHES
Author: PEDRO SARMENTO BARBOSA MARTINS
Collaborator(s): THIBAUT VICTOR GASTON VIDAL - Advisor
Cataloged: 16/OCT/2023 | Language(s): ENGLISH - UNITED STATES
Type: TEXT | Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=64308&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=64308&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.64308
Abstract:
Decision diagrams are a generalization of decision trees. They have
been repeatedly proposed as a supervised classification model for machine
learning but have not been widely adopted. The reason appears to be the
difficulty of training the model, as the requirement of deciding splits and
merging nodes can lead to difficult combinatorial optimization problems.
Decision diagrams have marked advantages over decision trees: they better
model disjunctive binary concepts, avoid the replication of subtrees, and
thus suffer less sample fragmentation in internal nodes. Because of this,
devising an effective construction algorithm is important. In this context, the
Optimal Decision Diagram (ODD) algorithm was recently proposed, which
formulates the problem of building a diagram as a mixed-integer linear program
(MILP), with a warm start provided by a greedy constructive heuristic. Initial
experiments have shown that this heuristic can be improved upon, in order
to find close-to-optimal solutions more effectively and in turn provide the
MILP with a better warm start. In this study, we report improvements to this
constructive heuristic, by randomizing the split decisions, pruning pure flows
(i.e., flows with samples from a single class), and applying bottom-up pruning,
which considers the complexity of the model in addition to its accuracy. All
proposed improvements have positive effects on accuracy and generalization,
as well as the objective value of the ODD algorithm. The bottom-up pruning
strategy, in particular, has a substantial impact on the objective value, and
thus on the ability of the MILP solver to find optimal solutions. In addition, we
provide experiments on the expressiveness of decision diagrams when compared
to trees in the context of small Boolean functions in Disjunctive Normal Form
(DNF), as well as a web application for the visual exploration of the proposed
constructive approaches.
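
The subtree replication mentioned in the abstract can be illustrated with a small sketch. This is not the ODD algorithm or the constructive heuristic from the thesis: it assumes an illustrative DNF formula, (x1 AND x2) OR (x3 AND x4), a fixed variable order x1..x4, and exact merging of identical subfunctions. All function names are hypothetical. It counts the internal nodes needed by a tree and by a diagram that shares equal subfunctions.

```python
from itertools import product

N_VARS = 4


def f(x):
    # Illustrative DNF formula (an assumption): (x1 AND x2) OR (x3 AND x4)
    return (x[0] and x[1]) or (x[2] and x[3])


def truth_table(prefix):
    """Outputs of f over every completion of a partial assignment."""
    rest = N_VARS - len(prefix)
    return tuple(f(prefix + tail) for tail in product((False, True), repeat=rest))


def skip_irrelevant(prefix):
    """Advance past variables that the remaining subfunction ignores.

    Returns (prefix, table, is_leaf), where is_leaf means the subfunction
    is constant and needs no further test.
    """
    while True:
        table = truth_table(prefix)
        if len(set(table)) == 1:
            return prefix, table, True
        half = len(table) // 2
        if table[:half] != table[half:]:
            return prefix, table, False
        prefix = prefix + (False,)  # next variable has no effect here


def tree_nodes(prefix=()):
    """Internal nodes of a decision tree: every branch gets its own subtree."""
    prefix, _, is_leaf = skip_irrelevant(prefix)
    if is_leaf:
        return 0
    return 1 + tree_nodes(prefix + (False,)) + tree_nodes(prefix + (True,))


def diagram_nodes():
    """Internal nodes of a decision diagram: equal subfunctions share a node."""
    seen = set()

    def visit(prefix):
        prefix, table, is_leaf = skip_irrelevant(prefix)
        if is_leaf:
            return
        key = (len(prefix), table)  # same depth and same subfunction
        if key in seen:
            return  # already built, reuse it instead of replicating
        seen.add(key)
        visit(prefix + (False,))
        visit(prefix + (True,))

    visit(())
    return len(seen)


print(tree_nodes(), diagram_nodes())  # 6 4
```

In this toy case the tree builds the (x3 AND x4) subtree twice, once under x1 = False and once under x1 = True, x2 = False, while the diagram routes both branches to a single shared node. With training data instead of a known formula, that sharing is what keeps samples from being split across duplicate subtrees.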