Título: | APPROXIMATE BORN AGAIN TREE ENSEMBLES | ||||||||||||
Autor: |
MATHEUS DE SOUSA SUKNAIC |
||||||||||||
Colaborador(es): |
MARCO SERPA MOLINARO - Orientador THIBAUT VICTOR GASTON VIDAL - Coorientador |
||||||||||||
Catalogação: | 28/OUT/2021 | Língua(s): | ENGLISH - UNITED STATES |
||||||||||
Tipo: | TEXT | Subtipo: | THESIS | ||||||||||
Notas: |
[pt] Todos os dados constantes dos documentos são de inteira responsabilidade de seus autores. Os dados utilizados nas descrições dos documentos estão em conformidade com os sistemas da administração da PUC-Rio. [en] All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio. |
||||||||||||
Referência(s): |
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=55539&idi=1 [en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=55539&idi=2 |
||||||||||||
DOI: | https://doi.org/10.17771/PUCRio.acad.55539 | ||||||||||||
Resumo: | |||||||||||||
Ensemble methods in machine learning such as random forest, boosting,
and bagging have been thoroughly studied and proven to have better accuracy
than using a single predictor. However, their drawback is that they give models
that can be much harder to interpret than those given by, for example, decision
trees. In this work, we approach in a principled way the problem of constructing
a decision tree that approximately reproduces a tree ensemble, exploring the
tradeoff between accuracy and interpretability that can be obtained once exact
reproduction is relaxed.
First, we formally define the problem of obtaining the decision tree of a
given depth that is most adherent to a tree ensemble and give a Dynamic
Programming algorithm for solving this problem. We also prove that the
decision trees obtained by this procedure satisfy generalization guarantees
related to the generalization of the original tree ensembles, a crucial element
for their effectiveness in practice. Since the computational complexity of the
Dynamic Programming algorithm is exponential in the number of features, we
also design heuristics to compute trees of a given depth with good adherence
to a tree ensemble.
Finally, we conduct a comprehensive computational evaluation of the
algorithms proposed. The results indicate that in many situations, there is little
or no loss in accuracy in working more interpretable classifiers: even restricting
to only depth-6 decision trees, our algorithms produce trees with average
accuracies that are within 1 percent (for the Dynamic Programming algorithm) or
2 percent (heuristics) of the original random forest.
|
|||||||||||||
|