An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests


Séminaire Données et Aléatoire Théorie & Applications

3/03/2022 - 14:00 Christophe Dutang (Paris Dauphine) Room 106

Classification and regression trees (CART) are a genuine alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffers from biased variable selection, it is commonly applied in many fields and used in tree ensembles and random forests because of its simplicity and computational speed. Conditional inference trees and model-based tree algorithms, in which variable selection is handled via fluctuation tests, are known to give more accurate and interpretable results than CART, but at the cost of longer computation times. Using a closed-form maximum likelihood estimator for GLM, this presentation proposes a split point procedure based on the explicit likelihood, in order to save time when searching for the best split for a given splitting variable.
A simulation study with non-Gaussian responses is performed to assess the computational gain when building GLM trees. Making GLM trees feasible through this new split point procedure allows us to investigate the use of GLM in ensemble methods.
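
To give a flavour of what an explicit-likelihood split search can look like, here is a minimal sketch. It is not the speaker's implementation. It assumes a Poisson response and a GLM whose only covariate is the split indicator 1{x <= s}, a case where the maximum likelihood estimate of each group mean is simply the group average, so the log-likelihood of every candidate split can be evaluated from running sums without any iterative fitting. The function names `xlogy` and `best_poisson_split` are hypothetical.

```python
import math


def xlogy(s, n):
    """Return s * log(s / n), using the convention 0 * log(0) = 0."""
    return s * math.log(s / n) if s > 0 else 0.0


def best_poisson_split(x, y):
    """Scan all candidate thresholds of x for a Poisson response y.

    Assumed model: log(mu_i) = a + b * 1{x_i <= s}. Its MLE is explicit
    (each group mean equals the group average), so the maximized
    log-likelihood at split s is, up to a constant that does not
    depend on s:
        S_L * log(S_L / n_L) + S_R * log(S_R / n_R) - (S_L + S_R)
    where S_L, S_R are the response sums and n_L, n_R the group sizes.
    """
    pairs = sorted(zip(x, y))          # sort observations by x
    n = len(pairs)
    total = sum(v for _, v in pairs)   # S_L + S_R, the same for every split
    best_s, best_ll = None, -math.inf
    s_left = 0.0
    for i in range(n - 1):
        s_left += pairs[i][1]          # running sum of y on the left side
        if pairs[i][0] == pairs[i + 1][0]:
            continue                   # no threshold between tied x values
        n_left = i + 1
        ll = xlogy(s_left, n_left) + xlogy(total - s_left, n - n_left) - total
        if ll > best_ll:
            best_ll = ll
            best_s = (pairs[i][0] + pairs[i + 1][0]) / 2  # midpoint threshold
    return best_s, best_ll


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [1, 0, 2, 9, 11, 10]
    print(best_poisson_split(x, y))
```

After the initial sort, each candidate threshold costs a constant number of operations, whereas refitting a GLM numerically at every threshold would require an iterative optimization per candidate. On the toy data above, the scan selects the threshold 3.5, which separates the three small counts from the three large ones.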