APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY
Appl. Stochastic Models Bus. Ind. 15, 277–299 (1999)
THE EFFECTS OF PRUNING METHODS ON THE PREDICTIVE ACCURACY OF INDUCED
DECISION TREES
FLORIANA ESPOSITO,*† DONATO MALERBA, GIOVANNI SEMERARO AND
VALENTINA TAMMA
Dipartimento di Informatica, Università degli Studi di Bari, via Orabona 4, 70126 Bari, Italy
SUMMARY
Several methods have been proposed in the literature for decision tree (post-)pruning. This article presents
a unifying framework according to which any pruning method can be defined as a four-tuple (Space,
Operators, Evaluation function, Search strategy), and the pruning process can be cast as an optimization
problem. Six well-known pruning methods are investigated by means of this framework and their common
aspects, strengths and weaknesses are described. Furthermore, a new empirical analysis of the effect of
post-pruning on both the predictive accuracy and the size of induced decision trees is reported. The
experimental comparison of the pruning methods involves 14 datasets and is based on the cross-validation
procedure. The results confirm most of the conclusions drawn in a previous comparison based on the
holdout procedure. Copyright © 1999 John Wiley & Sons, Ltd.
KEY WORDS: Induction of decision trees; Decision tree pruning; State space; Cross-validation study
1. INTRODUCTION
Various heuristic methods have been proposed for the construction of a decision tree, among
which the most widely known is the top-down approach [1]. In top-down induction of decision
trees (TDIDT) it is possible to identify three tasks [2]:
(1) the assignment of a class to each leaf,
(2) the selection of the splits according to a selection measure, and
(3) the decision when to declare a node terminal or to continue splitting it.
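The three tasks above can be sketched, purely for illustration, by a minimal recursive TDIDT procedure. This is not the algorithm studied in the article; the dataset representation, the entropy-based selection measure, and the `min_size` stopping parameter are assumptions chosen for concreteness.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, attributes):
    """Task (2): pick the attribute whose split maximizes information gain."""
    labels = [y for _, y in rows]
    base = entropy(labels)
    best_attr, best_gain = None, 0.0
    for a in attributes:
        groups = {}
        for x, y in rows:
            groups.setdefault(x[a], []).append(y)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:
            best_attr, best_gain = a, base - remainder
    return best_attr

def tdidt(rows, attributes, min_size=2):
    """Grow a tree top-down; rows are (feature_dict, label) pairs."""
    labels = [y for _, y in rows]
    # Task (3): declare the node terminal if it is pure, too small,
    # or no attributes remain to split on.
    if len(set(labels)) == 1 or len(rows) < min_size or not attributes:
        # Task (1): assign the leaf the majority class.
        return Counter(labels).most_common(1)[0][0]
    a = best_split(rows, attributes)
    if a is None:  # no split improves purity
        return Counter(labels).most_common(1)[0][0]
    remaining = [b for b in attributes if b != a]
    children = {}
    for v in {x[a] for x, _ in rows}:
        subset = [(x, y) for x, y in rows if x[a] == v]
        children[v] = tdidt(subset, remaining, min_size)
    return (a, children)
```

A node is either a class label (leaf) or a pair of the splitting attribute and a dictionary of subtrees, one per attribute value.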
The third task is deemed critical for the construction of good decision trees. There are two
different ways to cope with it: either prospectively deciding when to stop the growth of a tree, or
retrospectively reducing the size of a fully expanded tree by pruning some branches. Methods that
control the growth of a decision tree during its construction are called pre-pruning methods, while
the others are called post-pruning methods [3].
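A pre-pruning method, in this framework, amounts to a stopping rule consulted during growth. The following sketch illustrates the idea; the particular criteria and thresholds (minimum node size, maximum depth, purity) are invented for illustration and are not taken from any of the methods compared in this article.

```python
from collections import Counter

def should_stop(labels, depth, min_samples=5, max_depth=10, purity=0.95):
    """Pre-pruning: decide during tree growth whether to declare the
    current node a leaf instead of splitting it further."""
    majority = Counter(labels).most_common(1)[0][1]
    return (len(labels) < min_samples          # node covers too few examples
            or depth >= max_depth              # tree is already deep enough
            or majority / len(labels) >= purity)  # node is nearly pure
</parameter>```

Post-pruning methods, by contrast, let the tree grow to completion and only then decide, subtree by subtree, whether a branch should be retracted.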
* Correspondence to: Prof. Floriana Esposito, Dipartimento di Informatica, Università degli Studi di Bari, via Orabona 4,
70126 Bari, Italy.
† E-mail: esposito@di.uniba.it
CCC 1524-1904/99/040277-23$17.50 Received June 1997
Copyright © 1999 John Wiley & Sons, Ltd. Revised May 1999
Many post-pruning (or simply pruning) methods have been proposed in the literature, some of
which are: reduced error pruning, minimum error pruning, pessimistic error pruning, critical value
pruning, cost-complexity pruning, and error-based pruning. A previous comparative study has
already pointed out both their similarities and their differences and investigated the real effect of
some of these methods on both the predictive accuracy and the size of the induced tree [4, 5]. In
that study, optimally pruned trees have been used to evaluate the maximum improvement
produced by an ideal pruning algorithm.
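To make the flavour of these methods concrete, the simplest of the six, reduced error pruning, can be sketched as follows. This is an illustrative rendering, not the code used in the experiments: a node is assumed to be either a class label (leaf) or a dictionary holding the splitting attribute, its subtrees, and the majority class of the examples it covers, and the method requires a pruning set held out from training.

```python
def predict(node, x):
    """Classify example x; unseen attribute values fall back to the majority class."""
    while isinstance(node, dict):
        node = node['children'].get(x.get(node['attr']), node['majority'])
    return node

def errors(node, rows):
    """Number of misclassified (example, label) pairs in rows."""
    return sum(predict(node, x) != y for x, y in rows)

def reduced_error_prune(node, rows):
    """Bottom-up: replace a subtree with its majority-class leaf whenever
    doing so does not increase the error count on the pruning set rows."""
    if not isinstance(node, dict):
        return node
    for v, child in node['children'].items():
        subset = [(x, y) for x, y in rows if x.get(node['attr']) == v]
        node['children'][v] = reduced_error_prune(child, subset)
    if errors(node['majority'], rows) <= errors(node, rows):
        return node['majority']  # prune: the leaf does at least as well
    return node
```

The other five methods differ mainly in the evaluation function applied at each node (error estimates, significance tests, or cost-complexity trade-offs) rather than in this bottom-up traversal.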
The main purpose of this article is to provide a further comparison of these pruning
methods. Their search spaces and search strategies are investigated, in order to analyse their
computational complexity,