as well as to point out the biases that a!ect the search strategies.
In the next section, the search space of pruning methods is introduced. It can be represented as
a directed acyclic graph whose nodes correspond to simpli"ed versions of a tree ¹.!9, while its
edges correspond to the application of a pruning/grafting operator. By de"ning an evaluation
function on the set of graph vertices, the problem of tree pruning can be cast as the problem of
searching the vertex with the highest value of f. Section 3 is devoted to the presentation of some
well-known pruning methods in the unifying framework of search in the state space. Finally, the
data sets considered, the experimental procedure based on cross-validation, and the experimental
results are reported in Section 4.
2. A UNIFYING FRAMEWORK FOR DESCRIBING PRUNING METHODS
Before discussing the search space explored by pruning methods, some useful notations are
introduced.
A (rooted) tree can be formally de"ned as a "nite set of nodes, NT"Mt
0, t
1,2, tnN, and an
associated relation BT-NT]NT for which the following properties hold:
(1) there exists only one node t
0 in NT, named root, such that "Sti , tjT3BT : tjOt
0;
(2) "tj3NT, tjOt
0, there exists only one node ti3NT such that Sti , tjT3BT.
The set NT can be partitioned into the set IT of internal nodes and the set LT of leaves.
A particular subset of IT is ET, the set of internal nodes whose children are all inLT. Following
Breiman et al.'s notation, we will denote with ¹t the branch of ¹ containing t and all its
descendants (see Figure 1).
Given a set O of N observations Oi , each of which is described by an (M#1)-dimensional
feature vector, SX
1,2, XM, CT, it is possible to build tree-structured classi"ers, named decision
trees.
De,nition 1 (Decision tree). A decision tree is a particular tree ¹ in which:
f each node t is associated with a subset of O, O(t);
f the root t
0 is associated with O itself;
f every edge Sti , tjT3BT is labelled with a test result, ¸T(Sti , tjT);
f test results for edges out coming from a node t de"ne a partitioning of the feature space, and
hence of O(t).
If p is a path from the root t
0 of a tree ¹ to a leaf ti of ¹,
p"St
0, t
1T, St
1, t
2T,2, Sti~1, tiT
278 F. ESPOSITO E¹ A¸.
Copyright ( 1999 John Wiley & Sons, Ltd. Appl. Stochastic Models Bus. Ind. 15, 277}299 (1999)
Figure 1. In this tree ¹, NT"Mt0, t1,2, t9N, BT"MSt0, t1T, St0, t2T, St2, t3T, St2, t4T, St3, t5T, St3, t6T,
St4, t7T, St4, t8T, St4, t9TN, LT"Mt1, t5, t6, t7, t8, t9N, IT"Mt0, t2, t3, t4N, ET"Mt3, t4N. The dashed line
encloses the branch ¹t2
.
then the label associated to p, ¸T(p), can be de"ned as the sequence of labels:
¸T(p)"¸T(St
0, t
1T), ¸T(St
1, t
2T),2, ¸T(Sti~1, tiT)
Let T denote the set of all possible decision trees that can be built from O. It is possible to
de"ne two distinct partial-order relations (unless node renaming) onT, namely)P and)G, that
satis"es the properties of re#exivity, antisymmetry and transitivity.
De,nition 2 (Pruning ordering). Let ¹ and ¹@ be two decision trees in T. Then ¹@)P¹ i! for
each path p@ from the root of ¹@ to a leaf in ¹@ there exists a path p from the root of ¹ to a leaf of
¹ such that
本论文由英语论文网提供整理,提供论文代写,英语论文代写,代写论文,代写英语论文,代写留学生论文,代写英文论文,留学生论文代写相关核心关键词搜索。