cation rules learned in Figure 1(a) can be used to predict the credit rating of new or future (i.e. previously unseen) customers. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The decision trees can easily be converted to classification rules. When the decision trees are built, many of the branches may reflect noise or outliers in the training data. Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.

(a)
name            age     income  Credit_rating
Sandy Jones     <= 30   Low     Fair
Bill Lee        <= 30   Low     Excellent
Courtney Fox    31…40   High    Excellent
Susan Lake      >40     Med     Fair
Claire Phips    >40     Med     Fair
Andre Beau      31…40   High    Excellent
…               …       …       …

If age = “31…40” and income = high then Credit_rating = Excellent

(b)
name            age     income  Credit_rating
Frank Jones     >40     High    Fair
Sylvia Crest    <=30    Low     Fair
Anne Yee        31…40   High    Excellent
…               …       …       …

(John Henri, 31…40, High)   Credit rating?   Excellent

Figure 1: Data classification process.
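The use of a learned rule on unseen customers can be illustrated with a minimal Python sketch. It encodes only the single rule shown in Figure 1 and applies it to the three rows of Figure 1(b) and to the new customer John Henri. The function names `classify` and `evaluate`, the tuple layout, and the ASCII spelling "31-40" for the age band "31…40" are assumptions made for illustration; this is not the paper's implementation, and a full rule set derived from a decision tree would contain many more rules.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (name, age_band, income, known_credit_rating).
# The age band written "31…40" in Figure 1 is spelled "31-40" here.
Row = Tuple[str, str, str, str]


def classify(age: str, income: str) -> Optional[str]:
    """Apply the one rule from Figure 1; return None when the rule does not fire."""
    if age == "31-40" and income == "High":
        return "Excellent"
    return None


def evaluate(rows: List[Row]) -> Tuple[int, int]:
    """Return (rows covered by the rule, covered rows predicted correctly)."""
    covered = 0
    correct = 0
    for _name, age, income, actual in rows:
        predicted = classify(age, income)
        if predicted is None:
            continue  # the rule says nothing about this row
        covered += 1
        if predicted == actual:
            correct += 1
    return covered, correct


# The three explicit rows of Figure 1(b).
test_data: List[Row] = [
    ("Frank Jones", ">40", "High", "Fair"),
    ("Sylvia Crest", "<=30", "Low", "Fair"),
    ("Anne Yee", "31-40", "High", "Excellent"),
]

covered, correct = evaluate(test_data)
john_henri_rating = classify("31-40", "High")  # new, previously unseen customer
```

In this sketch a rule that does not fire returns `None` rather than a default class, so coverage and accuracy can be reported separately.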
Expert Systems, September 2005, Vol. 22, No. 4

2.2. Data mining in a changing environment

There is existing work on learning (Helmbold & Long, 1994; Widmer, 1996; Freund & Mansour, 1997) and mining (Bay & Pazzani, 1999; Ganti et al., 1999; Liu et al., 2000; Han & Kamber, 2001) in a changing environment. All the following related studies focus on dynamic aspects or on comparison between two different data sets or rules. They are grouped into six categories in this paper.

The first field of study that examines mining in a changing environment is rule maintenance (Cheung et al., 1996; Feldman et al., 1997; Thomas et al., 1997). The purpose of these studies is to improve accuracy in a changing environment. For example, in the study of Cheung et al. (1996), incremental updating techniques are proposed for the efficient maintenance of discovered association rules when new transaction data are added to a transaction database. However, these techniques do not report any changes to the user; they only maintain existing knowledge.

The second research trend is to discover emerging patterns (Agrawal & Psaila, 1995; Dong & Li, 1999; Li et al., 2000), which are defined as item sets whose supports increase significantly from one data set to another. Emerging patterns can capture emerging trends in timestamped databases or useful contrasts between data classes. However, they do not consider structural changes in the rules. For example, in a market basket, these techniques can discover significant rule changes which increase the growth rate or decrease the rate of consumption over time, but they cannot detect unexpected changes such as a change from coffee ⇒ tea to coffee ⇒ milk.

Another related area of research is subjective interestingness in data mining (Liu & Hsu, 1996; Silberschatz & Tuzhilin, 1996; Liu et al., 1997; Suzuki, 1997; Padmanabhan & Tuzhilin, 1999). These papers give a number of techniques for finding unexpected rules with respect to the user’s existing knowledge.
This technique cannot be used for detecting changes as its analysis only compares each newly generated rule with each existing one to find degrees of difference. It does not find which aspects have changed, what kinds of changes have taken place and how much change has occurred.

The fourth research stream is mining from time-series data. There is increasing interest in discovering regularity in time-series data (Das et