Decision Tree Analysis for Inconsistent Decision Tables

Computer Science and Application
Vol.06 No.10(2016), Article ID:18828,10 pages
10.12677/CSA.2016.610074


Meiling Xu, Ying Qiao, Jing Zeng, Yuchang Mo, Farong Zhong

College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang

Received: Oct. 5th, 2016; accepted: Oct. 23rd, 2016; published: Oct. 28th, 2016

Copyright © 2016 by authors and Hans Publishers Inc.

ABSTRACT

The decision tree is a widely used technique for discovering patterns in a consistent data set. However, if the data set is inconsistent, i.e., it contains groups of examples with equal values of the conditional attributes but different decisions (values of the decision attribute), then discovering the essential patterns or knowledge in the data set is challenging. Based on the greedy algorithm, we propose a new approach to constructing a decision tree for an inconsistent decision table. First, the inconsistent decision table is transformed into a many-valued decision table. Then, we develop a greedy algorithm that uses the "weighted sum" as the impurity and uncertainty measure to construct a decision tree for the inconsistent decision table. An illustrative example shows that our "weighted sum" measure yields smaller decision trees than the existing "weighted max" measure.

Keywords: Data Mining, Inconsistent Decision Table, Many-Valued Decision, Greedy Algorithm

1. Introduction

2. Construction of Many-Valued Decision Tables

E(T) denotes the set of conditional attributes of decision table T whose values are not constant. For example, in decision table T, E(T) = {f1, f2, f3}. Similarly, in the subtable T(f1, 1), E(T(f1, 1)) = {f2, f3}, because the value of attribute f1 is constant in T(f1, 1). For fi ∈ E(T), E(T, fi) denotes the set of values taken by conditional attribute fi in decision table T. For example, the set of values of conditional attribute f1 in T is E(T, f1) = {0, 1}.
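The two notations above can be sketched in code. This is a minimal illustration, assuming a decision table is represented as a list of rows, each row a tuple of conditional attribute values (the decision column is kept separately); attribute fi corresponds to column index i-1.

```python
def non_constant_attributes(rows):
    """E(T): indices of conditional attributes taking more than one value in T."""
    if not rows:
        return set()
    return {i for i in range(len(rows[0]))
            if len({row[i] for row in rows}) > 1}

def attribute_values(rows, i):
    """E(T, f_i): the set of values attribute f_i takes in T."""
    return {row[i] for row in rows}

# A small table with attributes f1, f2, f3 (column indices 0, 1, 2).
T = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(non_constant_attributes(T))  # {0, 1, 2} -> E(T) = {f1, f2, f3}
print(attribute_values(T, 0))      # {0, 1}   -> E(T, f1) = {0, 1}
```

Restricting the rows to a subtable such as T(f1, 1) before calling these helpers reproduces the behavior described above: a fixed attribute drops out of E(T).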

3. Decision Tree Construction for Many-Valued Decision Tables

Let Γ denote a decision tree constructed from decision table T, and let v be a node of Γ. T(v) denotes a subtable of T: if v is the root of the decision tree, then T(v) = T; otherwise T(v) is the subtable T(fi1, δ1)…(fim, δm) of T, where the conditional attributes fi1, …, fim and the corresponding values δ1, …, δm are, respectively, the nodes and edges along the path from the root to node v. The decision tree Γ satisfies the following conditions:

1) If T(v) is a degenerate table, then node v is labeled with a common decision.

2) If T(v) is not a degenerate table, then node v is labeled with an attribute fi ∈ E(T(v)); if E(T(v), fi) = {a1, …, ak}, then node v has k outgoing edges, labeled a1, …, ak respectively.
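The subtable operation and the degeneracy test used in these two conditions can be sketched as follows. This is an illustration only, assuming each row of a many-valued decision table is a pair (attrs, decisions): a tuple of conditional attribute values and the set of admissible decisions for that row.

```python
def subtable(rows, i, a):
    """T(f_i, a): the rows of T in which attribute f_i has value a."""
    return [(attrs, ds) for attrs, ds in rows if attrs[i] == a]

def common_decisions(rows):
    """Decisions shared by every row; T is degenerate iff this set is nonempty."""
    if not rows:
        return set()
    common = set(rows[0][1])
    for _, ds in rows[1:]:
        common &= ds
    return common

T = [((0, 0), {1}), ((0, 1), {1, 2}), ((1, 0), {2})]
T0 = subtable(T, 0, 0)        # T(f1, 0): rows with f1 = 0
print(common_decisions(T0))   # {1} -> T(f1, 0) is degenerate with common decision 1
print(common_decisions(T))    # set() -> T itself is not degenerate
```

When common_decisions is nonempty, condition 1 applies and the node becomes a leaf labeled with any common decision; otherwise condition 2 applies and the node is split on some attribute of E(T(v)).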

Table 1. Decision table T0

Table 2. Decision table

Table 3. Decision table

Table 4. Decision table

Table 5. A many-valued decision table T

Table 6. A subtable of many-valued decision table T

Table 7. A subtable of many-valued decision table T

Table 8. A degenerate many-valued decision table

Figure 1. Decision tree for the many-valued decision table

1) Uncertainty measure: an uncertainty measure U is a function from the set of nonempty many-valued decision tables to the set of real numbers such that U(T) ≥ 0, and U(T) = 0 if and only if T is a degenerate table. We choose the misclassification error [11] as the uncertainty measure.
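A minimal sketch of the misclassification error for a many-valued decision table, again assuming rows are (attrs, decisions) pairs: U(T) counts the rows not covered by a most common decision, so U(T) = 0 exactly when some decision occurs in every row, i.e., when T is degenerate.

```python
from collections import Counter

def misclassification_error(rows):
    """U(T) = N(T) minus the number of rows covered by a most common decision."""
    counts = Counter(d for _, ds in rows for d in ds)
    best = max(counts.values(), default=0)
    return len(rows) - best

T = [((0, 0), {1}), ((0, 1), {1, 2}), ((1, 0), {2})]
print(misclassification_error(T))  # decision 1 covers 2 of 3 rows -> U(T) = 1
```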

2) Impurity function: let U be an uncertainty measure, fi ∈ E(T), and E(T, fi) = {a1, …, at}; attribute fi divides decision table T into t subtables T1 = T(fi, a1), …, Tt = T(fi, at). We define the impurity function as the weighted sum I(T, fi) = N(T1)·U(T1) + … + N(Tt)·U(Tt), where N(Tj) denotes the number of rows of subtable Tj.
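The weighted-sum impurity can be sketched as below. This is an illustration assuming the (attrs, decisions) row representation and the misclassification error as U; the greedy algorithm would split each non-degenerate node on an attribute minimizing this impurity.

```python
from collections import Counter

def U(rows):
    """Misclassification error: rows not covered by a most common decision."""
    counts = Counter(d for _, ds in rows for d in ds)
    return len(rows) - max(counts.values(), default=0)

def weighted_sum_impurity(rows, i):
    """I(T, f_i) = sum over a in E(T, f_i) of N(T(f_i, a)) * U(T(f_i, a))."""
    total = 0
    for a in {attrs[i] for attrs, _ in rows}:
        Tj = [(attrs, ds) for attrs, ds in rows if attrs[i] == a]
        total += len(Tj) * U(Tj)
    return total

T = [((0, 0), {1}), ((0, 1), {1, 2}), ((1, 0), {2})]
print(weighted_sum_impurity(T, 0))  # 0: both subtables of the f1-split are degenerate
print(weighted_sum_impurity(T, 1))  # 2: the f2 = 0 subtable has 2 rows with U = 1
```

On this toy table the greedy step would therefore select f1, whose split immediately yields degenerate subtables. The "weighted max" measure compared in the paper replaces the sum with max over j of N(Tj)·U(Tj).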

4. An Example of Decision Tree Construction

5. Conclusion

Table 9. Greedy algorithm

Table 10. A many-valued decision table T

Table 11. A many-valued decision table

Figure 2. Decision tree for the first step

Figure 3. Decision tree for the second step

Figure 4. Decision tree building by the weighted sum and misclassification error

Figure 5. Decision tree building by the weighted max and misclassification error

Decision Tree Analysis for Inconsistent Decision Tables [J]. Computer Science and Application, 2016, 06(10): 597-606. http://dx.doi.org/10.12677/CSA.2016.610074

1. Moshkov, M. and Zielosko, B. (2011) Combinatorial Machine Learning: A Rough Set Approach. Studies in Computational Intelligence, Vol. 360, Springer, Berlin.

2. Tsoumakas, G. and Katakis, I. (2007) Multi-Label Classification: An Overview. International Journal of Data Warehousing and Mining (IJDWM), 3, 1-13. http://dx.doi.org/10.4018/jdwm.2007070101

3. Cour, T., Sapp, B., Jordan, C. and Taskar, B. (2009) Learning from Ambiguously Labeled Images. IEEE Conference on Computer Vision and Pattern Recognition, 20-25 June 2009, 919-926. http://dx.doi.org/10.1109/CVPRW.2009.5206667

4. Hüllermeier, E. and Beringer, J. (2006) Learning from Ambiguously Labeled Examples. Intelligent Data Analysis, 10, 419-439.

5. Jin, R. and Ghahramani, Z. (2002) Learning with Multiple Labels. Neural Information Processing Systems, Vancouver, 9 December 2002, 897-904.

6. Azad, M., Chikalov, I., Moshkov, M. and Zielosko, B. (2012) Greedy Algorithm for Construction of Decision Trees for Tables with Many-Valued Decisions. Proceedings of the 21st International Workshop on Concurrency, Specification and Programming, Berlin, 26-28 September 2012, CEUR-WS.org, Vol. 928.

7. Azad, M. and Moshkov, M. (2014) Minimization of Decision Tree Average Depth for Decision Tables with Many-Valued Decisions. Procedia Computer Science, 35, 368-377. http://dx.doi.org/10.1016/j.procs.2014.08.117

8. Dembczynski, K., Greco, S., Kotlowski, W. and Roman, S. (2007) Optimized Generalized Decision in Dominance-Based Rough Set Approach. 2nd International Conference on Rough Sets and Knowledge Technology, Toronto, 14-16 May 2007, 118-125.

9. Azad, M., Chikalov, I. and Moshkov, M. (2013) Three Approaches to Deal with Inconsistent Decision Tables: Comparison of Decision Tree Complexity. Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, 46-54.

10. Mingers, J. (1988) An Empirical Comparison of Selection Measures for Decision-Tree Induction. Machine Learning, 3, 319-342. http://dx.doi.org/10.1007/BF00116837

11. Azad, M. and Moshkov, M. (2014) "Misclassification Error" Greedy Heuristic to Construct Decision Trees for Inconsistent Decision Tables. Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, Rome, 184-191. http://dx.doi.org/10.5220/0005059201840191