The Application of Credit Approval Based on Machine Learning Classification Method

Hans Journal of Data Mining
Vol. 06 No. 03 (2016), Article ID: 18342, 9 pages
10.12677/HJDM.2016.63012


Yulian Mo, Yu Fei*

School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, Yunnan

Received: Jul. 23rd, 2016; accepted: Aug. 15th, 2016; published: Aug. 18th, 2016

ABSTRACT

The traditional method of credit card approval often relies on the experience of credit officers to decide whether an applicant meets the application conditions. Such an approach is prone to randomness and instability. In this paper, we use R to apply six recent machine learning classification methods, namely decision tree, AdaBoost, bagging, random forest, support vector machine (SVM), and artificial neural network (ANN), to credit card application management, and build an automatic application management system that reduces the randomness and instability of approval results. Finally, we compute the mean squared error of each classification method through 8-fold cross-validation and choose the best-performing classifier. The results show that random forest has the smallest classification error.

Keywords: Credit Card Application, Machine Learning Classification, Random Forest

1. Research Background

2. Overview of Machine Learning Classification Methods

2.1. Decision Tree Classification

2.3. Bagging Classification

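Bagging (short for bootstrap aggregating) relies on bootstrap sampling with replacement: the training set is resampled with replacement many times, and every resample contains the same number of observations. A decision tree is grown on each resample, and the bagging classification is obtained by a "vote" among the classifications produced by these trees.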
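The resample, fit, and vote steps described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code used in the paper. As simplifying assumptions, a depth-1 decision tree (a "stump") stands in for a full decision tree, and the function names (`fit_stump`, `bagging_fit`, `bagging_predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a depth-1 decision tree by minimizing training errors."""
    majority = Counter(labels).most_common(1)[0][0]
    best_errors, best_stump = len(rows) + 1, None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            # An empty right branch falls back to the overall majority class.
            right_label = Counter(right).most_common(1)[0][0] if right else majority
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if errors < best_errors:
                best_errors = errors
                best_stump = (feature, threshold, left_label, right_label)
    return best_stump


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def bagging_fit(rows, labels, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_trees):
        # Bootstrap: draw n indices with replacement, so each resample
        # has the same size as the original training set.
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def bagging_predict(ensemble, row):
    """Classify a row by majority vote over all fitted stumps."""
    votes = Counter(predict_stump(stump, row) for stump in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the first feature separates the two classes.
rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (4.0, 1.0),
        (6.0, 7.0), (7.0, 2.0), (8.0, 6.0), (9.0, 4.0)]
labels = ["-", "-", "-", "-", "+", "+", "+", "+"]

ensemble = bagging_fit(rows, labels)
print(bagging_predict(ensemble, (0.5, 5.0)))
print(bagging_predict(ensemble, (9.5, 5.0)))
```

Because each stump sees a different resample, individual stumps may place the split at different thresholds; the vote averages out this variation, which is the reason bagging tends to be more stable than a single tree.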

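2.4. Random Forest Classification

2.5. Support Vector Machine Classification

2.6. Artificial Neural Network Classification

3. Empirical Analysis

3.1. Data Source and Description

3.2. Implementation of the Machine Learning Classification Methods in R

3.2.1. Decision Tree Classification Results

3.2.3. Bagging Classification Results

3.2.4. Random Forest Classification Results

3.2.5. Support Vector Machine Classification Results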

Table 1. Type and value of variables for the Credit Approval Data

Table 2. Class distribution for the Credit Approval Data

Table 3. Statistics of missing values in the Credit Approval Data

Table 4. Decision tree classification results on the Credit Approval Data

Figure 1. Decision tree output for the Credit Approval Data

Figure 2. Decision tree for the Credit Approval Data

Figure 3. Variable importance for AdaBoost fitted to the Credit Approval Data

Figure 4. Variable importance for Bagging fitted to the Credit Approval Data

Figure 5. Variable importance for Random forest fitted to the Credit Approval Data

Table 5. AdaBoost classification results on the Credit Approval Data

Table 6. Bagging classification results on the Credit Approval Data

Table 7. Random forest classification results on the Credit Approval Data

Table 8. SVM classification results on the Credit Approval Data

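3.2.6. Artificial Neural Network Classification Results

3.2.7. 8-Fold Cross-Validation Results of the Six Machine Learning Classification Methods

4. Conclusions

1) This paper introduces machine learning methods into credit card risk management, which effectively addresses the instability and arbitrariness of traditional approval methods and greatly improves the efficiency and automation of credit approval. All six classification methods fit the Credit Approval Data Set fairly well, each with a classification accuracy ≥ 83%.

2) Among the six methods, random forest classification fits the Credit Approval Data Set best. On both the training set and the test set, the average classification error of random forest is 0, which also demonstrates the strong stability of the random forest classifier.

3) The results in this paper are based on this one data set only. They have not been validated on other credit card application data, and no comparison has been made with other credit card approval methods.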

Table 9. 8-fold cross-validation average false positive rate of six classification methods on the Credit Approval Data

The Application of Credit Approval Based on Machine Learning Classification Method [J]. Hans Journal of Data Mining, 2016, 06(03): 97-105. http://dx.doi.org/10.12677/HJDM.2016.63012


*Corresponding author.