Modeling and Prediction of Cross-Selling Problems in Big Data

Vol. 06 No. 09 (2017), Article ID: 23256, 12 pages
DOI: 10.12677/AAM.2017.69149


Xingfang Huang¹, Xuelian Wang²

¹Institute of Statistics and Data Science, Nanjing Audit University, Nanjing Jiangsu

²School of Mathematics, Southeast University, Nanjing Jiangsu

Received: Dec. 1st, 2017; accepted: Dec. 22nd, 2017; published: Dec. 29th, 2017

ABSTRACT

Both the Multiple Logistic method and the Two-stage Logistic method are well suited to handling large numbers of variables and big data. The main purpose of this paper is to build a cross-selling model from Auto Insurance to Home Insurance and then to predict customers’ behavior. The paper uses eleven months of cross-selling data from a well-known American insurance company. The Multiple Logistic method and the Two-stage Logistic method are applied separately to build cross-selling models for the California and Non-California areas. The resulting models can predict which products the prospects are more likely to purchase. Finally, the paper concludes that the Two-stage Logistic model performs better on both the California and Non-California data.

Keywords: Cross-Selling, Multiple Logistic Model, Two-Stage Logistic Model


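1. Introduction

The term "cross-selling" first came into wide use in the foreign banking industry in 1965 [1]; some 35 years later, both the theory and the practice of cross-selling had been studied on a large scale. Nash (1993) [2] and Deighton et al. (1994) [3] pointed out that cross-selling means "encouraging a customer who has already bought a company's product A to buy its product B". Guo Guoqing (2003) [4] argued that the essence of cross-selling is to make full use of all available resources to serve the market, carry out marketing, and win customers. Later, Zhao Hua and Song Shunlin (2007) [5], building on the ERMSW algorithm, applied dimensional constraints to existing customer purchase sequences in order to explore customers' consumption trends and to predict the likely purchases of customers whose matching degree satisfies certain conditions. Li Chunqing et al. (2010) [6], taking the actual situation of banks in China into account, revised the variables of the NPTB model and demonstrated through empirical analysis the superiority of the neural network model in cross-selling prediction.

2. Introduction to the Logistic Model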

$\mathrm{ln}\left(\frac{{P}_{i}}{1-{P}_{i}}\right)=\alpha +\beta {x}_{i}$

$\mathrm{ln}\left(\frac{{P}_{i}}{1-{P}_{i}}\right)=\alpha +\sum _{k=1}^{N}{\beta }_{k}{x}_{ki}$

(I) Multiple Logistic Model

$\begin{array}{l}{\pi }_{ij}=\frac{\mathrm{exp}\left({\beta }_{0j}+{\beta }_{1j}{x}_{i1}+\cdots +{\beta }_{pj}{x}_{ip}\right)}{\mathrm{exp}\left({\beta }_{01}+{\beta }_{11}{x}_{i1}+\cdots +{\beta }_{p1}{x}_{ip}\right)+\cdots +\mathrm{exp}\left({\beta }_{0k}+{\beta }_{1k}{x}_{i1}+\cdots +{\beta }_{pk}{x}_{ip}\right)},\\ \quad \left(i=1,2,\cdots ,n;\ j=1,2,\cdots ,k\right)\end{array}$

${\pi }_{ij}=\frac{\mathrm{exp}\left({\beta }_{0j}+{\beta }_{1j}{x}_{i1}+\cdots +{\beta }_{pj}{x}_{ip}\right)}{1+\mathrm{exp}\left({\beta }_{02}+{\beta }_{12}{x}_{i1}+\cdots +{\beta }_{p2}{x}_{ip}\right)+\cdots +\mathrm{exp}\left({\beta }_{0k}+{\beta }_{1k}{x}_{i1}+\cdots +{\beta }_{pk}{x}_{ip}\right)}$
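As a rough illustration of the two formulas above, the following Python sketch computes the category probabilities $\pi_{ij}$ for a single observation. The function name `multinomial_probs` and all coefficient values are hypothetical and chosen only for illustration. The sketch also assumes that the second formula corresponds to fixing the coefficients of the first category at zero (a reference category), which is the usual reading of the "1" in its denominator.

```python
import math

def multinomial_probs(x, betas):
    """Return [pi_i1, ..., pi_ik] for one observation.

    x     : list of p predictor values x_i1 .. x_ip
    betas : list of k coefficient lists [b_0j, b_1j, ..., b_pj]
    """
    # Linear predictor b_0j + b_1j*x_i1 + ... + b_pj*x_ip for each category j
    etas = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    # Subtract the maximum before exponentiating for numerical stability;
    # the common factor cancels in the ratio.
    m = max(etas)
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients: k = 3 categories, p = 2 predictors.
# The first row is all zeros, i.e. category 1 is the reference category.
betas = [
    [0.0, 0.0, 0.0],
    [0.5, 1.0, -0.3],
    [-1.0, 0.2, 0.8],
]
probs = multinomial_probs([1.2, 0.7], betas)
```

With the first category's coefficients set to zero, `probs[0]` equals the reciprocal of the second formula's denominator, and the probabilities sum to one.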

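(II) Two-Stage Logistic Model

3. Model Building and Analysis of Results

(I) Modeling Requirements

(II) Model Evaluation Metrics

(1) Gini Coefficient

The value of $Gini$ lies between 0 and 1, that is, $0\le Gini\le 1$. There are many ways to compute the Gini coefficient, such as the direct calculation method, the block method, the function method, and the bow-area method. This paper uses the block method. Following the theory, we first draw a schematic of how the Gini coefficient is computed under this method, as shown in Figure 2.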

Figure 1. Gini score graph

Figure 2. Gini coefficient obtained by the block method

${S}_{1}=\frac{1}{2}\left({P}_{1}{I}_{1}+{P}_{2}{I}_{2}+\cdots +{P}_{n}{I}_{n}\right)$

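${S}_{2}$ is the part of the area above the Lorenz curve that remains after removing the shaded area ${S}_{1}$:

${S}_{2}={P}_{1}\left({I}_{2}+{I}_{3}+\cdots +{I}_{n}\right)+{P}_{2}\left({I}_{3}+{I}_{4}+\cdots +{I}_{n}\right)+\cdots +{P}_{n-1}{I}_{n}$

${S}_{3}$ is half the area of the unit square, ${S}_{3}=1/2$. Since ${S}_{A}={S}_{1}+{S}_{2}-{S}_{3}$ and ${S}_{A}+{S}_{B}=1/2$, and with ${Q}_{i}$ denoting the cumulative proportion of the first $i$ records, the basic formula for the Gini coefficient is: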

$G=\sum _{i=1}^{n}{I}_{i}{P}_{i}+2\left[{P}_{1}\left(1-{Q}_{1}\right)+{P}_{2}\left(1-{Q}_{2}\right)+\cdots +{P}_{n-1}\left(1-{Q}_{n-1}\right)\right]-1$
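The formula above can be evaluated directly. The sketch below is a minimal Python version under two assumptions that the surviving text does not state explicitly: the groups are ordered from lowest to highest, and ${Q}_{i}$ is the running sum ${I}_{1}+\cdots +{I}_{i}$ (so that $1-{Q}_{i}={I}_{i+1}+\cdots +{I}_{n}$, matching the expression for ${S}_{2}$). The function name `gini_block` is hypothetical.

```python
def gini_block(P, I):
    """Gini coefficient from G = sum(I_i*P_i) + 2*sum_{i<n} P_i*(1 - Q_i) - 1.

    P : proportion of records in each group (ordered low to high, sums to 1)
    I : proportion of the measured quantity in each group (sums to 1)
    """
    if len(P) != len(I):
        raise ValueError("P and I must have the same length")
    n = len(P)
    # First term: 2 * S_1 = sum of P_i * I_i
    first = sum(p * i for p, i in zip(P, I))
    # Second term: S_2 = sum over i = 1..n-1 of P_i * (1 - Q_i)
    q = 0.0
    second = 0.0
    for k in range(n - 1):
        q += I[k]                      # Q_i, cumulative share up to group i
        second += P[k] * (1.0 - q)
    return first + 2.0 * second - 1.0

# Two equal-sized groups holding 25% and 75% of the total.
g = gini_block([0.5, 0.5], [0.25, 0.75])
```

For this two-group example the formula gives 0.25, which agrees with one minus twice the area under the corresponding two-segment Lorenz curve; a perfectly even split gives 0.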

(2) C value: a statistic that measures the predictive accuracy of a Logistic model.
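The text does not spell out how the C value is computed. One common definition, assumed here, is the concordance statistic: among all pairs made up of one event and one non-event, the share of pairs in which the event receives the higher predicted probability, with tied pairs counted as one half. The Python sketch below follows that definition; the function name `c_statistic` and the sample data are hypothetical.

```python
def c_statistic(y_true, scores):
    """Concordance (C) statistic for binary outcomes.

    y_true : list of 0/1 outcomes (1 = event, e.g. a purchase)
    scores : list of predicted probabilities, same length as y_true
    """
    events = [s for y, s in zip(y_true, scores) if y == 1]
    non_events = [s for y, s in zip(y_true, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for se in events:
        for sn in non_events:
            if se > sn:
                total += 1.0      # concordant pair
            elif se == sn:
                total += 0.5      # tied pair counts as one half
    return total / (len(events) * len(non_events))

# Hypothetical outcomes and model scores.
c = c_statistic([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.4])
```

A value of 0.5 corresponds to a model no better than random ranking and 1 to perfect separation. The double loop is quadratic in the number of records, so for data of the size discussed in this paper a rank-based computation would be preferable; the loop is kept here only because it mirrors the definition.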

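(3) Grouped scores for the whole model

(III) Multiple Logistic Model

1. The selected variables are the same, and their coefficients differ only slightly.

2. The corresponding variables have the same signs; where signs do differ, the combined contribution of those variables to the whole model is still below 10%.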

Table 1. Evaluation statistics for variables and models and their effects

Table 2. Results of multiple Logistic model for California

(IV) Two-Stage Logistic Model

Table 3. Basic information about the U.S. auto insurance rate system

Table 4. Results of two-stage Logistic model for California

$\begin{array}{l}Y=-4.3145-0.9755\ast {X}_{1}+0.8962\ast \mathrm{ln}\left({X}_{2}\right)+0.5581\ast \mathrm{ln}\left({X}_{3}\right)-0.0014\ast {X}_{4}\\ \quad -0.1829\ast {X}_{5}+0.1184\ast \mathrm{ln}\left({X}_{6}\right)-0.0208\ast {X}_{7}+0.1188\ast {X}_{8}-0.1647\ast \mathrm{ln}\left({X}_{9}\right)\end{array}$
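To show how a fitted equation of this form might be used for scoring, the Python sketch below evaluates $Y$ for one customer and converts it to a probability. It assumes that $Y$ is the log-odds $\mathrm{ln}\left(P/(1-P)\right)$ from Section 2, which the surviving text does not state explicitly. The meanings of ${X}_{1},\cdots ,{X}_{9}$ are not given here, so the input values are hypothetical, as are the names `california_score` and `to_probability`.

```python
import math

# Coefficients copied from the California equation above.
INTERCEPT = -4.3145
# One (coefficient, take natural log of the input?) pair for each of X1..X9.
TERMS = [
    (-0.9755, False),  # X1
    (0.8962, True),    # ln(X2)
    (0.5581, True),    # ln(X3)
    (-0.0014, False),  # X4
    (-0.1829, False),  # X5
    (0.1184, True),    # ln(X6)
    (-0.0208, False),  # X7
    (0.1188, False),   # X8
    (-0.1647, True),   # ln(X9)
]

def california_score(x):
    """Evaluate Y for a list x = [X1, ..., X9]; logged inputs must be > 0."""
    if len(x) != len(TERMS):
        raise ValueError("expected 9 input values")
    y = INTERCEPT
    for value, (coef, use_log) in zip(x, TERMS):
        y += coef * (math.log(value) if use_log else value)
    return y

def to_probability(y):
    """Inverse logit, assuming Y is the log-odds of purchase."""
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical customer: logged variables set to 1 and the others to 0,
# so Y reduces to the intercept.
y = california_score([0, 1, 1, 0, 0, 1, 0, 0, 1])
p = to_probability(y)
```

Sorting customers by `p` and grouping them into bands is one way such scores could feed the grouped-score comparison in Section 4, though the exact grouping used in the paper is not shown here.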

Table 5. Model evaluation index for California

Table 6. Results of two-stage Logistic model for non-California

Table 7. Model evaluation index

$\begin{array}{l}Y=-2.2996+1.1104\ast \mathrm{ln}\left({X}_{1}\right)-0.0611\ast {X}_{2}-0.7215\ast {X}_{3}+0.0187\ast {X}_{4}\\ \quad -0.0499\ast {X}_{5}+0.2835\ast \mathrm{ln}\left({X}_{6}\right)-0.1031\ast \mathrm{ln}\left({X}_{7}\right)+0.3361\ast {X}_{8}+0.2484\ast {X}_{9}\end{array}$

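4. Analysis and Comparison of Methods

(1) California. See Table 8, Table 9, and Figure 3.

(2) Non-California. See Table 10, Table 11, and Figure 4.

5. Conclusions and Outlook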

Table 8. Scores of multiple Logistic model for California

Table 9. Scores of two-stage Logistic model for California

Table 10. Scores of multiple Logistic model for non-California

Table 11. Scores of two-stage Logistic model for non-California

Figure 3. Cumulative score comparison of two models for California

Figure 4. Cumulative score comparison of two models for non-California

Modeling and Prediction of Cross-Selling Problems in Big Data [J]. Advances in Applied Mathematics, 2017, 06(09): 1236-1247. http://dx.doi.org/10.12677/AAM.2017.69149

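1. Wang Tao, Cui Nan. A Review of Foreign Research on Cross-Selling [J]. Foreign Economics & Management, 2005, 27(4): 43-49. (in Chinese)

2. Nash, E.L. (1993) Database Marketing: The Ultimate Marketing Tool. McGraw-Hill, New York.

3. Deighton, J., Peppers, D. and Rogers, M. (1994) Consumer Transaction Databases: Present Status and Prospects. In: Blattberg, Glazer and Little, Eds., The Marketing Information Revolution, Harvard Business School Press, Boston, 58-79.

4. Guo Guoqing. The Application of CRM and Cross-Selling in the U.S. Financial Industry and Its Implications [J]. Journal of Shandong University, 2003(5): 79-84. (in Chinese)

5. Zhao Hua, Song Shunlin. Application of an Improved Sequential Pattern Mining Algorithm in Cross-Marketing [J]. Computer Engineering and Design, 2007(5): 1219-1222. (in Chinese)

6. Li, C.Q., Qin, C.L. and Li, G. (2010) The Implication of Logistic Regression and Neural Nets in Cross-Selling of Bank’s Individual Customer. Proceedings of the Ninth Wuhan International Conference on E-Business Interface, Alfred University, USA, 2560-2564.

7. Wang Xuelian. Logistic Modeling and Prediction for the Insurance Cross-Selling Problem [D]: [Master's Thesis]. Nanjing: Southeast University, 2015. (in Chinese)

8. Wang Xinjun, Hu Man. A Practical Analysis of Clustering Techniques for Life Insurance Cross-Selling [J]. Insurance Studies, 2012(1): 86-95. (in Chinese)

9. Li Lingyao. A Brief Discussion of Housing Problems in the United States [J]. Science & Technology Review, 1986, 4(2): 44-47. (in Chinese)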