Forecast Analysis of Shanghai Composite Index Based on Machine Learning Method

Hans Journal of Data Mining
Vol. 06 No. 01 (2016), Article ID: 16757, 8 pages
10.12677/HJDM.2016.61001


Rengkang Wu

School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, Yunnan

Received: Dec. 25th, 2015; accepted: Jan. 11th, 2016; published: Jan. 14th, 2016

ABSTRACT

The Shanghai Composite Index is an important index that general investors watch closely. It not only reflects the basic condition of China's stock market but also serves as an important guide for the national economy. Forecasting the index and analyzing its trend therefore play an important role in stabilizing the market and guiding investors. Stock market data form a typical nonlinear system, and traditional statistical forecasting methods achieve low prediction accuracy on them. In this paper, we use R to train six machine learning methods on the training set (decision tree, boosting, bagging, random forests, support vector machine (SVM), and neural network) and obtain a model for each. We then set up ten-fold cross-validation to compute the prediction mean squared error of each method for comparison, select the better-performing models, and visually compare their predictions with the real data. The analysis shows that random forests and SVM fit the data better and achieve high precision.

Keywords: Shanghai Composite Index, Machine Learning, Random Forests, SVM

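1. Research Background

1.1. Technical Analysis Methods

1.2. Traditional Statistical Forecasting Methods

1.3. Fundamental Analysis Methods

1.4. Machine Learning Methods

2. Machine Learning Regression Methods and Their Cross-Validation

2.1. Data Preprocessing

2.2. Constructing the Ten-Fold Cross-Validation Sets

2.3. Implementation of the Machine Learning Methods in R

2.3.1. Decision Tree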

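# Main R code for decision tree regression:

library(rpart); library(rpart.plot)
(a <- rpart(v5 ~ ., data = w))  # fit the tree: v5 against all other columns of w
rpart.plot(a, type = 2)  # draw the fitted tree

The resulting decision tree is shown in Figure 1.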

2.3.2. Boosting

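Boosting is a method for improving the accuracy of any given learning algorithm. The idea originates from the PAC (Probably Approximately Correct) learning model proposed by Valiant. Boosting raises the accuracy of weak classification algorithms by constructing a series of prediction functions and then combining them, in a prescribed way, into a single prediction function. It is a framework algorithm: it obtains sample subsets by manipulating the sample set, and then trains a weak classification algorithm on those subsets to produce a series of base classifiers. It can therefore be used to improve the recognition rate of other weak classification algorithms.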
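The idea of combining many weak learners into one prediction function can be sketched for the regression case as follows. This is a minimal illustration of least-squares boosting with tree stumps, not the mboost implementation used in this paper: instead of resampling subsets, each round refits a weak learner to the residuals of the current ensemble. The synthetic data frame `d`, the number of rounds `M`, and the shrinkage factor `nu` are assumptions chosen only for illustration.

```r
library(rpart)  # recommended package shipped with R

set.seed(1)
# Synthetic stand-in data (hypothetical; not the paper's data set w)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- sin(2 * pi * d$x1) + d$x2 + rnorm(n, sd = 0.1)

M  <- 50   # number of boosting rounds (assumed value)
nu <- 0.1  # shrinkage factor (assumed value)

pred <- rep(mean(d$y), n)             # start from the constant model
mse0 <- mean((d$y - pred)^2)          # training MSE of the constant model
mse_path <- numeric(M)

for (m in seq_len(M)) {
  d$r <- d$y - pred                   # residuals of the current ensemble
  # weak learner: a depth-1 regression tree (stump) fitted to the residuals
  stump <- rpart(r ~ x1 + x2, data = d,
                 control = rpart.control(maxdepth = 1, cp = 0, minsplit = 2))
  pred <- pred + nu * predict(stump, d)  # add the shrunken weak learner
  mse_path[m] <- mean((d$y - pred)^2)    # training MSE after round m
}
```

Each stump on its own is a poor predictor, but the training error stored in `mse_path` falls as rounds accumulate, which is the sense in which the combined function is stronger than its components.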

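# Main R code for boosting regression:

library(mboost)
gg1 <- v5 ~ btree(v1) + btree(v2) + btree(v3) + btree(v4)  # one tree base-learner per predictor
a <- mboost(gg1, data = w)

The ten-fold cross-validation mean squared error for boosting is 0.002187182.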
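A ten-fold cross-validation mean squared error of the kind reported here could be computed along the following lines. This is only a sketch under stated assumptions: the source does not show how the folds were built or whether the error was normalized, so the random fold assignment, the synthetic data frame `d`, and the use of a plain `rpart` tree as the learner are all illustrative choices.

```r
library(rpart)  # stand-in learner; any of the six methods could be substituted

set.seed(1)
# Synthetic stand-in data (hypothetical; not the paper's data set w)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- sin(2 * pi * d$x1) + d$x2 + rnorm(n, sd = 0.1)

K <- 10
# assign every row to one of K folds of (nearly) equal size, in random order
fold <- sample(rep(seq_len(K), length.out = n))

fold_mse <- sapply(seq_len(K), function(k) {
  train <- d[fold != k, ]   # nine folds for fitting
  test  <- d[fold == k, ]   # the held-out fold for evaluation
  fit   <- rpart(y ~ x1 + x2, data = train)
  mean((test$y - predict(fit, newdata = test))^2)  # MSE on the held-out fold
})

cv_mse <- mean(fold_mse)    # average of the ten held-out errors
```

Because every observation is held out exactly once, `cv_mse` estimates out-of-sample error rather than training error, which makes it a fair basis for comparing the six methods.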

Figure 1. Decision tree diagram

2.3.3. Bagging

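Bagging is an ensemble method that is simpler than boosting. In bagging, the training sample is repeatedly resampled with replacement (bootstrap samples), and each resample has the same size as the original sample. A regression tree is built on each bootstrap sample. For any observation, every tree gives a predicted value, and the final prediction is the simple average of these values. Bagging can be used to improve the accuracy of a learning algorithm.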
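The procedure described above can be written out by hand in a few lines. The sketch below is illustrative only and is not the `ipred` implementation used in this paper; the synthetic data frame `d` and the number of bootstrap samples `B` are assumptions.

```r
library(rpart)  # recommended package shipped with R

set.seed(1)
# Synthetic stand-in data (hypothetical; not the paper's data set w)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- sin(2 * pi * d$x1) + d$x2 + rnorm(n, sd = 0.1)

B <- 25  # number of bootstrap samples (assumed value)

# one column of predictions per bootstrap tree
tree_preds <- sapply(seq_len(B), function(b) {
  idx  <- sample(n, n, replace = TRUE)       # bootstrap sample, same size as the original
  tree <- rpart(y ~ x1 + x2, data = d[idx, ])  # regression tree on this resample
  predict(tree, newdata = d)                 # this tree's prediction for every observation
})

bag_pred <- rowMeans(tree_preds)             # final prediction: simple average over trees
bag_mse  <- mean((d$y - bag_pred)^2)         # training MSE of the bagged predictor
```

Averaging over trees grown on different resamples smooths out the instability of a single tree, which is where the accuracy gain comes from.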

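# Main R code for bagging regression:

library(ipred)
set.seed(4410)  # fix the seed so the bootstrap samples are reproducible
a <- bagging(v5 ~ ., data = w)

The ten-fold cross-validation mean squared error for bagging is 0.02317248.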

2.3.4. Random Forests

# Main R code for random forest regression:

library(randomForest)
set.seed(10)
a <- randomForest(v5 ~ ., data = w, importance = TRUE, proximity = TRUE)

2.3.5. Support Vector Machine

# Main R code for support vector machine regression:

library(e1071)
a <- svm(v5 ~ ., data = w, kernel = "sigmoid")  # sigmoid kernel

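2.3.6. Neural Network

# Main R code for neural network regression and plotting:

library(neuralnet); library(MASS)
v <- w; v$v5 <- v$v5 / max(w[, 5])  # rescale v5 by its maximum, since w$v5 <= max(w[, 5])
set.seed(1010)
b <- neuralnet(v5 ~ v1 + v2 + v3 + v4, data = v, err.fct = "sse", hidden = 6, linear.output = FALSE)
plot(b)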

2.4. Ten-Fold Cross-Validation Results for the Machine Learning Methods

3. Visual Comparison of Predicted and Actual Data

Figure 2. Random forest diagram

Table 1. Mean squared error of ten-fold cross-validation for the six machine learning methods

Figure 3. Visual comparison of the values predicted by four machine learning methods with the actual values

4. Conclusions and Recommendations

Forecast Analysis of Shanghai Composite Index Based on Machine Learning Method [J]. Hans Journal of Data Mining, 2016, 06(01): 1-8. http://dx.doi.org/10.12677/HJDM.2016.61001
