Application of Cross-Validation in Model Selection: Take OLS and RR as Examples

Statistics and Application
Vol. 08  No. 01 ( 2019 ), Article ID: 28469 , 5 pages
10.12677/SA.2019.81004

Yanshan Cao

Yunnan University of Finance and Economics, Kunming Yunnan

Received: Dec. 26th, 2018; accepted: Jan. 9th, 2019; published: Jan. 16th, 2019

ABSTRACT

This paper reviews the origin and development of cross-validation and summarizes previous research results. On this basis, leave-one-out cross-validation is applied to a model selection problem. Ordinary least squares (OLS) and ridge regression (RR) are used to fit candidate models to acetylene reaction data, and the optimal model is selected. The rationality and practicality of the model selection are also discussed.

Keywords: Cross-Validation, OLS, RR, Model Selection

1. Overview of Cross-Validation

2. Data Analysis

Table 1. Data of acetylene reaction

Let $A=\left(\begin{array}{ccc}{x}_{1}& {x}_{2}& {x}_{3}\end{array}\right)$. The correlation matrix of x1, x2, and x3, computed with MATLAB, is:

$\operatorname{Corr}\left(A\right)=\left(\begin{array}{ccc}1& 0.2236& -0.9582\\ 0.2236& 1& -0.2402\\ -0.9582& -0.2402& 1\end{array}\right)$
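The entry of -0.9582 between x1 and x3 points to strong collinearity between these two predictors. As a minimal sketch, assuming only the matrix printed above (the variable names are illustrative and not taken from the paper), the eigenvalues, condition number, and variance inflation factors can be checked as follows:

```python
import numpy as np

# Correlation matrix of x1, x2, x3 as reported in the text.
R = np.array([
    [1.0, 0.2236, -0.9582],
    [0.2236, 1.0, -0.2402],
    [-0.9582, -0.2402, 1.0],
])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues = np.linalg.eigvalsh(R)

# Ratio of largest to smallest eigenvalue; large values signal collinearity.
condition_number = eigenvalues[-1] / eigenvalues[0]

# Variance inflation factors are the diagonal of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(R))

print("eigenvalues:", eigenvalues)
print("condition number:", condition_number)
print("VIF (x1, x2, x3):", vif)
```

For this matrix the variance inflation factors of x1 and x3 exceed the common rule-of-thumb threshold of 10, while that of x2 stays close to 1, which is the usual situation in which ridge regression is considered alongside OLS.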

3. Model Analysis

${y}_{i}={\beta }_{0}+{\sum }_{s\in S}{x}_{i,s}{\beta }_{s}+{\epsilon }_{i},\quad i=1,\cdots ,16$ (1)

 (2)

(i) Model Estimation with OLS
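As a minimal sketch of OLS estimation for a model of the form (1), the following fits an intercept and slope coefficients by least squares. The function name `fit_ols` and the data are hypothetical: the acetylene data of Table 1 are not reproduced here, so noise-free synthetic values stand in for them.

```python
import numpy as np

def fit_ols(X, y):
    """Return (intercept, coefficients) minimizing the residual sum of squares."""
    # Prepend a column of ones so that beta_0 is estimated with the slopes.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

# Synthetic placeholder data (NOT the acetylene data): 16 observations, 3 predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(16, 3))
y_demo = 2.0 + X_demo @ np.array([1.0, -0.5, 0.25])

intercept, coefficients = fit_ols(X_demo, y_demo)
print(intercept, coefficients)
```

Using `np.linalg.lstsq` rather than inverting the normal equations directly is a design choice made here for numerical stability when predictors are nearly collinear.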

(ii) Model Estimation with RR
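Ridge regression in the sense of Hoerl and Kennard [8] adds a constant k to the diagonal of the cross-product matrix before solving for the coefficients. The sketch below assumes centered predictors and an unpenalized intercept. The function name `fit_ridge`, the value of k, and the synthetic data are illustrative assumptions; the ridge parameter used in the paper is not stated in the surviving text.

```python
import numpy as np

def fit_ridge(X, y, k):
    """Ridge estimate on centered data; k = 0 reduces to OLS."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    # Solve (Xc'Xc + k I) beta = Xc'yc for the slope coefficients.
    coef = np.linalg.solve(Xc.T @ Xc + k * np.eye(p), Xc.T @ yc)
    # The intercept is recovered from the means and is not penalized.
    intercept = y_mean - x_mean @ coef
    return intercept, coef

# Synthetic placeholder data (NOT the acetylene data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(16, 3))
y_demo = 2.0 + X_demo @ np.array([1.0, -0.5, 0.25])

b0_ols, b_ols = fit_ridge(X_demo, y_demo, k=0.0)   # same as OLS
b0_rr, b_rr = fit_ridge(X_demo, y_demo, k=1.0)     # hypothetical ridge parameter
print(b_ols, b_rr)
```

Increasing k shrinks the coefficient vector toward zero, trading a small bias for reduced variance when the predictors are strongly correlated.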

Table 2. Seven models for model estimation with OLS and RR

(iii) Model Selection with Leave-One-Out Cross-Validation [9]
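Leave-one-out cross-validation removes each of the n observations in turn, refits the model on the remaining n - 1, and predicts the held-out response; the average squared prediction error is then compared across candidate models. The sketch below assumes that the seven candidate models are the seven non-empty subsets S of {x1, x2, x3}, which matches the count in Table 2 but is not confirmed by the surviving text. The data are synthetic placeholders, and the helper names and the ridge parameter k = 0.1 are hypothetical.

```python
from itertools import combinations
import numpy as np

def fit_ridge(X, y, k):
    """Ridge fit on centered data; k = 0 gives the OLS fit."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + k * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef

def loocv_error(X, y, k):
    """Mean squared leave-one-out prediction error."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        b0, b = fit_ridge(X[keep], y[keep], k)
        errors[i] = y[i] - (b0 + X[i] @ b)
    return float(np.mean(errors ** 2))

# Synthetic placeholder data (NOT the acetylene data): only column 0 matters.
rng = np.random.default_rng(0)
n = 16
X_demo = rng.normal(size=(n, 3))
y_demo = 1.0 + 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=n)

# All non-empty subsets of the three predictor columns: 2**3 - 1 = 7 models.
subsets = [s for r in (1, 2, 3) for s in combinations(range(3), r)]

results = {}
for s in subsets:
    cols = list(s)
    results[(s, "OLS")] = loocv_error(X_demo[:, cols], y_demo, k=0.0)
    results[(s, "RR")] = loocv_error(X_demo[:, cols], y_demo, k=0.1)

best = min(results, key=results.get)
print("selected (subset, method):", best)
```

The pair with the smallest leave-one-out error is selected, which mirrors the comparison of prediction errors across models and estimation methods summarized in Table 3.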

Table 3. Prediction error of models with OLS and RR

4. Discussion of Rationality

5. Conclusion

Application of Cross-Validation in Model Selection: Take OLS and RR as Examples [J]. Statistics and Application, 2019, 08(01): 26-30. https://doi.org/10.12677/SA.2019.81004

1. Larson, S.C. (1931) The Shrinkage of the Coefficient of Multiple Correlation. Journal of Educational Psychology, 22, 45-55. https://doi.org/10.1037/h0072400

2. Stone, M. (1974) Cross-Validatory Choice and Assessment of Statistical Prediction. Journal of the Royal Statistical Society: Series B (Methodological), 36, 111-147. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x

3. Geisser, S. (1974) A Predictive Approach to the Random Effect Model. Biometrika, 61, 101-107. https://doi.org/10.1093/biomet/61.1.101

4. Geisser, S. (1975) The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association, 70, 320-328. https://doi.org/10.1080/01621459.1975.10479865

5. Devroye, L.P. and Wagner, T.J. (1979) Distribution-Free Performance Bounds for Potential Function Rules. IEEE Transactions on Information Theory, 25, 601-604. https://doi.org/10.1109/TIT.1979.1056087

6. Shao, J. (1993) Linear Model Selection by Cross-Validation. Journal of the American Statistical Association, 88, 486-494. https://doi.org/10.1080/01621459.1993.10476299

7. Dietterich, T. (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10, 1895-1924. https://doi.org/10.1162/089976698300017197

8. Hoerl, A.E. and Kennard, R.W. (1970) Ridge Regression: Applications to Nonorthogonal Problems. Technometrics, 12, 69-82. https://doi.org/10.1080/00401706.1970.10488635

9. Celisse, A. (2008) Model Selection in Density Estimation via Cross-Validation. Density Estimation, 14, 1-39.