Using SAS MACRO Programs to Build a Polynomial Model and Do the Selection of Variables

Vol. 04 No. 02 (2015), Article ID: 15248, 9 pages
DOI: 10.12677/AAM.2015.42021


Kui-Jang Wang

Department of Mathematics, Tamkang University, New Taipei, Taiwan

Email: kjwang@math.tku.edu.tw

Received: Apr. 28th, 2015; accepted: May 15th, 2015; published: May 20th, 2015

ABSTRACT

This paper offers a practical way to build polynomial regression models. Polynomial models have seen few applications in the past because they require a large number of generated variables. For example, a 3rd-order polynomial in 5 variables needs 55 variables, and a 2nd-order polynomial in 18 variables needs 189. Creating that many variables by hand is impractical, which is why I wrote the programs described here. There are three main reasons to work with polynomial models: 1) if the unknown model is a smooth curve, a polynomial model gives an acceptable approximation, as Taylor’s polynomial shows; 2) given enough observations, a high-order polynomial model can address lack-of-fit problems; 3) it helps avoid deleting important variables during selection, because a variable that appears in many cross-product terms is rarely removed from the model completely. The paper provides two main SAS MACRO programs, %Homopoly and %Model_Selection. The first generates the polynomial model; the second produces summary tables similar to Table 11.8 of Montgomery [1], containing the models and the necessary statistics, which users can carry directly into further analysis. These programs rely on another 20 SAS MACRO programs that I wrote, which can be downloaded from http://tsp.ec.tku.edu.tw/QuickPlace/054569qp/Main.nsf/h_Toc/BADD7D0BFF0904A1482576D300229684/?OpenDocument. Please follow the instructions in the readme.txt file.

Keywords: Polynomial Model, Taylor’s Polynomial, SAS MACRO


1. Introduction
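The variable counts quoted in the abstract can be checked with a standard counting argument. The symbols p (number of base variables), d (polynomial order), and N(p, d) are introduced here for illustration and are not the paper's notation. Counting every monomial of total degree at most d in p variables and leaving out the constant term gives:

```latex
N(p, d) = \binom{p + d}{d} - 1,
\qquad
N(5, 3) = \binom{8}{3} - 1 = 55,
\qquad
N(18, 2) = \binom{20}{2} - 1 = 189 .
```

Both values agree with the abstract: 55 variables for a 3rd-order polynomial in 5 variables and 189 for a 2nd-order polynomial in 18 variables.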

2. How to Build a Polynomial Data File with %Homopoly

Table 1. The output data files provided by %HOMOPOLY

Table 2. The table of input parameters of %HOMOPOLY

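[Example 1.1]: To show how the program behaves when there are too few observations, I first use the following program to generate 76 observations and then ask the program to generate the data for a complete 4th-order polynomial. Such a model requires 125 observations, so the program automatically reduces it to a 4th-order polynomial whose cross-product terms have a maximum degree of 3. The SAS program follows; it uses another program, %VAR_NAME(X,END=5), which generates the text string "X1 X2 X3 X4 X5".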

DATA INPUT_D;
  DO I = 1 TO 76;
    X1 = 1; X2 = 2; X3 = 3; X4 = 4; X5 = 5; Y = I; OUTPUT;
  END; * Generate 76 observations;
RUN;

%HOMOPOLY(IN_DATA = INPUT_D, OUT_DATA = OUTPUT, X = %VAR_NAME(X,END=3), Y = Y,
  DEGREE = 3, C_CROSS = 3, PRINT = YES, FOOTNOTE = YES, N_FOOT = 5);
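The figure of 125 in the example text can be reproduced with a short DATA step. This is a hypothetical check, not part of %HOMOPOLY. It assumes 5 base variables and a 4th-order request, as the text describes. It also assumes that "cross-product terms of maximum degree 3" means every term up to degree 3 plus the five pure 4th powers. The variable names are made up for illustration.

```sas
/* Hypothetical check, not part of %HOMOPOLY: regressors needed vs. observations. */
data _null_;
  p     = 5;    /* number of base variables X1-X5 (from the example text)      */
  d     = 4;    /* requested polynomial order (from the example text)          */
  n_obs = 76;   /* observations generated in INPUT_D                           */

  /* All monomials of degree <= d, excluding the constant term. */
  n_full = comb(p + d, d) - 1;

  /* Assumed reading of the reduced model: all terms of degree <= 3,
     plus the p pure 4th powers X1**4 ... X5**4. */
  n_reduced = comb(p + 3, 3) - 1 + p;

  put n_full= n_reduced= n_obs=;
run;
```

Under these assumptions the log shows n_full=125 and n_reduced=60, so the complete model cannot be fitted from 76 observations while the reduced one can.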

3. How to Obtain Models with %Model_Selection

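1) Provides six regression models: the full model, the linear model, and the models obtained by forward selection, backward selection, stepwise selection, and Cp selection, together with their estimates and the statistics that can guide the choice of model. Alternatively, it provides the same reports for models supplied by the user.

2) Produces output files for testing normality and homogeneity of variance.

3) Can predict future values at the same time.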
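The normality and variance checks mentioned in item 2 can be run on saved residuals with standard SAS procedures. The following is a minimal sketch, not the macro's own code: the data set WORK.MYDATA, the variables Y and X1-X3, and the grouping variable GRP are hypothetical names.

```sas
/* Sketch only: WORK.MYDATA, Y, X1-X3 and GRP are hypothetical names. */
proc reg data=work.mydata;
  model y = x1 x2 x3 / spec;             /* SPEC: White's specification test   */
  output out=work.resid r=res p=pred;    /* save residuals and fitted values   */
run;
quit;

proc univariate data=work.resid normal;  /* NORMAL: tests of normality         */
  var res;
run;

proc glm data=work.resid;
  class grp;
  model res = grp;
  means grp / hovtest=levene;            /* Levene's test across the groups    */
run;
quit;
```

Levene's test needs a grouping variable, which is presumably the role the GROUP= parameter plays in the macro call; that link is an assumption, not something the text states.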

Figure 1. The table for true variables and their corresponding regressors stored in WORK.OUT_NAME

Figure 2. Table of label for each of regressors stored in WORK.POLYNAME

Figure 3. The output data file with 5 observations and 22 variables

Table 3. The contents of the data file MOD_HOMO

%Model_Selection (DATA_IN =EXAMPLE.RIDGE_S20, X= T H C TH TC HC T2 H2 C2, Y= P, X_LIN= T H C, CHECK = YES, ID = ID_ID, ID2 = ID, GROUP=TEMP, F_MODEL=YES, STAT=VIF, MODEL_IN=NO, SLENTRY= 0.2, SLSTAY = 0.2);

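[Notes]:

1) The option X_FULL may be omitted when MODEL_IN = NO; the program uses &X in its place.

2) F_MODEL = YES asks the program to print the model that contains all of the variables.

3) The STAT option is set to print the VIF (variance inflation factor) values, because all p-values are less than 0.2 except in the full model and the linear model.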
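For orientation, the selection methods and options named above have direct counterparts in PROC REG. The sketch below is not the implementation of %Model_Selection. It only shows how the same six models could be requested by hand, assuming the libref EXAMPLE is already assigned; the model labels are invented for illustration.

```sas
/* Sketch only: assumes the libref EXAMPLE is assigned; labels are illustrative. */
proc reg data=example.ridge_s20;
  m_full:     model p = t h c th tc hc t2 h2 c2 / vif;
  m_linear:   model p = t h c / vif;
  m_forward:  model p = t h c th tc hc t2 h2 c2 / selection=forward  slentry=0.2;
  m_backward: model p = t h c th tc hc t2 h2 c2 / selection=backward slstay=0.2;
  m_stepwise: model p = t h c th tc hc t2 h2 c2 / selection=stepwise slentry=0.2 slstay=0.2;
  m_cp:       model p = t h c th tc hc t2 h2 c2 / selection=cp best=5;  /* 5 best by Cp */
run;
quit;
```

SLENTRY= and SLSTAY= here carry the same 0.2 thresholds as in the macro call, and VIF requests the variance inflation factors that the STAT option selects.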

WARNING: The NOINT option is ignored in the computation of ridge regression.

1) Using unstandardized data: %Model_Selection(DATA_IN = EXAMPLE.RIDGE_20, X = T H C TH TC HC T2 H2 C2, Y = P, X_LIN = T H C, CHECK = YES, ID = ID_ID, ID2 = ID, GROUP = TEMP, F_MODEL=YES, STAT=VIF, MODEL_IN=NO, SLENTRY= 0.2, SLSTAY = 0.2);

2) Using unstandardized data: %Model_Selection(DATA_IN = EXAMPLE.RIDGE_16, X = T H C TH TC HC T2 H2 C2, Y = P, X_LIN = T H C, CHECK = YES, CHECK_D = EXAMPLE.RIDGE_04, ID = ID_ID, ID2 = ID, GROUP = TEMP, F_MODEL=YES, STAT=VIF, MODEL_IN=NO, SLENTRY= 0.2, SLSTAY = 0.2);

Table 4. The table of input parameters of %Model_Selection

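[Notes]: 1) If MODEL_IN = NO, the program automatically creates a data file "MMODELS" containing the model name (M_NAME), the independent variable names (X), the dependent variable name (Y), the file name (FILENAME), and the group name (GROUP). 2) The program checks whether the input data file contains missing values and creates three data files: CHECK_D01 (the data file with the prediction-check observations added), MISSING (which stores the missing values), and N_of_miss_val (the number of missing values). 3) If CHECK = YES, the program checks whether a data file of points to predict, "&CHECK_D", has been supplied; if not, it looks for missing values in the data file and stores them in "CHECK_D01". Caution: if the data file contains missing values, the program automatically adds them to the prediction file, so handle missing values in the data file with care.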
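The behavior in note 3 matches a common SAS idiom: observations with a missing response are left out of the fit but still receive predicted values. The following is a minimal sketch of that idiom, not the macro's implementation. It assumes the libref EXAMPLE is assigned and that RIDGE_04 holds the points to predict with the same variable names; WORK.FIT_AND_CHECK, WORK.PRED, IS_CHECK, and P_HAT are hypothetical names.

```sas
/* Sketch only, not the macro's implementation. Assumes libref EXAMPLE is assigned. */
data work.fit_and_check;
  set example.ridge_16
      example.ridge_04(in=is_check);
  if is_check then p = .;          /* hide the response: predict these rows only  */
run;

proc means data=work.fit_and_check n nmiss;   /* count missing values per variable */
  var p t h c;
run;

proc reg data=work.fit_and_check;
  model p = t h c;                 /* rows with missing P are excluded from the fit */
  output out=work.pred p=p_hat;    /* P_HAT is computed for fitted and check rows   */
run;
quit;
```

This also shows why the caution matters: any observation whose response is accidentally missing would be treated as a point to predict rather than as data.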

Figure 4. The data of file MMODELS.sas7bdat

Table 6. The data of the file Ridge_S16.sas7bdat

Figure 5. The predictions of referent and predicted points

4. Summary and Outlook

Using SAS MACRO Programs to Build a Polynomial Model and Do the Selection of Variables. Advances in Applied Mathematics, 02, 162-171. doi: 10.12677/AAM.2015.42021

[1] Montgomery, D.C., Peck, E.A. and Vining, G.G. (2006) Introduction to Linear Regression Analysis. 4th Edition, Wiley, New York.

[2] Wang, K.-J. (2013) Notes for Regression Analysis. Tamkang University, New Taipei.