Principal Basis Analysis and Application in Feature Selection of Water Quality Data

Modeling and Simulation
Vol. 05 No. 04 (2016), Article ID: 19104, 7 pages
DOI: 10.12677/MOS.2016.54025


Hui Zou1,2, Zhihong Zou1*, Xiaojing Wang1

1School of Economics and Management, Beihang University, Beijing

2School of Science, China Agricultural University, Beijing

Received: Nov. 11th, 2016; accepted: Nov. 27th, 2016; published: Nov. 30th, 2016

ABSTRACT

With growing attention to the environment and advances in monitoring technology, multivariate data sets whose variables exhibit multicollinearity are increasingly common. The water quality data of the Taizi River are of this kind. To avoid the limitations of traditional methods, principal basis analysis, which is based on the Gram-Schmidt transform, is applied to feature selection for the Taizi River water quality data. The method extracts information effectively from a large variable set with minimal loss of the original information, excludes all redundant variables and duplicated information, and yields a low-dimensional orthogonal basis. Measuring the net information content ratio of the selected features makes it possible to choose representative water quality monitoring variables. The experimental results indicate that the method is effective and can help improve water quality monitoring work.

Keywords: Gram-Schmidt Transform, Principal Basis, Variable Selection


1. Introduction

2. Variable Selection Method Based on the Gram-Schmidt Transform

2.1. Principal Basis Analysis

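1) Standardize the variable vectors;

2) Select the first basis vector, which is required to satisfy

(1)

3) Apply the Gram-Schmidt transform to each of the remaining variables with respect to the first basis vector, obtaining a corresponding set of variables:

(2)

4) From this set, select the variable with the largest variance as the second basis vector, that is,

(3)

5) Apply the Gram-Schmidt transform to each of the remaining variables with respect to the second basis vector, obtaining a corresponding set of variables:

(4)

6) From this set, select the variable with the largest variance as the third basis vector, that is,

(5)

7) Repeat the above process to obtain a set of mutually orthogonal vectors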

(6)
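The procedure in steps 1) to 7) can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions, not the authors' implementation. The function name `principal_basis` and its parameters are hypothetical. Because criterion (1) is not reproduced here, the first variable is chosen, as an assumption, as the one with the largest sum of squared correlations with all variables (after standardization every variable has variance 1, so a variance criterion alone cannot decide the first pick). The returned `shares` are each selected vector's share of the total variance of the standardized data, used here as an assumed stand-in for the net information content ratio.

```python
import numpy as np


def principal_basis(X, n_select=None, first=None, tol=1e-10):
    """Greedy Gram-Schmidt variable selection (illustrative sketch).

    X is an (n_samples, n_variables) array with no constant columns.
    Returns the indices of the selected variables and, for each one,
    the share of total standardized variance it accounts for.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: standardize every variable (zero mean, unit variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    n, p = Z.shape
    n_select = p if n_select is None else min(n_select, p)

    if first is None:
        # ASSUMPTION: the source's criterion (1) is not available, so the
        # first variable is the one with the largest sum of squared
        # correlations with all variables.
        corr = (Z.T @ Z) / n
        first = int(np.argmax((corr ** 2).sum(axis=0)))

    R = Z.copy()                 # residuals of all variables
    total_ss = (Z ** 2).sum()    # equals n * p for standardized data
    selected, shares = [], []
    j = first
    for _ in range(n_select):
        z = R[:, j].copy()       # new basis vector (orthogonal to earlier ones)
        before = (R ** 2).sum()
        # Gram-Schmidt step: remove the component along z from every column.
        R -= np.outer(z, (z @ R) / (z @ z))
        selected.append(j)
        shares.append((before - (R ** 2).sum()) / total_ss)
        # Next candidate: the remaining variable with the largest residual variance.
        var = R.var(axis=0)
        var[selected] = 0.0
        j = int(np.argmax(var))
        if var[j] <= tol:        # nothing but redundant variables is left
            break
    return selected, np.array(shares)
```

Calling `selected, shares = principal_basis(X)` on a data matrix with linearly dependent columns stops early, because the residual variance of a redundant variable drops to numerically zero once the variables it depends on have been selected. `np.cumsum(shares)` then gives a cumulative curve comparable in spirit to a cumulative net information content ratio.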

2.2. Comparison of Principal Basis Analysis and Principal Component Analysis

(7)

3. Case Study

3.1. Study Area and Data

3.2. Selection of Water Quality Monitoring Indicators

Table 1. Original data

Table 2. Values of

Table 3. Variance of

Table 4. RNI of variables

Figure 1. The cumulative net information content ratio

4. Conclusions

Principal Basis Analysis and Application in Feature Selection of Water Quality Data [J]. Modeling and Simulation, 2016, 05(04): 198-204. http://dx.doi.org/10.12677/MOS.2016.54025

1. Shrestha, S. and Kazama, F. (2007) Assessment of Surface Water Quality Using Multivariate Statistical Techniques: A Case Study of the Fuji River Basin, Japan. Environmental Modelling & Software, 22, 464-475. http://dx.doi.org/10.1016/j.envsoft.2006.02.001

2. Kowalkowski, T., Zbytniewski, R., et al. (2006) Application of Chemometrics in River Water Classification. Water Research, 40, 744-752. http://dx.doi.org/10.1016/j.watres.2005.11.042

3. Wang, X., Lu, Y., et al. (2007) Identification of Anthropogenic Influences on Water Quality of Rivers in Taihu Watershed. Journal of Environmental Sciences, 19, 475-481. http://dx.doi.org/10.1016/S1001-0742(07)60080-1

4. Juahir, H., Zain, S.M., et al. (2011) Spatial Water Quality Assessment of Langat River Basin (Malaysia) Using Environmetric Techniques. Environmental Monitoring and Assessment, 173, 625-641. http://dx.doi.org/10.1007/s10661-010-1411-x

5. Venkatesharaju, K., Somashekar, R.K., et al. (2010) Study of Seasonal and Spatial Variation in Surface Water Quality of Cauvery River Stretch in Karnataka. Journal of Ecology and the Natural Environment, 2, 1-9.

6. Singh, K.P., Malik, A., et al. (2005) Water Quality Assessment and Apportionment of Pollution Sources of Gomti River (India) Using Multivariate Statistical Techniques: A Case Study. Analytica Chimica Acta, 538, 355-374. http://dx.doi.org/10.1016/j.aca.2005.02.006

7. Zhou, F., Liu, Y., et al. (2007) Application of Multivariate Statistical Methods to Water Quality Assessment of the Watercourses in Northwestern New Territories, Hong Kong. Environmental Monitoring and Assessment, 132, 1-13. http://dx.doi.org/10.1007/s10661-006-9497-x

8. Wang, Y., Liu, C., et al. (2013) Spatial Pattern Assessment of River Water Quality: Implications of Reducing the Number of Monitoring Stations and Chemical Parameters. Environmental Monitoring and Assessment, 186, 1781-1792. http://dx.doi.org/10.1007/s10661-013-3492-9

9. Tanaka, Y. and Mori, Y. (1997) Principal Component Analysis Based on a Subset of Variables: Variable Selection and Sensitivity Analysis. American Journal of Mathematical and Management Sciences, 17, 61-89. http://dx.doi.org/10.1080/01966324.1997.10737430

10. Wang, H., Yi, B. and Ye, M. (2008) Variable Selection Based on Principal Basis Analysis. Journal of Beijing University of Aeronautics and Astronautics, 34, 1288-1291. (In Chinese)