Analysis of Beijing PM2.5 Concentration Effect Factors Based on Generalized Additive Models

Hans Journal of Data Mining
Vol.06 No.04(2016), Article ID:18806,11 pages
10.12677/HJDM.2016.64019


Xiaotong Li*, Mo Zhang

College of Science, China University of Petroleum (Beijing), Beijing

Received: Oct. 8th, 2016; accepted: Oct. 24th, 2016; published: Oct. 27th, 2016

Copyright © 2016 by authors and Hans Publishers Inc.

ABSTRACT

In recent years, air pollution in Beijing has become increasingly serious, and PM2.5 has caused widespread public concern. Existing studies of the factors influencing PM2.5 concentration in Beijing are clearly limited in two respects: the types of influencing factors they consider and the models they select. To address both limitations, this paper builds a generalized additive model with PM2.5 concentration as the response variable and the influencing factors as predictor variables. The results show that the factors influencing PM2.5 concentration include NO2 concentration, wind speed, temperature, month, CO concentration, O3 concentration and humidity. For comparison, this paper also fits a linear regression model; the results show that the goodness of fit of the additive model is much better than that of the linear model.

Keywords: PM2.5 Concentration, Generalized Additive Model, Linear Regression Model

1. Introduction

2. Generalized Additive Models

(1)

(2)

(3)

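1) Set the initial values

2) Iteration

① Construct the adjusted response variable from the previous iteration

② Construct the weights

③ Use the weighted backfitting algorithm to fit the additive model with Z as the response variable, which yields estimates of the smooth additive terms.

3) Repeat step 2) until the deviance of the estimates no longer decreases.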
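For orientation, the quantities built in steps ① and ② can be written out in the standard local scoring formulation of Hastie and Tibshirani. This is a sketch only: the symbols g, μ, η, V, z and w are assumptions and may differ from the paper's own notation.

```latex
% Assumed notation (not taken from the source): link function g, mean \mu_i,
% additive predictor \eta_i, variance function V; (m) indexes the iteration.
\begin{align*}
\eta_i^{(m)} &= \beta_0^{(m)} + \sum_{j=1}^{p} f_j^{(m)}(x_{ij}),
  \qquad \mu_i^{(m)} = g^{-1}\!\left(\eta_i^{(m)}\right) \\
% Step 1 of the iteration: adjusted response
z_i &= \eta_i^{(m)} + \left(y_i - \mu_i^{(m)}\right) g'\!\left(\mu_i^{(m)}\right) \\
% Step 2 of the iteration: weights
w_i &= \left[\, g'\!\left(\mu_i^{(m)}\right)^{2} V\!\left(\mu_i^{(m)}\right) \right]^{-1}
\end{align*}
```

As a check on these formulas, an identity link with constant variance gives g' = 1 and constant V, so z_i = y_i and all weights are equal. In that case the procedure reduces to a single ordinary backfitting fit of the additive model.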

The backfitting algorithm proceeds as follows:

1) Initialization

2) Iteration
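The two steps above can be sketched in Python as a minimal, unweighted backfitting loop. Everything named here is an illustrative assumption rather than the authors' implementation: the Gaussian kernel smoother, the bandwidth, the convergence tolerance, the simulated data, and the helper names `kernel_smooth` and `backfit`. The weighted variant used inside local scoring would also pass observation weights to the smoother.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.3):
    """Nadaraya-Watson smoother (Gaussian kernel): smooth r against x, evaluated at x.
    The smoother and bandwidth are assumptions chosen for illustration."""
    d = (x[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * d ** 2)
    return k @ r / k.sum(axis=1)

def backfit(X, y, smoother=kernel_smooth, max_iter=50, tol=1e-6):
    """Fit y ~ alpha + sum_j f_j(X[:, j]) by cycling over the smooth terms."""
    n, p = X.shape
    # 1) Initialization: intercept is the mean response, all smooth terms are zero.
    alpha = y.mean()
    f = np.zeros((n, p))
    # 2) Iteration: update each term from its partial residuals until terms stabilize.
    for _ in range(max_iter):
        f_old = f.copy()
        for j in range(p):
            # Partial residual: remove the intercept and every term except f_j.
            partial = y - alpha - (f.sum(axis=1) - f[:, j])
            f[:, j] = smoother(X[:, j], partial)
            f[:, j] -= f[:, j].mean()  # center each term for identifiability
        if np.max(np.abs(f - f_old)) < tol:
            break
    return alpha, f

# Simulated data (hypothetical, not the paper's data): two smooth effects plus noise.
rng = np.random.default_rng(0)
n = 300
X = rng.uniform(-2, 2, size=(n, 2))
y = 1.0 + np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, n)

alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=1)
mse = float(np.mean((y - fitted) ** 2))
print("intercept:", round(float(alpha), 3), "MSE:", round(mse, 4))
```

Centering each term after smoothing keeps the intercept identifiable, since any constant could otherwise be shifted freely between the intercept and the smooth terms.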

Table 1. Algorithmic process

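3. Variable Selection

1) Response variable

2) Predictor variables

The organic carbon in PM2.5 comes mainly from …, while the inorganic salts arise mainly from pollutants in the air such as SO2 and NO2 undergoing photochemical …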

Figure 1. Lnpm histogram

4. Building the Generalized Additive Model

Table 2. The steps of building model

Table 3. The steps of BE

Table 4. Parameter estimates

Table 5. Fit summary for smoothing components

Table 6. Analysis of deviance

Table 7. Goodness of fit

Figure 2. Table of comparison between real value and GAM prediction value

5. Model Comparison

Figure 3. On the prediction of ozone concentration

Figure 4. The forecast of humidity

6. Conclusions

Figure 5. Forecast of the month

Figure 6. The prediction of CO concentration

Table 8. Parameter estimates

Table 9. Goodness of fit

Figure 7. Table of comparison between the real value and the predicted value of the two models

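1) It overcomes the limitations of existing studies in the choice of influencing factors and of model. Existing studies generally take only one of three categories (time, meteorological conditions, or precursors) as independent variables, whereas this paper selects all three categories at once, which is more comprehensive. The commonly used linear model ignores the fact that real data do not satisfy the linearity assumption, whereas the generalized additive model built in this paper requires no linearity assumption and is better suited to data analysis in the air quality field.

2) The model fits well and explains 87.70% of the variation in PM2.5 concentration. The generalized additive model built in this paper has an AIC of −816 and an MSE of 0.1063, and its predictions for 2016 are very accurate. Together these results indicate that the model fits well and is well suited to the data in this paper.

3) For the first time, backward elimination is combined with semiparametric adjustment of the model, which preserves goodness of fit while also ensuring the smoothness of the fitted curves. The backward elimination used in this paper takes significance and the smoothing parameter as its criteria. Variables that fail the criteria are not simply removed but are instead given parametric estimates. This preserves both goodness of fit and smoothness, avoids heavy computation and overfitting, and makes the parametric part easier to interpret, achieving three goals at once.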

Analysis of Beijing PM2.5 Concentration Effect Factors Based on Generalized Additive Models[J]. Hans Journal of Data Mining, 2016, 06(04): 168-178. http://dx.doi.org/10.12677/HJDM.2016.64019

1. 云慧, 何凌燕, 黄晓锋, 等. 深圳市PM2.5化学组成与时空分布特征[J]. 环境科学, 2013, 34(4): 1245-1251.

2. 陈云进, 王劲. 基于监测统计的昆明PM2.5主要来源与污染气象因素分析[J]. 环境科学导刊, 2015(1): 37-44.

3. 贾艳红, 陆赛娣, 冯小莉, 等. 中国雾霾分布及其组成相关性分析[J]. 测绘与空间地理信息, 2015(12): 9-12.

4. 张人禾, 李强, 张若楠. 2013年1月中国东部持续性强雾霾天气产生的气象条件分析[J]. 中国科学: 地球科学, 2014, 44(1): 27-36.

5. Stone, C.J. (1985) Additive Regression and Other Nonparametric Models. Annals of Statistics, 13, 689-705. http://dx.doi.org/10.1214/aos/1176349548

6. Hastie, T. and Tibshirani, R. (1986) Generalized Additive Models. Statistical Science, 1, 297-310. http://dx.doi.org/10.1214/ss/1177013604

7. 罗丹. 中国国债超额回报率的拟合和预测[D]: [硕士学位论文]. 湘潭: 湘潭大学, 2014.

8. 贾彬, 王彤, 王琳娜, 等. 广义可加模型共曲线性及其在空气污染问题研究中的应用[J]. 第四军医大学学报, 2005, 26(3): 280-283.