The Parameter Estimation of the Mixture of Normal and Uniform Distribution Based on the Empirical Cumulative Distribution Function

Statistics and Application
Vol.05 No.03(2016), Article ID:18503,9 pages
10.12677/SA.2016.53024


Xiaoying Wang, Changlong Chen, Yinghua Li

School of Mathematics and Physics, North China Electric Power University, Beijing

Received: Aug. 20th, 2016; accepted: Sep. 4th, 2016; published: Sep. 9th, 2016

Copyright © 2016 by authors and Hans Publishers Inc.

ABSTRACT

The normal mixture model is easily influenced by outliers, and the maximum likelihood estimates of its parameters are not robust. Fraley and Raftery proposed adding a uniform component, regarded as the distribution of the outliers, to the normal model; the resulting mixture fits observed data more accurately. However, because of the form of the uniform probability density function, the likelihood is unbounded when the two endpoint parameters of the uniform distribution become arbitrarily close to each other, so the EM algorithm cannot be applied directly. One remedy is to fix the endpoints of the uniform distribution at two distinct observations, keep them fixed throughout the iterations, and take as the final estimates the values that maximize the likelihood over all such choices. Coretto and Hennig proposed a gridding method along these lines, but it still requires a large amount of computation and is inefficient. Motivated by this, we propose a new method, based on the empirical cumulative distribution function, for parameter estimation of a general mixture of normal and uniform distributions: first estimate the parameters of the uniform distribution, then estimate the mixing proportion and the parameters of the normal distribution. Numerical simulation shows that our method is efficient, accurate, computationally light and easy to implement.

Keywords: Empirical Cumulative Distribution Function, EM Algorithm, Mixture Model

1. Introduction

2. The Model

f(x;\theta) = \pi\,\varphi(x;\mu,\sigma^{2}) + (1-\pi)\,u(x;a,b), \qquad \theta = (\pi,\mu,\sigma,a,b) (2.1)

where \varphi(\cdot;\mu,\sigma^{2}) is the normal density and u(\cdot;a,b) is the uniform density on [a,b].

(2.2)

(where σ denotes the standard deviation of the corresponding component distribution), but such a constraint makes maximizing the likelihood over the restricted parameter space even more difficult. Moreover, when the EM algorithm is applied as in [7], the uniform parameter estimates do not change during the iterations once their initial values are specified, so the method has to try every pair of observations as initial endpoints of the uniform distribution. Its computational cost is therefore large and grows quadratically with the number of observations, since every pair of data points must be examined. Although the authors later used a gridding idea to partition the observations, the amount of computation remains large, and the method is infeasible for large samples. The method proposed in this paper, which combines the empirical cumulative distribution function with the EM algorithm, greatly reduces the computational cost, does not need to traverse the whole data set, and can also help the EM algorithm choose suitable initial values. The following sections describe how the empirical cumulative distribution function is used to estimate the parameters of the uniform distribution.
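To make the cost of that endpoint search concrete, the following sketch evaluates the fixed-endpoint likelihood over every ordered pair of observations. The helper `mixture_loglik` and all parameter values (a single normal component, mixing proportion fixed at 0.9) are our illustrative assumptions, not taken from the paper:

```python
import numpy as np

def mixture_loglik(x, pi, mu, sigma, a, b):
    """Log-likelihood of pi * N(mu, sigma^2) + (1 - pi) * U(a, b) with the
    uniform support [a, b] held fixed (helper name is ours, not the paper's)."""
    norm = pi * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    unif = (1.0 - pi) * ((x >= a) & (x <= b)) / (b - a)
    return np.log(norm + unif).sum()

rng = np.random.default_rng(0)
x = np.sort(np.concatenate([rng.normal(0.0, 1.0, 90), rng.uniform(-5.0, 5.0, 10)]))

# Trying every ordered pair of observations as candidate endpoints costs
# O(n^2) likelihood evaluations (already 4950 pairs for n = 100), which is
# why the search in [7] becomes infeasible for large samples.
best = max(
    ((x[i], x[j], mixture_loglik(x, 0.9, 0.0, 1.0, x[i], x[j]))
     for i in range(x.size) for j in range(i + 1, x.size)),
    key=lambda t: t[2],
)
a_best, b_best, ll_best = best
```

Even this toy search repeats the full likelihood pass for every candidate pair; the ECDF-based method below replaces it with a single pass over the data.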

3. Estimating the Uniform Distribution Parameters via the Empirical Cumulative Distribution Function

3.1. The Empirical Cumulative Distribution Function

F_n(t) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le t), \qquad -\infty < t < \infty (3.1)
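The empirical CDF in (3.1) simply assigns mass 1/n to each observation; evaluated at the order statistics it is i/n at the i-th sorted point. A minimal sketch:

```python
import numpy as np

def ecdf(sample):
    """Return the order statistics and the ECDF values at those points,
    i.e. F_n(x_(i)) = i / n."""
    xs = np.sort(np.asarray(sample, dtype=float))
    return xs, np.arange(1, xs.size + 1) / xs.size

xs, fn = ecdf([3.0, 1.0, 2.0, 2.0])
# xs = [1.0, 2.0, 2.0, 3.0], fn = [0.25, 0.5, 0.75, 1.0]
```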

3.2. Parameter Estimation for the Uniform Distribution

1) Apply kernel density estimation to the observations to obtain an estimated probability density function, and let x_0 denote the observation at which the estimated density attains its maximum;

2) Choose a suitable half-width δ according to the density of the observations, and determine the set A of observations lying in the corresponding interval, together with the set B of their empirical cumulative distribution function values;

Figure 1. Empirical cumulative distribution function

Figure 2. Fitted empirical cumulative distribution function and residual plot

3) Perform a linear regression taking the set A as the independent-variable data and the set B as the dependent-variable data;

4) Truncate the regression line so that its fitted values lie in [0, 1]; choose a suitable tolerance ε, and select the observations for which the distance between the empirical cumulative distribution function and the regression line is within ε, denoting this set by S;

5) The parameter estimates of the uniform distribution are then given by the extremes of S, namely â = min S and b̂ = max S.
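The five steps above can be sketched as follows. This is only a plausible reading: the neighborhood [x_0 − δ, x_0 + δ], the bandwidth h, and the names delta and tol are our reconstructions, since the original formulas did not survive extraction:

```python
import numpy as np

def estimate_uniform_support(x, delta, tol, h=0.3):
    """Sketch of steps 1)-5) of Section 3.2 (interval and symbol names
    are reconstructed assumptions, not the paper's exact formulas)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    fn = np.arange(1, n + 1) / n                       # ECDF at sample points
    # 1) Gaussian kernel density estimate, evaluated at the observations
    d = (x[:, None] - x[None, :]) / h
    dens = np.exp(-0.5 * d ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
    x0 = x[np.argmax(dens)]
    # 2) observations near the density mode, with their ECDF values
    mask = np.abs(x - x0) <= delta
    # 3) least-squares line of ECDF value against observation
    slope, intercept = np.polyfit(x[mask], fn[mask], 1)
    # 4) clip the fitted line to [0, 1]; keep observations whose ECDF value
    #    lies within the tolerance of the line
    line = np.clip(slope * x + intercept, 0.0, 1.0)
    keep = x[np.abs(fn - line) <= tol]
    # 5) the endpoints of the kept set estimate the uniform support
    return keep.min(), keep.max()

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.uniform(-5.0, 5.0, 300)])
a_hat, b_hat = estimate_uniform_support(x, delta=1.0, tol=0.05)
```

In contrast to the pairwise endpoint search of Section 2, this requires only one kernel-density pass and one regression over the data.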

4. The EM Algorithm

(4.1)

(4.2)

(4.3)

E-step: compute the conditional expectation of the complete-data log-likelihood. For a given initial value θ^(0), the approximate log-likelihood obtained at the k-th iteration is:

Q(\theta \mid \theta^{(k)}) = \sum_{i=1}^{n}\left[ r_i^{(k)} \log\big(\pi\,\varphi(x_i;\mu,\sigma^{2})\big) + \big(1-r_i^{(k)}\big) \log\big((1-\pi)\,u(x_i;a,b)\big) \right] (4.4)

where r_i^{(k)} = \pi^{(k)}\varphi(x_i;\mu^{(k)},\sigma^{(k)2}) / f(x_i;\theta^{(k)}) is the posterior probability that x_i comes from the normal component.

M-step: maximize the approximate log-likelihood, i.e. maximize Q(θ | θ^(k)); after simplification, the results of the (k+1)-th iteration are

\pi^{(k+1)} = \frac{1}{n}\sum_{i=1}^{n} r_i^{(k)}, \qquad \mu^{(k+1)} = \frac{\sum_{i=1}^{n} r_i^{(k)} x_i}{\sum_{i=1}^{n} r_i^{(k)}}, \qquad \sigma^{(k+1)2} = \frac{\sum_{i=1}^{n} r_i^{(k)} \big(x_i - \mu^{(k+1)}\big)^{2}}{\sum_{i=1}^{n} r_i^{(k)}} (4.5)

with r_i^{(k)} the posterior probability of the normal component from the E-step; the uniform parameters a and b remain fixed at their estimated values throughout the iterations.
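With the uniform endpoints held fixed, the E- and M-steps reduce to the usual two-component responsibility-weighted updates for π, μ and σ. A self-contained sketch (the data, starting values and iteration count are illustrative; the updates are the standard ones for this model, not copied from the paper's elided equations):

```python
import numpy as np

def em_normal_uniform(x, a, b, pi=0.5, mu=None, sigma=None, n_iter=200):
    """EM for f(x) = pi * N(mu, sigma^2) + (1 - pi) * U(a, b) with the
    uniform support [a, b] held fixed across iterations."""
    x = np.asarray(x, dtype=float)
    mu = x.mean() if mu is None else mu
    sigma = x.std() if sigma is None else sigma
    u = ((x >= a) & (x <= b)) / (b - a)          # uniform density at each point
    for _ in range(n_iter):
        # E-step: posterior probability that x_i came from the normal component
        phi = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        r = pi * phi / (pi * phi + (1.0 - pi) * u)
        # M-step: responsibility-weighted updates of pi, mu, sigma
        pi = r.mean()
        mu = (r * x).sum() / r.sum()
        sigma = np.sqrt((r * (x - mu) ** 2).sum() / r.sum())
    return pi, mu, sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 800), rng.uniform(-6.0, 6.0, 200)])
pi_hat, mu_hat, sigma_hat = em_normal_uniform(x, -6.0, 6.0)
```

Because a and b never change inside the loop, the unboundedness of the likelihood discussed in Section 2 cannot arise during the iterations.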

5. Numerical Simulation

5.1. Generating the Simulated Data

5.2. Initializing the Algorithm Parameters

5.3. Parameter Estimation Results

Table 1. Parameter settings of the simulated data

Table 2. Parameter estimates for the simulated data

6. Conclusion

Wang, X., Chen, C. and Li, Y. (2016) The Parameter Estimation of the Mixture of Normal and Uniform Distribution Based on the Empirical Cumulative Distribution Function. Statistics and Application, 5, 237-245. http://dx.doi.org/10.12677/SA.2016.53024

1. McLachlan, G. and Peel, D. (2004) Finite Mixture Models. John Wiley & Sons, New York, 11-14.

2. Tan, X.M. (2002) Parameter Estimation and Applications of Finite Normal Mixture Models. Ph.D. Thesis, Nankai University, Tianjin. (In Chinese)

3. Coretto, P. and Hennig, C. (2010) A Simulation Study to Compare Robust Clustering Methods Based on Mixtures. Advances in Data Analysis and Classification, 4, 111-135. http://dx.doi.org/10.1007/s11634-010-0065-4

4. Fraley, C. and Raftery, A.E. (1998) How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis. The Computer Journal, 41, 578-588. http://dx.doi.org/10.1093/comjnl/41.8.578

5. Dean, N. and Raftery, A.E. (2005) Normal Uniform Mixture Differential Gene Expression Detection for cDNA Microarrays. BMC Bioinformatics, 6, 173. http://dx.doi.org/10.1186/1471-2105-6-173

6. Coretto, P. (2008) The Noise Component in Model-Based Clustering. Ph.D. Thesis, University of London, London.

7. Coretto, P. and Hennig, C. (2011) Maximum Likelihood Estimation of Heterogeneous Mixtures of Gaussian and Uniform Distributions. Journal of Statistical Planning and Inference, 141, 462-473. http://dx.doi.org/10.1016/j.jspi.2010.06.024

8. Dennis Jr., J.E. (1981) Algorithms for Nonlinear Fitting. Cambridge University Press, England.

9. Rice, J. (2006) Mathematical Statistics and Data Analysis. Nelson Education, Australia, 378-380.

10. Wang, B. (2008) On the Convergence of the Empirical Distribution Function. Journal of Xuzhou Education College, 23, 80-81. (In Chinese)

11. Mao, S.S., Wang, J.L. and Pu, X.L. (2006) Advanced Mathematical Statistics. 2nd Edition, Higher Education Press, Beijing, 37-43. (In Chinese)

12. Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38.

13. McLachlan, G. and Krishnan, T. (2007) The EM Algorithm and Extensions. 2nd Edition, John Wiley & Sons, New York.