Journal of Water Resources Research
Vol.06 No.05(2017), Article ID:21735,9 pages
10.12677/JWRR.2017.65050

Analysis of Correlation between River Flows Using Copula-Entropy Method

Kangdi Huang1, Lu Chen2*, Shenglian Guo3, Jianzhong Zhou1, Zhengying Yang1

1College of Hydropower & Information Engineering, Huazhong University of Science & Technology, Wuhan Hubei

2State Key Laboratory of Simulation and Regulation of River Basin Water Cycle, Beijing

3State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan Hubei

Received: Jul. 26th, 2017; accepted: Aug. 9th, 2017; published: Aug. 17th, 2017

ABSTRACT

Analysis of the dependence between the main stream and its upstream tributaries is important for hydraulic design, flood prevention and risk control. To overcome the shortcomings of current hydrological correlation analysis methods, the copula entropy method was introduced to estimate the dependence between hydrological variables. The relationship between copula entropy and mutual information was discussed, the calculation procedure for copula entropy was derived, and both multiple integration and Monte Carlo methods were used to calculate it. The upper Yangtze River was selected as a case study. The results show that total correlation values differ significantly when different copula functions are used. The total correlation among the rivers is not high, and that between the Min and Tuo Rivers is the largest. There is some dependence among the Jinsha, Min and Tuo Rivers, which poses a threat to flood control at the Three Gorges Dam (TGD). The flows of the Jinsha, Jialing, Min and Tuo Rivers significantly influence flood occurrence in the Yangtze River.

Keywords: Copula Entropy, Dependence, The Upper Yangtze River

1. Introduction

2. Copula Entropy

2.1. Definition of Copula Entropy

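Let $x\in {R}^{d}$ be a d-dimensional random variable with marginal distribution functions ${F}_{i}\left({x}_{i}\right)$, and let ${u}_{i}={F}_{i}\left({x}_{i}\right)$, $i=1,2,\cdots ,d$, where each ${u}_{i}$ is a uniformly distributed random variable. The entropy of the copula function is defined as the copula entropy, which can be expressed as: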

${H}_{C}\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)=-\underset{0}{\overset{1}{\int }}\cdots \underset{0}{\overset{1}{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\mathrm{log}\left(c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\right)\text{d}{u}_{1}\text{d}{u}_{2}\cdots \text{d}{u}_{d}$ (1)

$f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)=c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)$ (2)

$\begin{array}{c}H\left({X}_{1},{X}_{2},\cdots ,{X}_{d}\right)=-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)\mathrm{log}\left[f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)\mathrm{log}\left[c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)\left\{\mathrm{log}\left[c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\right]+\sum _{i=1}^{d}\mathrm{log}\left[f\left({x}_{i}\right)\right]\right\}\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)\cdot \mathrm{log}\left[c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ -\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)\cdot \sum _{i=1}^{d}\mathrm{log}\left[f\left({x}_{i}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =A+B\end{array}$ (3)

$\begin{array}{c}A=-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)\cdot \sum _{i=1}^{d}\mathrm{log}\left[f\left({x}_{i}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)\cdot \sum _{i=1}^{d}\mathrm{log}\left[f\left({x}_{i}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)\cdot \mathrm{log}\left[f\left({x}_{1}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}-\cdots -\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)\cdot \mathrm{log}\left[f\left({x}_{d}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =-\underset{0}{\overset{\infty }{\int }}\mathrm{log}\left[f\left({x}_{1}\right)\right]\cdot \left[\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)\text{d}{x}_{2}\cdots \text{d}{x}_{d}\right]\text{d}{x}_{1}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}-\cdots -\underset{0}{\overset{\infty }{\int }}\mathrm{log}\left[f\left({x}_{d}\right)\right]\cdot \left[\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}f\left({x}_{1},{x}_{2},\cdots ,{x}_{d}\right)\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d-1}\right]\text{d}{x}_{d}\\ =-\sum _{i=1}^{d}\underset{0}{\overset{\infty }{\int }}f\left({x}_{i}\right)\mathrm{log}\left[f\left({x}_{i}\right)\right]\text{d}{x}_{i}=\sum _{i=1}^{d}H\left({X}_{i}\right)\end{array}$ (4)

$\begin{array}{c}B=-\underset{0}{\overset{\infty }{\int }}\cdots \underset{0}{\overset{\infty }{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\prod _{i=1}^{d}f\left({x}_{i}\right)\cdot \mathrm{log}\left[c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\right]\text{d}{x}_{1}\text{d}{x}_{2}\cdots \text{d}{x}_{d}\\ =-\underset{0}{\overset{1}{\int }}\cdots \underset{0}{\overset{1}{\int }}c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\cdot \mathrm{log}\left[c\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)\right]\text{d}{u}_{1}\text{d}{u}_{2}\cdots \text{d}{u}_{d}\\ ={H}_{C}\left(u\right)\end{array}$ (5)

$H\left({X}_{1},{X}_{2},\cdots ,{X}_{d}\right)=\sum _{i=1}^{d}H\left({X}_{i}\right)+{H}_{C}\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)$ (6)

2.2. Calculation of Copula Entropy

2.2.1. Multiple Integration Method
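As a minimal sketch of the multiple integration idea, the double integral in Eq. (1) can be approximated on a grid once a copula density has been chosen. The example below assumes a bivariate Frank copula with an illustrative parameter `theta = 5.0` (not a value fitted in this study) and uses a simple midpoint rule as a stand-in for whatever quadrature scheme is preferred. The function names are hypothetical.

```python
import math


def frank_copula_density(u, v, theta):
    """Density c(u, v) of the bivariate Frank copula (theta != 0)."""
    a = 1.0 - math.exp(-theta)
    numerator = theta * a * math.exp(-theta * (u + v))
    denominator = (a - (1.0 - math.exp(-theta * u)) * (1.0 - math.exp(-theta * v))) ** 2
    return numerator / denominator


def copula_entropy_midpoint(density, n=400):
    """Approximate H_C = -(double integral of c * ln c) over [0, 1]^2 with a midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            c = density(u, v)
            if c > 0.0:  # c * ln c tends to 0 as c tends to 0
                total += c * math.log(c)
    return -total * h * h


theta = 5.0  # illustrative value only, not a fitted parameter
h_c = copula_entropy_midpoint(lambda u, v: frank_copula_density(u, v, theta))
total_correlation = -h_c  # Eq. (9): total correlation is the negative copula entropy
print(f"H_C = {h_c:.4f}, total correlation = {total_correlation:.4f}")
```

The Frank density is bounded on the unit square, so a plain grid rule behaves well here. Copulas whose density is unbounded near the corners (for example the Gaussian or Gumbel copula) would likely need a finer grid or an adaptive integrator.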

2.2.2. Monte Carlo Method

${H}_{C}\left({u}_{1},{u}_{2},\cdots ,{u}_{d}\right)=-\underset{{\left[0,1\right]}^{d}}{\int }c\left(U\right)\mathrm{ln}c\left(U\right)\text{d}U=-E\left[\mathrm{ln}c\left(U\right)\right]$ (7)

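The copula entropy equals the negative of the expected value of ln c(U) and can therefore be obtained by the Monte Carlo method. As with the multiple integration method, the dependence coefficient and parameters of the copula are determined first, which fixes the copula function; the expected value of -ln c(U) is then calculated.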
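The Monte Carlo estimate in Eq. (7) can be sketched as follows, assuming a bivariate Gaussian copula with an illustrative correlation `rho = 0.6` (not a value from this study). This copula is chosen only because its entropy has a closed form, $\frac{1}{2}\mathrm{ln}\left(1-{\rho }^{2}\right)$, against which the estimate can be checked. The function names are hypothetical.

```python
import math
import random


def gaussian_copula_log_density(z1, z2, rho):
    """ln c(u1, u2) of a bivariate Gaussian copula at normal scores z_i = Phi^{-1}(u_i)."""
    one_minus_rho2 = 1.0 - rho * rho
    quad = (rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / one_minus_rho2
    return -0.5 * math.log(one_minus_rho2) - 0.5 * quad


def copula_entropy_mc(rho, n_samples=200_000, seed=0):
    """Estimate H_C = -E[ln c(U)] by averaging over samples drawn from the copula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sampling U = Phi(Z) with correlated normal Z is equivalent to working
        # with Z directly, because ln c is evaluated at Phi^{-1}(U) = Z.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        total += gaussian_copula_log_density(z1, z2, rho)
    return -total / n_samples


rho = 0.6  # illustrative value only
h_c_mc = copula_entropy_mc(rho)
h_c_exact = 0.5 * math.log(1.0 - rho * rho)
print(f"Monte Carlo H_C = {h_c_mc:.4f}, closed form = {h_c_exact:.4f}")
```

For other copula families the same loop applies once a sampler and a log-density for that family are available; only those two pieces change.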

3. Total Correlation

$I\left({X}_{1},{X}_{2},\cdots ,{X}_{d}\right)=\sum _{i=1}^{d}H\left({X}_{i}\right)-H\left({X}_{1},{X}_{2},\cdots ,{X}_{d}\right)$ (8)

$I\left({X}_{1},{X}_{2},\cdots ,{X}_{d}\right)=-{H}_{C}\left(u\right)=-B$ (9)
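The step from Eq. (8) to Eq. (9) can be written out by substituting the entropy decomposition of Eq. (6) into Eq. (8). The following is a short worked sketch in the notation of the equations above:

```latex
\begin{aligned}
I(X_1, X_2, \cdots, X_d)
  &= \sum_{i=1}^{d} H(X_i) - H(X_1, X_2, \cdots, X_d) \\
  &= \sum_{i=1}^{d} H(X_i) - \left[ \sum_{i=1}^{d} H(X_i) + H_C(u_1, u_2, \cdots, u_d) \right] \\
  &= -H_C(u_1, u_2, \cdots, u_d)
\end{aligned}
```

The marginal entropies cancel, so the total correlation depends only on the copula. Because the copula entropy is never positive, the total correlation is never negative, and it is zero when the variables are independent.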

4. Case Study

4.1. Study Area

Table 1. Dependence measures for the upper Yangtze River, China, based on annual maximum data

Figure 1. Results of fitting by POME method and normal distribution

4.2. Goodness-of-Fit Test

$\text{AIC}=2k-2\mathrm{ln}\left(L\right)$ (10)
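A minimal sketch of how Eq. (10) might be used to rank candidate copulas is given below, where k is the number of fitted parameters and L is the maximized likelihood. The candidate names and log-likelihood values are purely hypothetical placeholders, not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_by_aic(candidates):
    """Return (name, AIC) of the candidate with the smallest AIC.

    `candidates` maps a model name to (maximized log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical placeholder values, for illustration only.
candidates = {
    "copula_A": (-52.0, 1),
    "copula_B": (-50.5, 2),
    "copula_C": (-51.0, 1),
}
best_name, best_aic = select_by_aic(candidates)
print(best_name, best_aic)
```

The model with the smallest AIC is preferred; the 2k term penalizes a candidate that gains likelihood only by adding parameters.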

4.3. Bivariate Model

Table 2. Selections of copulas and determination of the parameters

Table 3. Total correlation values of two tributaries in upstream Yangtze River

${I}_{N}=-\frac{1}{2}\mathrm{log}\left(1-{\rho }^{2}\right)$ (11)
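Equation (11) gives the total correlation of two Gaussian-correlated variables as a function of the correlation coefficient ρ. The sketch below evaluates it for a few illustrative values of ρ and also inverts it. It assumes the logarithm is natural (values in nats); the function names are hypothetical.

```python
import math


def gaussian_total_correlation(rho):
    """Total correlation I_N = -0.5 * ln(1 - rho^2) of a bivariate Gaussian pair."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho * rho)


def equivalent_correlation(total_correlation):
    """Invert Eq. (11): magnitude of rho that yields the given total correlation."""
    if total_correlation < 0.0:
        raise ValueError("total correlation cannot be negative")
    return math.sqrt(1.0 - math.exp(-2.0 * total_correlation))


for rho in (0.0, 0.3, 0.6, 0.9):  # illustrative values only
    print(f"rho = {rho:.1f} -> I_N = {gaussian_total_correlation(rho):.4f}")
```

The mapping is monotone in |ρ|, equals zero at ρ = 0, and grows without bound as |ρ| approaches 1, which is why total correlation is not confined to the interval [0, 1].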

4.4. Multivariate Model

Table 4. Comparisons of total correlation of Gaussian correlated variables in different methods

Table 5. Total correlation analysis of trivariate joint distribution

Table 6. Total correlation analysis of four-dimensional joint distribution

Table 7. Total correlation analysis of five-dimensional joint distribution

5. Conclusions

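1) Archimedean and elliptical copulas were applied to build the joint distributions of the multivariate variables. In general, Archimedean copulas are better suited to low-dimensional cases.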

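2) Based on information theory and copula functions, the copula entropy method was introduced. It is a nonparametric method that can represent both linear and nonlinear dependence, makes no assumptions about the marginal distributions, and can be applied in higher dimensions. In addition, the method estimates the total correlation directly from the copula entropy alone, without calculating the marginal and joint entropies, and thus avoids the accumulation of bias.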

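3) Total correlation values were calculated using the multiple integration and Monte Carlo methods, and the two sets of results were close. Different types of copula functions can be adopted for particular cases. When different copula functions are used, the total correlation values differ significantly, so selecting an appropriate copula function is important for estimating dependence.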

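4) The results show that the total correlation among the rivers is not very high, which is related to the climatic characteristics of the study area. Because the Min and Tuo Rivers are the closest to each other and belong to the same rainstorm zone, their total correlation is the largest, at 0.33. There is also some dependence among the Jinsha, Min and Tuo Rivers. In normal-flow years, because rainfall generally does not occur simultaneously on the left and right banks of the Yangtze River, the dependence between the Min and Wu Rivers cannot be ignored. Because the Jinsha, Min, Jialing and Tuo Rivers are considerably dependent, floods on these rivers may coincide, which poses a threat to flood control at the Three Gorges Dam.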

Analysis of Correlation between River Flows Using Copula-Entropy Method[J]. 水资源研究, 2017, 06(05): 426-434. http://dx.doi.org/10.12677/JWRR.2017.65050

