﻿ 简单随机抽样中七个非常实用的R函数 Seven Very Practical R Functions in Simple Random Sampling

Statistics and Application
Vol. 11  No. 01 ( 2022 ), Article ID: 48541 , 14 pages
10.12677/SA.2022.111007

1重庆大学数学与统计学院统计与精算学系，重庆

2重庆大学分析数学与应用重庆市重点实验室，重庆

Seven Very Practical R Functions in Simple Random Sampling

Chen Bai1*, Ying-Ying Zhang1,2*

1Department of Statistics and Actuarial Science, College of Mathematics and Statistics, Chongqing University, Chongqing

2Chongqing Key Laboratory of Analytic Mathematics and Applications, Chongqing University, Chongqing

Received: Jan. 9th, 2022; accepted: Jan. 22nd, 2022; published: Feb. 9th, 2022

ABSTRACT

Simple random sampling is the most basic, mature, and simple sampling design method in sampling technology. In this paper, R software is used to program the point estimation and interval estimation of population mean and total value, the determination of sample size, and the estimation of total and mean value of sub population in simple random sampling. For simple random sampling, we compile seven very practical R functions (programs): compute_Y_bar_srs(), compute_Y_srs(), compute_P_N1_srs(), compute_n0_n_Y_bar_srs(), compute_n0_n_P_srs(), compute_Y_j_srs(), and compute_Y_bar_j_srs(), which will provide great convenience for users who need to use simple random sampling to analyze practical problems.

Keywords:Simple Random Sampling, Population Mean and Total Value, Point Estimation and Interval Estimation, The Determination of Sample Size, R Function

1. 引言

2. 简单随机抽样中七个非常实用的R函数及应用举例

R函数1：compute_Y_bar_srs()

compute_Y_bar_srs = function(y_vector, n, N, alpha, replace = c(FALSE, TRUE)){

t = qnorm(1 - alpha / 2)

f = n / N

s2 = var(y_vector)

y_bar = mean(y_vector)

if (missing(replace)) replace = FALSE

if (replace == FALSE){

v_y_bar = (1 - f) * s2 / n

}

else{

v_y_bar = s2 / n

}

se_y_bar = sqrt(v_y_bar)

L = y_bar - t * se_y_bar

U = y_bar + t * se_y_bar

res = data.frame(t, f, s2, y_bar, v_y_bar, se_y_bar, L, U)

}

Table 1. Sample data of telecom consumption of 36 college students in a month

$t={Z}_{\alpha /2}\approx 1.959964,\text{}f=\frac{n}{N}\approx 0.002363756,\text{}{s}^{2}\approx 1358.409$

$\stackrel{¯}{y}=\frac{1}{n}\underset{i=1}{\overset{n}{\sum }}{y}_{i}\approx 53.63889,\text{}v\left(\stackrel{¯}{y}\right)=\frac{1-f}{n}{s}^{2}\approx 37.64438,\text{}se\left(\stackrel{¯}{y}\right)=\sqrt{v\left(\stackrel{¯}{y}\right)}\approx 6.135502$

$L=\stackrel{¯}{y}-t\cdot se\left(\stackrel{¯}{y}\right)\approx 41.61353,\text{}U=\stackrel{¯}{y}+t\cdot se\left(\stackrel{¯}{y}\right)\approx 65.66425$

> rm(list=ls(all=TRUE))

> source(subfunctions.R)

>

>y_vector =

+ c(45, 36, 7, 13, 170, 89, 33, 75, 22, 56, 79, 5,

+ 48, 53, 24, 39, 41, 93, 19, 59, 111, 64, 35, 76,

+ 83, 51, 33, 25, 28, 90, 17, 57, 43, 146, 19, 47)

> n = 36

> N = 15230

> alpha = 0.05

>

> # 不放回

> # 默认replace = FALSE

>res_F = compute_Y_bar_srs(y_vector, n, N, alpha, replace = FALSE); res_F

t f s2 y_barv_y_barse_y_bar

1 1.959964 0.002363756 1358.409 53.63889 37.64438 6.135502

L U

1 41.61353 65.66425

> res_F_1 = compute_Y_bar_srs(y_vector, n, N, alpha); res_F_1

t f s2 y_barv_y_barse_y_bar

1 1.959964 0.002363756 1358.409 53.63889 37.64438 6.135502

L U

1 41.61353 65.66425

$\stackrel{¯}{y}=\frac{1}{n}\underset{i=1}{\overset{n}{\sum }}{y}_{i}\approx 53.63889,\text{}v\left(\stackrel{¯}{y}\right)=\frac{{s}^{2}}{n}\approx 37.73358,\text{}se\left(\stackrel{¯}{y}\right)=\sqrt{v\left(\stackrel{¯}{y}\right)}\approx 6.142766$

$L=\stackrel{¯}{y}-t\cdot se\left(\stackrel{¯}{y}\right)\approx 41.59929,\text{}U=\stackrel{¯}{y}+t\cdot se\left(\stackrel{¯}{y}\right)\approx 65.67849$

> # 放回

>res_T = compute_Y_bar_srs(y_vector, n, N, alpha, replace = TRUE); res_T

t f s2 y_barv_y_barse_y_bar

1 1.959964 0.002363756 1358.409 53.63889 37.73358 6.142766

L U

1 41.59929 65.67849

R函数2：compute_Y_srs()

compute_Y_srs = function(y_vector, n, N, alpha, replace = c(FALSE, TRUE)){

t = qnorm(1 - alpha / 2)

f = n / N

s2 = var(y_vector)

y_bar = mean(y_vector)

Y_hat = N * y_bar

if (missing(replace)) replace = FALSE

if (replace == FALSE){

v_y_bar = (1 - f) * s2 / n

}

else{

v_y_bar = s2 / n

}

v_Y_hat = N^2 * v_y_bar

se_Y_hat = sqrt(v_Y_hat)

L = Y_hat - t * se_Y_hat

U = Y_hat + t * se_Y_hat

res = data.frame(t, f, s2, y_bar, v_y_bar, Y_hat, v_Y_hat, se_Y_hat, L, U)

}

$t={Z}_{\alpha /2}\approx 1.959964,\text{}f=\frac{n}{N}\approx 0.002363756,\text{}{s}^{2}\approx 1358.409$

$\stackrel{¯}{y}=\frac{1}{n}\underset{i=1}{\overset{n}{\sum }}{y}_{i}\approx 53.63889,\text{}v\left(\stackrel{¯}{y}\right)=\frac{1-f}{n}{s}^{2}\approx 37.64438$

$\stackrel{^}{Y}\approx 816920.3,\text{}v\left(\stackrel{^}{Y}\right)={N}^{2}v\left(\stackrel{¯}{y}\right)\approx 8731723778,\text{}se\left(\stackrel{^}{Y}\right)=\sqrt{v\left(\stackrel{^}{Y}\right)}\approx 93443.69$

$L=\stackrel{^}{Y}-t\cdot se\left(\stackrel{^}{Y}\right)\approx 633774,\text{}U=\stackrel{^}{Y}+t\cdot se\left(\stackrel{^}{Y}\right)\approx 1000067$

> rm(list=ls(all=TRUE))

> source(subfunctions.R)

>

>y_vector =

+ c(45, 36, 7, 13, 170, 89, 33, 75, 22, 56, 79, 5,

+ 48, 53, 24, 39, 41, 93, 19, 59, 111, 64, 35, 76,

+ 83, 51, 33, 25, 28, 90, 17, 57, 43, 146, 19, 47)

> n = 36

> N = 15230

> alpha = 0.05

>

> # 不放回

> # 默认replace = FALSE

>res_F = compute_Y_srs(y_vector, n, N, alpha, replace = FALSE); res_F

t f s2 y_barv_y_barY_hat

1 1.959964 0.002363756 1358.409 53.63889 37.64438 816920.3

v_Y_hatse_Y_hat L U

1 8731723778 93443.69 633774 1000067

> res_F_1 = compute_Y_srs(y_vector, n, N, alpha); res_F_1

t f s2 y_barv_y_barY_hat

1 1.959964 0.002363756 1358.409 53.63889 37.64438 816920.3

v_Y_hatse_Y_hat L U

1 8731723778 93443.69 633774 1000067

$\stackrel{¯}{y}=\frac{1}{n}\underset{i=1}{\overset{n}{\sum }}{y}_{i}\approx 53.63889,\text{}v\left(\stackrel{¯}{y}\right)=\frac{{s}^{2}}{n}\approx 37.73358$

$\stackrel{^}{Y}\approx 816920.3,\text{}v\left(\stackrel{^}{Y}\right)={N}^{2}v\left(\stackrel{¯}{y}\right)\approx 8752412343,\text{}se\left(\stackrel{^}{Y}\right)=\sqrt{v\left(\stackrel{^}{Y}\right)}\approx 93554.33$

$L=\stackrel{^}{Y}-t\cdot se\left(\stackrel{^}{Y}\right)\approx 633557.2,\text{}U=\stackrel{^}{Y}+t\cdot se\left(\stackrel{^}{Y}\right)\approx 1000283$

> # 放回

>res_T = compute_Y_srs(y_vector, n, N, alpha, replace = TRUE); res_T

t f s2 y_barv_y_barY_hat

1 1.959964 0.002363756 1358.409 53.63889 37.73358 816920.3

v_Y_hatse_Y_hat L U

1 8752412343 93554.33 633557.2 1000283

R函数3：compute_P_N1_srs()

compute_P_N1_srs = function(n1, n, N, correct, alpha){

t = qnorm(1 - alpha / 2)

p = n1 / n

f = n / N

v_p = (1 - f) / (n - 1) * p * (1 - p)

se_p = sqrt(v_p)

N1_hat = N * p

v_N1_hat = N * (N - n) / (n - 1) * p * (1 - p)

se_N1_hat = sqrt(v_N1_hat)

if (correct == TRUE){

L_P = p - (t * se_p + 1 / (2 * n))

U_P = p + (t * se_p + 1 / (2 * n))

L_N1 = N * L_P

U_N1 = N * U_P

res = data.frame(p, v_p, se_p, N1_hat, v_N1_hat, se_N1_hat, L_P, U_P, L_N1, U_N1)

}

else{

L_P = p - (t * se_p)

U_P = p + (t * se_p)

L_N1 = N * L_P

U_N1 = N * U_P

res = data.frame(p, v_p, se_p, N1_hat, v_N1_hat, se_N1_hat, L_P, U_P, L_N1, U_N1)

}

}

$p=\frac{{n}_{1}}{n}\approx 0.1944444,\text{}v\left(p\right)=\frac{1-f}{n-1}pq\approx 0.00446473,\text{}se\left(p\right)=\sqrt{v\left(p\right)}\approx 0.06681864$

$\stackrel{^}{{N}_{1}}=Np\approx 2961.389,\text{}v\left(\stackrel{^}{{N}_{1}}\right)=\frac{N\left(N-n\right)}{n-1}pq\approx 1035607,\text{}se\left(\stackrel{^}{{N}_{1}}\right)=\sqrt{v\left(\stackrel{^}{{N}_{1}}\right)}\approx 1017.648$

${L}_{P}=p-\left(t\cdot se\left(p\right)+\frac{1}{2n}\right)\approx 0.04959344,\text{}{U}_{P}=p+\left(t\cdot se\left(p\right)+\frac{1}{2n}\right)\approx 0.3392955$

${L}_{{N}_{1}}=N{L}_{P}\approx 755.308,\text{}{U}_{{N}_{1}}=N{U}_{P}\approx 5167.47$

> rm(list=ls(all=TRUE))

> source(subfunctions.R)

>

> # 修正后

> res = compute_P_N1_srs(n1 = 7, n = 36, N = 15230, correct = TRUE, alpha = 0.05); res

p v_pse_p N1_hat v_N1_hat se_N1_hat

1 0.1944444 0.00446473 0.06681864 2961.38910356071017.648

L_P U_P L_N1 U_N1

1 0.04959344 0.3392955 755.308 5167.47

${L}_{P}=p-t\cdot se\left(p\right)\approx 0.06348232,\text{}{U}_{P}=p+t\cdot se\left(p\right)\approx 0.3254066$

${L}_{{N}_{1}}=N{L}_{P}\approx 966.8358,\text{}{U}_{{N}_{1}}=N{U}_{P}\approx 4955.942$

> #修正前

> res = compute_P_N1_srs(n1 = 7, n = 36, N = 15230, correct = FALSE, alpha = 0.05); res

p v_pse_p N1_hat v_N1_hat se_N1_hat

1 0.1944444 0.00446473 0.06681864 2961.389 10356071017.648

L_P U_P L_N1 U_N1

1 0.06348232 0.3254066 966.8358 4955.942

R函数4：compute_n0_n_Y_bar_srs()

compute_n0_n_Y_bar_srs_from_V = function(V, S2, N){

n0 = S2 / V

n = n0 / (1 + n0 / N)

res = data.frame(n0, n)

}

$V={\left(\frac{\Delta }{t}\right)}^{2}={\left(\frac{\gamma \stackrel{¯}{Y}}{t}\right)}^{2}={\left(CV\cdot \stackrel{¯}{Y}\right)}^{2}=S{E}^{2}$

compute_n0_n_Y_bar_srs = function(

Given = c(V, Delta, gamma, CV, SE),

input, alpha, Y_bar, S2, N){

t = qnorm(1 - alpha/2)

V = switch(Given,

V = input,

Delta = (input / t)^2,

gamma = (input * Y_bar / t)^2,

CV = (input * Y_bar)^2,

SE = input^2

)

res = compute_n0_n_Y_bar_srs_from_V(V, S2, N)

}

${n}_{0}=\frac{{t}^{2}{S}^{2}}{{\Delta }^{2}}\approx 208.731,\text{}n=\frac{{n}_{0}}{1+\frac{{n}_{0}}{N}}\approx 205.909$

> ## 给定绝对允许误差 Delta

> Delta = 5

> alpha = 0.05

> t = qnorm(1 - alpha / 2); t

[1] 1.959964

> V = (Delta / t)^2; V

[1] 6.507944

>

>res_Delta = compute_n0_n_Y_bar_srs_from_V(V, S2 = 1358.41, N = 15230); res_Delta

n0 n

1 208.731 205.909

> Delta = 5

> res_Delta_1 = compute_n0_n_Y_bar_srs(Given = Delta, input = Delta, alpha = 0.05, Y_bar = 53.64, S2 = 1358.41, N = 15230); res_Delta_1

n0 n

1 208.731 205.909

R函数5：compute_n0_n_P_srs()

compute_n0_n_P_srs_from_V = function(V, P, N){

n0 = P * (1 - P) / V

n = n0 / (1 + (n0 - 1) / N)

res = data.frame(n0, n)

}

$V={\left(\frac{\Delta }{t}\right)}^{2}={\left(\frac{\gamma P}{t}\right)}^{2}={\left(CV\cdot P\right)}^{2}=S{E}^{2}$

compute_n0_n_P_srs = function(

Given = c(V, Delta, gamma, CV, SE),

input, alpha, P, N){

t = qnorm(1 - alpha/2)

V = switch(Given,

V = input,

Delta = (input / t)^2,

gamma = (input * P / t)^2,

CV = (input * P)^2,

SE = input^2

)

res = compute_n0_n_P_srs_from_V(V, P, N)

}

${n}_{0}=\frac{{t}^{2}q}{{\gamma }^{2}p}\approx 1591.913,\text{}n=\frac{{n}_{0}}{1+\frac{{n}_{0}-1}{N}}\approx 1441.351$

> gamma = 0.1

> p = 0.1944

> alpha = 0.05

> t = qnorm(1 - alpha / 2); t

[1] 1.959964

> V = (gamma * p / t)^2; V

[1] 9.837763e-05

>

>res_gamma = compute_n0_n_P_srs_from_V(V, P = p, N = 15230); res_gamma

n0 n

1 1591.913 1441.351

> gamma = 0.1

> p = 0.1944

> res_gamma_1 = compute_n0_n_P_srs(Given = gamma, input = gamma, alpha = 0.05, P = p, N = 15230); res_gamma_1

n0 n

1 1591.913 1441.351

R函数6：compute_Y_j_srs()

compute_Y_j_srs = function(N, n, n_j, y_bar_j, s_j, alpha){

t = qnorm(1 - alpha / 2)

p_j = n_j / n

q_j = 1 - p_j

Y_hat_j = N / n * n_j * y_bar_j

f = n / N

se_Y_hat_j = N * sqrt((1 - f) / n) * sqrt((n_j - 1) / (n - 1) * s_j^2 + n * p_j * q_j * y_bar_j^2 / (n - 1))

CV_hat_Y_hat_j = se_Y_hat_j / Y_hat_j

L_Y_j = Y_hat_j - t * se_Y_hat_j

U_Y_j = Y_hat_j + t * se_Y_hat_j

res = data.frame(Y_hat_j, se_Y_hat_j, CV_hat_Y_hat_j, L_Y_j, U_Y_j)

}

$se\left({\stackrel{^}{Y}}^{\left(j\right)}\right)=N\sqrt{\frac{1-f}{n}}\sqrt{\frac{{n}_{j}-1}{n-1}{s}_{j}^{2}+\frac{n}{n-1}{p}_{j}{q}_{j}{\left[{\stackrel{¯}{y}}^{\left(j\right)}\right]}^{2}}\approx 1222098$

${\stackrel{^}{Y}}^{\left(j\right)}=\frac{N}{n}{n}_{j}{\stackrel{¯}{y}}^{\left(j\right)}\approx 32409750,\text{}\text{​}\stackrel{^}{CV}\left({\stackrel{^}{Y}}^{\left(j\right)}\right)=\frac{se\left({\stackrel{^}{Y}}^{\left(j\right)}\right)}{{\stackrel{^}{Y}}^{\left(j\right)}}\approx 3.77%$

${L}_{{Y}^{\left(j\right)}}={\stackrel{^}{Y}}^{\left(j\right)}-t\cdot se\left({\stackrel{^}{Y}}^{\left(j\right)}\right)\approx 30014481,\text{}{U}_{{Y}^{\left(j\right)}}={\stackrel{^}{Y}}^{\left(j\right)}+t\cdot se\left({\stackrel{^}{Y}}^{\left(j\right)}\right)\approx 34805019$

> res = compute_Y_j_srs(N = 15800, n = 800, n_j = 375, y_bar_j = 4376, s_j = 755, alpha = 0.05); res

Y_hat_jse_Y_hat_jCV_hat_Y_hat_jL_Y_jU_Y_j

1 32409750 1222098 0.03770774 30014481 34805019

R函数7：compute_Y_bar_j_srs()

compute_Y_bar_j_srs = function(N, n, n_j, y_bar_j, s_j, alpha){

t = qnorm(1 - alpha / 2)

f = n / N

v_y_bar_j = (1 - f) / n_j * s_j^2

se_y_bar_j = sqrt(v_y_bar_j)

L_Y_bar_j = y_bar_j - t * se_y_bar_j

U_Y_bar_j = y_bar_j + t * se_y_bar_j

res = data.frame(t, f, v_y_bar_j, se_y_bar_j, L_Y_bar_j, U_Y_bar_j)

}

$t={Z}_{\alpha /2}\approx 1.959964,\text{}f=\frac{n}{N}\approx 0.05063291$

$v\left({\stackrel{¯}{y}}^{\left(j\right)}\right)=\frac{1-f}{{n}_{j}}{s}_{j}^{2}\approx 1443.101,\text{}se\left({\stackrel{¯}{y}}^{\left(j\right)}\right)=\sqrt{v\left({\stackrel{¯}{y}}^{\left(j\right)}\right)}\approx 37.98817$

${L}_{{\stackrel{¯}{Y}}^{\left(j\right)}}={\stackrel{¯}{y}}^{\left(j\right)}-t\cdot se\left({\stackrel{¯}{y}}^{\left(j\right)}\right)\approx 4301.545,\text{}{U}_{{\stackrel{¯}{Y}}^{\left(j\right)}}={\stackrel{¯}{y}}^{\left(j\right)}+t\cdot se\left({\stackrel{¯}{y}}^{\left(j\right)}\right)\approx 4450.455$

> res = compute_Y_bar_j_srs(N = 15800, n = 800, n_j = 375, y_bar_j = 4376, s_j = 755, alpha = 0.05); res

t f v_y_bar_jse_y_bar_jL_Y_bar_jU_Y_bar_j

1 1.959964 0.050632911443.101 37.988174301.5454450.455

3. 总结

Seven Very Practical R Functions in Simple Random Sampling[J]. 统计学与应用, 2022, 11(01): 53-66. https://doi.org/10.12677/SA.2022.111007

1. 1. 金勇进, 蒋妍, 李序颖. 抽样技术[M]. 北京: 中国人民大学出版社, 2002.

2. 2. [美] G.卡尔顿, 著. 抽样调查导论[M]. 郝虹生, 译. 北京: 中国统计出版社, 2003.

3. 3. 孙山泽. 抽样调查[M]. 北京: 北京大学出版社, 2004.

4. 4. 杜子芳. 抽样技术及其应用[M]. 北京: 清华大学出版社, 2005.

5. 5. 杜智敏. 抽样调查与MATLAB和SPSS应用[M]. 北京: 电子工业出版社, 2010.

6. 6. 李金昌. 应用抽样技术[M]. 第3版. 北京: 科学出版社, 2015.

7. 7. 杨贵军, 尹剑, 孟杰, 王维真. 应用抽样技术[M]. 第2版. 北京: 中国统计出版社, 2020.

8. 8. R Core Team (2022) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

9. 9. 薛毅, 陈丽萍. 统计建模与R软件[M]. 北京: 清华大学出版社, 2007.

10. NOTES

*共同第一作者。