Statistics and Application
Vol. 3 No. 04 (2014), Article ID: 14462, 7 pages
DOI: 10.12677/SA.2014.34018

Analyzing the Score Data of Five Wine Samples from Two Groups of Experts Based on R Software

He Ming, Yingying Zhang

Department of Statistics and Actuarial Science, College of Mathematics and Statistics, Chongqing University, Chongqing

Email: 373806737@qq.com, robertzhang@cqu.edu.cn

Received: Sep. 7th, 2014; revised: Oct. 6th, 2014; accepted: Oct. 15th, 2014

ABSTRACT

Using R software, we examine how two groups of specialists scored five wine samples and whether those evaluations are reasonable. First, hypothesis tests on the means of two normal populations are used to judge whether the scores of the two groups differ significantly. The tests show that the two groups score consistently, so the evaluation results are reasonably fair and rational. Second, multiple t tests of the means are used to investigate how well the specialists differentiate the samples. At the 0.05 significance level, the specialists can separate sample 1 from samples 2, 3, and 5, sample 2 from sample 4, and sample 3 from sample 4. Ordering the five samples by level from high to low, we find that the specialists can basically distinguish samples whose levels differ by 1. However, they do not effectively distinguish samples 1 and 4 (level difference 1.5) or samples 3 and 5 (level difference 1). Third, hierarchical clustering is used to group the five samples into three classes: excellent, good, and bad. Finally, using distance discriminant analysis, a discriminant function is established from the training sample; re-classifying the training sample with it gives the specialists' misjudgment rate and accuracy rate, and the discriminant function can then be used to classify new samples.

Keywords: R Software, Specialists' Evaluation, Hypothesis Testing of the Mean and Variance of Two Normal Populations, Multiple t Test of the Mean, Hierarchical Clustering Method, Distance Discriminant Analysis Method

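1. Introduction

1.1. Problem Statement

1.2. Problem Analysis

1) Compare the scoring results of the two groups of experts for differences, based on the data.

2) Determine whether the expert scores can effectively distinguish the different samples.

3) Analyze the relationship between the expert scores and sample quality, based on the data.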

2. Empirical Analysis

2.1. Data Analysis

2.2. Hypothesis Tests of Variance and Mean [2] [3]

2.3. Assessing Sample Discrimination with Multiple t Tests [2]

Figure 1. The relationships between sample grade and indexes

Table 1. The evaluation total scores of the first group of experts

Table 2. The evaluation total scores of the second group of experts

Table 3. Hypothesis testing P values of the variance of two normal populations

Table 4. Hypothesis testing P values of the mean of two normal populations
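
The tests summarized in Tables 3 and 4 compare the variances and then the means of two normal populations. A minimal sketch of how such a comparison might be run in R for one sample is given below. The scores are simulated, and the names `group1` and `group2`, the group size of 10, and the rule for choosing `var.equal` are assumptions made for illustration, not the data or procedure of the paper.

```r
# Simulated scores only; the data of Tables 1 and 2 are not reproduced here.
set.seed(1)
group1 <- rnorm(10, mean = 75, sd = 5)  # hypothetical total scores, expert group 1
group2 <- rnorm(10, mean = 74, sd = 5)  # hypothetical total scores, expert group 2

# F test of H0: the two populations have equal variances
f_res <- var.test(group1, group2)

# Two-sample t test of H0: equal means.
# Assumption: pool the variances only when the F test does not reject at 0.05.
equal_var <- f_res$p.value > 0.05
t_res <- t.test(group1, group2, var.equal = equal_var)

# P values analogous to one entry each of Table 3 and Table 4
p_values <- c(variance = f_res$p.value, mean = t_res$p.value)
print(p_values)
```

Repeating the two calls for each of the five samples would give one P value per sample for the variance test and one for the mean test.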

> pairwise.t.test(X, A, p.adjust.method = "holm")

Pairwise comparisons using t tests with pooled SD

data:  X and A

  1       2     3     4
2 2.8e-05 -     -     -
3 1.7e-05 0.883 -     -
4 0.243   0.019 0.015 -
5 0.019   0.243 0.229 0.481

P value adjustment method: holm
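
The call above takes a score vector `X` and a grouping factor `A`. A minimal sketch of how these two objects might be laid out, and how the significant pairs can be read off the result, follows. The scores are simulated, and the layout (20 scores per sample) and the vector `true_means` are assumptions made for illustration.

```r
set.seed(2)

# Hypothetical layout: 20 total scores for each of the 5 samples
A <- factor(rep(1:5, each = 20))
true_means <- c(80, 70, 69, 77, 74)  # invented values, not taken from the paper
X <- rnorm(100, mean = true_means[as.integer(A)], sd = 6)

# Pairwise t tests with pooled SD and Holm adjustment, as in the call above
res <- pairwise.t.test(X, A, p.adjust.method = "holm")

# res$p.value is a lower-triangular matrix: rows are samples 2..5,
# columns are samples 1..4, and unused cells are NA
print(res$p.value)

# List the pairs separated at the 0.05 level (NA cells are skipped by which())
sig <- which(res$p.value < 0.05, arr.ind = TRUE)
sig_pairs <- data.frame(
  sample_a = rownames(res$p.value)[sig[, "row"]],
  sample_b = colnames(res$p.value)[sig[, "col"]]
)
print(sig_pairs)
```

The Holm method adjusts the ten pairwise P values jointly, which is why they are compared with 0.05 directly rather than with a smaller per-test threshold.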

2.4. Hierarchical Clustering [2] [6] [7]

Table 5. Sample means at each level
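
Hierarchical clustering of the five samples into three classes can be sketched with base R as follows. The score matrix is simulated, and its shape (one row per sample, ten score indexes), the standardization step, and the average linkage are all assumptions; this excerpt does not state the distance or linkage used in the paper.

```r
set.seed(3)

# Hypothetical 5 x 10 matrix: mean score of each sample on 10 indexes
scores <- matrix(
  rnorm(50, mean = 7, sd = 1), nrow = 5,
  dimnames = list(paste0("sample", 1:5), paste0("index", 1:10))
)

# Euclidean distances between samples on standardized scores
d <- dist(scale(scores))

# Agglomerative clustering; the linkage method is an assumption
hc <- hclust(d, method = "average")

# Cut the tree into three classes (e.g. excellent, good, bad)
classes <- cutree(hc, k = 3)
print(classes)

# plot(hc) draws the dendrogram, the kind of display shown in Figure 2
```

Which of the three resulting classes counts as excellent, good, or bad has to be decided afterwards from the class means, since `cutree` only returns arbitrary class labels.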

2.5. Discriminant Analysis [2] [3] [6]

> distinguish.distance(TrnX = data[, 1:10], TrnG = factor(data$A))

      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 21 22 23 24 25 26
blong 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  2  1  1 |  2  2  2  4  2  2
      27 28 29 30 31 32 33 34 35 36 37 38 39 40 | 41 42 43 44 45 46 47 48 49
blong  3  2  2  2  2  2  2  4  2  4  2  3  2  2 |  3  3  3  4  1  3  3  2  2
      50 51 52 53 54 55 56 57 58 59 60 | 61 62 63 64 65 66 67 68 69 70 71 72
blong  3  3  3  3  3  3  3  3  3  4  4 |  4  4  4  4  4  4  4  4  4  5  4  4
      73 74 75 76 77 78 79 80 | 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
blong  4  5  4  4  4  3  4  4 |  5  3  1  5  5  5  5  2  4  5  5  2  5  4  3
      96 97 98 99 100
blong  3  1  5  5   5

Figure 2. Hierarchical clustering according to the score results of the five samples by two groups of experts

> distinguish.distance(TrnX = data3[, 1:10], TrnG = factor(data3$A))

      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 21 22 23 24 25 26
blong 3 3 3 3 3 3 3 3 3  3  2  3  3  3  3  3  3  2  3  3 |  1  1  1  2  2  1
      27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
blong  1  1  1  1  2  1  1  2  1  1  1  2  1  1  1  1  1  2  3  1  2  1  1
      50 51 52 53 54 55 56 57 58 59 60 | 61 62 63 64 65 66 67 68 69 70 71 72
blong  1  1  2  1  2  1  1  1  2  2  2 |  2  2  2  2  2  2  2  2  2  2  2  2
      73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
blong  2  2  2  2  2  1  2  2  2  2  3  2  2  2  2  1  2  2  2  2  1  2  2
      96 97 98 99 100
blong  2  3  2  2   2

> ## If divided into 5 classes

> distinguish.distance(TrnX = data[, 1:10], TrnG = factor(data$A), TstX = TstX)

      1 2 3 4 5 6 7 8 9 10
blong 4 2 3 4 2 3 4 4 4  3

> ## If divided into 3 classes

> distinguish.distance(TrnX = data3[, 1:10], TrnG = factor(data3$A), TstX = TstX)

      1 2 3 4 5 6 7 8 9 10
blong 2 1 1 2 1 1 2 2 2  2
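
`distinguish.distance` is not part of base R, and its definition is not shown in this excerpt. The idea of distance discrimination, assigning each observation to the class whose centre is nearest in Mahalanobis distance, can be sketched as follows. The function `distance_classify` is a hypothetical stand-in rather than the function called above, the data are simulated, and the use of a separate covariance matrix for each class is an assumption.

```r
# Hypothetical stand-in for a distance discriminant; not distinguish.distance().
distance_classify <- function(TrnX, TrnG, TstX = TrnX) {
  TrnX <- as.matrix(TrnX)
  TstX <- as.matrix(TstX)
  groups <- levels(TrnG)
  # Squared Mahalanobis distance from every test row to each class centre,
  # using that class's own covariance matrix (an assumption)
  d2 <- lapply(groups, function(g) {
    Xg <- TrnX[TrnG == g, , drop = FALSE]
    mahalanobis(TstX, center = colMeans(Xg), cov = cov(Xg))
  })
  d2 <- matrix(unlist(d2), nrow = nrow(TstX))
  # Assign each row to the class with the smallest distance
  factor(groups[max.col(-d2, ties.method = "first")], levels = groups)
}

set.seed(4)
# Simulated training set: 5 classes of 20 rows, 10 score indexes each
A <- factor(rep(1:5, each = 20))
X <- matrix(rnorm(1000), nrow = 100, ncol = 10) + 1.5 * as.integer(A)

# Re-classify the training sample to estimate the misjudgment rate
pred <- distance_classify(X, A)
confusion <- table(actual = A, predicted = pred)
error_rate <- mean(pred != A)
print(confusion)
print(error_rate)

# Classify 10 hypothetical new samples
TstX <- matrix(rnorm(100), nrow = 10, ncol = 10) + 3
new_class <- distance_classify(X, A, TstX)
print(new_class)
```

The misjudgment rate obtained by re-classifying the training sample tends to be optimistic, because the same rows are used both to fit and to evaluate the rule.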

3. Conclusion