The Research of Feature Extraction Algorithm by Integrating T-Rank and Softmax Methods

Modeling and Simulation
Vol.05 No.04(2016), Article ID:18922,8 pages
10.12677/MOS.2016.54017


Zhe Liu1, Chunli Peng2, Peng Chen1, Youxi Luo3*

1School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei

2School of Economics and Management, Hubei University of Technology, Wuhan, Hubei

3School of Science, Hubei University of Technology, Wuhan, Hubei

Received: Oct. 19th, 2016; accepted: Nov. 8th, 2016; published: Nov. 11th, 2016

ABSTRACT

This paper proposes a new feature extraction algorithm that integrates T-rank and Softmax for high-dimensional biological data sets, and that handles such data more effectively than traditional methods. It extracts only a small number of features and computes quickly. Using this algorithm, the paper obtains a high-accuracy diagnostic model for psoriasis.

Keywords: High Dimensional, Softmax Algorithm, T-Rank Algorithm, Psoriasis, Gene Expression


1. Introduction

2. Model and Algorithm

2.1. Softmax Theoretical Model

The partial derivative of the loss function in Softmax regression is given by:
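For reference, a standard form of this loss and its gradient is sketched below. It assumes m training samples (x^{(i)}, y^{(i)}), k classes, one parameter vector \theta_j per class, and the usual cross-entropy loss without a regularization term. The notation is an assumption and may differ from the authors' own.

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k}
    1\{y^{(i)} = j\} \,
    \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}

\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
    x^{(i)} \left( 1\{y^{(i)} = j\} - p\left(y^{(i)} = j \mid x^{(i)}; \theta\right) \right),
\qquad
p\left(y^{(i)} = j \mid x^{(i)}; \theta\right)
    = \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}
```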

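The optimal parameters of Softmax regression are not unique. Once an optimal parameter set has been found, subtracting the same quantity from each of its parameter vectors yields the same loss function value. The solution is therefore not unique, which can be expressed mathematically as follows: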
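The invariance described above can be checked numerically. The following is a minimal sketch in plain Python, assuming a cross-entropy loss and one parameter vector per class. The toy data, the shift vector `psi`, and the function names are hypothetical and serve only as illustration.

```python
import math

def softmax_probs(thetas, x):
    """Class probabilities for one sample x, given one parameter vector per class."""
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    top = max(scores)  # subtract the largest score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(thetas, samples, labels):
    """Mean negative log-likelihood of the true labels."""
    losses = [-math.log(softmax_probs(thetas, x)[y]) for x, y in zip(samples, labels)]
    return sum(losses) / len(losses)

# Hypothetical toy data: 3 samples, 2 features, 3 classes.
samples = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
labels = [0, 1, 2]
thetas = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]

# Subtract the same vector psi from every class's parameter vector.
psi = [0.7, -1.2]
shifted = [[t - p for t, p in zip(theta, psi)] for theta in thetas]

loss_original = cross_entropy_loss(thetas, samples, labels)
loss_shifted = cross_entropy_loss(shifted, samples, labels)
print(loss_original, loss_shifted)
```

The two losses agree up to floating-point rounding because the shift changes every class score of a sample by the same amount, which cancels in the ratio that defines the class probabilities.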

2.2. T-Test Theoretical Model

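The t-test is a hypothesis test for comparing independent samples. Its null hypothesis is that the means of the two populations are equal, and its alternative hypothesis is that they are not. The t-test can therefore be used to determine whether the means of two populations differ significantly.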
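As an illustration of how such a test can be used to rank genes, the sketch below computes a two-sample t statistic per gene and sorts genes by its absolute value. It assumes the unequal-variance (Welch) form of the statistic, which may not be the variant the authors used. The gene names, expression values, and function names are hypothetical.

```python
import math
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Two-sample t statistic with unequal variances (Welch's form)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

def rank_by_t(expression, is_patient):
    """Return gene names sorted by |t|, largest first."""
    scores = {}
    for gene, values in expression.items():
        patients = [v for v, p in zip(values, is_patient) if p]
        controls = [v for v, p in zip(values, is_patient) if not p]
        scores[gene] = abs(t_statistic(patients, controls))
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical expression values: 3 controls followed by 3 patients.
is_patient = [False, False, False, True, True, True]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # strongly shifted in patients
    "gene_b": [2.0, 2.5, 1.8, 2.1, 2.4, 1.9],  # nearly identical groups
    "gene_c": [0.5, 0.7, 0.6, 1.0, 1.4, 0.9],  # moderately shifted
}

ranking = rank_by_t(expression, is_patient)
print(ranking)
```

Genes whose group means differ most relative to their within-group spread appear first in the ranking, so a fixed number of top-ranked genes can be kept as candidate features.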

2.3. Softmax Feature Extraction Algorithm Integrating T-rank

3. Experimental Materials

3.1. Experimental Data and Preprocessing

3.2. Feature Selection Method

Figure 1. Flow chart of the algorithm integrating T-rank and Softmax

3.3. Classifiers Used

3.4. Experimental Procedure and Evaluation Metrics

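(1) Classification accuracy over the whole sample set:

(2) Classification accuracy for the healthy group in the sample set:

(3) Classification accuracy for the patient group in the sample set: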
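The three metrics listed above can be computed as in the following minimal sketch. The label encoding (0 for healthy, 1 for patient), the function name, and the example predictions are assumptions. The result tables name the metrics Acc, BAcc1 and BAcc2, but their mapping onto the three quantities computed here is assumed rather than confirmed by the text.

```python
def accuracy_metrics(y_true, y_pred, normal_label=0, patient_label=1):
    """Overall accuracy plus per-group accuracy for healthy and patient samples."""
    def group_accuracy(label):
        # Restrict to samples whose true label is `label`.
        indices = [i for i, y in enumerate(y_true) if y == label]
        return sum(y_pred[i] == label for i in indices) / len(indices)

    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "overall": overall,
        "normal": group_accuracy(normal_label),
        "patient": group_accuracy(patient_label),
    }

# Hypothetical labels: 4 healthy samples (0) and 6 patient samples (1).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

metrics = accuracy_metrics(y_true, y_pred)
print(metrics)
```

Reporting the per-group values alongside the overall accuracy guards against a classifier that scores well overall only because one group dominates the sample set.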

4. Analysis of Experimental Results

4.1. Feature Extraction Results

4.2. Feature Validation and Result Analysis

Figure 2. Informative genes selected by the Softmax algorithm and T-rank

Table 1. The Acc, BAcc1 and BAcc2 of 8 selected genes in case 1

Table 2. The Acc, BAcc1 and BAcc2 of 8 selected genes in case 2

Table 3. The Acc, BAcc1 and BAcc2 of 8 selected genes in case 3

Table 4. The Acc, BAcc of 8 selected genes in comprehensive evaluation

5. Conclusion

The Research of Feature Extraction Algorithm by Integrating T-Rank and Softmax Methods[J]. Modeling and Simulation, 2016, 05(04): 123-130. http://dx.doi.org/10.12677/MOS.2016.54017
