Data Mining and Early-Warning Model for the Sudden Deterioration of Complex Disease

Vol.07 No.01(2018), Article ID:23549,18 pages
10.12677/AAM.2018.71008

Data Mining and Early-Warning Model for the Sudden Deterioration of Complex Disease

Xin Lv, Rui Liu

School of Mathematics, South China University of Technology, Guangzhou Guangdong

Received: Dec. 19th, 2017; accepted: Jan. 18th, 2018; published: Jan. 25th, 2018

ABSTRACT

This paper applies the dynamic network biomarker (DNB) method, based on either multiple samples or a single sample, to mine data and detect early-warning signals for prostate cancer and liver cancer. Detecting the critical point and the signals of sudden deterioration is vital for diagnosing the disease more accurately and for proposing an appropriate therapeutic plan in time. Using time-course high-throughput biomolecular data, the multi-sample DNB method identifies the 6th time point and the 2nd time point as the critical points of the prostate cancer and liver cancer samples respectively, which agrees with the experimental data. In addition, 264 and 139 dynamic network biomarkers, including transcription factors, are found for the two diseases respectively. In practice, data are often insufficient and sample sizes are small; in that case the single-sample DNB method can be used to detect the early-warning signals of sudden deterioration. The single-sample method likewise places the critical points of the prostate cancer and liver cancer samples at the 6th and 2nd time points respectively. Finally, gene function analysis shows that the dynamic network biomarkers found by either the multi-sample or the single-sample method reflect the early-warning signals of sudden deterioration well.

Keywords: High-Throughput Biomolecular Data, Network Analysis and Computation, Dynamic Network Biomarkers, Single-Sample Analysis

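1. Introduction

2. Methods

2.1. Multi-Sample Dynamic Network Biomarker Method

1) The mean standard deviation ( $SD$ ) of the elements in the dynamic network biomarker increases sharply;

2) The mean absolute Pearson correlation coefficient between elements within the dynamic network biomarker ( $PC{C}_{in}$ ) increases sharply;

3) The mean absolute Pearson correlation coefficient between elements in the dynamic network biomarker and elements outside it ( $PC{C}_{out}$ ) decreases.

$CI=\frac{SD\times PC{C}_{in}}{PC{C}_{out}+\epsilon }$ (where $\epsilon$ is a sufficiently small positive number)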
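The composite index can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pearson` and `composite_index`, the dictionary layout of `expr`, the default `eps`, and the use of the sample standard deviation are all hypothetical choices made for the example.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def composite_index(expr, dnb_genes, eps=1e-6):
    """CI = SD * PCC_in / (PCC_out + eps) at a single time point.

    expr maps a gene name to its expression values across samples
    (assumed layout). dnb_genes is the candidate DNB module and needs
    at least two genes; every other gene in expr counts as "outside".
    """
    inside = [g for g in expr if g in dnb_genes]
    outside = [g for g in expr if g not in dnb_genes]
    # Mean standard deviation of the module members.
    sd = mean(stdev(expr[g]) for g in inside)
    # Mean |PCC| over all pairs inside the module.
    pcc_in = mean(abs(pearson(expr[a], expr[b]))
                  for a, b in combinations(inside, 2))
    # Mean |PCC| over all (inside, outside) pairs.
    pcc_out = mean(abs(pearson(expr[a], expr[b]))
                   for a, b in product(inside, outside))
    return sd * pcc_in / (pcc_out + eps)
```

Evaluating `composite_index` at every time point and looking for a sharp peak is one way to read off a candidate critical point; the small `eps` only guards against division by zero when the module is uncorrelated with the rest of the network.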

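2.2. Single-Sample Dynamic Network Biomarker Method

1) The mean absolute expression deviation ( $\Delta ED$ ) of the elements in the dynamic network biomarker increases sharply;

2) The mean absolute difference in the Pearson correlation coefficient between elements within the dynamic network biomarker ( $\Delta PC{C}_{in}$ ) increases sharply;

3) The mean absolute difference in the Pearson correlation coefficient between elements in the dynamic network biomarker and elements outside it ( $\Delta PC{C}_{out}$ ) increases sharply.

( $\epsilon$ is a sufficiently small positive number)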

3. Main Results

3.1. Results of the Multi-Sample Dynamic Network Biomarker Method

Figure 1. Line chart: early-warning signals based on multiple and single samples

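3.2. Results of the Single-Sample Dynamic Network Biomarker Method

Figure 2. Dynamic changes in the network, including the DNB and the overturn network, during the progression of prostate cancer

The values of $\Delta CI$ are averaged separately to determine the critical point. For the prostate cancer data, the single-sample dynamic network biomarker method detects that all 4 samples undergo the sudden transition at hour 24 (Figure 1(b)), with 262, 327, 168, and 180 biomarkers respectively, including upstream transcription factors. Their pairwise intersections contain 103, 38, 37, 37, 84, and 45 genes respectively, and functional analyses such as survival analysis are carried out on these 183 genes. For the liver cancer data, the single-sample dynamic network biomarker method finds that all 5 samples undergo the sudden transition on day 3 (Figure 1(d)), with 190, 138, 186, 131, and 170 biomarkers respectively. Their pairwise intersections contain 6, 9, 14, 19, 15, 9, 5, 28, and 11 genes respectively, and functional analyses such as survival analysis are carried out on these 83 genes. See Appendix 4 for the data processing procedure.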
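The pairwise-intersection step described above can be sketched as follows. The sketch assumes that the gene set passed on to functional analysis is the union of all pairwise intersections, which the text suggests but does not state outright; the function name `pairwise_overlap` and the sample and gene labels in the usage note are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(biomarkers):
    """Intersect the biomarker sets of every pair of samples.

    biomarkers maps a sample id to the set of gene names flagged for
    that sample. Returns the per-pair intersections and their union.
    """
    overlaps = {
        (a, b): biomarkers[a] & biomarkers[b]
        for a, b in combinations(sorted(biomarkers), 2)
    }
    # Genes shared by at least two samples (assumed selection rule).
    shared = set().union(*overlaps.values())
    return overlaps, shared
```

For example, with `{"s1": {"A", "B", "C"}, "s2": {"B", "C", "D"}, "s3": {"C", "E"}}` the three pairwise intersections are `{"B", "C"}`, `{"C"}` and `{"C"}`, and the shared set is `{"B", "C"}`. Four samples give six pairs, matching the six intersection sizes reported for prostate cancer.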

3.3. Functional Analysis

3.4. Survival Analysis

Table 1. Enrichment analysis by KEGG

Figure 3. Survival analysis based on multi-samples of prostate cancer

Figure 4. Survival analysis based on multi-samples of liver cancer

Figure 5. Survival analysis based on single-samples of prostate cancer

Figure 6. Survival analysis based on single-samples of liver cancer

4. Discussion

Data Mining and Early-Warning Model for the Sudden Deterioration of Complex Disease[J]. Advances in Applied Mathematics, 2018, 07(01): 56-73. http://dx.doi.org/10.12677/AAM.2018.71008


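1. Multi-Sample Dynamic Network Biomarker Method

Step 1: Standardization

$N=\frac{{D}_{\text{normal}}-\text{mean}\left({D}_{\text{normal}}\right)}{SD\left({D}_{\text{normal}}\right)}$ (1)

$D=\frac{{D}_{\text{disease}}-\text{mean}\left({D}_{\text{normal}}\right)}{SD\left({D}_{\text{normal}}\right)}$ (2)

${D}_{\text{normal}}$ denotes the normal-sample data and ${D}_{\text{disease}}$ the disease-sample data; $N$ and $D$ denote the standardized normal-sample and disease-sample data respectively; $\text{mean}\left({D}_{\text{normal}}\right)$ and $SD\left({D}_{\text{normal}}\right)$ denote the mean and the standard deviation of the normal-sample data.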
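Equations (1) and (2) standardize both groups against the normal samples. A minimal per-gene sketch follows; the function name `standardize` is hypothetical, and the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def standardize(normal, disease):
    """Standardize one gene's values against the normal samples.

    normal and disease are lists of expression values for a single gene.
    Both groups are centred and scaled with the mean and standard
    deviation of the normal group, as in Equations (1) and (2).
    """
    mu = mean(normal)
    sigma = stdev(normal)  # sample SD (assumed); needs >= 2 normal samples
    n_std = [(v - mu) / sigma for v in normal]
    d_std = [(v - mu) / sigma for v in disease]
    return n_std, d_std
```

Because the disease samples are scaled by the normal-group statistics rather than their own, a standardized disease value reads directly as the number of normal-group standard deviations away from the normal mean.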

Step 2: t-test

Step 3: Compute the false discovery rate (FDR)
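This step names the FDR without saying which procedure is used. As a sketch only, the block below assumes the Benjamini-Hochberg adjustment applied to the p-values from the t-test; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumed procedure: the source only states that an FDR is computed.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value falls below a chosen threshold would then be kept for the fold-change filter in the next step; the threshold itself is not given in this section.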

Step 4: Differential expression by fold change

Step 5: Clustering

Step 6: Significance analysis

$CI=SD\frac{PC{C}_{in}}{PC{C}_{out}+\epsilon }$ (3)

2. Single-Sample Dynamic Network Biomarker Method

Step 1: Compute the expression deviation $\Delta ED$ to screen genes

$\Delta ED=|g-\overline{g}|$ (4)
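Equation (4) can be sketched as below. The sketch assumes that $\overline{g}$ is the mean expression of the gene over a set of reference samples and that screening keeps the genes with the largest deviation; neither the reference set nor the cut-off is specified here, so `top_k` and both function names are hypothetical.

```python
from statistics import mean


def expression_deviation(reference, sample_value):
    """Delta ED = |g - g_bar| for one gene, per Equation (4)."""
    return abs(sample_value - mean(reference))


def screen_genes(reference_expr, sample_expr, top_k):
    """Keep the top_k genes with the largest Delta ED (assumed rule).

    reference_expr maps gene -> list of reference values;
    sample_expr maps gene -> the single sample's value.
    """
    scores = {
        g: expression_deviation(reference_expr[g], sample_expr[g])
        for g in sample_expr
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```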

Step 2: Clustering

Step 3: Compute $\Delta PCC$

$\Delta PC{C}_{in}\left({g}_{1},{g}_{2}\right)=|PC{C}_{n+1}\left({g}_{1},{g}_{2}\right)-PC{C}_{n}\left({g}_{1},{g}_{2}\right)|$ (5)

$Z\left({g}_{1},{g}_{2}\right)=\frac{\Delta PCC\left({g}_{1},{g}_{2}\right)}{\left(1-PC{C}_{n}^{2}\left({g}_{1},{g}_{2}\right)\right)/\left(n-1\right)}$ (6)
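Equations (5) and (6) can be sketched for one gene pair as follows. The sketch reads the subscripts as an assumption: $PC{C}_{n}$ is computed over $n$ reference samples and $PC{C}_{n+1}$ after appending the single sample under test. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def delta_pcc_and_z(ref_g1, ref_g2, new_g1, new_g2):
    """Return (Delta PCC, Z) for one gene pair, per Equations (5)-(6).

    ref_g1, ref_g2: expression of the two genes in the n reference samples.
    new_g1, new_g2: their expression in the single sample being tested.
    """
    n = len(ref_g1)
    pcc_n = pearson(ref_g1, ref_g2)
    pcc_n1 = pearson(list(ref_g1) + [new_g1], list(ref_g2) + [new_g2])
    delta = abs(pcc_n1 - pcc_n)
    # Undefined when |pcc_n| == 1 (zero denominator); callers should guard.
    z = delta / ((1 - pcc_n ** 2) / (n - 1))
    return delta, z
```

A large $Z$ indicates that adding the single sample changes the correlation of the pair by much more than the scale set by $\left(1-PC{C}_{n}^{2}\right)/\left(n-1\right)$.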

Step 4: Compute the index $\Delta CI$

(7)

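3. Application of the Multi-Sample Dynamic Network Biomarker Method

3.1. Prostate Cancer

Step 1: Data preprocessing

Table A1. Descriptions of the two datasets

Step 2: Screen significant genes

Step 3: Clustering

Step 4: Compute the composite index $CI$

3.2. Liver Cancer

Step 1: Data preprocessing

Step 2: Screen significant genes

Step 3: Clustering

Step 4: Compute the index $CI$

4. Application of the Single-Sample Dynamic Network Biomarker Method

4.1. Prostate Cancer

4.2. Liver Cancer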

Figure A1. Survival analysis based on multi-samples of prostate cancer

Figure A2. Survival analysis based on multi-samples of liver cancer

Figure A3. Survival analysis based on single-samples of prostate cancer

Figure A4. Survival analysis based on single-samples of liver cancer