Phenotype-Gene Association Analysis and Prediction Based on Double-Layer Coupled Network

YU Yong; GU Jie; ZHAO Na; LUO Yong-jun; KAN Shi-lin

doi:10.12178/1001-0548.2019133

Volume 49 Issue 3

May 2020


YU Yong, GU Jie, ZHAO Na, LUO Yong-jun, KAN Shi-lin. Phenotype-Gene Association Analysis and Prediction Based on Double-Layer Coupled Network[J]. Journal of University of Electronic Science and Technology of China, 2020, 49(3): 438-444. doi: 10.12178/1001-0548.2019133


School of Software, Key Laboratory in Software Engineering of Yunnan Province, Yunnan University, Kunming 650091

Received Date: 2019-06-03
Revised Date: 2019-11-07

Available Online: 2020-05-28

Publish Date: 2020-05-01

Abstract

With the completion of genome sequencing and the continued development of gene technology, the pathogenic genes of some diseases have gradually been identified. Although the causes of some diseases have been established through experiments, the causes of most diseases, especially gene-related ones, remain unknown. This paper takes mouse data, which share 85% homology with human data, as the research object. The disease phenotype data set, the pathogenic gene data set and the confirmed phenotype-gene association data set are assembled into a double-layer coupled network. The data are analyzed and mined with a meta-path random walk method, and unconfirmed phenotype-gene associations are predicted on the basis of the confirmed ones. The proposed algorithm achieves better prediction results than the other algorithms it is compared with.
Keywords: correlation, disease phenotype, double-layer coupled network, pathogenic gene

References

[1]	REBBECK T R, FRIEBEL T M, MITRA N, et al. Inheritance of deleterious mutations at both BRCA1 and BRCA2 in an international sample of 32,295 women[J]. Breast Cancer Research, 2016, 18(1): 112. doi: 10.1186/s13058-016-0768-3
[2]	CHABON J J, SIMMONS A, LOVEJOY A F, et al. Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients[J]. Nature Communications, 2016, 7(1): 11815-11815. doi: 10.1038/ncomms11815
[3]	LIU Y, ZENG X, HE Z, et al. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017, 14(4): 905-915. doi: 10.1109/TCBB.2016.2550432
[4]	ZOU S, ZHANG J, ZHANG Z, et al. A novel approach for predicting microbe-disease associations by bi-random walk on the heterogeneous network[J]. PLOS ONE, 2017, DOI: 10.1371/journal.pone.0184394.
[5]	SHEN X, ZHU H, JIANG X, et al. A novel approach based on bi-random walk to predict microbe-disease associations[M]//Intelligent Computing Methodologies.[S.l.]: Springer, 2018: 746-752.
[6]	TIAN Z, GUO M, WANG C, et al. Constructing an integrated gene similarity network for the identification of disease genes[J]. Journal of Biomedical Semantics, 2017, 8(1): 27-41.
[7]	CHEN L, YANG J, XING Z, et al. An integrated method for the identification of novel genes related to oral cancer[J]. PLOS ONE, 2017, DOI: 10.1371/journal.pone.0175185.
[8]	LU C, YANG M, LUO F, et al. Prediction of lncRNA-disease associations based on inductive matrix completion[J]. Bioinformatics, 2018, 34(19): 3357-3364. doi: 10.1093/bioinformatics/bty327
[9]	SHEN Z, JIANG Z, BAO W. CMFHMDA: Collaborative matrix factorization for human microbe-disease association prediction[C]//International Conference on Intelligent Computing. [S.l.]: Springer, 2017: 261-269.
[10]	PU Jian-yu, CHEN Lei, SHAO Kai. Exploiting Katz method to boost inductive matrix completion for predicting gene-disease associations[J]. Journal of Frontiers of Computer Science and Technology, 2019(7): 1154-1164. (in Chinese) doi: 10.3778/j.issn.1673-9418.1806013
[11]	CHEN X, YAN G. Novel human lncRNA-disease association inference based on lncRNA expression profiles[J]. Bioinformatics, 2013, 29(20): 2617-2624. doi: 10.1093/bioinformatics/btt426
[12]	CHEN X, YAN C C, ZHANG X, et al. HGIMDA: Heterogeneous graph inference for miRNA-disease association prediction[J]. Oncotarget, 2016, 7(40): 65257-65269.
[13]	HUANG Z, CHEN X, ZHU Z, et al. PBHMDA: Path-based human microbe-disease association prediction[J]. Frontiers in Microbiology, 2017, 8(2): 233.
[14]	LI X, LIN Y, GU C, et al. SRMDAP: SimRank and density-based clustering recommender model for mirna-disease association prediction[J]. BioMed Research International, 2018, DOI: 10.1155/2018/5747489.
[15]	CHEN X, YAN C C, ZHANG X A, et al. WBSMDA: Within and between score for mirna-disease association prediction[J]. Scientific Reports, 2016, 6(1): 21106. doi: 10.1038/srep21106
[16]	WANG F, HUANG Z A, CHEN X, et al. LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction[J]. Scientific Reports, 2017, 7(1): 7601. doi: 10.1038/s41598-017-08127-2
[17]	ZHENG Jing-long. Method for prediction of LncRNA-disease associations based on link prediction[D]. Xi'an: Xidian University, 2015. (in Chinese)
[18]	GUO Mao-zu, WANG Shi-ming, LIU Xiao-yan, et al. Algorithm for predicting the associations between MiRNAs and diseases[J]. Journal of Software, 2017, 28(11): 3094-3102. (in Chinese)
[19]	TANG Ming, CUI Ai-xiang, GONG Kai. On spreading dynamics on coupled networks[J]. Complex Systems and Complexity Science, 2011(2): 87-91. (in Chinese) doi: 10.3969/j.issn.1672-3813.2011.02.012
[20]	SUN Y, HAN J, YAN X, et al. PathSim: Meta path-based Top-K similarity search in heterogeneous information networks[J]. Very Large Data Bases, 2011, 4(11): 992-1003.
[21]	LAO N, COHEN W W. Relational retrieval using a combination of path-constrained random walks[J]. European Conference on Machine Learning, 2010, 81(1): 53-67. doi: 10.1007/s10994-010-5205-8
[22]	LI Min, WANG Xiao-tong, LUO Hui-min, et al. Progress on random walk and its application in network biology[J]. Acta Electronica Sinica, 2018, 46(8): 2035-2048. (in Chinese) doi: 10.3969/j.issn.0372-2112.2018.08.033
[23]	LI Y, PATRA J C. Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network[J]. Bioinformatics, 2010, 26(9): 1219-1224. doi: 10.1093/bioinformatics/btq108
[24]	METZ C E. Basic principles of ROC analysis[J]. Seminars in Nuclear Medicine, 1978, 8(4): 283-298. doi: 10.1016/S0001-2998(78)80014-2
[25]	KOHLER S, BAUER S, HORN D, et al. Walking the interactome for prioritization of candidate disease genes[J]. American Journal of Human Genetics, 2008, 82(4): 949-958. doi: 10.1016/j.ajhg.2008.02.013
[26]	LI A, GE M, ZHANG Y, et al. Predicting long noncoding RNA and protein interactions using heterogeneous network model[J]. BioMed Research International, 2015, DOI: 10.1155/2015/671950.
[27]	VANUNU O, MAGGER O, RUPPIN E, et al. Associating genes and protein complexes with disease via network propagation[J]. PLOS Computational Biology, 2010, DOI: 10.1371/journal.pcbi.1000641.


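The rapid development of third-generation sequencing technology has accelerated the accumulation of information on the interactions among the components of living systems. The steady growth of gene data and phenotype data provides a large body of useful data for understanding the relationship between diseases and their pathogenic genes. With biological data emerging in large volumes, analyzing and mining these data with computational techniques and models has quickened the pace of biological research. It allows the relationship between disease phenotypes and pathogenic genes to be mined in depth, and it facilitates the understanding of disease mechanisms, clinical diagnosis, and disease prevention and treatment.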

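After decades of effort, the pathogenic genes of some diseases have been identified. For example, BRCA1 and BRCA2 play an important role in the development of breast cancer^[1], and EGFR plays an important role in the development of lung cancer^[2]. If the pathogenic genes of more diseases were known, genetic testing could support prevention before onset, appropriate treatment could be given during the course of the disease, and the pathogenic mechanism could later be applied to drug design, improving the ability to control and cure diseases. Mining the relationship between disease phenotypes and pathogenic genes helps clarify disease mechanisms, so that the cause of a disease can be targeted directly at diagnosis and subsequent treatment can be directed more precisely.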

1. Current Research on Disease-Gene Prediction Algorithms

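At present, mining the associations between disease phenotypes and pathogenic genes remains a highly challenging problem. A high-accuracy method for predicting pathogenic genes would be of great significance to biologists, clinicians, geneticists and other practitioners. It would not only raise the accuracy of pathogenic gene discovery, shorten the discovery cycle and save considerable manpower and resources, but also lay an important foundation for the future development of biomedicine and of gene-based diagnosis and therapy.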

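With the rapid development of computing and biotechnology, large amounts of bioinformatics data have been produced, knowledge about diseases and genes has become far more available, and researchers have accordingly proposed a series of computational methods for disease-gene prediction. Among them, random walk is a common approach to predicting disease-gene associations, with variants including random walk with restart and bi-random walk. Ref. [3] proposed a random walk with restart on a double-layer coupled network to infer potential miRNA-disease associations. Ref. [4] developed the BiRWHMDA computational model, which predicts potential microbe-disease associations by a bi-random walk on a double-layer coupled network. Ref. [5] proposed predicting microbe-disease associations with a multi-path bi-random walk on a double-layer coupled network. Ref. [6] combined a phenotype similarity network, a gene similarity network and a phenotype-gene association network into a phenotype-gene double-layer coupled network and applied random walk with restart on it, yielding a new method for predicting pathogenic genes. Ref. [7] took two widely used algorithms, random walk with restart (RWR) and shortest path (SP), constructed two parameterized computational methods from them (an RWR-based method and an SP-based method), and on that basis built a new integrated method for disease gene identification.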
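The random walk with restart that underlies several of the methods above can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: the five-node toy network (two phenotype nodes, three gene nodes), the restart probability of 0.7, and all function and variable names are assumptions made for the example.

```python
def column_normalize(matrix):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [[matrix[r][c] / sums[c] if sums[c] else 0.0 for c in range(n_cols)]
            for r in range(n_rows)]


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol."""
    n = len(adjacency)
    w = column_normalize(adjacency)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)  # the walker restarts uniformly over the seed nodes
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: nodes 0-1 are phenotypes, nodes 2-4 are genes.
# Edges: P0-P1, P0-G(2), P1-G(3), G(2)-G(3), G(3)-G(4).
adjacency = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds=[0])
gene_ranking = sorted([2, 3, 4], key=lambda g: scores[g], reverse=True)
```

Here `scores` is the stationary visiting probability of each node when the walker starts from phenotype 0, and `gene_ranking` orders the gene nodes as candidates for that phenotype. A smaller `restart` lets the walker travel further from the seed before returning.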

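Matrix-based prediction of disease-gene relationships is another effective approach. Ref. [8] proposed a method based on inductive matrix completion for predicting potential lncRNA-disease associations (predict lncRNA-disease associations from known data using IMC, SIMCLDA). Ref. [9] developed a model that predicts human microbe-disease associations by collaborative matrix factorization (collaborative matrix factorization for human microbe-disease association, CMFHMDA). Ref. [10] proposed a Katz-enhanced inductive matrix completion model for gene-disease association prediction that works in two steps: a pre-estimation based on the Katz method, followed by a refined estimation based on inductive matrix completion.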

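Gaussian interaction has also been applied to prediction. Ref. [11] used the Gaussian interaction profile kernel similarity measure to determine microbe similarity and disease similarity. Ref. [12] built a computational model of double-layer coupled network inference for miRNA-disease association prediction, which integrates miRNA functional similarity, disease semantic similarity and Gaussian interaction to reveal potential miRNA-disease associations.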
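As an illustration of the Gaussian interaction profile kernel similarity, the sketch below assumes the commonly used form $K(i,j)=\exp(-\gamma\lVert a_i-a_j\rVert^2)$, where $a_i$ is the row of the association matrix for entity $i$ and the bandwidth $\gamma$ is a base value divided by the mean squared norm of the profiles. The cited works may differ in detail; the function name, the parameter `gamma_prime` and the toy matrix are hypothetical.

```python
import math


def gip_kernel(association, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 association matrix."""
    n = len(association)
    sq_norms = [sum(v * v for v in row) for row in association]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    # Normalizing the bandwidth by the mean squared profile norm (assumed convention).
    gamma = gamma_prime / mean_sq_norm
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(association[i], association[j])))
             for j in range(n)]
            for i in range(n)]


# Hypothetical profiles: three diseases (rows) against three genes (columns).
profiles = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
similarity = gip_kernel(profiles)
```

Rows with identical interaction profiles get similarity 1, and the similarity decays as profiles diverge. The bandwidth normalization is the step that makes the kernel values comparable across data sets of different density.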

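Using paths as the prediction score, Ref. [13] introduced PBHMDA (path-based human microbe-disease association), which evaluates all paths between a microbe and a disease to obtain a prediction score for each candidate microbe-disease pair.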

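Researchers have also proposed other methods for predicting disease-gene relationships. Ref. [14] proposed a miRNA-disease association prediction method based on SimRank and a density-based clustering recommender model (based on the SimRank and density-based clustering recommender model for miRNA-disease associations prediction, SRMDAP). Ref. [15] predicted miRNAs associated with various complex diseases using a within-and-between-score model (within and between score for MiRNA-disease association prediction, WBSMDA). Ref. [16] built a prediction model with a Laplacian regularized least squares classifier (Laplacian regularized least squares for human microbe-disease association, LRLSHMDA). Ref. [17] introduced the idea of link prediction into long non-coding RNA-disease association prediction. Ref. [18] proposed a bipartite network projection algorithm based on density clustering (bipartite network projection based on density clustering to predict miRNA-disease associations, BNPDCMDA) to predict miRNA-disease associations.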

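Prediction methods built on random walk can widen the range of candidate genes and avoid missing nodes that have low connectivity or lie at the periphery of the network. Especially for polygenic diseases, they can greatly improve the performance of candidate pathogenic gene prediction. In matrix-based prediction, data sparsity strongly affects the results, and the PU (positive-unlabeled) problem is another issue that must be faced; adding the Katz method only partly alleviates these effects. Gaussian-interaction-based prediction takes the interaction information of diseases or genes as feature vectors, introduces a Gaussian kernel function, computes the similarity among diseases or among genes, and then predicts disease-gene associations; however, once the Gaussian interaction similarity parameter is normalized, the Gaussian kernel interaction similarity values of genes or diseases no longer depend on the data set. Path-based prediction exploits the topology among biological information nodes and predicts on that basis. The remaining algorithms predict associations using ideas from machine learning. Supervised machine learning algorithms, however, must treat genes without a known disease association as unrelated, yet the number of genes proven to be disease-related is small, and few experiments can demonstrate that an association does not exist.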

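A comparison of these algorithms shows that random-walk-based methods have certain advantages over matrix-based and clustering-based methods. Because disease phenotype nodes and disease gene nodes are of different types, this paper builds a double-layer coupled network from disease phenotype and disease gene data, and proposes an algorithm that performs a meta-path random walk on the phenotype-gene double-layer coupled network to predict and analyze associations.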

6. Conclusion

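The continuing growth of gene and phenotype data provides a large body of useful data for understanding the relationship between diseases and pathogenic genes, and makes it easier to find relationships between disease phenotypes and pathogenic genes through data analysis and mining. This paper therefore set out to design an algorithm that finds more associations between phenotype nodes and gene nodes. It adds the concept of meta-paths to the classical random walk, making full use of prior knowledge and of the biological relationships contained in the network to predict phenotype-gene associations. The experimental results show that the accuracy of the proposed algorithm is higher than that of RWR, LPIHN, PRINCE and other algorithms, and that it achieves good prediction results.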

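Future work could proceed in the following directions: 1) Integrating more reliable biological network data. Gaps in biological knowledge and false positives in experimental data both introduce errors into the predictions; integrating other useful biological data would improve the reliability of biological network data. 2) Integrating multiple biological network data sources, for example combining sequence similarity, functional annotation, microarray expression, protein domain and pathway membership databases into one complete data set for prediction. 3) Changing the topological features of the biological network. Topological features such as betweenness centrality, closeness centrality and clustering coefficient could be suitably modified before association prediction is carried out.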


No.	Meta-path
$MP_1$	$P \to P \to G$
$MP_2$	$P \to G \to G$
$MP_3$	$P \to P \to G \to G$
$MP_4$	$P \to G \to P \to G$
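The four meta-paths in the table can be read as constraints on which node types a walker may visit in sequence, taking $P$ as a phenotype node and $G$ as a gene node. The sketch below shows one way such a path-constrained walk could be scored: each step uses a row-normalized transition matrix, and the four path probabilities are averaged. It is an assumption-laden illustration, not the algorithm of this paper; the equal weights, the toy matrices and all names are hypothetical.

```python
def row_normalize(matrix):
    """Turn each row into a probability distribution; all-zero rows stay zero."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in matrix]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def meta_path_scores(pp, pg, gg, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of walk probabilities along MP1..MP4, one score per (phenotype, gene)."""
    t_pp, t_pg, t_gg = row_normalize(pp), row_normalize(pg), row_normalize(gg)
    t_gp = row_normalize(transpose(pg))
    paths = [
        matmul(t_pp, t_pg),                 # MP1: P -> P -> G
        matmul(t_pg, t_gg),                 # MP2: P -> G -> G
        matmul(matmul(t_pp, t_pg), t_gg),   # MP3: P -> P -> G -> G
        matmul(matmul(t_pg, t_gp), t_pg),   # MP4: P -> G -> P -> G
    ]
    n_p, n_g = len(pg), len(pg[0])
    return [[sum(w * path[i][j] for w, path in zip(weights, paths)) for j in range(n_g)]
            for i in range(n_p)]


# Hypothetical toy data: 2 phenotypes, 3 genes.
pp = [[0, 1], [1, 0]]                    # phenotype-phenotype layer
pg = [[1, 0, 0], [0, 1, 0]]              # confirmed phenotype-gene associations
gg = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # gene-gene layer
mp_scores = meta_path_scores(pp, pg, gg)
```

In this toy example, phenotype 0 has no confirmed link to genes 1 or 2, but gene 1 is reached along more meta-paths than gene 2 and therefore receives the higher score. In practice the path weights would be tuned or learned rather than fixed at 0.25.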
