Paper title
Towards future directions in data-integrative supervised prediction of human aging-related genes
Paper authors
Abstract
Identification of human genes involved in the aging process is critical because the incidence of many diseases increases with age. A state-of-the-art approach for this purpose infers a weighted dynamic aging-specific subnetwork by mapping gene expression (GE) levels at different ages onto the protein-protein interaction network (PPIN). Then, it analyzes this subnetwork in a supervised manner by training a predictive model to learn how network topologies of known aging- vs. non-aging-related genes change across ages. Finally, it uses the trained model to predict novel aging-related genes. However, the best current subnetwork resulting from this approach still yields suboptimal prediction accuracy. This could be because it was inferred using outdated GE and PPIN data. Here, we evaluate whether analyzing a weighted dynamic aging-specific subnetwork inferred from newer GE and PPIN data improves prediction accuracy over analyzing the best current subnetwork inferred from outdated data. Unexpectedly, we find that not to be the case. To understand this, we perform aging-related pathway and Gene Ontology (GO) term enrichment analyses. We find that the suboptimal prediction accuracy, regardless of which GE or PPIN data are used, may be caused by the current knowledge about which genes are aging-related being incomplete, or by the current methods for inferring or analyzing an aging-specific subnetwork being unable to capture all of the aging-related knowledge. These findings can potentially guide future directions towards improving supervised prediction of aging-related genes via -omics data integration.
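The general idea of mapping GE levels at different ages onto a PPIN, and then describing each gene by how its network position changes across ages, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the expression threshold, the choice of the minimum endpoint expression as the edge weight, the use of weighted degree as the only topological feature, and the function names `age_specific_snapshots` and `weighted_degree_profiles`. The toy genes A, B, and C and their expression values are invented.

```python
from collections import defaultdict


def age_specific_snapshots(ppin_edges, expression_by_age, threshold=1.0):
    """Build one weighted PPIN snapshot per age.

    An edge is kept at a given age only if both of its genes are expressed
    at or above `threshold` at that age (illustrative rule, an assumption).
    The edge weight is the smaller of the two expression levels (assumption).
    """
    snapshots = {}
    for age, expr in expression_by_age.items():
        weighted = {}
        for u, v in ppin_edges:
            eu, ev = expr.get(u, 0.0), expr.get(v, 0.0)
            if eu >= threshold and ev >= threshold:
                weighted[(u, v)] = min(eu, ev)
        snapshots[age] = weighted
    return snapshots


def weighted_degree_profiles(snapshots):
    """Return {gene: [weighted degree at each age, in increasing age order]}.

    Such a per-gene vector across ages is the kind of feature a supervised
    classifier could be trained on, given known aging-related labels.
    """
    ages = sorted(snapshots)
    profiles = defaultdict(lambda: [0.0] * len(ages))
    for i, age in enumerate(ages):
        for (u, v), w in snapshots[age].items():
            profiles[u][i] += w
            profiles[v][i] += w
    return dict(profiles)


# Toy data (invented): a three-gene PPIN and expression at two ages.
ppin = [("A", "B"), ("B", "C")]
expression = {
    20: {"A": 2.0, "B": 3.0, "C": 0.5},
    60: {"A": 0.2, "B": 3.0, "C": 4.0},
}

snapshots = age_specific_snapshots(ppin, expression)
profiles = weighted_degree_profiles(snapshots)
print(profiles)
```

In this toy example, the A-B interaction is present only at age 20 and the B-C interaction only at age 60, so each gene receives a different weighted-degree profile across the two ages. A supervised model would then be trained on such profiles for genes with known aging-related or non-aging-related labels; that step is omitted here.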
