High-dimensional omics data contains intrinsic biomedical information that is
crucial for personalised medicine. Nevertheless, it is challenging to capture
this information from genome-wide data because of the large number of
molecular features and the small number of available samples, a problem known
in machine learning as "the curse of dimensionality". To tackle this problem
and pave the way for machine-learning-aided precision medicine, we proposed a
unified multi-task
deep learning framework called OmiEmbed to capture a holistic and relatively
precise profile of phenotype from high-dimensional omics data. The deep
embedding module of OmiEmbed learnt an omics embedding that mapped multiple
omics data types into a latent space with lower dimensionality. Based on the
new representation of multi-omics data, different downstream networks of
OmiEmbed were trained together with the multi-task strategy to predict the
comprehensive phenotype profile of each sample. We trained the model on two
publicly available omics datasets to evaluate the performance of OmiEmbed. The
OmiEmbed model achieved promising results for multiple downstream tasks
including dimensionality reduction, tumour type classification, multi-omics
integration, demographic and clinical feature reconstruction, and survival
prediction. Instead of training and applying different downstream networks
separately, the multi-task strategy combined them and conducted multiple
tasks simultaneously and efficiently. The model achieved better performance
with the multi-task strategy than when each downstream network was trained
individually. OmiEmbed is a powerful tool for accurately capturing
comprehensive phenotypic information from high-dimensional omics data and has
great potential to facilitate more accurate and personalised clinical
decision making.
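
The shared-embedding, multi-task idea summarised above can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the two omics types (gene expression and DNA methylation), the two tasks (tumour type and a normalised age), the task weights, and all function names are assumptions chosen purely for illustration. It shows only a forward pass in which one latent vector feeds several downstream heads and their losses are combined into a single weighted objective.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def embed(omics_vectors, encoder):
    """Concatenate the omics types and project them into the latent space."""
    x = [v for vec in omics_vectors for v in vec]
    return relu(linear(x, *encoder))

def multi_task_loss(latent, heads, targets, task_weights):
    """Weighted sum of a classification loss and a regression loss."""
    probs = softmax(linear(latent, *heads["tumour_type"]))
    cls_loss = -math.log(probs[targets["tumour_type"]])  # cross-entropy
    age_pred = linear(latent, *heads["age"])[0]
    reg_loss = (age_pred - targets["age"]) ** 2           # squared error
    total = task_weights["tumour_type"] * cls_loss + task_weights["age"] * reg_loss
    return total, cls_loss, reg_loss

rng = random.Random(0)
expression = [rng.random() for _ in range(20)]   # hypothetical expression features
methylation = [rng.random() for _ in range(30)]  # hypothetical methylation features

LATENT_DIM = 8  # assumed latent size, far smaller than real omics inputs
encoder = init_layer(rng, 50, LATENT_DIM)
heads = {
    "tumour_type": init_layer(rng, LATENT_DIM, 3),  # 3 hypothetical classes
    "age": init_layer(rng, LATENT_DIM, 1),          # normalised age in [0, 1]
}
targets = {"tumour_type": 1, "age": 0.6}
task_weights = {"tumour_type": 1.0, "age": 0.5}

latent = embed([expression, methylation], encoder)
total, cls_loss, reg_loss = multi_task_loss(latent, heads, targets, task_weights)
print(f"latent dims: {len(latent)}, total loss: {total:.4f}")
```

In a trainable version, the gradient of the combined loss would update the shared encoder and every head together, which is what allows the downstream tasks to inform a common embedding rather than being fitted in isolation.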