Frontiers of Data and Computing, 2025, Vol. 7, Issue (3): 149-161.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.03.012

doi: 10.11871/jfdc.issn.2096-742X.2025.03.012

• Technology and Application •

A GPU-Accelerated Framework for Non-Small Cell Lung Cancer Subtype Identification

HAN Xinyin1,2, HAN Zidong3, JI Detao4, LI Chen1, LU Zhonghua1,*

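  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
    2. University of Chinese Academy of Sciences, Beijing 100190, China
    3. Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen, Guangdong 518107, China
    4. Zhuhai Campus, Beijing Institute of Technology, Zhuhai, Guangdong 519088, China
  • Received: 2024-12-05  Online: 2025-06-20  Published: 2025-06-25
  • Corresponding author: *LU Zhonghua (E-mail: zhlu@cnic.cn)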
  • About the authors:
    HAN Xinyin is currently a Ph.D. candidate at the Computer Network Information Center, Chinese Academy of Sciences, China. His research interests include computational genomics and the development of algorithms and software for tumor biomarker detection. In this paper, he is responsible for manuscript writing and scientific illustration.
    E-mail: hanxinyin@cnic.cn
    LU Zhonghua is currently a professor at the Computer Network Information Center, Chinese Academy of Sciences, China. Her current research interests include high-performance computing technology and its applications in computational finance. In this paper, she is responsible for the overall direction and framework of the paper.
    E-mail: zhlu@cnic.cn
  • Funding:
    Guanghe Fund, Category A (202407012934)

Abstract:

[Objective] This study builds on the Morphgene framework and optimizes its computational performance to address the inefficiency of processing large-scale pathology images and multi-omics data during non-small cell lung cancer (NSCLC) subtyping. [Methods] The framework's pathology image patch processing, feature extraction, and K-means clustering modules are comprehensively optimized using CPU thread pool scheduling, PyTorch tensor programming, GPU acceleration, and deep learning inference optimization techniques. Experiments on NSCLC samples from The Cancer Genome Atlas (TCGA) database validate both the effectiveness of these optimizations and the framework's subtyping performance. [Results] The optimized framework achieved a speedup of more than 67.81× in large-scale data processing while preserving subtyping accuracy. It identified multiple subtypes associated with patient prognosis, providing important support for personalized treatment and survival prediction. [Limitations] The current optimizations target specific file formats and patch sizes; further work is needed to handle smaller files or larger patches. [Conclusions] The GPU acceleration strategy significantly improves the computational efficiency of the Morphgene framework and provides strong support for NSCLC subtype classification in precision medicine. Future work will focus on improving its multi-modal data fusion and broader adaptability to extend its clinical applications.

Key words: GPU, non-small cell lung cancer (NSCLC), multi-omics data integration, digital pathology image analysis, precision oncology
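The abstract names CPU thread pool scheduling for pathology image patch processing without giving details. The following is only a minimal sketch under stated assumptions, not Morphgene's implementation: the image is held as a nested list, patches are non-overlapping squares (edge regions that do not fill a whole patch are dropped), and `patch_mean` is a hypothetical stand-in for real per-patch work such as reading a tile and extracting features.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Patch = Tuple[int, int]  # (row, col) of a patch's top-left corner


def patch_origins(height: int, width: int, size: int) -> List[Patch]:
    """Top-left corners of non-overlapping size x size patches that fit fully in the image."""
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]


def patch_mean(image: List[List[float]], origin: Patch, size: int) -> float:
    """Hypothetical per-patch work (assumption): mean intensity of one patch."""
    r0, c0 = origin
    total = sum(sum(row[c0:c0 + size]) for row in image[r0:r0 + size])
    return total / (size * size)


def process_patches(image: List[List[float]], size: int, workers: int = 4) -> List[float]:
    """Dispatch per-patch work to a thread pool; results come back in patch order."""
    origins = patch_origins(len(image), len(image[0]), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order even though patches run concurrently.
        return list(pool.map(lambda o: patch_mean(image, o, size), origins))


if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    print(process_patches(img, 2))
```

In CPython, a thread pool pays off mainly when the per-patch work is I/O-bound or runs in native code that releases the GIL (file reads, image decoding, tensor operations); the pure-Python arithmetic in `patch_mean` is only a placeholder and would not itself run faster with more threads.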