Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (1): 99-107.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.01.007

doi: 10.11871/jfdc.issn.2096-742X.2025.01.007

• Technology and Application •

Research on Intelligent Task Orchestration for High Performance Computing Environment

WU Can*, XIAO Haili, WANG Xiaoning, LU Shasha, HE Rong

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2024-06-25 Online: 2025-02-20 Published: 2025-02-21
  • Corresponding author: *WU Can (E-mail: wucan@sccas.cn)
  • About the author: WU Can is an engineer at the Department of High Performance Computing Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences, and a CCF member. Her research interests include high-performance computing and distributed systems.
    In this paper, she is responsible for drafting the manuscript and designing the task orchestration architecture.
  • Funding:
    National Key Research and Development Program of China (2023YFB3002302); Computer Network Information Center, Chinese Academy of Sciences project "Research on Intelligent Task Orchestration Architecture for Domestic Heterogeneous Supercomputers"

Abstract:

[Objective] A large-scale scientific computing task often consists of multiple computing jobs, or a job group, with a defined execution order and dependencies among the jobs. Users must wait for one job to finish before submitting the next. To reduce this waiting time, a new submission method is urgently needed that lets users submit multiple interdependent jobs at the same time. [Methods] This paper proposes an intelligent task orchestration architecture for high-performance computing environments that automatically parses the dependencies between jobs, intelligently orchestrates the order of job submission, monitors job status, and submits each job once the jobs it depends on have completed. [Results] In practical use, the intelligent task orchestration service effectively simplifies user operations. [Conclusions] The proposed architecture performs well in practical application.

Key words: high performance computing environment, job group, job dependency, intelligent task orchestration
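
The dependency-driven submission described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `orchestrate`, the callbacks `submit` and `wait_for_any`, and the job names are all hypothetical, and the sketch only assumes that a job group can be expressed as a mapping from each job to the jobs it depends on. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to release a job for submission only after every job it depends on has finished.

```python
from graphlib import TopologicalSorter


def orchestrate(dependencies, submit, wait_for_any):
    """Submit each job only after all jobs it depends on have completed.

    dependencies: dict mapping a job name to the set of job names it depends on.
    submit:       hypothetical callback that hands one job to the scheduler.
    wait_for_any: hypothetical callback that receives the running jobs and
                  returns the ones that have finished (the monitoring step).
    Returns the order in which jobs were submitted.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    running = set()
    order = []
    while sorter.is_active():
        # Jobs whose prerequisites are all done become ready exactly once.
        for job in sorter.get_ready():
            submit(job)
            running.add(job)
            order.append(job)
        # Ask the monitor which running jobs have finished.
        finished = list(wait_for_any(frozenset(running)))
        for job in finished:
            running.discard(job)
            sorter.done(job)  # may unlock jobs that depend on this one
    return order


if __name__ == "__main__":
    # Hypothetical job group: "post" needs "simA" and "simB"; both need "prep".
    deps = {
        "prep": set(),
        "simA": {"prep"},
        "simB": {"prep"},
        "post": {"simA", "simB"},
    }
    # Toy monitor: every running job is reported as finished immediately.
    orchestrate(deps,
                submit=lambda job: print("submit", job),
                wait_for_any=lambda running: running)
```

In this sketch the user hands over the whole job group once, and independent jobs such as `simA` and `simB` are released together. A real service would replace the toy callbacks with calls to the underlying scheduler and its job-status queries.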