Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (5): 102-112.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.05.008

doi: 10.11871/jfdc.issn.2096-742X.2025.05.008

• Special Issue: New Domestic Computing Power Empowers the Development of Scientific Computing Applications •

E+A Galaxy Search Based on a Domestic Heterogeneous Acceleration Platform: Parallelization Strategy and Implementation

ZHENG Aiyu, MENG Xiangyu, ZHANG Boyu, ZHOU Lichan, YANG Haifeng*

  1. Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024, China
  • Received: 2025-02-25 Online: 2025-10-20 Published: 2025-10-23
  • Contact: YANG Haifeng E-mail: zheng_aiyu@tyust.edu.cn; hfyang@tyust.edu.cn

Abstract:

[Objective] E+A galaxies are rare, short-lived post-starburst galaxies, and observed samples of them are critical for understanding galactic evolution and cosmic history. Although modern sky surveys have amassed vast astronomical datasets, detecting these rare objects efficiently remains a key challenge in contemporary astrophysical research. [Methods] This study proposes PEAS (Parallel E+A Searcher), a pipeline for accelerated E+A galaxy detection implemented on a domestic heterogeneous computing platform. The methodology has three phases. First, we analyze the dependencies in the serial search algorithm and decompose it into three parallelizable task operators. Second, building on the software stack of the target platform, we design a hierarchical distributed architecture for PEAS. Finally, we implement two parallelization schemes: a multi-core task-parallel approach using OpenMP and a multi-node data-parallel strategy using MPI. [Results] Validation on the LAMOST DR2 dataset confirms the accuracy of PEAS. Performance benchmarks on 260,000 galaxies from LAMOST DR10 show that, relative to a single CPU core, PEAS achieves a speedup of 22.30 on a 32-core system and of up to 107.06 on a single accelerator card. For performance scalability, four accelerator cards achieve a speedup of 1.89 over one card, and four nodes achieve a speedup of 1.83 over one node. For data scalability, the speedup is 6.93, approaching the data ratio of 8.6.

Key words: domestic heterogeneous acceleration platform, parallel computing, E+A galaxy, rare astronomical target search
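
The scaling figures quoted in the abstract can be put on a common footing by dividing each reported speedup by the factor by which resources (or data) grew. The following is a minimal sketch of that arithmetic, not part of PEAS itself; the function name `parallel_efficiency` and the dictionary `REPORTED` are illustrative assumptions, and only the numeric values come from the abstract. The abstract does not define the data-scalability metric beyond the two numbers given, so that entry is simply their ratio.

```python
def parallel_efficiency(speedup: float, ratio: float) -> float:
    """Return speedup divided by the resource (or data) ratio.

    A value of 1.0 corresponds to ideal linear scaling.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return speedup / ratio


# (speedup, ratio) pairs as quoted in the abstract; labels are illustrative.
REPORTED = {
    "32 cores vs 1 core": (22.30, 32),
    "4 accelerator cards vs 1 card": (1.89, 4),
    "4 nodes vs 1 node": (1.83, 4),
    "data scalability (speedup vs data ratio)": (6.93, 8.6),
}


def efficiencies() -> dict:
    """Compute the efficiency for every reported configuration."""
    return {
        label: parallel_efficiency(speedup, ratio)
        for label, (speedup, ratio) in REPORTED.items()
    }


if __name__ == "__main__":
    for label, eff in efficiencies().items():
        print(f"{label}: {eff:.1%}")
```

Read this way, the 32-core run retains roughly 70% of ideal scaling, while the multi-card and multi-node runs fall below 50%. The 107.06 single-card figure is left out because the abstract gives no resource ratio for it.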