Frontiers of Data and Computing ›› 2025, Vol. 7 ›› Issue (2): 49-59.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.02.006

doi: 10.11871/jfdc.issn.2096-742X.2025.02.006

• Special Issue: 10th Anniversary of China Science & Technology Cloud

Research on Storage Optimization and Efficient Pre-Processing Methods for SKA-MWA Astronomical Data

ZHOU Han1, TANG Jianing1, XUE Mengyao2, WU Kaichao1,*, ZHANG Bo1

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
  • Received: 2025-02-19 Online: 2025-04-20 Published: 2025-04-23
  • Contact: WU Kaichao E-mail: zhouhan221@mails.ucas.ac.cn; kaichao@cnic.cn

Abstract:

[Context] The Murchison Widefield Array (MWA), a low-frequency precursor telescope for the Square Kilometre Array (SKA), is widely used to study astronomical phenomena such as pulsars. The large volume of data to be transferred and stored, together with the difficulty of processing it, results in low read-write performance, which in turn limits data processing efficiency. [Objective] To improve MWA data processing efficiency, this study proposes a pre-processing scheme that optimizes the storage layout to alleviate read-write bottlenecks. [Methods] Based on an analysis of the data characteristics and computational workflows of the MWA, a vertical data layout strategy is introduced. Combined with a local computation mode and a pipeline architecture, this strategy enables efficient data pre-processing and layout adjustment. [Results] The proposed solution optimizes the data access strategy with packing and compression techniques, reducing the number of files by a factor of 40 and the data volume by 70%. The local computation mode reduces the I/O load on shared storage and yields a threefold improvement in the computational efficiency of data pre-processing, significantly improving the efficiency of astronomical data analysis. [Conclusions] The data layout strategy and pre-processing optimization methods proposed in this study significantly improve the storage performance of SKA-MWA astronomical data and the computational efficiency of subsequent beamforming. The approach provides high-quality data support for astronomical computation and is broadly applicable, with good prospects for future application.

Key words: data preprocessing, distributed parallelism, storage optimization, local storage