Frontiers of Data and Computing ›› 2020, Vol. 2 ›› Issue (2): 145-154.

doi: 10.11871/jfdc.issn.2096-742X.2020.02.012

Special Issue: Data Analysis Technology and Applications

• Technology and Application •

A Data Prediction Method Based on Feature Selection and Transfer Learning

Chen Tongbao1,2, Wen Liangming1,2, Li Jianhui1,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-01-29 Online: 2020-04-20 Published: 2020-06-03
  • Contact: Jianhui Li


[Objective] The Sustainable Development Goals (SDGs) have become the most important sustainable development issue in the world. However, the high rate of missing data for SDG indicators hampers the UN’s effective monitoring of how individual countries implement the goals. Completing the missing SDG data is technically challenging and is of great significance for urging countries to achieve the goals. [Methods] This paper proposes a transfer learning method named TLM, which incorporates the maximal information coefficient (MIC) for feature selection. TLM constructs features for the target data from other public data and builds a regression-based prediction model to predict the missing values of the target data. [Results] Taking the data set of SDG indicator 3.2.1 for a specific country as an example, this paper uses TLM to predict the missing values of the target data and verifies the effectiveness of TLM. [Limitations] Because many factors can affect SDG indicators, our future research will focus on exploring additional correlation analysis methods that can be combined with TLM to predict missing values more accurately. [Conclusions] TLM, which combines MIC with transfer learning, can improve the accuracy of data prediction. It can also provide useful reference predictions for researchers in SDG-related fields who face missing-data problems.

Key words: sustainable development goals, transfer learning, regression, data missing, data completion methods
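
The workflow summarized in the abstract (score candidate features from other data against the target, keep the most relevant one, fit a regression, predict the missing values) can be illustrated with a minimal Python sketch. It is not the authors' TLM implementation. Several parts are assumptions made for illustration: `approx_mic` is a crude stand-in for MIC that only searches equal-frequency grids, the regression is a single-feature least-squares line because the abstract does not specify the regression technique, and the data (`feature_a`, `noise`, `target`) are synthetic.

```python
import math
from collections import Counter


def equal_freq_bins(values, k):
    """Map each value to a bin index in 0..k-1 by rank (equal-frequency binning)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def approx_mic(x, y):
    """Crude MIC stand-in (assumption): best normalised MI over small grids."""
    budget = max(4, int(len(x) ** 0.6))  # grid-size cap: nx * ny <= n**0.6
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        for ny in range(2, budget // nx + 1):
            mi = mutual_information(equal_freq_bins(x, nx),
                                    equal_freq_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return best


def fill_missing(target, candidates):
    """Pick the top-scoring candidate, fit a line on observed rows, fill None."""
    seen = [i for i, t in enumerate(target) if t is not None]
    y = [target[i] for i in seen]
    # Score every candidate feature against the observed target values only.
    scores = {name: approx_mic([f[i] for i in seen], y)
              for name, f in candidates.items()}
    best = max(scores, key=scores.get)
    # Single-feature ordinary least squares (hypothetical regression choice).
    x = [candidates[best][i] for i in seen]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    filled = [t if t is not None else my + slope * (candidates[best][i] - mx)
              for i, t in enumerate(target)]
    return best, scores, filled


# Synthetic data: the target depends on feature_a; noise is unrelated.
feature_a = list(range(20))
noise = [(7 * i) % 20 for i in range(20)]
target = [2 * v + 1 for v in feature_a]
for i in (3, 11, 16):
    target[i] = None  # simulate missing indicator values

best, scores, filled = fill_missing(target, {"a": feature_a, "noise": noise})
print(best, [round(filled[i], 6) for i in (3, 11, 16)])
```

A full MIC computation also optimizes where the grid lines fall, so a dedicated library would be the natural replacement for `approx_mic` on real data. Selecting several features and using a multivariate regressor would be an equally plausible reading of the abstract.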