Frontiers of Data and Computing ›› 2020, Vol. 2 ›› Issue (4): 105-120.

doi: 10.11871/jfdc.issn.2096-742X.2020.04.009

Special Issue: Next-Generation Internet Technologies and Applications

• Technology and Application •

Integration and Optimization of Material Data Mining and Machine Learning Tools

Dong Jiayuan1,2, Yang Xiaoyu1,2,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-03-25 Online: 2020-08-20 Published: 2020-09-10
  • Contact: Yang Xiaoyu E-mail: dongjiayuan@cnic.cn; kxy@cnic.cn

Abstract:

[Objective] Materials science researchers face high barriers to using machine learning algorithms. To lower these barriers, this article presents Auto-Mat, a user-friendly and highly automated machine learning system for material data mining. [Methods] We integrated existing methods and machine learning algorithms from MatMiner and scikit-learn, and defined a data dictionary for reading data from different materials computation databases. We also developed algorithms for feature selection and processing. [Results] The system provides a visual interface for interaction and display across its data mining and machine learning modules, which operate on a unified data format. With the optimized algorithms, model performance is improved. [Limitations] Data acquisition is currently limited to data available through the MatMiner API, and the related code is tightly coupled to that API, so scalability is poor. In addition, the execution speed of some core algorithms needs to be improved. [Conclusions] With this system, users can read data from several mainstream databases, such as Materials Project, in a single step and quickly build their own material data mining workflows. A comparative analysis of two cases shows that the platform helps reduce the barriers to applying machine learning methods to material data mining.

Key words: materials science, data mining, visual interactive interface, data summary, feature extraction, simulated annealing algorithm, MatCloud
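
The abstract mentions a data dictionary that lets the system read records from different databases into a unified data format. The abstract does not give its contents, so the following is only a minimal sketch of the general idea. The source names (`source_a`, `source_b`), all field names, and the helper `to_unified` are hypothetical and are not taken from Auto-Mat or MatMiner.

```python
# Hypothetical data dictionary: for each source database, map its raw
# field names onto one shared set of unified field names.
DATA_DICTIONARY = {
    "source_a": {
        "pretty_formula": "formula",
        "band_gap": "band_gap_eV",
    },
    "source_b": {
        "composition": "formula",
        "gap": "band_gap_eV",
    },
}


def to_unified(record, source):
    """Rename the fields of one raw record using the dictionary for `source`.

    Fields that the dictionary does not list are dropped, and fields that
    the record lacks are simply absent from the result.
    """
    mapping = DATA_DICTIONARY[source]
    return {
        unified: record[raw]
        for raw, unified in mapping.items()
        if raw in record
    }


# Two records describing the same material, as two sources might return them.
record_a = {"pretty_formula": "NaCl", "band_gap": 5.0, "internal_id": 17}
record_b = {"composition": "NaCl", "gap": 5.0}

unified_a = to_unified(record_a, "source_a")
unified_b = to_unified(record_b, "source_b")
```

Under these assumptions, both records reduce to the same unified keys, so downstream feature extraction and model code would only need to handle one schema.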