Frontiers of Data and Domputing ›› 2022, Vol. 4 ›› Issue (3): 66-77.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.03.005

doi: 10.11871/jfdc.issn.2096-742X.2022.03.005

• Special Issue: Advanced Intelligent Computing Platform and Application • Previous Articles     Next Articles

Optimizing Monte Carlo Tree Search for General Game Playing

LIANG Sili1(),JIANG Guifei1,*(),CHEN Taijie2(),DENG Yichao3(),ZHAN Yufan4(),ZHANG Yuzhi1()   

  1. 1. College of Software, Nankai University, Tianjin 300350, China
    2. Department of Computer Science, University of Hong Kong, Hong Kong 999077, China
    3. School of Computing, National University of Singapore, Singapore 117417, Singapore
    4. School of Finance, Nankai University, Tianjin 300381, China
  • Received:2022-02-07 Online:2022-06-20 Published:2022-06-20
  • Contact: JIANG Guifei E-mail:2120210551@mail.nankai.edu.cn;g.jiang@nankai.edu.cn;ctj21@connect.hku.hk;e0792453@u.nus.edu;2120202478@mail.nankai.edu.cn;zyz@nankaiedu.cn

Abstract:

[Background] General Game Playing (GGP) is concerned with creating intelligent agents that understand the rules of previously unknown games and learn to play these games well without human intervention. [Objective] Unlike specialized systems, a general game player cannot rely on algorithms designed in advance for specific games. Such a system rather requires a form of general intelligence that enables it to autonomously generate strategies based on the given game rules. With the decade of development, GGP provides an important testbed for AI, especially artificial general intelligence. The main problem of GGP is how to build an efficient general game player. Strategy generation is the core technique for building a general game player. [Scope of the literature] The main algorithms used in previous successful general game players are the Monte Carlo Tree Search (MCTS) algorithm and its variants. [Methods] To improve MCTS during the online real-time search, this paper incorporates it with a memory structure, where each entry contains information about a particular state. This memory is used to generate an approximate value estimation by combining the estimations of similar states. [Results] Based on this method, we implement and evaluate a general game player. The experimental results show that it can outperform the original Monte Carlo player in a variety of games. Especially, in two-person zero-sum and turn-based games with symmetric information, the built general game player achieves a winning rate of more than 55%, and its performance improves significantly with the increase of the game size, even 100% winning rate in large-scale games such as Connect 5 and Breakthrough. [Conclusions] These results have confirmed the feasibility of the proposed method to use game-dependent information for improving the performance of MCTS.

Key words: general game playing, Monte Carlo tree search, algorithmic game theory, multi-agent systems