Frontiers of Data and Computing, 2022, Vol. 4, Issue (4): 142-155.

CSTR: 21.86101.2/jfdc.2096-742X.2022.04.014

doi: 10.11871/jfdc.issn.2096-742X.2022.04.014

• Technology and Applications •

Spectrum Resource Allocation of Vehicle Edge Computing Network Based on Proximal Policy Optimization Algorithm

ZHAO Jianan*, HU Xiaohui, DU Xinxin

  1. School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou, Gansu 730070, China
  • Received: 2021-11-03 Online: 2022-08-20 Published: 2022-08-10
  • Contact: ZHAO Jianan
  • About the author: ZHAO Jianan is a master's student at the School of Electronics and Information Engineering, Lanzhou Jiaotong University. His main research interests include vehicular ad hoc networks and edge computing.
    In this paper, he is mainly responsible for writing the paper, algorithm design, building the simulation environment, and verification.
    E-mail: 956189148@qq.com
  • Supported by:
    National Natural Science Foundation of China (11461038)

Abstract:

[Objective] In vehicle edge computing networks, allocating spectrum resources sensibly is important for improving the quality of vehicle communication. Spectrum scarcity is one of the main factors that degrade this quality, and the high mobility of vehicles, together with the difficulty of collecting accurate channel state information at the base station, makes spectrum resource allocation challenging. [Methods] To address these problems, the optimization objectives are set to the transmission rate of the vehicle-to-vehicle (V2V) links and the capacity of the vehicle-to-infrastructure (V2I) links, and a multi-agent scheme for dynamic spectrum resource allocation based on the Proximal Policy Optimization (PPO) reinforcement learning algorithm is proposed. [Results] Multiple V2V links share the spectrum resources occupied by the V2I links, which alleviates spectrum scarcity. The problem is further formulated as a Markov Decision Process (MDP), and the state, action, and reward are designed to optimize the spectrum allocation strategy. [Conclusions] Simulation results show that, compared with the baseline algorithm, the proposed PPO-based scheme performs better in terms of channel transmission rate and the success rate of vehicle information delivery.

Key words: vehicle edge computing, spectrum allocation, Markov Decision Process, Proximal Policy Optimization
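
For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that the algorithm maximizes. It is not taken from this paper: the function name `ppo_clip_objective`, its parameters, and the clipping value `clip_eps=0.2` (a commonly used default) are assumptions made for illustration. In a setting like the one described in the abstract, each sample might correspond to one spectrum allocation decision by a V2V agent, but that mapping is also an assumption.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the new policy far from the old one in a single update: once the ratio leaves the interval in the direction that would increase the objective, the term stops growing.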