
Journal of University of Chinese Academy of Sciences ›› 2022, Vol. 39 ›› Issue (4): 543-550. DOI: 10.7523/j.ucas.2020.0045

• Electronic Information and Computer Science •

Downlink power allocation scheme for LEO satellites based on deep reinforcement learning

ZHANG Huaming1,2, LI Qiang1

  1. Shanghai Institute of Microsystem & Information Technology, Chinese Academy of Sciences, Shanghai 201800, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-06-22 Revised: 2020-09-02 Published: 2021-05-31
  • Corresponding author: LI Qiang
  • Funding:
    Supported by the National Key Research and Development Program of China (2019YFB1803101)

Abstract: Most current satellite resource allocation schemes are designed for geosynchronous orbit satellites. To address the highly dynamic nature of low Earth orbit (LEO) satellites and their limited frequency and power resources, a power allocation algorithm based on deep reinforcement learning is proposed. We first model the LEO satellite power allocation scenario and introduce a time-slot division scheme that simplifies the model of the satellite's dynamics. We then propose a power allocation policy based on deep reinforcement learning that adjusts the power of the subcarriers in each beam of a single LEO satellite to reduce co-channel interference, thereby improving the satellite's spectral efficiency. Simulation results show that the proposed algorithm converges to a stable state in a relatively short time. For a fixed total power, the scheme effectively improves the throughput of a single LEO satellite, and its spectral efficiency is significantly higher than that of the water-filling algorithm and the Q-learning algorithm.

Key words: LEO satellite, spectrum efficiency, power allocation, deep reinforcement learning
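
For readers unfamiliar with the water-filling baseline named in the abstract, the following is a minimal sketch of the classic textbook algorithm, not the authors' implementation. It assumes interference-free parallel subcarriers, whereas the paper's scenario includes co-channel interference between beams. The function names and the gain values in the usage note are hypothetical.

```python
import math


def water_filling(gains, total_power, tol=1e-9):
    """Classic water-filling over parallel, interference-free channels.

    gains[i] is an assumed gain-to-noise ratio for subcarrier i.
    Returns powers p_i = max(0, mu - 1/gains[i]) that sum to total_power,
    where the water level mu is found by bisection.
    """
    floors = [1.0 / g for g in gains]  # "floor" height of each channel
    # The water level lies between the lowest floor (no power used) and
    # the highest floor plus the whole budget (at least total_power used).
    lo, hi = min(floors), max(floors) + total_power
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - h) for h in floors)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - h) for h in floors]


def spectral_efficiency(gains, powers):
    """Sum of log2(1 + g_i * p_i) in bit/s/Hz, ignoring interference."""
    return sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers))
```

Calling `water_filling([2.0, 1.0, 0.1], 1.0)` on these illustrative gains gives most of the power to the strongest subcarrier and none to the weakest. Because the closed form ignores interference between beams, it is a natural baseline for a learned policy that accounts for it.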
