Continuous vs Discrete: Phase Performance Comparison of RIS-Assisted Millimeter Wave Communication Based on Deep Reinforcement Learning

HU Langtao; YANG Rui; LIU Quanjin; WU Jianlan; JI Wen; WU Lei

doi:10.12178/1001-0548.2022285

This paper applies deep reinforcement learning (DRL) to a distributed reconfigurable intelligent surface (RIS) assisted multi-user millimeter wave (mmWave) system. DRL is used to learn and adjust the transmit beamforming matrix at the base station and the phase shift matrix at the RIS, and to optimize the two jointly to maximize the weighted sum-rate. Specifically, in the discrete action space, a power codebook and a phase codebook are designed first, and a Deep Q Network (DQN) algorithm is proposed to optimize the beamforming matrix and the phase shift matrix; in the continuous action space, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used to optimize the same two matrices. The weighted sum-rates of the system in the discrete and continuous action spaces are compared through simulation for different numbers of codebook bits. In addition, compared with the conventional convex optimization algorithm and zero-forcing precoding with a random PBF algorithm, the DRL algorithms significantly improve the sum-rate: the continuous TD3 algorithm exceeds the convex optimization algorithm by 23.89%, and the discrete DQN algorithm exceeds it when the number of codebook bits is 4.


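Compared with conventional sub-6 GHz communication, millimeter wave (mmWave) communication, with gigahertz of available bandwidth, offers higher capacity and transmission rates^[1-2]. However, mmWave signals have a short transmission range and are easily blocked by obstacles. The reconfigurable intelligent surface (RIS) is therefore introduced to enhance the transmission and reception of mmWave signals. Unlike active amplify-and-forward relays, an RIS consists essentially of passive reflecting elements without RF chains, and is low-cost, low-power, programmable, and easy to deploy^[3]. In addition, each RIS element can adjust its amplitude and phase so that the incident signal from the base station (BS) is enhanced and reflected to the users in real time, improving network performance cost-effectively^[3-6].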

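Recently, RIS-assisted communication scenarios have attracted wide attention^[7-8]. Reference [7] studied the physical layer security of an RIS-assisted UAV communication system. Reference [8] deployed an RIS in multi-user MISO communication and proposed a channel estimation framework based on parallel factor decomposition to unfold the resulting cascaded channel model. The passive beamforming at the RIS can be controlled by the BS through the RIS controller. Therefore, to maximize the RIS gain, the beamforming at the BS and at the RIS is usually designed jointly^[9-10]. The beamforming designs in [9-10] use continuous phases; in [11-13] the beamforming design problem is extended to discrete phases, where [11-12] study discrete reflect beamforming at the RIS and [13] studies discrete transmit beamforming at the BS. Most studies assume rich scattering between the BS and the RIS^[15-16], but for mmWave transmission a low-rank BS-RIS channel should be considered^[9]. Reference [17] studied potential applications of RIS from an mmWave perspective, where a weak BS-user link can be compensated by the reflection gain of the RIS.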
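To make the joint design concrete, the following is a minimal sketch of how a weighted sum-rate could be evaluated for a given transmit beamforming matrix and set of RIS phases. It is not the paper's system model: the single-RIS, no-direct-link setup, the function name `weighted_sum_rate`, the argument names, and the unit noise power are all assumptions made for illustration.

```python
import numpy as np

def weighted_sum_rate(G, H_r, theta, W, weights, noise_power=1.0):
    """Weighted sum-rate (bit/s/Hz) of a RIS-only downlink (illustrative model).

    G:       (N, M) BS-to-RIS channel, M BS antennas, N RIS elements
    H_r:     (K, N) RIS-to-user channels, one row per user
    theta:   (N,)   RIS phase shifts in radians (unit-modulus reflection)
    W:       (M, K) transmit beamforming matrix, one column per user
    weights: (K,)   user weights
    """
    Theta = np.diag(np.exp(1j * theta))   # passive phase shift matrix
    H_eff = H_r @ Theta @ G               # (K, M) cascaded BS-RIS-user channel
    A = np.abs(H_eff @ W) ** 2            # A[k, j]: power at user k from beam j
    signal = np.diag(A)                   # desired-beam power per user
    interference = A.sum(axis=1) - signal # power leaking from other users' beams
    sinr = signal / (interference + noise_power)
    return float(np.sum(weights * np.log2(1.0 + sinr)))
```

With one user, one BS antenna, and two RIS elements over all-ones channels, phases `[0, 0]` add coherently and give log2(5) ≈ 2.32 bit/s/Hz, while phases `[0, π]` cancel and give a rate of about 0. This is why the RIS phases and the BS beamformer are usually optimized together.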

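The studies above mainly solve the RIS beamforming problem with conventional convex optimization algorithms. These algorithms mostly rely on alternating iterations: the solution depends strongly on the initial value, and the computational complexity grows rapidly with the complexity of the communication system, which makes them inefficient for large-scale systems. Deep reinforcement learning (DRL) can tackle complex non-convex problems in wireless communication, lets communication entities learn, supports autonomous decision-making, and handles high-dimensional data. Motivated by these advantages, some researchers have applied DRL to problems in wireless communication^[13-14, 18-20]. Reference [13] studied the capacity of interference channels in homogeneous cellular networks, proposed a DRL-based distributed dynamic downlink beamforming coordination method, and designed a discretized BS transmit beamforming matrix from a codebook. Reference [14] studied DRL-based power allocation for energy efficiency optimization in multi-cell non-orthogonal multiple access (NOMA). Reference [18] studied DRL-based user association and resource allocation in heterogeneous cellular networks. References [19-20] studied, respectively, a DRL-based RIS-assisted multi-user multiple-input single-output system and an RIS-assisted covert communication system, both using DRL to jointly design the BS transmit beamforming and the RIS phase shift matrix to improve system performance. However, although [13] introduced RIS, it studied only discrete transmit beamforming at the BS; [18] did not introduce RIS at all; and the joint designs in [19-20] are both continuous beamforming. The joint design of codebook-based discrete beamforming vectors and discrete phases has not previously been studied for RIS-assisted mmWave communication systems. Most current studies still focus on continuous algorithms, but discrete algorithms have their own advantage of lower complexity, and a performance comparison between continuous and discrete phases is also of considerable value.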

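Against this background, this paper studies a distributed RIS-assisted multi-user mmWave communication system without a line-of-sight link, with the goal of maximizing the weighted sum-rate. Two DRL-based joint optimization methods are proposed. The first, based on the Deep Q Network (DQN) algorithm, jointly optimizes the discretized transmit beamforming and phase shift matrix. The second, based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, jointly optimizes the continuous transmit beamforming and phase shift matrix. The main contributions are as follows: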

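1) For the DRL-based RIS-assisted multi-user mmWave communication system with a discrete action space, a power codebook and a phase codebook are designed, and a joint optimization algorithm for the transmit beamforming and phase shift matrix is designed with the DQN algorithm to maximize the weighted sum-rate;

2) For the same system with a continuous action space, a joint optimization algorithm for the transmit beamforming and phase shift matrix is designed with the TD3 algorithm to maximize the weighted sum-rate;

3) The system sum-rate and the complexity of the DRL algorithms in the discrete and continuous action spaces are compared and analyzed, and both algorithms are compared through simulation with a conventional convex optimization algorithm and a zero-forcing algorithm with random beamforming.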
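The difference between the two action spaces can be illustrated with a small sketch of a b-bit phase codebook. The uniform spacing over [0, 2π) and the helper names `phase_codebook` and `quantize_phases` are assumptions for illustration; the paper's actual codebook design may differ.

```python
import numpy as np

def phase_codebook(bits):
    """Uniform codebook with 2**bits phase levels in [0, 2*pi) (assumed design)."""
    levels = 2 ** bits
    return 2.0 * np.pi * np.arange(levels) / levels

def quantize_phases(theta, bits):
    """Map continuous phases to the nearest codebook entry (circular distance)."""
    codebook = phase_codebook(bits)
    theta = np.mod(np.asarray(theta, dtype=float), 2.0 * np.pi)
    # Circular distance between every phase and every codebook entry
    diff = np.abs(theta[:, None] - codebook[None, :])
    diff = np.minimum(diff, 2.0 * np.pi - diff)
    idx = np.argmin(diff, axis=1)
    return codebook[idx], idx
```

Under this uniform design, a discrete agent such as DQN would choose an index into the codebook, whereas a continuous agent such as TD3 would output the phase value directly. The worst-case quantization error is π / 2^b, so each extra codebook bit halves it.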

5. Conclusion

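To support mmWave multi-user transmission, this paper introduces distributed RIS units to assist mmWave communication and, building on recent advances in DRL, proposes joint designs of the transmit beamforming and phase shifts for both the discretized and the continuous case, maximizing the weighted sum-rate of the RIS-assisted mmWave communication system. The proposed DRL-based algorithms are robust and therefore adapt readily to a variety of communication system settings.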

References

[1]	GUAN K, PENG B, HE D P, et al. Channel sounding and ray tracing for intrawagon scenario at mmwave and sub-mmwave bands[J]. IEEE Transactions on Antennas and Propagation, 2021, 69(2): 1007-1019.
[2]	XIAO Z Y, LIU K, ZHU L P. Millimeter-wave array enabled UAV-to-UAV communication technology[J]. Journal on Communications, 2022, 43(10): 196-209. (in Chinese)
[3]	HUANG C W, ZAPPONE A, ALEXANDROPOULOS G C, et al. Reconfigurable intelligent surfaces for energy efficiency in wireless communication[J]. IEEE Transactions on Wireless Communications, 2019, 18(8): 4157-4170.
[4]	SHAO X D, YOU C S, MA W Y, et al. Target sensing with intelligent reflecting surface: Architecture and performance[J]. IEEE Journal on Selected Areas in Communications, 2022, 40(7): 2070-2084.
[5]	ZHANG Z J, DAI L L, CHEN X B, et al. Active RIS vs Passive RIS: Which will prevail in 6G?[J]. IEEE Transactions on Communications, 2023, 71(3): 1707-1725.
[6]	HUANG C W, YANG Z H, ALEXANDROPOULOS G C, et al. Multi-hop RIS-empowered terahertz communications: A DRL-based hybrid beamforming design[J]. IEEE Journal on Selected Areas in Communications, 2021, 39(6): 1663-1677.
[7]	HU L T, BI S J, LIU Q J, et al. Physical layer security algorithm of reconfigurable intelligent surface-assisted unmanned aerial vehicle communication system based on reinforcement learning[J]. Journal of Electronics & Information Technology, 2022, 44(7): 2407-2415. (in Chinese)
[8]	WEI L, HUANG C W, ALEXANDROPOULOS G C, et al. Channel estimation for RIS-empowered multi-user MISO wireless communications[J]. IEEE Transactions on Communications, 2021, 69(6): 4144-4157.
[9]	GUO H Y, YANG Z, ZOU Y L, et al. Double-RIS assisted anti-jamming communication method based on joint active and passive beamforming optimization[J]. Journal on Communications, 2022, 43(7): 21-30. (in Chinese)
[10]	WU Q, ZHANG R. Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming[J]. IEEE Transactions on Wireless Communications, 2019, 18(11): 5394-5409.
[11]	XU P, CHEN G J, YANG Z, et al. Reconfigurable intelligent surfaces-assisted communications with discrete phase shifts: how many quantization levels are required to achieve full diversity?[J]. IEEE Wireless Communications Letters, 2021, 10(2): 358-362.
[12]	WU Q Q, ZHANG R. Beamforming optimization for wireless network aided by intelligent reflecting surface with discrete phase shifts[J]. IEEE Transactions on Communications, 2020, 68(3): 1838-1851.
[13]	GE J G, LIANG Y C, JOUNG J, et al. Deep reinforcement learning for distributed dynamic MISO downlink-beamforming coordination[J]. IEEE Transactions on Communications, 2020, 68(10): 6070-6085.
[14]	HU L T, BI S J, LIU Q J, et al. Multi-cell NOMA energy efficiency optimization power allocation algorithm based on deep reinforcement learning[J]. Journal of University of Electronic Science and Technology of China, 2022, 51(3): 384-391. (in Chinese)
[15]	GUO H Y, LIANG Y C, CHEN J, et al. Weighted sum-rate maximization for reconfigurable intelligent surface aided wireless networks[J]. IEEE Transactions on Wireless Communications, 2020, 19(5): 3064-3076.
[16]	HAN Y, TANG W K, JIN S, et al. Large intelligent surface-assisted wireless communication exploiting statistical CSI[J]. IEEE Transactions on Vehicular Technology, 2019, 68(8): 8238-8242.
[17]	WANG P L, FANG J, YUAN X J, et al. Intelligent reflecting surface-assisted millimeter wave communications: Joint active and passive precoding design[J]. IEEE Transactions on Vehicular Technology, 2020, 69(12): 14960-14973.
[18]	ZHAO N, LIANG Y C, NIYATO D, et al. Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks[J]. IEEE Transactions on Wireless Communications, 2019, 18(11): 5141-5152.
[19]	HUANG C W, MO R H, YUEN C. Reconfigurable intelligent surface assisted multiuser MISO systems exploiting deep reinforcement learning[J]. IEEE Journal on Selected Areas in Communications, 2020, 38(8): 1839-1850.
[20]	YANG H L, XIONG Z H, ZHAO J, et al. Deep reinforcement learning-based intelligent reflecting surface for secure wireless communications[J]. IEEE Transactions on Wireless Communications, 2021, 20(1): 375-388.
[21]	MEI H B, YANG K, LIU Q, et al. 3D-trajectory and phase-shift design for RIS-assisted UAV systems using deep reinforcement learning[J]. IEEE Transactions on Vehicular Technology, 2022, 71(3): 3020-3029.
[22]	CHU Z, HAO W M, XIAO P, et al. Intelligent reflecting surface aided multi-antenna secure transmission[J]. IEEE Wireless Communications Letters, 2020, 9(1): 108-112.
[23]	SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge: MIT Press, 2018.
[24]	FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning. New York: PMLR, 2018: 1587-1596.
[25]	ZHOU W X, CUI Z F, LI B, et al. Beamforming codebook design and performance evaluation for 60GHz wireless communication[C]//2011 11th International Symposium on Communications & Information Technologies (ISCIT). Piscataway: IEEE, 2011: 30-35.
[26]	AKDENIZ M R, LIU Y P, SAMIMI M K, et al. Millimeter wave channel modeling and cellular capacity evaluation[J]. IEEE Journal on Selected Areas in Communications, 2014, 32(6): 1164-1179.

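Parameter	Description	Value
$\gamma$	Discount rate for future rewards	0.6
$\mu$	Learning rate of the network	0.00005
$\mathrm{batch\_size}$	Batch size	32
$T_{\mathrm{step}}$	Number of steps between delayed synchronizations of the target network	100
$\mathcal{M}$	Size of the experience replay buffer	50000
$E$	Number of episodes	1000
$T$	Number of steps per episode	10000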
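The table above has a target-network synchronization period and a single learning rate, so it appears to list the DQN settings. The sketch below collects those values and shows where the discount rate and the synchronization period would enter a standard DQN update. The class and function names are hypothetical, and the one-step target is the textbook DQN form, not code taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DQNConfig:
    gamma: float = 0.6              # discount rate for future rewards
    learning_rate: float = 5e-5     # learning rate of the network
    batch_size: int = 32
    target_sync_steps: int = 100    # steps between target-network syncs
    replay_capacity: int = 50_000   # experience replay buffer size
    episodes: int = 1000
    steps_per_episode: int = 10_000

def td_target(reward, next_q_values, cfg, done=False):
    """One-step DQN target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + cfg.gamma * max(next_q_values)

def should_sync_target(step, cfg):
    """True on the steps where the target network copies the online network."""
    return step > 0 and step % cfg.target_sync_steps == 0
```

With a discount rate of 0.6, rewards more than a few steps ahead contribute little to the target, so this agent weights the immediate sum-rate more heavily than an agent with a discount rate close to 1 would.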

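Parameter	Description	Value
$\gamma$	Discount rate for future rewards	0.99
$\alpha$	Learning rate for updating the training critic network	0.001
$\beta$	Learning rate for updating the training policy network	0.0001
$\tau$	Learning rate for updating the target policy network and the target value network	0.001
$\lambda$	Decay rate for the training critic network and the training policy network	0.00001
$\mathrm{batch\_size}$	Batch size	16
$\mathcal{M}$	Size of the experience replay buffer	100000
$T_{\mathrm{step}}$	Number of steps between delayed updates of the policy network	4
$E$	Number of episodes	1000
$T$	Number of steps per episode	10000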
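The table above has separate critic and policy learning rates and a delayed policy update, so it appears to list the TD3 settings. The sketch below shows how the target-network rate τ and the policy delay are typically used in TD3: Polyak averaging of the target parameters and an actor update only every few critic updates. The function names and the list-of-arrays parameter representation are assumptions for illustration.

```python
import numpy as np

TAU = 0.001        # rate for updating the target networks (from the table)
POLICY_DELAY = 4   # steps between policy-network updates (from the table)

def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

def update_actor_this_step(step, delay=POLICY_DELAY):
    """In TD3 the critics update every step; the actor only every `delay` steps."""
    return step % delay == 0
```

With τ = 0.001 the target networks move only 0.1% of the way toward the online networks per update, which keeps the bootstrapped targets slowly varying.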


Corresponding author: CHEN Bin, bchen63@163.com
