Energy Efficiency Modeling of Massive MIMO Baseband Processing with Different Base Station Computing Architectures

DENG Ailin; FENG Gang; LIU Mengjie

doi:10.12178/1001-0548.2021313

Massive multiple-input multiple-output (MIMO) is a key enabling technology for 5G and 5G-Advanced mobile networks, effectively increasing spectrum utilization through large-scale antenna arrays. With the evolution toward 6G, massive MIMO is expected to support more antennas and more complex algorithms, so baseband energy efficiency (EE) will become one of the crucial challenges in improving network energy efficiency. Base station (BS) computing architectures fall into two categories: dedicated (ASIC-based) and general-purpose (CPU-based). Choosing the optimal architecture is difficult because quantitative models of baseband computational requirements and EE are lacking. This paper therefore studies power consumption models of different computing architectures in terms of combinational logic units and processing cycles. Based on the proposed power consumption model, closed-form EE expressions are derived in units of floating-point operations per second per watt (FLOPS/W). Numerical results show that the EE of dedicated computing is currently nearly 30 times that of general-purpose computing with hardware acceleration and more than 200 times that of a pure-CPU general-purpose computing architecture.


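Global warming caused by rising carbon emissions has become a major obstacle to the sustainable development of human society, and more than 40 countries and economies have formally announced carbon-neutrality targets [1]. China has pledged to peak its carbon emissions before 2030 and to achieve carbon neutrality before 2060.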

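For the mobile communications industry, pursuing carbon neutrality has become imperative. The industry has actively explored energy saving, emission reduction, and green, low-carbon development, partly to demonstrate its social responsibility and partly because network energy costs account for an ever-growing share of OPEX. 5G networks are 10 to 20 times more energy efficient than 4G networks, but the sharp growth in the data volume they carry will raise the total energy consumption of 5G equipment, making base station energy consumption a major challenge for operators seeking to meet their carbon-neutrality targets. Massive multiple-input multiple-output (massive MIMO) is the signature technology of 5G systems; it uses large-scale antenna arrays to effectively increase air-interface capacity and spectral efficiency. As service demands and performance requirements rise sharply in 5G-Advanced and 6G, massive MIMO will evolve toward ultra-massive MIMO and extremely large aperture arrays (ELAA), which must support more antennas and more complex algorithms. Because the complexity of baseband algorithms generally grows with the square to the cube of the number of antennas, baseband computing energy consumption will become one of the main obstacles to continually improving network energy efficiency [2-3].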
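The square-to-cube growth can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the workload is exactly proportional to N^2 or N^3 in the antenna count N with all other parameters held fixed, the helper name `complexity_growth` is ours, and the antenna counts 64 and 256 are hypothetical examples rather than values from this paper.

```python
def complexity_growth(n_old: int, n_new: int, exponent: int) -> float:
    """Factor by which an O(N**exponent) workload grows when N goes from n_old to n_new.

    Assumes the workload is exactly proportional to N**exponent and that
    every other parameter (bandwidth, slots, layers) stays the same.
    """
    return (n_new / n_old) ** exponent


# Hypothetical antenna counts, chosen only for illustration.
n_old, n_new = 64, 256
for exponent in (2, 3):
    factor = complexity_growth(n_old, n_new, exponent)
    print(f"O(N^{exponent}): {n_old} -> {n_new} antennas multiplies the workload by {factor:.0f}x")
```

Under these assumptions, a fourfold increase in antenna count multiplies the baseband workload by 16 to 64 times before any increase in bandwidth is taken into account.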

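There are currently two main hardware architectures for supporting massive MIMO: dedicated computing and general-purpose computing. Dedicated computing is based mainly on ASICs and keeps improving performance and energy efficiency through chips customized for 5G; it is the current mainstream solution in the industry. General-purpose computing is based mainly on CPUs, optionally using FPGAs or GPUs for baseband hardware acceleration, and is one of the technical propositions of Open RAN. Today, dedicated computing architectures are more energy efficient than general-purpose ones, and general-purpose computing still faces major challenges in performance and energy efficiency [4]. However, the industry remains divided on the relative energy efficiency of future base station architectures and how it will evolve: one view holds that, as technology advances, the energy-efficiency gap between the two will narrow or even reverse; the other holds that the gap will stay the same or even widen. As base station computational complexity increases, the carbon-emission impact of each computing architecture will become a key factor in choosing future base station architectures, so a quantitative study of how the energy-efficiency gap between base station computing architectures evolves is necessary.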

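Existing work has studied the performance and energy efficiency of individual 5G air-interface algorithms on different computing architectures. For example, Ref. [5] proposed a dedicated ASIC-based FFT and showed that implementing the FFT on an ASIC improves energy efficiency by more than 180 times compared with a general-purpose CPU. Ref. [6] proposed implementing massive MIMO functions with FPGA hardware acceleration from the perspective of functional reconfigurability, but gave no quantitative energy-efficiency analysis. Ref. [7] used FPGA multi-gigabit transceivers (MGTs) to implement centralized signal processing for C-RAN, but evaluated only the FPGA hardware throughput, without analyzing how that hardware capability translates into air-interface performance or energy-efficiency gains. In summary, existing work mainly analyzes the computational performance of individual 5G air-interface algorithms. Although these results qualitatively show the energy-efficiency advantage of dedicated computing, they lack quantitative modeling of the computational requirements and energy efficiency of the complete base station baseband, so they cannot reveal the system-level energy-efficiency difference between computing architectures or how that difference will evolve.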

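With the goal of minimizing the carbon emissions of communication base stations, this paper compares the energy efficiency of different base station computing architectures through computational-requirement modeling, energy-efficiency modeling of each computing architecture, and quantitative power-consumption analysis, and quantitatively studies the development trend of base station computing energy efficiency.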

4. Conclusion

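Each computing architecture has the scenarios it suits best. For compute-intensive applications with relatively well-defined computational requirements, using dedicated ASIC chips to improve energy efficiency has already proven effective in practice across many industries, such as deep neural networks and blockchain.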

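By modeling the computational requirements of massive MIMO base stations together with the energy efficiency of different computing architectures, this paper provides quantitative analysis results. They show that dedicated computing currently has an energy-efficiency advantage of nearly 30 times over general-purpose computing with hardware acceleration, and of more than 200 times over a pure-CPU general-purpose computing architecture. As base station antenna counts and cell bandwidths increase, the power-consumption difference between general-purpose and dedicated computing will grow further, so the gap between them will widen rather than narrow. Even when general-purpose computing is accelerated with FPGAs, its gap relative to dedicated computing keeps widening.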
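The widening absolute power difference can be illustrated with a simple scaling sketch. It rests on assumptions that are ours, not the paper's model: each architecture's energy per operation stays fixed, so power is proportional to the operation count, and the operation count scales with the cube of the antenna count (the N_BS^3 terms in the MIMO rows of the operation-count table, ignoring LDPC decoding) and linearly with the number of RBs. Baseline powers are the single-cell totals from the power table; the helper names are hypothetical.

```python
# Baseline single-cell total power in watts, taken from the power table.
BASE_POWER_W = {"ASIC": 13.92, "CPU+FPGA": 390.57, "CPU": 3621.99}


def scaled_power(base_power_w: float, antenna_factor: float, bandwidth_factor: float) -> float:
    """Power after scaling antennas and bandwidth.

    Assumes a fixed energy per operation (power proportional to operation count)
    and an operation count proportional to N_BS**3 times the number of RBs.
    """
    return base_power_w * antenna_factor**3 * bandwidth_factor


def power_gap(arch: str, antenna_factor: float = 1.0, bandwidth_factor: float = 1.0) -> float:
    """Absolute power difference in watts between `arch` and ASIC after scaling."""
    return (scaled_power(BASE_POWER_W[arch], antenna_factor, bandwidth_factor)
            - scaled_power(BASE_POWER_W["ASIC"], antenna_factor, bandwidth_factor))


for antenna_factor, bandwidth_factor in [(1, 1), (2, 1), (2, 2)]:
    gap = power_gap("CPU+FPGA", antenna_factor, bandwidth_factor)
    print(f"antennas x{antenna_factor}, bandwidth x{bandwidth_factor}: "
          f"CPU+FPGA uses {gap:.1f} W more than ASIC")
```

Under these assumptions the power ratio between architectures stays constant, so the sketch illustrates only the growth of the absolute power difference (about 377 W at baseline, about 3013 W with twice the antennas), not any change in the ratio itself.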

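Therefore, both now and in the future, using general-purpose rather than dedicated computing in massive MIMO base stations works against energy-saving and emission-reduction goals. The Open RAN proposals of decoupling base station software from hardware and replacing dedicated computing with general-purpose computing are likewise unfavorable to the green, low-carbon development of the mobile communications industry. From the standpoint of reducing the industry's carbon emissions, investment in dedicated base station computing technology should be increased.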

References

[1]	International Energy Agency. Global energy & CO2 status report: the latest trends in energy and emissions in 2018[EB/OL]. [2019-03]. https://iea.blob.core.windows.net/assets/23f9eb39-7493-4722-aced-61433cbffe10/Global_Energy_and_CO2_Status_Report_2018.pdf.
[2]	ALSHARIF M, KELECHI A, KIM J, et al. Energy efficiency and coverage trade-off in 5G for eco-friendly and sustainable cellular networks[J]. Symmetry, 2019, 11(3): 408.
[3]	PRASAD K N R S V, HOSSAIN E, BHARGAVA V K. Energy efficiency in massive MIMO-based 5G networks: Opportunities and challenges[J]. IEEE Wireless Communications, 2017, 24(3): 86-94.
[4]	DINGES M, HOFER M, LEITNER K, et al. Commission publishes study for the future 5G supply ecosystem in Europe[EB/OL]. [2021-08-09]. https://digital-strategy.ec.europa.eu/en/library/commission-publishes-study-future-5g-supply-ecosystem-europe.
[5]	NSAME P, BOIS G, SAVARIA Y. Analysis and characterization of data energy tradeoffs: For VLSI architectural agility in C-RAN platforms[C]//2015 IEEE International Symposium on Circuits and Systems (ISCAS). Lisbon, Portugal: IEEE, 2015: 1466-1469.
[6]	CHAMOLA V, PATRA S, KUMAR N, et al. FPGA for 5G: Re-configurable hardware for next generation communication[J]. IEEE Wireless Communications, 2020, 27(3): 140-147.
[7]	BENZIN A, CAIRE G. Centralized signal processing zero-forcing capable massive MIMO SDR hardware using multi gigabit transceivers[C]//21st International ITG Workshop on Smart Antennas (WSA 2017). Berlin: VDE Verlag GmbH, 2017: 1-6.
[8]	CHEN J, DHOLAKIA A, ELEFTHERIOU E, et al. Reduced-complexity decoding of LDPC codes[J]. IEEE Transactions on Communications, 2005, 53(8): 1288-1299.
[9]	3GPP. NR: Multiplexing and channel coding V16.3.0: TS 38.212[S]. Sophia Antipolis, Valbonne: 3GPP, 2020.
[10]	LI Y. Research on high-reliability low-power circuits design methodology based on probabilistic characteristics for signal processing system[D]. Chengdu: University of Electronic Science and Technology of China, 2017.
[11]	TSE D, VISWANATH P. Fundamentals of wireless communication[M]. Cambridge: Cambridge University Press, 2005: 343-346.
[12]	GOLUB G H, VAN LOAN C F. Matrix computations[M]. 4th ed. Baltimore: The Johns Hopkins University Press, 2013: 486-496.
[13]	SPENCER Q, SWINDLEHURST A, HAARDT M. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels[J]. IEEE Transactions on Signal Processing, 2004, 52(2): 461-471.
[14]	XIE Q, LIN X, WANG Y Z, et al. Performance comparisons between 7-nm FinFET and conventional bulk CMOS standard cell libraries[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2015, 62(8): 761-765.
[15]	LEE J, KANG B, JOO S, et al. 6.1 A low-power and low-cost 14nm FinFET RFIC supporting legacy cellular and 5G FR1[C]//2021 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco: IEEE, 2021: 90-92.
[16]	BAILEY J, SHAKIBA H, NIR E, et al. A 112Gb/s PAM-4 low-power 9-tap sliding-block DFE in a 7nm FinFET wireline receiver[C]//2021 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco: IEEE, 2021: 140-142.
[17]	Intel Corporation. Intel® Xeon® W-11865MLE Processor (24M Cache, up to 4.50 GHz)[EB/OL]. [2021-10-08]. https://www.intel.com/content/www/us/en/products/sku/217368/intel-xeon-w11865mle-processor-24m-cache-up-to-4-50-ghz/specifications.html.
[18]	IEEE. International roadmap for devices and systems: More Moore (2021 update)[EB/OL]. [2021-10-08]. https://irds.ieee.org/images/files/pdf/2018/2018IRDS_MM.pdf.

Function	Operation	Operations per second	Remarks
LDPC decoding	Additions	$(2d_{\mathrm{v},1}N_{1}+2d_{\mathrm{c}}M+M)\,N_{\mathrm{iter}}N_{\mathrm{CB}}N_{\mathrm{Slot}}$	$N_{\mathrm{iter}}$ is the number of LDPC iterations; $N_{\mathrm{CB}}$ is the number of LDPC code blocks (CBs) in one slot; $N_{\mathrm{Slot}}$ is the number of slots per second.
MIMO equalization	Additions	$\dfrac{89}{24}N_{\mathrm{BS}}^{3}N_{\mathrm{Slot}}N_{\mathrm{RB}}^{\mathrm{Slot}}N_{\mathrm{RE}}^{\mathrm{RB}}$	$N_{\mathrm{BS}}$ is the number of base station antennas; $N_{\mathrm{Slot}}$ is the number of slots per second; $N_{\mathrm{RB}}^{\mathrm{Slot}}$ is the number of RBs in one slot; $N_{\mathrm{RE}}^{\mathrm{RB}}$ is the number of REs in one RB.
MIMO equalization	Multiplications	$\dfrac{91}{24}N_{\mathrm{BS}}^{3}N_{\mathrm{Slot}}N_{\mathrm{RB}}^{\mathrm{Slot}}N_{\mathrm{RE}}^{\mathrm{RB}}$	Same as above.
Weight calculation	Additions	$\left(4(n_{\mathrm{PM}}+1)N_{\mathrm{Slot}}+4n_{\mathrm{PM}}N_{\mathrm{SRS}}\right)N_{\mathrm{BS}}^{3}N_{\mathrm{RB}}^{\mathrm{Slot}}$	$N_{\mathrm{RB}}^{\mathrm{Slot}}$ is the number of RBs in one slot; $N_{\mathrm{Slot}}$ is the number of slots per second.
Weight calculation	Multiplications	$\left(4(n_{\mathrm{PM}}+1)N_{\mathrm{Slot}}+4n_{\mathrm{PM}}N_{\mathrm{SRS}}\right)N_{\mathrm{BS}}^{3}N_{\mathrm{RB}}^{\mathrm{Slot}}$	Same as above.
Downlink weighting	Additions	$2N_{\mathrm{BS}}^{3}N_{\mathrm{Slot}}N_{\mathrm{RB}}^{\mathrm{Slot}}N_{\mathrm{RE}}^{\mathrm{RB}}$	$N_{\mathrm{Slot}}$ is the number of slots per second; $N_{\mathrm{RB}}^{\mathrm{Slot}}$ is the number of RBs in one slot; $N_{\mathrm{RE}}^{\mathrm{RB}}$ is the number of REs in one RB.
Downlink weighting	Multiplications	$2N_{\mathrm{BS}}^{3}N_{\mathrm{Slot}}N_{\mathrm{RB}}^{\mathrm{Slot}}N_{\mathrm{RE}}^{\mathrm{RB}}$	Same as above.
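The operation counts in the table above are closed-form products, so they transcribe directly into code. The sketch below is a hypothetical Python transcription: the `CellParams` container and all function names are ours, and symbols this excerpt does not define ($n_{\mathrm{PM}}$, $N_{\mathrm{SRS}}$, and the LDPC parameters $d_{\mathrm{v},1}$, $N_1$, $d_{\mathrm{c}}$, $M$) are simply passed through. The example cell configuration uses common NR values as an illustrative assumption; it is not the setting behind the power table.

```python
from dataclasses import dataclass


@dataclass
class CellParams:
    """Hypothetical container for the cell parameters used in the table."""
    n_bs: int       # N_BS: number of base station antennas
    n_slot: int     # N_Slot: number of slots per second
    n_rb_slot: int  # N_RB^Slot: number of RBs in one slot
    n_re_rb: int    # N_RE^RB: number of REs in one RB


def mimo_equalization_ops(p: CellParams) -> tuple[float, float]:
    """(additions, multiplications) per second for MIMO equalization."""
    base = p.n_bs**3 * p.n_slot * p.n_rb_slot * p.n_re_rb
    return 89 / 24 * base, 91 / 24 * base


def weight_calculation_ops(p: CellParams, n_pm: int, n_srs: int) -> int:
    """Additions per second for weight calculation (multiplications use the same formula)."""
    return (4 * (n_pm + 1) * p.n_slot + 4 * n_pm * n_srs) * p.n_bs**3 * p.n_rb_slot


def downlink_weighting_ops(p: CellParams) -> int:
    """Additions per second for downlink weighting (multiplications use the same formula)."""
    return 2 * p.n_bs**3 * p.n_slot * p.n_rb_slot * p.n_re_rb


def ldpc_decoding_additions(d_v1: int, n_1: int, d_c: int, m: int,
                            n_iter: int, n_cb: int, n_slot: int) -> int:
    """Additions per second for LDPC decoding: (2*d_v1*N_1 + 2*d_c*M + M) * N_iter * N_CB * N_Slot."""
    return (2 * d_v1 * n_1 + 2 * d_c * m + m) * n_iter * n_cb * n_slot


# Illustrative NR numerology (an assumption, not the paper's evaluation setting):
# 30 kHz SCS -> 2000 slots/s; 100 MHz -> 273 RBs; 12 subcarriers x 14 symbols -> 168 REs per RB.
cell = CellParams(n_bs=64, n_slot=2000, n_rb_slot=273, n_re_rb=168)
eq_add, eq_mul = mimo_equalization_ops(cell)
print(f"MIMO equalization: {eq_add:.3e} additions/s, {eq_mul:.3e} multiplications/s")
print(f"Downlink weighting: {downlink_weighting_ops(cell):.3e} additions/s")
```

Keeping each table row as its own function makes it easy to vary one parameter, such as the antenna count, and see which row dominates the total.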

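Single cell	Operation	Operations/FLOPs	Energy per operation	Power/W
ASIC dedicated computing	Addition	1.90×10¹⁴	6.5 fJ	1.14
ASIC dedicated computing	Multiplication	1.59×10¹⁴	80.36 fJ	12.78
ASIC dedicated computing	Total	-	-	13.92
CPU general-purpose computing	Addition and multiplication	3.48×10¹⁴	10.85 pJ	3621.99
CPU general-purpose computing with FPGA acceleration	Addition and multiplication	3.48×10¹⁴	1.17 pJ	390.57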
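The headline ratios can be checked against the total-power column of the table above. EE in FLOPS/W is operations per second divided by power, so for a common workload the EE ratio between two architectures equals the inverse ratio of their powers. The sketch assumes, as its one added premise, that all three architectures process the same 3.48×10¹⁴ operations per second listed for the general-purpose rows; the helper name is hypothetical.

```python
# Single-cell total power in watts, copied from the power table.
P_ASIC = 1.14 + 12.78   # ASIC: additions + multiplications
P_CPU_FPGA = 390.57     # CPU with FPGA acceleration
P_CPU = 3621.99         # CPU only

# Assumption: all three architectures serve the same workload,
# taken from the general-purpose rows (operations per second).
OPS_PER_SECOND = 3.48e14


def energy_efficiency(ops_per_second: float, power_w: float) -> float:
    """Energy efficiency in FLOPS/W: operations per second divided by power in watts."""
    return ops_per_second / power_w


ee = {
    "ASIC": energy_efficiency(OPS_PER_SECOND, P_ASIC),
    "CPU+FPGA": energy_efficiency(OPS_PER_SECOND, P_CPU_FPGA),
    "CPU": energy_efficiency(OPS_PER_SECOND, P_CPU),
}

# With a common workload, the EE ratio reduces to the inverse power ratio.
ratio_vs_fpga = ee["ASIC"] / ee["CPU+FPGA"]
ratio_vs_cpu = ee["ASIC"] / ee["CPU"]

for name, value in ee.items():
    print(f"{name}: {value:.3e} FLOPS/W")
print(f"ASIC vs CPU+FPGA: {ratio_vs_fpga:.1f}x")
print(f"ASIC vs CPU: {ratio_vs_cpu:.1f}x")
```

This gives roughly 28x and 260x, in line with the "nearly 30 times" and "more than 200 times" figures stated in the abstract and conclusion.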


Corresponding author: CHEN Bin, bchen63@163.com
