
Citation: | ZHANG Jiewei, WANG Jing, YANG Hongzhi, LI Tong, MAO Zuquan, LIU Xiangyang. Construction of locally piggybacking codes with lower repair bandwidth overhead[J]. Journal of University of Electronic Science and Technology of China. DOI: 10.12178/1001-0548.2024150 |
The current research on Piggybacking codes mainly focuses on the fast repair of single systematic nodes, the repair bandwidth rates of the failed parity nodes remain higher, and fast repair algorithm has not been proposed for multi-node fault repair. For this reason, this paper proposes the construction scheme of locally Piggybacking codes. Specifically, on the basis of Piggybacking framework, two local parity sub-strips are added, the data blocks placed in a staggered manner, the systematic node grouped in sequential piggybacking, to reduce the repair bandwidth overhead and repair degree of the failed nodes. Performance analyses show that, the locally Piggybacking codes constructed in this paper have reduced the repair bandwidth rate of both the systematic nodes and parity nodes prominently, and compared to the existing Piggybacking codes, the performance of the repair degree also has a significant improvement.
随着云计算的发展,数据海量化已经成为一种趋势[1],分布式存储系统因其在数据存储方面的可靠性和高效性,近年来已经被各大互联网公司广泛使用[2]。然而,随着系统存储规模的增大,存储节点故障导致的数据失效越来越常见。在此背景下如何提高分布式存储系统的可靠性,采用性能更好的容错技术成为目前的研究热点。大部分容错技术都是通过增加冗余数据来提高系统的可靠性,产生冗余的方式最常见的有“复制”和“纠删码”策略[3-5]。复制即存储数据的多个副本,最常见的为三副本复制,存储开销较大。相比于复制,纠删码策略有效降低了存储开销,但由于其(n, k)性质,发生节点故障时,需下载原文件大小的数据量,修复带宽开销过大。为了降低故障节点的修复带宽开销,寻找数据节点存储开销和修复带宽开销的最优权衡,2010年文献[6]使用信息流图将网络编码引入到分布式存储系统中,提出了节点存储开销与修复带宽开销的最优权衡曲线,进而提出最小存储再生(minimum storage regenerating, MSR)码和最小带宽再生(minimum bandwidth regenerating, MBR)码。相比纠删码,再生码能够有效降低故障节点的修复带宽,但对于MSR码来说,高码率意味着指数级的子条带数,计算复杂度较高。
针对上述问题,构造能对故障节点进行精确修复、高码率、低复杂度、低修复带宽开销的存储编码成为了研究热点。2013年,文献[7]首次提出了Piggybacking码,并于2017对其进行了进一步完善,其编码满足RBT (repair-by-transfer)属性[8-9],即修复故障节点期间下载的数据量与读取的数据量相同,同时能实现数据的精确修复。文献[10]提出一种可以高效修复信息节点的Piggybacking码,即REPB (repair efficient, piggybacking)码,信息节点的平均修复带宽率最低可达到
2019年,文献[12-13]提出一种“one-to-one”的Piggybacking码(one-to-one pigggybacking, OOP),在进一步降低信息节点修复带宽的同时,考虑了奇偶校验节点的有效修复,但OOP只使用一个子条带对校验节点进行辅助修复,所以校验节点的修复性能依旧不佳。文献[1]构造的RSR-I*、RSR-II*和文献[10]构造的REPB*通过对子条带进行整合,将前一个子条带中校验节点的数据块捎带到后一个子条带的校验节点中,虽然提高了校验节点的修复性能,但导致子条带数目成倍增长,影响到系统的I/O性能。2020年,文献[14-15]通过牺牲容错能力,将信息节点和校验节点进行分层修复,提出双层Piggybacking (double-layered piggybacking, DPB)编码,但其对故障节点的修复度没有改善。2022年,文献[16]将子条带数与校验节点数设为相等值,以此为基础,利用OOP未充分利用到的校验节点进行捎带,进一步降低了修复带宽开销。
针对目前Piggybacking码存在的局限性,本文在Piggybacking框架中加入2个局部校验子条带,构造了一种具有较低修复带宽开销的局部Piggybacking码。所构造的局部Piggybacking码保留了MDS码的优点,加入局部特性,使得故障节点的修复带宽开销更小,同时构造Piggyback函数捎带对应的信息节点数据块,在保证其MDS容错性的同时提出了一种多节点故障的快速修复算法。与现有的Piggybacking码相比,本文构造的局部Piggybacking码在降低信息节点修复带宽开销和修复度的同时降低了校验节点的修复带宽开销,且可以实现双信息节点的快速修复。
Piggybacking框架是一种在现有码的基础上,以现有码为基础码,将部分子条带的数据块经过Piggyback函数形成Piggyback块,之后将Piggyback块嵌入,以设计具有高码率、低子条带数、低修复带宽开销的Piggybacking码。基于存储效率、计算复杂度等多方面考虑,现有的Piggybacking码通常选取MDS码为基础码,同时为了保证其MDS属性,在捎带时只选取子条带之前的数据块形成Piggyback函数。图1为采用MDS码为基础码的Piggybacking框架中校验节点结构,其中,
当出现故障节点时,通过MDS码译码以及其中嵌入的Piggyback块进行修复即可。这一修复方式通过将部分MDS译码替换为Piggyback块译码减少了修复带宽开销。具体的衡量指标包括信息节点平均修复带宽率
γsys=k∑i=1Bikkα,γpar=k+r∑i=k+1Birkα, |
ℑsys=k∑i=1ηikk,ℑpar=k+r∑i=k+1ηirk |
RSR-II码以MDS码为基本码[9],其中信息节点的个数k可以取任意值,校验节点的个数
图3为局部Piggybacking码的一般结构图,包括区域A、B、C共3个区域,其中区域A为信息数据块区域,区域B为全局校验块区域,区域C为局部检验块区域。将区域A中的数据块通过Piggyback函数形成Piggyback块,嵌入到区域B,区域C由区域A和区域B中的数据块通过分组生成局部校验块。设
1) 将信息节点分为t+1个不相交的集合
|Ax|=r−1x=1,2,⋯,t |
|Ax|=y,x=t+1k=t(r−1)+y |
2) 定义长度为k的向量
qj,h=[0...0⏟j−110...0⏟r−210...0⏟r−210......]Tgj,h=[0...0⏟r−210...0⏟r−210...0⏟r−210......]T |
3) 定义Piggyback函数
4) 最后将除第1个以外校验节点的前r−2个子条带和第r+1到第2r−2个子条带分别作为1个局部组,生成局部校验块,局部组内部整体进行错位放置。
图3为具有较低修复带宽开销的局部Piggybacking码的一般性结构图。以(15, 10)MDS码作为基本码,子条带数目
以下是该局部Piggybacking码的故障节点修复过程:1)单信息节点故障:若节点1发生故障,除第
图5为具有较低修复带宽开销的局部Piggybacking码的特殊结构图,即当k可以整除r−1时,此时令
与局部Piggybacking码的一般结构相比,此构造,第1步对信息节点进行分组,第2步生成局部校验块,其余构造步骤与一般结构相同,如图5所示。在此结构中,每个集合的数据块和局部校验块在该集合进行错位循环放置。当节点故障时,仅需要在故障节点所在集合进行修复,无需连接其他集合的节点,可以在一定程度上减少信息节点的修复度及多个信息节点故障的修复复杂度。
以(13, 8)MDS码作为基本码,子条带数目
局部Piggybacking码在单个信息节点故障时,统一利用局部校验块进行修复,在单个校验节点故障时,利用MDS性质和局部校验块进行混合修复。信息节点i
定理1 局部Piggybacking码在单信息节点故障时的平均修复带宽率为:
γsys=2k(r−2)(r−1)+⌈k2⌉2+⌊k2⌋2k2(2r−3) | (1) |
证明:设第i个信息节点故障需要下载的数据块为
γsys=k∑i=1Bik2(2r−3)=[2(r−2)(r−1)+⌈k2⌉]⌈k2⌉+[2(r−2)(r−1)+⌊k2⌋]⌊k2⌋k2(2r−3)=2k(r−2)(r−1)+⌈k2⌉2+⌊k2⌋2k2(2r−3) |
假设校验节点故障,修复算法如下:1)第1个校验节点故障,通过MDS性质修复,即下载所有原始信息数据块修复;2)其余校验节点故障,利用校验节点的局部校验块和MDS属性进行混合修复。
定理2 局部Piggybacking码在单校验节点故障时的平均修复带宽率为:
γpar=2k(r−1)+[2(r−1)2+k+m](r−2)kr(2r−3) | (2) |
式中,
证明:设第i个校验节点,即第k+i个节点故障需要下载的数据块为
γpar=k+r∑i=k+1Bikr(2r−3)=k(2r−3)+2(r−1)2(r−2)+k+(k+m)(r−2)kr(2r−3)=2k(r−1)+[2(r−1)2+k+m](r−2)kr(2r−3) |
考虑到存储系统中2节点故障占比为1.87%,3个或更多节点故障占比仅为0.05%[17],所以本小节主要分析2节点故障的快速修复。对于多节点故障,当故障节点数大于等于3时,使用MDS码译码修复。当故障节点数为2时,分为以下3种情况:1) 2个故障节点都为校验节点时,使用MDS码译码;2) 2个故障节点为1个信息节点和1个校验节点时,修复方式同单节点修复方式;3) 2个故障节点都为信息节点时,若2个故障节点不存在下标相同的数据块,采用局部校验块修复;若2个故障节点在1个局部组中存在j个下标相同的数据块,
定理3 设
则2信息节点故障的最大修复带宽率:
γsysmax=Q1S1+Q2(Z−S1)Zk(2r−3) | (3) |
证明:设两信息节点故障,共有Z(
1) 2信息节点无下标相同数据块,共有
Q1=4(r−1)(r−2)+k |
2) 2信息节点中含有2j个数据块下标相同,
令
Q2=(j+1)k+2(r−1−j)(r−2−j)+j(t+1)+j(r−3)+2(r−1−j)(r−2) |
所以:
γsysmax=Q1S1+Q2(Z−S1)Zk(2r−3) |
对于单信息节点故障,考虑第r个子条带的修复方式与其他子条带的修复方式不同,在
定理4 局部Piggybacking码信息节点的修复度率为:
ℑsys={2(k+r−2)(r−1)+k(k2−2r+2)k2k为偶数且k2≥2(r−2)+12[(3r−5+k2)(k2−r+2)+(2r−k2−4)(2r−3)]k2k为偶数且k2<2(r−2)+12(r−1)(k+r−2)+k+12(k+12−2r+2)+k−12(k−12−2r+2)k2k为奇数且⌊k2⌋≥2(r−2)+1(k+6r−9)(k−2r+5)+(k+6r−11)(k−2r+3)+(4r−6)(8r−2k−16)4k2k为奇数且⌈k2⌉≤2(r−2)+1 | (4) |
证明:当k为偶数且
ℑsys=(k2+k2+r−2)(k2+r−2−k2+1)2k2+ |
2[k2−2(k2+r−2−k2+1)]k2k2=2(k+r−2)(r−1)+k(k2−2r+2)k2 |
当k为偶数且
ℑsys=2[2(r−2)+1+k2+r−2][(k2+r−2)−[2(r−2)+1]+1]k2+2[k2−2[k2+r−2−[2(r−2)+1]+1]][2(r−2)+1]k2=2[(3r−5+k2)(k2−r+2)+(2r−k2−4)(2r−3)]k2 |
定理5 局部Piggybacking码校验节点的修复度率为:
ℑpar = k+(r−1)(k+r−2)kr | (5) |
证明:第1个校验节点故障,需连接前k个信息节点,下载所有信息数据块。其余节点,除第r个条带以外的数据块,访问除第1个校验节点以及故障节点以外的
假设文件大小为M,将M分为
针对信息节点故障时的平均修复带宽率
图7给出了节点的平均修复带宽率,可以看出,本文构造的局部Piggybacking码同时显著降低了信息节点和校验节点的修复带宽率,这是因为本文构造引入了局部修复组的思想。由图7a可以看出,本文构造的局部Piggybacking码在节点取值个数较小时,修复带宽率就相对较低。这是因为在信息节点修复过程中仅使用局部校验块进行修复。由图7b可知,本文构造的校验节点修复带宽率整体随节点个数的增加而降低。这是因为校验节点修复时虽然利用了局部的思想,但第r个子条带中的数据块的修复依旧取决于MDS译码和Piggyback函数。
图8给出了本结构在双信息节点故障时的修复带宽率。现有的Piggybacking算法并未提出2信息节点故障时的快速修复算法,默认其2节点故障修复时采用MDS修复,即修复带宽率为1。本结构下的双信息节点采用顺序修复,在
表1给出了现有的几种Piggybacking码和本文构造的局部Piggybacking码关于子条带数和修复度率的对比,其中局部Piggybacking码信息节点修复度率由定理4给出。从表1可以看出,现有的Piggybacking码信息节点的平均修复度率与k成反比,与r成正比。
Piggybacking编码 | 子条带数α | 修复度率 | |
信息节点ℑsys | 校验节点ℑpar | ||
RSR-II | (2r−3)τ | k+r−1k | k+(r−1)(k+r−1)rk |
OOP | √r−1+r−1 | k+r−1k | k+(r−1)(k+r−1)rk |
SPB | r | (k+r)k−L∑l=1lφlk2 | 1 |
局部Piggybacking码 | 2r−1 | ℑsys | k+(r−1)(k+r−2)kr |
图9给出了信息节点和校验节点平均修复度率的仿真结果。与SPB、RSR-II以及OOP相比,本文所构造的局部Piggybacking码信息节点的平均修复度率显著降低,且最高降低了61.7%。SPB码的校验节点修复带宽率恒为1,而本文所构造的局部Piggybacking码因校验节点的第r个子条带并未生成可以进行局部修复的校验块,所以每个校验节点故障修复时都需要连接所有的信息节点,以至于修复度率大于SPB码的修复度率,但与RSR-II码和OOP相比,其修复度率更低。
分布式存储系统中Piggybacking编码作为一种实用方法,显著降低了故障节点的修复带宽开销。但现有Piggybacking码大部分聚焦在信息节点的快速修复,而校验节点及多节点故障的修复依旧存在修复带宽开销较大的问题。为此,本文引入局部修复的思想,增加2个局部校验条带,同时通过对信息节点进行分组,采用局部组内按序匹配的方式进行捎带,构造的局部Piggybacking码对信息节点的修复度率和修复带宽率都具有显著改善,且同时降低了校验节点的修复带宽率。
[1] |
SIDDIQA A, KARIM A, GANI A. Big data storage technologies: A survey[J]. Frontiers of Information Technology and Electronic Engineering, 2017, 18(8): 1040-1070. DOI: 10.1631/FITEE.1500441
|
[2] |
YAVARI E, ESMAEILI M. Locally repairable codes: Joint sequential-parallel repair for multiple node failures[J]. IEEE Transactions on Information Theory, 2019, 66(1): 222-232.
|
[3] |
LI J, LI B. Demand-aware erasure coding for distributed storage systems[J]. IEEE Transactions on Cloud Computing, 2021, 9(2): 532-545. DOI: 10.1109/TCC.2018.2885306
|
[4] |
LEE O T, KUMAR S D M, CHANDRAN P. Erasure coded storage systems for cloud storage—challenges and opportunities[C]// IEEE International Conference on Data Science and Engineering (ICDSE). Cochin: IEEE, 2016: 1-7.
|
[5] |
PAPAILIOPOULOS D S, DIMAKIS A G, CADAMBE V R. Repair optimal erasure codes through hadamard designs[J]. IEEE Transactions on Information Theory, 2013, 59(5): 3021-3037. DOI: 10.1109/TIT.2013.2241819
|
[6] |
DIMAKIS A G, GODFREY P B, WU Y, et al. Network coding for distributed storage systems[J]. IEEE Transactions on Information Theory, 2010, 56(9): 4539-4551. DOI: 10.1109/TIT.2010.2054295
|
[7] |
RASHMI K V, SHAH N B, RAMCHANDRAN K. A piggybacking design framework for read-and download-efficient distributed storage codes [C]//Proceedings of IEEE International Symposium on Information Theory Istanbul: IEEE, 2013: 331-335.
|
[8] |
SHAH N B, RASHMI K V, KUMAR P V, et al. Distributed storage codes with repair-by-transfer and non-achievability of interior points on the storage-bandwidth tradeoff[J]. IEEE Transactions on Information Theory, 2011, 58(3): 1837-1852.
|
[9] |
RASHMI K, SHAH N B, RAMCHANDRAN K. A piggybacking design framework for readand download-efficient distributed storage codes[J]. IEEE Transactions on Information Theory, 2017, 63(9): 5802-5820.
|
[10] |
YUAN S, HUANG Q, WANG Z. A repair-efficient coding for distributed storage systems under piggybacking framework[J]. IEEE Transactions on Communications, 2018, 66(8): 3245-3254. DOI: 10.1109/TCOMM.2018.2805347
|
[11] |
SHANG G C, GE G. A new piggybacking design for systematic MDS storage codes[J]. Designs, Codes and Cryptography, 2019, 87(12): 2753-2770. DOI: 10.1007/s10623-019-00650-9
|
[12] |
LI G Y, LIN X, TANG X H. An efficient one-to-one piggybacking design for distributed storage systems[J]. IEEE Transactions on Communications, 2019, 67(12): 8193-8205. DOI: 10.1109/TCOMM.2019.2941933
|
[13] |
周悦, 李贵洋, 韩鸿宇等. 一种均衡分配的修复校验节点的Piggybacks捎带设计[J]. 电子学报, 2021, 49(4): 812-816. DOI: 10.12263/DZXB.20200056
ZHOU Y, LI G Y, HAN H Y, et al. A piggybacks design for equally-assigned repair check nodes[J]. Acta Electronica Sinica, 2021, 49(4): 812-816. DOI: 10.12263/DZXB.20200056
|
[14] |
SUN R, LI X, ZHANG L, et al. Distributed storage codes based on double-layered piggybacking framework[J]. IEEE Access, 2020, 8: 150447-150464. DOI: 10.1109/ACCESS.2020.3002824
|
[15] |
SUN R, ZHANG L, LIU J. A new piggybacking design with low-repair bandwidth and complexity[J]. IEEE Communications Letters, 2021, 25(7): 2099-2103. DOI: 10.1109/LCOMM.2021.3071855
|
[16] |
SHI H, HOU H X, HAN Y S, et al. New piggybacking codes with lower repair bandwidth for any single-node failure[C]//The 2022 IEEE International Symposium on Information Theory (ISIT). Espoo: IEEE, 2022: 2601-2606.
|
[17] |
RASHMI K V, SHAH N B, GU D, et al. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the facebook warehouse cluster[C]// In Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems. San Jose: USENIX Association, 2013: 1.
|
[1] | WANG Jing, LI Jinghui, YANG Jiarong, WANG E. Construction of Optimal Locally Repairable Codes Based on RBIBD[J]. Journal of University of Electronic Science and Technology of China, 2023, 52(3): 366-371. DOI: 10.12178/1001-0548.2022158 |
[2] | WANG Jing, TIAN Songtao, LEI Ke, WANG Xianglong, REN Yaqian. Construction of Optimal Locally Repairable Codes Based on Hadamard Matrix[J]. Journal of University of Electronic Science and Technology of China, 2022, 51(6): 856-861. DOI: 10.12178/1001-0548.2022037 |
[3] | WANG Jing, LEI Ke, LI Jiayi, TIAN Songtao, WANG Xianglong. Construction of Group Repairable Codes Based on Non-Uniform Cyclic Coding[J]. Journal of University of Electronic Science and Technology of China, 2022, 51(1): 57-64. DOI: 10.12178/1001-0548.2021013 |
[4] | JIANG Xiao-ping, ZHANG Xian-di. The Wide-Diameter of Circulant Graph of Degree 4[J]. Journal of University of Electronic Science and Technology of China, 2006, 35(4): 560-563. |
[5] | YANG Ting, LUO Guang-chun. A Distributed Intrusion Detection System Repairing Mechanism[J]. Journal of University of Electronic Science and Technology of China, 2005, 34(5): 634-637. |
[6] | SUN Jing, LIU Xiao-ming. A Software Reliability Model with Exponential Repair Time[J]. Journal of University of Electronic Science and Technology of China, 2005, 34(1): 53-56. |
[7] | Tang Yinghui, Liu Yan. Reliability Analysis of a Two Different Units Cold Standby Repairable System with Delay Repair[J]. Journal of University of Electronic Science and Technology of China, 2004, 33(3): 331-333. |
[8] | Liu Hongyan, Li Qi, Yang Dongwei. A Method of Repairing Figure Strokes Based On End-point Direction[J]. Journal of University of Electronic Science and Technology of China, 2001, 30(5): 516-519. |
[9] | Mao Yong, Li Cailiang, Tang yinghui. Reliability Analysis of One Unit Repairable System with Delay Repair[J]. Journal of University of Electronic Science and Technology of China, 2000, 29(5): 545-548. |
[10] | He Mingxing. A Degree of Generalized S-A-proper Mappings and Applications[J]. Journal of University of Electronic Science and Technology of China, 1998, 27(1): 105-108. |
Piggybacking编码 | 子条带数α | 修复度率 | |
信息节点ℑsys | 校验节点ℑpar | ||
RSR-II | (2r−3)τ | k+r−1k | k+(r−1)(k+r−1)rk |
OOP | √r−1+r−1 | k+r−1k | k+(r−1)(k+r−1)rk |
SPB | r | (k+r)k−L∑l=1lφlk2 | 1 |
局部Piggybacking码 | 2r−1 | ℑsys | k+(r−1)(k+r−2)kr |