Semantic-Preserving Pre-Processing Method for C Clone Code

BIAN Yi-xin; ZHAO Song; DU Jun

doi:10.3969/j.issn.1001-0548.2017.06.023

Volume 46 Issue 6

Dec. 2017


BIAN Yi-xin, ZHAO Song, DU Jun. Semantic-Preserving Pre-Processing Method for C Clone Code[J]. Journal of University of Electronic Science and Technology of China, 2017, 46(6): 926-933. doi: 10.3969/j.issn.1001-0548.2017.06.023


1. Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190
2. College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025

Received Date: 2016-09-14
Rev Recd Date: 2017-05-18
Publish Date: 2017-11-30

Abstract

The output of a clone code detection tool cannot be refactored directly, for two reasons: the detection of clone inconsistency related bugs produces false positives, and not all detected clones are suitable for refactoring. The output therefore needs to be pre-processed to reduce false positives in the detection of clone inconsistency defects. This paper proposes a pre-processing approach that combines adaptive K-nearest neighbor clustering with the program dependence graph. First, adaptive K-nearest neighbor clustering and the program dependence graph are used to reduce the false positives of clone inconsistency related bug detection. Then the refactorable clone code is identified to reduce the cost of clone maintenance. The results show that the approach not only effectively prunes the false positives of clone inconsistency related bugs but also bridges the gap between clone code detection and clone refactoring. The method therefore contributes to improving software quality and decreasing the cost of software maintenance.
- adaptive K-nearest neighbor clustering
- clone code
- clone inconsistency related bugs detection
- program dependence graph
- refactoring

References

[1] CAI D, KIM M. An empirical study of long-lived code clones[J]. Fundamental Approaches to Software Engineering, Lecture Notes in Computer Science, 2011, 6603: 432-446. doi: 10.1007/978-3-642-19811-3
[2] YU Dong-qi, PENG Xin, ZHAO Wen-yun. Automatic refactoring method of cloned code using abstract syntax tree and static analysis[J]. Journal of Chinese Computer Systems, 2009, 30(9): 1752-1760. http://d.wanfangdata.com.cn/Periodical/xxwxjsjxt200909012
[3] SU Xiao-hong, MA Pei-jun, WANG Qian, et al. A defects detection tool for C clones: China, [CPBugdetector] V1.0[CP/DK]. 2010.
[4] KIM M, BERGMAN L, LAU T, et al. An ethnographic study of copy and paste programming practices in OOPL[C]//International Symposium on Empirical Software Engineering. Washington DC, USA: IEEE, 2004: 83-92. http://dl.acm.org/citation.cfm?id=1021568
[5] GÖDE N, KOSCHKE R. Frequency and risks of changes to clones[C]//33rd International Conference on Software Engineering. Hawaii, USA: IEEE, 2011: 311-320. http://dl.acm.org/citation.cfm?id=1985836
[6] HIGO Y, KUSUMOTO S, INOUE K. Identifying refactoring opportunities for removing code clones with a metrics-based approach[M]. Hong Kong, China: CreateSpace Independent Publishing Platform, 2011: 1-26.
[7] CHOU A, YANG J, CHELF B, et al. An empirical study of operating systems errors[J]. SIGOPS Operating Systems Review, 2001, 35(1): 73-88. doi: 10.1145/371455
[8] LI Z, LU S, MYAGMAR S, et al. CP-Miner: Finding copy-paste and related bugs in large-scale software code[J]. IEEE Transactions on Software Engineering, 2006, 32(3): 176-192. doi: 10.1109/TSE.2006.28
[9] WANG Qian. Detection of clones and related software defects of C programs via sequential pattern mining[D]. Harbin: Harbin Institute of Technology, 2009. http://cdmd.cnki.com.cn/Article/CDMD-10213-2010030720.htm
[10] MA Pei-jun, BIAN Yi-xin. A clustering method for pruning false positive of clone code detection[C]//International Conference on Mechatronic Sciences, Electric Engineering and Computer (MEC 2013). Shenyang, China: IEEE, 2013: 1917-1920. http://ieeexplore.ieee.org/xpl/abstractCitations.jsp?reload=true&tp=&arnumber=6885366
[11] KAPSER C, GODFREY M W. "Cloning considered harmful" considered harmful[C]//The 13th Working Conference on Reverse Engineering. Washington DC, USA: IEEE, 2006: 19-28. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4023973&filter%3DAND(p_IS_Number%3A4023960)
[12] YANG L, LIU H, NIU Z. Identifying fragments to be extracted from long methods[C]//Asia-Pacific Software Engineering Conference. Washington DC, USA: IEEE, 2009: 43-49. https://www.researchgate.net/publication/224092861_Identifying_Fragments_to_be_Extracted_from_Long_Methods?_sg=zFE8WVMyY5TUH8euZXXxRz8OHjhyBJaaqPEZwZiEdwB2rYk7WBiHy_C9ld9BFJfxcEncbxiKIAVRlsSdBS4PPQ
[13] BIAN Yi-xin, SU Xiao-hong, MA Pei-jun. Identifying accurate refactoring opportunities using metrics[C]//International Conference on Soft Computing Techniques and Engineering Application. Shenyang, China: IEEE, 2013: 141-146. doi: 10.1007/978-81-322-1695-7_17
[14] ALKHALID A, ALSHAYEB M, MAHMOUD S. Software refactoring at the function level using new adaptive K-nearest neighbor algorithm[J]. Advances in Engineering Software, 2010, 41(10-11): 1160-1178. doi: 10.1016/j.advengsoft.2010.08.002
[15] FERRANTE J, OTTENSTEIN K J, WARREN J D. The program dependence graph and its use in optimization[J]. ACM Transactions on Programming Languages and Systems, 1987, 9(3): 319-349. doi: 10.1145/24039.24041


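Clone code refers to multiple code fragments in program source code that are syntactically or semantically similar. It results from programmers' copy-and-paste behavior^[1] and is regarded as the best-known "bad smell" among the kinds of code considered low-quality, hard to understand, and hard to maintain. Studies show that clone code increases the length of the source code and the complexity of the software system, making it harder to maintain, and that it may introduce a large number of defects while also lowering runtime efficiency^[2]. Refactoring can eliminate part of the clone code in a software system and improve program quality. However, the output of a clone code detection tool generally cannot be used directly for refactoring, for two reasons. First, the detection tool used in this paper, CPBugdetector^[3], detects copy-paste related defects in addition to clone code, but its output contains a certain number of false positives, which harm both the accuracy of the copy-paste defect detection results and the subsequent refactoring of the clone code. Second, not all detected clones are suitable for refactoring^[4-6]. To address these problems, this paper proposes a pre-processing method that combines the program dependence graph with adaptive K-nearest neighbor clustering to reduce false positives in the detection of clone inconsistency related defects. It then applies a cost-benefit evaluation to the clone fragments that remain after the defects are removed in order to identify the refactorable clones, with the aim of lowering the cost of maintaining clone code.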

3. Experimental Results and Analysis

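Nine open-source programs written in C were selected as experimental subjects. The clone code detection tool CPBugdetector^[3] was first used to detect clone code and the related defects in these nine programs; the clone detection results are shown in Table 3. The method proposed in this paper was then applied to the test programs to eliminate false positives; the results are shown in Table 4.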

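Table 4. Bugs reported before and after false-positive elimination. "Forgot" = forgot to modify an identifier; "Wrong" = incorrectly modified an identifier. "Before" and "after" give the number of bugs reported before and after false-positive elimination. The last four columns count the false positives the proposed algorithm can handle, grouped by cause (statements whose order can be swapped; inserted or deleted statements of the same structure) and by defect type.

Test program	Forgot: before	Forgot: after	Wrong: before	Wrong: after	Swappable statements: forgot	Swappable statements: wrong	Inserted/deleted statements: forgot	Inserted/deleted statements: wrong
linux 2.6.6/kernel	6	6	4	4	0	0	0	0
linux 2.6.6/arch	38	38	214	179	0	6	0	29
linux 2.6.6/net	23	23	121	85	0	14	0	22
linux/sound/drivers	1	1	3	0	0	0	0	0
unix/make 3.82	0	0	3	3	0	0	0	0
httpd 2.2.2/server	1	1	5	5	0	0	0	0
dovecot 2.0.8	84	83	155	143	0	1	1	11
iptables 1.4.10	5	5	4	4	0	0	0	0
nginx 0.8.15	10	10	74	54	0	0	0	20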

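Table 4 records the number of bugs reported by the detection tool before and after false-positive elimination, separately for the defect types "forgot to modify an identifier" and "incorrectly modified an identifier". The last four columns of Table 4 record the causes and defect types of the false positives that the proposed algorithm is able to eliminate.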
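One of the false-positive causes listed above is a pair of statements whose order can be swapped without changing behavior. The following Python sketch illustrates the underlying idea with a data-dependence test on definition and use sets. It is a simplified illustration under stated assumptions, not the paper's algorithm: the `Stmt` type, the `independent` function, and the sample statements are hypothetical, and control dependence, pointers, and aliasing are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """A statement reduced to the variables it defines and uses (hypothetical model)."""
    text: str
    defs: frozenset
    uses: frozenset


def independent(a: Stmt, b: Stmt) -> bool:
    """Return True when no data dependence links a and b, so they may be swapped."""
    flow = a.defs & b.uses    # b reads a variable that a writes
    anti = a.uses & b.defs    # b overwrites a variable that a reads
    output = a.defs & b.defs  # both statements write the same variable
    return not (flow or anti or output)


# Hypothetical C statements from a clone fragment.
s1 = Stmt("x = a + 1;", frozenset({"x"}), frozenset({"a"}))
s2 = Stmt("y = b + 2;", frozenset({"y"}), frozenset({"b"}))
s3 = Stmt("z = x * 2;", frozenset({"z"}), frozenset({"x"}))

print(independent(s1, s2))  # s1 and s2 touch disjoint variables
print(independent(s1, s3))  # s3 reads x, which s1 writes
```

Under this model, a clone pair that differs only in the order of `s1` and `s2` would not be a real inconsistency, while a reordering of `s1` and `s3` would change the result.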

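Taking Table 3 and Table 4 together, the larger the program under test, the more bugs are detected and the more false positives can be eliminated. Finally, a cost-benefit analysis was applied to the clone code that remains after false-positive elimination to identify the clones suitable for refactoring; the results are shown in Table 5. False-positive elimination increased the number of clones suitable for refactoring, and about 70% of the detected clones are suitable for extraction.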

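Table 5. Clone groups suitable for refactoring.

Test program	Detected clone groups n₁	Clone groups suitable for refactoring (before false-positive elimination)	Clone groups suitable for refactoring n₂ (after false-positive elimination)	n₂/n₁ (%)
Linux 2.6.6/arch	5 534	4 554	4 537	82.6
Linux 2.6.6/sound/drivers	75	61	61	81.3
Unix/make 3.82	68	57	57	83.8
Httpd 2.2.2/server	121	81	81	66.9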
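The last column of the table above is the share of detected clone groups that remain suitable for refactoring after false-positive elimination, n₂/n₁, expressed as a percentage. A minimal Python sketch that recomputes it from the printed counts follows; the function name `refactorable_ratio` is illustrative. The three smaller programs reproduce the printed percentages. The first row, computed from the counts as printed, gives 82.0 rather than 82.6, so one of its figures may have been altered in extraction.

```python
def refactorable_ratio(n1: int, n2: int) -> float:
    """Percentage of detected clone groups (n1) that are suitable for refactoring (n2)."""
    return round(100.0 * n2 / n1, 1)


# (test program, n1, n2) copied from the table above.
rows = [
    ("Linux 2.6.6/arch", 5534, 4537),
    ("Linux 2.6.6/sound/drivers", 75, 61),
    ("Unix/make 3.82", 68, 57),
    ("Httpd 2.2.2/server", 121, 81),
]

for name, n1, n2 in rows:
    print(f"{name}: {refactorable_ratio(n1, n2)}%")
```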


Entity	error	out_putf	lastdirent	buf.previous	buf.error	file->f_pos	lastdirent->d_off	put_usser	count	buf.count	Control attribute: if
12	0	0	2	0	0	0	0	0	0	0	0
13	1	0	0	0	0	0	0	0	0	0	1
14	0	2	0	0	0	0	0	0	0	0	1
15	0	0	2	2	0	0	0	0	0	0	0
16	2	0	0	0	0	0	0	0	0	0	0
17	0	0	0	0	0	0	0	0	0	0	1
18	0	0	0	0	0	2	2	2	0	0	1
19	2	0	0	0	0	0	0	0	2	2	1
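Each row of the matrix above can be read as a feature vector for one entity, which is the kind of input a nearest-neighbor clustering step works on. The Python sketch below finds each entity's single nearest neighbor. It rests on assumptions that this excerpt does not confirm: the values 0, 1, and 2 are treated as plain numeric weights, squared Euclidean distance is used as the dissimilarity measure, and ties are broken by the lower entity number. It is not the adaptive K-nearest neighbor algorithm of [14], only an illustration of how rows of the matrix can be compared.

```python
# Rows copied from the entity-attribute matrix above
# (10 attribute columns followed by the control attribute "if").
MATRIX = {
    12: [0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0],
    13: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    14: [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    15: [0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0],
    16: [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    17: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    18: [0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 1],
    19: [2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 1],
}


def sq_dist(u, v):
    """Squared Euclidean distance between two attribute vectors (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(entity):
    """Return the entity closest to `entity`; ties go to the lower entity number."""
    others = [e for e in MATRIX if e != entity]
    return min(others, key=lambda e: (sq_dist(MATRIX[entity], MATRIX[e]), e))


for entity in MATRIX:
    print(entity, "->", nearest(entity))
```

With these assumptions, entities 12 and 15 are each other's nearest neighbor (both touch `lastdirent`), as are entities 13 and 17.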

Table 3. Clone code detection results.

Test program	C files	Lines of code	Cloned lines	Detected clone groups
linux 2.6.6/kernel	47	30 629	1 887	140
linux 2.6.6/arch	2 363	725 681	133 598	5 534
linux 2.6.6/net	536	333 741	61 585	2 543
linux/sound/drivers	24	12 380	493	75
unix/make 3.82	38	33 864	876	68
httpd 2.2.2/server	44	36 926	2 005	121
dovecot 2.0.8	705	233 113	39 544	2 838
iptables 1.4.10	104	32 497	5 905	288
nginx 0.8.15	150	101 226	8 732	557
