Determination of the Optimal Number of Clusters in K-Means Algorithm

HE Xuansen^{1, 2}, HE Fan³, XU Li¹, FAN Yueping¹

1. School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363
2. College of Information Science and Engineering, Hunan University, Changsha 410082
3. School of Management and Economics, Beijing Institute of Technology, Haidian, Beijing 100081

doi: 10.12178/1001-0548.2021393

Funding: Special Project in Key Fields of Guangdong Provincial Universities (New Generation Information Technology) (2021ZDZX1035)

About the author: HE Xuansen, male, professor; his research interests include statistical signal processing, blind source separation, wireless communication, and machine learning.

Corresponding author: HE Xuansen, E-mail: xshe2010@163.com

CLC number: TP39

Abstract: The K-means clustering algorithm is a classic algorithm in both academia and industry. However, it has two obvious drawbacks: 1) the number of clusters must be known in advance; 2) it is very sensitive to random initialization. To address these two problems, this paper first summarizes the basic steps of the K-means algorithm and analyzes clustering validity. Then, based on the Euclidean distance between data points, it defines the sum of centroid distances between clusters and the sum of distances within clusters, both as functions of the number of clusters k, and constructs a cluster validity evaluation function from them. Finally, following an empirical rule, the optimal number of clusters of a data set is determined by finding the minimum of the cluster validity evaluation function over the possible range of cluster numbers. Simulation results on three UCI data sets, Iris, Seeds, and Wine, show that the proposed function not only accurately reflects the true cluster structure of the data but also effectively suppresses the sensitivity of the algorithm to random initialization. Results from multiple runs of the K-means algorithm also verify the robustness of the cluster validity evaluation function.

Keywords: cluster validity evaluation function; K-means clustering; optimal number of clusters; sum of centroid distances between clusters; sum of distances within clusters
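Figure 1 Bar chart of the mean feature values of the three iris species
Figure 2 f(k) versus the number of clusters k for the Iris data
Figure 3 f(k) versus k for the Iris data over 10 runs
Figure 4 E[f(k)] versus k for the Iris data over 10 runs
Figure 5 Bar chart of the mean feature values of the three wheat varieties
Figure 6 f(k) versus the number of clusters k for the Seeds data
Figure 7 f(k) versus k for the Seeds data over 15 runs
Figure 8 E[f(k)] versus k for the Seeds data over 15 runs
Figure 9 f(k) versus the number of clusters k for the Wine data
Figure 10 f(k) versus k for the Wine data over 15 runs
Figure 11 E[f(k)] versus k for the Wine data over 15 runs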

Table 1 Information on the three UCI data sets

Data set	Number of samples	Number of attributes	True number of clusters
Iris	150	4	3
Seeds	210	7	3
Wine	178	13	3

Table 2 Mean values of the four attributes for the three iris species (unit: cm)

Species	E[x₁]	E[x₂]	E[x₃]	E[x₄]
Setosa	5.006	3.428	1.462	0.246
Versicolor	5.936	2.770	4.260	1.326
Virginica	6.588	2.974	5.553	2.026

Table 3 f(k) versus k for the Iris data

k	f(k)	k	f(k)	k	f(k)
2	0.6469	6	8.4878	10	24.7449
3	0.0831	7	4.6767	11	10.5197
4	0.8377	8	9.6060	12	11.6496
5	3.4491	9	9.4557	−	−

Table 4 E[f(k)] versus k for the Iris data

k	E[f(k)]	k	E[f(k)]	k	E[f(k)]
2	0.4940	6	6.4861	10	21.2041
3	0.2399	7	7.8881	11	17.9314
4	1.0678	8	13.6567	12	18.4198
5	2.7753	9	10.0362	−	−

Table 5 Mean value of each attribute of the Wine data (all classes)

Attribute	x₁	x₂	x₃	x₄	x₅	x₆	x₇
E[x]	13.00	2.34	2.37	19.50	99.74	2.30	2.03
Attribute	x₈	x₉	x₁₀	x₁₁	x₁₂	x₁₃
E[x]	0.36	1.59	5.06	0.96	2.61	746.90

Table 6 f(k) versus k for the Wine data

k	f(k)	k	f(k)	k	f(k)
2	0.6299	7	16.4146	12	50.1047
3	0.0044	8	14.4553	13	45.5446
4	1.0391	9	19.4455	14	35.0423
5	9.5062	10	18.4569	15	44.2383
6	9.5934	11	21.0099	16	78.6142

Table 7 E[f(k)] versus k for the Wine data

k	E[f(k)]	k	E[f(k)]	k	E[f(k)]
2	0.5939	7	11.2073	12	40.0710
3	0.1627	8	16.0550	13	43.3987
4	1.1079	9	19.0451	14	64.5943
5	2.9447	10	22.5938	15	59.2443
6	6.5854	11	36.4730	16	78.6860
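The selection rule (take the k at which the evaluation function is smallest) can be checked directly against the tabulated values. The short Python sketch below only transcribes the numbers from Tables 3, 4, 6 and 7; the helper name `best_k` and the list names are illustrative and do not come from the paper.

```python
# Values transcribed from Tables 3, 4, 6 and 7; index 0 corresponds to k = 2.
iris_single = [0.6469, 0.0831, 0.8377, 3.4491, 8.4878, 4.6767,
               9.6060, 9.4557, 24.7449, 10.5197, 11.6496]
iris_mean = [0.4940, 0.2399, 1.0678, 2.7753, 6.4861, 7.8881,
             13.6567, 10.0362, 21.2041, 17.9314, 18.4198]
wine_single = [0.6299, 0.0044, 1.0391, 9.5062, 9.5934, 16.4146, 14.4553,
               19.4455, 18.4569, 21.0099, 50.1047, 45.5446, 35.0423,
               44.2383, 78.6142]
wine_mean = [0.5939, 0.1627, 1.1079, 2.9447, 6.5854, 11.2073, 16.0550,
             19.0451, 22.5938, 36.4730, 40.0710, 43.3987, 64.5943,
             59.2443, 78.6860]


def best_k(values, k_min=2):
    """Return the k whose tabulated score is the smallest."""
    offset = min(range(len(values)), key=values.__getitem__)
    return k_min + offset


selected = {
    "Iris, single run (Table 3)": best_k(iris_single),
    "Iris, mean over runs (Table 4)": best_k(iris_mean),
    "Wine, single run (Table 6)": best_k(wine_single),
    "Wine, mean over runs (Table 7)": best_k(wine_mean),
}
for name, k in selected.items():
    print(f"{name}: k = {k}")
```

In all four tables the minimum falls at k = 3, which agrees with the true number of clusters listed in Table 1 for Iris and Wine.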

[1]	CHEN M, MAO S, LIU Y. Big data: A survey[J]. Mobile Networks & Applications, 2014, 19: 171-209.
[2]	XU R, WUNSCH II D C. Clustering[J]. IEEE Computational Intelligence Magazine, 2009, 4(3): 92-95. doi: 10.1109/MCI.2009.933101
[3]	PASUPATHI S, SHANMUGANATHAN V, MADASAMY K, et al. Trend analysis using agglomerative hierarchical clustering approach for time series big data[J]. The Journal of Supercomputing, 2021, 77: 6505-6524. doi: 10.1007/s11227-020-03580-9
[4]	ISHIZAKA A, LOKMAN B, TASIOU M. A stochastic multi-criteria divisive hierarchical clustering algorithm[J]. Omega, 2021, 103: 102370.
[5]	YANG M S, SINAGA K P. Collaborative feature-weighted multi-view fuzzy c-means clustering[J]. Pattern Recognition, 2021, 119: 108064. doi: 10.1016/j.patcog.2021.108064
[6]	ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96). Portland: AAAI, 1996: 1-6.
[7]	STEINLEY D. K-means clustering: A half-century synthesis[J]. British Journal of Mathematical and Statistical Psychology, 2006, 59: 1-34. doi: 10.1348/000711005X48266
[8]	ROUX M. A Comparative study of divisive and agglomerative hierarchical clustering algorithms[J]. Journal of Classification, 2018, 35: 345-366. doi: 10.1007/s00357-018-9259-9
[9]	SMIEJA M, WIERCIOCH M. Constrained clustering with a complex cluster structure[J]. Advances in Data Analysis and Classification, 2017, 11: 493-518. doi: 10.1007/s11634-016-0254-x
[10]	MORLINI I, ZANI S. Dissimilarity and similarity measures for comparing dendrograms and their applications[J]. Advances in Data Analysis and Classification, 2012, 6: 85-105. doi: 10.1007/s11634-012-0106-2
[11]	LEONARDI M, GREGORIO L D, FAUSTO D D. Air traffic security: Aircraft classification using ADS-B message’s phase-pattern[J]. Aerospace, 2017, 4(51): 1-13.
[12]	SELIM S Z, ISMAIL M A. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, 6(1): 81-87.
[13]	ZHU E, MA R. An effective partitional clustering algorithm based on new clustering validity index[J]. Applied Soft Computing, 2018, 71: 608-621. doi: 10.1016/j.asoc.2018.07.026
[14]	GORDON A D. Cluster validation[C]//Proceedings of the 5th Conference of the International Federation of Classification Societies (IFCS-96). [S. l. ]: Springer, 1998: 22-39.
[15]	THEODORIDIS S, KOUTROUMBAS K. Pattern recognition[M]. 4th ed. San Diego: Academic Press, 2009.
[16]	JAIN A K, DUBES R C. Algorithms for clustering data[M]. Englewood Cliffs NJ: Prentice Hall, 1988.
[17]	CHEN C H. Handbook of pattern recognition and computer vision[M]. 6th ed. Hackensack: World Scientific Publishing Company, 2020.
[18]	YU D H, GUO M Z, LIU Y, et al. K-medoids clustering algorithm based on distance inequality[J]. Journal of Software, 2017, 28(12): 3115-3128. doi: 10.13328/j.cnki.jos.005237 (in Chinese)
[19]	ZHOU E B, MAO S J, LI M, et al. Research and application of accelerating improved PAM clustering algorithm by GPU[J]. Journal of Geo-Information Science, 2017, 19(6): 782-791. doi: 10.3969/j.issn.1560-8999.2017.06.007 (in Chinese)
[20]	HUANG W T, CHANG Y P. Some empirical Bayes rules for selecting the best population with multiple criteria[J]. Journal of Statistical Planning and Inference, 2006, 136: 2129-2143. doi: 10.1016/j.jspi.2005.08.032
[21]	LI N, QI Q, WANG K H. Deviation analysis of process parameter based on improved empirical rule[J]. Journal of Hainan Normal University (Natural Science), 2016, 29(1): 22-25. (in Chinese)
[22]	BALAKRISHNAN N, MA Y. Empirical Bayes rules for selecting the most and least probable multivariate hypergeometric event[J]. Statistics & Probability Letters, 1996, 27: 181-188.
[23]	GUPTA S S, HSIAO P. Empirical Bayes rules for selecting good populations[J]. Journal of Statistical Planning and Inference, 1983, 8: 87-101. doi: 10.1016/0378-3758(83)90064-2


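In the era of big data^[1], data classification is the foundation of data applications. Because unsupervised classification, or clustering^[2], requires no training on the data, it has been widely applied. Clustering uses multivariate statistical methods to group data of similar nature into the same class directly, based on a similarity or distance measure between the data, and assigns samples that differ substantially to different classes. Cluster analysis involves three kinds of clustering structure: partitional clustering, hierarchical clustering, and individual clusters. Hierarchical clustering is further divided into agglomerative hierarchical clustering^[3] and divisive hierarchical clustering^[4]. Commonly used clustering methods include fuzzy C-means clustering^[5], density-based clustering^[6], and K-means-type clustering^[7].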

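When no prior knowledge is available, the key to data analysis is to find the inherent partitions in the data. Although clustering algorithms can partition data, different algorithms, or the same algorithm with different parameters, produce different partitions or reveal different clustering structures. An objective, quantitative evaluation of clustering results is therefore essential. In other words, cluster validation, that is, determining whether the clustering structure obtained by an algorithm is meaningful, is very important. Hierarchical clustering organizes data into a hierarchy based on a proximity matrix, and its result is usually displayed as a dendrogram^[8]. In contrast, partitional clustering assigns a set of data objects to k clusters without any hierarchical structure^[9], a process usually accompanied by the continual optimization of a criterion function. The most widely used criterion function in partitional clustering algorithms is the sum-of-squared-error criterion^[2]. The partition that minimizes the sum of squared errors is regarded as optimal and is generally called the minimum variance partition^[7]. Clustering means that data objects within the same cluster have high similarity, whereas data in different clusters have high dissimilarity^[10]. Similarity and dissimilarity (or distance) can both be summarized as proximity, which can describe how close data points are to one another, how close clusters are to one another, and how close a data point is to a cluster. The distance most commonly used in cluster analysis is the Euclidean distance; clusters formed with the Euclidean distance are invariant to translations and rotations in the feature space^[11].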
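As a minimal illustration of the two points above, the following Python sketch computes the sum-of-squared-error criterion for a given partition and checks numerically that it is unchanged when the data are rotated and translated. The function names and the toy data are assumptions made for this example only; they are not taken from the paper.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for members in clusters:
        c = centroid(members)
        total += sum(math.dist(p, c) ** 2 for p in members)
    return total


def rotate_translate(point, angle, shift):
    """Rotate a 2-D point about the origin by `angle`, then translate it by `shift`."""
    x, y = point
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (x * cos_a - y * sin_a + shift[0], x * sin_a + y * cos_a + shift[1])


# Toy partition (illustrative data): two clusters of two points each.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(5.0, 5.0), (7.0, 5.0)]]
moved = [[rotate_translate(p, math.pi / 3, (10.0, -4.0)) for p in members]
         for members in clusters]

sse_before = sse(clusters)
sse_after = sse(moved)
print(sse_before, sse_after)  # the two values agree up to rounding error
```

Because a rotation followed by a translation preserves every pairwise Euclidean distance, the criterion value of a fixed partition is the same before and after the transformation.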

4. Conclusion

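To overcome the drawback that the K-means clustering algorithm requires the user to specify the number of clusters in advance, this paper first analyzed the basic iterative steps of the K-means algorithm and clustering validity. Then, based on the Euclidean distance between data points, it defined the sum of centroid distances between clusters and the sum of distances within clusters, which measure the distances between different clusters and within the same cluster, respectively. Finally, it proposed a cluster validity evaluation function constructed from these two sums to determine the optimal number of clusters for a data set. Within the range of possible cluster numbers, the optimal number of clusters for the K-means algorithm is determined by finding the minimum of the cluster validity evaluation function. Simulations on the Iris, Seeds, and Wine data sets from UCI show that the proposed function not only accurately reflects the true cluster structure of the original data but also effectively reduces the sensitivity of the K-means algorithm to random initialization.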
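The procedure summarized above can be sketched in Python as follows. This is a rough sketch under stated assumptions, not the authors' implementation: the exact definitions of the two sums and the form of the evaluation function f(k) are not reproduced in this section, so the code assumes that the between-cluster term is the sum of pairwise Euclidean distances between centroids and that the within-cluster term is the sum of distances from each point to its own centroid, and it stops at these two ingredients. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def assign(points, centroids):
    """Group points by nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd iteration; points are tuples, initial centroids are k random points."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return centroids, assign(points, centroids)


def between_centroid_sum(centroids):
    """Assumed definition: sum of pairwise distances between centroids."""
    return sum(math.dist(a, b) for a, b in combinations(centroids, 2))


def within_cluster_sum(centroids, clusters):
    """Assumed definition: sum of distances from each point to its own centroid."""
    return sum(math.dist(p, c) for c, members in zip(centroids, clusters) for p in members)


def mean_sums(points, k_values, runs=10, seed=0):
    """Average both sums over several randomly initialized runs for each k."""
    rng = random.Random(seed)
    result = {}
    for k in k_values:
        between = within = 0.0
        for _ in range(runs):
            centroids, clusters = kmeans(points, k, rng)
            between += between_centroid_sum(centroids)
            within += within_cluster_sum(centroids, clusters)
        result[k] = (between / runs, within / runs)
    return result


# Toy data (illustrative only): two tight pairs of points far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
profile = mean_sums(points, k_values=[1, 2, 3, 4], runs=5)
for k, (between, within) in profile.items():
    print(k, round(between, 3), round(within, 3))
```

An evaluation function built from the two averaged sums would then be computed for each k in the candidate range, and the k with the smallest value selected. Averaging over several runs mirrors the E[f(k)] curves reported for 10 or 15 runs, which is how the paper assesses robustness to random initialization.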
