时间序列数据挖掘中的聚类研究综述

李海林; 张丽萍

doi:10.12178/1001-0548.2022055

时间序列数据挖掘中的聚类研究综述

doi: 10.12178/1001-0548.2022055

李海林^{1, 2, ,},
张丽萍¹

1.
华侨大学信息管理与信息系统系　福建泉州　362021
2.
华侨大学应用统计与大数据研究中心　福建厦门　361021

基金项目: 国家自然科学基金面上项目(71771094)

详细信息

作者简介:
李海林(1982-)，男，博士，教授，主要从事数据挖掘与人工智能方面的研究

通讯作者: 李海林，E-mail：blihailin@163.com

中图分类号: TP273

Summary of Clustering Research in Time Series Data Mining

LI Hailin^{1, 2
, ,},
ZHANG Liping¹

1.
Department of Information Management and Information Systems, Huaqiao University　Quanzhou Fujian　362021
2.
Research Center for Applied Statistics and Big Data, Huaqiao University　Xiamen Fujian　361021

摘要: 鉴于时间序列数据的高维性和复杂性给数据挖掘带来的困扰以及聚类分析在时间序列数据挖掘领域中的重要性，对目前该领域国内外相关时间序列数据聚类研究的状况进行综述。时间序列聚类总体上可分为整体时间序列聚类、子序列聚类和时间点聚类3种，分别从特征表示、相似性度量、聚类算法和簇原型等方面来研究，同时也结合了具体的应用分析。根据时间序列数据挖掘中聚类存在的主要问题，提出了部分未来值得关注和研究的内容和方向，以便更好地促进时间序列数据聚类分析的研究与发展。
- 聚类分析 /
- 数据挖掘 /
- 高维性 /
- 时间序列 /
- 时间序列聚类
Abstract: In view of the high dimensionality and complexity of time series data bringing trouble to data mining and the importance of clustering analysis in the field of time series data mining, this paper summarizes the research status of time series data clustering at home and abroad. Time series clustering can be divided into the whole-time-series clustering, the subsequence clustering, and it can be studied from the aspects of feature representation, similarity measurement, clustering algorithm and cluster prototype, as well as the specific applications analysis. According to the main problems existed in the time series clustering, this work proposes some contents and directions that are worthy of being researched in the future. All the work is to better promote the research and development of time series data clustering.
- clustering analysis /
- data mining /
- high dimensionality /
- time series /
- time series clustering
图 1 两种方法对整体时间序列数据层次聚类

下载: 全尺寸图片幻灯片

图 2 时间序列数据聚类的主要研究问题

下载: 全尺寸图片幻灯片

图 3 欧氏距离与动态时间弯曲度量

下载: 全尺寸图片幻灯片

图 4 簇平均代表序列

下载: 全尺寸图片幻灯片

[1]	李学龙, 龚海刚. 大数据系统综述[J]. 中国科学: 信息科学, 2015, 45(1): 1-44. LI X L, GONG H G. A survey on big data systems[J]. Scientia Sinica (Information Sciences), 2015, 45(1): 1-44.
[2]	ESLING P, AGON C. Time-series data mining[J]. ACM Computing Surveys, 2012, 45(1): 1-12.
[3]	YANG Q, WU X. 10 challenging problems in data mining research[J]. International Journal of Information Technology & Decision Making, 2006, 5(4): 597-604.
[4]	EZUGWUA A E, IKOTUNA A M, OYELADE O O, et al. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects[J]. Engineering Applications of Artificial Intelligence, 2022, 110: 104743. doi: 10.1016/j.engappai.2022.104743
[5]	CHAO X, KOU G, PENG Y, et al. Large-scale group decision-making with non-cooperative behaviors and heterogeneous preferences: An application in financial inclusion[J]. European Journal of Operational Research, 2021, 288(1): 271-293. doi: 10.1016/j.ejor.2020.05.047
[6]	CHAO X, KOU G, PENG Y, et al. An efficient consensus reaching framework for large-scale social network group decision making and its application in urban resettlement[J]. Information Sciences, 2021, 575: 499-527. doi: 10.1016/j.ins.2021.06.047
[7]	MÖRCHEN F, ULTSCH A, HOOS O. Extracting interpretable muscle activation patterns with time series knowledge mining[J]. International Journal of Knowledge Based Intelligent Engineering Systems, 2005, 9(3): 197-208. doi: 10.3233/KES-2005-9304
[8]	LI H L. Asynchronism-based principal component analysis for time series data mining[J]. Expert Systems with Applications, 2014, 41(C): 2842-2850.
[9]	KEOGH E, LIN J. Clustering of time-series subsequences is meaningless: Implications for previous and future research[J]. Knowledge and Information Systems, 2005, 8(2): 154-177. doi: 10.1007/s10115-004-0172-7
[10]	LIAO T W. Clustering of time series data—a survey[J]. Pattern Recognition, 2005, 38(11): 1857-1874. doi: 10.1016/j.patcog.2005.01.025
[11]	李海林, 梁叶, 王少春. 时间序列数据挖掘中的动态时间弯曲研究综述[J]. 控制与决策, 2018, 33(8): 1345-1353. LI H L, LIANG Y, WANG S C. Review on dynamic time warping in time series data mining[J]. Control and Decision, 2018, 33(8): 1345-1353.
[12]	郭崇慧, 苏木亚. 基于独立成分分析的时间序列谱聚类方法[J]. 系统工程理论与实践, 2011, 31(10): 1921-1931. doi: 10.12011/1000-6788(2011)10-1921 GUO C H, SU M Y. Spectral clustering method based on independent component analysis for time series[J]. Systems Engineering-Theory & Practice, 2011, 31(10): 1921-1931. doi: 10.12011/1000-6788(2011)10-1921
[13]	LAI C P, CHUNG P C, TSENG V S. A novel two-level clustering method for time series data analysis[J]. Expert Systems with Applications, 2010, 37(9): 6319-6326. doi: 10.1016/j.eswa.2010.02.089
[14]	王骏, 王士同, 邓赵红. 聚类分析研究中的若干问题[J]. 控制与决策, 2012, 27(3): 321-328. WANG J, WANG S T, DENG Z H. Survey on challenges in clustering analysis research[J]. Control and Decision, 2012, 27(3): 321-328.
[15]	张旭, 张亮, 金博, 等. 基于不确定性的多元时间序列分类算法研究[J]. 自动化学报, 2021, 48(11): 1-15. ZHANG X, ZHANG L, JIN B, et al. Uncertainty-based multivariate time series classification[J]. Acta Automatica Sinica, 2021, 48(11): 1-15.
[16]	施沫寒, 王志海. 一种基于时间序列特征的可解释步态识别方法[J]. 中国科学:信息科学, 2020, 50(3): 438-460. SHI M H, WANG Z H. An interpretable gait recognition method based on time series features[J]. Science in China (Information Sciences), 2020, 50(3): 438-460.
[17]	SORIANO-VARGAS A, WERNECK R, MOURA R, et al. A visual analytics approach to anomaly detection in hydrocarbon reservoir time series data[J]. Journal of Petroleum Science and Engineering, 2021, 206: 108988. doi: 10.1016/j.petrol.2021.108988
[18]	LI X, LIN J, ZHAO L. Time series clustering in linear time complexity[J]. Data Mining and Knowledge Discovery, 2021, 35: 2369-2388. doi: 10.1007/s10618-021-00798-w
[19]	孙冬璞, 曲丽. 时间序列特征表示与相似性度量研究综述[J]. 计算机科学与探索, 2021, 15(2): 195-205. doi: 10.3778/j.issn.1673-9418.2003063 SUN D P, QU L. Survey on feature representation and similarity measurement of rime series[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(2): 195-205. doi: 10.3778/j.issn.1673-9418.2003063
[20]	李海林. 时间序列数据挖掘中的特征表示与相似性度量方法研究[D]. 大连: 大连理工大学, 2012. LI H L. Research on feature representation and similarity measure methods in time series data mining[D]. Dalian: Dalian University of Technology, 2012.
[21]	LI H L. Accurate and efficient classification based on common principal components analysis for multivariate time series[J]. Neurocomputing, 2016, 171(1): 744-753.
[22]	范晓诗, 雷英杰, 路艳丽, 等. 基于DTW的长期直觉模糊时间序列预测模型[J]. 通信学报, 2016, 37(8): 95-104. doi: 10.11959/j.issn.1000-436x.2016160 FAN X S, LEI Y J, LU Y L, et al. Long-term intuitionistic fuzzy time series forecasting model based on DTW[J]. Journal on Communications, 2016, 37(8): 95-104. doi: 10.11959/j.issn.1000-436x.2016160
[23]	江艺羡. 基于GM(1, 1)模型的时间序列分段表示方法[J]. 系统工程, 2014(7): 137-142. JIANG Y X. Method of time series piecewise representation based on model GM(1, 1)[J]. Systems Engineering, 2014(7): 137-142.
[24]	SHIEH J, KEOGH E. ISAX: Indexing and mining terabyte sized time series[C]//Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2008: 623-631.
[25]	FUCHS E, GRUBER T, NITSCHKE J, et al. On-line segmentation of time series based on polynomial least-squares approximation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(12): 2232-2245. doi: 10.1109/TPAMI.2010.44
[26]	D'URSO P, GIOVANNI L D, MASSARI R. GARCH-based robust clustering of time series[J]. Fuzzy Sets and Systems, 2016, 305(15): 1-28.
[27]	KARAMITOPOULOS L, EVANGELIDIS G. PCA-based time series similarity search[J]. Data Mining Annals of Information Systems, 2010, 8: 255-276.
[28]	WENG X, SHEN J. Classification of multivariate time series using two dimensional singular value decomposition[J]. Knowledge-Based Systems, 2008, 21(7): 535-539. doi: 10.1016/j.knosys.2008.03.014
[29]	ZOU Y, DONNER R V, MARWAN N, et al. Complex network approaches to nonlinear time series analysis[J]. Physics Reports, 2019, 787: 1-97. doi: 10.1016/j.physrep.2018.10.005
[30]	ZAKARIA J, MUEEN A, KEOGH E. Clustering time series using unsupervised-shapelets[C]//Proceedings of IEEE 12th International Conference on Data Mining (ICDM). [S.l.]: IEEE, 2012: 785-794.
[31]	谢聪慧, 吴世新, 张晨, 等. 基于谱系聚类的全球各国新冠疫情时间序列特征分析[J]. 地球信息科学学报, 2021, 23(2): 236-245. doi: 10.12082/dqxxkx.2021.200470 XIE C H, WU S X, ZHANG C, et al. Analysis of time series features of COVID-19 in various countries based on pedigree clustering[J]. Journal of Geo-Information Science, 2021, 23(2): 236-245. doi: 10.12082/dqxxkx.2021.200470
[32]	AGRAWAL R, FALOUTSOS C, SWAMI A. Efficient similarity search in sequence databases[J]. Foundation of Data Organization Algorithms, 1993, 46: 69-84.
[33]	LI H L. Time works well: Dynamic time warping based on time weighting for time series data mining[J]. Information Sciences, 2021, 547: 592-608. doi: 10.1016/j.ins.2020.08.089
[34]	PETITJEAN F, FORESTIER G, WEBB G, et al. Dynamic time warping averaging of time series allows faster and more accurate classification[C]//Proceedings of the 2014 IEEE International Conference on Data Mining (ICDM). [S.l.]: IEEE, 2014: 470-479.
[35]	PAPARRIZOS J, GRAVANO L. Fast and accurate time-series clustering[J]. ACM Transactions on Database Systems, 2017, 42(2): 1-49.
[36]	KEOGH E, PAZZANI M. An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback[C]//Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining. New York: ACM, 1998: 239-241.
[37]	HAUTAMÄKI V, NYKÄNEN P, FRÄNTI P. Time-series clustering by approximate prototypes[C]//Proceedings of the 19th International Conference on Pattern Recognition. Piscataway: IEEE, 2008: 1-4.
[38]	FAYYAD U M, REINA C, BRADLEY P S. Initialization of iterative refinement clustering algorithms[C]//Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. New York: ACM, 1998: 194-198.
[39]	BERINGER J, HÜLLERMEIER E. Online clustering of parallel data streams[J]. Data & Knowledge Engineering, 2006, 58(2): 180-204.
[40]	LI H L, WEI M. Fuzzy clustering based on feature weights for multivariate time series[J]. Knowledge-Based Systems, 2020, 197: 1-11.
[41]	CORDUAS M, PICCOLO D. Time series clustering and classification by the autoregressive metric[J]. Computational Statistics & Data Analysis, 2008, 52(4): 1860-1872.
[42]	CHANDRAKALA S, SEKHAR C C. A density based method for multivariate time series clustering in kernel feature space[C]//Proceedings of IEEE International Joint Conference on Neural Networks. Piscataway: IEEE, 2008: 1885-1890.
[43]	WANG W, YANG J, MUNTZ R. STING: A statistical information grid approach to spatial data mining[C]// Proceedings of the International Conference on Very Large Data Bases. New York: ACM, 1997: 186-195.
[44]	ZHANG X, LIU J, DU Y, et al. A novel clustering method on time series data[J]. Expert Systems with Applications, 2011, 38(9): 11891-11900. doi: 10.1016/j.eswa.2011.03.081
[45]	JASTRZEBSKAA A, NÁPOLESB G, SALGUEIRO Y, et al. Evaluating time series similarity using concept-based models[J]. Knowledge-based Systems, 2022, 238(28): 107811.
[46]	LI H, LIU Z. Multivariate time series clustering based on complex network[J]. Pattern Recognition, 2021, 115: 107919. doi: 10.1016/j.patcog.2021.107919
[47]	LI H, DU T. Multivariate time-series clustering based on component relationship networks[J]. Expert Systems with Applications, 2021, 173: 114649. doi: 10.1016/j.eswa.2021.114649
[48]	BAGNALL A, JANACEK G. Clustering time series with clipped data[J]. Machine Learning, 2005, 58(2-3): 151-178. doi: 10.1007/s10994-005-5825-6
[49]	COPPI R, D’URSO P, GIORDANI P. Fuzzy c-medoids clustering models for time-varying data[M]//BERNADETTE B M, GIULIANELLA C, RONALD R Y. Modern Information Processing: From Theory Applications. [S.l.]: Elsevier, 2006: 195-206.
[50]	PETITJEAN F, KETTERLIN A, GANÇARSKI P. A global averaging method for dynamic time warping with applications to clustering[J]. Pattern Recognition, 2011, 44(3): 678-693. doi: 10.1016/j.patcog.2010.09.013
[51]	闫相斌, 李一军, 邹鹏, 等. 动静态属性数据相结合的客户分类方法研究[J]. 中国管理科学, 2005, 13(2): 95-100. doi: 10.3321/j.issn:1003-207X.2005.02.016 YAN X B, LI Y J, ZOU P, et al. Customer segmentation based on integration of dynamic and static attributes[J]. Chinese Journal of Management Science, 2005, 13(2): 95-100. doi: 10.3321/j.issn:1003-207X.2005.02.016
[52]	CAI L, QU S, YUAN Y, et al. A clustering-ranking method for many-objective optimization[J]. Applied Soft Computing, 2015, 35: 681-694. doi: 10.1016/j.asoc.2015.06.020
[53]	王德青, 何凌云, 朱建平. 基于函数型自适应聚类的股票收益波动模式比较[J]. 统计研究, 2018, 35(9): 79-91. WANG D Q, HE L Y, ZHU J P. Comparing returns volatility patterns of SSE50 stock shares via functional adaptive clustering[J]. Statistical Research, 2018, 35(9): 79-91.
[54]	LI H, WU Y J, CHEN Y. Time is money: Dynamic-model-based time series data-mining for correlation analysis of commodity sales[J]. Journal of Computational and Applied Mathematics, 2020, 370: 1-20.
[55]	刘乃龙, 周晓东, 刘钊铭, 等. 基于多变量时间序列的接触状态聚类分析[J]. 电子科技大学学报, 2020, 49(5): 660-665. doi: 10.12178/1001-0548.2020192 LIU N L, ZHOU X D, LIU Z M, et al. Contact state clustering analysis based on multivariate time series[J]. Journal of University of Electronic Science and Technology of China, 2020, 49(5): 660-665. doi: 10.12178/1001-0548.2020192
[56]	李乃和, 汤世强. 特大型连续服务供应商的顾客细分与新顾客归类判别研究[J]. 中国管理科学, 2015(S1): 602-609. LI N H, TANG S Q. Research on customers' segmentation and new comers' classification by super-large scale continuous service provider[J]. Chinese Journal of Management Science, 2015(S1): 602-609.
[57]	VÍCTOR R, HÉCTOR D, DAVID C. Analysing temporal performance profiles of UAV operators using time series clustering[J]. Expert Systems with Applications, 2017, 70(15): 103-118.
[58]	鲁峰, 黄金泉. 基于灰色关联聚类的特征提取算法[J]. 系统工程理论与实践, 2012, 32(4): 872-876. doi: 10.3969/j.issn.1000-6788.2012.04.025 LU F, HUANG J Q. Feature extraction algorithm of clustering based on grey relational theory[J]. Systems Engineering-Theory & Practice, 2012, 32(4): 872-876. doi: 10.3969/j.issn.1000-6788.2012.04.025
[59]	王浩, 李国欢, 姚宏亮, 等. 基于影响力计算模型的股票网络社团划分方法[J]. 计算机研究与发展, 2014, 51(10): 2137-2147. doi: 10.7544/issn1000-1239.2014.20130575 WANG H, LI G H, YAO H L, et al. Stock network community detection method based on influence calculating model[J]. Journal of Computer Research and Development, 2014, 51(10): 2137-2147. doi: 10.7544/issn1000-1239.2014.20130575
[60]	CHOUDHURY S, GHOSH S, BHATTACHARYA A, et al. A real time clustering and SVM based price-volatility prediction for optimal trading strategy[J]. Neurocomputing, 2014, 131: 419-426. doi: 10.1016/j.neucom.2013.10.002
[61]	AGHABOZORGI S, YING W T. Stock market co-movement assessment using a three-phase clustering method[J]. Expert Systems with Applications, 2014, 41(4): 1301-1314. doi: 10.1016/j.eswa.2013.08.028

[1]	杨骏, 敬思远, 钟勇. 面向时间序列有序分类的Shapelet抽取算法 . 电子科技大学学报, 2023, 52(6): 887-896. doi: 10.12178/1001-0548.2022278
[2]	李海林, 贾瑞颖, 谭观音. 基于K-Shape的时间序列模糊分类方法 . 电子科技大学学报, 2021, 50(6): 899-906. doi: 10.12178/1001-0548.2020380
[3]	郑文, 赵偲, 李泽堃, 武啸泽, 胡涛. 基于Web数据挖掘的COVID-19流行病学特征分析 . 电子科技大学学报, 2020, 49(3): 408-414. doi: 10.12178/1001-0548.2020079
[4]	张林兵, 郭强, 吴行斌, 梁耀洲, 刘建国. 基于多维行为分析的用户聚类方法研究 . 电子科技大学学报, 2020, 49(2): 315-320. doi: 10.12178/1001-0548.2018212
[5]	杨文忠, 张志豪, 吾守尔·斯拉木, 温杰彬, 富雅玲, 王丽花, 王婷. 基于时间序列关系的GBRT交通事故预测模型 . 电子科技大学学报, 2020, 49(4): 615-621. doi: 10.12178/1001-0548.2019151
[6]	刘乃龙, 周晓东, 刘钊铭, 崔龙. 基于多变量时间序列的接触状态聚类分析 . 电子科技大学学报, 2020, 49(5): 660-665. doi: 10.12178/1001-0548.2020192
[7]	陈莉, 刘弘. 基于跨本体语义相关的三维模型检索方法 . 电子科技大学学报, 2017, 46(4): 585-590. doi: 10.3969/j.issn.1001-0548.2017.04.018
[8]	李海林, 万校基. 基于簇中心群的时间序列数据分类方法 . 电子科技大学学报, 2017, 46(3): 625-630. doi: 10.3969/j.issn.1001-0548.2017.03.024
[9]	高嵘, 王强, 罗东, 秦志光. 食品溯源时序数据的函数型聚类分析 . 电子科技大学学报, 2012, 41(4): 561-563,591. doi: 10.3969/j.issn.1001-0548.2012.04.016
[10]	杨珺, 曹阳, 马秦生, 王敏. 人工免疫行为轮廓取证分析方法 . 电子科技大学学报, 2010, 39(6): 911-914,919. doi: 10.3969/j.issn.1001-0548.2010.06.022
[11]	王金龙, 徐从富. 研究者出版记录中的会议序列提取研究 . 电子科技大学学报, 2010, 39(1): 110-113. doi: 10.3969/j.issn.1001-0548.2010.01.025
[12]	武斌, 杨义先, 郑康锋. 入侵检测中基于序列模式的告警关联分析 . 电子科技大学学报, 2009, 38(3): 415-418. doi: 10.3969/j.issn.1001-0548.2009.03.023
[13]	梁冰, 刘群. 基于自动机模型数据关联性能评估算法 . 电子科技大学学报, 2008, 37(4): 606-609,629.
[14]	王金龙, 徐从富, 徐娇芬, 骆国靖. 利用销售数据的商品影响关系挖掘研究 . 电子科技大学学报, 2007, 36(6): 1282-1285.
[15]	朵春红, 王翠茹. 网格和密度的聚类算法在CRM中的应用 . 电子科技大学学报, 2007, 36(6): 1289-1291,1314.
[16]	周巧临, 傅彦. 科学数据时间序列的预测方法 . 电子科技大学学报, 2007, 36(6): 1260-1263.
[17]	耿技, 印鉴. 改进的共享型最近邻居聚类算法 . 电子科技大学学报, 2006, 35(1): 70-72.
[18]	文琪, 彭宏. 小波变换的离群时序数据挖掘分析 . 电子科技大学学报, 2005, 34(4): 556-558.
[19]	曾异平, 朱宏, 龙文光. 趋势分析中的一种增量数据挖掘方法研究 . 电子科技大学学报, 2004, 33(4): 464-467.
[20]	李秀森, 韩静轩, 马力. 增长因素为聚类变量的因素分析 . 电子科技大学学报, 2002, 31(2): 204-206.

点击查看大图

图(4)

计量

文章访问数: 6917
HTML全文浏览量: 2159
PDF下载量: 171
被引次数: 0

全文HTML

大数据背景下，数据挖掘与分析成为信息处理和知识管理等相关学科领域重点关注的研究对象^[1]。在各种复杂数据类型中，广泛存在于金融市场和工业工程等领域的时间序列是一种与时间密切相关的数据，根据变量属性维度的大小其可分为单变量和多变量两种时间序列。相应地，时间序列数据挖掘是从时间序列数据库中发现信息与知识的理论与方法，为帮助政府和企业管理者在相关领域中提供更为可靠的辅助决策与技术支持^[2]。时间序列的高维性具有时间维度长、属性变量多、数据体量大等特征，给传统数据挖掘技术的实施带来了极大困扰，在一定程度上阻碍了其在时间序列数据分析领域中的应用与发展。因此，运用数据挖掘技术从高维时间序列数据中发现信息和知识成为了数据分析领域中具有挑战性且最主要的研究方向之一^[3]。

传统时间序列数据分析主要基于某种数据分布假设，再选取和制定计量经济模型来对时间序列数据预测分析。在大数据时代，除了需要传统的统计模型对时间序列数据进行预测与分析之外，鉴于时间序列数据具有时间维度长、属性变量多和数据体量大等高维性特征，借助机器学习、模式识别、智能计算和数据挖掘等模型和算法对高维时间序列数据可以进行深入研究与挖掘。聚类是数据挖掘相关研究和应用中非常重要的方法，涉及计算机科学、模式识别、人工智能和机器学习等多个研究领域，同时也常被用于教育、营销、医学和生物信息学等学科，在大数据、人工智能和机器人等热点领域有突出贡献^[4]。如在大规模群体决策中，聚类分析被用于划分大规模群体、处理非合作行为和社区发现等^[5-6]。聚类分析也是一项重要而且基础的工作，其过程包括了时间序列的数据表达、特征提取、相似性度量以及具体聚类模型与算法等。为此，本文对时间数据挖掘中的聚类分析进行综述研究，首先介绍了目前时间序列聚类方法分类，然后分别从特征表示、相似性度量、聚类算法和簇原型等方面进行国内外研究状况分析，最后分析了目前研究存在的不足，同时给出了未来的研究方向。

3. 主要问题与未来研究方向

时间序列数据聚类研究主要集中于整体序列数据对象，在数据降维与特征表示、相似性度量、聚类模型与算法、簇原型和应用等方法上取得了一定的学术进展，拥有应用价值，但仍存在一些问题有待探讨，以便系统性地研究和提高聚类分析在时间序列数据资料中的挖掘质量和应用性能。

1) 传统时间序列聚类模型与算法主要以一元时间序列数据为研究对象，通过数据转换、特征表示或模型参数实现时间维度的降维，利用数据挖掘中经典聚类方法进行分析，缺少兼顾时间序列数据时间维度长、属性变量多和数据体量大等高维性的问题。特别地，数据体量大造成算法在运行期间需要消耗巨大的内存空间，使得以静态处理方式为基础的传统聚类算法在此类高维时间序列数据集中无法得到有效地运行。因此，根据高维时间序列数据自身的特点，需要研究适用于高维时间序列数据的实时、动态或者增量运行的聚类算法。

2) 基于原始数据的时间序列距离度量通常需要较大的时间复杂度和空间复杂度，并且大部分距离度量方法对数据中的扭曲数据较为敏感，使其在计算过程中无法获得直观的度量效果。数据降维后的特征表示在一定程度上能够改善此种境况，降低了聚类算法和模型的复杂性，但特征表示对高维原始时间序列数据的精确定位难以实现，最终影响聚类模型与算法的精度。针对基于特征表示和相似性度量方法的高维时间序列数据聚类模型与算法研究，需要改变传统方法仅对高维数据进行某个特定维度上的数据降维和特征表示，制定适用于特征表示后由于信息丢失而造成距离度量不准确的情况，进而提升相关模型与算法的聚类效果。

3) 动态时间弯曲是时间序列数据挖掘领域中最为常用的相似性度量方法，它能有效地匹配时间序列数据中的近似形态趋势，对时间点异常数据不敏感，能够度量不等长时间序列之间的相似性，具有较好的度量质量和较高的准确性等优势。在高维时间序列数据库中，过长的时间维度、过多的属性变量和过大的数据体量造成动态时间弯曲方法容易平滑时间序列局部形态的特点，不能有效反映高维时间序列数据对象之间的形态变化关系，无法实现真实距离的有效度量，进而影响基于动态时间弯曲的高维时间序列聚类效果。如何提高动态时间弯曲方法的精确度量和计算性能是高维时间序列聚类研究中需要解决的问题之一。

4) 关于时间序列聚类的研究目前大多数集中在提升特征表示、距离度量和簇原型的质量或效率上，对于聚类本身的设计与研究相对不足。虽然已有学者利用多步聚类方法在一定程度上改进了传统聚类算法在时间序列数据中的分析效果，但也存在步骤繁琐、聚类结果易受参数设置影响、计算性能较低等问题。同时，由于时间序列数据高维性和其他特征因素的影响，聚类方法在相关应用中大多数局限于变量属性少、时间维度短和数据量少等对象的分析，较少考虑不确定性和高噪声等因素的影响，使得聚类分析理论和方法在实际应用中具有局限性。为此，通过总结现有的时间序列聚类算法优缺点，结合具体问题中的数据特征，在考虑多种特征因素影响的情况下来构建符合高维时间序列数据的高性能聚类算法值得深入研究。

4. 结束语

本文梳理了目前常用的时序聚类算法，综述了该领域中的相关研究成果，归纳了已有研究存在的不足，提出了一些值得研究的方向。研究发现，时间序列数据挖掘中聚类模型与算法的研究顺应了大数据时代潮流，解决了高维性给传统时间序列聚类分析带来不能快速有效挖掘的问题，提高和拓展了时间序列数据挖掘领域中的相关理论和方法。时间序列数据聚类研究成果能给政府部门和企业对相关事务决策提供更为完备成熟的理论基础与技术，以便进行更为科学合理的智能管理。

参考文献 (61)

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

时间序列数据挖掘中的聚类研究综述

doi: 10.12178/1001-0548.2022055

作者简介:
李海林(1982-)，男，博士，教授，主要从事数据挖掘与人工智能方面的研究

通讯作者: 李海林，E-mail：blihailin@163.com

Summary of Clustering Research in Time Series Data Mining

计量

时间序列数据挖掘中的聚类研究综述

doi: 10.12178/1001-0548.2022055

1. 华侨大学信息管理与信息系统系　福建泉州　362021

2. 华侨大学应用统计与大数据研究中心　福建厦门　361021

作者简介:
李海林(1982-)，男，博士，教授，主要从事数据挖掘与人工智能方面的研究

通讯作者: 李海林，E-mail：blihailin@163.com

English Abstract

Summary of Clustering Research in Time Series Data Mining

1. Department of Information Management and Information Systems, Huaqiao University　Quanzhou Fujian　362021

2. Research Center for Applied Statistics and Big Data, Huaqiao University　Xiamen Fujian　361021

全文HTML

2.1. 研究地位

2.2. 数据降维与特征表示

2.3. 相似性度量

2.4. 聚类算法

2.5. 簇原型

2.6. 应用研究

目录

期刊在线

编辑办公

友情链接

留言板

时间序列数据挖掘中的聚类研究综述

doi: 10.12178/1001-0548.2022055

作者简介: 李海林(1982-)，男，博士，教授，主要从事数据挖掘与人工智能方面的研究

通讯作者: 李海林，E-mail：blihailin@163.com

Summary of Clustering Research in Time Series Data Mining

计量

出版历程

时间序列数据挖掘中的聚类研究综述

doi: 10.12178/1001-0548.2022055

1. 华侨大学信息管理与信息系统系 福建 泉州 362021 2. 华侨大学应用统计与大数据研究中心 福建 厦门 361021

作者简介: 李海林(1982-)，男，博士，教授，主要从事数据挖掘与人工智能方面的研究

通讯作者: 李海林，E-mail：blihailin@163.com

English Abstract

Summary of Clustering Research in Time Series Data Mining

1. Department of Information Management and Information Systems, Huaqiao University Quanzhou Fujian 362021 2. Research Center for Applied Statistics and Big Data, Huaqiao University Xiamen Fujian 361021

全文HTML

2.1. 研究地位

2.2. 数据降维与特征表示

2.3. 相似性度量

2.4. 聚类算法

2.5. 簇原型

2.6. 应用研究

目录

期刊在线

编辑办公

友情链接

作者简介:
李海林(1982-)，男，博士，教授，主要从事数据挖掘与人工智能方面的研究

1. 华侨大学信息管理与信息系统系　福建泉州　362021

2. 华侨大学应用统计与大数据研究中心　福建厦门　361021

作者简介:
李海林(1982-)，男，博士，教授，主要从事数据挖掘与人工智能方面的研究

1. Department of Information Management and Information Systems, Huaqiao University　Quanzhou Fujian　362021

2. Research Center for Applied Statistics and Big Data, Huaqiao University　Xiamen Fujian　361021