面向时间序列有序分类的Shapelet抽取算法

Shapelet Extraction Algorithm for Time Series Ordinal Classification

  • 摘要: 当前面向时间序列有序分类的Shapelet抽取算法,首先计算Shapelet与时间序列之间的欧式距离及其类别标签之间的距离,然后根据两种距离的皮尔逊相关系数或斯皮尔曼相关系数来对Shapelet进行评价,效率较低。针对该问题,提出一种基于SAX表示时间序列的Shapelet评价指标CD-Cover,该指标同时考虑Shapelet对时间序列数据集的覆盖集中度和覆盖优势度。其次,提出一种基于随机采样的Shapelet抽取算法,该算法采用布隆过滤器对候选Shapelet进行预剪枝,采用移除自相似策略对抽取结果进行后剪枝。在11个时间序列公开数据集上的实验结果表明,相比现有方法,该算法抽取的Shapelet具有更好的有序分类能力,且算法的计算效率也更高。

     

    Abstract: The current Shapelet extraction algorithm for time series ordinal classification, which suffers from low efficiency, needs to figure out the Pearson's correlation coefficient or the Spearman's correlation coefficient between the Euclidean distances and the label distances from time series to Shapelets to evaluate the Shapelets. To handle this problem, this paper first proposes a Shapelet measure CD-Cover (concentration and dominance of coverage) based on the SAX (symbolic aggregate approximation)-represented time series. The measure takes into account both the concentration and the dominance of coverage of a Shapelet on the time series dataset. Secondly, this paper also proposes a Shapelet extraction algorithm based on random sampling. The algorithm uses the Bloom filter to pre-prune Shapelet candidates and employs a strategy of removing self-similar Shapelets to post-prune the extracting results. Experimental results on 11 time series public datasets show that the Shapelet extracted by the proposed algorithm has better ability for ordinal classification than the existing methods, and meanwhile, the computing efficiency of the proposed algorithm is superior to that of the existing methods.

     

/

返回文章
返回