基于网格划分加权的分布式离群点检测算法

A Weighted Distributed Outlier Detection Algorithm Based on Grid Partition

  • 摘要: 分布式计算被广泛应用于离群点检测问题,但分布式环境中节点计算性能的差异带来了数据计算性能的下降问题。针对面向大尺度高维数据离群点分布式计算的负载均衡问题,该文提出了一种加权分布式离群点检测方法。首先根据数据节点的计算性能确定数据节点的权值,然后将数据空间划分为若干个网格,最后设计了一种基于网格划分的加权分配算法WGBA,将这些网格分配到数据节点中,实现并行计算。实验验证了该方法的有效性。

     

    Abstract: Outlier detection as one of the hot issues in data mining area aims to discover the objects with abnormal behaviors from the original data distribution. And it can generate many valuable applications, e.g., bank fraud, network instruction and etc. Currently, distributed computing has been widely applied in outlier detection. However, it still brings the lower performance of data computing since there are computing differences in compute nodes of distributed environment. To solve the problem of load balancing in distributed computing-based outlier detection with respect to large scale and high dimensional data, a weighted distributed outlier detection method has been proposed. First, we tend to ascertain the weight of data node based on computing performance of data node, whereafter dividing the data space into several grids. At last, for the purpose of parallel computing, a weighted grid-based allocation algorithm based on grid dividing is proposed, which allocates the grids to configured data nodes. The extensive experiments verify the effectiveness of proposed method, and demonstrate its better performance.

     

/

返回文章
返回