Abstract:
Outlier detection, an active topic in data mining, aims to discover objects whose behavior deviates from the underlying data distribution. It supports many valuable applications, e.g., bank fraud detection and network intrusion detection. Distributed computing is now widely applied to outlier detection. However, performance still suffers because the compute nodes in a distributed environment differ in computing capability, which leads to load imbalance. To address load balancing in distributed outlier detection on large-scale, high-dimensional data, we propose a weighted distributed outlier detection method. First, we determine the weight of each data node from its computing performance, and then divide the data space into several grids. Finally, to enable parallel computation, we propose a weighted grid-based allocation algorithm, built on this grid division, that allocates the grids to the configured data nodes. Extensive experiments verify the effectiveness of the proposed method and demonstrate its improved performance.
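The three steps named in the abstract (node weighting, grid division, weighted grid allocation) can be illustrated with a minimal Python sketch. It is not the algorithm of the paper, whose details the abstract does not give. It assumes that a node's weight is its share of a measured performance score, that the grid is an equal-width partition with a single cell size, and that grids are assigned by a greedy heuristic that always gives the next largest grid to the node with the smallest load-to-weight ratio. The names `node_weights`, `grid_counts`, `allocate`, and `cell_size` are hypothetical.

```python
import heapq
from collections import Counter


def node_weights(scores):
    """Turn per-node performance scores (assumed positive) into weights summing to 1."""
    total = sum(scores.values())
    return {node: s / total for node, s in scores.items()}


def grid_counts(points, cell_size):
    """Count the points falling in each cell of an equal-width grid (an assumed scheme)."""
    counts = Counter()
    for p in points:
        cell = tuple(int(x // cell_size) for x in p)
        counts[cell] += 1
    return counts


def allocate(counts, weights):
    """Greedy assumption: largest grid first, to the node with the lowest load/weight."""
    heap = [(0.0, node) for node in sorted(weights)]
    heapq.heapify(heap)
    loads = {node: 0 for node in weights}
    assignment = {}
    # Sort by descending point count, breaking ties by cell index for determinism.
    for cell, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        _, node = heapq.heappop(heap)
        assignment[cell] = node
        loads[node] += n
        # Re-insert the node keyed by its load normalized by its weight.
        heapq.heappush(heap, (loads[node] / weights[node], node))
    return assignment, loads
```

Under these assumptions, a node with three times the performance score of another ends up with roughly three times as many points, so both nodes should finish their share of the detection work at about the same time.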