Research on the Distributed Data Mining Calculation Process: the DDCP Algorithm
Abstract: This paper proposes an algorithm that provides a computational framework for the parallel and distributed generation of large itemsets in association rule mining. The algorithm targets large-scale transaction databases. It partitions the data, processes the partitions in a distributed or parallel fashion, and reduces the volume of data transferred between nodes through inter-node communication. It exploits the advantages of data partitioning, and a controller assigns transactions to nodes at random to counter data skew in the database. A worked example demonstrates the correctness and feasibility of the algorithm. The algorithm can be applied to distributed databases and is best suited to distributed computation, where it enables efficient data mining in distributed or parallel environments.
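The general idea of partitioned large-itemset counting can be illustrated with a minimal Python sketch. This is not the DDCP algorithm itself, whose details are not given in the abstract. It is a generic count-merging scheme under stated assumptions: the function names (`assign_randomly`, `local_counts`, `large_itemsets`), the sample transactions, and the choice to exchange only per-node counts are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def assign_randomly(transactions, n_nodes, seed=0):
    """Hypothetical controller: deal transactions to nodes at random to limit skew."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(n_nodes)]
    for t in transactions:
        partitions[rng.randrange(n_nodes)].append(frozenset(t))
    return partitions


def local_counts(partition, k):
    """Each node counts the k-itemsets that occur in its own partition only."""
    counts = Counter()
    for t in partition:
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return counts


def large_itemsets(transactions, n_nodes, k, min_support):
    """Merge per-node counts and keep the k-itemsets that meet min_support.

    Assumption: only the count tables are exchanged, not the raw transactions.
    """
    partitions = assign_randomly(transactions, n_nodes)
    total = Counter()
    for p in partitions:
        total.update(local_counts(p, k))
    return {s: c for s, c in total.items() if c >= min_support}


# Made-up sample data for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = large_itemsets(transactions, n_nodes=2, k=2, min_support=3)
print(result)
```

Because the global support of an itemset is the sum of its per-partition counts, the merged result does not depend on how the controller happens to distribute the transactions; the random assignment only affects how evenly the work is spread.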