Research on Top-N Most Frequent Itemset Mining

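Summary: Most frequent itemset mining determines the performance of text association rule mining algorithms, and it is both a focus and a difficulty of research on text association rule mining. This paper analyzes the shortcomings of current work on most frequent itemset mining, improves the traditional inverted list, and, in combination with a strategy that dynamically adjusts the minimum support threshold, proposes a new Top-N most frequent itemset mining algorithm based on the improved inverted list and set theory. Several propositions and corollaries are also given and applied to the algorithm to improve its performance. Experimental results show that the proposed algorithm outperforms the NApriori and IntvMatrix algorithms in both rule validity rate and time performance.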

Abstract: Mining the most frequent itemsets is both the focus and the main difficulty of text association rule mining, and it directly determines the performance of text association rule mining algorithms. First, several existing most frequent itemset mining algorithms are analyzed and summarized. The traditional inverted list is then improved. Based on the improved inverted list and set theory, a new Top-N most frequent itemset mining algorithm is presented that incorporates a dynamic adjustment strategy for the minimum support threshold. In addition, several propositions and corollaries are given and used to improve the performance of the proposed algorithm. Experimental results show that the proposed algorithm outperforms NApriori and IntvMatrix.
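To make the general idea concrete, the following Python sketch illustrates Top-N itemset mining over a plain inverted list, with a minimum support threshold that rises as better itemsets are found. It is not the algorithm proposed in the paper, and it does not reproduce the paper's improved inverted list or its propositions. The function names, the `min_len` parameter, and the depth-first search order are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def build_inverted_list(docs):
    """Map each item to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, items in enumerate(docs):
        for item in items:
            index[item].add(doc_id)
    return index


def top_n_itemsets(docs, n, min_len=2):
    """Return up to n itemsets (length >= min_len) with the highest support.

    Hypothetical illustration only: the support of an itemset is the size of
    the intersection of its items' document-id sets, and the pruning
    threshold is raised to the n-th best support found so far.
    """
    index = build_inverted_list(docs)
    # Visit items in order of decreasing support, ties broken by item name.
    items = sorted(index, key=lambda it: (-len(index[it]), it))
    heap = []       # min-heap of (support, itemset), at most n entries
    threshold = 1   # dynamic minimum support

    def offer(itemset, support):
        nonlocal threshold
        if len(itemset) < min_len:
            return
        if len(heap) < n:
            heapq.heappush(heap, (support, itemset))
        elif support > heap[0][0]:
            heapq.heapreplace(heap, (support, itemset))
        if len(heap) == n:
            # Raise the threshold to the weakest support currently kept.
            threshold = heap[0][0]

    def extend(prefix, doc_ids, start):
        for i in range(start, len(items)):
            item_ids = index[items[i]]
            new_ids = doc_ids & item_ids if prefix else item_ids
            support = len(new_ids)
            # Supersets never have higher support, so this branch is dead.
            if support < threshold:
                continue
            itemset = prefix + (items[i],)
            offer(itemset, support)
            extend(itemset, new_ids, i + 1)

    extend((), set(), 0)
    ranked = sorted(heap, key=lambda e: (-e[0], e[1]))
    return [(itemset, support) for support, itemset in ranked]
```

Calling `top_n_itemsets(docs, n=3)` on a list of tokenized documents returns `(itemset, support)` pairs sorted by decreasing support. Itemsets with equal support at the cutoff are kept on a first-found basis, which is a simplification of this sketch.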