Chinese Topic Detection Algorithm for Fast Information Aggregation
Abstract: The Internet of Things (IoT) is characterized by ubiquitous, large-scale information gathering and intelligent processing to meet users' needs, which requires the topics spread across massive numbers of information sources to be detected and fused automatically and with high performance. Most published Chinese topic detection and clustering algorithms have non-linear time complexity, such as O(n^2) or O(n^3), which makes them impractical for fusing information from massive, multi-source collections. This paper presents an efficient, patented algorithm that combines an improved unigram language model with full-text retrieval to reduce the number of topic-to-topic comparisons, bringing the time complexity to linear in theory. Experiments on real data from Xinhua News Agency confirm that the time complexity is indeed linear. The algorithm has also been applied in two cloud computing products, which further supports its suitability for high-speed information fusion in IoT environments.
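The idea of using retrieval to limit comparisons can be illustrated with a minimal sketch. It is not the paper's patented algorithm. It assumes documents arrive already segmented into word tokens, represents each topic by its unigram term counts, and swaps in cosine similarity as the scoring function. The names `TopicDetector`, `threshold`, and `max_candidates` are hypothetical. An inverted index stands in for the full-text retrieval step: each new document is scored only against the few topics that share terms with it, so the number of full similarity comparisons per document is capped rather than growing with the number of topics.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TopicDetector:
    """Single-pass topic detection with index-based candidate retrieval.

    Illustrative only: the scoring function and parameters are assumptions.
    """

    def __init__(self, threshold=0.5, max_candidates=5):
        self.threshold = threshold            # minimum similarity to join a topic
        self.max_candidates = max_candidates  # cap on comparisons per document
        self.topics = []                      # topic id -> unigram term counts
        self.index = defaultdict(set)         # term -> ids of topics containing it

    def add(self, tokens):
        """Assign a tokenized document to a topic and return the topic id."""
        doc = Counter(tokens)

        # Retrieval step: count how many of the document's terms each topic shares.
        votes = Counter()
        for term in doc:
            for topic_id in self.index.get(term, ()):
                votes[topic_id] += 1

        # Comparison step: score only the top few retrieved candidates.
        best_id, best_score = None, 0.0
        for topic_id, _ in votes.most_common(self.max_candidates):
            score = cosine(doc, self.topics[topic_id])
            if score > best_score:
                best_id, best_score = topic_id, score

        # Open a new topic when no candidate is similar enough.
        if best_id is None or best_score < self.threshold:
            best_id = len(self.topics)
            self.topics.append(Counter())

        # Fold the document into the chosen topic and update the index.
        self.topics[best_id].update(doc)
        for term in doc:
            self.index[term].add(best_id)
        return best_id
```

Feeding documents one at a time through `add` gives a single pass over the collection. In this sketch the posting lists still grow with the number of topics, so a production system would also need to bound the retrieval step, for example by querying only a document's most distinctive terms.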