Abstract:
Existing unknown words recognition methods mainly focus on unknown words with some specific structure, such as names, places and organizations. However, with the booming of e-commerce and social networking, more and more unknown entity words with uncertain structures appear in specific areas. In order to handle this problem, this paper presents two algorithms of unknown words recognition based on context-sensitive method. We first calculate correlations between any two words in sequence to get support of any potential combination, then filter out wrong combinations by filtering module, and achieve the recognition aiming at the non-deterministic structure of unknown words. Experiment results indicate that two algorithms can achieve a high accuracy. Besides, they can adapt to different application scenarios by adjusting the parameters.