Extraction of Blog Post Summarization by Using Ranking SVM

Abstract: This paper presents a method for extractive summarization of blog posts based on a ranking smooth support vector machine (Rank-sSVM). Using a ranking algorithm for this task lets the extracted summaries reflect commenters' opinions and the characteristics of the blog corpus. The procedure has three steps. First, key sentences are manually selected from blog posts and labeled as summaries to serve as training samples. Second, a feature set representing each sentence is generated automatically; it consists of 14 features, including tags, comments, and other information unique to blog posts. Third, after Rank-sSVM has learned from the manual summaries, all sentences of a post are ranked and the top-ranked ones are selected to form the summary. Experiments show that the method performs well on a Chinese blog dataset.
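
The pipeline described in the abstract (label sentences, build per-sentence features, learn a ranking model, keep the top-ranked sentences) can be illustrated with a minimal sketch. This is not the paper's Rank-sSVM: a plain linear pairwise ranker trained with a hinge-loss subgradient update stands in for it, and the three toy features, the function names `train_pairwise_ranker` and `summarize`, and all numeric values are hypothetical placeholders for the 14 features the abstract mentions.

```python
import random


def train_pairwise_ranker(posts, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Learn a linear scoring vector w from labeled posts.

    posts: list of (features, labels) pairs, one per post. `features` is a
    list of equal-length float vectors (one per sentence); `labels` holds 1
    for sentences a human marked as summary sentences and 0 otherwise.
    """
    rng = random.Random(seed)
    dim = len(posts[0][0][0])
    w = [0.0] * dim

    # Preference pairs are built within each post: every summary sentence
    # should score higher than every non-summary sentence of the same post.
    pairs = []
    for feats, labels in posts:
        pos = [f for f, y in zip(feats, labels) if y == 1]
        neg = [f for f, y in zip(feats, labels) if y == 0]
        for p in pos:
            for n in neg:
                pairs.append([a - b for a, b in zip(p, n)])

    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            # Hinge loss max(0, 1 - w.d) plus L2 regularization.
            margin = sum(wi * di for wi, di in zip(w, d))
            violated = margin < 1.0
            for i in range(dim):
                grad = reg * w[i] - (d[i] if violated else 0.0)
                w[i] -= lr * grad
    return w


def summarize(sentences, feats, w, k=2):
    """Rank sentences by w.f and return the top k in document order."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy training post with hypothetical features per sentence:
# [position score, overlap with tags, overlap with comments]
train_feats = [[0.9, 0.8, 0.7], [0.1, 0.0, 0.2], [0.8, 0.9, 0.6], [0.2, 0.1, 0.0]]
train_labels = [1, 0, 1, 0]
w = train_pairwise_ranker([(train_feats, train_labels)])

new_sentences = ["s0", "s1", "s2"]
new_feats = [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.5, 0.5, 0.5]]
print(summarize(new_sentences, new_feats, w, k=2))
```

Returning the selected sentences in their original order, rather than in score order, is a design choice made here so that the extracted summary reads like the post; the abstract does not say how the selected sentences are ordered.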

     
