Abstract:
A new approach to blog post summarization is presented, based on a ranking smooth support vector machine (Rank-sSVM). Using a ranking algorithm for this task allows summaries to be adapted to commenters' needs and to the characteristics of the blog corpus. To train Rank-sSVM, key sentences are first extracted manually from blog posts as training samples. Each post sentence is then represented by an automatically generated set of 14 features, including tag, comment, and other blog-specific information. After the ranking model ranks all sentences, the top-ranked ones are selected to form the summary of the post. Experimental results on Chinese blog datasets show that the proposed method performs well.
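The rank-then-select pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that Rank-sSVM is trained on pairwise feature differences between key and non-key sentences of the same post. It uses the smooth plus function p(x, alpha) = x + log(1 + exp(-alpha * x)) / alpha from the smooth SVM literature. It also optimizes with plain gradient descent rather than a Newton-type solver. The three-dimensional feature vectors are hypothetical toy values standing in for the 14 blog features, and all function names are illustrative.

```python
import math


def smooth_plus(x, alpha):
    """Smooth approximation of max(x, 0):
    p(x, alpha) = x + log(1 + exp(-alpha * x)) / alpha."""
    z = -alpha * x
    # Numerically stable evaluation of log(1 + exp(z)).
    softplus = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
    return x + softplus / alpha


def sigmoid(t):
    """Stable logistic function; sigmoid(alpha * x) is the derivative of p(x, alpha)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pairwise_differences(posts):
    """Within each post, pair every key sentence with every non-key sentence
    and return the feature differences x_key - x_nonkey (assumed training pairs)."""
    diffs = []
    for sentences in posts:
        keys = [x for x, is_key in sentences if is_key]
        others = [x for x, is_key in sentences if not is_key]
        for xk in keys:
            for xo in others:
                diffs.append([a - b for a, b in zip(xk, xo)])
    return diffs


def train_rank_ssvm(posts, nu=1.0, alpha=5.0, lr=0.1, iters=500):
    """Minimize (nu/2) * sum_d p(1 - w.d, alpha)^2 + (1/2) * ||w||^2
    over the pairwise differences d, using plain gradient descent."""
    diffs = pairwise_differences(posts)
    if not diffs:
        raise ValueError("need at least one (key, non-key) sentence pair")
    dim = len(diffs[0])
    w = [0.0] * dim
    for _ in range(iters):
        grad = list(w)  # gradient of the (1/2)||w||^2 regularizer
        for d in diffs:
            m = 1.0 - dot(w, d)
            # Gradient of (nu/2) * p(m)^2 w.r.t. w is nu * p(m) * sigmoid(alpha*m) * (-d).
            coef = nu * smooth_plus(m, alpha) * sigmoid(alpha * m)
            for i in range(dim):
                grad[i] -= coef * d[i]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def summarize(w, sentence_features, k):
    """Score sentences with w, keep the k top-ranked ones,
    and return their indices in original document order."""
    scores = [dot(w, x) for x in sentence_features]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy post: (hypothetical feature vector, manually marked as key sentence?)
post = [
    ([0.9, 0.8, 1.0], True),
    ([0.1, 0.2, 0.5], False),
    ([0.8, 0.7, 0.2], True),
    ([0.2, 0.1, 0.1], False),
]
w = train_rank_ssvm([post])
print(summarize(w, [x for x, _ in post], k=2))  # [0, 2]
```

Returning the selected indices in document order is a design choice here so that the summary follows the post's original sentence sequence; the abstract itself only states that the top-ranked sentences are selected.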