User Reading Preference and Community Detection in an Online Reading Community

XU Yi-tie; LIU Hong-li; HU Hai-bo

doi:10.3969/j.issn.1001-0548.2019.06.020

Volume 48 Issue 6

Nov. 2019

XU Yi-tie, LIU Hong-li, HU Hai-bo. User Reading Preference and Community Detection in an Online Reading Community[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(6): 939-946. doi: 10.3969/j.issn.1001-0548.2019.06.020

School of Business, East China University of Science and Technology Xuhui Shanghai 200237

Received Date: 2018-09-21
Rev Recd Date: 2019-03-15
Publish Date: 2019-11-30

Abstract

To reveal users' subject preferences and the diversity of their reading interests in online reading communities, we crawl data from Douban Reading, construct book networks from co-reading relationships, and study these networks by combining complex network theory with machine learning methods. We find that in the book networks the weights in the two directions between disciplines are nearly equal. Users who read philosophy, political science, and other humanities and social sciences have the broadest reading interests, while users who read engineering and technology disciplines such as mining engineering and nuclear science and technology have the narrowest. The network of second-level disciplines has three distinct communities, corresponding to three major areas: humanities and social sciences, engineering and technology, and basic sciences. In terms of suitability for interdisciplinary research, the basic sciences rank highest, the humanities and social sciences second, and engineering and technology lowest. These results are of significance for cross-disciplinary book recommendation.
- book network,
- community detection,
- discipline preference,
- online reading community,
- text classification,
- user behavior

References

[1]	胡海波, 王科, 徐玲, 等.基于复杂网络理论的在线社会网络分析[J].复杂系统与复杂性科学, 2008, 5(2):1-14. doi: 10.3969/j.issn.1672-3813.2008.02.001 HU Hai-bo, WANG Ke, XU Ling, et al. Analysis of online social networks based on complex network theory[J]. Complex Systems and Complexity Science, 2008, 5(2):1-14. doi: 10.3969/j.issn.1672-3813.2008.02.001
[2]	李栋, 徐志明, 李生, 等.在线社会网络中信息扩散[J].计算机学报, 2014, 37(1):189-206. http://d.old.wanfangdata.com.cn/Periodical/jsjxb201401014 LI Dong, XU Zhi-ming, LI Sheng, et al. A survey on information diffusion in online social networks[J]. Chinese Journal of Computers, 2014, 37(1):189-206. http://d.old.wanfangdata.com.cn/Periodical/jsjxb201401014
[3]	ZHANG Z K, LIU C, ZHAN X X, et al. Dynamics of information diffusion and its applications on complex networks[J]. Physics Reports, 2016, 651:1-34. doi: 10.1016/j.physrep.2016.07.002
[4]	罗春海, 刘红丽, 胡海波.微博网络中用户主题兴趣相关性及主题信息扩散研究[J].电子科技大学学报, 2017, 46(2):458-468. doi: 10.3969/j.issn.1001-0548.2017.02.022 LUO Chun-hai, LIU Hong-li, HU Hai-bo. Research on correlation of users' topic interests and topic information diffusion in microblog networks[J]. Journal of University of Electronic Science and Technology of China, 2017, 46(2):458-468. doi: 10.3969/j.issn.1001-0548.2017.02.022
[5]	陆豪放, 张千明, 周莹, 等.微博中的信息传播:媒体效应与社交影响[J].电子科技大学学报, 2014, 43(2):167-173. doi: 10.3969/j.issn.1001-0548.2014.02.002 LU Hao-fang, ZHANG Qian-ming, ZHOU Ying, et al. Information spreading in microblogging systems:Media effect versus social impact[J]. Journal of University of Electronic Science and Technology of China, 2014, 43(2):167-173. doi: 10.3969/j.issn.1001-0548.2014.02.002
[6]	许小可, 胡海波, 张伦, 等.社交网络上的计算传播学[M].北京:高等教育出版社, 2015. XU Xiao-ke, HU Hai-bo, ZHANG Lun, et al. Computational communication on social networks[M]. Beijing:Higher Education Press, 2015.
[7]	汪小帆, 李翔, 陈关荣.网络科学导论[M].北京:高等教育出版社, 2012. WANG Xiao-fan, LI Xiang, CHEN Guan-rong. Introduction to network science[M]. Beijing:Higher Education Press, 2012.
[8]	BARABÁSI A L. Network science[M]. Cambridge, UK:Cambridge University Press, 2016.
[9]	李楠楠, 张宁.图书馆借阅网的二分图研究[J].复杂系统与复杂性科学, 2009, 6(2):33-39. doi: 10.3969/j.issn.1672-3813.2009.02.005 LI Nan-nan, ZHANG Ning. The study of the bipartite graph about the library lending network[J]. Complex Systems and Complexity Science, 2009, 6(2):33-39. doi: 10.3969/j.issn.1672-3813.2009.02.005
[10]	燕飞, 张铭, 孙韬, 等.基于网络特征的用户图书借阅行为分析——以北京大学图书馆为例[J].情报学报, 2011, 30(8):875-882. doi: 10.3772/j.issn.1000-0135.2011.08.013 YAN Fei, ZHANG Ming, SUN Tao, et al. Network based users' book-loan behavior analysis:A case study of Peking university library[J]. Journal of the China Society for Scientific and Technical Information, 2011, 30(8):875-882. doi: 10.3772/j.issn.1000-0135.2011.08.013
[11]	张柯, 赵金龙, 胡小丽.基于复杂网络理论的高校图书馆借阅网络研究[J].大学图书情报学刊, 2014, 32(1):75-77. doi: 10.3969/j.issn.1006-1525.2014.01.017 ZHANG Ke, ZHAO Jin-long, HU Xiao-li. Research on book-borrowing network of university library based on the complex network theory[J]. Journal of Academic Library and Information Science, 2014, 32(1):75-77. doi: 10.3969/j.issn.1006-1525.2014.01.017
[12]	陈晓威, 孙建军.基于图书借阅网络的各类书籍关系研究[J].图书情报工作, 2017, 61(11):21-28. http://d.old.wanfangdata.com.cn/Periodical/tsqbgz201711003 CHEN Xiao-wei, SUN Jian-jun. The relationships among books based on the book-borrowing network[J]. Library and Information Service, 2017, 61(11):21-28. http://d.old.wanfangdata.com.cn/Periodical/tsqbgz201711003
[13]	王福生, 杨洪勇.图书管理系统中的借阅行为分析[J].复杂系统与复杂性科学, 2012, 9(1):55-58. doi: 10.3969/j.issn.1672-3813.2012.01.009 WANG Fu-sheng, YANG Hong-yong. Books-borrowing behavior in library management system[J]. Complex Systems and Complexity Science, 2012, 9(1):55-58. doi: 10.3969/j.issn.1672-3813.2012.01.009
[14]	BARABÁSI A L. The origin of bursts and heavy tails in human dynamics[J]. Nature, 2005, 435:207-211. doi: 10.1038/nature03459
[15]	MICHEL J B, SHEN Y K, AIDEN A P, et al. Quantitative analysis of culture using millions of digitized books[J]. Science, 2011, 331(6014):176-182. doi: 10.1126/science.1199644
[16]	SHI F, SHI Y, DOKSHIN F A, et al. Millions of online book co-purchases reveal partisan differences in the consumption of science[J]. Nature Human Behaviour, 2017, 1(4):0079. doi: 10.1038/s41562-017-0079
[17]	HU H B, WANG X F. Unified index to quantifying heterogeneity of complex networks[J]. Physica A, 2008, 387(14):3769-3780. doi: 10.1016/j.physa.2008.01.113
[18]	CLAUSET A, SHALIZI C R, NEWMAN M E J. Power-law distributions in empirical data[J]. SIAM Review, 2009, 51:661-703. doi: 10.1137/070710111
[19]	SALTON G, BUCKLEY C. Term-weighting approaches in automatic text retrieval[J]. Information Processing & Management, 1988, 24(5):513-523. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=2966e9ecc77ffc44adc76c0baa5b10bc
[20]	MANNING C D, RAGHAVAN P, SCHÜTZE H. Introduction to information retrieval[M]. New York, USA:Cambridge University Press, 2008.
[21]	CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20(3):273-297. http://d.old.wanfangdata.com.cn/Periodical/hwyhmb200803006
[22]	JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification[EB/OL]. (2016-08-09)[2017-07-12]. https://arxiv.org/abs/1607.01759.
[23]	HOFMANN T. Probabilistic latent semantic indexing[C]//Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 1999: 50-57.
[24]	HOFMANN T. Unsupervised learning by probabilistic latent semantic analysis[J]. Machine Learning, 2001, 42(1-2):177-196.
[25]	BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3:993-1022. http://d.old.wanfangdata.com.cn/Periodical/jsjyy201306024
[26]	COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12:2493-2537. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1103.0398
[27]	LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521:436-444. doi: 10.1038/nature14539
[28]	GOODFELLOW I, BENGIO Y, COURVILLE A. Deep learning[M]. Cambridge, MA:The MIT Press, 2016.
[29]	KIM Y. Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1746-1751.
[30]	LIU P, QIU X, HUANG X. Recurrent neural network for text classification with multi-task learning[C]//Proceedings of the 25th International Joint Conference on Artificial Intelligence. California: IJCAI, 2016: 2873-2879.
[31]	LAI S, XU L, LIU K, et al. Recurrent convolutional neural networks for text classification[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI, 2015: 2267-2273.
[32]	YANG Z, YANG D, DYER C, et al. Hierarchical attention networks for document classification[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2016: 1480-1489.
[33]	WENG L, MENCZER F. Topicality and impact in social media:Diverse messages, focused messengers[J]. PLoS ONE, 2015, 10(2):e0118410. doi: 10.1371/journal.pone.0118410
[34]	FORTUNATO S. Community detection in graphs[J]. Physics Reports, 2010, 486(3-5):75-174. doi: 10.1016/j.physrep.2009.11.002
[35]	JAVED M A, YOUNIS M S, LATIF S, et al. Community detection in networks:A multidisciplinary review[J]. Journal of Network and Computer Applications, 2018, 108:87-111. doi: 10.1016/j.jnca.2018.02.011
[36]	GIRVAN M, NEWMAN M E J. Community structure in social and biological networks[J]. Proceedings of the National Academy of Sciences of the United States of America, 2002, 99(12):7821-7826. doi: 10.1073/pnas.122653799
[37]	CLAUSET A, NEWMAN M E J, MOORE C. Finding community structure in very large networks[J]. Phys Rev E, 2004, 70:066111. doi: 10.1103/PhysRevE.70.066111
[38]	BLONDEL V D, GUILLAUME J L, LAMBIOTTE R, et al. Fast unfolding of communities in large networks[J]. Journal of Statistical Mechanics, 2008(10):P10008. doi: 10.1088/1742-5468/2008/10/P10008
[39]	PONS P, LATAPY M. Computing communities in large networks using random walks[EB/OL]. (2005-12-12)[2017-07-23]. https://arxiv.org/abs/physics/0512106.
[40]	ROSVALL M, BERGSTROM C T. Maps of random walks on complex networks reveal community structure[J]. Proceedings of the National Academy of Sciences of the United States of America, 2008, 105(4):1118-1123. doi: 10.1073/pnas.0706851105

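With the development of the Internet, and especially the mobile Internet, social media websites characterized by social collaboration technologies have emerged in large numbers [1]. Some aim at online friendship, such as Facebook and Renren [1]; some aim at publishing and spreading information [2-5], such as Twitter, Weibo, and WeChat; and others aim at content sharing, such as Youku and Douban. Their potential research value has attracted scholars from many disciplines [1-3, 6].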

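Reading is an important way for people to obtain information. As a practice of deriving meaning from symbols, reading keeps changing as the carriers and media of language change. Online reading communities built on Web 2.0, such as Douban Reading, LibraryThing, and Goodreads, rely on mass participation: users leave large numbers of reading records and reviews on these platforms, and scholars can use these data to analyze users' reading preferences and the correlations among users and among books.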

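Early studies of public reading behavior were usually based on readers' borrowing records in libraries, and used complex network theory [7-8] to analyze the statistical properties of networks built from reading behavior. For example, Ref. [9] studied a library's book-lending network using the bipartite graph formed by books and their borrowers. Ref. [10] analyzed the borrowing records of a university library and studied two kinds of networks: the user-to-book "book-borrowing network" formed by loans, which is a bipartite graph, and the "co-borrowing network", a knowledge-sharing social network among readers that forms when the same book is borrowed by different readers. Ref. [11] built a co-borrowing network among readers from the borrowing records of graduate students, studied its statistical properties, and analyzed its community structure. Ref. [12] used student borrowing data from a university library to build a book co-occurrence network, an undirected weighted graph in which nodes represent books and edges represent co-borrowing relationships, and used complex network theory to reveal the intrinsic connections among books. Other scholars have studied borrowing behavior from different angles. For example, Ref. [13] used the intervals between loans recorded in a library management system and analyzed borrowing behavior on the basis of human dynamics [14], finding that at the group level the interval distribution can be approximated by a power law.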

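In the digital era, the way scholars study books has changed fundamentally. Ref. [15] quantitatively analyzed the human culture contained in millions of digitized books and studied how cultural trends and topics of public attention change over time. Ref. [16] used book purchase data from Amazon in the United States to analyze users' political leanings. It found that more than 90% of the books linked by co-purchase relationships share the same political leaning, that readers of liberal political books prefer basic sciences such as physics, astronomy, and zoology, and that conservative readers prefer applied and commercial topics such as criminology, medicine, and geophysics.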

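Although empirical studies of book reading have received wide attention, existing work has rarely examined the reading behavior of readers or users from the perspective of academic disciplines. A book's content determines the discipline it belongs to, and grouping books by discipline into communities can reveal users' interdisciplinary reading patterns. In addition, with the emergence of large numbers of e-books and the rise of social reading, users can readily choose books according to their personal interests, and they usually do not confine themselves to a single discipline. Yet studies of interdisciplinary reading remain scarce.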

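This paper uses data from "Douban Reading" on Douban, the most active reading community in China, to build a directed unweighted network that captures users' co-reading relationships among books. Each book in the network is labeled with a discipline by cross-database lookup using its ISBN (International Standard Book Number) combined with text classification. We analyze the network with complex network theory, study users' discipline preferences and their diversity from the disciplinary perspective, and use community detection algorithms to further analyze the correlations among the disciplines to which the books belong.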

2. Book Network and Discipline Classification of Books

2.1. Structural Characteristics of the Book Network

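Let A be a book and B one of the other books liked by users who like A; then a directed edge exists from A to B. A directed book network can thus be built from Douban's "people who like this book also like" lists. The Douban book network contains 150 513 nodes and 1 438 390 edges. Its average path length is L=10.28, which is of the same order of magnitude as the logarithm of the network size, and its clustering coefficient is C=0.25, far larger than that of a random graph of the same size. The book network therefore has the small-world property.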
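The edge rule and the small-world comparison above can be illustrated with a minimal sketch. The toy "also liked" lists and the function name `build_directed_network` are hypothetical; only the node count, edge count, L, and C come from the text. The random-graph baseline assumes a simple random directed graph, whose clustering coefficient is of the order of its edge density.

```python
import math

def build_directed_network(also_liked):
    """Return the set of directed edges (A, B): B is also liked by users who like A."""
    edges = set()
    for book, liked in also_liked.items():
        for other in liked:
            if other != book:          # ignore self-references
                edges.add((book, other))
    return edges

# Hypothetical toy input; the real lists come from the crawled pages.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
toy_edges = build_directed_network(toy)

# Figures reported in the text.
n_nodes, n_edges = 150_513, 1_438_390
avg_path_length, clustering = 10.28, 0.25

log_n = math.log(n_nodes)                        # natural log of the network size
edge_prob = n_edges / (n_nodes * (n_nodes - 1))  # density of a directed graph
print(f"ln N = {log_n:.2f}, reported L = {avg_path_length}")
print(f"random-graph clustering ~ {edge_prob:.2e}, reported C = {clustering}")
```

Under these assumptions the reported L is close to ln N, while the reported C exceeds the random-graph baseline by several orders of magnitude, which is the pattern the small-world argument relies on.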

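2.2. Discipline Classification of Books

Unlike books recorded in offline libraries, books on Douban lack Chinese Library Classification numbers and therefore cannot be assigned to disciplines that way. Instead, a book's discipline can be obtained by its ISBN from the China Academic Library and Information System (CALIS, http://opac.calis.edu.cn/opac/simpleSearch.do). Some books are obscure or too new and their ISBNs are missing, and some books have no record in CALIS; such books cannot be classified by ISBN either. For these books we assign disciplines with machine learning methods.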

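2.2.1. Book Classification Based on CALIS

For books with an ISBN, submitting the ISBN to the CALIS database returns the book's discipline classification. The procedure is as follows. All ISBNs are extracted from the crawled book data and placed in a queue. An ISBN is taken from the head of the queue, and a crawler submits it to the CALIS database automatically and follows the page redirect. The resulting page shows the discipline the book belongs to, from which the corresponding classification code can be obtained. In the CALIS classification system, books fall into 12 first-level disciplines and 85 second-level disciplines; the full classification is given in the appendix (https://doi.org/10.6084/m9.figshare.6728984).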
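The queue-driven lookup described above might be organized as in the following sketch. No real CALIS request is made: `lookup` is a hypothetical callable standing in for the crawler, and the ISBNs and codes are made up. The helper `first_level` assumes that the first two digits of a four-digit code identify the first-level discipline, which is inferred from the code tables rather than stated in the text.

```python
from collections import deque

def classify_by_isbn(isbns, lookup):
    """Process ISBNs first-in, first-out; `lookup` returns a discipline code or None."""
    queue = deque(isbns)
    labeled, unlabeled = {}, []
    while queue:
        isbn = queue.popleft()        # take one ISBN from the head of the queue
        code = lookup(isbn)           # hypothetical stand-in for the CALIS query
        if code is None:
            unlabeled.append(isbn)    # left for the machine learning step
        else:
            labeled[isbn] = code
    return labeled, unlabeled

def first_level(code):
    """Assumed convention: the first two digits give the first-level discipline."""
    return code[:2]

# Stub standing in for the crawler; ISBNs and codes are invented for illustration.
fake_calis = {"9780000000001": "0501", "9780000000002": "0812"}
labeled, unlabeled = classify_by_isbn(
    ["9780000000001", "9780000000003", "9780000000002"], fake_calis.get
)
```

Passing the lookup in as a parameter keeps the queue logic testable without network access; books that come back without a code are collected for the classifier-based step.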

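2.2.2. Book Classification Based on Machine Learning

The discipline labels obtained from the CALIS database form a labeled data set, which is then used with supervised multi-class classification algorithms to classify books lacking an ISBN. Because the second-level scheme has 85 classes, too many to guarantee accurate classification, the machine learning classification here is carried out at the first level, i.e. over 12 classes.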

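In text classification, feature construction and the choice of classifier both strongly affect accuracy. We first build text features with TF-IDF [19-20] and then classify the texts by discipline with a support vector machine (SVM) classifier [21]. SVM is one of the most widely used and best-performing classifiers and copes well with the relatively small sample available here, so the first model is the combination TF-IDF+SVM.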
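As an illustration of the feature-construction step, the sketch below computes one common TF-IDF variant (relative term frequency times the logarithm of inverse document frequency) in plain Python. The text does not say which variant or tokenizer was used, so the weighting formula, the function name `tfidf`, and the toy documents are assumptions; in practice the resulting vectors would be passed to an SVM implementation.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))      # number of documents containing each term
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Toy tokenized synopses; real input would be segmented book descriptions.
docs = [
    ["network", "theory", "graph"],
    ["graph", "algorithm", "graph"],
    ["poetry", "history"],
]
vectors = tfidf(docs)
```

A term that appears in few documents, such as "network" here, receives a higher weight than one shared across documents, such as "graph", which is what makes the features discriminative between disciplines.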

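Books also pose problems of class imbalance across disciplines and of multiple languages. Some of the collected books are English originals or books in other languages. To avoid an insufficient training set caused by the small number of foreign-language books, we introduce, in addition to TF-IDF+SVM, the word-vector-based FastText text classification algorithm. FastText is a text classification project open-sourced by Facebook in 2016 [22]. It can exploit the imbalanced class distribution to speed up computation; it handles the discipline imbalance well and also supports multiple languages. We therefore tested three models: Model 1 is the Chinese sample set + TF-IDF + SVM, Model 2 is the Chinese sample set + FastText, and Model 3 is the full sample set + FastText.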

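It is worth noting that text classification also often uses topic models, such as probabilistic latent semantic analysis (PLSA) [23-24] and latent Dirichlet allocation (LDA) [25], but these are unsupervised algorithms and do not suit the supervised book classification here. In addition, text classification is an important problem in natural language processing [26], and deep learning [27-28] algorithms exist for it, such as the text convolutional neural network (TextCNN) [29], the text recurrent neural network (TextRNN) [30], the text recurrent convolutional neural network (TextRCNN) [31], and the hierarchical attention network (HAN) [32]. These algorithms, however, are generally applied to large-scale text or image classification and to speech recognition. Text classification often does not need a very deep network, and the amount of text classified here is small, so deep learning is unnecessary. In fact, the FastText algorithm used here is itself a shallow neural network method; although it is not deep learning, it classifies text quickly and fits our application.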

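Considering both the computational efficiency of the classification algorithms and the accuracy of the results, we randomly selected 100 000 books with known discipline labels as the full sample set and carried out the classification task as described above. Table 2 lists the precision P, recall R, and F-measure for each book class.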

Discipline	Model 1			Model 2			Model 3
Discipline	Precision	Recall	F-measure	Precision	Recall	F-measure	Precision	Recall	F-measure
01	0.761 8	0.745 5	0.753 562	0.722 9	0.746 0	0.734 3	0.745 0	0.760 2	0.752 5
02	0.801 8	0.686 1	0.739 452	0.660 8	0.779 1	0.715 1	0.642 4	0.741 0	0.688 2
03	0.673 0	0.616 1	0.643 294	0.551 8	0.671 4	0.605 8	0.570 5	0.677 4	0.619 3
04	0.708 2	0.637 8	0.671 159	0.571 4	0.711 1	0.633 7	0.577 9	0.690 5	0.629 2
05	0.849 6	0.910 6	0.879 043	0.910 3	0.837 6	0.872 4	0.894 3	0.814 4	0.852 5
06	0.752 2	0.753 8	0.752 999	0.726 5	0.705 8	0.716 0	0.702 4	0.685 6	0.693 9
07	0.787 4	0.767 0	0.777 066	0.708 7	0.768 4	0.737 4	0.730 4	0.758 2	0.744 0
08	0.836 8	0.823 4	0.830 046	0.790 2	0.851 8	0.819 8	0.778 1	0.842 0	0.808 8
09	0.750 0	0.166 7	0.272 772	0.666 7	0.400 0	0.500 0	0.733 3	0.392 9	0.511 6
10	0.755 6	0.531 2	0.623 834	0.796 9	0.622 0	0.698 6	0.582 1	0.639 3	0.609 4
11	0.727 3	0.510 6	0.599 983	0.702 1	0.452 1	0.550 0	0.666 7	0.500 0	0.571 4
12	0.708 5	0.771 2	0.738 522	0.756 5	0.745 5	0.750 9	0.687 7	0.743 0	0.714 3

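Table 2 shows that, apart from discipline 09, whose F-measure is low and whose classification is not accurate enough, all three models classify most disciplines fairly accurately. We also computed the accuracy of the three models as the criterion for choosing the final model; the accuracies are 0.791 8, 0.770 1, and 0.758 9, respectively.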
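The F-measure column of Table 2 is consistent with the usual harmonic mean of precision and recall, F = 2PR/(P+R). The short check below recomputes it for two Model 1 rows; the function name `f_measure` is illustrative, and the input values are copied from the table.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Model 1 rows from Table 2: (precision, recall, reported F-measure)
rows = {
    "01": (0.7618, 0.7455, 0.753562),
    "09": (0.7500, 0.1667, 0.272772),
}
for discipline, (p, r, reported) in rows.items():
    print(discipline, round(f_measure(p, r), 6), reported)
```

Discipline 09 shows why the harmonic mean is informative: a precision of 0.75 cannot compensate for a recall of about 0.17, so the F-measure stays low.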

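Comparing Models 1 and 2 shows that Model 1 outperforms Model 2 on Chinese books. Taking into account the applicability of Model 3 to the full sample set, the final classification procedure is as follows. Books without a discipline label are selected from the database, the synopsis of each is retrieved by its ISBN, and the resulting sample set of class label + book synopsis serves as the final classification text. Before each classification, the language of the book is determined and the model is chosen accordingly: TF-IDF+SVM for Chinese books and FastText for non-Chinese books, so as to maintain classification accuracy. Combining CALIS and machine learning classification, we finally labeled 150 513 books with classes.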
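The text does not say how a book's language is determined before the model is chosen. One simple possibility, shown here purely as an assumption, is to measure the share of characters in the CJK Unified Ideographs range and route the synopsis accordingly. The function names, the 0.3 threshold, and the model labels are all hypothetical.

```python
def cjk_ratio(text):
    """Share of non-space characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)

def choose_model(synopsis, threshold=0.3):
    """Pick a classifier label by language; the threshold is an assumption."""
    return "tfidf_svm" if cjk_ratio(synopsis) >= threshold else "fasttext"

print(choose_model("复杂网络理论导论"))
print(choose_model("An introduction to network science"))
```

This heuristic would also count Japanese kanji as Chinese, so a production pipeline would likely need a proper language identifier.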

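5. Conclusion

Based on data from Douban Reading, this paper builds a book network from users' co-reading relationships and analyzes it with complex network theory and machine learning, revealing users' discipline preferences, the diversity of interdisciplinary reading, and the strength of connections between disciplines. We find that the attention books receive from users is heterogeneous. Users who read humanities and social sciences such as philosophy, political science, and art read most widely across disciplines, whereas users who read engineering and technology disciplines such as mining engineering, nuclear science and technology, and the science of military organization read most narrowly. The second-level discipline network has three distinct communities, corresponding to the humanities and social sciences, engineering and technology, and the basic sciences. The basic sciences are the most suitable for interdisciplinary research, followed by the humanities and social sciences, and then engineering and technology.

In the pre-Internet era, the public's reading records were hard to preserve. In recent years, Internet-based new media have greatly changed how people read, and the data that can be collected are no longer limited to library borrowing records. As the most active reading community in China, Douban Reading records the reading information of a large number of users, which allows users' reading behavior and preferences to be studied in depth. This paper is a useful exploration in this direction, but it has limitations, such as the relatively small data volume, possible sampling bias, and a fairly macroscopic level of analysis. Future work will combine users' own attributes with time-series information on reading behavior to study interdisciplinary reading further, for example its stability and the factors that influence users' reading preferences.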


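Distribution	Parameter	Number of raters	Number currently reading	Number who have read	Number who want to read
Power law	${\hat n_{\min }}$	518	118	507	8 677
Power law	$\hat \alpha $	1.98	2.12	2.15	2.78
Log-normal	${\hat \mu _{\log }}$	-0.82	-1.72	-9.76	6.11
Log-normal	${\hat \sigma _{\log }}$	3.03	2.70	3.92	1.50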
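The table above reports a lower cutoff and an exponent for each power-law fit, but the surviving text does not state which estimator produced them. As a hedged illustration only, the sketch below uses the continuous-approximation maximum-likelihood estimator for the exponent described in Ref. [18], and checks it on synthetic data with a known exponent. The function name `fit_alpha` and all sample parameters are hypothetical.

```python
import math
import random

def fit_alpha(values, x_min):
    """Continuous-approximation MLE of the power-law exponent for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1 + len(tail) / log_sum

# Synthetic check: draw from a power law with a known exponent by inverse transform.
rng = random.Random(0)
true_alpha, x_min = 2.5, 10.0
sample = [x_min * (1 - rng.random()) ** (-1 / (true_alpha - 1)) for _ in range(20000)]
alpha_hat = fit_alpha(sample, x_min)
print(f"true alpha = {true_alpha}, estimated alpha = {alpha_hat:.3f}")
```

For count data such as numbers of raters, a discrete estimator and a goodness-of-fit comparison against the log-normal alternative would be needed as well; this sketch covers only the exponent.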

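Discipline	Information entropy
Philosophy	1.555 6
Political Science	1.153 1
Art	1.081 6
Law	0.915 0
Psychology	0.748 4
History	0.635 7
Business Administration	0.631 3
Theoretical Economics	0.595 7
Applied Economics	0.580 8
Computer Science and Technology	0.566 5
Mining Engineering	0.000 6
Nuclear Science and Technology	0.000 6
Science of Military Organization	0.000 6
Oil and Natural Gas Engineering	0.000 6
Veterinary Medicine	0.000 8
Stomatology	0.001 1
Agricultural Resource Utilization	0.001 5
Science of Tactics	0.001 5
Surveying and Mapping Science and Technology	0.001 8
Agricultural Engineering	0.002 2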

Community ID	Number of books	Discipline codes of the books
1	676	0503, 1205, 0504, 0501, 0502
2	605	0502, 0503, 0504, 0501, 1205
3	503	0502, 0503, 0504, 0822, 0501, 0101
4	487	0502, 0503, 0504, 0501, 0812
5	427	0502, 0503, 0504, 0501, 0812
6	329	0501, 0502, 0503, 0504
7	310	1205, 0501, 0502, 0504
8	289	0501, 0502, 0503, 0504
9	278	0501, 0502, 0504
10	268	0501, 0502, 0503, 0504
11	265	0501, 0502, 0503, 0504
12	255	0501, 0502, 0504
13	250	0501, 0502, 0504
14	249	0501, 0502, 0503, 0504
15	246	0502, 0503, 0504, 0601, 0501
16	242	0501, 0502, 0503, 0504
17	240	0501, 0502, 0503, 0504
18	236	0501, 0502, 0504
19	235	0502, 0503, 0504, 0501, 0101
20	234	0502, 0503, 0504, 0501, 1205

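Community	Network diameter	Clustering coefficient	Average path length	Discipline codes	Community characteristics
Blue community (Community A)	3	0.36	1.27	0101, 0201, 0302, 0303, 0501, 0502, 0601, 0705, 1108, 1201	Philosophy, economics, political science, sociology, Chinese and foreign literature, history, geography, military science, management science
Green community (Community B)	4	0.34	1.52	0403, 0707, 0710, 0712, 0806, 0807, 0813, 0814, 0815, 0818, 0829, 0830, 0901, 0902, 0905, 0908, 1001, 1002, 1004, 1005, 1007	Sports science, marine science, biology, systems science, metallurgical engineering, power engineering, architecture, civil engineering, hydraulic engineering, environmental science, crop science and horticulture, medicine
Red community (Community C)	2	0.44	1.15	0701, 0702, 0703, 0704, 0708, 0709, 0801, 0802, 0808, 0809, 0811, 0812, 0816, 0825, 1205	Mathematics, physics, chemistry, astronomy, geology, mechanics, mechanical engineering, electrical engineering, computer science and technology, aeronautical and astronautical science and technology, library and information science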
