User Reading Preference and Community Detection in an Online Reading Community

XU Yi-tie, LIU Hong-li, HU Hai-bo

Citation: XU Yi-tie, LIU Hong-li, HU Hai-bo. User Reading Preference and Community Detection in an Online Reading Community[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(6): 939-946. DOI: 10.3969/j.issn.1001-0548.2019.06.020

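Funding:

National Natural Science Foundation of China (61473119)

National Natural Science Foundation of China (61973121)

Fundamental Research Funds for the Central Universities (222201718006)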

About the author:

XU Yi-tie (1993-), male; research interests: social media.

Corresponding author:

HU Hai-bo, E-mail: sdhuzi@163.com

CLC number: TP393; N949

Abstract: To reveal users' disciplinary reading preferences and the diversity of their reading interests in online reading communities, this paper crawls data from the Douban Reading community, constructs book networks from users' co-reading relationships, and studies these networks by combining complex network theory with machine learning methods. In the book networks, the weights in the two directions between any pair of disciplines are nearly equal. Users who read philosophy, political science and other humanities and social sciences read most broadly across disciplines, whereas users who read engineering disciplines such as mining engineering and nuclear science and technology read most narrowly. The network of second-level disciplines has three distinct communities, corresponding to the three major areas of humanities and social sciences, engineering technology, and basic sciences. In terms of suitability for interdisciplinary research, the basic sciences rank highest, followed by the humanities and social sciences, with engineering technology lowest. These results are significant for cross-disciplinary book recommendation.
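The abstract builds book networks from users' co-reading relationships. The exact construction is not given in this excerpt; one common choice, assumed here, is a weighted one-mode projection of the user-book bipartite graph, in which two books are linked when at least one user has read both and the edge weight counts such users. The input format and the name `co_reading_network` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_reading_network(user_books):
    """Weighted one-mode projection of a user-book bipartite relation.

    user_books: dict mapping a user id to an iterable of book ids the user has read.
    Returns {(book_a, book_b): weight} with book_a < book_b, where weight is the
    number of users who read both books.
    """
    weights = defaultdict(int)
    for books in user_books.values():
        # set() drops duplicate entries; sorted() gives each pair a canonical order
        for a, b in combinations(sorted(set(books)), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    readers = {
        "u1": ["Book A", "Book B", "Book C"],
        "u2": ["Book A", "Book B"],
        "u3": ["Book C"],
    }
    print(co_reading_network(readers))
    # {('Book A', 'Book B'): 2, ('Book A', 'Book C'): 1, ('Book B', 'Book C'): 1}
```

A user with k books contributes k(k-1)/2 pairs, so the cost grows quadratically with the number of books per user.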
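Figure 1. Complementary cumulative distribution of the attention books receive

Figure 2. Chord diagram of interactions between disciplines

Figure 3. Correlation between the weights in the two directions between disciplines; the straight line is the diagonal $w_2 = w_1$

Figure 4. Preference relations between first-level disciplines

Figure 5. Community partition of the second-level discipline network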
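Figure 1 plots the complementary cumulative distribution (CCDF) of the attention books receive. A minimal sketch of an empirical CCDF follows, assuming the convention $P(X \ge x)$ (some authors use $P(X > x)$ instead); the function name is hypothetical.

```python
from bisect import bisect_left


def empirical_ccdf(values):
    """Empirical CCDF: one (x, P(X >= x)) pair per distinct observed value."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        return []
    # bisect_left finds the first index i with data[i] >= x, so n - i values are >= x
    return [(x, (n - bisect_left(data, x)) / n) for x in sorted(set(data))]


if __name__ == "__main__":
    # hypothetical numbers of raters for six books
    print(empirical_ccdf([1, 2, 2, 5, 5, 40]))
```

Plotting these points on log-log axes is the usual way to inspect heavy tails such as those fitted in Table 1.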

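Table 1. Fitted parameters of the power-law and log-normal distributions

| Distribution | Parameter | No. of raters | No. currently reading | No. who have read | No. who want to read |
|---|---|---|---|---|---|
| Power law | ${\hat n_{\min }}$ | 518 | 118 | 507 | 8677 |
| Power law | $\hat \alpha $ | 1.98 | 2.12 | 2.15 | 2.78 |
| Log-normal | ${\hat \mu _{\log }}$ | -0.82 | -1.72 | -9.76 | 6.11 |
| Log-normal | ${\hat \sigma _{\log }}$ | 3.03 | 2.70 | 3.92 | 1.50 |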
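Table 1 reports a lower cutoff $\hat n_{\min}$ and an exponent $\hat\alpha$ for the power-law fit. This excerpt does not show the fitting procedure. The sketch below assumes the maximum-likelihood estimator of Clauset, Shalizi and Newman (2009), $\hat\alpha = 1 + n\left[\sum_{i=1}^{n}\ln\frac{x_i}{x_{\min}}\right]^{-1}$ over the $n$ observations with $x_i \ge x_{\min}$, together with their discrete approximation that replaces $x_{\min}$ by $x_{\min} - 1/2$ for integer counts. The function name is hypothetical, and $x_{\min}$ is taken as given.

```python
import math


def powerlaw_alpha_mle(values, x_min, discrete=True):
    """Estimate the power-law exponent alpha for the tail x >= x_min.

    Returns (alpha_hat, standard_error). With discrete=True the reference point
    is x_min - 0.5 (approximate estimator for integer data); otherwise x_min.
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    ref = x_min - 0.5 if discrete else x_min
    if ref <= 0:
        raise ValueError("reference point must be positive")
    log_sum = sum(math.log(x / ref) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal the reference point")
    n = len(tail)
    alpha = 1 + n / log_sum
    # Asymptotic standard error of the MLE: (alpha - 1) / sqrt(n)
    return alpha, (alpha - 1) / math.sqrt(n)
```

In the original method, $x_{\min}$ itself is chosen by minimizing the Kolmogorov-Smirnov distance between the data and the fitted tail; that search is omitted here.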

Table 2. Text classification results

| Discipline code | Model 1 P | Model 1 R | Model 1 F | Model 2 P | Model 2 R | Model 2 F | Model 3 P | Model 3 R | Model 3 F |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 0.7618 | 0.7455 | 0.753562 | 0.7229 | 0.7460 | 0.7343 | 0.7450 | 0.7602 | 0.7525 |
| 02 | 0.8018 | 0.6861 | 0.739452 | 0.6608 | 0.7791 | 0.7151 | 0.6424 | 0.7410 | 0.6882 |
| 03 | 0.6730 | 0.6161 | 0.643294 | 0.5518 | 0.6714 | 0.6058 | 0.5705 | 0.6774 | 0.6193 |
| 04 | 0.7082 | 0.6378 | 0.671159 | 0.5714 | 0.7111 | 0.6337 | 0.5779 | 0.6905 | 0.6292 |
| 05 | 0.8496 | 0.9106 | 0.879043 | 0.9103 | 0.8376 | 0.8724 | 0.8943 | 0.8144 | 0.8525 |
| 06 | 0.7522 | 0.7538 | 0.752999 | 0.7265 | 0.7058 | 0.7160 | 0.7024 | 0.6856 | 0.6939 |
| 07 | 0.7874 | 0.7670 | 0.777066 | 0.7087 | 0.7684 | 0.7374 | 0.7304 | 0.7582 | 0.7440 |
| 08 | 0.8368 | 0.8234 | 0.830046 | 0.7902 | 0.8518 | 0.8198 | 0.7781 | 0.8420 | 0.8088 |
| 09 | 0.7500 | 0.1667 | 0.272772 | 0.6667 | 0.4000 | 0.5000 | 0.7333 | 0.3929 | 0.5116 |
| 10 | 0.7556 | 0.5312 | 0.623834 | 0.7969 | 0.6220 | 0.6986 | 0.5821 | 0.6393 | 0.6094 |
| 11 | 0.7273 | 0.5106 | 0.599983 | 0.7021 | 0.4521 | 0.5500 | 0.6667 | 0.5000 | 0.5714 |
| 12 | 0.7085 | 0.7712 | 0.738522 | 0.7565 | 0.7455 | 0.7509 | 0.6877 | 0.7430 | 0.7143 |

P = precision, R = recall, F = F-measure.
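Under the usual balanced definition, the F-measure in Table 2 is the harmonic mean of precision and recall, $F = 2PR/(P+R)$. The sketch below checks this against the table: for discipline 01 under Model 1, $P = 0.7618$ and $R = 0.7455$ give $F \approx 0.753562$, matching the reported value. The helper name is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Discipline 01, Model 1 in Table 2
    print(round(f_measure(0.7618, 0.7455), 6))  # 0.753562
```

Because the harmonic mean is dominated by the smaller input, discipline 09 under Model 1 scores only 0.272772 despite a precision of 0.7500: its recall is 0.1667.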

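Table 3. Information entropy of users of different second-level disciplines

| Second-level discipline | Information entropy |
|---|---|
| Philosophy | 1.5556 |
| Political science | 1.1531 |
| Art | 1.0816 |
| Law | 0.9150 |
| Psychology | 0.7484 |
| History | 0.6357 |
| Business administration | 0.6313 |
| Theoretical economics | 0.5957 |
| Applied economics | 0.5808 |
| Computer science and technology | 0.5665 |
| Mining engineering | 0.0006 |
| Nuclear science and technology | 0.0006 |
| Military systems | 0.0006 |
| Petroleum and natural gas engineering | 0.0006 |
| Veterinary medicine | 0.0008 |
| Stomatology | 0.0011 |
| Agricultural resources utilization | 0.0015 |
| Tactics | 0.0015 |
| Surveying and mapping science and technology | 0.0018 |
| Agricultural engineering | 0.0022 |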
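Table 3 measures how broadly users read with information entropy. Values near zero (for example 0.0006 for mining engineering) mean reading is concentrated almost entirely in one discipline, and larger values mean reading is spread across more disciplines. This excerpt does not state the logarithm base or how individual users are aggregated, so the sketch below only computes the Shannon entropy $H = -\sum_i p_i \log p_i$ of one list of discipline labels, where $p_i$ is the share of readings in discipline $i$. The function name and input format are assumptions.

```python
import math
from collections import Counter


def reading_entropy(labels, base=math.e):
    """Shannon entropy of the empirical distribution of discipline labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        p = c / total
        h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    # hypothetical reader: 9 books in one discipline, 1 in another (low entropy)
    print(reading_entropy(["0101"] * 9 + ["0302"]))
    # hypothetical reader spread evenly over four disciplines (2 bits)
    print(reading_entropy(["0101", "0302", "0504", "0601"], base=2))
```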

Table 4. Top 20 communities by number of books

| Community ID | Number of books | Discipline codes of member books |
|---|---|---|
| 1 | 676 | 0503, 1205, 0504, 0501, 0502 |
| 2 | 605 | 0502, 0503, 0504, 0501, 1205 |
| 3 | 503 | 0502, 0503, 0504, 0822, 0501, 0101 |
| 4 | 487 | 0502, 0503, 0504, 0501, 0812 |
| 5 | 427 | 0502, 0503, 0504, 0501, 0812 |
| 6 | 329 | 0501, 0502, 0503, 0504 |
| 7 | 310 | 1205, 0501, 0502, 0504 |
| 8 | 289 | 0501, 0502, 0503, 0504 |
| 9 | 278 | 0501, 0502, 0504 |
| 10 | 268 | 0501, 0502, 0503, 0504 |
| 11 | 265 | 0501, 0502, 0503, 0504 |
| 12 | 255 | 0501, 0502, 0504 |
| 13 | 250 | 0501, 0502, 0504 |
| 14 | 249 | 0501, 0502, 0503, 0504 |
| 15 | 246 | 0502, 0503, 0504, 0601, 0501 |
| 16 | 242 | 0501, 0502, 0503, 0504 |
| 17 | 240 | 0501, 0502, 0503, 0504 |
| 18 | 236 | 0501, 0502, 0504 |
| 19 | 235 | 0502, 0503, 0504, 0501, 0101 |
| 20 | 234 | 0502, 0503, 0504, 0501, 1205 |
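Table 4 lists the largest communities found in the book network, but this excerpt does not name the detection algorithm. Many widely used methods, such as the Louvain algorithm of Blondel et al., search for a partition with high modularity $Q$. As a minimal sketch, not the paper's method, the function below scores a given partition of an undirected weighted graph with $Q = \sum_c \left[\frac{W_c}{W} - \left(\frac{S_c}{2W}\right)^2\right]$, where $W$ is the total edge weight, $W_c$ the weight of edges inside community $c$, and $S_c$ the summed strength of its nodes. The function name and edge-list format are hypothetical, and self-loops are assumed absent.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight), each undirected edge listed once, no self-loops.
    community: dict mapping every node to a community label.
    """
    total = 0.0                   # W: total edge weight
    inside = defaultdict(float)   # W_c: weight of edges with both ends in c
    strength = defaultdict(float) # S_c: sum of node strengths in c
    for u, v, w in edges:
        total += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2 for c in strength)


if __name__ == "__main__":
    # two disjoint triangles, each placed in its own community
    edges = [(1, 2, 1), (2, 3, 1), (1, 3, 1), (4, 5, 1), (5, 6, 1), (4, 6, 1)]
    print(modularity(edges, {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}))
```

Finding the partition itself is usually left to a library; for example, recent NetworkX releases provide `louvain_communities` in `networkx.algorithms.community`.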

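Table 5. Properties of the communities in the second-level discipline network

| Community | Network diameter | Clustering coefficient | Average path length | Discipline codes | Community characteristics |
|---|---|---|---|---|---|
| Blue (community A) | 3 | 0.36 | 1.27 | 0101, 0201, 0302, 0303, 0501, 0502, 0601, 0705, 1108, 1201 | Philosophy, economics, political science, sociology, Chinese and foreign literature, history, geography, military science, management science |
| Green (community B) | 4 | 0.34 | 1.52 | 0403, 0707, 0710, 0712, 0806, 0807, 0813, 0814, 0815, 0818, 0829, 0830, 0901, 0902, 0905, 0908, 1001, 1002, 1004, 1005, 1007 | Physical education, marine science, biology, systems science, metallurgical engineering, power engineering, architecture, civil engineering, hydraulic engineering, environmental science, crop science and horticulture, medicine |
| Red (community C) | 2 | 0.44 | 1.15 | 0701, 0702, 0703, 0704, 0708, 0709, 0801, 0802, 0808, 0809, 0811, 0812, 0816, 0825, 1205 | Mathematics, physics, chemistry, astronomy, geology, mechanics, mechanical engineering, electrical engineering, computer science and technology, aeronautical and astronautical science and technology, library and information science |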
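Table 5 characterizes each community by its diameter, clustering coefficient and average path length. This excerpt does not say whether edge weights were used, or whether the clustering coefficient is the average local value or the global transitivity. Assuming each community is treated as an unweighted, undirected, connected subgraph and the average local clustering coefficient is meant, the three quantities can be computed with breadth-first search as sketched below. The adjacency-set input format and the function names are hypothetical; nodes with fewer than two neighbours are given a local clustering of 0.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_stats(adj):
    """Return (diameter, average local clustering, average shortest path length).

    adj maps each node to the set of its neighbours in a connected, unweighted,
    undirected graph with at least two nodes.
    """
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two nodes")
    lengths = []
    for s in nodes:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        # ordered pairs (s, t) with t != s; the mean equals the mean over unordered pairs
        lengths.extend(d for t, d in dist.items() if t != s)
    diameter = max(lengths)
    avg_path = sum(lengths) / len(lengths)

    local = []
    for u in nodes:
        k = len(adj[u])
        if k < 2:
            local.append(0.0)
            continue
        # number of edges among u's neighbours
        links = sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
        local.append(2 * links / (k * (k - 1)))
    return diameter, sum(local) / n, avg_path


if __name__ == "__main__":
    # triangle 1-2-3 with a pendant node 4 attached to 3
    g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(network_stats(g))  # diameter 2, clustering 7/12, average path length 4/3
```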

Publication history

Received: 2018-09-20

Revised: 2019-03-14

Published: 2019-11-29
