LIU Jingfa, CHEN Jinglan, ZHAO Peng. Focused Crawler Method Based on Wang−Landau Sampling[J]. Journal of University of Electronic Science and Technology of China, 2023, 52(4): 578-587. DOI: 10.12178/1001-0548.2022183

Focused Crawler Method Based on Wang−Landau Sampling

  • Traditional crawler methods easily fall into local optima during the search and rarely use historical crawling experience to adjust the crawling path. To address this, a focused crawler method based on Wang−Landau (WL) sampling is proposed. The method uses the vector space model (VSM) to evaluate the topic relevance of links and the PageRank algorithm to evaluate their importance. A regional competition strategy then selects the target link from a link set that contains both topic-related links and links with potential value. Based on a probability density function, the WL algorithm samples the selected target links in the set and uses the accumulated historical statistics to guide subsequent crawling, thereby optimizing the search path. Experimental results show that the WL-based focused crawler finds more topic-relevant webpages than the compared methods from the literature, and that both its crawling accuracy and the standard deviation of the topic relevance of all downloaded pages are significantly improved.
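The abstract scores the topic relevance of links with the vector space model (VSM). A minimal sketch of that idea, assuming raw term-frequency weights, lower-cased whitespace tokenization, and that a link is represented by a short text such as its anchor text, is the cosine similarity between a topic vector and a link vector. The abstract does not state the paper's exact term weighting (TF-IDF is a common alternative) or which page text is scored, and the names `tf_vector` and `topic_relevance` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    # Assumption: plain term frequencies, lower-cased, split on whitespace
    # (no stemming, stop-word removal, or IDF weighting).
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over the shared terms divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with the topic
    return dot / (norm_a * norm_b)


def topic_relevance(topic_terms: str, link_text: str) -> float:
    # Hypothetical helper: VSM relevance of one link's text to the topic, in [0, 1].
    return cosine_similarity(tf_vector(topic_terms), tf_vector(link_text))
```

For example, `topic_relevance("deep learning", "deep learning tutorial")` is 2/√6, about 0.82, while a link sharing no terms with the topic scores 0.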
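Link importance in the abstract comes from PageRank. The sketch below is the textbook power-iteration form with damping factor 0.85 and uniform redistribution of rank from pages that have no out-links. It assumes the link graph is the partial graph the crawler has discovered so far, stored as a dict from each URL to its out-links. The damping value, iteration count, and the `pagerank` name are illustrative assumptions, not parameters taken from the paper.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    # Every page that appears as a source or as a target gets a score.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iterations):
        # Teleportation term that every page receives.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                # Split this page's damped rank evenly over its out-links.
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                # Dangling page: spread its damped rank uniformly over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank
```

Because both branches hand out exactly `damping * rank[p]`, the scores keep summing to 1, so they can be combined with a [0, 1] relevance score. How the paper actually weights relevance against importance is not given in the abstract.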
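The core of the method is Wang−Landau sampling. The abstract does not say how crawl links are mapped to WL energy levels or how the estimated density steers later crawling, so this sketch shows only the generic WL loop. It runs on a toy system whose exact answer is known: E is the number of 1-bits in an n-bit string, so the true density of states is g(E) = C(n, E). The loop works as follows:

- Each proposed move is accepted with probability min(1, g(E_old)/g(E_new)).
- After every step, ln g and a visit histogram are incremented at the current level.
- When the histogram is flat, ln f is halved (f ← √f).

Using binned link scores as "energy" levels in a crawler would be a hypothetical mapping, not something the abstract confirms. All names and parameter values here are illustrative assumptions.

```python
import math
import random


def wang_landau(n_bits: int = 10, flatness: float = 0.8,
                ln_f_final: float = 1e-4, seed: int = 0) -> list[float]:
    """Estimate ln g(E) for the toy system 'E = number of 1-bits in an n-bit string'."""
    rng = random.Random(seed)
    state = [0] * n_bits
    energy = 0                       # current energy level E
    ln_g = [0.0] * (n_bits + 1)      # running estimate of ln g(E)
    ln_f = 1.0                       # ln of the modification factor f
    while ln_f > ln_f_final:
        hist = [0] * (n_bits + 1)    # visit histogram H(E) for this stage
        flat = False
        while not flat:
            for _ in range(1000):
                i = rng.randrange(n_bits)  # propose flipping one bit
                new_energy = energy + (1 if state[i] == 0 else -1)
                # Accept with probability min(1, g(E_old) / g(E_new)).
                log_ratio = ln_g[energy] - ln_g[new_energy]
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    state[i] ^= 1
                    energy = new_energy
                # Update the current level whether or not the move was accepted.
                ln_g[energy] += ln_f
                hist[energy] += 1
            # "Flat" when every bin reaches `flatness` times the mean count.
            mean = sum(hist) / len(hist)
            flat = min(hist) >= flatness * mean
        ln_f /= 2.0                  # refine the modification factor: f <- sqrt(f)
    # WL fixes ln g only up to a constant; pin it with sum_E g(E) = 2**n_bits.
    top = max(ln_g)
    ln_total = top + math.log(sum(math.exp(v - top) for v in ln_g))
    shift = n_bits * math.log(2.0) - ln_total
    return [v + shift for v in ln_g]
```

Calling `wang_landau()` returns eleven values that should lie close to ln C(10, E). The estimate is history-dependent in the same sense the abstract describes: every visit to a level makes revisiting it less likely. This pushes the walk toward rarely seen levels instead of letting it stay in one region.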
