Affiliation:
1. National Geomatics Center of China, Beijing 100830, China
2. School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Abstract
Temporal intent is an important component of an event and plays a key role in collecting event-related pages from the web with focused crawlers. However, traditional focused crawlers usually consider only factors such as topic keywords, web page content, and anchor text, ignoring the relationship between web pages and the temporal intent of events, which leads to poor crawling performance. This paper aims to understand the temporal intent of events and apply it within focused crawlers. First, a new temporal intent identification method based on Google Trends data is proposed. The method automatically identifies the start time of an event and quantifies the event's temporal distribution. Then, a new focused event crawler with temporal intent is proposed. The crawler incorporates the start time of the event into the similarity calculation module, and a new URL (Uniform Resource Locator) priority assignment method is developed that uses the quantified temporal distribution of temporal intent as the independent variable of a natural exponential function. Experimental results show that the method is effective in identifying the start time of events at the month level and in quantifying their temporal distribution. Furthermore, compared to the traditional best-first crawling method, the precision of the proposed method improves by 10.28% on average and by up to 25.21%. These results indicate that the method performs better in retrieving relevant pages and assigning URL priority, and they illustrate the importance of the relationship between web pages and the temporal intent of events.
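The URL priority idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following is an assumption for illustration only: a page's relevance score is multiplied by e raised to a temporal weight, where the weight stands in for the quantified temporal distribution (for example, the normalized share of search interest in the month associated with the page). The names `url_priority`, `Frontier`, and `temporal_weight`, as well as the example URLs, are hypothetical and not taken from the paper.

```python
import heapq
import math


def url_priority(similarity: float, temporal_weight: float) -> float:
    """Hypothetical priority: relevance scaled by e**temporal_weight.

    similarity      -- topical relevance of the page/anchor text, in [0, 1]
    temporal_weight -- assumed normalized temporal-distribution value, in [0, 1]
    """
    return similarity * math.exp(temporal_weight)


class Frontier:
    """Best-first crawl frontier that pops the highest-priority URL first."""

    def __init__(self) -> None:
        self._heap = []      # entries are (-priority, insertion order, url)
        self._counter = 0    # tie-breaker so equal priorities pop in FIFO order

    def push(self, url: str, similarity: float, temporal_weight: float) -> None:
        score = url_priority(similarity, temporal_weight)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


frontier = Frontier()
frontier.push("https://example.com/a", similarity=0.6, temporal_weight=0.0)
frontier.push("https://example.com/b", similarity=0.5, temporal_weight=0.5)
frontier.push("https://example.com/c", similarity=0.7, temporal_weight=0.1)

crawl_order = [frontier.pop() for _ in range(len(frontier))]
print(crawl_order)
```

In this sketch a page with lower topical similarity can still be crawled earlier when its temporal weight is high, which is the qualitative effect the abstract attributes to the temporal-intent crawler; the actual weighting used in the paper may differ.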
Funder
Hunan Provincial Natural Science Foundation of China
Yunnan Fundamental Research Projects
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
1 article.
1. Goal-Directed Target Discovery in Social Network;2023 4th International Conference on Computer Engineering and Intelligent Control (ICCEIC);2023-10-20