Affiliation:
1. University of Maryland, Computer Science Center, College Park, Maryland
Abstract
This paper describes new methods of automatically extracting documents for screening purposes, i.e. the computer selection of sentences having the greatest potential for conveying to the reader the substance of the document. While previous work has focused on one component of sentence significance, namely, the presence of high-frequency content words (key words), the methods described here also treat three additional components: pragmatic words (cue words); title and heading words; and structural indicators (sentence location).
The research has resulted in an operating system and a research methodology. The extracting system is parameterized to control and vary the influence of the above four components. The research methodology includes procedures for the compilation of the required dictionaries, the setting of the control parameters, and the comparative evaluation of the automatic extracts with manually produced extracts. The results indicate that the three newly proposed components dominate the frequency component in the production of better extracts.
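One way to picture a system that is "parameterized to control and vary the influence" of the four components is a weighted sum of per-sentence component scores. The sketch below is a minimal illustration under that assumption, not the paper's implementation: the linear combination, the function names (`tokenize`, `score_sentences`, `extract`), the stop-word list, the cue dictionary format, and the rule that only the first and last sentences earn a location score are all hypothetical choices made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(sentences, title, cue_words, stop_words, weights):
    """Score each sentence as a weighted sum of four components.

    cue_words maps a word to a numeric weight (assumed dictionary format).
    weights maps "cue", "key", "title", "location" to control parameters.
    """
    title_words = set(tokenize(title)) - stop_words
    # Key component: document-wide frequency of content (non-stop) words.
    freq = Counter(
        w for s in sentences for w in tokenize(s) if w not in stop_words
    )
    last = len(sentences) - 1
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        cue = sum(cue_words.get(w, 0) for w in words)
        key = sum(freq[w] for w in words if w not in stop_words)
        ttl = sum(1 for w in words if w in title_words)
        # Location component: assumed here to favor first and last sentences.
        loc = 1 if i in (0, last) else 0
        scores.append(
            weights["cue"] * cue
            + weights["key"] * key
            + weights["title"] * ttl
            + weights["location"] * loc
        )
    return scores


def extract(sentences, scores, k):
    """Return the k highest-scoring sentences in document order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Setting a weight to zero switches its component off, which makes it possible to compare, for example, a frequency-only extract against one built from the cue, title, and location components.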
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
Cited by
665 articles.