Recovering semantics of tables on the web

Published:2011-06 Issue:9 Volume:4 Page:528-538
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Venetis Petros¹, Halevy Alon², Madhavan Jayant², Paşca Marius², Shen Warren², Wu Fei², Miao Gengxin³, Wu Chung²

Affiliation:

1. Stanford University

2. Google Inc.

3. UC Santa Barbara

Abstract

The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching the table with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To recover semantics of tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes and relationships has very wide coverage, but is also noisy. We attach a class label to a column if a sufficient number of the values in the column are identified with that label in the database of class labels, and analogously for binary relationships. We describe a formal model for reasoning about when we have seen sufficient evidence for a label, and show that it performs substantially better than a simple majority scheme. We describe a set of experiments that illustrate the utility of the recovered semantics for table search and show that it performs substantially better than previous approaches. In addition, we characterize what fraction of tables on the Web can be annotated using our approach.
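
As a rough illustration of the simple majority scheme that the abstract uses as a point of comparison, the sketch below attaches a class label to a column when more than half of its cells carry that label in a class-label database. This is not the paper's formal model; the `ISA_DB` dictionary, the `majority_labels` function, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy class-label database: cell value -> set of class labels.
# A real database extracted from the Web would be far larger and noisier.
ISA_DB = {
    "paris": {"city", "capital"},
    "london": {"city", "capital"},
    "chicago": {"city"},
    "python": {"language", "snake"},
}


def majority_labels(column, isa_db, threshold=0.5):
    """Return labels carried by more than `threshold` of the column's cells."""
    counts = Counter()
    for cell in column:
        # Normalize the cell text; each label is counted once per cell.
        for label in isa_db.get(cell.strip().lower(), ()):
            counts[label] += 1
    n = len(column)
    return sorted(label for label, c in counts.items() if n and c / n > threshold)


# "city" covers 3 of 4 cells; "capital" covers only 2 of 4 and is rejected.
print(majority_labels(["Paris", "London", "Chicago", "Unknown"], ISA_DB))
```

A fixed threshold like this treats every label as equally reliable, which is one plausible reason a noisy database would call for the more careful evidence model the abstract describes.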

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/2002938.2002939

Cited by 188 articles.

1. Semantic Annotation of Relational Schemas Using a Probabilistic Generative Model;Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD);2024-01-04

2. Dataset Discovery and Exploration: A Survey;ACM Computing Surveys;2023-11-09

3. Ontology-Driven Semantic Analysis of Tabular Data: An Iterative Approach with Advanced Entity Recognition;Applied Sciences;2023-10-02

4. Knowledge Graph Engineering Based on Semantic Annotation of Tables;Computation;2023-09-05

5. Context-Aware Semantic Type Identification for Relational Attributes;Journal of Computer Science and Technology;2023-07