Affiliation:
1. University of Padua, Padua, Italy
Abstract
Data accuracy is a central dimension of data quality, especially when dealing with Knowledge Graphs (KGs). Auditing the accuracy of KGs is essential to make informed decisions in entity-oriented services or applications. However, manually evaluating the accuracy of large-scale KGs is prohibitively expensive, so research has focused on developing efficient sampling techniques for estimating KG accuracy. This work addresses the limitations of current KG accuracy estimation methods, which rely on the Wald method to build confidence intervals and are therefore prone to reliability issues such as zero-width and overshooting intervals. Our solution, rooted in the Wilson method and tailored for complex sampling designs, overcomes these limitations and ensures applicability across various evaluation scenarios. We show that the presented methods increase the reliability of accuracy estimates by up to two times compared with the state of the art while preserving or enhancing efficiency. Additionally, these gains hold regardless of the KG size or topology.
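To make the two failure modes named in the abstract concrete, the following is a minimal sketch that compares the textbook Wald and Wilson intervals for a binomial proportion. It assumes simple random sampling of triples and a 95% confidence level; it is not the paper's adaptation to complex sampling designs, and the function names and sample sizes are hypothetical, chosen only for illustration.

```python
import math

# Two-sided 95% normal quantile (assumed confidence level for this sketch).
Z_95 = 1.959963984540054


def wald_interval(correct, n, z=Z_95):
    """Textbook Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = correct / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_interval(correct, n, z=Z_95):
    """Textbook Wilson score interval for a binomial proportion."""
    p_hat = correct / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical audits: 30 sampled triples, all or almost all judged correct.
    for correct, n in [(30, 30), (29, 30)]:
        print(correct, n, "Wald:", wald_interval(correct, n))
        print(correct, n, "Wilson:", wilson_interval(correct, n))
```

With every sampled triple judged correct, the Wald interval collapses to a single point (zero width), and with one error in 30 its upper bound exceeds 1 (overshooting). The Wilson interval keeps a positive width in the first case and stays within [0, 1], up to floating-point rounding, in both.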
Publisher
Association for Computing Machinery (ACM)