Faster Algorithms for Fair Max-Min Diversification in ℝ^d

Authors:

Yash Kurkure (1), Miles Shamo (1), Joseph Wiseman (1), Sainyam Galhotra (2), Stavros Sintos (1)

Affiliations:

1. Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA

2. Department of Computer Science, Cornell University, Ithaca, New York, USA

Abstract

The task of extracting a diverse subset from a dataset, often referred to as maximum diversification, plays a pivotal role in many real-world applications with far-reaching consequences. In this work, we study fairness-aware data subset selection, focusing on the problem of selecting a diverse set of size k from a large collection of n data points (FairDiv), where the points belong to m groups and the selected set must contain a prescribed number of points from each group. The FairDiv problem is well studied in the data management and theory communities. We develop the first constant-factor approximation algorithm for FairDiv that runs in near-linear time using only linear space. In contrast, all previously known constant-factor approximation algorithms run in super-linear time (with respect to n or k) and use super-linear space. Our approach achieves this efficiency by combining the Multiplicative Weight Update method with advanced geometric data structures to solve a linear program implicitly and approximately. Furthermore, we improve the efficiency of our techniques by constructing a coreset. Using this coreset, we also propose the first efficient streaming algorithm for FairDiv whose efficiency does not depend on the distribution of the data points. Empirical evaluation on million-sized datasets shows that our algorithm achieves the best diversity within a minute, whereas all prior techniques are either highly inefficient or fail to produce a good solution.
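To make the FairDiv objective concrete, the following is a minimal sketch that assumes the standard fair max-min formulation: points in ℝ^d under Euclidean distance, each point labelled with one of m groups, and a quota k_i per group with k = k_1 + ... + k_m; the goal is to maximize the minimum pairwise distance among the selected points. The function names (`diversity`, `is_fair`, `brute_force_fairdiv`) and the toy data are hypothetical. The exhaustive search is exponential and only illustrates the objective; it is not the authors' near-linear-time algorithm.

```python
import math
from itertools import combinations

def diversity(points):
    """Max-min diversity of a set: the smallest Euclidean distance between two of its points."""
    if len(points) < 2:
        return math.inf  # a set with fewer than two points has no pair to compare
    return min(math.dist(p, q) for p, q in combinations(points, 2))

def is_fair(selection, groups, quotas):
    """True if `selection` (indices) contains exactly quotas[g] points of every group g."""
    counts = {g: 0 for g in quotas}
    for i in selection:
        counts[groups[i]] = counts.get(groups[i], 0) + 1
    return counts == quotas  # a point from a group without a quota makes this False

def brute_force_fairdiv(points, groups, quotas):
    """Exact FairDiv by exhaustive search; exponential in n, for tiny examples only."""
    k = sum(quotas.values())
    best_sel, best_div = None, -math.inf
    for sel in combinations(range(len(points)), k):
        if not is_fair(sel, groups, quotas):
            continue
        div = diversity([points[i] for i in sel])
        if div > best_div:  # keep the first selection that attains the best value
            best_sel, best_div = sel, div
    return best_sel, best_div

# Hypothetical toy instance: five points in R^2, two groups, quotas a=2, b=1.
points = [(0, 0), (1, 0), (5, 0), (0, 5), (5, 5)]
groups = ["a", "a", "b", "b", "a"]
quotas = {"a": 2, "b": 1}
print(brute_force_fairdiv(points, groups, quotas))  # ((0, 2, 4), 5.0)
```

Because the search enumerates all C(n, k) subsets, it is practical only for checking the objective on toy inputs; avoiding this cost on million-sized datasets is what the near-linear-time approximation described above targets.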

Funder

CAHSI-Google IRP

Publisher

Association for Computing Machinery (ACM)
