Equitable Top-k Results for Long Tail Data

Published:2023-12-08 Issue:4 Volume:1 Page:1-24
ISSN:2836-6573
Container-title:Proceedings of the ACM on Management of Data
language:en
Short-container-title:Proc. ACM Manag. Data

Author:

Islam Md Mouinul¹, Asadi Mahsa¹, Basu Roy Senjuti¹

Affiliation:

1. New Jersey Institute of Technology, Newark, NJ, USA

Abstract

For datasets exhibiting the long tail phenomenon, we identify a fairness concern in existing top-k algorithms, which return a "fixed" set of k results for a given query. This causes a handful of popular records (products, items, etc.) to be overexposed and always returned for the user query, whereas there exists a long tail of niche records that may be equally desirable (have similar utility). To alleviate this, we propose θ-Equiv-top-k-MMSP inside existing top-k algorithms: instead of returning a fixed top-k set, it generates all (or many) top-k sets that are equivalent in utility and creates a probability distribution over those sets. At query time, the end user is returned one of these sets, drawn with probability proportional to its associated probability, such that, after many draws from many end users, each record has as equal exposure as possible (governed by uniform selection probability). θ-Equiv-top-k-MMSP is formalized with two sub-problems: (a) θ-Equiv-top-k-Sets, to produce a set S of sets, each with k records, where the sets are equivalent in utility to the top-k set; (b) MaxMinFair, to produce a probability distribution over S, that is, PDF(S), such that the records in S have uniform selection probability. We formally study the hardness of θ-Equiv-top-k-MMSP. We present multiple algorithmic results: (a) an exact solution for θ-Equiv-top-k-Sets and MaxMinFair; (b) highly scalable algorithms that solve θ-Equiv-top-k-Sets through a random walk backed by probability theory, as well as a greedy solution designed for MaxMinFair; (c) an adaptive random-walk-based algorithm that solves θ-Equiv-top-k-Sets and MaxMinFair at the same time. We empirically study how θ-Equiv-top-k-MMSP can alleviate the equitable exposure concerns that group fairness suffers from. We run extensive experiments using 6 datasets and design intuitive baseline algorithms that corroborate our theoretical analysis.
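
The two sub-problems in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes that the utility of a set is the sum of its record scores and that a set counts as θ-equivalent when its utility is at least (1 - θ) times the top-k utility, neither of which the abstract specifies. It enumerates sets by brute force (feasible only for tiny inputs) and uses a uniform distribution over the sets as a stand-in for a MaxMinFair solution. The function names `equivalent_top_k_sets`, `exposure`, and `draw` are hypothetical.

```python
from itertools import combinations
import random


def equivalent_top_k_sets(scores, k, theta):
    """Brute-force stand-in for theta-Equiv-top-k-Sets.

    Assumption: set utility is the sum of record scores, and a set is
    "equivalent" if its utility is >= (1 - theta) * the top-k utility.
    """
    best = sum(sorted(scores.values(), reverse=True)[:k])
    return [
        frozenset(combo)
        for combo in combinations(sorted(scores), k)
        if sum(scores[r] for r in combo) >= (1 - theta) * best
    ]


def exposure(sets, probs):
    """Marginal probability that each record appears in the returned set."""
    out = {}
    for s, p in zip(sets, probs):
        for record in s:
            out[record] = out.get(record, 0.0) + p
    return out


def draw(sets, probs, rng=random):
    """Return one set, sampled according to the distribution over sets."""
    return rng.choices(sets, weights=probs, k=1)[0]


# Toy data: one popular record and two near-equivalent niche records.
scores = {"a": 1.0, "b": 0.9, "c": 0.9, "d": 0.2}
sets = equivalent_top_k_sets(scores, k=2, theta=0.1)

# Placeholder for MaxMinFair: a uniform distribution over the sets.
probs = [1 / len(sets)] * len(sets)
record_exposure = exposure(sets, probs)
result = draw(sets, probs)
```

In this toy instance three sets qualify ({a, b}, {a, c}, {b, c}), and the uniform distribution happens to give records a, b, and c the same exposure of 2/3. In general a uniform distribution over sets does not equalize record exposure, which is the gap the MaxMinFair sub-problem is meant to close.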

Funder

NSF

Office of Naval Research

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3626727

Reference: 55 articles.

1. Real-time recommendation of diverse related articles

2. Diverse near neighbor problem

3. Efficient Indexes for Diverse Top-k Range Queries

4. Airbnb. 2023. Dataset. http://insideairbnb.com/get-the-data

5. The Long Tail: Why the Future of Business Is Selling Less of More