Content

Speaker

Zafeiria (Iro) Moumoulidou

Abstract

Data is generated and collected from all aspects of human activity, in domains such as commerce, medicine, and transportation, as well as from scientific measurements, simulations, and environmental monitoring. However, although datasets grow large and are readily available, they are often downsampled, since working with vast amounts of data can be challenging. Downsampling is typically done for efficiency or because of the limits of human consumption: for example, data visualization applications can display only a small part of the data at a time, since human users can process only a limited amount of information.

While data subset selection is common, deriving high-quality subsets is non-trivial. In this thesis, we revisit the task of data selection through the lens of socially oriented metrics such as diversity and fairness. First, we introduce the problem of fair and diverse data selection, which is NP-hard, and present an algorithmic framework with strong approximation guarantees that extends a well-established diversification-only model to support fairness in the selection process. Next, we show that introducing bi-criteria approximation algorithms improves the approximation guarantees for fair and diverse data selection. Finally, we focus on data selection for visualization: we propose a perception-aware data selection scheme that leverages the notion of diversity to produce better subsets for data visualization.
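To make "fair and diverse data selection" concrete, here is a minimal illustrative sketch. The abstract does not name the diversification model or the fairness constraint, so this example assumes max-min diversification (maximize the smallest pairwise distance among selected points), approached with the classic greedy farthest-point heuristic, and models fairness as a hypothetical per-group quota. The function name `fair_greedy_maxmin` and its parameters are hypothetical. This naive quota-restricted greedy is not the thesis's framework, and no approximation guarantee is claimed for it here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def fair_greedy_maxmin(points: Sequence[Point],
                       groups: Sequence[str],
                       quotas: Dict[str, int]) -> List[int]:
    """Hypothetical sketch: select up to sum(quotas) point indices, taking at
    most quotas[g] points from group g, by repeatedly picking the point that is
    farthest from its nearest already-selected point (greedy farthest-point)."""
    k = sum(quotas.values())
    remaining = dict(quotas)

    # Seed with the first point whose group still has quota left.
    first: Optional[int] = next(
        (i for i, g in enumerate(groups) if remaining.get(g, 0) > 0), None)
    if k == 0 or first is None:
        return []

    selected = [first]
    chosen = {first}
    remaining[groups[first]] -= 1

    # dist[i] = distance from point i to its nearest selected point.
    dist = [math.dist(p, points[first]) for p in points]

    while len(selected) < k:
        # Only points from groups whose quota is not yet exhausted are eligible.
        candidates = [i for i in range(len(points))
                      if i not in chosen and remaining.get(groups[i], 0) > 0]
        if not candidates:
            break
        nxt = max(candidates, key=lambda i: dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        remaining[groups[nxt]] -= 1
        # Update each point's distance to its nearest selected point.
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]

    return selected
```

For example, with points `[(0, 0), (1, 0), (10, 0), (0, 10), (10, 10)]`, groups `["a", "a", "b", "b", "a"]`, and quotas `{"a": 2, "b": 1}`, the sketch returns `[0, 4, 2]`: it spreads the selection across the space while respecting each group's quota. Restricting the greedy step to eligible groups is the simplest way to combine the two objectives; the thesis's framework and bi-criteria algorithms are what provide actual guarantees.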