Neutralizing Self-Selection Bias in Sampling for Sortition

Posted on December 23, 2021 by keithsutherland

Bailey Flanigan, Paul Gölz, Anupam Gupta, and Ariel Procaccia, Advances in Neural Information Processing Systems (2020). https://arxiv.org/abs/2006.10498

Yoram recently drew our attention to this sortition paper, which ranks highly in Google search results. It’s interesting to see that engineers and computer scientists take the problem of self-selection bias more seriously than political theorists and sortition activists.

Abstract: Sortition is a political system in which decisions are made by panels of randomly selected citizens. The process for selecting a sortition panel is traditionally thought of as uniform sampling without replacement, which has strong fairness properties. In practice, however, sampling without replacement is not possible since only a fraction of agents is willing to participate in a panel when invited, and different demographic groups participate at different rates. In order to still produce panels whose composition resembles that of the population, we develop a sampling algorithm that restores close-to-equal representation probabilities for all agents while satisfying meaningful demographic quotas. As part of its input, our algorithm requires probabilities indicating how likely each volunteer in the pool was to participate. Since these participation probabilities are not directly observable, we show how to learn them, and demonstrate our approach using data on a real sortition panel combined with information on the general population in the form of publicly available survey data.
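
The abstract notes that the algorithm needs each volunteer’s participation probability, which has to be learned by combining the pool with survey data on the general population. The Python sketch below is a far simpler stand-in for the paper’s learning procedure (not the authors’ method). It estimates a participation rate for each group of a single categorical feature. It assumes the invitations went to a uniformly random sample of the population and that the population shares come from a survey. The function name and all numbers are hypothetical.

```python
def estimate_participation_rates(pool_counts, population_shares, invitations):
    """Hypothetical ratio estimator for one categorical feature (not the paper's model).

    Assumes invitations were sent uniformly at random, so roughly
    invitations * share of them reached each group. The estimated
    participation rate is the fraction of those who joined the pool.
    """
    rates = {}
    for group, share in population_shares.items():
        expected_invited = invitations * share
        rates[group] = pool_counts.get(group, 0) / expected_invited
    return rates


if __name__ == "__main__":
    # Hypothetical: 10,000 letters; a survey says 30% of adults hold a degree.
    shares = {"degree": 0.3, "no degree": 0.7}
    pool = {"degree": 150, "no degree": 140}
    print(estimate_participation_rates(pool, shares, 10_000))
```

In this made-up example, degree holders volunteer at 5% and others at 2%. Each degree holder is therefore 2.5 times as likely as a non-degree holder to end up in the pool.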

The authors cite statistics from the Sortition Foundation:

typically, only between 2 and 5% of citizens are willing to participate in the panel when contacted. Moreover, those who do participate exhibit self-selection bias, i.e., they are not representative of the population, but rather skew toward certain groups with certain features.

To address these issues, sortition practitioners introduce additional steps into the sampling process.

To prevent grossly unrepresentative panels, many practitioners impose quotas on groups based on orthogonal demographic features such as gender, age, or residence inside the country. These quotas ensure that the ex-post number of panel members belonging to such a group lies within a narrow interval around the proportional share. Since it is hard to construct panels satisfying a set of quotas, practitioners typically sample using greedy heuristics. . . By not deliberately controlling individual selection probabilities, existing panel selection procedures fail to achieve basic fairness guarantees to individuals. Where uniform sampling in the absence of self-selection bias selects each person with equal probability k/n [panel size over population size], currently-used greedy algorithms do not even guarantee a minimum selection probability for members of the pool, let alone fairly distributed probabilities over members of the population. The absence of such a theoretical guarantee has real ramifications in practice.
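
The hypothetical Python sketch below makes the quota mechanism concrete. It builds quota intervals one seat either side of the rounded proportional share, then fills the panel with a randomly ordered greedy pass of the general kind the quote describes. The tolerance, the data layout and the function names are assumptions for illustration, not any practitioner’s actual software. Nothing in it sets or tracks an individual’s selection probability, which is exactly the gap the authors point to.

```python
import math
import random


def quota_bounds(population_shares, k, tolerance=1):
    """Hypothetical quota rule: the proportional share k * s, rounded down and up,
    then widened by `tolerance` seats on each side."""
    bounds = {}
    for group, share in population_shares.items():
        target = round(k * share, 9)  # damp floating-point noise before floor/ceil
        bounds[group] = (max(0, math.floor(target) - tolerance),
                         math.ceil(target) + tolerance)
    return bounds


def greedy_panel(pool, feature_bounds, k, rng):
    """Randomly ordered greedy fill (an illustrative heuristic, not any specific tool).

    pool: list of dicts mapping feature name -> group value.
    feature_bounds: feature name -> {group: (lower, upper)}.
    Returns a list of pool indices, or None if the fill gets stuck
    or misses a lower quota.
    """
    order = list(range(len(pool)))
    rng.shuffle(order)
    counts = {f: {g: 0 for g in groups} for f, groups in feature_bounds.items()}
    panel = []
    for i in order:
        if len(panel) == k:
            break
        person = pool[i]
        # Accept the volunteer only if no upper quota would be exceeded.
        if all(counts[f][person[f]] < feature_bounds[f][person[f]][1]
               for f in feature_bounds):
            panel.append(i)
            for f in feature_bounds:
                counts[f][person[f]] += 1
    meets_lower = all(counts[f][g] >= lo
                      for f, groups in feature_bounds.items()
                      for g, (lo, _hi) in groups.items())
    return panel if len(panel) == k and meets_lower else None


if __name__ == "__main__":
    k = 10
    bounds = {
        "gender": quota_bounds({"F": 0.5, "M": 0.5}, k),
        "age": quota_bounds({"young": 0.4, "old": 0.6}, k),
    }
    # Hypothetical skewed pool of 40 volunteers.
    cells = [("F", "old", 15), ("F", "young", 5), ("M", "old", 12), ("M", "young", 8)]
    pool = [{"gender": g, "age": a} for g, a, n in cells for _ in range(n)]
    print(greedy_panel(pool, bounds, k, random.Random(1)))
```

With the skewed pool in the demo, the greedy fill always completes. With tighter quotas or more features, however, a greedy fill can get stuck and return None. In either case, how often a given volunteer is chosen depends on the pool in ways the procedure never examines.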

The authors then introduce an algorithm designed to ensure that the stratification process does not unfairly affect any pool member’s chance of being selected for the final panel. Of necessity, though, it cannot tackle the likelihood that the 2–5% of citizens who accept the initial invitation are unrepresentative of the population they were drawn from.
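
The principle behind equalising end-to-end probabilities can be sketched without the quotas that make the real problem hard. If volunteer i joined the pool with probability q_i, selecting them from the pool with probability proportional to 1/q_i makes the product of the two the same for everyone. The Python sketch below uses systematic (Madow) sampling to draw exactly k people with those marginal probabilities. It illustrates the principle only and is not the authors’ algorithm, which also has to respect the quotas. The function names and numbers are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def inverse_probability_targets(q, k):
    """Within-pool selection probabilities proportional to 1/q_i, scaled to sum to k.

    q[i] is the (assumed known) probability that volunteer i agreed to join the pool.
    """
    inverse = [1.0 / qi for qi in q]
    total = sum(inverse)
    targets = [k * w / total for w in inverse]
    if max(targets) > 1.0:
        raise ValueError("a target exceeds 1: pool too small to equalise probabilities")
    return targets


def systematic_sample(targets, rng):
    """Madow's systematic sampling: draws round(sum(targets)) distinct indices,
    including index i with probability targets[i] (each target must be <= 1)."""
    k = round(sum(targets))
    cumulative = list(accumulate(targets))
    cumulative[-1] = float(k)  # absorb floating-point drift in the running sum
    start = rng.random()
    # Each point start + j lands in exactly one unit's interval [cum[i-1], cum[i]).
    return [bisect_right(cumulative, start + j) for j in range(k)]


if __name__ == "__main__":
    q = [0.02, 0.02, 0.04, 0.04, 0.08, 0.08, 0.10, 0.10]  # hypothetical
    targets = inverse_probability_targets(q, k=2)
    # End-to-end weight q_i * target_i is the same for every volunteer.
    print([round(qi * t, 6) for qi, t in zip(q, targets)])
    print(systematic_sample(targets, random.Random(0)))
```

With these numbers, every end-to-end weight q_i * target_i equals 2/195. Systematic sampling fixes each volunteer’s marginal probability but can make some pairs impossible to select together: two adjacent volunteers whose targets sum to at most 1 never appear together. Shuffling the pool order before sampling is a common refinement.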

The real-world sortition project that they use to illustrate this is Climate Assembly UK (2020). Of the 30,000 invitations sent out, only 1,727 recipients chose to enter the selection pool. The panel was then selected from that pool with stratification by gender, age, region, education level, ethnicity and rural/urban residence. The organisers wanted to include a stratification for “climate concern level” but could not, as only four respondents were “not at all concerned”. This suggests that willingness to volunteer may have been the single most important filter on who ended up in the sample, and that the resulting panel may well be highly unrepresentative of the population it was drawn from. This would clearly have a negative impact on the perceived democratic legitimacy of the assembly. The opacity of the stratification algorithm is also a significant factor.
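
The climate-concern problem illustrates a general limitation: quotas and reweighting correct only for the features they are given. The toy Python calculation below uses made-up numbers. It has an observed feature A that has nothing to do with participation, and a hidden feature H (think of climate concern) that drives it. Reweighting by participation rates estimated from A alone leaves the panel’s H-share as skewed as the pool’s; only weights that also use H restore it. The cells, shares, rates and function names are all hypothetical.

```python
def pool_rates_by_observed(cells):
    """Participation rate per observed value a, averaging over the hidden feature
    with population-share weights (what a model blind to H would estimate)."""
    num, den = {}, {}
    for (a, _h), (share, q) in cells.items():
        num[a] = num.get(a, 0.0) + share * q
        den[a] = den.get(a, 0.0) + share
    return {a: num[a] / den[a] for a in num}


def expected_panel_shares(cells, weight):
    """Expected panel composition when cell c enters the pool in proportion to
    share * q and is then selected with relative weight weight(c)."""
    mass = {c: share * q * weight(c) for c, (share, q) in cells.items()}
    total = sum(mass.values())
    return {c: m / total for c, m in mass.items()}


def share_of_hidden(composition, h):
    """Total share of the cells whose hidden value is h."""
    return sum(p for (_a, hh), p in composition.items() if hh == h)


if __name__ == "__main__":
    # Hypothetical: four equal cells; participation depends only on the hidden feature H.
    cells = {
        ("A0", "H1"): (0.25, 0.08), ("A0", "H0"): (0.25, 0.02),
        ("A1", "H1"): (0.25, 0.08), ("A1", "H0"): (0.25, 0.02),
    }
    q_observed = pool_rates_by_observed(cells)
    blind = expected_panel_shares(cells, lambda c: 1 / q_observed[c[0]])
    full = expected_panel_shares(cells, lambda c: 1 / cells[c][1])
    population = {c: share for c, (share, _q) in cells.items()}
    print("H1 share, population:", share_of_hidden(population, "H1"))
    print("H1 share, reweighted on A only:", share_of_hidden(blind, "H1"))
    print("H1 share, reweighted on A and H:", share_of_hidden(full, "H1"))
```

With these numbers, both weightings reproduce the population’s 50/50 split on A. Reweighting on A alone, however, leaves H1 at 80% of the expected panel against 50% of the population, while weights that use both features bring it back to 50%.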

In sum, given all these uncertainties, it is hard to see how anything other than a large, quasi-mandatory sample could address the bias problems outlined in the paper.

Filed under: Academia, Applications, Sortition, Theory

Yoram Gat, on December 24, 2021 at 9:40 am said:

This work was mentioned on this blog when it was published in Nature, back in August. It was also mentioned just a week ago in the yearly review.

keithsutherland, on December 24, 2021 at 1:31 pm said:

Sorry Yoram, forgot about your earlier post (put it down to old age). I agree with your comments on the earlier blog post, especially regarding the need for transparency/simplicity and to make big efforts to ensure that everyone takes up the invitation (should be more quasi than mandatory). The other obvious point is that there is no way of knowing what factors lead the vast majority of citizens to turn down the invitation, so no amount of stratification can compensate for unknown unknowns (Donald Rumsfeld’s phrase). The authors of the paper claim that the “learning” aspect of their algorithm can do the heavy lifting, but I don’t understand how. This paper raises very important issues, so it would be nice if it could generate some discussion second time round.

Happy Christmas (sic), everyone.
