400 million voting records show profound racial and geographic disparities in voter turnout in the United States

Contributed equally to this work with: Michael Barber, John B. Holbein Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing * E-mail: barber@byu.edu Affiliation Department of Political Science, Brigham Young University, Provo, UT, United States of America

Contributed equally to this work with: Michael Barber, John B. Holbein Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing Affiliation Frank Batten School of Leadership and Public Policy, University of Virginia, Charlottesville, VA, United States of America

Michael Barber,
John B. Holbein

Published: June 8, 2022
https://doi.org/10.1371/journal.pone.0268134

Abstract

One of the core tenets of a well-functioning representative democracy is that the people who vote to elect government officials are representative of the public. Here we reinforce the idea that reality is far from this lofty ideal. We document the extent and nature of inequities in voter participation in the United States with a level of granularity and precision that previous research has not afforded. To do so, we use a unique nationwide dataset of approximately 400 million validated voting records across multiple election cycles. With this novel dataset, we document large and persistent gaps in voter turnout by race, age, and political affiliation. Minority citizens, young people, and those who support the Democratic Party are much less likely to vote than whites, older citizens, and Republican Party supporters. Minorities, youth, and Democrats are also much more likely to live in local communities where fewer individuals vote, areas that we term turnout deserts. Turnout deserts are especially pernicious given that they are self-reinforcing, bolstered by the social dynamics that fundamentally shape citizens’ voting patterns. Our results show just how glaring inequities in political participation are in the US. These patterns threaten the very fabric of our democracy and fundamentally shift the balance of political power in the halls of government towards the interests of whites, older citizens, and Republicans. They illustrate that participation in the United States is strikingly unequal, far from the ideals that this country has long aspired to.

Citation: Barber M, Holbein JB (2022) 400 million voting records show profound racial and geographic disparities in voter turnout in the United States. PLoS ONE 17(6): e0268134. https://doi.org/10.1371/journal.pone.0268134

Editor: Noam Lupu, Vanderbilt University, UNITED STATES

Received: June 25, 2021; Accepted: April 22, 2022; Published: June 8, 2022

Copyright: © 2022 Barber, Holbein. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data cannot be shared publicly because we have signed data use agreements with The Data Trust LLC that legally preclude us from sharing these data. The data underlying the results presented in the study are available from The Data Trust LLC; https://thedatatrust.com/, bill.dunne@thedatatrust.com.

Funding: J.H. received funding from The U.S. National Science Foundation for this project (SES-1657821). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. https://nsf.gov/.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Voting is one of the central pillars of representative government [1]. Indeed—given its core role of shaping who gets elected to positions of power and the public policies they make [2–4]—voting can be thought of as the foundational act of democracy [2, 3, 5] and a key component in determining the individual well-being of the multitudes of citizens affected by the policies that government implements. Voting serves as the primary means of ensuring a “government of, by and for the people” [6]. For these reasons, social scientists, philosophers, historians, pundits, and theorists have long looked to voter turnout as a barometer of democracy’s health and have lamented stagnating or declining levels of democratic participation [2, 7, 8]. Advocates of higher turnout have made many attempts to increase turnout, including proposing numerous electoral reforms and spending millions of dollars (and countless hours) each election cycle in an attempt to get-out-the-vote (GOTV) [9].

Despite the critical importance of voting, much of what we understand about this fundamental form of civic participation comes from data sources that have distinct limitations. The vast majority of voter turnout studies use survey data, either from government sources (e.g. the Current Population Survey [CPS]) or collected by social scientists (e.g. the American National Election Studies [ANES] or the Cooperative Congressional Election Study [CCES]) [10]. While we have, undoubtedly, learned many things from these studies, this reliance on surveys is unfortunate for at least two reasons. First, most surveys measure self-reported levels of voter turnout and, as such, are susceptible to social desirability biases [11]. As we show below, over-reporting of voting is not random, with certain groups much more likely to over-report than others. Furthermore, surveys that use validated votes in reality rely only on the portion of the voter file that matches the survey’s self-selected respondents, whereas we use the entire file of validated voting records, as discussed below.

Second, surveys are rarely large enough to be representative at the local level. Most surveys—including the comparatively large CPS, CCES, and ANES—that contain voting information are only designed to be representative at the state level—not allowing us to dig into the key dynamics at play in local communities [12, 13]. Given this, surveys can say little about the dynamics and consequences of local communities’ patterns of voter participation, and there are therefore significant gaps in what we know about this central act of democracy. We have too little information about who is more/less likely to vote, how voter turnout varies across local communities, and what consequences low/unequal turnout in one’s social network has on future levels of political participation. And much of what we think we know is, in fact, based on potentially biased samples [14].

Materials and methods

We address these gaps by bringing together a nationwide list of approximately 400 million voter records across two election cycles: 2014 and 2016. We focus on these two elections both to gather insights from a midterm and a presidential election and because of their proximity to the voter file snapshot that we employ. In the United States, whether a citizen votes (but not who they vote for) is public record. Individual states publish this information. The voter file data that we use in this manuscript have been collated by the data and analytics firm The Data Trust LLC. Much like other large-scale voter-file vendors (such as Catalist, L2, and Aristotle), The Data Trust appends voting information from all 50 US states and the District of Columbia. However, whereas other firms share only 1% samples of their voter files with researchers, we have access to the entire Data Trust file, which contains just over 200 million individuals in a single-year snapshot. Section 1 in S1 File further discusses the benefits of using voter files over survey data to study this question. To explore the extent to which the patterns that we document below persist across electoral contexts, we leverage two nationwide snapshots, one from 2014 (a midterm election) and one from 2016 (a presidential election). Our appended dataset contains voting and registration histories of all registered voters in the United States across these two election cycles. The file contains variables like vote history, age, gender, race, geographic location, and political party (along with a host of other modeled variables). These measures are of high quality. Indeed, prior work [12, 15, 16] shows that nationwide files have a high degree of fidelity to historical and contemporary aggregate demographic, partisanship, and turnout measures available through other administrative units (e.g. the Census) as well as individual-level measures in surveys [17].
In some states, race and political party are modeled; in others they are self-reported by the voter on the voter registration form. The self-reported race states include AL, FL, GA, LA, NC, SC, MS, and TN. The self-reported party states include AK, AZ, CA, CO, DE, DC, MD, MS, NE, NH, NY, NC, OK, OR, RI, UT, WV, and WY. The modeled states use a combination of factors to model race/party, including names, geographic location, previous voting history in primary elections, and campaign data on voter contacts. We show in the S1 File that the patterns that we document of geographic and demographic inequities in voter turnout are not driven by whether a person lives in a modeled or self-reported race/party state. (See the discussion in S1 File and S1-S3 and S5-S8 Figs in S1 File for further evidence of the voter file’s high quality.)

We focus on calculating three quantities in this paper. First, we estimate the size of turnout gaps by race, age, and political party. When looking at turnout differences by age, we define “old” as greater than 60 and “young” as younger than 30 throughout the paper. Though this quantity has been calculated in survey data (with self-reported turnout) before, it has (to our knowledge) yet to be provided using a comprehensive nationwide administrative file that measures validated voting histories. This is unfortunate, as survey data may provide us with misleading estimates of how large or small these turnout gaps are, given differential rates of misreported voter turnout. Still, acknowledging that there is a literature on this topic, we readily note that this is not the main contribution of our article. We estimate this quantity first as a means of benchmarking our results against what has been done previously in the literature. We focus on race, age, and political party given the core role that these factors play in politics in the United States. These estimates provide us with a picture of gaps in political participation across several salient social dimensions.
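To make the first quantity concrete, the following is a minimal sketch of how a turnout gap could be computed from individual-level records. It is an illustration under stated assumptions, not the code used for the analysis: the toy records, the tuple layout, and the helper names (`age_group`, `turnout_by`, `gap`) are hypothetical, and only the age cutoffs (over 60, under 30) come from the text.

```python
from collections import defaultdict

# Hypothetical toy records: (race, age, party, voted). The layout and values
# are illustrative assumptions, not the Data Trust schema.
records = [
    ("white", 67, "R", True),
    ("white", 25, "D", False),
    ("black", 45, "D", True),
    ("black", 22, "D", True),
    ("hispanic", 61, "R", False),
    ("hispanic", 28, "D", False),
    ("white", 72, "R", True),
    ("white", 35, "D", False),
]


def age_group(age):
    """Age bins as defined in the text: 'old' is over 60, 'young' is under 30."""
    if age > 60:
        return "old"
    if age < 30:
        return "young"
    return "middle"


def turnout_by(rows, key):
    """Return {group: share of records in that group that voted}."""
    voted = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        group = key(row)
        total[group] += 1
        voted[group] += int(row[3])
    return {group: voted[group] / total[group] for group in total}


def gap(rates, a, b):
    """Turnout gap between groups a and b, in percentage points."""
    return 100 * (rates[a] - rates[b])


by_age = turnout_by(records, lambda r: age_group(r[1]))
by_party = turnout_by(records, lambda r: r[2])
old_young_gap = gap(by_age, "old", "young")
print(by_age, by_party, round(old_young_gap, 1))
```

Because the denominator is every record in the file rather than a survey sample, a calculation of this kind does not depend on who chooses to answer a survey or on how respondents report their own behavior.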

Second, we identify geographic areas where voter turnout is high and where, in contrast, it is low. We explore the location of, and the types of people who live in, what we term a turnout desert: a local community where comparatively few residents vote. In this core part of our analysis, we focus our attention on the electoral precinct. We choose precincts because they serve as the geographic unit where most political activity occurs. In the US, precincts vary in size from about 400–3,000 voters [18]. We borrow our terminology here from the public health literature that explores ‘food deserts’ (e.g. [19]), that is, geographic areas where residents’ access to affordable, healthy food options (especially fresh fruits and vegetables) is restricted. Like a food desert, a turnout desert is a place not necessarily where no one votes, but rather where access to individuals who regularly participate in politics is restricted. Conscious of the fact that identifying turnout deserts is, in and of itself, a new enterprise, in our analyses we define turnout deserts in different ways: some that rely on where a community lies in the overall distribution of voter turnout, some that rely on arbitrary thresholds of where turnout is low, and some that simply look along the continuous range of turnout levels across communities. (Ultimately, our results are robust to alternative ways of coding turnout deserts.) In contrast to the literature that estimates gaps in voter turnout by citizen demographics, to date no work has identified the scope, prevalence, location, characteristics, and consequences of turnout deserts. This has simply not been possible with survey data that are neither large enough nor designed to be representative at levels of geography lower than the state. Simply put, voter files are the only convincing way to look at turnout rates overall and across individual-level subgroups at the micro level. And this has not been done before.
This is unfortunate as there are large inequalities in democratic participation within local communities in the United States—a fact that previous research has not fully explored or acknowledged.
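The three ways of coding a turnout desert described above (position in the distribution, a fixed threshold, and a continuous measure) can be sketched as follows. This is an illustrative sketch only: the precinct identifiers, the turnout values, the bottom-20% share, and the 50% cutoff are all assumptions chosen for the example, not the values used in our analyses.

```python
# Hypothetical precinct-level turnout rates (share of registered voters voting).
precinct_turnout = {
    "P01": 0.22, "P02": 0.35, "P03": 0.48, "P04": 0.51, "P05": 0.57,
    "P06": 0.60, "P07": 0.63, "P08": 0.68, "P09": 0.74, "P10": 0.81,
}


def deserts_by_rank(turnout, bottom_share=0.2):
    """Relative coding: precincts in the lowest `bottom_share` of the distribution.

    The 20% default is an illustrative assumption.
    """
    ordered = sorted(turnout, key=turnout.get)
    n_deserts = int(len(ordered) * bottom_share)
    return set(ordered[:n_deserts])


def deserts_by_threshold(turnout, cutoff=0.5):
    """Absolute coding: precincts whose turnout falls below a fixed cutoff.

    The 0.5 default is an illustrative assumption.
    """
    return {p for p, rate in turnout.items() if rate < cutoff}


def turnout_shortfall(turnout):
    """Continuous coding: distance of each precinct below the highest-turnout precinct."""
    top = max(turnout.values())
    return {p: top - rate for p, rate in turnout.items()}


relative = deserts_by_rank(precinct_turnout)
absolute = deserts_by_threshold(precinct_turnout)
shortfall = turnout_shortfall(precinct_turnout)
print(sorted(relative), sorted(absolute))
```

The relative and absolute codings can disagree (here the fixed cutoff flags one more precinct than the bottom-20% rule), which is why checking results against several codings is informative.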

Third, we are the first to explore the extent to which registered citizens of various social characteristics are likely to live in turnout deserts. We specifically explore whether minorities, the young, and Democrats are more or less likely to live in a turnout desert than whites, older citizens, and Republicans. We focus on these dimensions given their salience in struggles for political power in the United States. Again, this task has never been done before given that the data demands have simply been too great. Exploring the extent, nature, and spread of turnout deserts is vitally important given that previous research has suggested that, at its roots, voting is a social act. Indeed, survey-based studies show that individuals who self-report being asked to vote are also much more likely to say that they vote in elections [20, 21], voting interventions that increase the social salience of voting have large effects [22, 23], voting interventions have spillover effects within families [24–26], and individuals with social skills are much more likely to cast a vote [8]. Taken together, this suggests that if certain groups of people (e.g., minorities, Democrats, and youth) live in turnout deserts, these deserts have the potential to be self-reinforcing, locking in long-term power inequities in the American political system.
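The third quantity amounts to asking, for each group, what share of its registered voters live in a precinct coded as a turnout desert. A minimal sketch follows, assuming a set of desert precincts has already been identified; the voter list, precinct identifiers, and the function name `desert_exposure` are hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical registered voters: (group, precinct of residence).
voters = [
    ("young", "P01"), ("young", "P02"), ("young", "P05"), ("young", "P01"),
    ("old", "P05"), ("old", "P09"), ("old", "P02"), ("old", "P10"),
]

# Assumed output of an earlier step that codes precincts as turnout deserts.
desert_precincts = {"P01", "P02"}


def desert_exposure(rows, deserts):
    """Return {group: share of that group's registered voters living in a desert}."""
    in_desert = Counter()
    total = Counter()
    for group, precinct in rows:
        total[group] += 1
        if precinct in deserts:
            in_desert[group] += 1
    return {group: in_desert[group] / total[group] for group in total}


exposure = desert_exposure(voters, desert_precincts)
relative_risk = exposure["young"] / exposure["old"]
print(exposure, relative_risk)
```

The ratio of two exposure shares gives a simple summary of how much more likely one group is than another to live in a turnout desert.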

Together, these three quantities serve to give a thorough description of participatory inequalities in American democracy. They provide us with an even clearer picture of turnout inequities than previous research has afforded.

Results

First, we note that there are large gaps across groups in both electoral contexts, in addition to major overestimates of turnout when using survey data (even with validated votes). Fig 1 shows differences by race, party, and age. The black bars are turnout rates using the full voter file. The grey bars indicate turnout rates using the CCES validated vote and the ANES, respectively. We note that in many cases survey-based estimates of turnout are much higher than what is found in voter files, which further motivates our recommendation that researchers base turnout rates on voter files and not survey data, even when validated turnout is linked to survey data [12]. (For state-by-state turnout rates, see S5-S8 Figs in S1 File, and for the distribution of turnout in local areas, see S9 Fig in S1 File.) Moreover, our work also shows that previous survey measures of turnout are not uniformly higher than voter files; rather, the differences in voter turnout rates vary across individual groups. For example, in 2016 the ANES does a reasonably good job of capturing turnout rates for older citizens (i.e. those 60+), being only 3 percentage points higher than the Data Trust estimate, but it does poorly in capturing the turnout rates of younger voters, whose turnout it puts a full 22 percentage points higher than the Data Trust estimate. Conversely, the CCES validated vote estimates underestimate older citizens’ overall rates of turnout but get the overall rate of turnout among young people spot on. (We note, however, that as Fraga and Holbein (2020) show, the CCES’s youth turnout estimates are quite off at the state level.) Similar, although slightly less striking, differences can be seen along racial lines (where misses range from 2 to 10 percentage points in the CCES and from 20 to 25 percentage points in the ANES) and partisan lines (where misses range from 10 to 11 percentage points in the CCES and from 20 to 24 percentage points in the ANES).
In 2014, racial gaps between the CCES and the Data Trust data range from 7 to 17 percentage points; partisan gaps range from 21 to 23 percentage points; and age gaps range from 11 to 14 percentage points. In short, there is at least some evidence that the difference between voter file estimates of turnout and survey-based estimates (including those from surveys linked to voter files) is not uniform across groups. We note that these differences are not due to social desirability in the classic sense, that is, to the over-reporting of voting, because we use the CCES validated voter turnout measures. The differences, then, can perhaps be attributed to the sampling framework of the CCES, to who responds to the survey, or to its weights.
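One implication of non-uniform survey error is worth spelling out: when a survey overstates turnout by different amounts for two groups, the gap between those groups is itself biased by the difference in the two errors. The short sketch below works through this arithmetic using the 2016 ANES figures quoted above (3 percentage points for older citizens, 22 for younger voters); the function name `gap_bias` is a hypothetical label for illustration, and the calculation is a worked example rather than a result reported in the paper.

```python
def gap_bias(error_a, error_b):
    """Bias in a survey-based turnout gap (group a minus group b), in percentage points.

    Each error is the survey turnout estimate for a group minus the voter-file
    turnout for that group. Since survey_gap = file_gap + (error_a - error_b),
    the bias in the gap is simply the difference between the two errors.
    """
    return error_a - error_b


# Figures quoted in the text for the 2016 ANES versus the Data Trust file.
anes_error_old = 3     # percentage points too high for citizens aged 60+
anes_error_young = 22  # percentage points too high for younger voters

bias = gap_bias(anes_error_old, anes_error_young)
print(bias)
```

A negative value means the survey understates the old-minus-young turnout gap relative to the voter file, which illustrates why differential error matters for gap estimates and not only for turnout levels.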