The Language Divide

By Matthew Gentzkow

Posted on February 8, 2017


In the wake of the 2016 election, the political divide in America has widened into a chasm. Deep divisions are in some sense nothing new. Conflicts surrounding the Vietnam War, McCarthyism, and civil rights were bitterly fought and often seemed intractable. Yet many have suggested that there is something qualitatively different about the degree of mutual incomprehension that divides left and right today.


The academic literature has often struggled to find evidence to back up this perception. Many prominent scholars have argued that growing polarization of the general public is a myth [“Political Polarization in the American Public” and Culture War? The Myth of a Polarized America]. The evidence for deepening divides in Congress is clearer, but the data suggests that polarization was as great or greater in earlier periods [Polarized America: The Dance of Ideology and Unequal Riches]. These literatures together have considered a range of data points, including voters’ self-reported ideological orientations, views on issues, and perceptions of the other side, as well as patterns of roll-call voting in Congress.


In a recent study [“Measuring Polarization in High-dimensional Data”], Jesse Shapiro, Matt Taddy, and I consider a different kind of data on political divisions: the language we use to talk about politics. We build an index that captures the extent to which Republicans and Democrats speak differently at each point in time. Unlike most other measures of polarization, this one shows that something really is qualitatively different today. We find that the differences in language between left and right have exploded since the early 1990s, and are far larger today than at any point since our data begin in the late 19th Century.


It will come as no surprise to anyone who has been paying attention to American politics that Democrats and Republicans use language differently. Democrats talk about “estate taxes,” “undocumented workers,” and “tax breaks for the wealthy,” while Republicans refer to “death taxes,” “illegal aliens,” and “tax reform.” The 2010 Affordable Care Act was “comprehensive health reform” to Democrats and a “Washington takeover of health care” to Republicans. Within hours of the 2016 killing of 49 people in a nightclub in Orlando, Democrats were calling the event a “mass shooting”—linking it to the broader problem of gun violence—while Republicans were calling it an act of “radical Islamic terrorism”—linking it to concerns about national security and immigration.


Is today's partisan language a new phenomenon? In one sense, the answer is clearly no: one can easily find examples of partisan terms in America's distant past. For example, northerners referred to the American Civil War as the “Great Rebellion,” while southerners called it the “War for Southern Independence” or, in later years, the “War of Northern Aggression.” However, the magnitude of the differences we see today, the deliberate strategic choices that seem to underlie them, and the expanding role of consultants and focus groups suggest that what we see today might represent a consequential change.


In our study, we apply tools from economics and machine learning to quantify the partisanship of language in the US Congress from 1873 to 2009. As an input, we use the full text of the Congressional Record for these years, a verbatim transcript of every speech given on the floor. We parse the raw text to identify who is speaking at each moment, and count the number of times each of the roughly half million two-word phrases that appear in the Record is used by Republicans and by Democrats. We then estimate a machine-learning model that seeks to predict a speaker’s party from the language he or she uses.
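To make the counting step concrete, here is a minimal sketch in Python. It is an illustration only, not the code used in the study: the tokenizer, the function names `bigrams` and `count_by_party`, and the three toy speeches are all assumptions made for this example, and the real pipeline involves considerably more text cleaning than is shown here.

```python
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text, keep alphabetic tokens, and return adjacent word pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return list(zip(tokens, tokens[1:]))


def count_by_party(speeches):
    """Tally two-word phrases per party.

    speeches: iterable of (party, text) pairs.
    Returns a dict mapping each party label to a Counter of bigrams.
    """
    counts = {}
    for party, text in speeches:
        counts.setdefault(party, Counter()).update(bigrams(text))
    return counts


# Made-up speeches, for illustration only (not drawn from the Congressional Record).
toy_speeches = [
    ("R", "We must repeal the death tax"),
    ("D", "The estate tax affects few families"),
    ("R", "The death tax hurts family farms"),
]

counts = count_by_party(toy_speeches)
print(counts["R"][("death", "tax")])   # how often Republicans said "death tax"
print(counts["D"][("estate", "tax")])  # how often Democrats said "estate tax"
```

A table of counts like this, one row per phrase and one column per party, is the kind of input a predictive model of party can then be fit to.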


To capture differences over time, we define a new measure of partisanship: the probability that an observer could predict a speaker’s party from a fixed quantity of speech. If Democrats and Republicans use language similarly, prediction will be difficult and this measure will be close to 0.5. If they use language very differently, it will be higher.
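One simple way to formalize this idea, sketched below in Python, is to imagine an observer who starts with even odds on the speaker's party, hears a single phrase, and updates by Bayes' rule. The measure is then the probability that observer assigns, on average, to the speaker's true party. This is a simplified sketch under stated assumptions: the "fixed quantity of speech" is taken to be one phrase, the phrase probabilities are taken as known rather than estimated, and the function name `expected_posterior` and the toy distributions are hypothetical.

```python
def expected_posterior(q_r, q_d):
    """Average probability a neutral observer assigns to the true party after one phrase.

    q_r, q_d: dicts mapping phrase -> probability that a Republican (q_r)
    or a Democrat (q_d) utters that phrase. Each should sum to 1.
    The observer is assumed to hold a 50/50 prior over the two parties.
    """
    total = 0.0
    for phrase in set(q_r) | set(q_d):
        p_r = q_r.get(phrase, 0.0)
        p_d = q_d.get(phrase, 0.0)
        if p_r + p_d == 0.0:
            continue  # phrase is never spoken by either party
        # Posterior probability that the speaker is Republican, given the phrase.
        rho = p_r / (p_r + p_d)
        # Weight by how often each party says the phrase and score the
        # belief placed on that party.
        total += 0.5 * p_r * rho + 0.5 * p_d * (1.0 - rho)
    return total


# Toy distributions, for illustration only.
identical = {"tax reform": 0.5, "health care": 0.5}
disjoint_r = {"death tax": 1.0}
disjoint_d = {"estate tax": 1.0}
skewed_r = {"death tax": 0.75, "estate tax": 0.25}
skewed_d = {"death tax": 0.25, "estate tax": 0.75}

print(expected_posterior(identical, identical))    # 0.5
print(expected_posterior(skewed_r, skewed_d))      # 0.625
print(expected_posterior(disjoint_r, disjoint_d))  # 1.0
```

The three toy cases trace out the range described above: identical language gives 0.5, completely separate vocabularies give 1, and partial overlap falls in between.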


Figure 1 shows how this measure has evolved over the course of our sample. Partisanship is low and relatively stable from the late 19th Century until the early 1990s, then explodes over the following two decades. Prior to 1990, the probability of correctly guessing a speaker’s party after hearing one minute of speech was less than 55 percent. By the end of our sample in 2008, it had risen to more than 80 percent.


What drove this change? We cannot say conclusively, but the timing of the inflection point in Figure 1 provides strong circumstantial evidence.


Figure 1

Image: Polarization graph

The sharp change in the series occurs in the 104th Congress (1995-1996), the first following the 1994 election. This was no ordinary election. It was the year the Republicans took back control of Congress for the first time in more than forty years, under the leadership of Newt Gingrich and a platform called the Contract with America. It is widely viewed as a watershed event in the history of the US Congress. And it was an election in which the centerpiece of the Republicans’ strategy was language. Assisted by the consultant Frank Luntz, who was hired by Gingrich to help craft the Contract with America and became famous in significant part because of his role in the 1994 campaign, the Republicans used focus groups and polling to identify rhetoric that resonated with voters.


Asked in an interview whether “language can change a paradigm,” Luntz replied:


I don't believe it – I know it. I've seen it with my own eyes.... I watched in 1994 when the group of Republicans got together and said: “We're going to do this completely differently than it's ever been done before....” Every politician and every political party issues a platform, but only these people signed a contract [Interview Frank Luntz, Frontline].


These facts, along with additional evidence we discuss in the paper, suggest that what we saw in 1994 was an innovation in the technology of political persuasion. While speakers have always tried to craft language for rhetorical effect, the advent of large-scale testing and coordination meant politicians could do this far more effectively than ever before. The innovation introduced by Republicans was quickly imitated by Democrats, and the partisan divide in language has continued to grow over subsequent years.


Does this change in language matter? A broad body of prior work gives us reason to think that it may. Laboratory experiments show that framing can affect the public’s views on issues. Field studies reveal effects of language on broader outcomes such as marriage and risk preferences. Most fundamentally, language is one of the most important cues of group identity, with differences in language or accent producing own-group preferences even in infants and young children. The more we speak different languages, the more we are likely to behave and feel like separate tribes, with strong bonds among the like-minded and deepening hostility and distrust between those who think differently.



“The Language Divide” is published on TAP by permission from its author, Professor Matthew Gentzkow.


Matthew Gentzkow is Professor of Economics at Stanford University. He studies empirical industrial organization and political economy, with a focus on media industries. He is a Co-Editor of the American Economic Journal: Applied Economics and Associate Editor of the RAND Journal of Economics.


Professor Gentzkow was awarded the American Economic Association’s (AEA) 2014 John Bates Clark Medal, presented to an American economist under the age of forty who is judged to have made the most significant contribution to economic thought and knowledge.



About the Author

  • Matthew Gentzkow
  • Stanford University
  • 579 Serra Mall
    Stanford, CA 94305
