Content

Speaker

Ankita Gupta

Abstract

Policy and legal argumentation spans thousands of public comments, book pages, and judicial opinions covering decades of United States public discourse. Obtaining insights from these vast corpora can be valuable in helping institutions understand public feedback or aiding analysts in drafting well-reasoned arguments. However, processing these corpora is challenging: manual analyses are labor-intensive and time-consuming, and existing content analysis techniques are often limited to identifying binary sentiments or opinions towards an issue but fail to capture how these opinions are interconnected through argumentative relations in the broader discourse.

In this thesis, I examine how argumentative and persuasive texts can be understood at scale. I explore several aspects of such texts, including a) identifying an argument’s structure, b) identifying whose beliefs are cited to build persuasive arguments, and c) detecting stances among multiple arguments in large corpora. Throughout these studies, I use graphs with signed edges among sentential clauses to represent and understand argumentative texts. I also explore the tradeoff between the annotator effort and the computational resources required for sophisticated argument analysis at scale.
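
As a rough illustration of this representation (a minimal sketch with toy clauses, not code or data from the thesis), one could store clauses as nodes and signed support/attack relations as edges:

```python
# Illustrative sketch only: an argument as a signed graph whose nodes are
# sentential clauses and whose edges carry a +1 (support) or -1 (attack) sign.
import networkx as nx

def build_argument_graph(clauses, signed_edges):
    """clauses: list of strings; signed_edges: (src_idx, dst_idx, sign) triples."""
    g = nx.DiGraph()
    for i, clause in enumerate(clauses):
        g.add_node(i, text=clause)
    for src, dst, sign in signed_edges:
        g.add_edge(src, dst, sign=sign)
    return g

# Toy example: clause 1 supports clause 0, clause 2 attacks clause 0.
clauses = [
    "The proposed rule should be withdrawn.",
    "It imposes heavy compliance costs on small firms.",
    "Supporters argue it improves transparency.",
]
graph = build_argument_graph(clauses, [(1, 0, +1), (2, 0, -1)])
print(graph.edges(data=True))
```

The thesis's actual representation may differ in node granularity and edge semantics; the sketch only shows the general signed-graph idea.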

First, I develop a method to identify the structure of an argument composed of claims supported by reasons. Traditional supervised models for this task often rely on extensive domain-specific annotations, making them difficult to scale or adapt to new domains. To address this issue, we develop a zero-shot method that uses terminology from a popular linguistic theory to prompt large language models (LMs). We apply this method to a large corpus of critical public comments, enabling a more comprehensive analysis of qualitative public feedback beyond conventional measures of public preference (e.g., polls, voting).
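
To make the zero-shot setup concrete, the snippet below sketches one way such prompting could look; the relation labels, wording, and helper function are illustrative assumptions, not the prompts or label set used in the thesis:

```python
# Hypothetical zero-shot prompting sketch: ask an LM to label the relation
# between two clauses using names drawn from a linguistic theory.
RELATION_NAMES = ["support", "attack", "no relation"]  # placeholder labels

def build_prompt(clause_a, clause_b):
    labels = ", ".join(RELATION_NAMES)
    return (
        "You are analyzing an argument from a public comment.\n"
        f"Clause A: {clause_a}\n"
        f"Clause B: {clause_b}\n"
        "Which relation best describes how Clause B relates to Clause A? "
        f"Answer with one of: {labels}."
    )

print(build_prompt(
    "The agency should delay the rule.",
    "Small businesses have not had time to adapt.",
))
```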

Second, I examine whose beliefs are cited to build arguments. Arguments often strategically cite beliefs, either the author's own or those of others, with varying levels of commitment. To disambiguate an author’s beliefs from those attributed to others, we introduce the task of epistemic stance detection and develop a neural model for it. We apply this model to study citation practices among political opinion elites across U.S. political ideologies.
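
For intuition about the task's input/output shape, here is a toy example; the source and stance labels are assumptions for illustration, not the thesis's exact annotation schema:

```python
# Toy illustration of epistemic stance detection: for each (source, event)
# pair in a sentence, decide the source's level of commitment to the event.
example = {
    "sentence": "The senator claims the policy reduced unemployment.",
    "judgments": [
        {"source": "AUTHOR", "event": "claims", "stance": "committed"},
        {"source": "the senator", "event": "reduced", "stance": "committed"},
        {"source": "AUTHOR", "event": "reduced", "stance": "neutral"},
    ],
}
print(example["judgments"])
```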

Third, I propose an efficient method to detect stances between arguments, measuring how strongly they strengthen or weaken each other. While LMs can identify such stances, they can be computationally expensive at scale, and training smaller models demands significant annotation effort to build supervised datasets. To address this issue, we use legal writing conventions to semi-automatically mine a large dataset of naturally occurring, expert-annotated argument pairs with stances. We then explore the effectiveness of these annotations for developing small, domain-agnostic, and cost-efficient models, comparing their performance to models trained on expert annotations and to zero-shot approaches.
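
As a loose sketch of how writing conventions might yield stance labels, the snippet below maps a few citation signals to agree/disagree labels; the signal list, mapping, and regex are assumptions for demonstration, not the thesis's mining pipeline:

```python
# Illustrative sketch: weakly label the stance between a citing sentence and
# the cited authority using conventional legal citation signals.
import re

SIGNAL_STANCE = {
    "see": "agree",
    "accord": "agree",
    "but see": "disagree",
    "contra": "disagree",
}

SIGNAL_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(SIGNAL_STANCE, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def mine_stance(sentence):
    """Return (signal, stance) if the sentence contains a known citation signal."""
    match = SIGNAL_PATTERN.search(sentence)
    if match:
        signal = match.group(1).lower()
        return signal, SIGNAL_STANCE[signal]
    return None

print(mine_stance("But see Smith v. Jones, 123 U.S. 456 (1990)."))
```

In practice, such weak labels would need filtering and validation before being used to train smaller stance models.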

Overall, the expected contributions of this thesis include frameworks, methods, datasets, and evaluation strategies for understanding argumentative texts across domains. This work opens several promising avenues for future research, such as enabling institutions to listen at scale by summarizing public feedback or identifying areas of disagreement, developing tools to assess an analytical report’s quality via argument stance analyses, and investigating questions such as how well citation practices can predict a legal precedent’s vitality. We anticipate these advancements will benefit areas such as natural language processing, computational semantics, and computational social science.

Advisor

Brendan O'Connor