PhD Dissertation Proposal Defense: Prasanna Lakkur, A Computational Approach To Understanding Online Community Dynamics and Opinion
Abstract
This research advances methodologies for analyzing textual content from large-scale online datasets, interpreting message intent, and retrieving knowledge, and for leveraging these insights to address practical challenges across diverse domains. Firstly, I propose two complementary approaches to improving Temporal Question Answering. TempoQR takes a Retrieval-Augmented Generation (RAG)-inspired approach, enhancing the question representation with relevant facts from a large knowledge graph. LASR, by contrast, focuses on understanding the intent of a question in order to select the best SPARQL query for answering it. Building on this work, I use intent classification to label messages with categories of interest to law enforcement, helping them triage massive corpora of conversations. The classification model enables a novel conversation clustering technique that reveals distinct conversational patterns and provides insights into different victim experiences.
Secondly, I advance a method for understanding the opinions expressed within large-scale online communities. I propose a self-supervised approach that models collective community opinion by leveraging readily available data. The overall solution rests on two novel techniques for community comparison: BOTS, which calculates similarity based on expressed opinions, and Emb-PSR, which calculates similarity based on content. My quantitative results demonstrate that both BOTS and Emb-PSR outperform existing methods on their respective tasks and enable cross-platform comparisons between communities. Integrating RAG further improves the accuracy of these opinion models, particularly for smaller communities.
Finally, I use the enhanced opinion model to detect influence operations and sudden shifts of opinion in online communities, and I introduce a new Reddit-based dataset to validate influence operation detection. This work provides tools for understanding user intent and online dynamics, and for identifying coordinated manipulation efforts.