
Speaker: Tanya Chowdhury

Abstract: The rapid adoption of Neural Ranking Models (NRMs) in Information Retrieval (IR) underscores the urgent need for robust interpretability methods to ensure their reliability and transparency. Despite their widespread use, how NRMs model query-document relevance remains poorly understood, limiting their application in high-stakes domains. This proposal addresses that gap by developing generalizable and scalable post-hoc and mechanistic interpretability methods tailored to ranking tasks.

First, we extend the Local Interpretable Model-agnostic Explanations (LIME) framework to NRMs, proposing RankLIME. RankLIME generates local feature attributions for pointwise, pairwise, and listwise ranking models, giving practitioners a scalable tool to interpret the individual contributions of features to the ranking decisions of complex neural models. Next, recognizing the inconsistencies and contradictions in existing empirical ranking feature attribution methods, we adopt an axiomatic approach, establishing a set of fundamental principles that ranking-based attribution methods should satisfy. To this end, we introduce the RankSHAP framework, which builds on the Shapley value from cooperative game theory. RankSHAP ensures that feature attributions are generalizable, consistent, and aligned with human intuition. Extensive experiments on benchmark datasets and diverse ranking models, coupled with a user study, validate the effectiveness of RankSHAP. An axiomatic analysis further clarifies how other attribution methods comply with or deviate from the proposed axioms, deepening our understanding of their reliability.
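
As a purely illustrative sketch (not the formulation presented in the talk), the snippet below computes exact Shapley values for a toy ranking scorer, treating each query term as a player whose contribution to a document's score is attributed. The term-overlap scorer and the function names are hypothetical stand-ins for this example; RankSHAP's actual value function and estimation procedure may differ.

```python
from itertools import combinations
from math import factorial

def score(document_terms, query_terms):
    """Toy pointwise scorer: relevance = number of query terms present in the
    document. A hypothetical stand-in for a neural ranker."""
    return sum(1.0 for t in query_terms if t in document_terms)

def shapley_attributions(document_terms, query_terms):
    """Exact Shapley values, treating each query term as a player that can be
    masked out of the query before scoring."""
    n = len(query_terms)
    attributions = {}
    for i, term in enumerate(query_terms):
        others = [t for j, t in enumerate(query_terms) if j != i]
        phi = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # Standard Shapley weight for a coalition of size k out of n players.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                gain = score(document_terms, list(subset) + [term]) - score(document_terms, list(subset))
                phi += weight * gain
        attributions[term] = phi
    return attributions

if __name__ == "__main__":
    doc = {"neural", "ranking", "models", "retrieval"}
    query = ["neural", "ranking", "explainability"]
    print(shapley_attributions(doc, query))  # terms absent from the doc receive zero credit
```

Exact enumeration is exponential in the number of features, so a practical implementation would rely on sampling-based approximation; the sketch only conveys the attribution principle.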

Beyond feature attributions, this thesis examines the internal mechanisms of transformer-based ranking LLMs to understand the abstract features these models encode when modeling relevance. Through a layer-wise analysis of LLM neuron activations, we investigate whether known statistical and human-engineered features, such as term frequency, inverse document frequency, and covered query term ratio, are embedded within network representations across different LLM configurations. These findings provide insights into the internal workings of ranking LLMs, laying the groundwork for designing more transparent and effective ranking systems.
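
One common way to carry out such a layer-wise analysis is to fit a simple probe that predicts a hand-engineered feature from a layer's activations. The sketch below uses synthetic activations and a synthetic, linearly decodable target in place of real LLM hidden states and real term-frequency values, and uses ridge regression as the probe; treating this as the talk's exact methodology would be an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: in a real analysis, `activations` would hold one
# layer's neuron activations for each (query, document) pair, and `target`
# would be a statistical feature such as the term frequency of query terms.
rng = np.random.default_rng(0)
n_pairs, hidden_dim = 2000, 768
activations = rng.normal(size=(n_pairs, hidden_dim))
# Synthetic target constructed to be linearly decodable, purely for demonstration.
target = activations @ rng.normal(size=hidden_dim) + 0.1 * rng.normal(size=n_pairs)

X_train, X_test, y_train, y_test = train_test_split(
    activations, target, test_size=0.2, random_state=0
)

# Linear probe: if the feature is (approximately) linearly encoded in this
# layer, the probe attains high held-out R^2.
probe = Ridge(alpha=1.0).fit(X_train, y_train)
print(f"Held-out R^2 of the probe: {probe.score(X_test, y_test):.3f}")
```

Repeating this per layer and per feature yields a layer-wise picture of which statistical signals are linearly recoverable, and where.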

In the next phase of this work, we aim to reverse-engineer implicit features encoded within the neuron activations of fine-tuned ranking LLMs, uncovering novel latent features that contribute to their improved accuracy. Understanding how relevance is modeled within these systems will not only advance our theoretical comprehension of ranking-focused LLMs, but also inform the development of improved statistical rankers by integrating the effective relevance-modeling mechanisms observed in LLMs. Ultimately, this work aspires to bridge the gap between interpretability and performance in neural ranking models, paving the way for more reliable, transparent, and accountable ranking systems in critical real-world applications.

Advisor: James Allan