PhD Dissertation Proposal Defense: Shib Dasgupta, Learning Set Theoretic Representation With Box Embeddings
Speaker
Shib Dasgupta
Description
Sets are fundamental to representing and reasoning about human knowledge. Many queries in information retrieval and recommendation systems are inherently set-theoretic, involving operations like conjunctions, disjunctions, and negations. However, learning differentiable set representations is challenging because the representations must remain set-theoretically consistent. Region-based embeddings, particularly Box Embeddings, provide a natural solution by functioning as trainable Venn diagrams, where volumes capture joint probabilities between concepts.
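As a minimal sketch of the "trainable Venn diagram" idea, the snippet below represents each concept as an axis-aligned box and reads probabilities off volumes. The concept names, coordinates, and helper functions are hypothetical illustrations, and the sketch uses plain hard boxes in a unit cube rather than any particular trained model.

```python
from typing import Sequence, Tuple

# A box is a pair (min corner, max corner), one coordinate per dimension.
Box = Tuple[Sequence[float], Sequence[float]]

def volume(box: Box) -> float:
    """Product of side lengths; an empty box (negative side) has volume 0."""
    lo, hi = box
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= max(b - a, 0.0)
    return vol

def intersect(x: Box, y: Box) -> Box:
    """The intersection of two boxes is again a box (possibly empty)."""
    lo = [max(a, b) for a, b in zip(x[0], y[0])]
    hi = [min(a, b) for a, b in zip(x[1], y[1])]
    return (lo, hi)

def conditional(x: Box, y: Box) -> float:
    """P(x | y) = Vol(x ∩ y) / Vol(y)."""
    return volume(intersect(x, y)) / volume(y)

# Hypothetical 2-D boxes inside the unit square (assumed values, not learned).
animal = ([0.0, 0.0], [0.75, 0.75])
dog = ([0.25, 0.25], [0.5, 0.5])    # fully contained in `animal`
car = ([0.875, 0.0], [1.0, 0.5])    # disjoint from `animal`

print(conditional(animal, dog))       # containment gives probability 1
print(conditional(dog, animal))       # only a fraction of animals are dogs
print(volume(intersect(animal, car))) # disjoint concepts never co-occur
```

Because the intersection of two boxes is itself a box, joint and conditional probabilities stay consistent with each other by construction, which is the property the abstract refers to as set-theoretic consistency.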
Box Embeddings are difficult to train, however, largely because of local identifiability problems. To address this, I developed a method based on Gumbel processes, which produced a now widely used Box Embedding variant. Building on the set-theoretic structure of box embeddings, I then enhanced the modeling of word semantics. For example, our Word2Box model understands that "tongue" and "body" should relate to "mouth," while "tongue" and "language" align with "dialect."
Information retrieval systems often handle queries with implicit set operations like intersection, union, and difference, such as “Comedy movies but no action” on Netflix. Document retrieval and recommendation systems struggle to capture these set-theoretic dependencies, especially with sparse data. To address this challenge, I propose a matrix completion framework that incorporates set-theoretic dependencies, using box embeddings to represent entities and concepts as sets.
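To illustrate how a query such as "comedy but no action" can be scored with boxes, the sketch below applies inclusion-exclusion on volumes. It is only an assumption-laden illustration, not the proposed matrix completion framework: the genre and movie boxes, their coordinates, and the function `score_and_not` are all hypothetical.

```python
def volume(box):
    """Product of side lengths; an empty box has volume 0."""
    lo, hi = box
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= max(b - a, 0.0)
    return vol

def intersect(x, y):
    """Axis-aligned intersection of two boxes given as (min, max) corners."""
    lo = [max(a, b) for a, b in zip(x[0], y[0])]
    hi = [min(a, b) for a, b in zip(x[1], y[1])]
    return (lo, hi)

def score_and_not(item, include, exclude):
    """P(include and not exclude | item), computed as
    (Vol(item ∩ include) - Vol(item ∩ include ∩ exclude)) / Vol(item)."""
    both = intersect(item, include)
    return (volume(both) - volume(intersect(both, exclude))) / volume(item)

# Hypothetical genre boxes (assumed values, not learned).
comedy = ([0.0, 0.0], [0.5, 1.0])
action = ([0.25, 0.0], [1.0, 0.5])

# Hypothetical movie boxes.
movies = {
    "pure_comedy": ([0.0, 0.5], [0.25, 1.0]),     # in comedy, outside action
    "action_comedy": ([0.25, 0.0], [0.5, 0.5]),   # inside both genres
    "mixed": ([0.0, 0.25], [0.5, 0.75]),          # partly overlaps action
}

scores = {name: score_and_not(box, comedy, action) for name, box in movies.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The set difference of two boxes is generally not a box, which is why the score is computed from intersection volumes instead of from an explicit "comedy minus action" region.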
Advisor
Andrew McCallum