PhD Dissertation Proposal Defense: Arjun Karuvally, Beyond the Hopfield Memory Theory: Dynamic Energy Landscapes and Traveling Waves in RNNs
Speaker
Arjun Karuvally
Abstract
Recurrent Neural Networks (RNNs) are central to artificial intelligence, excelling in sequence processing tasks across domains, from natural language processing to protein folding in biology. However, fundamental questions about how RNNs store and process information over time by forming and updating their memories remain unanswered, limiting our ability to understand and improve these models. Current theories, such as the Hopfield memory theory, primarily focus on static memory storage and associative retrieval and lack mechanisms to explain the dynamic, adaptive memory processes observed in real-world applications. In this thesis, I propose two theoretical frameworks to address this challenge: the Dynamic Energy Theory, which explains long-term memory processes in RNNs through synaptic interactions that evolve over time, and the Wave Theory, which describes the dynamic storage of inputs as transient working memories in neural activity. By building mathematical models from these theories and studying their properties, capacities, and limitations, I derive new RNNs with improved capabilities.
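For readers less familiar with the static baseline that both frameworks generalize, the sketch below shows a classical Hopfield associative memory: patterns are stored in a fixed, symmetric weight matrix and retrieved by descending a static energy function. This is an illustrative sketch rather than a model from the thesis; the network size, Hebbian storage rule, and update scheme are assumptions chosen for simplicity.

```python
import numpy as np

# Illustrative sketch of classical (static) Hopfield associative memory:
# patterns are stored once in a fixed Hebbian weight matrix and retrieved
# by descending the static energy E(s) = -1/2 s^T W s. Not a thesis model.

rng = np.random.default_rng(0)
N, P = 100, 5                                # neurons, stored patterns
patterns = rng.choice([-1, 1], size=(P, N))  # random bipolar memories

W = patterns.T @ patterns / N                # Hebbian (outer-product) storage
np.fill_diagonal(W, 0.0)                     # no self-connections

def energy(s):
    return -0.5 * s @ W @ s

def retrieve(probe, sweeps=5):
    """Asynchronous sign updates; each flip can only lower the energy."""
    s = probe.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Corrupt a stored pattern and recover it by energy descent.
probe = patterns[0] * np.where(rng.random(N) < 0.1, -1, 1)  # flip ~10% of bits
recovered = retrieve(probe)
print("energy:", energy(probe), "->", energy(recovered))
print("overlap with stored memory:", recovered @ patterns[0] / N)
```

Here the energy landscape is fixed once the patterns are stored; the Dynamic Energy Theory described next removes exactly this restriction.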
The Dynamic Energy Theory generalizes Hopfield memory by allowing the energy function to evolve over time, representing sequences as dynamic trajectories on the network's changing energy landscape. Using this theory, I develop a class of continuous-time RNNs with slow-fast timescale dynamics, in which some neurons update rapidly while others change slowly, and analyze their "escape times," the durations spent in each memory state, revealing the conditions necessary for state transitions. Further, analysis of memory capacity shows that it scales with the strength of inter-memory interactions, enabling the derivation of networks whose long-sequence storage capacity exponentially outperforms that of existing sequence networks. Next, I show how local, biologically plausible learning rules can be derived from the energy function, adapting existing synaptic memories based on the input provided to the network. These new networks could transform tasks requiring adaptive long-term memory retention.
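A minimal sketch of the kind of slow-fast mechanism described above, written here with a classical heteroassociative (memory-k to memory-k+1) coupling rather than the specific networks derived in the thesis: fast neurons settle into the basin of the current memory, while a slowly filtered copy of the state tilts the energy landscape toward the next memory, so the network dwells in each state for an escape time set by the slow timescale and the inter-memory interaction strength. All parameter names and values below are illustrative assumptions.

```python
import numpy as np

# Illustrative slow-fast sequence memory (not the thesis networks): fast
# neurons v relax within the current memory's basin, while a slowly
# low-pass-filtered copy u drives heteroassociative couplings that tilt
# the energy landscape toward the next memory in the sequence.

rng = np.random.default_rng(1)
N, P = 200, 4
xi = rng.choice([-1.0, 1.0], size=(P, N))      # sequence of stored patterns

W_auto = xi.T @ xi / N                         # stabilizes the current memory
W_hetero = np.roll(xi, -1, axis=0).T @ xi / N  # maps memory k to memory k+1

tau_fast, tau_slow = 1.0, 20.0                 # separated timescales
lam, beta, dt = 1.6, 4.0, 0.1                  # inter-memory strength, gain, step

v = xi[0] + 0.1 * rng.standard_normal(N)       # start near the first memory
u = np.zeros(N)                                # slow variable starts at rest

for step in range(4000):
    s_fast, s_slow = np.tanh(beta * v), np.tanh(beta * u)
    v += dt / tau_fast * (-v + W_auto @ s_fast + lam * W_hetero @ s_slow)
    u += dt / tau_slow * (-u + v)              # u slowly tracks v
    if step % 500 == 0:
        overlaps = xi @ np.sign(v) / N         # which memory is currently active?
        print(f"t={step * dt:6.1f}  overlaps={np.round(overlaps, 2)}")
```

In this toy setting, the dwell time in each memory is governed by tau_slow and the coupling strength lam, loosely mirroring the escape-time and interaction-strength analysis summarized above.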
The Wave Theory conceptualizes the binding of input and task-relevant variables in RNNs as propagating waves of neural activity. Building upon the Dynamic Energy Theory, it elucidates how local synaptic interactions support wave propagation in neural activity. I demonstrate that practical RNNs such as Elman RNNs and State Space Models (SSMs) can be transformed into this wave-based model, suggesting that it serves as a canonical framework for understanding existing RNNs. Using this model, I reveal hidden traveling waves in Elman RNNs that store memories and mitigate the vanishing gradient problem. Additionally, I show that the canonical wave model limits the computational power of existing SSMs to that of finite state machines, the simplest class in the Chomsky hierarchy. By incorporating waves with variable speed and direction, I illustrate how this computational power can be increased, enabling these models to process more complex sequences.
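The traveling-wave picture can be illustrated under the simplest possible assumption, a linear hidden state whose recurrent matrix is a pure shift; this is not the canonical wave model itself, only a sketch of the intuition. Each input enters at the front of the hidden state and travels one slot per step, so the hidden vector carries the recent input history, and the recurrent Jacobian passes gradient components along unchanged until they leave the buffer.

```python
import numpy as np

# Illustrative traveling-wave working memory (an assumption-laden sketch,
# not the canonical wave model): the recurrent matrix is a pure shift, so
# each input propagates along the hidden state like a wave and the hidden
# vector holds the last H inputs. Because the shift neither shrinks nor
# amplifies the components it carries, gradients along the wave do not
# vanish over the buffer length.

H = 8                               # hidden size = length of the wave buffer
W = np.eye(H, k=-1)                 # shift matrix: position i receives position i-1
w_in = np.zeros(H); w_in[0] = 1.0   # inputs enter at the front of the wave

h = np.zeros(H)
inputs = [0.3, -1.0, 0.5, 2.0, 0.0, 0.7]
for t, x in enumerate(inputs):
    h = W @ h + w_in * x            # the wave advances one slot per step
    print(f"t={t}  h={np.round(h, 2)}")

# After the loop, h[k] equals the input from k steps ago (for k < H): the
# hidden state is literally a wave carrying the recent input history.
```

A fixed shift corresponds to a wave with constant speed and direction; allowing speed and direction to vary, as described above, is what increases the computational power beyond finite state machines.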
Together, these theories aim to fill a critical gap in understanding how RNNs store and process information over time. By enhancing interpretability and performance, they could inform the design of more efficient neural networks, positively impacting applications that rely on sequential data across diverse scientific disciplines and paving the way for advancements in artificial intelligence.
Advisor
Hava Siegelmann