PhD Thesis Defense: Arjun Karuvally, Beyond the Hopfield Memory Theory: Dynamic Energy Landscapes and Traveling Waves in RNNs
Speaker: Arjun Karuvally
Advisor: Hava Siegelmann
Abstract: Recurrent Neural Networks (RNNs) are central to artificial intelligence, excelling in sequence processing tasks across domains—from natural language processing to protein folding in biology. However, fundamental questions about how RNNs store and process information over time remain unanswered, limiting our ability to design and build more capable AI models. Current theories, such as the Hopfield memory theory, focus primarily on static memory storage and associative retrieval, and do not explain the dynamic, adaptive memory processes required in real-world applications. In this thesis, I formulate two theoretical frameworks that address this challenge: the Dynamic Energy Theory, which elucidates long-term memory processes in RNNs through synaptic interactions over time, and the Wave Theory, which describes the dynamic storage of inputs as transient working memories in neural activity. By building mathematical models from these theories and studying their properties, capacities, and limitations, I derive new RNNs with improved capabilities.
The Dynamic Energy Theory (DET) generalizes Hopfield memory by allowing the energy function to evolve over time, representing sequences as dynamic trajectories on the network's changing energy landscape. Using the DET, I develop a class of continuous-time RNNs with slow-fast timescale dynamics—where some neurons update rapidly while others change slowly—and analyze their "escape times" (the duration spent in each memory state) and their memory capacity. The escape-time analysis reveals the conditions necessary for state transitions and a phase transition from static to dynamic memory as the inter-memory interaction strength increases. The memory capacity grows with the strength of inter-memory interactions, making it possible to derive networks whose long-sequence storage capacity *exponentially* outperforms the linear capacity of current sequence networks. Next, I show how local, biologically plausible learning rules can be derived from the energy function, adapting existing synaptic memories based on the input provided to the network. These networks could transform tasks that require adaptive long-term memory retention.
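To make the slow-fast picture concrete, here is a minimal NumPy sketch of sequence recall in the spirit of the description above. The weight structure, parameter names (tau, lam), and update rules are illustrative assumptions, not the thesis's exact equations: symmetric Hopfield weights create static attractors, while an asymmetric inter-memory term, driven by a slowly filtered copy of the neural state, pushes the network from one stored pattern to the next.

```python
# Minimal sketch (assumed form): slow-fast sequence recall on a changing energy landscape.
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 5                                  # neurons, number of patterns in the sequence
tau, lam, T = 10.0, 2.0, 150                   # slow timescale, inter-memory strength, steps

xi = rng.choice([-1.0, 1.0], size=(P, N))      # random +/-1 patterns xi_1 ... xi_P
W_sym = xi.T @ xi / N                          # symmetric Hopfield weights: static attractors
W_seq = xi[1:].T @ xi[:-1] / N                 # inter-memory weights mapping xi_mu -> xi_{mu+1}

s = xi[0].copy()                               # fast neural state, initialized at pattern 1
u = np.zeros(N)                                # slow variable: low-pass filtered copy of s

for t in range(T):
    overlaps = xi @ s / N                      # overlap of the state with each stored pattern
    if t % 10 == 0:
        print(f"t={t:3d}  active pattern={int(np.argmax(overlaps))}  overlap={overlaps.max():.2f}")
    field = W_sym @ s + lam * (W_seq @ u)      # fast attractor field + slow inter-memory drive
    s = np.sign(field)                         # fast neurons relax immediately
    u += (s - u) / tau                         # slow variable lags behind, setting the escape time

# With lam below the transition point the network never escapes pattern 1 (static memory);
# increasing lam yields sequential recall with a dwell time set by tau.
```

Running this prints the index of the currently active pattern, which steps through the stored sequence with a dwell time controlled by the slow timescale, mirroring the escape-time and phase-transition behavior described above.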
The Wave Theory conceptualizes the binding of input and task-relevant variables in RNNs as propagating waves of neural activity. Building on the DET, it explains how local synaptic interactions support the propagation of these waves through the neural activity. I demonstrate that practical sequence models—autoregressive moving average (ARMA) models, Elman RNNs, State Space Models (SSMs), and transformers—can be derived from the wave theory, suggesting that it serves as a canonical framework for understanding existing sequence processing models. Using the theory, I uncover hidden traveling waves in Elman RNNs that store memories and mitigate the vanishing gradient problem. Building on the wave theory, I introduce a new class of adaptive, unitary SSMs whose rapid synaptic strength changes enable high accuracy with relatively few parameters.
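As a concrete illustration of the wave picture, here is a tiny NumPy sketch of an Elman-style linear RNN whose recurrent matrix is a cyclic shift; the matrices and sizes are illustrative assumptions rather than the thesis's derivation. Each input is written into one hidden unit and then travels across the hidden state like a wave, so it can be read out exactly k steps later.

```python
# Illustrative sketch (assumed construction): a traveling wave as working memory.
import numpy as np

H = 8                                          # hidden size = length of the wave buffer
W = np.roll(np.eye(H), 1, axis=0)              # cyclic shift matrix (orthogonal)
w_in = np.zeros(H); w_in[0] = 1.0              # inputs are injected at position 0

h = np.zeros(H)
inputs = [0.3, -1.2, 0.7, 0.0, 2.5, -0.4]
for t, x in enumerate(inputs):
    h = W @ h + w_in * x                       # wave moves one slot per step, new input enters
    print(f"t={t}: hidden state = {np.round(h, 2)}")

# After the loop, h[k] holds the input from k steps ago (for k < H): the wave is the memory.
# Because W is orthogonal, backpropagated gradients through it preserve their norm,
# which is how such wave structure can mitigate the vanishing gradient problem.
```

The orthogonality of the shift matrix is the key design choice here: it keeps past inputs linearly recoverable and keeps gradient norms stable over long horizons, in the same spirit as the unitary SSMs mentioned above.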
Together, the two theories aim to fill a critical gap in understanding how RNNs store and process information over time. With these insights, we can design parameter-efficient, performant models for sequential data, paving the way for the next generation of AI architectures.
Join the Zoom