On December 4, 2025, Google Research introduced the Titans architecture and MIRAS framework for long-context AI models. Led by Ali Behrouz, Meisam Razaviyayn, and Vahab Mirrokni, the work targets large-scale sequence modeling and describes memory mechanisms whose parameters are updated at test time as the model processes streaming data.
Key Details
Google Research positions Titans as a specific sequence model architecture and MIRAS as a broader theoretical framework. Both aim to combine transformer-level accuracy with the linear-time efficiency of recurrent models.
- Titans introduces a neural long-term memory module implemented as a deep multi-layer perceptron rather than a single fixed vector state.
- The architecture combines this long-term memory with an attention-based short-term context, enabling selective recall of summarized past information at inference time. This aligns with related work on the separation of short-term and long-term memory in continual learning.
- Titans updates its long-term memory using a "surprise" metric: the gradient of the memory's loss on an incoming input, which measures how much that input differs from what the memory already encodes.
- The authors describe two controls on this update: momentum over recent surprise and adaptive weight decay for controlled forgetting; a minimal sketch of the update appears after this list.
- MIRAS formalizes sequence models as associative memories defined by four components: memory architecture, attentional bias, retention gate, and memory algorithm.
- Within MIRAS, forgetting mechanisms are interpreted as regularization terms that balance new learning against preservation of previous states.
- Using MIRAS, Google Research introduces three attention-free variants named YAAD, MONETA, and MEMORA, each with a distinct loss function: YAAD applies Huber loss, MONETA uses generalized norms, and MEMORA constrains memory to behave like a probability distribution (illustrative forms of these losses are also sketched after this list).
- The blog reports that Titans and MIRAS variants outperform Transformer++, Mamba-2, and Gated DeltaNet on multiple language modeling benchmarks.
- According to the authors, these models also match or exceed baselines on genomic modeling and time-series forecasting tasks.
- On language modeling benchmarks such as C4 and WikiText, the authors report lower perplexity than comparison architectures, and higher accuracy on zero-shot reasoning tasks such as HellaSwag and PIQA.
- In the BABILong benchmark for extreme long-context tasks, Titans is reported to surpass all evaluated baselines, including GPT-4, with fewer parameters.
- The work demonstrates Titans handling context windows above two million tokens while maintaining linear-time inference and efficient, parallelizable training.
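The surprise-driven update described in the list above can be made concrete with a short sketch. This is an illustrative reconstruction, not the authors' implementation: the `MemoryMLP` shape, the SiLU activation, and the fixed `eta`, `theta`, and `alpha` values are assumptions, whereas the paper describes these gates as data-dependent.

```python
import torch
import torch.nn as nn

class MemoryMLP(nn.Module):
    """Deep MLP standing in for Titans' neural long-term memory (illustrative)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, k):
        return self.net(k)

def surprise_grads(memory, k, v):
    """Surprise = gradient of the memory's prediction error on the new (key, value) pair."""
    loss = ((memory(k) - v) ** 2).mean()
    return torch.autograd.grad(loss, list(memory.parameters()))

@torch.no_grad()
def apply_update(memory, momentum, grads, eta=0.9, theta=0.1, alpha=0.01):
    """One update step: momentum over the gradient 'surprise' plus weight decay
    as a forgetting gate. eta/theta/alpha are placeholder hyperparameters."""
    for p, m, g in zip(memory.parameters(), momentum, grads):
        m.mul_(eta).add_(g, alpha=-theta)   # momentum over recent surprise
        p.mul_(1.0 - alpha).add_(m)         # decay old memory, then add new surprise

# Toy streaming loop over (key, value) pairs.
dim = 16
memory = MemoryMLP(dim, 4 * dim)
momentum = [torch.zeros_like(p) for p in memory.parameters()]
for _ in range(100):
    k, v = torch.randn(dim), torch.randn(dim)
    apply_update(memory, momentum, surprise_grads(memory, k, v))
```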
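The MIRAS variants listed above differ mainly in the attentional-bias loss that drives this kind of update. The forms below are plausible readings of the blog's descriptions rather than the papers' exact formulations; the choice of `p`, `delta`, and the softmax-based distribution constraint are assumptions.

```python
import torch
import torch.nn.functional as F

def mse_bias(pred, target):
    """Conventional squared-error attentional bias (baseline choice)."""
    return ((pred - target) ** 2).mean()

def yaad_bias(pred, target, delta=1.0):
    """YAAD: Huber loss, less sensitive to outlier tokens than squared error."""
    return F.huber_loss(pred, target, delta=delta)

def moneta_bias(pred, target, p=3.0):
    """MONETA: generalized norm objective (the exponent p is a placeholder)."""
    return (pred - target).abs().pow(p).mean()

def memora_bias(pred_logits, target_logits):
    """MEMORA: treat the memory output as a probability distribution and
    penalize its KL divergence to the target distribution (illustrative)."""
    log_q = F.log_softmax(pred_logits, dim=-1)
    p = F.softmax(target_logits, dim=-1)
    return F.kl_div(log_q, p, reduction="batchmean")

pred, tgt = torch.randn(8, 32), torch.randn(8, 32)
print(yaad_bias(pred, tgt).item(), moneta_bias(pred, tgt).item())
```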
Background Context
The Titans and MIRAS work builds on known limitations of the transformer architecture for very long sequences, particularly self-attention, whose compute and memory costs grow quadratically with context length.
Prior research explored linear recurrent neural networks and modern state space models, which compress context into fixed-size states and can therefore miss fine-grained information in very long sequences.
Google Research positions MIRAS as a unifying view across transformers, linear RNNs, and related architectures through associative memory optimization. In this view, each token's model update solves an inner optimization problem, while a retention gate limits deviation from past states. The framework extends beyond conventional mean squared error and dot-product similarity, allowing non-Euclidean objectives and alternative regularizers. A sketch of this per-token update appears below.
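A minimal sketch of the per-token inner optimization, assuming a simple matrix-valued memory and a single gradient step in place of an exact inner solve; the learning rate `lr` and retention strength `tau` are placeholder hyperparameters, and the bias function is interchangeable with the variant losses sketched earlier.

```python
import torch

def inner_step(memory_state, key, value, attentional_bias, lr=0.1, tau=0.01):
    """One token's inner optimization step in the MIRAS view (illustrative):
    minimize attentional_bias(M @ k, v) + tau * ||M - M_prev||^2, where the
    second term plays the role of the retention gate."""
    M = memory_state.clone().requires_grad_(True)
    objective = attentional_bias(M @ key, value) + tau * ((M - memory_state) ** 2).sum()
    (grad,) = torch.autograd.grad(objective, M)
    return (memory_state - lr * grad).detach()

# Example with a squared-error bias; any of the variant losses could be swapped in.
M = torch.zeros(32, 16)
bias = lambda pred, tgt: ((pred - tgt) ** 2).mean()
for _ in range(10):
    k, v = torch.randn(16), torch.randn(32)
    M = inner_step(M, k, v, bias)
```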
Ablation experiments in the Titans paper compare long-term memory modules with identical parameter counts but different depths. According to the reported results, deeper memory networks achieve lower perplexity and better scaling with sequence length than shallower configurations.
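One way to set up such a parameter-matched comparison is sketched below; the layer widths are placeholder values chosen only so the two memories have roughly the same number of parameters, not the configurations used in the paper.

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

dim = 64
# Shallow memory: a single linear map, i.e. a matrix-valued state.
shallow = nn.Linear(dim, dim, bias=False)
# Deeper memory: two hidden layers, width chosen so totals are comparable.
deep = nn.Sequential(
    nn.Linear(dim, 26), nn.SiLU(),
    nn.Linear(26, 26), nn.SiLU(),
    nn.Linear(26, dim),
)
print(count_params(shallow), count_params(deep))  # 4096 vs 4120 parameters
```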
Source Citations
The Titans and MIRAS work appears in two research papers and an accompanying Google Research blog post, which detail the architectures, theoretical framework, and empirical evaluations.
- Google Research blog announcement - primary summary of Titans and MIRAS.
- Titans architecture paper - technical description of the Titans model and reported experiments.
- MIRAS framework paper - formalizes MIRAS and introduces YAAD, MONETA, and MEMORA.
- BABILong repository - source of the long-context evaluation tasks used to assess Titans.