
On November 7th, Google Research published a blog post introducing a new machine learning paradigm, Nested Learning, to address the challenge of "catastrophic forgetting" that AI models face when continuously learning new knowledge.
While current large language models are powerful, their knowledge remains limited to their pre-training data or a finite context window. Unlike the human brain, which can keep acquiring new skills without erasing old knowledge thanks to "neuroplasticity" (the brain's ability to adjust its structure and function in response to experience, learning, and environmental change), these models cannot continue learning after training.
Directly updating models with new data often leads to "catastrophic forgetting": after learning a new task, performance on old tasks deteriorates significantly. To address this fundamental challenge, researchers at Google Research proposed a novel solution. In a paper published at NeurIPS 2025, Google introduced the "Nested Learning" paradigm, which unifies the traditionally separate concepts of model architecture and optimization algorithm.
This paradigm posits that a complex machine learning model is essentially a series of nested or parallel optimization problems, each with its own independent "context flow" and update rate. This perspective reveals a novel design dimension, allowing researchers to build AI components with greater computational depth, effectively mitigating the catastrophic forgetting problem.
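To make the idea of nested or parallel optimization problems with their own update rates concrete, here is a minimal sketch in plain NumPy. The parameter groups, learning rates, and update intervals are illustrative assumptions, not taken from the paper: a single toy model is split into a "fast" level updated every step and a "slow" level updated every eighth step, each treated as its own optimization problem over the shared gradient stream.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = x @ (W_slow + W_fast): two parameter groups ("levels"),
# each treated as its own optimization problem with its own update frequency.
W_slow = rng.normal(size=(4, 1)) * 0.1   # slow level: updated every 8 steps
W_fast = rng.normal(size=(4, 1)) * 0.1   # fast level: updated every step

def loss_and_grads(x, y, W_slow, W_fast):
    pred = x @ (W_slow + W_fast)
    err = pred - y
    loss = float(np.mean(err ** 2))
    grad = 2 * x.T @ err / len(x)        # same gradient w.r.t. both groups
    return loss, grad

UPDATE_EVERY = {"fast": 1, "slow": 8}    # per-level update rates (assumed values)
LR = {"fast": 0.1, "slow": 0.01}

for step in range(1, 101):
    x = rng.normal(size=(16, 4))
    y = x @ np.array([[1.0], [-2.0], [0.5], [0.0]])  # ground-truth mapping
    loss, grad = loss_and_grads(x, y, W_slow, W_fast)

    # Each "level" sees the same context flow (the gradients) but commits
    # an update only at its own frequency.
    if step % UPDATE_EVERY["fast"] == 0:
        W_fast -= LR["fast"] * grad
    if step % UPDATE_EVERY["slow"] == 0:
        W_slow -= LR["slow"] * grad

    if step % 25 == 0:
        print(f"step {step:3d}  loss {loss:.4f}")
```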
Based on the Nested Learning paradigm, the research team proposed two specific technical improvements (toy sketches of both appear after the second item below):
First, "deep optimizers," which treat the optimizer itself as a learning module and improve its underlying objective function, making it more robust to imperfect data (the ability of a system or process to maintain its functionality and stability in the face of uncertainty, change, incorrect input, or anomalies).
Second, "continuous memory systems" (CMS), which treat the model's memory as a spectrum composed of modules with different update frequencies, smoothly transitioning from short-term memory to long-term memory, creating a richer and more efficient continuous learning memory system.
To validate these ideas, the research team designed and implemented a proof-of-concept model called "Hope." Hope is a self-modifying recurrent architecture based on the Titans architecture. It deeply integrates the CMS, enabling it to optimize its own memory through self-referential processes and to support effectively unbounded levels of in-context learning.
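The description of Hope above is high level; as a purely illustrative toy (an assumed simplification, not the actual Hope or Titans equations), the sketch below shows one thing "optimizing its own memory through self-referential processes" can mean: an associative memory is updated at test time by gradient steps on a recall loss, and the learning rate of that update rule is itself adjusted from the same signal.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 8
M = np.zeros((dim, dim))     # a linear associative memory: recalls v from k via M @ k
write_lr = 0.1               # a parameter of the memory's *own* update rule
meta_lr = 1e-3               # how fast the update rule itself is adjusted

for _ in range(200):
    k = rng.normal(size=dim)
    k /= np.linalg.norm(k)                  # unit-norm keys keep the toy dynamics stable
    v = rng.normal(size=dim)

    err = M @ k - v                         # "surprise": how badly M recalls v from k
    grad_M = np.outer(err, k)               # gradient of 0.5 * ||M @ k - v||^2 w.r.t. M
    M_new = M - write_lr * grad_M           # test-time memory update (inner optimization)

    # Self-referential step: adjust the update rule's own learning rate so the
    # post-update memory would have recalled this association better.
    new_err = M_new @ k - v
    grad_lr = float(new_err @ (-(grad_M @ k)))   # d(0.5*||new_err||^2) / d(write_lr)
    write_lr = float(np.clip(write_lr - meta_lr * grad_lr, 1e-3, 1.0))
    M = M_new

print(f"write strength after self-modification: {write_lr:.3f}")
```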
In a series of publicly available language modeling and commonsense reasoning tasks, the Hope architecture exhibits lower perplexity and significantly higher accuracy than modern recurrent models and standard Transformers. It stands out in particular on the Needle-In-A-Haystack (NIAH) benchmark, which evaluates long-context understanding and information retrieval by requiring a model to locate and answer a specific piece of information (the needle) buried in a very long text (the haystack). There, Hope demonstrates superior memory management, indicating that the continuum memory system is an effective way to handle extremely long information sequences and paving the way for AI that truly "learns from the past."
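For context, a Needle-In-A-Haystack style evaluation is typically constructed along the following lines. This is a generic illustration, not Google's exact benchmark harness, and `query_model` is a hypothetical stand-in for the model under test.

```python
import random

def build_niah_prompt(needle, haystack_sentences, depth=0.5):
    """Insert a single 'needle' fact at a relative depth inside long filler text."""
    pos = int(len(haystack_sentences) * depth)
    sentences = haystack_sentences[:pos] + [needle] + haystack_sentences[pos:]
    context = " ".join(sentences)
    question = "What is the secret passphrase mentioned in the text above?"
    return f"{context}\n\n{question}"

def run_niah(query_model, n_trials=10, context_sentences=2000):
    filler = ["The sky was a pale shade of grey that afternoon."] * context_sentences
    correct = 0
    for _ in range(n_trials):
        passphrase = f"emerald-{random.randint(1000, 9999)}"
        needle = f"The secret passphrase is {passphrase}."
        depth = random.random()               # needle position varies per trial
        prompt = build_niah_prompt(needle, filler, depth)
        answer = query_model(prompt)          # hypothetical model call
        correct += passphrase in answer
    return correct / n_trials

# Usage with a trivial "model" that just echoes the prompt text (upper bound of 1.0):
print(run_niah(lambda prompt: prompt))
```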