Agents Learn to Learn: Skill Optimizers and Evolving Memory Take Center Stage
The new frontier for AI agents is not just better base models, but self-improving "cognitive architectures" that learn and evolve their own skills and memory systems.
Agent capabilities are increasingly defined by their surrounding architecture, not just the base LLM. A wave of new research treats agent "skills" and "memory" as components to be systematically optimized and evolved, much like model weights during training.
The SkillOpt framework introduces a "text-space optimizer" that turns scored agent rollouts into edits on a skill document, accepting a change only if it improves validation scores. On GPT-5.5, it boosted accuracy by up to 24.8 points. Similarly, the PANDO framework uses "online skill distillation" to learn from web navigation trajectories, making agents more efficient over time. DRIVE, in turn, disentangles abstract "reasoning skills" from website-specific "interaction skills" to improve generalization.
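The accept-only-if-validation-improves idea attributed to SkillOpt can be illustrated with a minimal sketch. This is not SkillOpt's actual implementation: `optimize_skill`, `toy_propose`, and `toy_score` are hypothetical names, and the toy proposer and scorer stand in for what would in practice be an LLM proposing edits from scored rollouts and an agent evaluated on a held-out validation set.

```python
from typing import Callable, List


def optimize_skill(
    skill_doc: str,
    propose_edits: Callable[[str], List[str]],
    validation_score: Callable[[str], float],
    rounds: int = 3,
) -> str:
    """Greedy loop: keep a candidate skill document only if it scores higher."""
    best_doc = skill_doc
    best_score = validation_score(best_doc)
    for _ in range(rounds):
        # Candidates are proposed once per round from the current best document.
        for candidate in propose_edits(best_doc):
            score = validation_score(candidate)
            if score > best_score:  # accept strict improvements only
                best_doc, best_score = candidate, score
    return best_doc


# --- Toy stand-ins (assumptions for illustration only) ---
RULES = ["check units", "cite sources", "verify output"]


def toy_propose(doc: str) -> List[str]:
    """Propose appending each missing rule, plus one deliberately harmful edit."""
    candidates = [doc + "\n- " + rule for rule in RULES if rule not in doc]
    candidates.append(doc + "\n- skip validation")
    return candidates


def toy_score(doc: str) -> float:
    """Pretend validation score: +1 per helpful rule, -2 for the harmful one."""
    score = sum(1.0 for rule in RULES if rule in doc)
    if "skip validation" in doc:
        score -= 2.0
    return score


final_doc = optimize_skill("Skill: data analysis", toy_propose, toy_score)
print(final_doc)
```

In this toy run the harmful edit is proposed every round but never accepted, because it never raises the score. The gate on validation score is what keeps the skill document from drifting toward edits that merely look plausible.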
The same optimization pressure extends to memory. The M* framework automatically discovers task-optimized "memory harnesses" (Python programs defining data schemas and logic) through code evolution. MemSkill takes a similar approach, evolving a set of memory skills by analyzing and refining routines that failed on hard cases.
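To make "memory harness" concrete, here is a rough sketch of what a program that bundles a data schema with read/write logic might look like, together with a selection step over candidate variants. Everything here is assumed for illustration: `MemoryRecord`, `KeywordHarness`, and `fitness` are hypothetical names, and choosing among three parameter settings is a much simpler stand-in for the code evolution M* is described as using.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemoryRecord:
    """Hypothetical schema: one stored observation, keyed by topic."""
    topic: str
    text: str


@dataclass
class KeywordHarness:
    """Toy memory harness: a schema plus write/read logic."""
    max_records: int = 100
    records: List[MemoryRecord] = field(default_factory=list)

    def write(self, topic: str, text: str) -> None:
        self.records.append(MemoryRecord(topic, text))
        # Evict the oldest records once capacity is exceeded.
        excess = len(self.records) - self.max_records
        if excess > 0:
            del self.records[:excess]

    def read(self, query: str) -> List[str]:
        # Exact topic match; a real harness could use any retrieval logic.
        return [r.text for r in self.records if r.topic == query]


def fitness(harness: KeywordHarness, episodes: List[Tuple[str, str]]) -> float:
    """Fraction of episodes whose text can still be retrieved after all writes."""
    for topic, text in episodes:
        harness.write(topic, text)
    hits = sum(1 for topic, text in episodes if text in harness.read(topic))
    return hits / len(episodes)


# Made-up task data for the sketch.
episodes = [
    ("login", "use SSO button"),
    ("search", "filter by date"),
    ("checkout", "apply coupon first"),
]

# Candidate harness variants; selection keeps the best-scoring one.
candidates = [KeywordHarness(max_records=n) for n in (1, 2, 3)]
scores = [fitness(h, episodes) for h in candidates]
best = candidates[scores.index(max(scores))]
print(best.max_records, scores)
```

The point of the sketch is the division of labor: the harness defines what is stored and how it is recalled, while an outer loop scores variants on the task and keeps the best one. An evolutionary system would rewrite the harness code itself rather than tune a single capacity parameter.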
This marks a shift from hand-crafting agent control flows to creating systems that learn their own control flows. The focus is moving up a level of abstraction: from prompting for a single task to building optimizers that discover the best prompts and memory structures for a whole class of tasks.