These concepts frame agentic computation at the level of an operating system or computer architecture, providing foundational abstractions for managing agent execution, communication, and safety.
Agent libOS: A library-OS-inspired runtime that treats an agent as a schedulable AgentProcess with identity, state, capabilities, and audit records.
OpenAgenet/OAN: An open infrastructure for trusted agent interconnection. It provides a protocol-neutral trust layer for identity provenance, governance, discovery authorization, and signed invocation before agents interact (OpenAgenet/OAN: Open Infrastructure for Trusted Agent Interconnection).
Lab Agent Protocol (LAP): A protocol for agent-to-instrument interaction in autonomous science, featuring primitives for capabilities, reservations, safety handshakes, and reproducible results.
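The Agent libOS entry above describes an agent as a schedulable AgentProcess carrying identity, state, capabilities, and audit records. A minimal sketch of that idea follows. The field names, the `invoke` method, and the `round_robin` scheduler are all hypothetical illustrations, not the runtime's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentProcess:
    """Hypothetical process record: identity, state, capabilities, audit log."""
    agent_id: str
    capabilities: frozenset[str]
    state: dict[str, Any] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def invoke(self, capability: str, **args: Any) -> bool:
        """Check a capability and record the attempt, whether allowed or not."""
        allowed = capability in self.capabilities
        self.audit_log.append(
            {"agent": self.agent_id, "capability": capability,
             "args": args, "allowed": allowed}
        )
        return allowed


def round_robin(processes: list[AgentProcess], steps: int) -> list[str]:
    """Toy scheduler (an assumption): one turn per process, in order."""
    order = []
    for i in range(steps):
        proc = processes[i % len(processes)]
        proc.state["turns"] = proc.state.get("turns", 0) + 1
        order.append(proc.agent_id)
    return order
```

Recording denied attempts alongside allowed ones keeps the audit log useful for later review, which is one plausible reading of "audit records" in this setting.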
Embodied Agent Architectures
These architectures focus on agents that interact with physical or simulated environments, often incorporating multimodal perception and control.
SCOPE: A modular agent for natural-language, open-vocabulary pan-tilt-zoom (PTZ) camera control and visual scene understanding, designed for edge deployment.
The fundamental agent loop involves cycles of observation, thought, and action. Modern architectures add sophisticated mechanisms for planning, memory management, and adaptive control.
ReAct: The ReAct (Reason-Act) framework is a foundational pattern that combines reasoning and action generation in an interleaved manner.
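The interleaving of reasoning and action can be sketched as a loop that alternates a thought, an action, and an observation until the policy chooses to finish. This is a minimal sketch under stated assumptions: `scripted_policy` stands in for a language model, `calc` is a toy tool, and the transcript format is illustrative only.

```python
from typing import Callable, Optional

# A policy maps the transcript so far to (thought, action, argument).
Policy = Callable[[list[str]], tuple[str, str, str]]


def react_loop(policy: Policy, tools: dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> tuple[Optional[str], list[str]]:
    """Alternate thought, action, and observation until 'finish' or the step cap."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, transcript
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool {action!r}"
        transcript.append(f"Observation: {observation}")
    return None, transcript  # step budget exhausted without an answer


def calc(expr: str) -> str:
    """Toy tool: multiply two integers written as 'a*b'."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


def scripted_policy(transcript: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: call the calculator once, then answer."""
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return ("I should compute this.", "calc", "6*7")
    return ("I have the result.", "finish",
            observations[-1][len("Observation: "):])
```

Calling `react_loop(scripted_policy, {"calc": calc}, "What is 6*7?")` returns the answer `"42"` together with the full transcript, so each observation is visible to the next reasoning step.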
Agentic Harnesses
Harnesses wrap and augment existing models with structured execution, verification, and repair capabilities without requiring model retraining.
Microsoft Agent Framework (MAF) Agent Harness: A production-oriented harness providing first-class patterns such as automatic context compaction, instruction merging, session-scoped file memory, plan vs. execute modes, skill discovery, and background agent delegation (Microsoft Agent Framework at BUILD 2026: Agent Harness, Hosted Agents, CodeAct, and more).
Deontic Agentic Reasoning (DAR): An agentic setup where a model interacts with statutes on demand to perform deontic reasoning (applying rules to facts); agentic harnesses can improve its performance (DAR: Deontic Reasoning with Agentic Harnesses).
MUSE: A multimodal unified structured execution harness that wraps any off-the-shelf MLLM with composable modules for task representation, visual processing, tool use, parsing, verification, and verifier-guided repair.
Cognitive Memory Management: The SaliMory framework trains a single model to manage a cognitively structured memory (user facts, preferences, working memory), using a hierarchical, stage-wise process reward for each distinct memory operation (SaliMory: Orchestrating Cognitive Memory for Conversational Agents).
Constitutional AI Verification: Glass Box is a runtime verification layer that intercepts AI policy actions and evaluates them against physics-grounded constitutional constraints and safety invariants before execution.
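The interception pattern described for Glass Box (check an action against constraints before it runs) can be sketched as a guard around the executor. This is a hedged illustration only: the `Action` fields, the two constraints, and `verified_execute` are hypothetical and are not taken from the Glass Box implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    name: str
    velocity: float  # m/s (hypothetical field)
    target_z: float  # metres above the floor (hypothetical field)


# A constraint returns None when satisfied, else a description of the violation.
Constraint = Callable[[Action], Optional[str]]


def max_velocity(limit: float) -> Constraint:
    """Build a constraint that bounds the magnitude of the velocity."""
    def check(action: Action) -> Optional[str]:
        if abs(action.velocity) <= limit:
            return None
        return f"velocity {action.velocity} exceeds limit {limit}"
    return check


def above_floor(action: Action) -> Optional[str]:
    """Invariant: the target must not lie below the floor plane."""
    return None if action.target_z >= 0.0 else "target below floor"


def verified_execute(action: Action, constraints: list[Constraint],
                     execute: Callable[[Action], None]) -> tuple[bool, list[str]]:
    """Run `execute` only if every constraint passes; otherwise report violations."""
    violations = [msg for msg in (c(action) for c in constraints) if msg is not None]
    if violations:
        return False, violations
    execute(action)
    return True, []
```

Collecting every violation, instead of stopping at the first, gives the caller a complete explanation of why an action was blocked.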
World Models
World models are internal simulators that learn the structure and dynamics of an environment, enabling agents to predict, plan, and reason within learned representations.
Planning and Execution Patterns
These patterns focus on how agents decompose tasks, generate steps, and manage resources.
Planner-Executor and Multi-Stage Workflows
Planner-Executor: A common pattern where a "planner" LLM decomposes a task into steps and an "executor" carries them out.
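The planner-executor split can be sketched as two callables joined by a small driver. In this minimal sketch, `toy_planner` and `toy_executor` are stand-ins for LLM calls, and splitting the task on the word "then" is purely an illustrative assumption.

```python
from typing import Callable

Step = str
Result = tuple[Step, str]


def toy_planner(task: str) -> list[Step]:
    """Stand-in planner: split a task description on ' then '."""
    return [part.strip() for part in task.split(" then ") if part.strip()]


def toy_executor(step: Step, prior: list[Result]) -> str:
    """Stand-in executor: report the step and how many steps preceded it."""
    return f"done: {step} (after {len(prior)} earlier steps)"


def run_plan(task: str,
             planner: Callable[[str], list[Step]],
             executor: Callable[[Step, list[Result]], str]) -> list[Result]:
    """Plan once, then execute each step with access to earlier results."""
    results: list[Result] = []
    for step in planner(task):
        outcome = executor(step, results)
        results.append((step, outcome))
    return results
```

Passing earlier results to the executor lets later steps depend on earlier outcomes. A variant could call the planner again after a failed step, which this sketch does not attempt.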
Multi-agent systems decompose complex tasks among multiple, often specialized, agents that coordinate to achieve a goal.
Communication and Coordination
Streaming Communication: The StreamMA system streams each reasoning step to downstream agents as it is generated, reducing latency and improving effectiveness by preventing error-prone late steps from misleading other agents (Streaming Communication in Multi-Agent Reasoning).
Dynamic Ensembling: Dynamic Logit-Level Gating (DLLG) is a framework in which a lightweight gating module learns to predict token-level fusion weights, ensembling multiple specialized LLM experts without retraining them (DLLG: Dynamic Logit-Level Gating of LLM Experts).
Swarm Training: AgentJet is a distributed swarm training framework with a decoupled architecture in which server nodes host and optimize models while client nodes execute agents, enabling heterogeneous multi-model RL, fault tolerance, and live code iteration (AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning).
Policy Sharing Tradeoffs: In multi-agent RL, isolated-policy training (separate parameters per role) can reach higher peak accuracy but risks collapse, while shared-policy training can be "captured" by a dominant role.
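To make the Dynamic Ensembling entry above more concrete, the sketch below fuses per-expert next-token logits using gate weights. The specific fusion form (a softmax over gate scores followed by a weighted sum of logits) is an assumption made for illustration. In DLLG the gate scores would come from a learned module, whereas here they are passed in by hand.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(expert_logits: list[list[float]],
                gate_scores: list[float]) -> list[float]:
    """Weighted sum of expert logits for one token position.

    expert_logits: one list of vocabulary logits per expert.
    gate_scores:   one unnormalized score per expert (hypothetical gate output).
    """
    weights = softmax(gate_scores)
    vocab_size = len(expert_logits[0])
    return [
        sum(w * logits[v] for w, logits in zip(weights, expert_logits))
        for v in range(vocab_size)
    ]


def next_token(fused: list[float]) -> int:
    """Greedy choice: index of the largest fused logit."""
    return max(range(len(fused)), key=fused.__getitem__)
```

Because the weights are recomputed for each token position, the ensemble can lean on a different expert at every step while the experts themselves stay frozen.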
System Stability and Safety
Safety Under Scaffolding: Agentic scaffolds such as ReAct or multi-agent debate can alter a model's measured safety, with effects that vary significantly by model, which undermines the utility of a single composite safety score (Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety).
Bilevel Autoresearch: A framework in which an outer autoresearch loop improves an inner autoresearch loop by reading its code and traces, identifying bottlenecks, and generating injectable search mechanisms at runtime (Bilevel Autoresearch: Meta-Autoresearching Itself).
Self-Play: A prover-conjecturer system can enable self-improvement in theorem proving. SSR trains software agents by having a single agent iteratively inject and repair bugs of increasing complexity.
Parthenon: A self-evolving legal-agent framework that converts scored failures into task-agnostic edits to skills, tools, and knowledge without changing model weights, mimicking how a firm refines its playbooks (Parthenon Law: A Self-Evolving Legal-Agent Framework).
Agent Instruction Protocol (AIP): Models a skill as a directed execution graph whose nodes are backed by deterministic scripts or natural language, improving reliability by giving the agent runnable units instead of prose to interpret (AIP: A Graph Representation for Learning and Governing Agent Skills).
DistIL: A distributional variant of DAgger that uses a forward cross-entropy objective to learn from rich feedback (e.g., execution traces, expert corrections) by propagating future expert-student disagreement back to earlier decisions (Reinforcement Learning from Rich Feedback with Distributional DAgger).
Modality-Aware Credit Assignment (MoCA): An RL framework that improves multimodal synergy by decoupling generation into perception and reasoning steps, allowing it to reward perceptual fidelity independently of reasoning outcomes (Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning).
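The Agent Instruction Protocol entry above describes a skill as a directed execution graph whose nodes are either deterministic scripts or natural-language instructions. A minimal sketch of that representation follows. The `Node` fields, the single outgoing edge per node, and the `interpret` callback (a stand-in for a model call) are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Context = dict


@dataclass
class Node:
    name: str
    script: Optional[Callable[[Context], Context]] = None  # deterministic unit
    prompt: Optional[str] = None                           # natural-language unit
    next: Optional[str] = None                             # one outgoing edge, for brevity


def run_skill(nodes: dict[str, Node], start: str, context: Context,
              interpret: Callable[[str, Context], Context]) -> tuple[Context, list[str]]:
    """Walk the graph from `start`, running scripts directly and
    delegating natural-language nodes to `interpret`."""
    trace: list[str] = []
    current: Optional[str] = start
    while current is not None:
        if current in trace:
            raise ValueError(f"cycle detected at node {current!r}")
        node = nodes[current]
        trace.append(current)
        if node.script is not None:
            context = node.script(context)
        elif node.prompt is not None:
            context = interpret(node.prompt, context)
        else:
            raise ValueError(f"node {node.name!r} has neither script nor prompt")
        current = node.next
    return context, trace
```

Only the prompt-backed nodes pass through the model, so a graph in which most nodes are scripts leaves little prose for the agent to interpret. The returned trace records which nodes ran and in what order.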