Agent System Prompts

System prompts are crucial for defining an agent's behavior, persona, and operational guidelines within an agent-architecture. They establish the initial instructions for a large language model (LLM), setting its role, constraints, and how it should process subsequent inputs. As a core component of context-engineering, system prompts significantly influence agent performance, reliability, and robustness.

Prompt Design and Optimization

System prompts can be systematically improved through automated optimization, specific techniques, and structured frameworks.

Automated and Online Optimization

Ctx2Skill is a self-evolving framework that autonomously discovers, refines, and selects context-specific skills for language models.
It uses a multi-agent self-play loop with a Challenger, Reasoner, and Judge to evolve skills.
Proposer and Generator agents analyze failure cases to synthesize targeted skill updates.
A Cross-time Replay mechanism helps prevent adversarial collapse and ensures robust skill evolution.

Prompting Techniques and Frameworks

Collab-REC is a multi-agent framework that uses LLM-based agents with distinct roles (e.g., Personalization, Popularity, Sustainability) to generate diverse recommendations.
A non-LLM moderator merges and refines proposals from these agents through iterative constrained refinement.
Hybrid reasoning LLMs are often controlled by high-level "Think/No-think" instructions Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers.
Mode switching in these models is largely driven by specific trigger tokens, not just the instructions themselves Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers.
For example, an "Okay" token can induce reasoning, while a newline pattern following it can suppress reasoning Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers.
Mid-Think is a training-free prompting format that combines these token-level triggers to achieve intermediate-budget reasoning Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers.
It consistently outperforms fixed-token and prompt-based baselines in terms of the accuracy-length trade-off Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers.
Mid-Think can reduce RL training time by approximately 15% while improving final performance on benchmarks like AIME and GPQA Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers.

Subtle Cues and Algorithm Steering

Incidental prompt cues, such as contextual words or metadata outside the task specification, can steer an LLM's algorithm choice in code generation The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation.
This phenomenon, termed "algorithm steering," refers to cue-induced shifts in algorithm-family distributions even when all outputs pass the same tests The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation.
Experiments show large, systematic shifts (up to 100 percentage points) in algorithm choices, largely consistent with cue semantics The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation.
This creates an "invisible lottery" over critical factors like performance, security, and maintainability of generated code The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation.
Directly naming the desired algorithm in the prompt is the most reliable mitigation tested The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation.

Safety Alignment via Prompting

LLM-assisted reranking can operationalize nuanced objectives in recommender systems beyond traditional engagement or accuracy metrics.
Without constraints, naive prompts in reranking can inadvertently amplify exposure to ideologically extreme or conspiratorial content.
Lightweight prompt-level regularization can reduce the promotion of extreme content and increase ideological diversity with modest relevance loss.
LLMs rerank based on statistical regularities in language rather than a semantic understanding of ideology.
Prompt design should be considered a value-laden rather than a neutral default process.

Prompt Fidelity and Artifacts

When prompting small language models (SLMs) for psychometric assessments, prompt artifacts can frequently overpower the semantic signal.
Prompt artifacts include variations in personas, instructions, items, and option symbols.
In such cases, models predominantly reflect prompt compliance rather than simulated psychological traits.
A prompt variation framework can be used as a diagnostic tool to identify destructive artifacts and isolate semantic understanding.

Persona and Role Design

Anthropomorphization in LLM-based conversational agents involves ascribing human-like qualities to non-human entities.
LLMs routinely generate interactional and linguistic cues, such as first-person self-reference and affective expressions, which can increase user engagement.
While anthropomorphization raises ethical concerns like deception, overreliance, and exploitative relationship framing, some argue it can support autonomy, well-being, and inclusion.
Internal Coherence Maximization (ICM) can generate persona-specific in-context examples to steer a model toward a target group's values without human supervision.
More coherent examples generalize substantially better than incoherent ones, even when individual label accuracy is held constant.
LLM agents can be created with various roles and specialized prompts, accessing different information parts to generate synthetic data.
Listener LLM agents can be conditioned on finetuned conversation goals to cover diverse scenarios.

Model Capabilities and Calibration

LLM refusals to follow benign instructions, like adopting a political position or persona, can signal a capability deficit rather than just safety guardrails.
"Ideological depth" is proposed as a property comprising a model's ability to follow political instructions (steerability) and the feature richness of its internal political representations.
Models with higher ideological depth (more steerable) activate significantly more distinct political features.
Instruction-tuned LLMs are less calibrated than base pre-trained models, and the chat template further aggravates this issue.
LLMs exhibit an "ownership bias," being significantly more confident in their own answers than in identical answers provided by a user.
An inference-time strategy of framing the model's answer as user input during confidence elicitation can reduce overconfidence and improve calibration by up to 26%.

Skill Portability and Security

LLM agents increasingly rely on reusable skills, but these lack portability due to agent frameworks' sensitivity to prompt formatting.
SkCC is a compiler for LLM agents that introduces classical compilation design to skill development.
SkCC uses SkIR, a strongly-typed intermediate representation, to decouple skill semantics from framework-specific formatting, enabling portable deployment across agent frameworks.
A static Optimizer within SkCC enforces security constraints, blocking vulnerabilities before deployment.
SkCC improves pass rates on benchmarks like claude-code and Kimi CLI.

Security and Robustness: Prompt Injection

Prompt injection (PI) attacks pose a significant threat to LLM-based applications, including automatic grading systems.
Attackers can exploit PI vulnerabilities to manipulate systems into assigning artificially high scores, regardless of actual answer quality.
Current LLM-based automatic grading systems remain highly vulnerable to prompt injection attacks.

Memory Poisoning Attacks

Persistent memory in agent-architecture introduces the risk of memory poisoning, where a single adversarial memory write can exert long-term influence over agent behavior From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.
Memory poisoning attacks exploit vulnerabilities in model capabilities, agent-system-prompts design, and agent-architecture From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.
These attacks leverage four memory write channels and nine structural vulnerabilities From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.
A taxonomy of six classes of memory poisoning attacks has been developed From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.
Agents designed to write and retrieve memory more aggressively are more susceptible to these attacks From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.
Existing prompt injection defenses are insufficient to cover memory poisoning attacks From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.
MPBench is a benchmark for evaluating memory poisoning attacks From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents.

Key References

Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers https://arxiv.org/abs/2601.07036
From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents https://arxiv.org/abs/2606.04329
The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation https://arxiv.org/abs/2606.04057