Context engineering is the practice of shaping the information provided to a large language model (LLM) to improve the quality, relevance, and accuracy of its outputs. It covers user-facing techniques for managing the context window: retrieval-augmented generation (RAG), input structuring, and conversation-history management ("session hygiene"). It is distinct from agent-context-mgmt, which focuses on how an autonomous agent manages its own working memory.
Structuring and Refining Prompts
Subtle Cues and Triggers
Incidental cues in prompts, such as contextual words or metadata outside the main task specification, can steer an LLM's choice of algorithm during code generation The Invisible Lottery.
This "algorithm steering" can lead to significant, systematic shifts in the algorithm families a model selects, creating an "invisible lottery" over non-functional requirements like performance and security The Invisible Lottery.
Directly naming the desired algorithm in the prompt is the most reliable tested method to mitigate this effect The Invisible Lottery.
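As a minimal sketch of that mitigation, the helper below adds an explicit algorithm requirement to a code-generation prompt. The function name `build_codegen_prompt` and the prompt wording are illustrative assumptions, not a format taken from the cited work.

```python
from typing import Optional


def build_codegen_prompt(task: str, algorithm: Optional[str] = None) -> str:
    """Build a code-generation prompt, naming the algorithm when one is given.

    Hypothetical helper: the wording of the requirement line is an assumption.
    """
    lines = [f"Task: {task}"]
    if algorithm:
        # State the algorithm in the task specification itself, so the
        # choice does not depend on incidental cues elsewhere in the prompt.
        lines.append(
            f"Required algorithm: {algorithm}. Do not substitute another approach."
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_codegen_prompt("Sort a list of user records by signup date.",
                               algorithm="merge sort"))
```

Keeping the requirement on its own line makes it easy to audit which prompts pin the algorithm and which leave the choice to the model.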
LLM reasoning behavior can be controlled by specific token-level triggers rather than high-level instructions Mid-Think.
For example, a leading "Okay" token can induce reasoning, while a newline pattern can suppress it Mid-Think.
The Mid-Think prompting format combines these triggers to achieve intermediate-budget reasoning, improving the accuracy-length trade-off Mid-Think.
Example Selection for In-Context Learning
Selecting the most effective examples for in-context learning (ICL) is difficult because the context window limits how many examples fit KITE.
Simple nearest-neighbor methods for selecting examples can lead to poor generalization and a lack of diversity KITE.
The KITE framework treats example selection as a query-specific optimization problem, aiming to minimize prediction error for a given query by selecting a diverse set of exemplars KITE.
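To illustrate why diversity matters, the sketch below contrasts pure nearest-neighbor selection with a greedy, diversity-aware selection. It uses maximal marginal relevance (MMR), a well-known stand-in technique, and is not the KITE algorithm. The embeddings, the `trade_off` parameter, and the function names are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_exemplars(query, pool, k, trade_off=0.7):
    """Greedy MMR: return k pool indices balancing relevance and diversity.

    trade_off=1.0 reduces to plain nearest-neighbor selection; lower
    values penalize candidates similar to exemplars already chosen.
    """
    selected = []
    candidates = list(range(len(pool)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, pool[i])
            redundancy = max(
                (cosine(pool[i], pool[j]) for j in selected), default=0.0
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    pool = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # items 0 and 1 are near-duplicates
    print(select_exemplars(query, pool, k=2, trade_off=1.0))
    print(select_exemplars(query, pool, k=2, trade_off=0.3))
```

With `trade_off=1.0` the two near-duplicate exemplars are chosen; with a lower value the second slot goes to the more distinct exemplar instead.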
Input Ordering and Positional Bias
Multimodal LLMs (MLLMs) can exhibit positional bias, where the quality of an output (e.g., a video summary) depends on the position of the corresponding input in the context window A Systematic Evaluation of Positional Bias.
These positional effects depend on the specific model and domain, and increasing the visual or generation budget does not uniformly resolve them A Systematic Evaluation of Positional Bias.
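One way to check a given model and domain for this effect is to rotate the inputs so that each item appears at every position and compare the mean quality per position. The harness below is a sketch under stated assumptions: `run_model` and `score` are hypothetical callables supplied by the caller, and this is not the evaluation protocol of the cited study.

```python
from collections import defaultdict
from statistics import mean


def position_scores(items, run_model, score):
    """Return the mean output score for each input position.

    items:     list of inputs (e.g. video identifiers)
    run_model: hypothetical callable; takes the ordered inputs and
               returns one output per input, in the same order
    score:     hypothetical callable; score(item, output) -> float
    """
    by_position = defaultdict(list)
    n = len(items)
    for shift in range(n):
        # Cyclic rotation: every item visits every position exactly once.
        ordered = items[shift:] + items[:shift]
        outputs = run_model(ordered)
        for pos, (item, out) in enumerate(zip(ordered, outputs)):
            by_position[pos].append(score(item, out))
    return {pos: mean(vals) for pos, vals in sorted(by_position.items())}


if __name__ == "__main__":
    # Stub model that only handles the first input well.
    def stub_model(ordered):
        return [x if i == 0 else "" for i, x in enumerate(ordered)]

    def exact(item, out):
        return 1.0 if out == item else 0.0

    print(position_scores(["a", "b", "c"], stub_model, exact))
```

A flat profile across positions suggests little positional bias for that model and domain; a profile that varies with position suggests the input order should be controlled or randomized.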
Leveraging External Context (RAG)
Retrieval Quality and Metrics
For approximate nearest neighbor (ANN) search in RAG, what matters is how close the retrieved items are to the query, not how many of them coincide with the exact k-nearest-neighbor set (Recall@k) ANN Search: Recall What Matters.
Optimizing for Recall@k can force unnecessary computational overhead, while downstream task performance often remains high even when Recall@k drops significantly ANN Search: Recall What Matters.
The inverse approximation ratio (1/Ratio@k) is a more accurate proxy than Recall@k for the true utility and quality of ANN search results in RAG systems ANN Search: Recall What Matters.
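The difference between the two metrics can be seen in a small worked example. The sketch assumes the common definition of the approximation ratio, the mean over ranks of the retrieved distance divided by the true distance at the same rank; the cited work may define Ratio@k differently. The ids and distances are made up for illustration.

```python
def recall_at_k(retrieved_ids, true_ids):
    """Fraction of the exact k nearest neighbors that were retrieved."""
    return len(set(retrieved_ids) & set(true_ids)) / len(true_ids)


def ratio_at_k(retrieved_dists, true_dists):
    """Mean rank-wise ratio of retrieved distance to true distance.

    Assumed definition: both lists are sorted ascending and compared
    rank by rank. The value is >= 1, and 1 means the retrieved items
    are as close as the exact neighbors. True distances are assumed
    to be strictly positive.
    """
    pairs = list(zip(sorted(retrieved_dists), sorted(true_dists)))
    return sum(r / t for r, t in pairs) / len(pairs)


if __name__ == "__main__":
    true_ids, true_dists = [1, 2, 3], [1.0, 2.0, 4.0]
    # Two of three exact neighbors are missed, but the substitutes
    # are only 5% farther from the query.
    got_ids, got_dists = [1, 4, 5], [1.0, 2.1, 4.2]

    print("Recall@3  =", recall_at_k(got_ids, true_ids))
    print("1/Ratio@3 =", 1 / ratio_at_k(got_dists, true_dists))
```

Here Recall@3 is only one third while 1/Ratio@3 stays close to 1, which matches the observation that downstream quality can remain high when Recall@k drops.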
Context Security and Privacy
RAG Vulnerabilities
RAG systems are vulnerable to security risks from poisoned retrieval content DiscourseFlip.
A "discourse-level opinion manipulation" attack can poison a retrieval corpus to influence user opinions across a wide network of related queries, not just a single topic DiscourseFlip.
The DiscourseFlip attack uses a graph-guided agent to allocate a poisoning budget to maximize opinion deviation, and has been shown to be effective while remaining difficult for users to detect DiscourseFlip.
Query Rewriting for Privacy
User queries sent to cloud-hosted LLMs often contain a mix of task-essential information and non-essential sensitive disclosures Need to Know.
Based on the principle of Contextual Integrity, queries can be rewritten to forward only the information that is necessary for the task, preserving privacy Need to Know.
A reinforcement learning framework can be used to train a query rewriter that preserves task-critical information while suppressing unnecessary sensitive data Need to Know.
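The trade-off such a rewriter is trained on can be sketched as a scalar reward that rises when task-essential information is kept and falls when non-essential sensitive information is forwarded. Everything below is an illustrative assumption: the substring matching, the `leak_penalty` weight, and the function name are hypothetical, and this is not the reward used in the cited work.

```python
def rewrite_reward(rewritten, essential, sensitive, leak_penalty=1.0):
    """Toy reward: utility retained minus weighted sensitive leakage.

    rewritten: the rewritten query text
    essential: phrases the task needs (assumed to be annotated)
    sensitive: non-essential sensitive phrases (assumed to be annotated)
    """
    text = rewritten.lower()
    kept = sum(phrase.lower() in text for phrase in essential)
    leaked = sum(phrase.lower() in text for phrase in sensitive)
    utility = kept / len(essential) if essential else 1.0
    leakage = leaked / len(sensitive) if sensitive else 0.0
    return utility - leak_penalty * leakage


if __name__ == "__main__":
    original = ("I'm going through a divorce and need a cover letter "
                "for a data analyst job.")
    rewritten = "Write a cover letter for a data analyst job."
    essential = ["cover letter", "data analyst"]
    sensitive = ["divorce"]

    print(rewrite_reward(original, essential, sensitive))
    print(rewrite_reward(rewritten, essential, sensitive))
```

In practice the utility and leakage terms would more plausibly come from learned or model-based judges than from substring checks; the sketch only shows the shape of the objective.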
Robustness to Injection Attacks
Safety-aligned LLMs are vulnerable to inference-time attacks in which short token injections at any point during generation can redirect the model toward harmful outputs Inference-Time Vulnerability Beyond Shallow Safety.
This vulnerability extends beyond "shallow safety" (where alignment is concentrated in the first few tokens) and reveals that a model's internal state alignment does not guarantee robust behavior under perturbation Inference-Time Vulnerability Beyond Shallow Safety.