The Lowpass Dispatch · Monday, June 1, 2026

MODELS

New Model Fuses Diffusion and Autoregressive Traits

Eso-LMs, a new family of diffusion-based language models, use causal attention to bring KV caching to masked diffusion models for the first time, boosting inference speed.

Researchers have developed Eso-LMs, a new family of diffusion-based language models that combine the parallel generation capabilities of masked diffusion models (MDMs) with the inference efficiency of autoregressive (AR) models. The key innovation is using causal attention in the denoiser, a departure from the bidirectional attention typically used in MDMs.

This architectural choice connects MDMs to Any-Order autoregressive models and yields two significant advances. First, it allows exact likelihoods to be computed, a first for MDMs. Second, and more critically for production systems, it allows KV caching during inference, a standard efficiency technique for AR models that was previously incompatible with diffusion-based text generation.
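To see why causal attention is what makes KV caching possible, consider a minimal sketch in plain Python. It is not the Eso-LM denoiser: it covers only the ordinary left-to-right case, uses a single attention head, and (as a simplifying assumption) treats each token vector as its own query, key, and value. Because a position only ever attends to positions already processed, the keys and values stored so far never need to be recomputed.

```python
import math

def attend(query, keys, values):
    """Softmax attention of one query over lists of key and value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]

def causal_full(tokens):
    """Recompute from scratch: position i attends to positions 0..i only."""
    return [attend(tokens[i], tokens[: i + 1], tokens[: i + 1]) for i in range(len(tokens))]

class KVCache:
    """Keeps keys and values of past positions so each step only adds one entry."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, token):
        # Assumption: identity projections, so the token is its own key and value.
        self.keys.append(token)
        self.values.append(token)
        return attend(token, self.keys, self.values)

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
cache = KVCache()
incremental = [cache.step(t) for t in tokens]
full = causal_full(tokens)
```

The cached and from-scratch outputs agree at every position. With bidirectional attention, adding a token would change the outputs at earlier positions, so a cache of this kind would go stale.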

By preserving parallel generation while adding KV caching, Eso-LMs establish a new state of the art on the speed-quality Pareto frontier for unconditional generation. The work offers a path to making diffusion models more competitive with autoregressive models on perplexity and latency, and a new architectural direction for future foundation models.

Sources: Esoteric Language Models: A Family of Any-Order Diffusion LLMs

RESEARCH

Chain-of-Thought Reasoning Often Unfaithful, Study Finds

A new study reveals that large language models' verbalized reasoning can be a post-hoc rationalization, not a faithful account of their process.

The chain-of-thought (CoT) reasoning provided by language models may not accurately reflect how they arrive at an answer. A paper posted to arXiv in March 2025 shows that even on naturally worded, non-adversarial prompts, a model's stated reasoning can be an unfaithful justification for a conclusion reached through other means.

Researchers found that when asked pairs of questions with reversed arguments, such as "Is X bigger than Y?" and "Is Y bigger than X?", models would sometimes answer "Yes" to both, providing superficially coherent but mutually inconsistent arguments for each. This behavior, termed Implicit Post-Hoc Rationalization, was observed in up to 13% of cases for some production models. The study suggests this is due to implicit biases toward answering "Yes" or "No".
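The paired-question check can be illustrated with a small sketch. It is a simplification, not the study's actual methodology: it assumes X and Y always differ, so exactly one of the two questions has the answer "Yes", and it flags any pair where the model gave the same answer to both orderings. The function name and sample answers are hypothetical.

```python
def contradiction_rate(answer_pairs):
    """Fraction of pairs where both orderings of the question got the same answer."""
    if not answer_pairs:
        return 0.0
    contradictory = sum(1 for forward, reverse in answer_pairs if forward == reverse)
    return contradictory / len(answer_pairs)

# Hypothetical answers to ("Is X bigger than Y?", "Is Y bigger than X?")
pairs = [("Yes", "No"), ("Yes", "Yes"), ("No", "Yes"), ("No", "No")]
rate = contradiction_rate(pairs)  # 2 of the 4 pairs are inconsistent
```

A nonzero rate here only shows that the answers conflict. Deciding that the accompanying reasoning is a post-hoc rationalization requires reading the chains of thought themselves.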

While frontier models were more faithful, none were perfect. The study reported unfaithful reasoning rates of 0.37% for DeepSeek R1 and 0.04% for Anthropic's Claude 3.7 Sonnet with its thinking feature enabled. The findings caution developers that while CoT is a useful tool for assessing outputs, it should not be treated as a complete or necessarily truthful account of the model's internal decision-making process.

Sources: Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

TOOLS

JIT Compilation Comes for LLM Agents, Slashes Latency

A new system applies just-in-time compilation to web agent tasks, converting natural language instructions into executable code to reduce LLM calls.

The slow, sequential nature of LLM agents, which often involves a fetch-screenshot-execute loop for every action, is being challenged by a new approach: agent just-in-time (JIT) compilation. The system, detailed in a May 2026 paper, compiles high-level tasks like "order the cheapest item" directly into executable code that can include tool calls, LLM calls, and parallel operations.

The framework has two main components. A JIT-Planner generates multiple potential code plans, validates them against tool specifications, and selects the most efficient candidate. A JIT-Scheduler then analyzes the chosen plan for parallelization opportunities using Monte Carlo cost estimation based on learned latency distributions for different actions.
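As a rough illustration of Monte Carlo cost estimation, the sketch below compares a sequential plan with a partly parallel one. It is not the paper's scheduler: the action names, the log-normal latency distributions, and their parameters are all made-up assumptions standing in for learned latency distributions. A plan is modeled as a list of stages run in order, where the actions inside a stage run in parallel and the stage takes as long as its slowest action.

```python
import random

def estimate_latency(plan, latency_samplers, trials=2000, seed=0):
    """Monte Carlo estimate of a plan's mean end-to-end latency in seconds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Stages run sequentially; each stage costs the max of its parallel actions.
        total += sum(max(latency_samplers[action](rng) for action in stage) for stage in plan)
    return total / trials

# Hypothetical latency distributions for three action types.
samplers = {
    "fetch_page": lambda rng: rng.lognormvariate(0.0, 0.3),
    "llm_call": lambda rng: rng.lognormvariate(1.0, 0.4),
    "click": lambda rng: rng.lognormvariate(-1.0, 0.2),
}

sequential = [["fetch_page"], ["fetch_page"], ["llm_call"], ["click"]]
parallel = [["fetch_page", "fetch_page"], ["llm_call"], ["click"]]

seq_cost = estimate_latency(sequential, samplers)
par_cost = estimate_latency(parallel, samplers)
```

Sampling rather than adding up average latencies matters because the expected maximum of two parallel actions is larger than the maximum of their expected values, so a point estimate would understate the cost of parallel stages.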

This compilation step dramatically reduces the number of expensive, high-latency LLM calls needed to complete a task. Across five web-based applications, the JIT-Planner achieved a 10.4x speedup and 28% higher accuracy compared to the Browser-Use baseline. The work reframes agent execution from a reactive loop to a proactive compilation problem, a familiar paradigm for software engineers that promises more efficient and reliable autonomous systems.

Sources: Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

DATA

AI Agents Now Engineering Their Own Training Data

Researchers have formalized Autonomous Agentic Data Engineering, a task where LLM agents plan and execute a data curation pipeline to specialize models.

Large language models are now capable of autonomously engineering the data needed to specialize other models for specific domains. A new paper formalizes this capability as "Autonomous Agentic Data Engineering," framing data curation as an optimization problem solved by an LLM agent. The agent plans, generates, and iteratively refines a training dataset with the goal of maximizing a student model's performance.
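The plan, generate, and refine cycle can be pictured as an optimization loop over candidate data. The toy sketch below is only an analogy under heavy assumptions: a greedy search stands in for the LLM agent, and a hypothetical skill-coverage score stands in for training and evaluating a student model. None of the names or data come from the paper.

```python
# Hypothetical target skills and candidate data batches, for illustration only.
TARGET_SKILLS = {"parsing", "units", "dosage"}
BATCHES = {
    "batch_a": {"parsing"},
    "batch_b": {"units", "dosage"},
    "batch_c": {"parsing", "units"},
}

def toy_student_score(dataset):
    """Stand-in for student evaluation: fraction of target skills the data covers."""
    covered = set()
    for name in dataset:
        covered |= BATCHES[name]
    return len(covered & TARGET_SKILLS) / len(TARGET_SKILLS)

def refine_dataset(candidates, evaluate, rounds=3):
    """Each round, add the candidate batch that most improves the score, if any does."""
    dataset = []
    best = evaluate(dataset)
    for _ in range(rounds):
        scored = [(evaluate(dataset + [c]), c) for c in candidates if c not in dataset]
        if not scored:
            break
        score, choice = max(scored)  # ties are broken by batch name
        if score <= best:
            break  # no remaining batch helps, so stop refining
        dataset.append(choice)
        best = score
    return dataset, best

dataset, best = refine_dataset(sorted(BATCHES), toy_student_score)
```

In the setting the paper describes, the expensive step is the evaluation, since each candidate curriculum implies training and testing a student model, which is why the choice of what to try next is delegated to an agent rather than searched exhaustively.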

This approach moves beyond using LLMs for simple data generation, tasking them with the entire end-to-end workflow. The agent acts as an autonomous data engineer, creating a full training curriculum. In experiments, a GPT-4-level model was tasked with this role.

The results show significant gains from the agent-driven process. The agent-constructed curriculum improved a student model's performance by 57.29% through iterative data adaptation alone. This work establishes autonomous data engineering as a measurable capability and suggests a future where model specialization is largely automated, shifting the focus of human engineers from manual data labeling to designing and overseeing data-curating agents.

Sources: Exploring Autonomous Agentic Data Engineering for Model Specialization
