Claude Agents Can Dream Now. Nothing Is the Same.

Harvey's legal AI agents started completing tasks 6x more often after one configuration change.

Not a new model. Not a better prompt. Not a rewrite of their orchestration layer.

A feature called dreaming that lets agents review their own past sessions overnight, reorganize their memory, and wake up the next morning knowing what they got wrong last time.

Anthropic shipped it on May 6th as a research preview for Claude Managed Agents. And whether or not you're building on their managed platform, it's the most important thing to understand about where agent infrastructure is heading.

Every Agent Session You've Built Started From Zero

Take a second and think about what stateless actually costs.

You tune a workflow. It makes a mistake. You correct it mid-session. Two weeks later, it makes the same mistake, because the correction lived in a different session, a different context window, a different world.

You add a preference: "always convert to metric," "never email the client directly, Slack first," "the staging URL changed." Next session: none of that context exists. The agent is starting fresh, same as day one.

The standard answer is manual memory management. Build a vector store. Write a retrieval layer. Inject relevant context at session start. Maintain a database that slowly fills with conflicting entries nobody has time to clean up.
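To make that pattern concrete, here is a minimal TypeScript sketch of the manual approach. It assumes an append-only note store and uses naive keyword matching in place of a real vector search. `NaiveMemoryStore` and `buildSessionContext` are hypothetical names chosen for illustration. They are not part of any SDK.

```typescript
// Hypothetical sketch of hand-rolled agent memory: append-only notes,
// naive keyword retrieval (standing in for vector search), and injection
// into the prompt at session start. Not a real SDK.

interface MemoryEntry {
  text: string;
  addedAt: number; // session number in which the note was written
}

class NaiveMemoryStore {
  private entries: MemoryEntry[] = [];

  // Every correction is appended; nothing is ever merged or removed.
  add(text: string, addedAt: number): void {
    this.entries.push({ text, addedAt });
  }

  // Return every entry sharing at least one word with the query.
  retrieve(query: string): MemoryEntry[] {
    const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
    return this.entries.filter((e) =>
      e.text.toLowerCase().split(/\W+/).some((w) => words.has(w)),
    );
  }
}

// Build the context block that gets prepended to the system prompt.
function buildSessionContext(store: NaiveMemoryStore, task: string): string {
  return store
    .retrieve(task)
    .map((e) => `- ${e.text}`)
    .join("\n");
}

const store = new NaiveMemoryStore();
store.add("Staging URL is https://staging-old.example.com", 1);
store.add("Staging URL changed to https://staging.example.com", 7);

// Both notes match "staging", so both are injected, the stale one included.
console.log(buildSessionContext(store, "deploy to staging"));
```

Nothing in the store records that the second note supersedes the first. The agent receives both and has to guess which one is current.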

I wrote about one side of this problem in the flight recorder post — the case for capturing structured logs of what your agents actually do, so you can at least debug failures instead of guessing.

But logging is passive. Logging tells you what happened.

Dreaming tells the agent.

What Dreaming Does (And What It Doesn't)

After each session, structured data is preserved automatically: not just the transcript, but metadata about task outcomes, corrections made mid-session, tool calls that succeeded or failed, and which memory entries were retrieved and actually used.

The dreaming process runs across this data asynchronously — between sessions, not during them.

It produces a reorganized memory store: duplicate entries merged, outdated information pruned, recurring mistakes and preferences flagged and annotated.
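The consolidation step can be sketched as a pure function that proposes changes without applying any of them. This minimal TypeScript illustration rests on three stated assumptions. Each memory entry has a topic `key`. The newest write for a key counts as current. A topic corrected in two or more distinct sessions counts as a recurring mistake. The entry shapes, `proposeConsolidation`, and the threshold are all hypothetical. They are not Anthropic's data model or algorithm.

```typescript
// Hypothetical consolidation ("dreaming") pass. The Entry/Correction shapes,
// the newest-wins rule, and flagThreshold are illustrative assumptions.

interface Entry {
  key: string; // topic, e.g. "staging_url"
  value: string; // what the agent should remember
  updatedAt: number; // session number of the write
}

interface Correction {
  key: string; // topic a human corrected mid-session
  sessionId: number;
}

type Change =
  | { kind: "merge"; key: string; count: number }
  | { kind: "prune"; key: string; stale: string; kept: string }
  | { kind: "flag"; key: string; sessions: number };

// Propose a consolidated store plus a list of changes; applies nothing.
function proposeConsolidation(
  entries: Entry[],
  corrections: Correction[],
  flagThreshold = 2,
): { next: Entry[]; changes: Change[] } {
  const byKey = new Map<string, Entry[]>();
  for (const e of entries) {
    const group = byKey.get(e.key);
    if (group) group.push(e);
    else byKey.set(e.key, [e]);
  }

  const next: Entry[] = [];
  const changes: Change[] = [];
  byKey.forEach((group, key) => {
    // Newest write wins; identical values collapse into one entry.
    const sorted = group.slice().sort((a, b) => b.updatedAt - a.updatedAt);
    const kept = sorted[0];
    next.push(kept);
    const same = sorted.filter((e) => e.value === kept.value).length;
    if (same > 1) changes.push({ kind: "merge", key, count: same });
    // Older values for the same topic are proposed for pruning.
    const stale = new Set(
      sorted.filter((e) => e.value !== kept.value).map((e) => e.value),
    );
    stale.forEach((value) =>
      changes.push({ kind: "prune", key, stale: value, kept: kept.value }),
    );
  });

  // A topic corrected in several distinct sessions is a recurring mistake.
  const sessionsByKey = new Map<string, Set<number>>();
  for (const c of corrections) {
    const seen = sessionsByKey.get(c.key) ?? new Set<number>();
    seen.add(c.sessionId);
    sessionsByKey.set(c.key, seen);
  }
  sessionsByKey.forEach((seen, key) => {
    if (seen.size >= flagThreshold) {
      changes.push({ kind: "flag", key, sessions: seen.size });
    }
  });

  return { next, changes };
}

const { changes } = proposeConsolidation(
  [
    { key: "units", value: "Always convert to metric", updatedAt: 2 },
    { key: "units", value: "Always convert to metric", updatedAt: 5 },
    { key: "staging_url", value: "https://staging-old.example.com", updatedAt: 1 },
    { key: "staging_url", value: "https://staging.example.com", updatedAt: 7 },
  ],
  [
    { key: "client_contact", sessionId: 3 },
    { key: "client_contact", sessionId: 6 },
  ],
);
// One merge (units), one prune (staging_url), one flag (client_contact).
console.log(changes);
```

Keeping the proposal separate from the write is a deliberate choice in this sketch. That separation is what leaves room for a human review step before anything changes.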

Here's the part that matters most: dreaming does not modify model weights. No fine-tuning. No gradient updates. The underlying model is untouched. What changes is the memory layer — the structured context the agent loads at session start.

And before any of those changes go live, a human reviews the proposed diff, approving, rejecting, or modifying each change individually.

The agent shows its work. Humans decide what sticks.

That distinction matters. Fully autonomous self-improvement sounds impressive right up until an agent updates its memory based on an outlier session and bakes a bad assumption in at scale. The review step is not a concession to caution. It's how you catch that bad assumption before it compounds across every subsequent run.
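Here is a minimal sketch of that review gate. It assumes the proposed diff arrives as a list of keyed changes and that each reviewer decision is approve, reject, or modify. `ProposedChange`, `Decision`, and `applyReviewedDiff` are hypothetical names. This is not a description of how Anthropic's review flow works internally.

```typescript
// Hypothetical review gate: only changes a human approved or edited
// reach the live memory. Names and shapes are illustrative assumptions.

interface ProposedChange {
  id: string;
  key: string; // memory entry the change touches
  newValue: string | null; // null proposes deleting the entry
}

type Decision =
  | { action: "approve" }
  | { action: "reject" }
  | { action: "modify"; value: string | null };

const REJECT: Decision = { action: "reject" };

function applyReviewedDiff(
  memory: Map<string, string>,
  proposals: ProposedChange[],
  decisions: Map<string, Decision>,
): Map<string, string> {
  const next = new Map(memory); // copy; the live store is never mutated
  for (const p of proposals) {
    // A change nobody reviewed is treated as rejected.
    const d = decisions.get(p.id) ?? REJECT;
    if (d.action === "reject") continue;
    const value = d.action === "modify" ? d.value : p.newValue;
    if (value === null) next.delete(p.key);
    else next.set(p.key, value);
  }
  return next;
}

const live = new Map([["client_contact", "Email the client directly"]]);
const reviewed = applyReviewedDiff(
  live,
  [
    { id: "c1", key: "client_contact", newValue: "Never email the client directly, Slack first" },
    { id: "c2", key: "units", newValue: "Always convert to metric" },
  ],
  new Map<string, Decision>([["c1", { action: "approve" }]]),
);
console.log(reviewed); // c1 applied; c2 had no decision, so it was dropped
```

Treating unreviewed changes as rejected is a design choice in this sketch. Under that rule, a change nobody looked at cannot carry an outlier session's assumption into every later run.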

The Numbers Worth Taking Seriously

Harvey turned dreaming on for their legal AI agents. Task completion rates went up roughly 6x.

Wisedocs used the related outcomes feature (covered below) on its medical document review workflows. Review time dropped 50%.

Neither company is running demos. Harvey processes documents for law firms. Wisedocs processes medical records for insurers. Real workflows, real error stakes, real clients.

A 6x task completion lift in legal AI is not a rounding error.

That's the difference between a workflow you'd trust in a high-stakes environment and one you'd still have a paralegal watch, just in case. The 50% drop in review time tells the same story: that's what happens when agents stop repeating the exact mistakes they made last week.

Now imagine three months of that compounding. An agent that starts with basic rules and improves its memory every cycle ends up carrying deeply domain-specific context (failure patterns, team preferences, institutional knowledge) that would take a new hire months to absorb. And that knowledge doesn't walk out the door when someone leaves.

The Uncomfortable Take

The whole industry has been treating agent memory as an infrastructure problem to solve at setup and maintain manually forever.

Build a vector store. Write retrieval. Inject context at session start. Pay someone to keep the database clean. Pray retrieval surfaces what's actually relevant.

That works until agents get serious usage. Memory grows. Retrieval gets noisier. Contradictory entries accumulate. You end up with agents that "have memory" but still produce wrong outputs because the right fact got buried under a dozen stale ones nobody cleaned up.

Dreaming is memory curation that runs itself.

The agent does the consolidation pass, shows you the proposed changes, and asks for sign-off. You spend ten minutes reviewing a diff instead of hours debugging why the agent is confused about the same thing it was confused about three weeks ago.

The teams treating this as a nice-to-have will spend the next six months getting outpaced by teams that turned it on, let the memory compound, and ended up with agents that are measurably better at their specific workflows in ways that aren't replicable overnight.

Agents that improve every week are not in the same category as agents that stay flat.

Where This Fits in the Bigger Picture

Two other features moved from research preview into public beta alongside dreaming: outcomes-based self-checking and multi-agent orchestration.

Outcomes gives agents a structured way to evaluate whether a task actually succeeded, not just whether it reached the final step. Multi-agent orchestration breaks complex work across specialized agents with their own context and tools.

The three slot together as a system:

Outcomes tells each agent whether it won or lost on a given task
Multi-agent orchestration routes work to the right specialist
Dreaming synthesizes what each specialist learned into a memory layer that compounds over time

That's not a feature list. That's an architecture for building agents that get better without you having to rebuild them.
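The toy TypeScript sketch below shows how the three pieces could hand data to each other. A router picks a specialist. An outcome check judges the result itself rather than whether the final step ran. Failures are then grouped per specialist as raw material for a consolidation pass. The specialists, the `mustContain` success criterion, and every function name are hypothetical stand-ins, not the Managed Agents API.

```typescript
// Toy sketch: routing, outcome checks, and per-specialist lesson collection.
// Specialists are stubs; the success criterion is a deliberately crude assumption.

type TaskKind = "research" | "drafting";

interface Task {
  kind: TaskKind;
  input: string;
  mustContain: string; // stand-in for a real success rubric
}

interface Outcome {
  specialist: string;
  success: boolean;
  note: string;
}

// Multi-agent orchestration: each kind of work has its own specialist.
const specialists: Record<TaskKind, { name: string; run: (input: string) => string }> = {
  research: { name: "researcher", run: (input) => `findings: ${input}` },
  drafting: { name: "drafter", run: (input) => `draft: ${input}` },
};

// Outcomes: judge the result itself, not whether the last step executed.
function checkOutcome(specialist: string, output: string, task: Task): Outcome {
  const success = output.includes(task.mustContain);
  return { specialist, success, note: success ? "ok" : `missing "${task.mustContain}"` };
}

function runAll(tasks: Task[]): Outcome[] {
  return tasks.map((task) => {
    const s = specialists[task.kind];
    return checkOutcome(s.name, s.run(task.input), task);
  });
}

// Dreaming input: failures grouped per specialist, so each memory is curated separately.
function lessonsBySpecialist(outcomes: Outcome[]): Map<string, string[]> {
  const lessons = new Map<string, string[]>();
  for (const o of outcomes) {
    if (o.success) continue;
    const list = lessons.get(o.specialist);
    if (list) list.push(o.note);
    else lessons.set(o.specialist, [o.note]);
  }
  return lessons;
}

const outcomes = runAll([
  { kind: "research", input: "statute of limitations", mustContain: "statute" },
  { kind: "drafting", input: "NDA", mustContain: "metric" },
]);
console.log(lessonsBySpecialist(outcomes)); // only "drafter" has a lesson
```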

I covered how Skills and subagents rewrote my local development workflow in yesterday's post. It's the same principle at the local level: systematize what works and stop starting from scratch. Dreaming is that principle applied to long-running production deployments.

Start Using It

Dreaming is available now in research preview for Pro, Max, Team, Enterprise, and Claude API users building on Managed Agents.

You don't need to change your existing sessions or rebuild anything. It layers on top of whatever managed agent setup you already have.

You need enough session history for the consolidation to have something to analyze. A handful of full runs is enough to start. The more sessions you accumulate, the more patterns dreaming can find and the more useful the proposed diff becomes.

The cycle:

// 1. sessions run → logs captured automatically
// 2. trigger dreaming → agent analyzes logs, proposes updated memory
// 3. review the diff → approve, reject, or edit each proposed change
// 4. apply → updated memory live for the next session
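The four-step cycle above can be sketched as one function, with the consolidation pass and the human reviewer passed in as callbacks. This is a hypothetical TypeScript illustration. `dreamCycle` is not an Anthropic API, and `MIN_SESSIONS = 3` is an arbitrary stand-in for "a handful of full runs," not a documented limit.

```typescript
// Hypothetical sketch of the sessions → dream → review → apply loop.

type Memory = Map<string, string>;

interface Change {
  key: string;
  value: string;
}

// Arbitrary illustrative threshold, not a documented product limit.
const MIN_SESSIONS = 3;

function dreamCycle(
  memory: Memory,
  sessionLogs: string[], // step 1: captured automatically upstream
  propose: (memory: Memory, logs: string[]) => Change[],
  review: (change: Change) => boolean,
): Memory {
  // Too little history: nothing worth consolidating yet.
  if (sessionLogs.length < MIN_SESSIONS) return memory;
  // Step 2: analyze the logs and propose updated memory.
  const proposals = propose(memory, sessionLogs);
  // Step 3: a human approves or rejects each proposed change.
  const approved = proposals.filter((c) => review(c));
  // Step 4: apply to a copy; the next session loads this version.
  const next: Memory = new Map(memory);
  for (const c of approved) next.set(c.key, c.value);
  return next;
}

const nextMemory = dreamCycle(
  new Map([["units", "imperial"]]),
  ["session-1", "session-2", "session-3"],
  () => [{ key: "units", value: "Always convert to metric" }],
  () => true, // stand-in for a human clicking "approve"
);
console.log(nextMemory.get("units")); // "Always convert to metric"
```

To keep the sketch short, the reviewer here can only approve or reject. Editing a proposed change is left out.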

If you're already running Managed Agents with any persistent memory, there's no reason to wait. The overhead is reviewing a diff. The upside is agents that stop making the same mistakes.

The first memory diff is the most interesting one to read. You'll see what the agent actually noticed — which mistakes it flagged, which preferences it picked up, which context it decided was worth preserving. That diff tells you more about how your workflow is actually performing than any dashboard metric you're tracking.

A stateless agent is a tool. An agent that improves between runs is a collaborator.

The session that forgets is over.

Stop running stateless. Turn on dreaming.