AgentTape
A flight recorder for Claude Code. Record, replay, diff, regression-test.
The Problem
Claude Code shows you everything it's doing while it works — every file read, every command run, every commit. That part is fine. The problem is what happens after the session ends.
The session becomes unstructured terminal text. You can scroll up. That's it. There's no way to replay it, diff it against another run, or regression-test against it.
Every other layer of the software stack has solved this. Services have distributed traces. Databases have query logs. Deployments have changelogs. AI agents — supposedly the most powerful development tool we've ever built — ship none of that.
The Tape Format
Every session is recorded as a JSONL tape file — one JSON object per line, one line per event. The format is simple enough to read in a text editor and structured enough to parse, diff, and query programmatically.
{"lineType":"meta","runId":"run_abc123","agent":"claude","startedAt":"2026-03-06T10:00:00Z"}
{"lineType":"event","eventType":"read_file","payload":{"path":"src/index.ts"}}
{"lineType":"event","eventType":"read_file","payload":{"path":"src/auth/middleware.ts"}}
{"lineType":"event","eventType":"file_written","payload":{"path":"src/auth/session.ts","linesChanged":42}}
{"lineType":"event","eventType":"command_executed","payload":{"command":"pnpm typecheck","exitCode":0}}
{"lineType":"event","eventType":"command_executed","payload":{"command":"pnpm test","exitCode":0}}
{"lineType":"event","eventType":"run_completed","payload":{"answer":"Done. Refactored session handling, all tests pass."}}
Each tape is a self-contained, replayable artifact. It can be committed to version control, compared to another tape, or used as a regression baseline. The runId ties all events to a single session.
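Because every line is a standalone JSON object, a tape can be loaded with a few lines of code. The following is a minimal sketch, assuming only the fields visible in the sample above; the type names and the helpers parseTape and writtenPaths are hypothetical and are not taken from the AgentTape packages.

```typescript
// Hypothetical type names; the field names mirror the sample tape above.
interface MetaLine {
  lineType: "meta";
  runId: string;
  agent: string;
  startedAt: string;
}

interface EventLine {
  lineType: "event";
  eventType: string;
  payload: Record<string, unknown>;
}

type TapeLine = MetaLine | EventLine;

// Runtime check that a parsed value carries one of the two known lineType values.
function isTapeLine(value: unknown): value is TapeLine {
  if (typeof value !== "object" || value === null) return false;
  const lineType = (value as { lineType?: unknown }).lineType;
  return lineType === "meta" || lineType === "event";
}

// Parse JSONL text into typed entries, skipping blank lines.
function parseTape(text: string): TapeLine[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line, index) => {
      const parsed: unknown = JSON.parse(line);
      if (!isTapeLine(parsed)) {
        throw new Error(`Unknown lineType in entry ${index + 1}`);
      }
      return parsed;
    });
}

// Example query: list every path recorded in a file_written event.
function writtenPaths(tape: TapeLine[]): string[] {
  return tape
    .filter(
      (entry): entry is EventLine =>
        entry.lineType === "event" && entry.eventType === "file_written",
    )
    .map((entry) => String(entry.payload.path));
}
```

Calling writtenPaths(parseTape(text)) on the sample tape would return the single path from its file_written event.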
Architecture
A TypeScript pnpm monorepo with six scoped packages. Each package has a single responsibility. The CLI composes them.
Data Flow
Claude Code (running a task) → claude-hook (stdin payload) → JSONL tape (session.jsonl)
JSONL tape → replay (HTML viewer)
JSONL tape → diff (vs other tape)
JSONL tape → test (vs baseline)
The Four Core Commands
agenttape record
Hook installation (one-time)
agenttape hooks install adds three PostToolUse matchers to ~/.claude/settings.json: one each for Write/Edit/MultiEdit, Bash, and Read. Once installed globally, it silently no-ops on any project that hasn't been initialised.
Session recording
agenttape record --session sets AGENTTAPE_TAPE_PATH and starts Claude Code. Each hook fires, sends its payload to agenttape claude-hook on stdin, and the event is appended to the tape.
Auto-generated HTML viewer
generateTapeHtml(tape) produces a fully self-contained HTML file (no server, no build step, no dependencies) and opens it in the browser automatically.
agenttape diff
Run the same task twice, then compare. The diff engine looks at tool call sequences, file counts, and output similarity rather than raw text, so minor wording changes don't trigger false positives.
$ agenttape diff run-monday.jsonl run-tuesday.jsonl --summary
Diff result: changed
Severity: major
Tool sequence:
- monday: read_file → write_file → run_command
- tuesday: read_file → write_file → write_file → run_command
Differences:
- [major] Tool call count changed (3 → 4)
- [minor] Output drift detected in run_completed
agenttape test
Copy a tape from a session you're happy with into agent-tests/. Future sessions that deviate from that baseline (different files touched, extra commands, changed output) fail the test. You get the diff and decide if the change is intentional.
# In package.json scripts:
"test:agent": "agenttape test agent-tests/"
# Output on failure:
FAIL agent-tests/auth-refactor.tape.jsonl
Expected: 3 tool calls
Received: 5 tool calls
[major] Unexpected files written: src/auth/legacy.ts
Claude Code Hook Configuration
The integration works through Claude Code's native hook system. Three PostToolUse matchers are added to the user's global settings. If AGENTTAPE_TAPE_PATH isn't set, every hook is a silent no-op — zero interference with normal Claude Code usage.
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [{ "type": "command", "command": "agenttape claude-hook" }]
      },
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": "agenttape claude-hook" }]
      },
      {
        "matcher": "Read",
        "hooks": [{ "type": "command", "command": "agenttape claude-hook" }]
      }
    ]
  }
}
Key Technical Decisions
Format
→ JSONL over JSON or SQLite
JSONL is append-only — each event is a line. You can tail it live, cat it, grep it. No schema, no migrations, no lock files. Simple enough that anyone can inspect a tape without tooling.
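To illustrate why append-only JSONL needs so little machinery, here is a minimal sketch of writing and reading events with Node built-ins only. The function names appendEvent and readEvents and the TapeEvent type are hypothetical; the only detail taken from the text above is that recording is a silent no-op when AGENTTAPE_TAPE_PATH is unset.

```typescript
import { appendFileSync, readFileSync } from "node:fs";

// Hypothetical event shape, modelled on the sample tape lines.
interface TapeEvent {
  lineType: "event";
  eventType: string;
  payload: Record<string, unknown>;
}

// Append one event as a single JSON line. Returns false (a silent no-op)
// when AGENTTAPE_TAPE_PATH is not set, true once the line has been written.
function appendEvent(
  event: TapeEvent,
  env: Record<string, string | undefined> = process.env,
): boolean {
  const tapePath = env.AGENTTAPE_TAPE_PATH;
  if (!tapePath) return false;
  // JSON.stringify without an indent argument never emits raw newlines,
  // so each event stays on exactly one line.
  appendFileSync(tapePath, JSON.stringify(event) + "\n", "utf8");
  return true;
}

// Read a tape back: one JSON.parse per non-empty line, no schema required.
function readEvents(tapePath: string): TapeEvent[] {
  return readFileSync(tapePath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TapeEvent);
}
```

Since each write only adds a complete line at the end of the file, a tape can be followed with tail -f while a session is still running.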
Dependencies
→ Zero runtime deps (outside commander)
A CLI tool that installs globally shouldn't pull in a node_modules tree. All tape I/O, diffing, and HTML generation uses Node built-ins. commander is the only exception for arg parsing.
HTML Viewer
→ Pure function, self-contained
generateTapeHtml(tape) returns a string. No build step, no Webpack, no server. The output HTML file is an artifact you can email, commit, or open offline forever.
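As a rough illustration of the "pure function returning a string" idea, the sketch below renders events into a standalone HTML page. It is not the project's generateTapeHtml: the name sketchTapeHtml, its parameters, and the table layout are assumptions made for the example.

```typescript
// Hypothetical minimal event shape for this sketch.
interface ViewerEvent {
  eventType: string;
  payload: Record<string, unknown>;
}

// Escape characters that would otherwise be interpreted as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Pure function: same input, same output string. No file or network access.
function sketchTapeHtml(runId: string, events: ViewerEvent[]): string {
  const rows = events
    .map(
      (event, index) =>
        `<tr><td>${index + 1}</td><td>${escapeHtml(event.eventType)}</td>` +
        `<td><code>${escapeHtml(JSON.stringify(event.payload))}</code></td></tr>`,
    )
    .join("\n");
  return [
    "<!doctype html>",
    '<html lang="en">',
    `<head><meta charset="utf-8"><title>Tape ${escapeHtml(runId)}</title></head>`,
    "<body>",
    `<h1>Tape ${escapeHtml(runId)}</h1>`,
    "<table>",
    "<tr><th>#</th><th>Event</th><th>Payload</th></tr>",
    rows,
    "</table>",
    "</body>",
    "</html>",
  ].join("\n");
}
```

Keeping the generator free of side effects means the caller decides whether to write the string to disk, open it in a browser, or compare it in a test.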
Monorepo structure
→ Six scoped packages
Each concern is isolated. The diff engine doesn't know about the CLI. The test runner imports the diff engine but not the Claude integration. This makes each package independently testable and usable.
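A diff engine isolated from the CLI can be as small as a function over two run summaries. The sketch below is one possible shape, not the project's implementation: the names RunSummary and diffRuns, the word-overlap similarity measure, the 0.8 threshold, and the severity rules are all assumptions chosen to mirror the sample diff output shown earlier.

```typescript
type Severity = "none" | "minor" | "major";

interface Difference {
  severity: "minor" | "major";
  message: string;
}

interface DiffResult {
  changed: boolean;
  severity: Severity;
  differences: Difference[];
}

// Hypothetical summary of one run: ordered tool names plus the final answer.
interface RunSummary {
  tools: string[];
  output: string;
}

// Jaccard similarity over lowercase whitespace-separated words (assumed metric).
function wordOverlap(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (wordsA.size === 0 && wordsB.size === 0) return 1;
  const shared = Array.from(wordsA).filter((word) => wordsB.has(word)).length;
  return shared / (wordsA.size + wordsB.size - shared);
}

// Compare structure first (tool sequence), then output similarity.
function diffRuns(a: RunSummary, b: RunSummary, minSimilarity = 0.8): DiffResult {
  const differences: Difference[] = [];
  if (a.tools.length !== b.tools.length) {
    differences.push({
      severity: "major",
      message: `Tool call count changed (${a.tools.length} → ${b.tools.length})`,
    });
  } else if (a.tools.some((tool, index) => tool !== b.tools[index])) {
    differences.push({ severity: "major", message: "Tool call sequence changed" });
  }
  if (wordOverlap(a.output, b.output) < minSimilarity) {
    differences.push({
      severity: "minor",
      message: "Output drift detected in run_completed",
    });
  }
  const severity: Severity = differences.some((d) => d.severity === "major")
    ? "major"
    : differences.length > 0
      ? "minor"
      : "none";
  return { changed: differences.length > 0, severity, differences };
}
```

Because diffRuns depends on nothing but its arguments, a test runner could import it and compare a new run against a baseline without touching the CLI or the Claude integration.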
Roadmap
Short Term
Medium Term
Longer Term