Claude Code session transcript parser.
Claude Code persists every session as line-delimited JSON under
~/.claude/projects/<sanitized-cwd>/<session-id>.jsonl. The exact
schema isn’t formally documented and evolves between Claude Code
releases; we therefore use a permissive parser that:
- Reads each line as a generic `serde_json::Value`.
- Maps known shapes onto `TranscriptEntry` variants.
- Skips unknown / malformed lines with a stderr warning (mirroring the `LifecycleStore::read_all` policy).
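The on-disk layout from the first paragraph (`~/.claude/projects/<sanitized-cwd>/<session-id>.jsonl`) can be sketched as follows: resolve the per-project directory, then pick the newest `.jsonl` file by modification time. This is a minimal sketch under stated assumptions, not this module's code. `sanitize_cwd` is hypothetical, and its rule (every character that is not ASCII alphanumeric or `-` becomes `-`) is an assumption, because the module does not spell out the sanitization. The other two functions are hypothetical stand-ins for `project_dir_for` and `find_latest_for_cwd`.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper. ASSUMPTION: the real sanitization rule is not
/// documented here; this sketch maps every character that is not ASCII
/// alphanumeric or '-' to '-', so "/work/demo.app" becomes "-work-demo-app".
fn sanitize_cwd(cwd: &Path) -> String {
    cwd.to_string_lossy()
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '-' { c } else { '-' })
        .collect()
}

/// Hypothetical stand-in for `project_dir_for(cwd, home)`:
/// `<home>/.claude/projects/<sanitized-cwd>`.
fn project_dir_for(cwd: &Path, home: &Path) -> PathBuf {
    home.join(".claude").join("projects").join(sanitize_cwd(cwd))
}

/// Hypothetical stand-in for `find_latest_for_cwd`: the `.jsonl` file with
/// the newest modification time. In this sketch it returns `None` when the
/// directory is missing or unreadable, or holds no readable `.jsonl` file.
fn find_latest_for_cwd(cwd: &Path, home: &Path) -> Option<PathBuf> {
    let entries = fs::read_dir(project_dir_for(cwd, home)).ok()?;
    entries
        .filter_map(Result::ok)
        .map(|entry| entry.path())
        .filter(|p| p.extension().map_or(false, |ext| ext == "jsonl"))
        .filter_map(|p| {
            // Files whose metadata cannot be read are ignored.
            let modified = fs::metadata(&p).and_then(|m| m.modified()).ok()?;
            Some((modified, p))
        })
        .max_by_key(|(modified, _)| *modified)
        .map(|(_, p)| p)
}
```

Selecting by modification time rather than by file name is what "most recently modified" implies. Session IDs are not assumed to sort chronologically.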
§Recognized shapes (Claude Code 2026-04+)
- `{"type":"user","message":{"role":"user","content":<str|arr>}}`
- `{"type":"assistant","message":{"role":"assistant","content":<str|arr>}}`
- `{"type":"tool_use","name":<str>,"input":<obj>}` (and the variant nested inside assistant content)
- `{"type":"tool_result","content":<str|arr>}`
- everything else → `TranscriptEntry::Other` (preserved verbatim so distill heuristics can still see the raw shape if needed)
Each entry exposes a normalized `text()` view (its string content, concatenated) so heuristics don’t have to re-walk the message tree.
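Here is a small hypothetical model in plain Rust that makes the shapes and the `text()` view concrete. `Content`, `Block`, `Entry` and `tool_use_names` are illustrative names, not this module's API. JSON decoding is left out (the module reads lines as `serde_json::Value`), and the `input` object of `tool_use` is omitted. Joining text blocks with a newline and returning an empty string for non-text entries are both assumptions.

```rust
/// `content` is either a plain string or an array of blocks (`<str|arr>`).
#[derive(Debug, Clone, PartialEq)]
enum Content {
    Text(String),
    Blocks(Vec<Block>),
}

/// Hypothetical block kinds inside an array-valued `content`.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Text(String),
    ToolUse { name: String }, // the nested-in-assistant-content variant
    Other,
}

/// Hypothetical mirror of the recognized shapes; the real enum may differ.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    User(Content),
    Assistant(Content),
    ToolUse { name: String }, // top-level tool_use row
    ToolResult(Content),
    Other(String), // raw line, preserved verbatim
}

impl Content {
    /// Concatenate string content. ASSUMPTION: text blocks are joined with '\n'.
    fn text(&self) -> String {
        match self {
            Content::Text(s) => s.clone(),
            Content::Blocks(blocks) => blocks
                .iter()
                .filter_map(|b| match b {
                    Block::Text(s) => Some(s.as_str()),
                    _ => None,
                })
                .collect::<Vec<_>>()
                .join("\n"),
        }
    }
}

impl Entry {
    /// Normalized text view. ASSUMPTION: entries without string content
    /// (tool_use rows, unrecognized lines) yield an empty string.
    fn text(&self) -> String {
        match self {
            Entry::User(c) | Entry::Assistant(c) | Entry::ToolResult(c) => c.text(),
            Entry::ToolUse { .. } | Entry::Other(_) => String::new(),
        }
    }

    /// Tool names from both the top-level shape and the blocks nested in
    /// assistant content, so callers don't need to care which form was used.
    fn tool_use_names(&self) -> Vec<&str> {
        match self {
            Entry::ToolUse { name } => vec![name.as_str()],
            Entry::Assistant(Content::Blocks(blocks)) => blocks
                .iter()
                .filter_map(|b| match b {
                    Block::ToolUse { name } => Some(name.as_str()),
                    _ => None,
                })
                .collect(),
            _ => Vec::new(),
        }
    }
}
```

For an assistant entry whose blocks are `Text("Reading the file")`, `ToolUse { name: "Read" }` and `Text("Done")`, `text()` yields `"Reading the file\nDone"` and `tool_use_names()` yields `["Read"]`.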
§What we deliberately do NOT do
- We don’t try to reconstruct turn boundaries: the assistant may stream multiple `assistant` rows for one turn, and the heuristics handle that.
- We don’t merge `tool_use` / `tool_result` pairs; the distill layer does that, after redaction.
- We don’t require loading the whole file into memory for huge sessions; a streaming iterator (`stream`) is provided too.
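A lazy reader in the spirit of `stream`, combined with the skip-and-warn policy, might look like the following. It holds only the current line in memory. The signature is hypothetical: the `parse` closure stands in for `parse_line`, which per this module returns `None` for malformed JSON. Silently skipping blank lines is an assumption, and I/O errors are passed through rather than skipped.

```rust
use std::io::{self, BufRead};

/// Hypothetical sketch of a `stream`-style reader. Lines are parsed lazily,
/// one at a time. Lines for which `parse` returns None are skipped with a
/// stderr warning. Blank lines are skipped silently. I/O errors are yielded
/// to the caller as `Err`.
fn stream<R, T, F>(reader: R, parse: F) -> impl Iterator<Item = io::Result<T>>
where
    R: BufRead,
    F: Fn(&str) -> Option<T>,
{
    reader
        .lines()
        .enumerate()
        .filter_map(move |(idx, line)| match line {
            Err(e) => Some(Err(e)),
            Ok(l) if l.trim().is_empty() => None,
            Ok(l) => match parse(&l) {
                Some(item) => Some(Ok(item)),
                None => {
                    // Line numbers are 1-based in the warning.
                    eprintln!("warn: skipping malformed transcript line {}", idx + 1);
                    None
                }
            },
        })
}
```

For example, with `str::parse::<i32>` as the parser, the input lines `1`, `not a number`, an empty line and `3` yield `[1, 3]`, and a warning is printed for line 2. A `read_all`-style function could collect this iterator into a `Vec`, trading memory for convenience.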
Enums§
- `TranscriptEntry` - One parsed line from a transcript file.
Functions§
- `find_latest_for_cwd` - Find the most recently modified `.jsonl` transcript under `project_dir_for(cwd, home)`. Returns `None` when:
- `parse_line` - Parse a single JSONL line. Returns `None` for malformed JSON; returns `Some(Other)` for parseable JSON we don’t recognize, so the caller can still inspect it.
- `project_dir_for` - Resolve the directory Claude Code uses for transcripts of `cwd`.
- `read_all` - Parse the entire transcript at `path` into memory. Returns `Ok` with the parsed prefix even when corrupt lines are encountered (those are skipped and reported to stderr).
- `read_tail` - Parse at most `max_lines` raw lines from the end of the transcript. Useful for distill heuristics that only care about recent turns, since it avoids loading multi-MB transcripts in full.
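The idea behind `read_tail`, which reads from the end instead of the start, can be sketched with `Seek`. The sketch reads fixed-size chunks backwards until enough newlines are buffered, then keeps the last `max_lines` lines. It returns raw strings, which a caller would then pass to `parse_line`. Several details are assumptions rather than the real implementation: the name `tail_lines`, the 8 KiB chunk size, lossy UTF-8 decoding, and counting blank lines as lines.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical sketch: return at most `max_lines` raw lines from the end of
/// the file without reading the whole file when it is large.
fn tail_lines(path: &Path, max_lines: usize) -> io::Result<Vec<String>> {
    const CHUNK: u64 = 8 * 1024; // ASSUMPTION: arbitrary chunk size
    if max_lines == 0 {
        return Ok(Vec::new());
    }
    let mut file = File::open(path)?;
    let mut pos = file.seek(SeekFrom::End(0))?; // start at EOF (file length)
    let mut buf: Vec<u8> = Vec::new(); // bytes from `pos` to EOF
    let mut newlines = 0usize;
    // Stop once max_lines + 1 newlines are buffered (or the file start is
    // reached): the bytes before the first buffered newline may be only the
    // tail end of an earlier line.
    while pos > 0 && newlines <= max_lines {
        let start = pos.saturating_sub(CHUNK);
        let mut chunk = vec![0u8; (pos - start) as usize];
        file.seek(SeekFrom::Start(start))?;
        file.read_exact(&mut chunk)?;
        newlines += chunk.iter().filter(|&&b| b == b'\n').count();
        chunk.extend_from_slice(&buf); // prepend the new chunk to what we have
        buf = chunk;
        pos = start;
    }
    // If we stopped mid-file, drop the possibly partial first line.
    let body: &[u8] = if pos > 0 {
        match buf.iter().position(|&b| b == b'\n') {
            Some(i) => &buf[i + 1..],
            None => &buf[..0],
        }
    } else {
        &buf[..]
    };
    let text = String::from_utf8_lossy(body);
    let lines: Vec<&str> = text.lines().collect();
    let skip = lines.len().saturating_sub(max_lines);
    Ok(lines[skip..].iter().map(|l| l.to_string()).collect())
}
```

Memory use is bounded by the chunks needed to cover the last `max_lines + 1` newlines, not by the file size. A single very long JSONL line still has to be buffered whole.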