aicx-parser
Transcript parser, chunker, and slicer for AI session transcripts (Claude, Codex, Gemini, Junie). Companion crate of aicx.
aicx-parser is the reusable transcript processing core extracted from the
aicx 0.6.2 workspace. It contains the shared timeline types, chunking logic,
semantic segmentation, frontmatter parsing, and sanitization helpers used by
the aicx CLI and MCP server.
The crate is intentionally lean. It does not own filesystem walking, source
discovery, embedding, retrieval, networking, async runtimes, CLIs, or TUIs.
Consumers load transcript data through their own adapters, convert it into
TimelineEntry, and hand those entries to the parser APIs.
The main chunking path is frame-aware: user messages, agent replies, internal thoughts, and tool calls can be preserved as separate frame kinds while still supporting turn-pair windows and onion-layer slicing for retrieval workflows.
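The idea of a turn-pair window over distinct frame kinds can be sketched as follows. This is a minimal illustration, not the crate's implementation: `Frame`, `Entry`, and `turn_pair_windows` are hypothetical stand-ins defined locally, and the rule "a window starts at each user message" is an assumption about what a turn pair means here.

```rust
// Local stand-in types for illustration only; these are NOT the
// aicx-parser definitions of FrameKind or TimelineEntry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Frame {
    UserMessage,
    AgentReply,
    InternalThought,
    ToolCall,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    frame: Frame,
    text: String,
}

/// Group entries into turn-pair windows: each window starts at a user
/// message and runs until the next user message. Entries that appear
/// before the first user message form their own leading window.
fn turn_pair_windows(entries: &[Entry]) -> Vec<Vec<Entry>> {
    let mut windows: Vec<Vec<Entry>> = Vec::new();
    for entry in entries {
        let starts_new = entry.frame == Frame::UserMessage || windows.is_empty();
        if starts_new {
            windows.push(Vec::new());
        }
        // A window always exists here because of the check above.
        if let Some(current) = windows.last_mut() {
            current.push(entry.clone());
        }
    }
    windows
}
```

Each frame keeps its own kind inside the window, so a downstream consumer could still filter out internal thoughts or tool calls after windowing.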
Status
aicx-parser v0.1.0 is prepared path-first. The crate artifacts are ready for
cargo publish --dry-run, while the real publish is intentionally held until
downstream integration verification is complete.
Quickstart
use ;
Adapters are expected to handle IO and source-specific parse failures before calling into this crate. The parser core receives already-normalized timeline entries and returns deterministic in-memory chunks and segments.
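The boundary described above might look like the sketch below. Everything in it is hypothetical: the tab-separated input format, `NormalizedEntry`, `AdapterError`, `adapt`, and `chunk` are invented for illustration and do not reflect the crate's real types or signatures. The point is only the split: the adapter owns the fallible parsing, and the core function is pure and has no error path.

```rust
use std::fmt;

// Hypothetical normalized entry; the real TimelineEntry has its own fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct NormalizedEntry {
    role: String,
    text: String,
}

// Hypothetical adapter-side error for a malformed source line.
#[derive(Debug, PartialEq, Eq)]
enum AdapterError {
    MalformedLine { line_no: usize },
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AdapterError::MalformedLine { line_no } => {
                write!(f, "malformed transcript line {line_no}")
            }
        }
    }
}

impl std::error::Error for AdapterError {}

/// Adapter side: parse a made-up "role<TAB>text" format and report
/// source-specific failures. Blank lines are skipped; line numbers are
/// 1-based and refer to the raw input.
fn adapt(raw: &str) -> Result<Vec<NormalizedEntry>, AdapterError> {
    raw.lines()
        .enumerate()
        .filter(|(_, line)| !line.trim().is_empty())
        .map(|(idx, line)| {
            line.split_once('\t')
                .map(|(role, text)| NormalizedEntry {
                    role: role.trim().to_string(),
                    text: text.trim().to_string(),
                })
                .ok_or(AdapterError::MalformedLine { line_no: idx + 1 })
        })
        .collect()
}

/// Core side: pure and deterministic, with no IO and no error path.
/// Splits entries into fixed-size groups (a size of 0 is treated as 1).
fn chunk(entries: &[NormalizedEntry], max_entries: usize) -> Vec<Vec<NormalizedEntry>> {
    entries
        .chunks(max_entries.max(1))
        .map(|group| group.to_vec())
        .collect()
}
```

Keeping the error type on the adapter side means the same input entries always produce the same chunks, which is what makes the core easy to test in isolation.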
Public API Surface
The crate root re-exports the primary types and helpers used by aicx:
- chunker: Chunk, ChunkerConfig, ChunkMetadataSidecar, classify_kind
- frontmatter: ReportFrontmatter
- sanitize: filter_self_echo, is_self_echo, normalize_query
- segmentation: ProjectHashRegistry, TieredIdentity, classify_cwd_tier, infer_repo_identity_from_entry, infer_tiered_identity_from_entry, semantic_segments, semantic_segments_with_registry
- timeline: ConversationMessage, ExtractionConfig, FrameKind, Kind, RepoIdentity, SemanticSegment, SourceInfo, SourceTier, TimelineEntry
- types: EntryState, EntryType, IntentEntry, Link, LinkType
Module-level APIs are also public for consumers that need lower-level control:
- chunker::chunk_entries
- chunker::format_chunk_text
- chunker::estimate_tokens
- chunker::chunk_summary
- frontmatter::parse
- sanitize::validate_read_path
- sanitize::validate_write_path
- sanitize::validate_dir_path
The public type vocabulary includes the stable axes used by downstream stores:
- Kind for conversation, plan, report, or other chunk classification
- FrameKind for user message, agent reply, internal thought, and tool call
- RepoIdentity for owner/repository identity
- SourceTier for primary, secondary, fallback, or opaque identity evidence
- SemanticSegment for repo-aware groups of timeline entries
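To make "repo-aware groups of timeline entries" concrete, here is a minimal sketch that splits a timeline into runs of consecutive entries sharing the same repository identity. The types `Repo`, `Entry`, and `Segment` and the function `segment_by_repo` are hypothetical stand-ins, and grouping by consecutive runs is an assumption; the crate's own semantic_segments may apply different rules.

```rust
// Stand-in types for illustration; not the crate's RepoIdentity,
// TimelineEntry, or SemanticSegment.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Repo {
    owner: String,
    name: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    repo: Option<Repo>, // None when no repository could be inferred
    text: String,
}

#[derive(Debug, PartialEq, Eq)]
struct Segment {
    repo: Option<Repo>,
    entries: Vec<Entry>,
}

/// Split a timeline into segments of consecutive entries that share the
/// same repository identity. Order is preserved, so a repository that
/// reappears later starts a new segment.
fn segment_by_repo(entries: &[Entry]) -> Vec<Segment> {
    let mut segments: Vec<Segment> = Vec::new();
    for entry in entries {
        let extends_current = segments
            .last()
            .map_or(false, |segment| segment.repo == entry.repo);
        if extends_current {
            if let Some(current) = segments.last_mut() {
                current.entries.push(entry.clone());
            }
        } else {
            segments.push(Segment {
                repo: entry.repo.clone(),
                entries: vec![entry.clone()],
            });
        }
    }
    segments
}
```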
Feature Flags
aicx-parser keeps its default dependency surface small:
- default = [] - lean default build
- json-schema - opt in to schemars::JsonSchema derives on supported public types
The json-schema feature is useful for tools that expose parser types over MCP,
HTTP, or generated contract documentation. Consumers that only chunk or segment
in process can leave it disabled.
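Opt-in derives of this kind are commonly wired with `cfg_attr`. The sketch below shows that general Rust pattern on a hypothetical `ExampleRecord` type; whether aicx-parser uses exactly this wiring is an assumption, and only the feature name json-schema and the schemars::JsonSchema derive come from the text above.

```rust
// With the feature disabled, the cfg_attr line expands to nothing, so
// the `schemars` crate is never referenced and need not be a dependency.
#[cfg_attr(feature = "json-schema", derive(schemars::JsonSchema))]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExampleRecord {
    // Hypothetical field, present only to give the struct some shape.
    pub title: String,
}

/// Reports whether this build was compiled with the feature enabled.
pub fn schema_support_enabled() -> bool {
    cfg!(feature = "json-schema")
}
```

In a build without the feature, `schema_support_enabled()` evaluates to `false` at compile time and the type carries only its ordinary derives.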
Source Format Support
The parser is source-format agnostic. It currently supports normalized timeline
entries produced from these source families in the companion aicx crate:
- Claude JSONL sessions
- Codex session transcripts
- Gemini JSONL sessions
- Junie session transcripts
The filesystem walkers, JSON readers, session discovery logic, and
source-specific adapters live in aicx. This crate starts after that boundary:
given TimelineEntry values, it performs sanitization, segmentation, chunking,
classification, and frontmatter interpretation.
Companion To aicx
aicx-parser is published as a companion library for
aicx, the operator CLI and MCP server for
AI session context ingestion and retrieval.
Use aicx when you want the full product surface: local corpus discovery,
commands, MCP tooling, optional embedding materialization, and operator-facing
workflows. Use aicx-parser when you want the reusable pure-compute transcript
core inside another Rust application.
Dependency Budget
The v0.1.0 crate depends on:
- anyhow
- chrono
- serde
- serde_json
- regex
- optional schemars
It does not bring in async runtimes, HTTP stacks, terminal UI crates, model runtimes, vector databases, or embedding clients.
Release Strategy
The first public crate version is prepared as 0.1.0, but publication is
path-first: downstream integration gets verified before the actual
cargo publish step. This avoids publishing a public API that immediately
needs a breaking correction after real consumer use.
License
aicx-parser is licensed under the Business Source License 1.1 (BUSL-1.1).
See LICENSE.
Authors
- Maciej Gad <void@div0.space>
- Monika Szymanska <hello@vetcoders.io>