cai-ingest 0.1.0

Data ingestion parsers for Coding Agent Insights
Documentation

cai-ingest

Data ingestion from AI coding platforms.

Overview

cai-ingest provides parsers and scanners for extracting AI coding interactions from various sources including Claude Code conversations, Codex CLI history, and Git commit history.

Key Features

  • Claude Code parser - Parse conversation JSON files
  • Codex CLI parser - Parse Codex command history
  • Git scanner - Extract commits as entries
  • Unified API - Single Ingestor for all sources
  • Extensible - Easy to add new parsers

Usage

Using Ingestor

use cai_ingest::{Ingestor, IngestConfig};
use cai_storage::MemoryStorage;

#[tokio::main]
async fn main() -> cai_core::Result<()> {
    let storage = MemoryStorage::new();

    // Default configuration (Claude + Codex)
    let ingestor = Ingestor::with_defaults();
    let count = ingestor.ingest_all(&storage).await?;

    println!("Ingested {} entries", count);
    Ok(())
}

Individual Parsers

use cai_ingest::{ClaudeParser, CodexParser, GitScanner};

// Parse Claude conversations
let claude_parser = ClaudeParser::with_default_path()?;
let entries = claude_parser.parse_all()?;

// Parse Codex history
let codex_parser = CodexParser::with_default_path()?;
let entries = codex_parser.parse_all()?;

// Scan Git repository
let git_scanner = GitScanner::new("~/my-project");
let entries = git_scanner.scan()?;

Data Format

Each parser produces cai_core::Entry instances:

use cai_core::{Entry, Source, Metadata};
use chrono::Utc;

Entry {
    id: "unique-id".to_string(),
    source: Source::Claude,
    timestamp: Utc::now(),
    prompt: "Original prompt".to_string(),
    response: "AI response".to_string(),
    metadata: Metadata {
        file_path: Some("src/main.rs".to_string()),
        repo_url: Some("https://github.com/user/repo".to_string()),
        commit_hash: Some("abc123".to_string()),
        language: Some("Rust".to_string()),
        extra: std::collections::HashMap::new(),
    },
}

Usage

Add to your Cargo.toml:

[dependencies]
cai-ingest = { path = "../cai-ingest" }

Design Decisions

  • Streaming: Support large files without loading entirely into memory
  • Error recovery: Continue parsing on malformed entries
  • Extensible: Easy to add new source parsers

Testing

cargo test -p cai-ingest

License

MIT OR Apache-2.0