Skip to main content

Crate ah_ah_ah

Crate ah_ah_ah 

Source
Expand description

ah-ah-ah

Offline token counting with pluggable backends and boundary-aware decomposition.

Two backends are available:

  • Claude (default): Uses ctoc’s 38,360 API-verified Claude 3+ tokens with greedy longest-match via Aho-Corasick. Overcounts by ~4% compared to the real Claude tokenizer — safe for budget enforcement.
  • OpenAI: Uses bpe-openai for exact o200k_base BPE encoding.

Structured content (markdown tables, etc.) can cause greedy tokenizers to match tokens spanning structural boundaries. The Decomposer trait lets you plug in boundary-aware counting. A MarkdownDecomposer is included.

§Quick start

use ah_ah_ah::{count_tokens, Backend, MarkdownDecomposer};

// Raw counting (no decomposition).
let report = count_tokens("Hello, world!", None, Backend::Claude, None);
assert!(report.count > 0);

// With markdown-aware decomposition.
let md = MarkdownDecomposer;
let report = count_tokens("| A | B |\n|---|---|\n| 1 | 2 |", None, Backend::Claude, Some(&md));
assert!(report.count > 0);

Re-exports§

pub use backend::Backend;
pub use decompose::Decomposer;
pub use decompose::MarkdownDecomposer;
pub use error::Error;
pub use error::Result;
pub use tokens::TokenReport;
pub use tokens::count_tokens;

Modules§

backend
Tokenizer backend selection.
decompose
Text decomposition for boundary-aware token counting.
error
Error types for ah-ah-ah.
tokens
Public token counting API.