every-other-token 4.0.0

A real-time LLM stream interceptor for token-level interaction research

A real-time LLM token stream interceptor for interpretability research. Sits between your application and the model, mutates tokens mid-generation, captures per-token confidence and perplexity signals, and renders results in a zero-dependency terminal or web UI.


What it does

Aggregate benchmarks measure final outputs. every-other-token measures what happens during generation — token by token, position by position — with confidence scores, perplexity signals, and cross-provider structural comparison running simultaneously.

It directly enables four research directions:

  1. Semantic fragility — At what perturbation rate does coherent reasoning collapse?
  2. Cross-provider divergence — Do OpenAI and Anthropic produce structurally different token sequences for identical prompts?
  3. System prompt sensitivity — How much does framing shift per-token confidence distributions?
  4. Chaos resilience — Do models self-correct when every other token is randomly mutated?

How it works

The tool intercepts the SSE (server-sent events) stream produced by the provider API. Each chunk is parsed into individual tokens. For every token a decision is made — based on a Bresenham-spread rate schedule — whether to apply the active transform. The enriched event is then routed to the terminal renderer, web UI, WebSocket collaboration room, JSON-stream output, or replay recorder simultaneously.
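
The SSE wire format frames each payload as a `data:` line, with a terminal `[DONE]` marker in OpenAI-style streams. A minimal sketch of the kind of line handling involved (the crate's actual parser lives in lib.rs and providers.rs and is not shown here):

```rust
/// Extract the JSON payload from one SSE line, if any.
/// Returns None for comments, blank payloads, non-data fields,
/// and the terminal "[DONE]" marker.
fn sse_data_payload(line: &str) -> Option<&str> {
    let payload = line
        .strip_prefix("data: ")
        .or_else(|| line.strip_prefix("data:"))?;
    let payload = payload.trim();
    if payload.is_empty() || payload == "[DONE]" {
        None
    } else {
        Some(payload)
    }
}
```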

Pipeline overview

CLI (clap) ──► main.rs
                  │
                  ▼
           TokenInterceptor (lib.rs)
           │              │
           ▼              ▼
    OpenAiPlugin    AnthropicPlugin
    (SSE + logprobs) (SSE)
           │
           ▼  per token
  process_content_logprob()
  ├── confidence  = exp(logprob)
  ├── perplexity  = exp(-logprob)
  ├── alternatives = top_logprobs[0..N]
  └── transform   = Transform::apply(token)
           │
           ├──► web_tx  ──► SSE ──► browser (web.rs)
           ├──► collab  ──► WebSocket ──► room participants (collab.rs)
           ├──► stdout  ──► terminal renderer (render.rs)
           ├──► Recorder ──► JSON replay file (replay.rs)
           └──► HeatmapExporter ──► CSV (heatmap.rs)

Research mode (research.rs)
  run_research() × N ──► ResearchOutput (JSON)
  run_research_suite() ──► batch over prompt file

Self-tune (feature = "self-tune")
  TelemetryBus ──► AnomalyDetector ──► TuningController ──► SnapshotStore

Self-modify (feature = "self-modify")
  TaskGen ──► ValidationGate ──► Deploy ──► Memory
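
The confidence and perplexity formulas in the diagram translate directly to code. A minimal sketch (the struct and function names here are illustrative, not the crate's API):

```rust
/// Per-token signals derived from a provider logprob, matching the
/// pipeline above: confidence = exp(logprob), perplexity = exp(-logprob).
struct TokenSignals {
    confidence: f64,
    perplexity: f64,
}

fn signals_from_logprob(logprob: f64) -> TokenSignals {
    TokenSignals {
        confidence: logprob.exp(),
        perplexity: (-logprob).exp(),
    }
}
```

A logprob of 0.0 means the token had probability 1.0, so both signals are 1.0; more negative logprobs mean lower confidence and higher perplexity.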

Key modules

Module Responsibility
lib.rs TokenInterceptor, TokenEvent, stream parsing, retry logic
transforms.rs All transform strategies (Reverse, Noise, Chaos, Chain, …)
providers.rs ProviderPlugin trait, OpenAI and Anthropic SSE wire types, MCP types
web.rs Embedded HTTP/1.1 server, SSE fan-out, WebSocket upgrade
collab.rs Room store, participant management, surgery edits, chat, recording
research.rs Headless research loop, aggregate statistics, A/B mode, heatmap export
store.rs SQLite-backed experiment persistence, cross-session dedup cache
heatmap.rs Per-position confidence matrix → CSV
replay.rs JSON recording and deterministic replay
render.rs Terminal ANSI colouring, confidence indicators, visual-mode formatting
config.rs ~/.eot.toml / ./.eot.toml config file with merge semantics
cli.rs Clap argument definitions and helper functions
error.rs EotError enum — one variant per failure domain
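
heatmap.rs turns the per-position confidence matrix into CSV. A sketch of one plausible layout, with rows as runs and columns as token positions (the header names and number formatting are assumptions, not the crate's exact output):

```rust
/// Render a per-position confidence matrix (rows = runs,
/// columns = token positions) as CSV text.
fn heatmap_csv(matrix: &[Vec<f64>]) -> String {
    let cols = matrix.iter().map(|r| r.len()).max().unwrap_or(0);
    let mut out = String::from("run");
    for c in 0..cols {
        out.push_str(&format!(",pos_{c}"));
    }
    out.push('\n');
    for (i, row) in matrix.iter().enumerate() {
        out.push_str(&i.to_string());
        for v in row {
            out.push_str(&format!(",{v:.3}"));
        }
        out.push('\n');
    }
    out
}
```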

Feature flags

Flag Description
(none) Terminal and web UI streaming, all transforms, research mode, collab rooms
sqlite-log Persist experiment runs to a local SQLite database via store::ExperimentStore
self-tune Background PID-based parameter tuning loop + telemetry bus
self-modify Agent loop for automated pipeline improvement (requires self-tune)
intelligence Reserved namespace for future interpretability features
evolution Reserved namespace for evolutionary optimisation
helix-bridge HTTP bridge that polls HelixRouter /api/stats and pushes config patches
redis-backing Write-through Redis persistence for agent memory and snapshots
wasm WASM target bindings via wasm-bindgen

Quickstart

Prerequisites

  • Rust 1.75 or later
  • An OpenAI API key (OPENAI_API_KEY) and/or Anthropic API key (ANTHROPIC_API_KEY)

git clone https://github.com/Mattbusel/Every-Other-Token
cd Every-Other-Token
cargo build --release

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

Basic usage

# Terminal output with per-token confidence color bands
./target/release/every-other-token "What is consciousness?" --visual

# Web UI on http://localhost:8888 — opens browser automatically
./target/release/every-other-token "What is consciousness?" --web

# Headless research: 20 runs, JSON aggregate stats
./target/release/every-other-token "Explain recursion" \
    --research --runs 20 --output results.json

# Side-by-side OpenAI vs Anthropic diff in the terminal
./target/release/every-other-token "Describe entropy" --diff-terminal

# A/B system-prompt experiment with significance testing
./target/release/every-other-token "Tell me a story" \
    --research --runs 20 \
    --system-a "Be poetic." --system-b "Be literal." \
    --significance

Shell completions

./target/release/every-other-token --completions bash >> ~/.bash_completion
./target/release/every-other-token --completions zsh  >  ~/.zfunc/_every-other-token
./target/release/every-other-token --completions fish > ~/.config/fish/completions/every-other-token.fish

Dry run (no API key required)

./target/release/every-other-token "hello" --dry-run --transform chaos


Transforms

Name Behavior Deterministic?
reverse Reverses token characters: "hello" → "olleh" Yes
uppercase Converts to uppercase: "hello" → "HELLO" Yes
mock Alternating lower/upper per char: "hello" → "hElLo" Yes
noise Appends a random symbol from * + ~ @ # $ % No (seeded with --seed)
chaos Randomly selects one of the above per token No (seeded with --seed)
scramble Fisher-Yates shuffles token characters No (seeded with --seed)
delete Replaces the token with the empty string Yes
synonym Substitutes from a 200-entry static synonym table Yes
delay:N Passes through after N ms pause Yes
A,B,... Chain: applies A, then B, then … in sequence Depends on chain
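
As an illustration, the deterministic mock transform and the seeded scramble from the table can be sketched as follows. This is a std-only sketch using a toy xorshift PRNG so it needs no dependencies; the actual implementations in transforms.rs (and the RNG behind --seed) may differ:

```rust
/// Alternating-case "mock" transform from the table above:
/// even character positions lowercased, odd positions uppercased,
/// so "hello" becomes "hElLo".
fn mock_transform(token: &str) -> String {
    token
        .chars()
        .enumerate()
        .map(|(i, c)| {
            if i % 2 == 0 {
                c.to_lowercase().collect::<String>()
            } else {
                c.to_uppercase().collect::<String>()
            }
        })
        .collect()
}

/// Toy xorshift PRNG, purely for illustration.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Seeded Fisher-Yates scramble of a token's characters:
/// same seed, same shuffle.
fn scramble(token: &str, seed: u64) -> String {
    let mut chars: Vec<char> = token.chars().collect();
    let mut rng = XorShift(seed | 1); // avoid the all-zero state
    for i in (1..chars.len()).rev() {
        let j = (rng.next() % (i as u64 + 1)) as usize;
        chars.swap(i, j);
    }
    chars.into_iter().collect()
}
```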

Rate control: with --rate 0.5 (the default), every other token is transformed. Transforms are placed with a Bresenham spread, giving a deterministic, uniform distribution at any rate. Combine with --seed for fully reproducible runs.

Stochastic rate: --rate-range 0.3-0.7 picks a random rate in [min, max] per run.

Confidence gating: --min-confidence 0.8 transforms only tokens whose API confidence is below 0.8. High-confidence tokens pass through unchanged.
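
A Bresenham-style spread can be sketched as an error accumulator that fires whenever the accumulated rate crosses 1.0; this is one plausible reading of the behaviour described above, not necessarily the crate's exact schedule:

```rust
/// Decides per token whether to transform, spreading transforms
/// evenly across the stream rather than in bursts.
struct RateSchedule {
    rate: f64,
    acc: f64,
}

impl RateSchedule {
    fn new(rate: f64) -> Self {
        Self { rate: rate.clamp(0.0, 1.0), acc: 0.0 }
    }

    /// Returns true if the next token should be transformed.
    fn next(&mut self) -> bool {
        self.acc += self.rate;
        if self.acc >= 1.0 {
            self.acc -= 1.0;
            true
        } else {
            false
        }
    }
}
```

At rate 0.5 this yields the strict every-other-token pattern; at rate 0.25 every fourth token fires, and so on, with no randomness involved.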


Web UI modes

Launch with --web to open the single-page application:

Mode Description
Single Live token stream with per-token confidence bars and perplexity pulse
Split Original vs transformed side by side
Quad All four transforms applied simultaneously in a 2×2 grid
Diff OpenAI and Anthropic streaming the same prompt in parallel, diverging positions highlighted
Experiment A/B mode: two system prompts, same user prompt, live divergence map
Research Aggregate stats dashboard: perplexity histogram, confidence distribution, vocab diversity

Configuration file

Create ~/.eot.toml (global) or .eot.toml in the working directory (local wins over global):

provider     = "anthropic"
model        = "claude-sonnet-4-6"
transform    = "reverse"
rate         = 0.5
port         = 8888
top_logprobs = 5
system_a     = "You are a concise assistant."

CLI flags override config file values. The rate field is clamped to [0.0, 1.0] with a warning if out of range.
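
The precedence and clamping rules can be sketched like this; the function and field handling are illustrative, not the actual config.rs code:

```rust
/// Resolve the effective rate: CLI flag beats local .eot.toml,
/// which beats the global ~/.eot.toml. Out-of-range values are
/// clamped to [0.0, 1.0] with a warning, as the docs describe.
fn resolve_rate(cli: Option<f64>, local: Option<f64>, global: Option<f64>) -> f64 {
    let raw = cli.or(local).or(global).unwrap_or(0.5);
    if !(0.0..=1.0).contains(&raw) {
        eprintln!("warning: rate {raw} out of range, clamping to [0.0, 1.0]");
    }
    raw.clamp(0.0, 1.0)
}
```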


CLI reference

USAGE:
    every-other-token [OPTIONS] <PROMPT> [TRANSFORM] [MODEL]

ARGS:
    <PROMPT>      Input prompt (use "-" to read from stdin)
    [TRANSFORM]   Transform type [default: reverse]
    [MODEL]       Model name [default: gpt-3.5-turbo]

OPTIONS:
    --provider <PROVIDER>           openai | anthropic | mock [default: openai]
    --visual, -v                    Enable ANSI confidence-colored output
    --heatmap                       Enable token importance heatmap
    --orchestrator                  Route through MCP pipeline at localhost:3000
    --web                           Launch web UI instead of terminal
    --port <PORT>                   Web UI port [default: 8888]
    --research                      Headless research mode
    --runs <RUNS>                   Number of research runs [default: 10]
    --output <FILE>                 Research output JSON path [default: research_output.json]
    --system-a <PROMPT>             System prompt A (A/B mode)
    --system-b <PROMPT>             System prompt B (A/B mode)
    --top-logprobs <N>              Top alternative tokens per position (0–20) [default: 5]
    --db <FILE>                     SQLite database for experiment persistence
    --significance                  Compute Welch's t-test across A/B confidence distributions
    --heatmap-export <FILE>         Export per-position confidence heatmap to CSV
    --heatmap-min-confidence <F>    Minimum mean confidence for heatmap rows [default: 0.0]
    --heatmap-sort-by <FIELD>       Sort heatmap by "position" or "confidence" [default: position]
    --record <FILE>                 Record token events to JSON replay file
    --replay <FILE>                 Replay token events from file (no API call)
    --rate <F>                      Fraction of tokens to transform (0.0–1.0) [default: 0.5]
    --rate-range <MIN-MAX>          Stochastic rate from interval (e.g. "0.3-0.7")
    --seed <N>                      Fixed RNG seed for reproducible Noise/Chaos runs
    --baseline                      Compare against stored "none" transform runs in SQLite
    --prompt-file <FILE>            Batch research: one prompt per line
    --diff-terminal                 Parallel OpenAI + Anthropic streams side by side
    --json-stream                   One JSON line per token to stdout
    --dry-run                       Validate transform without calling any API
    --template <TPL>                Prompt template with {input} placeholder
    --min-confidence <F>            Only transform tokens with confidence below this value
    --format <FMT>                  Research output format: "json" or "jsonl" [default: json]
    --collapse-window <N>           Confidence collapse detection window [default: 5]
    --orchestrator-url <URL>        MCP orchestrator base URL [default: http://localhost:3000]
    --max-retries <N>               API retry attempts on 429/5xx [default: 3]
    --completions <SHELL>           Generate shell completions (bash/zsh/fish/…)
    --log-db <FILE>                 SQLite experiment log (requires sqlite-log feature)

API reference (library)

Add to Cargo.toml:

[dependencies]
every-other-token = "4"

Core types

use every_other_token::{TokenInterceptor, TokenEvent};
use every_other_token::providers::Provider;
use every_other_token::transforms::Transform;

let mut interceptor = TokenInterceptor::new(
    Provider::Openai,
    Transform::Reverse,
    "gpt-4".to_string(),
    false,  // visual_mode
    false,  // heatmap_mode
    false,  // orchestrator
)?
.with_rate(0.5)
.with_seed(42);

interceptor.intercept_stream("What is entropy?").await?;

Web UI / channel mode

use every_other_token::{TokenInterceptor, TokenEvent};
use tokio::sync::mpsc;

let (tx, mut rx) = mpsc::unbounded_channel::<TokenEvent>();
interceptor.web_tx = Some(tx);
interceptor.intercept_stream("Explain recursion").await?;

while let Some(event) = rx.recv().await {
    println!("{}: {:?}", event.index, event.text);
}

Experiment store

use every_other_token::store::{ExperimentStore, RunRecord};

let store = ExperimentStore::open("experiments.db")?;
let id = store.insert_experiment("2026-01-01", "my prompt", "openai", "reverse", "gpt-4")?;
store.insert_run(id, &RunRecord {
    run_index: 0,
    token_count: 100,
    transformed_count: 50,
    avg_confidence: Some(0.82),
    avg_perplexity: Some(1.2),
    vocab_diversity: 0.73,
})?;

Performance

  • Sub-millisecond per-token processing overhead (Bresenham spread, no heap allocation per token)
  • Zero-copy async streaming via Tokio with back-pressure on the broadcast channel
  • ~4 MB release binary with LTO + strip = true
  • Parallel provider streams via tokio::select! / tokio::join!
  • Exponential back-off retry on 429 / 5xx responses (up to --max-retries attempts)
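
The retry policy can be sketched as follows. Only the exponential shape and the 429/5xx trigger come from the text above; the 500 ms base delay and doubling factor are assumptions:

```rust
/// Delay before the given retry attempt: base doubles each time.
fn backoff_delay_ms(attempt: u32, base_ms: u64) -> u64 {
    base_ms.saturating_mul(1u64 << attempt.min(16))
}

/// Retry only rate-limit (429) and server (5xx) responses,
/// up to the configured maximum number of attempts.
fn should_retry(status: u16, attempt: u32, max_retries: u32) -> bool {
    attempt < max_retries && (status == 429 || (500..600).contains(&status))
}
```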

Contributing

Contributions are welcome. Please follow these steps:

  1. Fork the repository and create a feature branch from main.
  2. Run cargo fmt and cargo clippy -- -D warnings before committing.
  3. Add tests for any new public API surface. The CI gate requires all tests to pass on stable and the MSRV (1.75).
  4. Open a pull request against main with a clear description of the change and why it is needed.
  5. For significant changes, open an issue first to discuss the design.

Development commands

# Build
cargo build

# Tests (all feature combinations)
cargo test
cargo test --features sqlite-log
cargo test --features self-tune
cargo test --features self-modify
cargo test --features helix-bridge

# Lint
cargo clippy -- -D warnings
cargo clippy --all-features -- -D warnings

# Format check
cargo fmt --check

# Docs (with warnings-as-errors)
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --open

# Security audit
cargo audit

# Dependency policy check
cargo deny check

# Release build
cargo build --release

Project layout

src/
├── lib.rs              # TokenInterceptor, TokenEvent, stream loop
├── main.rs             # CLI entry point
├── cli.rs              # Clap argument struct + helpers
├── config.rs           # .eot.toml config file support
├── error.rs            # EotError enum
├── providers.rs        # ProviderPlugin trait + wire types
├── transforms.rs       # Transform enum + all strategies
├── render.rs           # Terminal ANSI rendering helpers
├── web.rs              # Embedded HTTP/WebSocket server
├── collab.rs           # Multiplayer room management
├── research.rs         # Headless research loop + stats
├── store.rs            # SQLite experiment persistence
├── heatmap.rs          # Per-position confidence CSV export
├── replay.rs           # Token event recording + replay
├── self_tune/          # (feature: self-tune) PID tuning loop
├── self_modify/        # (feature: self-modify) Agent improvement loop
├── helix_bridge/       # (feature: helix-bridge) HelixRouter HTTP bridge
├── semantic_dedup.rs   # (feature: self-modify) In-session prompt dedup
└── experiment_log.rs   # (feature: sqlite-log) SQLite experiment logger
tests/
├── collab_tests.rs
├── providers_tests.rs
├── transforms_tests.rs
├── store_heatmap_replay_tests.rs
├── self_tune_integration.rs
└── web_integration.rs

License

MIT — see LICENSE.


Ecosystem