tinyjuice 0.2.1

Pluggable token compression for OpenHuman.
# Conversation Compression Plan

## Goal

Add optional conversation-level compaction primitives inspired by Hermes without
polluting the core content compressor or taking ownership of OpenHuman session
persistence.

## Boundary

Conversation compression is not the same as tool-output compression.

OpenHuman already runs conversation-level middlewares on the tinyagents graph:
`ContextCompressionMiddleware` (live history summarization,
`src/openhuman/tinyagents/summarize.rs`), `MessageTrimMiddleware`, and
`MicrocompactMiddleware` (clears older tool-result bodies). These are the
integration seams for this plan: the helpers below should be consumable from
those middlewares, and the tool-result digest must coordinate with
microcompaction so the same messages are not processed twice.

TinyJuice core can provide:

- provider-neutral message types
- token-budget helpers
- deterministic tool-result digests
- boundary and tool-pair safety helpers
- summary prompt and structured summary contracts
- deterministic fallback summary
- prompt-cache hints
- context usage breakdown shapes

OpenHuman or another host should own:

- session database mutation
- compaction leases
- provider-specific request rendering
- LLM summary provider choice
- measured usage data
- archive or child-session persistence

## Proposed Modules

```text
src/conversation/types.rs
src/conversation/budget.rs
src/conversation/tool_digest.rs
src/conversation/boundary.rs
src/conversation/summary.rs
src/cache_hints.rs
src/observability.rs
```

Each module should either sit behind a Cargo feature flag or avoid pulling in
new dependencies, so the core content compressor stays lean.

## P0: Deterministic Safety Primitives

Add pure helpers first:

- `shrink_json_string_leaves()`
- `select_tail_by_budget()`
- head protection and decaying head helpers
- tool-call/tool-result boundary alignment
- orphaned tool result sanitizer
- latest real user anchor
- latest visible assistant anchor
- manual partial split and rejoin helpers

Acceptance:

- JSON argument shrinking preserves parseability.
- Non-JSON argument text is not corrupted.
- Retained tool results have retained parent tool calls.
- Tail selection is deterministic and budget-aware.
- Partial compaction keeps the requested last N user exchanges.
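
As a rough illustration of the tail-selection and tool-pair acceptance items, the sketch below walks the history from newest to oldest until a token budget is exhausted, then refuses to start the tail on a tool result. Everything here is an assumption made for illustration: the `Message` and `Role` shapes, the `estimate_tokens` heuristic (about four bytes per token), the `drop_orphaned_tool_results` name, and the choice to return a start index. Only the name `select_tail_by_budget` comes from the plan.

```rust
use std::collections::HashSet;

/// Hypothetical provider-neutral message shape (not the real tinyjuice types).
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub text: String,
    /// Assistant messages: ids of the tool calls they issued (assumed field).
    pub tool_call_ids: Vec<String>,
    /// Tool messages: id of the call this result answers (assumed field).
    pub tool_call_id: Option<String>,
}

/// Placeholder estimate: roughly four bytes per token, rounded up.
pub fn estimate_tokens(msg: &Message) -> usize {
    (msg.text.len() + 3) / 4
}

/// Returns the index where the retained tail starts.
/// Deterministic: the same history and budget always give the same index.
pub fn select_tail_by_budget(messages: &[Message], budget: usize) -> usize {
    let mut start = messages.len();
    let mut used = 0usize;
    // Walk backwards, admitting whole messages while they fit the budget.
    while start > 0 {
        let cost = estimate_tokens(&messages[start - 1]);
        if used + cost > budget {
            break;
        }
        used += cost;
        start -= 1;
    }
    // Boundary alignment: a tail must not begin with tool results whose
    // parent call was cut off, so skip them (this never exceeds the budget).
    while start < messages.len() && messages[start].role == Role::Tool {
        start += 1;
    }
    start
}

/// Drops tool results whose parent tool call is not present earlier in the slice.
pub fn drop_orphaned_tool_results(messages: &[Message]) -> Vec<Message> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for m in messages {
        match m.role {
            Role::Assistant => {
                for id in &m.tool_call_ids {
                    seen.insert(id.as_str());
                }
                out.push(m.clone());
            }
            Role::Tool => {
                let has_parent = m
                    .tool_call_id
                    .as_deref()
                    .map_or(false, |id| seen.contains(id));
                if has_parent {
                    out.push(m.clone());
                }
            }
            Role::User => out.push(m.clone()),
        }
    }
    out
}
```

Skipping orphaned results forward keeps the tail within budget; an implementation could instead pull the boundary back to include the parent call, at the cost of overshooting the budget.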

## P1: Conversation Tool-Result Digest

Add deterministic old-tool-output digesting:

- pair assistant tool calls with tool results by call id
- deduplicate identical older tool results by stable hash
- replace old large terminal outputs with command, exit code, line count, and
  byte count
- replace old read-file results with path, line range, and size metadata
- replace old search results with pattern, match count, and top evidence
- keep recent tail tool results exact

Acceptance:

- Duplicate read-file outputs keep only the newest full copy.
- Old terminal output becomes a concise digest.
- Recent tail output remains verbatim.
- Sensitive keys and values are redacted before digest persistence.
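
The deduplication and terminal-digest items above could look roughly like the following. This is a minimal sketch under stated assumptions: `ToolResult`, `dedupe_older_results`, `digest_terminal_output`, and the placeholder text formats are all hypothetical, and FNV-1a is swapped in as the "stable hash" only because its output is fixed by the algorithm (the standard library's `DefaultHasher` makes no such promise). Redaction and call-id pairing are not covered here.

```rust
use std::collections::HashSet;

/// 64-bit FNV-1a. A real implementation would likely prefer a stronger hash
/// or compare bodies on a hash match to rule out collisions.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical tool-result shape, ordered oldest to newest in a slice.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub call_id: String,
    pub body: String,
}

/// Replaces older duplicate bodies with a short placeholder.
/// The newest copy of each body stays intact, and the last `keep_tail`
/// results are never rewritten.
pub fn dedupe_older_results(results: &[ToolResult], keep_tail: usize) -> Vec<ToolResult> {
    let tail_start = results.len().saturating_sub(keep_tail);
    let mut seen: HashSet<u64> = HashSet::new();
    let mut out: Vec<ToolResult> = results.to_vec();
    // Newest first, so the first time a hash is seen is the newest copy.
    for i in (0..results.len()).rev() {
        let hash = fnv1a64(results[i].body.as_bytes());
        let is_newest_copy = seen.insert(hash);
        if !is_newest_copy && i < tail_start {
            out[i].body = format!(
                "[duplicate of a newer tool result, {} bytes, fnv1a64={:016x}]",
                results[i].body.len(),
                hash
            );
        }
    }
    out
}

/// Digest for an old terminal output: command, exit code, line and byte counts.
pub fn digest_terminal_output(command: &str, exit_code: i32, output: &str) -> String {
    format!(
        "[terminal digest] command={:?} exit_code={} lines={} bytes={}",
        command,
        exit_code,
        output.lines().count(),
        output.len()
    )
}
```

Walking newest to oldest is what makes "keep only the newest full copy" fall out naturally: whichever copy is visited first wins, and every older match becomes a placeholder unless it sits inside the protected tail.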

## P1: Prompt Cache Hints

Add cache support without mutating provider payloads:

- `PromptCacheHint`
- stable static-prefix cache key from system prompt plus sorted tool schemas
- Anthropic-compatible carrier selection as hints only
- live-zone metadata shape for frozen prefixes and mutable blocks

Acceptance:

- Tool order does not change static-prefix cache key.
- Cache hints are reported separately from compression steps.
- Frozen byte ranges are left for adapters to splice byte-identically.
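
One way to satisfy the order-independence requirement is to sort the serialized tool schemas before hashing. The sketch below assumes that approach; `static_prefix_cache_key`, the length-prefixed framing, the hex key format, and the use of FNV-1a are illustrative choices rather than the crate's actual design, and a real implementation might sort by tool name or use a cryptographic hash instead.

```rust
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Feeds one part into a running FNV-1a hash, prefixed with its length so
/// that ("ab", "c") and ("a", "bc") produce different byte streams.
fn feed(hash: &mut u64, part: &str) {
    let len_bytes = (part.len() as u64).to_le_bytes();
    for b in len_bytes.iter().chain(part.as_bytes()) {
        *hash ^= *b as u64;
        *hash = hash.wrapping_mul(FNV_PRIME);
    }
}

/// Hypothetical helper: derives a key for the static prefix from the system
/// prompt plus the tool schemas, independent of the order the tools arrive in.
pub fn static_prefix_cache_key(system_prompt: &str, tool_schemas: &[&str]) -> String {
    // Sort a copy so the caller's ordering never reaches the hash.
    let mut sorted: Vec<&str> = tool_schemas.to_vec();
    sorted.sort_unstable();

    let mut hash = FNV_OFFSET;
    feed(&mut hash, system_prompt);
    for schema in sorted {
        feed(&mut hash, schema);
    }
    format!("{:016x}", hash)
}
```

The function only reads its inputs and returns a key, which fits the requirement that cache support does not mutate provider payloads; attaching the key to a request would remain an adapter concern.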

## P2: Summary Contract

Add optional summary interfaces:

```rust
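/// Host-supplied summarizer. Core only defines the contract; the host picks
/// the LLM provider (or a deterministic fallback) behind it.
pub trait SummaryProvider {
    /// Returns a structured summary, or an error the failure policy can act on.
    fn summarize(&self, request: SummaryRequest)
        -> Result<StructuredSummary, SummaryError>;
}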
```

Core support:

- summary prompt construction
- strict structured summary schema
- summary prefix and end marker
- metadata tagging so generated summaries are not treated as real user turns
- deterministic fallback summary from local anchors
- summary failure policy

Acceptance:

- Summary text cannot be mistaken for a fresh active user request.
- Auth and network summary failures preserve original messages.
- Deterministic fallback says it is local and may be incomplete.
- Repeated compactions update previous summaries instead of nesting them.
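
To make the fallback and failure-policy items concrete, here is a minimal sketch of a deterministic provider behind the `SummaryProvider` trait. The trait is repeated so the example compiles on its own; the field layouts of `SummaryRequest`, `StructuredSummary`, and `SummaryError`, the marker strings, `LocalFallbackSummary`, `CompactionOutcome`, and `apply_failure_policy` are all assumptions, since the plan names the types but does not define them.

```rust
/// Assumed request shape: local anchors plus a count of compacted messages.
#[derive(Debug, Clone)]
pub struct SummaryRequest {
    pub latest_user_anchor: Option<String>,
    pub latest_assistant_anchor: Option<String>,
    pub dropped_message_count: usize,
}

/// Assumed summary shape; the flag is metadata, not part of the prompt text.
#[derive(Debug, Clone, PartialEq)]
pub struct StructuredSummary {
    pub text: String,
    pub is_local_fallback: bool,
}

/// Assumed error shape, limited to the two failures the plan names.
#[derive(Debug)]
pub enum SummaryError {
    Auth,
    Network,
}

pub trait SummaryProvider {
    fn summarize(&self, request: SummaryRequest)
        -> Result<StructuredSummary, SummaryError>;
}

// Hypothetical markers that frame a summary so it is not read as a user turn.
pub const SUMMARY_PREFIX: &str = "[CONVERSATION SUMMARY, not a user request]";
pub const SUMMARY_END: &str = "[END SUMMARY]";

/// Deterministic fallback: no LLM call, built only from local anchors.
pub struct LocalFallbackSummary;

impl SummaryProvider for LocalFallbackSummary {
    fn summarize(&self, request: SummaryRequest)
        -> Result<StructuredSummary, SummaryError> {
        let user = request.latest_user_anchor.as_deref().unwrap_or("(none)");
        let assistant = request.latest_assistant_anchor.as_deref().unwrap_or("(none)");
        let text = format!(
            "{}\nLocal deterministic fallback; may be incomplete.\n\
             Compacted messages: {}\nLatest user request: {}\n\
             Latest assistant reply: {}\n{}",
            SUMMARY_PREFIX, request.dropped_message_count, user, assistant, SUMMARY_END
        );
        Ok(StructuredSummary { text, is_local_fallback: true })
    }
}

pub enum CompactionOutcome {
    Summarized(StructuredSummary),
    /// The host should leave the original messages untouched.
    KeptOriginal,
}

/// Auth and network failures preserve the original messages.
pub fn apply_failure_policy(
    result: Result<StructuredSummary, SummaryError>,
) -> CompactionOutcome {
    match result {
        Ok(summary) => CompactionOutcome::Summarized(summary),
        Err(SummaryError::Auth) | Err(SummaryError::Network) => CompactionOutcome::KeptOriginal,
    }
}
```

A host might try its LLM-backed provider first, pass the result through `apply_failure_policy`, and only then decide whether to fall back to `LocalFallbackSummary` or skip compaction for this turn.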

## P2: Context Breakdown

Add host-facing usage breakdown structs:

- system prompt
- tool definitions
- rules/context files
- skills
- MCP tools
- subagents
- memory
- conversation

Acceptance:

- Hosts can show why compression triggered without logging raw prompt text.
- Measured provider usage can override rough estimates in reports.
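
A breakdown shape along these lines would let a host report per-category token counts without carrying any prompt text. The struct and method names below (`ContextBreakdown`, `ContextUsageReport`, `estimated_total`, `reported_total`, `largest_category`) are hypothetical; only the eight categories come from the list above.

```rust
/// Estimated token counts per category. Holds numbers only, never raw text.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct ContextBreakdown {
    pub system_prompt: usize,
    pub tool_definitions: usize,
    pub rules_and_context_files: usize,
    pub skills: usize,
    pub mcp_tools: usize,
    pub subagents: usize,
    pub memory: usize,
    pub conversation: usize,
}

impl ContextBreakdown {
    fn categories(&self) -> [(&'static str, usize); 8] {
        [
            ("system_prompt", self.system_prompt),
            ("tool_definitions", self.tool_definitions),
            ("rules_and_context_files", self.rules_and_context_files),
            ("skills", self.skills),
            ("mcp_tools", self.mcp_tools),
            ("subagents", self.subagents),
            ("memory", self.memory),
            ("conversation", self.conversation),
        ]
    }

    /// Sum of the rough per-category estimates.
    pub fn estimated_total(&self) -> usize {
        self.categories().iter().map(|(_, tokens)| *tokens).sum()
    }
}

/// Report handed to the host; `measured_total` is filled in by the host
/// when the provider returns real usage numbers.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct ContextUsageReport {
    pub breakdown: ContextBreakdown,
    pub measured_total: Option<usize>,
}

impl ContextUsageReport {
    /// Measured provider usage overrides the rough estimate when present.
    pub fn reported_total(&self) -> usize {
        self.measured_total
            .unwrap_or_else(|| self.breakdown.estimated_total())
    }

    /// The category with the most estimated tokens, as a hint for why
    /// compression triggered.
    pub fn largest_category(&self) -> (&'static str, usize) {
        self.breakdown
            .categories()
            .iter()
            .copied()
            .max_by_key(|(_, tokens)| *tokens)
            .unwrap_or(("none", 0))
    }
}
```

Keeping `measured_total` separate from the per-category estimates means the host can show both numbers side by side when they disagree.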

## What Not To Add

- Mandatory LLM summarization in core.
- Provider SDKs in TinyJuice.
- Session database writes.
- Compression leases implemented against OpenHuman storage inside core.
- Prompt cache mutation mixed into lossy compression APIs.