tinyagents 0.1.0

# Harness Cache Feature

Caching avoids repeated model, prompt, summary, and artifact work when policy
allows it.

## Responsibilities

- Build stable cache keys.
- Cache prompt rendering.
- Cache embeddings where safe.
- Cache model responses where safe.
- Cache summaries.
- Cache tool artifacts.
- Record cache hits and misses.
- Feed cached token counts into usage and cost accounting.
- Distinguish local response caching from provider prompt caching.
- Preserve provider prompt/KV-cache stability through explicit prompt segment
  boundaries.
- Track which middleware invalidated or preserved provider prompt-cache
  prefixes.
- Emit cache events with key fingerprints rather than full sensitive payloads.
- Support in-memory, store-backed, and provider-specific cache metadata.

## Source Inspiration

LangChain core has a beta local model cache with `lookup`, `update`, and async
variants:

- <https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/caches.py>

Provider prompt caching is different. It usually affects provider billing and
usage metadata, not whether RustAgents skips the provider call entirely.

## Provider Prompt And KV Cache

Provider prompt caching, prefix caching, and KV-cache reuse are first-class
targets. The harness must make it hard to accidentally invalidate a large stable
prefix by inserting volatile context near the front of a request.

Prompt assembly should support explicit segments:

```rust
pub struct PromptSegment {
    pub id: PromptSegmentId,
    pub cache_role: CacheRole,
    pub content: Vec<Message>,
    pub fingerprint: PromptFingerprint,
}

pub enum CacheRole {
    StablePrefix,
    StableButProviderSpecific,
    VolatileTail,
    NeverCache,
}
```

Stable prefix segments are for content that should remain byte/token stable
across many turns:

- system prompts
- policy and safety text
- reusable developer instructions
- tool declarations and schemas
- structured output schemas
- long-lived examples
- durable project or tenant context

Volatile tail segments are for content likely to change every turn:

- latest user message
- current retrieved documents
- timestamps and run ids
- tool results
- scratchpads and temporary reasoning traces
- per-run configurable metadata

Middleware that edits prompts must report whether it changed the stable prefix
or only the volatile tail. This lets tests, traces, and cost accounting explain
why provider prompt-cache hits were preserved or lost.

## KV-Cache-Safe Layout Rules

Request builders and middleware should follow these rules:

- never insert timestamps, run ids, random ids, or dynamic retrieval output into
  a stable prefix by default
- append volatile context after stable instructions and schemas
- keep stable tool/schema serialization canonical and deterministic
- preserve segment ordering unless a middleware explicitly declares a cache
  layout migration
- fingerprint prompt segments separately from the full request
- include middleware policy fingerprints when a middleware can affect
  model-visible bytes
- emit `cache.layout_preserved`, `cache.layout_changed`, and
  `cache.prefix_invalidated` events for observability

Regression tests should be able to assert that a prompt edit preserves the
stable prefix fingerprint even if the full request changes.

## Cache Policy

```rust
pub struct CachePolicy {
    pub enabled: bool,
    pub ttl: Option<Duration>,
    pub scope: CacheScope,
    pub include_tools: bool,
    pub include_model_responses: bool,
    pub preserve_provider_prefix: bool,
    pub stable_prefix_min_tokens: Option<usize>,
}
```

Cache keys must include every behavior-affecting input: model, messages, tools,
tool schemas, response format, provider options, and relevant metadata. Unsafe
or side-effecting tool calls should not be cached by default.

## Cache Key Inputs

Model response cache keys should include:

- provider and model id
- canonical serialized messages
- content block order and ids when ids affect behavior
- tool declarations and schemas
- tool choice
- response format
- normalized model settings
- provider options
- relevant metadata/configurable values
- prompt template version
- prompt segment ids and segment fingerprints
- provider prompt-cache options
- middleware version or policy fingerprint when middleware changes requests

Cache keys should store fingerprints, not raw prompts, where the backing store
may be inspected by humans or external systems.

Embedding cache keys should include provider, model, input text, document/query
mode, requested dimensions, preprocessing version, and provider options. A query
embedding and document embedding for the same text must not share a key unless
the provider adapter explicitly declares they are equivalent.

## Cache Decisions

Every lookup should produce a decision:

- disabled by policy
- skipped because request is unsafe
- miss
- hit
- stale
- provider prefix preserved
- provider prefix invalidated
- write skipped
- write completed

The usage feature should record provider prompt-cache hits separately from local
response-cache hits.