hh-cli 0.1.2

A terminal-based coding agent runtime
Documentation
# Protocol Normalization Plan

## Background

We observed leaked `</think>` tokens in assistant-visible output.
Current behavior in `src/provider/openai_compatible.rs` already consumes native reasoning fields (`reasoning`, `thinking`, `reasoning_content`) into the thinking channel, but assistant `content` is still forwarded verbatim.
If upstream providers leak reasoning delimiters in `content`, those delimiters are rendered and persisted.

This indicates a protocol contract gap, not just a rendering bug.

## Rationale

### Why this is not a UI bug
- The leak is present in persisted session events (`message` content), not only in display.
- TUI and CLI renderers correctly display what they receive.
- Fixing only UI would leave corrupted session history, replay output, and downstream behavior.

### Why this is not solved by native reasoning fields alone
- Native reasoning fields reduce dependence on tags, but do not guarantee `content` cleanliness.
- Some providers emit mixed payloads:
  - reasoning in structured fields
  - plus tagged or stray delimiters in `content`
- Without normalization, assistant channel remains vulnerable.

### Architectural principle
Normalize at the provider boundary so the runtime can rely on a stable invariant:
- `assistant.content` is user-visible answer text only.
- `thinking` contains reasoning text only.
- Reasoning delimiters do not cross into assistant content.

This keeps core loop, persistence, and UI provider-agnostic and consistent.

## Goals

1. Enforce channel separation invariants in provider adapter.
2. Prevent reasoning delimiter leakage into assistant content.
3. Support both streaming and non-streaming responses.
4. Handle chunk-split tag boundaries robustly.
5. Preserve compatibility with providers that only use native reasoning fields.
6. Avoid duplicate reasoning emission when both native fields and tags are present.

## Non-goals

- Building a full XML parser.
- Reformatting or post-processing natural assistant content beyond reasoning-delimiter normalization.
- Provider-specific UI behaviors.

## Invariants to Enforce

1. Assistant channel (`AssistantDelta` / `assistant_message.content`) contains no control delimiters: `<think>`, `</think>`, `<thinking>`, `</thinking>`.
2. Thinking channel aggregates:
   - native reasoning fields (`reasoning`, `thinking`, `reasoning_content`)
   - optional extracted tag-body reasoning from `content` when configured/needed.
3. Delimiter fragments split across stream chunks must not leak.
4. Session persistence receives already-normalized data.

## Normalization Policy

### Tag set
Initial supported tag pairs:
- `<think>` ... `</think>`
- `<thinking>` ... `</thinking>`

### Precedence / dedupe policy
- Native reasoning fields are authoritative.
- If native reasoning appears in a chunk/response and a tagged reasoning block is also found in content:
  - prefer native reasoning for thinking channel;
  - strip tagged delimiters from assistant channel;
  - avoid double-appending equivalent tagged reasoning text unless we explicitly choose merge mode.
- If no native reasoning exists, extract tag-body text into thinking channel.

### Stray delimiter handling
- Remove standalone opening/closing reasoning tags from assistant channel.
- Do not remove arbitrary angle-bracket text that is not one of the supported reasoning tags.

## Detailed Implementation Plan

### 1) Add protocol normalizer module (provider layer)
Create a focused internal normalizer in `src/provider/openai_compatible.rs` or a sibling module (e.g., `src/provider/reasoning_normalizer.rs`) with:

- A small state struct for streaming:
  - `inside_reasoning_block: bool`
  - `carry: String` (for partial token boundaries)
- Pure function for non-streaming whole-string normalization.
- Streaming function:
  - input: raw assistant `delta.content`, boolean/native reasoning presence
  - output:
    - `assistant_visible_delta`
    - `thinking_extracted_delta`
  - consumes split tokens safely.

### 2) Integrate into streaming path
In `apply_stream_chunk` (`src/provider/openai_compatible.rs`):
- Process native thinking fields first, track `native_reasoning_seen`.
- Pass `delta.content` through streaming normalizer.
- Emit:
  - `AssistantDelta(normalized_visible)`
  - `ThinkingDelta(extracted_reasoning)` only when policy permits and non-empty.
- Ensure the aggregate `assistant` and `thinking` buffers use normalized values, not raw values.

### 3) Integrate into non-streaming path
In `parse_chat_response`:
- Normalize `message.content` before assigning `assistant_message.content`.
- Merge extracted tagged reasoning into `thinking` only when policy allows and native reasoning absence/dedupe conditions are met.

### 4) Keep agent loop unchanged
`src/core/agent/mod.rs` should remain unchanged. It already handles distinct channels correctly once provider output is normalized.

### 5) Add tests (provider-level)
Add focused tests in provider test files (new or existing):

1. **Trailing close tag**
   - input content: `"...done</think>"`
   - expected assistant: `"...done"`
   - expected thinking: unchanged/empty unless extractable block exists.

2. **Inline tagged block**
   - input content: `"<think>hidden steps</think>Final answer"`
   - expected assistant: `"Final answer"`
   - expected thinking: `"hidden steps"` (if no native reasoning).

3. **Streaming split delimiter**
   - chunks: `"</thi"` + `"nk>"` (or split open/close tags)
   - expected: no delimiter leakage.

4. **Native + tagged mixed**
   - native reasoning present and content includes `<think>...</think>`
   - expected: assistant stripped, thinking deduped per policy.

5. **Non-reasoning angle bracket text**
   - content with `<tag>` not in supported tag set
   - expected: preserved.

### 6) Optional telemetry/debug signal
Add debug-only counters/logs (if project conventions allow) for:
- stripped delimiter count
- extracted tagged reasoning count
This helps validate behavior against real provider streams.

## Rollout / Safety Strategy

1. Implement behind deterministic default behavior (no runtime flag required initially).
2. Run:
   - `cargo check`
   - `cargo test`
   - targeted provider tests
3. Validate with a captured problematic session replay and confirm no new `</think>` persistence.
4. If risk concerns remain, gate extraction-vs-strip behavior behind a config toggle after baseline fix.

## Risks and Mitigations

- **Risk:** Over-stripping legitimate user text containing `<think>` literals.
  - **Mitigation:** strip only exact supported reasoning tags; preserve unknown tags.
- **Risk:** Duplicate reasoning when native and tagged reasoning coexist.
  - **Mitigation:** native-first policy + dedupe guard.
- **Risk:** Streaming boundary bugs.
  - **Mitigation:** explicit state machine tests for split chunks and EOF flush behavior.

## Acceptance Criteria

1. No `</think>` or `<think>` tokens appear in assistant-visible output for known leak cases.
2. Existing native reasoning behavior remains functional.
3. Session message events store normalized assistant content.
4. Provider tests cover streaming and non-streaming normalization cases.
5. No regressions in existing TUI thinking rendering tests.

## Open Decisions (to confirm before implementation)

1. Should extracted tag-body reasoning be appended when native reasoning also exists (`merge`) or dropped (`native-only`)?
   - Recommended default: `native-only` to avoid duplication.
2. Should `<thinking>` tags be supported immediately alongside `<think>`?
   - Recommended default: yes, both.
3. Should we add a user-facing config switch now or only if needed?
   - Recommended default: no switch initially; keep policy internal and deterministic.