# J-CLI Compaction System - Detailed Analysis
## Overview: Why Compaction?
In long conversations, the message history accumulates tokens. Three-layer compaction prevents context window overflow:
1. **Micro Compact**: In-memory placeholder replacement (free)
2. **Auto Compact**: LLM summarization (costs tokens, saves more)
3. **Explicit Compact**: User/tool-triggered (manual control)
## Layer 1: Micro Compact (In-Memory)
### Location
`src/command/chat/compact.rs`, lines 61-127
Called: Every agent loop iteration (line 47 in `agent.rs`)
### Algorithm
```rust
fn micro_compact(messages: &mut [ChatMessage], keep_recent: usize)
```
### What It Does
1. **Build tool name mapping** (lines 66-76)
- Scans all assistant messages for `tool_calls`
- Creates `tool_call_id → tool_name` map
- Example: `"call_123" → "Bash"`
2. **Find tool result messages** (lines 78-84)
- Collects indices of all `role="tool"` messages
- These contain results from previous tool executions
3. **Determine compaction targets** (lines 86-91)
- If fewer than `keep_recent` tool messages: do nothing
- Otherwise: compact messages older than the last `keep_recent`
- Default `keep_recent = 10`
4. **Replace with placeholders** (lines 102-116)
- For each old tool result message:
- Check content size: if > 800 bytes
- Skip if tool is exempt (see below)
- Replace content with: `[Previous: used {tool_name}]`
- Log count of compacted messages
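The steps above can be sketched as a self-contained simplification. The `Msg` struct and literal tool-name strings here are stand-ins for j-cli's actual `ChatMessage` type and `*Tool::NAME` constants:

```rust
// Minimal sketch of the micro-compact pass; the real ChatMessage type
// carries tool_call_id mappings and more fields than this.
#[derive(Clone, Debug)]
struct Msg {
    role: String,      // "assistant", "tool", "user"
    tool_name: String, // resolved via the tool_call_id -> tool_name map
    content: String,
}

const EXEMPT_TOOLS: &[&str] = &["LoadSkill", "Task", "TodoWrite", "TodoRead", "Ask"];
const SIZE_THRESHOLD: usize = 800; // bytes

fn micro_compact(messages: &mut [Msg], keep_recent: usize) -> usize {
    // Step 2: collect indices of all tool-result messages.
    let tool_indices: Vec<usize> = messages
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == "tool")
        .map(|(i, _)| i)
        .collect();

    // Step 3: nothing to do unless there are more than keep_recent results.
    if tool_indices.len() <= keep_recent {
        return 0;
    }
    let cutoff = tool_indices.len() - keep_recent;

    // Step 4: replace old, large, non-exempt results with a placeholder.
    let mut compacted = 0;
    for &i in &tool_indices[..cutoff] {
        let m = &mut messages[i];
        if m.content.len() > SIZE_THRESHOLD && !EXEMPT_TOOLS.contains(&m.tool_name.as_str()) {
            m.content = format!("[Previous: used {}]", m.tool_name);
            compacted += 1;
        }
    }
    compacted
}
```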
### Exempt Tools
Tools NOT compacted even if large (lines 94-100):
```rust
const EXEMPT_TOOLS: &[&str] = &[
    LoadSkillTool::NAME, // Skills carry workflow instructions
    TaskTool::NAME,      // Task definitions important for tracking
    TodoWriteTool::NAME, // Todo state management
    TodoReadTool::NAME,  // Todo state management
    AskTool::NAME,       // Ask dialogs may be needed context
];
```
**Why exempt?** These tools deliver state/instructions, not just informational results.
### Example
Before micro_compact:
```
Message 1: role=assistant, content="I'll search for that file"
Message 2: role=tool (Bash), tool_call_id=call_456
           content="... 1500 bytes of file listing ..."
Message 3: role=assistant, content="Found 3 matches, let me read them"
Message 4: role=tool (Read), tool_call_id=call_789
           content="... 2000 bytes of file content ..."
Message 5: role=assistant, content="I'm loading the skill"
Message 6: role=tool (LoadSkill), tool_call_id=call_abc
           content="... 3000 bytes of skill definition ... [EXEMPT]"
```
After micro_compact (keep_recent=1; Message 6 is the one recent tool result kept, and LoadSkill is exempt in any case):
```
Message 1: role=assistant, content="I'll search for that file"
Message 2: role=tool (Bash), tool_call_id=call_456
           content="[Previous: used Bash]" ← Compacted!
Message 3: role=assistant, content="Found 3 matches, let me read them"
Message 4: role=tool (Read), tool_call_id=call_789
           content="[Previous: used Read]" ← Compacted!
Message 5: role=assistant, content="I'm loading the skill"
Message 6: role=tool (LoadSkill), tool_call_id=call_abc
           content="... 3000 bytes of skill definition ... [KEPT]"
```
### Performance
- **Cost**: O(n) scan, very fast
- **Tokens saved**: Highly variable (depends on tool result sizes)
- **Data loss**: Minimal (just tool results, not logic)
- **Frequency**: Every agent loop iteration
---
## Layer 2: Auto Compact (LLM Summarization)
### Location
`src/command/chat/compact.rs`, lines 174-246
Called: When token threshold exceeded (lines 48-56 in `agent.rs`)
### Trigger Condition
```rust
if compact::estimate_tokens(&messages) > compact_config.token_threshold {
auto_compact(&mut messages, &provider).await
}
```
Default threshold: **204,800 tokens** (256 * 800)
Rough estimation: `json_length / 4` (4 chars ≈ 1 token)
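The trigger check can be sketched as follows, assuming the estimate is taken over the already-serialized JSON string:

```rust
/// Rough token estimate: ~4 characters per token over the serialized history.
fn estimate_tokens(serialized_json: &str) -> usize {
    serialized_json.len() / 4
}

/// Default threshold from the config: 256 * 800 = 204,800 tokens.
const DEFAULT_TOKEN_THRESHOLD: usize = 256 * 800;

/// True when the estimated token count exceeds the threshold,
/// i.e. when the agent loop should run auto_compact().
fn needs_auto_compact(serialized_json: &str, threshold: usize) -> bool {
    estimate_tokens(serialized_json) > threshold
}
```

This heuristic overestimates for dense prose and underestimates for JSON punctuation, but it only has to be accurate enough to fire well before the real context limit.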
### Process
#### Step 1: Save Full Transcript (lines 168-178)
```rust
fn save_transcript(messages: &[ChatMessage]) -> Option<String>
```
- Creates `.transcripts/` directory in agent data
- Saves all messages as JSONL (one message per line)
- Filename: `transcript_{unix_timestamp}.jsonl`
- **Purpose**: Preserve complete history for debugging/recovery
- **Logged**: Path to transcript file
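A minimal sketch of this step, using only the standard library and assuming messages arrive pre-serialized as JSON strings (`dir` stands in for the agent-data transcripts directory):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Sketch of the transcript dump: one JSON line per message (JSONL),
/// named by unix timestamp, in a directory created on demand.
fn save_transcript(dir: &Path, json_lines: &[String]) -> std::io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let ts = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    let path = dir.join(format!("transcript_{}.jsonl", ts));
    let mut f = File::create(&path)?;
    for line in json_lines {
        writeln!(f, "{}", line)?;
    }
    Ok(path)
}
```

JSONL keeps the dump streamable and greppable: each line is an independent JSON document, so a partial write still leaves every completed line recoverable.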
#### Step 2: Truncate Conversation (lines 182-184)
```rust
let conversation_text = serde_json::to_string(messages).unwrap_or_default();
let truncated: String = conversation_text.chars().take(80000).collect();
```
- Serialize all messages to JSON
- Keep first 80,000 characters
- Prevents sending huge prompts to LLM
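One detail worth calling out: `chars().take(n)` cuts on character boundaries, so multi-byte UTF-8 in the serialized history is never split mid-sequence, whereas byte slicing with `&s[..n]` can panic at such a boundary. A small demonstration:

```rust
/// Keep at most `n` characters, never splitting a multi-byte sequence.
fn truncate_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
```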
#### Step 3: Build Summary Request (lines 186-192)
```rust
let summary_prompt = format!(
"Summarize this conversation for continuity. Include: \
1) What was accomplished, 2) Current state, 3) Key decisions made. \
4) If a skill/workflow was actively being followed, preserve its key steps and current progress so the model can continue following it. \
Be concise but preserve critical details.\n\n{}",
truncated
);
```
**Key instruction**: "If a skill/workflow was actively being followed, preserve its key steps and current progress"
This is **critical** for workflow preservation!
#### Step 4: Call LLM (lines 194-212)
- Uses a **non-streaming** request (easier for text extraction)
- Single-turn: no tools, just summarization
- `max_tokens=20000` (allow large summaries)
- API: OpenAI (same provider as main agent)
#### Step 5: Extract Summary (lines 214-218)
```rust
let summary = response
.choices
.first()
.and_then(|c| c.message.content.clone())
.unwrap_or_else(|| "(empty summary)".to_string());
```
#### Step 6: Replace Message History (lines 225-243)
```rust
messages.clear();
messages.push(ChatMessage {
role: "user",
content: format!(
"[Conversation compressed. Transcript: {}]\n\n{}",
transcript_path, summary
),
...
});
messages.push(ChatMessage {
role: "assistant",
content: "Understood. I have the context from the summary. Continuing.",
...
});
```
**Result:**
- Only 2 messages remain: one user (summary), one assistant (ack)
- Transcript location provided for reference
- Model acknowledges context is loaded
### Performance & Cost
- **Token cost**: ~80K chars of input ≈ 20K tokens, plus up to 20K tokens of summary output (dollar cost depends on model pricing)
- **Token savings**: Typically 100K-200K+ tokens freed
- **ROI**: Usually positive, saves more than it costs
- **Timing**: Happens automatically when needed
### Graceful Degradation
```rust
if let Err(e) = compact::auto_compact(&mut messages, &provider).await {
write_error_log("agent_loop", &format!("auto_compact failed: {}", e));
// Continue with original messages! Don't crash.
}
```
- If LLM call fails: error logged, conversation continues
- Original messages unchanged
- Provides robustness against API issues
---
## Layer 3: Explicit Compact Tool
### Location
`src/command/chat/tools/compact.rs`, lines 1-45
### Tool Definition
```rust
pub struct CompactTool;

impl Tool for CompactTool {
    fn name(&self) -> &str { "Compact" }

    fn description(&self) -> &str {
        "Trigger conversation compression to free up context window. \
         Use this when the conversation is getting long and you want to \
         summarize and compress the history to continue working efficiently."
    }

    fn parameters_schema(&self) -> Value {
        json!({
            "type": "object",
            "properties": {
                "focus": {
                    "type": "string",
                    "description": "What to preserve in the summary (optional)"
                }
            }
        })
    }
}
```
### How It Works
1. **Tool call**: Agent/user can call `Compact` tool
2. **Optional parameter**: `focus` string (not currently consumed during summarization; reserved for future use)
3. **Execution**: Returns "Compression requested" (lines 34-40)
4. **Agent detection**: Main loop checks for `CompactTool` call (line 606 in `agent.rs`)
5. **Triggering**: If detected, calls `auto_compact()` (lines 318-319, 451-452, etc.)
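The detection step can be sketched as follows; the `ToolCall` struct here is an assumption standing in for the actual type in `agent.rs`:

```rust
#[derive(Debug)]
struct ToolCall {
    id: String,   // e.g. "call_123"
    name: String, // tool name, e.g. "Bash" or "Compact"
}

/// Returns true if any tool call in the assistant response asked for
/// the Compact tool, signalling the main loop to run auto_compact().
fn compact_requested(tool_calls: &[ToolCall]) -> bool {
    tool_calls.iter().any(|tc| tc.name == "Compact")
}
```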
### When to Use
User/Agent should use when:
- Conversation feels long
- Want to resume work in new session
- Running long multi-step workflows
- Need to clear mental state before next phase
### Workflow Preservation
The `focus` parameter is **reserved for future** workflow tracking:
```json
"focus": {
  "type": "string",
  "description": "What to preserve in the summary (optional)"
}
```
Could be used like: `{"focus": "Current status of feature X, steps completed so far"}`
---
## Complete Compaction Flow
```
Agent Loop Iteration N
│
├─ Step 1: Drain pending user messages
│
├─ Step 2: Run micro_compact()
│ ├─ Scan tool messages
│ ├─ Keep recent 10
│ ├─ Replace old large results with placeholders
│ └─ Log count compacted
│
├─ Step 3: Check token count
│ ├─ estimate_tokens() = json_length / 4
│ └─ Compare to threshold (default 204,800)
│
├─ Step 4: If threshold exceeded
│ ├─ save_transcript() to ~/.jdata/transcripts/
│ ├─ Truncate conversation to 80K chars
│ ├─ Build summary prompt (with workflow preservation instruction)
│ ├─ Call LLM non-streaming
│ ├─ Extract summary
│ ├─ Replace messages with [summary] + [ack]
│ └─ Log completion
│
├─ Step 5: Build LLM request
│ ├─ Add system prompt
│ ├─ Add current messages
│ ├─ Add tools
│ └─ Send to API
│
├─ Step 6: Process response
│ ├─ If tool_calls
│ │ ├─ Check if CompactTool called
│ │ ├─ If yes, trigger auto_compact() (Layer 3)
│ │ └─ Execute tools
│ └─ Else (text response)
│ └─ Add to messages
│
└─ Loop continues if pending user messages or tool calls remain
```
---
## Important Design Points
### 1. Workflow Preservation Priority
The compaction prompt explicitly instructs the LLM:
> "If a skill/workflow was actively being followed, preserve its key steps and current progress so the model can continue following it."
This ensures that:
- Skill/tool sequences are maintained
- Multi-step workflows can resume
- Current progress is not lost
- Instructions for "what's next" are preserved
### 2. Exempt Tools Philosophy
Exempt tools carry **state and instructions**, not just results:
- `LoadSkill` → workflow instructions
- `Task` → task definitions and tracking
- `TodoWrite/TodoRead` → state management
- `Ask` → dialog context
These are kept because they're instructions for the next iteration.
### 3. Graceful Degradation
All compaction is **non-blocking**:
- Network error in `auto_compact`? → Log and continue
- `micro_compact` always succeeds (no I/O)
- The explicit `Compact` tool just signals intent
System doesn't crash or hang if compression fails.
### 4. Session Continuity
After auto_compact:
```
Message history is: [user_msg(summary), assistant_msg(ack)]
Agent continues from here with new messages
Next tool results will be added fresh
Micro_compact will protect them
```
Model can seamlessly continue from the summary.
---
## Configuration
### CompactConfig (in config.yaml)
```yaml
compact:
  enabled: true            # Enable/disable compaction
  token_threshold: 204800  # Auto-compact trigger (default: 256 * 800)
  keep_recent: 10          # Micro-compact: keep this many recent tool results
```
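These settings might map onto a config struct like the following sketch; field names mirror the keys above, though the actual `CompactConfig` in j-cli may differ:

```rust
/// Sketch of the compaction settings with the documented defaults.
#[derive(Debug, Clone)]
struct CompactConfig {
    enabled: bool,
    token_threshold: usize,
    keep_recent: usize,
}

impl Default for CompactConfig {
    fn default() -> Self {
        CompactConfig {
            enabled: true,
            token_threshold: 256 * 800, // 204,800 tokens
            keep_recent: 10,
        }
    }
}
```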
### Debug: Check Compaction
To see if compaction happened:
```bash
# Check logs
tail -f ~/.jdata/logs/agent_loop.log
# Look for:
# - "micro_compact triggered"
# - "auto_compact triggered"
# - "Transcript saved: ..."
```
### Manual Compaction
User can call during chat:
```
/Compact
Parameters (optional):
- focus: "What to preserve"
```
Or agent can detect it's getting long and call it autonomously.