tldr-core 0.1.6

# Premortem Analysis - Pass 2/3
## Semantic Search Module for tldr-rs

**Date:** 2026-02-03  
**Pass Focus:** User Experience, Accuracy, Edge Cases, Compatibility, Concurrency  
**Assumption:** The project has FAILED. What went wrong?

---

## 1. User Experience Failures

### 1.1 Confusing Score Interpretation

**Failure Scenario:** Users see scores like `0.52` and don't know if that's good or bad. They filter with `--threshold 0.8` expecting "good matches" but get zero results, then lower to `0.3` and get garbage.

**Likelihood:** HIGH  
**Impact:** MAJOR

**Mitigation:**
```rust
/// In SemanticSearchReport, add interpretive guidance
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SemanticSearchReport {
    // ... existing fields ...
    
    /// Human-readable score interpretation guide
    pub score_guide: ScoreGuide,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScoreGuide {
    /// Suggested threshold for "strong match"
    pub strong_threshold: f64,  // 0.75+
    /// Suggested threshold for "relevant"
    pub relevant_threshold: f64,  // 0.55+
    /// Warning if all scores are low
    pub low_score_warning: Option<String>,
}

// In text output, show:
// "Results (8 matches):
//  Score guide: >=0.75 = strong match, >=0.55 = relevant, <0.55 = weak
// 
//  1. src/config.rs:parse_config (score: 0.89) [STRONG]
//  2. src/loader.rs:load_config (score: 0.62) [RELEVANT]
//  3. src/util.rs:init (score: 0.41) [WEAK]"
```

**Add to spec Section 4.1:** Include `score_guide` in output and text formatting showing score interpretation badges.

---

### 1.2 Silent Truncation of Long Functions

**Failure Scenario:** A 500-line function exceeds the 512 token context limit. The embedding only represents the first ~80 lines. User searches for logic in line 400, gets no match, thinks the tool is broken.

**Likelihood:** HIGH  
**Impact:** CRITICAL

**Mitigation:**
```rust
/// Track truncation in chunk metadata
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CodeChunk {
    // ... existing fields ...
    
    /// True if content was truncated to fit model context
    pub truncated: bool,
    
    /// Original character count before truncation
    pub original_length: usize,
}

/// In output, warn about truncated functions
impl SemanticSearchReport {
    pub fn truncation_warnings(&self) -> Vec<String> {
        // Return warnings for any truncated results
    }
}

// Text output should show:
// "WARNING: 3 functions were truncated (>512 tokens). 
//  Consider using --model m-long for better coverage:
//  - src/big_module.rs:process_all (truncated from 15KB)
//  - ..."
```

**Add to spec Section 2.1:** Add `truncated: bool` and `original_length: usize` to CodeChunk. Add truncation warnings to report output.

---

### 1.3 Opaque Model Download Experience

**Failure Scenario:** First-time user runs `tldr semantic "error handling"` and the terminal hangs for 30+ seconds with no output while the 110MB model downloads. User thinks it's frozen and Ctrl+C's.

**Likelihood:** HIGH  
**Impact:** MAJOR

**Mitigation:**
```rust
/// In embedder.rs, always show download progress
impl Embedder {
    pub fn new(model: EmbeddingModel) -> TldrResult<Self> {
        // Check if model exists locally first
        if !model.is_cached() {
            eprintln!(
                "Downloading {} embedding model (~{}MB)...",
                model.name(),
                model.size_mb()
            );
            eprintln!("This is a one-time download. Future runs will be instant.");
        }
        
        // Use fastembed's progress callback if available
        // ...
    }
}

// Add --offline flag to fail fast without download
#[arg(long)]
pub offline: bool,
```

**Add to spec Section 6.3:** Require progress message before any model download. Add `--offline` flag to all semantic commands.

---

### 1.4 Meaningless Results for Non-Code Queries

**Failure Scenario:** User searches `"fix the bug"` or `"TODO"` - gets results with scores ~0.4-0.5 that look plausible but are essentially random. User doesn't realize semantic search works best for conceptual queries.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Detect low-quality queries and warn
fn analyze_query_quality(query: &str) -> QueryQuality {
    let words: Vec<&str> = query.split_whitespace().collect();
    
    // Flag queries that are too short
    if words.len() < 2 {
        return QueryQuality::TooShort;
    }
    
    // Flag queries that are just keywords (should use grep instead)
    let keyword_patterns = ["TODO", "FIXME", "BUG", "error"];
    if words.iter().all(|w| keyword_patterns.contains(w) || w.len() < 3) {
        return QueryQuality::UseGrepInstead;
    }
    
    QueryQuality::Good
}

// In output:
// "HINT: Your query 'TODO' may work better with `tldr search TODO` (keyword search)
//  Semantic search excels at conceptual queries like 'handle authentication errors'"
```

**Add to spec Section 4.1:** Add query quality analysis with hints suggesting keyword search when appropriate.

---

### 1.5 No Indication of Index Staleness

**Failure Scenario:** User adds new functions to codebase, runs semantic search, wonders why new code isn't found. Cache has old embeddings, user doesn't know to use `--no-cache`.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// In SemanticSearchReport, add cache freshness info
pub struct SemanticSearchReport {
    // ... existing fields ...
    
    /// Files that were re-indexed (changed since cache)
    pub files_reindexed: usize,
    
    /// Files loaded from cache
    pub files_from_cache: usize,
    
    /// Oldest cache entry age (if using cache)
    pub oldest_cache_age_hours: Option<u64>,
}

// In text output:
// "Index: 150 chunks (142 cached, 8 re-indexed)
//  Cache age: oldest entry is 72 hours old
//  TIP: Use --no-cache to force full re-index"
```

**Add to spec Section 4.1:** Report cache statistics including age of oldest entry and count of re-indexed files.

---

## 2. Accuracy Failures

### 2.1 Cross-Language Semantic Mismatch

**Failure Scenario:** User searches `"parse JSON"` in a mixed Python/Rust codebase. The Python `json.loads()` wrapper ranks higher than the Rust `serde_json::from_str()` implementation because the embedding model was trained more on Python.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Support language-aware search boost
#[derive(Debug, Clone)]
pub struct SearchOptions {
    // ... existing fields ...
    
    /// Boost factor for specific language (1.0 = no boost)
    pub language_boost: Option<(Language, f64)>,
}

// Example: --lang-boost rust:1.2
// This multiplies Rust results' scores by 1.2

// In similarity.rs:
fn apply_language_boost(
    results: &mut [SemanticSearchResult],
    boost: Option<(Language, f64)>,
) {
    if let Some((lang, factor)) = boost {
        for r in results.iter_mut() {
            if r.language == lang {
                r.score *= factor;
                r.boosted = true;
            }
        }
        results.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    }
}
```

**Add to spec Section 2.4:** Add optional `language_boost` to SearchOptions. Add `--lang-boost` CLI flag.

---

### 2.2 Docstring-Dominated Embeddings

**Failure Scenario:** Functions with verbose docstrings match queries based on docstring content, not actual implementation. User searches `"sort by timestamp"` and gets functions whose docstrings mention sorting but actually do something else.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Chunk options for docstring handling
pub struct ChunkOptions {
    // ... existing fields ...
    
    /// How to handle docstrings
    pub docstring_mode: DocstringMode,
}

#[derive(Debug, Clone, Copy, Default)]
pub enum DocstringMode {
    /// Include docstrings (default)
    #[default]
    Include,
    /// Exclude docstrings entirely
    Exclude,
    /// Give docstrings lower weight (separate embedding, lower score contribution)
    Downweight,
}

// Alternative: Generate TWO embeddings per function
// - One for code only
// - One for docstring only
// Then combine with configurable weight
```

**Add to spec Section 2.3:** Add `DocstringMode` enum and `docstring_mode` to ChunkOptions. Default to Include but document trade-offs.

---

### 2.3 Variable Name Over-Influence

**Failure Scenario:** User searches `"authentication"` and gets functions that just happen to have a variable named `auth` but do unrelated work (e.g., `let auth = config.get("auth");` in a logging function).

**Likelihood:** MEDIUM  
**Impact:** MINOR

**Mitigation:**
```rust
/// Consider adding code normalization before embedding
fn normalize_code_for_embedding(code: &str, language: Language) -> String {
    // Option 1: Replace variable names with generic placeholders
    // Option 2: Extract only structural patterns (risky, loses semantics)
    // Option 3: Document this limitation and suggest using function-name queries
    
    // For now, document the limitation:
    // "Semantic search matches against full code including variable names.
    //  For precise identifier matching, use `tldr search --pattern 'auth'`"
    code.to_string()
}
```

**Add to spec Section 8.3:** Document that variable names influence embeddings. Suggest keyword search for identifier matching.

---

### 2.4 Boilerplate Code Pollution

**Failure Scenario:** Large codebase has many similar boilerplate functions (constructors, getters, init methods). These dominate search results because they all have similar embeddings and match generic queries.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Filter out boilerplate patterns
pub struct SearchOptions {
    // ... existing fields ...
    
    /// Exclude common boilerplate patterns
    pub exclude_boilerplate: bool,
}

/// Boilerplate detection heuristics
fn is_likely_boilerplate(chunk: &CodeChunk) -> bool {
    let name = chunk.function_name.as_deref().unwrap_or("");
    
    // Common boilerplate function names
    let boilerplate_names = [
        "new", "default", "clone", "from", "into",
        "__init__", "__str__", "__repr__",
        "get_", "set_", "is_", "has_",
        "toString", "hashCode", "equals",
    ];
    
    boilerplate_names.iter().any(|b| name.starts_with(b) || name == *b)
        || chunk.content.lines().count() < 5  // Very short functions
}

// In search, optionally filter:
// "Tip: Use --no-boilerplate to hide 23 constructor/getter results"
```

**Add to spec Section 2.4:** Add `exclude_boilerplate: bool` to SearchOptions with configurable patterns.

---

## 3. Edge Case Failures

### 3.1 Empty Repository / No Code Files

**Failure Scenario:** User runs `tldr semantic` on a directory with only markdown, configs, and no source code. Gets cryptic error or silently returns empty results.

**Likelihood:** MEDIUM  
**Impact:** MINOR

**Mitigation:**
```rust
/// Provide helpful error for no-code scenarios
pub fn build(root: &Path, ...) -> TldrResult<Self> {
    let chunks = chunk_code(root, chunk_options)?;
    
    if chunks.is_empty() {
        // Check WHY it's empty
        let file_count = count_files(root);
        let code_extensions = [".py", ".rs", ".ts", ".go"];
        let has_code_files = find_files_with_extensions(root, &code_extensions).count() > 0;
        
        if file_count == 0 {
            return Err(TldrError::NoChunksFound {
                path: root.to_path_buf(),
                hint: "Directory is empty".into(),
            });
        } else if !has_code_files {
            return Err(TldrError::NoChunksFound {
                path: root.to_path_buf(),
                hint: format!(
                    "Found {} files but none with supported extensions ({:?}). \
                     Use --lang to specify language.",
                    file_count, code_extensions
                ),
            });
        } else {
            return Err(TldrError::NoChunksFound {
                path: root.to_path_buf(),
                hint: "Files found but no functions extracted. Check for parse errors.".into(),
            });
        }
    }
    // ...
}
```

**Add to spec Section 6.1:** Enhance `NoChunksFound` error with diagnostic hints.

---

### 3.2 Single-File Project

**Failure Scenario:** User runs `tldr similar src/main.rs` on a project with only one file. Gets either self-match or empty results. User confused about what "similar" means with no comparisons.

**Likelihood:** LOW  
**Impact:** MINOR

**Mitigation:**
```rust
/// Special handling for single-file projects
impl SemanticIndex {
    pub fn find_similar(&self, chunk: &CodeChunk, options: SearchOptions) -> TldrResult<SimilarityReport> {
        let candidates = self.chunks_excluding(chunk, options.exclude_self);
        
        if candidates.is_empty() {
            return Ok(SimilarityReport {
                similar: vec![],
                note: Some("No other functions to compare. Use --include-self to see self-similarity.".into()),
                // ...
            });
        }
        // ...
    }
}
```

**Add to spec Section 3.3:** Handle single-file/single-function gracefully with explanatory message.

---

### 3.3 Monorepo with 100K+ Functions

**Failure Scenario:** User runs semantic search on a massive monorepo. Index building takes 10+ minutes, runs out of memory, or search becomes unusably slow.

**Likelihood:** MEDIUM  
**Impact:** CRITICAL

**Mitigation:**
```rust
/// Add hard limits and user feedback
pub const MAX_INDEX_SIZE: usize = 100_000;
pub const WARNING_INDEX_SIZE: usize = 50_000;

impl SemanticIndex {
    pub fn build(root: &Path, ...) -> TldrResult<Self> {
        let chunks = chunk_code(root, chunk_options)?;
        
        if chunks.len() > MAX_INDEX_SIZE {
            return Err(TldrError::IndexTooLarge {
                count: chunks.len(),
                limit: MAX_INDEX_SIZE,
                suggestion: format!(
                    "Index exceeds {} chunks. Suggestions:\n\
                     - Use --path to target a subdirectory\n\
                     - Use --lang to filter by language\n\
                     - Use --exclude to skip directories (e.g., vendor/, node_modules/)",
                    MAX_INDEX_SIZE
                ),
            });
        }
        
        if chunks.len() > WARNING_INDEX_SIZE {
            eprintln!(
                "Warning: Large index ({} chunks). Search may be slow. \
                 Consider narrowing scope with --path or --exclude.",
                chunks.len()
            );
        }
        // ...
    }
}

// Add --exclude patterns to CLI
#[arg(long, value_delimiter = ',')]
pub exclude: Vec<String>,  // e.g., --exclude vendor,node_modules,test
```

**Add to spec Section 8.4:** Add `MAX_INDEX_SIZE`, `WARNING_INDEX_SIZE` constants. Add `--exclude` CLI flag. Add `IndexTooLarge` error type.

---

### 3.4 Files with Syntax Errors

**Failure Scenario:** Codebase has a file with a syntax error (WIP code). Tree-sitter fails to parse, entire file is skipped silently. User's search misses important function.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Track parse failures in report
pub struct EmbedReport {
    // ... existing fields ...
    
    /// Files that failed to parse (with errors)
    pub parse_failures: Vec<ParseFailure>,
}

pub struct ParseFailure {
    pub file_path: PathBuf,
    pub error: String,
    pub line: Option<u32>,
}

// In output:
// "Indexed 145 chunks from 23 files
//  WARNING: 2 files skipped due to parse errors:
//  - src/wip.py:42: SyntaxError: unexpected indent
//  - src/broken.rs:10: expected `;`"
```

**Add to spec Section 3.2:** Track and report parse failures. Add `parse_failures` to EmbedReport.

---

### 3.5 Unicode/Non-ASCII Code

**Failure Scenario:** Codebase has functions with non-ASCII names (e.g., `calculate_日期()` or comments in Chinese). Embedding model handles them poorly, search accuracy degrades.

**Likelihood:** LOW  
**Impact:** MINOR

**Mitigation:**
```rust
/// Document Unicode handling
// In spec: "The Snowflake Arctic models are trained primarily on English text.
// Non-ASCII content (variable names, comments in other languages) may have
// reduced search accuracy. Consider using keyword search for non-English queries."

/// Optionally strip non-ASCII for embedding
pub struct ChunkOptions {
    // ... existing fields ...
    
    /// Normalize to ASCII for embedding (experimental)
    pub ascii_only: bool,
}
```

**Add to spec Section 5.1:** Document Unicode limitations of Arctic models. Add note about reduced accuracy for non-English content.

---

## 4. Compatibility Failures

### 4.1 ONNX Runtime Version Mismatch

**Failure Scenario:** User has `onnxruntime 1.16` installed system-wide, but fastembed requires `1.18`. Build succeeds but runtime crashes with cryptic C++ errors.

**Likelihood:** MEDIUM  
**Impact:** CRITICAL

**Mitigation:**
```toml
# In Cargo.toml, pin ONNX runtime version strictly
[dependencies]
fastembed = { version = "5.8", features = ["static-onnx"] }
# OR
ort = { version = "2.0", features = ["static"] }  # Static linking avoids system conflicts
```

```rust
/// Add version check at startup
fn check_onnx_compatibility() -> TldrResult<()> {
    // If using dynamic linking, verify version
    if let Some(version) = ort::sys::version() {
        if version < MIN_ONNX_VERSION {
            return Err(TldrError::IncompatibleRuntime {
                found: version,
                required: MIN_ONNX_VERSION,
            });
        }
    }
    Ok(())
}
```

**Add to spec Section 10.1:** Recommend static ONNX linking. Add `IncompatibleRuntime` error type.

---

### 4.2 Rust Version Incompatibility

**Failure Scenario:** User on Rust 1.70 tries to build tldr-rs but fastembed requires 1.75+ for certain features. Build fails with confusing type errors.

**Likelihood:** LOW  
**Impact:** MAJOR

**Mitigation:**
```toml
# In Cargo.toml
[package]
rust-version = "1.75"  # Explicit MSRV

# In lib.rs
#[cfg(not(rust_version = "1.75"))]
compile_error!("Semantic search requires Rust 1.75 or later");
```

**Add to spec Section 10:** Document minimum Rust version (1.75+) for semantic module.

---

### 4.3 Platform-Specific ONNX Issues

**Failure Scenario:** Works on macOS/Linux but fails on Windows due to missing ONNX runtime redistributable or ARM-specific issues on M1/M2 Macs.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Add platform-specific fallback
#[cfg(target_os = "windows")]
fn setup_onnx_runtime() -> TldrResult<()> {
    // Check for Visual C++ redistributable
    // Provide helpful error if missing
}

#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
fn setup_onnx_runtime() -> TldrResult<()> {
    // M1/M2 specific checks
    // May need to use CPU execution provider instead of CoreML
}

// Graceful degradation:
// "Semantic search unavailable on this platform. Using keyword search as fallback."
```

**Add to spec Section 6.3:** Document platform requirements. Add graceful degradation when ONNX unavailable.

---

### 4.4 Disk Space Exhaustion During Model Download

**Failure Scenario:** User's home directory is nearly full. Model download (110MB) starts, partially downloads, then fails. Leaves corrupted partial file that causes future failures.

**Likelihood:** LOW  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Atomic model download with space check
fn download_model(model: EmbeddingModel) -> TldrResult<PathBuf> {
    let required_mb = model.size_mb() + 50;  // Buffer for extraction
    let available_mb = get_available_space_mb(&model_cache_dir())?;
    
    if available_mb < required_mb {
        return Err(TldrError::InsufficientDiskSpace {
            required_mb,
            available_mb,
            cache_dir: model_cache_dir(),
        });
    }
    
    // Download to temp file first
    let temp_path = cache_dir.join(format!("{}.downloading", model.name()));
    download_to(&temp_path)?;
    
    // Atomic rename on success
    std::fs::rename(&temp_path, &final_path)?;
    
    Ok(final_path)
}
```

**Add to spec Section 6.2:** Check disk space before download. Use atomic download pattern (temp file + rename).

---

## 5. Concurrency Failures

### 5.1 Concurrent Cache Writes Corrupt JSON

**Failure Scenario:** User runs `tldr semantic` in two terminals simultaneously. Both try to write to the same cache JSON file. Result: corrupted JSON, all cached embeddings lost.

**Likelihood:** HIGH  
**Impact:** CRITICAL

**Mitigation:**
```rust
/// Use file locking for cache writes
impl EmbeddingCache {
    pub fn flush(&mut self) -> TldrResult<()> {
        use fs2::FileExt;
        
        let lock_path = self.path.with_extension("lock");
        let lock_file = File::create(&lock_path)?;
        
        // Exclusive lock for writing
        lock_file.lock_exclusive()?;
        
        // Write to temp file first (atomic)
        let temp_path = self.path.with_extension("tmp");
        let temp_file = File::create(&temp_path)?;
        serde_json::to_writer_pretty(&temp_file, &self.entries)?;
        temp_file.sync_all()?;
        
        // Atomic rename
        std::fs::rename(&temp_path, &self.path)?;
        
        // Release lock
        lock_file.unlock()?;
        
        Ok(())
    }
    
    pub fn open(config: CacheConfig) -> TldrResult<Self> {
        use fs2::FileExt;
        
        let lock_path = config.cache_path().with_extension("lock");
        let lock_file = File::open(&lock_path).or_else(|_| File::create(&lock_path))?;
        
        // Shared lock for reading
        lock_file.lock_shared()?;
        
        let entries = if config.cache_path().exists() {
            let file = File::open(&config.cache_path())?;
            serde_json::from_reader(file)?
        } else {
            HashMap::new()
        };
        
        lock_file.unlock()?;
        
        Ok(Self { entries, .. })
    }
}
```

**Add to spec Section 5.3:** Require file locking for cache operations. Use `fs2` crate for cross-platform locks.

**Add to Cargo.toml:**
```toml
fs2 = "0.4"
```

---

### 5.2 Race Condition in Cache Invalidation

**Failure Scenario:** Process A reads file, computes hash, checks cache. Process B modifies file. Process A writes embedding with stale hash. Future lookups use wrong embedding.

**Likelihood:** MEDIUM  
**Impact:** MAJOR

**Mitigation:**
```rust
/// Include file mtime in cache key for extra safety
struct CacheKey {
    file_path: PathBuf,
    content_hash: String,
    model: EmbeddingModel,
    mtime: SystemTime,  // Additional check
}

impl EmbeddingCache {
    pub fn get(&self, chunk: &CodeChunk, model: EmbeddingModel) -> Option<Vec<f32>> {
        let key = self.make_key(chunk, model);
        
        if let Some(entry) = self.entries.get(&key.to_string()) {
            // Double-check mtime hasn't changed
            if let Ok(metadata) = std::fs::metadata(&chunk.file_path) {
                if metadata.modified().ok() != Some(entry.mtime) {
                    return None;  // File changed, cache invalid
                }
            }
            return Some(entry.embedding.clone());
        }
        None
    }
}
```

**Add to spec Section 5.3:** Include file mtime in cache validation for defense-in-depth.

---

### 5.3 Model Loading Race Condition

**Failure Scenario:** Two parallel processes both detect model not cached, both try to download simultaneously. Either: double download (wasted bandwidth) or one corrupts the other's download.

**Likelihood:** MEDIUM  
**Impact:** MINOR (wasteful) to MAJOR (corruption)

**Mitigation:**
```rust
/// Use lockfile for model download
fn ensure_model_downloaded(model: EmbeddingModel) -> TldrResult<PathBuf> {
    let model_dir = model_cache_dir().join(model.name());
    let lock_path = model_dir.with_extension("downloading.lock");
    
    // Check if already downloaded
    if model_dir.exists() && model_dir.join("model.onnx").exists() {
        return Ok(model_dir);
    }
    
    // Acquire exclusive lock
    let lock_file = File::create(&lock_path)?;
    lock_file.lock_exclusive()?;
    
    // Double-check after acquiring lock (another process may have completed)
    if model_dir.exists() && model_dir.join("model.onnx").exists() {
        lock_file.unlock()?;
        return Ok(model_dir);
    }
    
    // We're the first - do the download
    download_model_inner(model, &model_dir)?;
    
    lock_file.unlock()?;
    std::fs::remove_file(&lock_path).ok();  // Clean up lock file
    
    Ok(model_dir)
}
```

**Add to spec Section 3.1:** Use file lock during model download to prevent concurrent download race.

---

### 5.4 Index Build During Active Writes

**Failure Scenario:** User starts `tldr embed .` which reads many files. Meanwhile, editor auto-saves changes. Index ends up with mix of old and new embeddings, inconsistent with actual file states.

**Likelihood:** LOW  
**Impact:** MINOR (self-correcting on next run)

**Mitigation:**
```rust
/// Document limitation and provide --consistent flag
// In spec: "Index building is not atomic. If files change during indexing,
// the index may have inconsistent embeddings. Use --consistent for strict
// snapshotting (slower)."

#[arg(long)]
pub consistent: bool,  // Read all files into memory first, then embed

fn build_consistent(root: &Path, ...) -> TldrResult<Self> {
    // First pass: read all files into memory with mtimes
    let snapshots: Vec<FileSnapshot> = collect_files(root)
        .map(|p| FileSnapshot::new(p))
        .collect();
    
    // Second pass: embed from snapshots (not live files)
    // ...
}
```

**Add to spec Section 4.2:** Document non-atomic indexing. Add `--consistent` flag for strict mode.

---

## Summary of Mitigations by Priority

| Priority | Failure | Mitigation |
|----------|---------|------------|
| P0 | Cache write corruption | File locking with fs2 |
| P0 | Silent truncation | Track `truncated` flag, warn users |
| P0 | Monorepo explosion | Hard limit + helpful error |
| P1 | Opaque model download | Progress messages + --offline |
| P1 | Score confusion | Score guide + badges in output |
| P1 | Concurrent model download | Download lockfile |
| P1 | Parse failure silence | Report failures in output |
| P2 | Index staleness | Report cache age + re-index count |
| P2 | Boilerplate pollution | --no-boilerplate filter |
| P2 | ONNX version mismatch | Static linking + version check |
| P2 | Platform issues | Graceful degradation |
| P3 | Docstring dominance | DocstringMode options |
| P3 | Cross-language bias | --lang-boost option |
| P3 | Single-file edge case | Explanatory message |
| P3 | Unicode handling | Document limitation |

---

## Files to Update

1. **spec.md Section 2.1 (Types):** Add `truncated`, `original_length` to CodeChunk
2. **spec.md Section 2.4 (Index Types):** Add `language_boost`, `exclude_boilerplate` to SearchOptions
3. **spec.md Section 3.5 (Cache):** Add file locking requirement
4. **spec.md Section 4.1 (CLI):** Add `--offline`, `--exclude`, `--no-boilerplate`, `--consistent`
5. **spec.md Section 6 (Errors):** Add `IndexTooLarge`, `InsufficientDiskSpace`, `IncompatibleRuntime`
6. **spec.md Section 10 (Dependencies):** Add `fs2`, pin ONNX version, document MSRV

---

*End of Premortem Pass 2*