candle-mi 0.1.3

Mechanistic interpretability for language models in Rust, built on candle
Documentation
# candle-mi Coding Conventions (Grit + Grit-MI Extensions)

This document describes the [Amphigraphic coding](https://github.com/PCfVW/Amphigraphic-Strict) conventions used in candle-mi. It is a superset of
the [Grit — Strict Rust for AI-Assisted Development](https://github.com/PCfVW/Amphigraphic-Strict/tree/master/Grit).

## Annotation Patterns

Every annotation below is mandatory when the corresponding situation applies.

### `// TRAIT_OBJECT: <reason>`
Required on every `Box<dyn Trait>` or `&dyn Trait` usage.
> Example: `// TRAIT_OBJECT: heterogeneous model backends require dynamic dispatch`

### `// EXHAUSTIVE: <reason>`
Required on `#[allow(clippy::exhaustive_enums)]`.
> Example: `// EXHAUSTIVE: internal dispatch enum; crate owns and matches all variants`

### `// EXPLICIT: <reason>`
Required when a match arm is intentionally a no-op, or when an imperative
loop is used instead of an iterator chain for a stateful computation.
> Example: `// EXPLICIT: WKV recurrence is stateful; .map() would hide the state update`

### `// PROMOTE: <reason>`
Required immediately before any `.to_dtype(DType::F32)?` call that promotes
a tensor from a lower-precision dtype (F16, BF16) to F32 for numerical
correctness. Common reasons include:

- **Numerical functions**: softmax, log, exp, norm, sqrt
- **Matmul precision**: decoder weights stored as BF16 on disk
- **Accumulation**: running sums, averages, WKV recurrence
- **Dot-product precision**: matching a Python reference implementation
- **DType extraction**: `to_vec1::<f32>()` from BF16 safetensors

The reason must be specific to the call site, not a generic "numerical stability".

> Example: `// PROMOTE: softmax over F16 produces NaN; compute in F32`
> Example: `// PROMOTE: decoder weights are BF16 on disk; F32 for matmul precision`
> Example: `// PROMOTE: WKV recurrence must be in F32 for numerical stability`

### `// CONTIGUOUS: <reason>`
Required immediately before any `.contiguous()?` call that precedes a matmul.
> Example: `// CONTIGUOUS: transpose produces non-unit strides; matmul requires contiguous layout`

### `// BORROW: <what is converted>`
Required on explicit `.as_str()`, `.as_bytes()`, `.to_owned()` conversions (Grit Rule 2).
> Example: `// BORROW: explicit .as_str() instead of Deref coercion`

### `// SAFETY: <invariants>`
Required on every `unsafe` block or function (inline comment, not a doc comment).

candle-mi is `#![forbid(unsafe_code)]` **by default**. Specific feature flags
relax this to `#![deny(unsafe_code)]` for narrowly scoped platform FFI:

| Feature | Accepted `unsafe` scope |
|---------|------------------------|
| `mmap` | Memory-mapped file I/O for sharded safetensors |
| `memory` | OS/GPU memory queries (`GetProcessMemoryInfo`, NVML FFI, DXGI COM) |

Each accepted use must satisfy all of:
1. The `unsafe` block is in a **single, dedicated module** (e.g., `src/mmap.rs`,
   `src/memory.rs`) — never scattered across the codebase.
2. Every `unsafe` block carries a `// SAFETY:` comment documenting the invariants.
3. The module is gated behind `#[cfg(feature = "...")]` — users who don't
   enable the feature get `forbid(unsafe_code)` with zero exceptions.

Adding a new accepted use requires updating this table and the `cfg_attr` lines
in `lib.rs`.

### `// INDEX: <reason>`
Required on every direct slice index (`slice[i]`, `slice[a..b]`) that cannot
be replaced by an iterator. Direct indexing panics on out-of-bounds; prefer
`.get(i)` with `?` or explicit error handling. Use direct indexing only when
the bound is provably valid and an iterator idiom would be significantly less
readable.
> Example: `// INDEX: i is bounded by dims.len() checked two lines above`

### `// CAST: <from> → <to>, <reason>`
Required on every `as` cast between numeric types. Prefer `From`/`Into` for
lossless conversions and `TryFrom`/`TryInto` with `?` for fallible ones.
Use `as` only when truncation or wrapping is the deliberate intent, or when
interfacing with a C-style API that mandates it.
> Example: `// CAST: usize → u32, tensor dim fits in u32 (checked at construction)`
> Example: `// CAST: f64 → f32, precision loss acceptable; value is a display scalar`

---

## Doc-Comment Rules

### Backtick Hygiene (`doc_markdown`)

All identifiers, types, trait names, field names, crate names, and
file-format names in doc comments must be wrapped in backticks so that
rustdoc renders them as inline code and Clippy's `doc_markdown` lint passes.

Applies to: struct/enum/field names, method names (`fn foo`), types
(`Vec<T>`, `Option<f32>`), crate names (`candle_core`, `safetensors`),
file extensions (`.npy`, `.npz`, `.safetensors`), and acronyms that double
as types (`DType`, `NaN`, `CPU`, `GPU`).

>`/// Loads weights from a [`.safetensors`] file into a [`Tensor`].`
>`/// Loads weights from a .safetensors file into a Tensor.`

### Intra-Doc Link Safety

Rustdoc intra-doc links must resolve under all feature-flag combinations
(enforced by `#![deny(warnings)]` → `rustdoc::broken_intra_doc_links`).

Two patterns to watch:

1. **Feature-gated items** — items behind `#[cfg(feature = "...")]` are absent
   when that feature is off. Use plain backtick text, not link syntax:

   >`` /// Implemented by `CltFeatureId` (requires `clt` feature). ``
   >`` /// Implemented by [`CltFeatureId`](crate::clt::CltFeatureId). ``

2. **Cross-module links** — items re-exported at the crate root (e.g.,
   `MIError`) are not automatically in scope inside submodules. Use explicit
   `crate::` paths:

   >`` /// Returns [`MIError::Model`](crate::MIError::Model) on failure. ``
   >`` /// Returns [`MIError::Model`] on failure. ``

### Field-Level Docs

Every field of every `pub` struct must carry a `///` doc comment describing:
1. what the field represents,
2. its unit or valid range where applicable.

Fields of `pub(crate)` structs follow the same rule. Private fields inside a
`pub(crate)` or `pub` struct must have at minimum a `//` comment if their
purpose is not self-evident from the name alone.

> Example:
> ```rust
> pub struct AttentionConfig {
>     /// Number of query heads. Must be a multiple of `n_kv_heads`.
>     pub n_heads: usize,
>     /// Number of key/value heads (grouped-query attention).
>     pub n_kv_heads: usize,
>     /// Per-head dimension. Total hidden size = `n_heads * head_dim`.
>     pub head_dim: usize,
> }
> ```

---

## Control-Flow Rules

### `if let` vs `match` (`match_like_matches_macro`, `single_match`)

Use the most specific construct for the pattern at hand:

| Situation | Preferred form |
|---|---|
| Testing a single variant, no binding needed | `matches!(expr, Pat)` |
| Testing a single variant, binding needed | `if let Pat(x) = expr { … }` |
| Two or more variants with different bodies | `match expr { … }` |
| Exhaustive dispatch over an enum | `match expr { … }` (never `if let` chains) |

Never use a `match` with a single non-`_` arm and a no-op `_ => {}` where
`if let` or `matches!` would be clearer. Conversely, never chain three or
more `if let … else if let …` arms where a `match` would be exhaustive.

>`if let Some(w) = weight { apply(w); }`
>`matches!(dtype, DType::F16 | DType::BF16)`
>`match weight { Some(w) => apply(w), None => {} }`

---

## Function Signature Rules

### `const fn`

Declare a function `const fn` when **all** of the following hold:
1. The body contains no heap allocation, I/O, or `dyn` dispatch.
2. All called functions are themselves `const fn`.
3. There are no trait-method calls that are not yet `const`.

This applies to constructors, accessors, and pure arithmetic helpers.
When in doubt, annotate and let the compiler reject it — do not omit `const`
preemptively.

>`pub const fn head_dim(&self) -> usize { self.hidden_size / self.n_heads }`
>`pub fn head_dim(&self) -> usize { self.hidden_size / self.n_heads }`

### Pass by Value vs Reference (`needless_pass_by_ref_mut`, `trivially_copy_pass_by_ref`)

Follow these rules for function parameters:

| Type | Rule |
|---|---|
| `Copy` type ≤ 2 words (`usize`, `f32`, `bool`, small `enum`) | Pass by value |
| `Copy` type > 2 words | Pass by reference |
| Non-`Copy`, not mutated | Pass by `&T` or `&[T]` |
| Non-`Copy`, mutated | Pass by `&mut T` |
| Owned, consumed by callee | Pass by value (move semantics) |
| `&mut T` not actually mutated in body | Change to `&T` |

Never accept `&mut T` when the function body never writes through the reference;
Clippy's `needless_pass_by_ref_mut` will flag it and callers lose the ability
to pass shared references.

>`fn scale(x: f32, factor: f32) -> f32`
>`fn scale(x: &f32, factor: &f32) -> f32`
>`fn apply_mask(tensor: &mut Tensor, mask: &Tensor)`
>`fn apply_mask(tensor: &mut Tensor, mask: &mut Tensor)` ← if mask is only read

---

## Shape Documentation Format (Rule 12)

All public functions that accept or return `Tensor` must document shapes in
their doc comment using the following format:

    /// # Shapes
    /// - `q`: `[batch, n_heads, seq_q, head_dim]` -- query tensor
    /// - `k`: `[batch, n_kv_heads, seq_k, head_dim]` -- key tensor
    /// - returns: `[batch, n_heads, seq_q, head_dim]`

Rules:
- Use concrete dimension names, never `d0`/`d1`.
- Batch dimension is always first.
- Document every tensor argument and the return tensor.

## #[non_exhaustive] Policy (Rule 11)

- Public enums that may gain new variants: `#[non_exhaustive]`.
- Internal dispatch enums matched exhaustively by this crate:
  `#[allow(clippy::exhaustive_enums)] // EXHAUSTIVE: <reason>`.

## Hook Purity Contract (Rule 16)

- `HookSpec::capture()` takes only a hook point name -- no callback. The
  captured tensor is stored in `HookCache` and retrieved after `forward()`.
  The absence of a mutation mechanism is the enforcement.
- `HookSpec::intervene()` takes a typed `Intervention` value. All mutations
  go through this path and are visible at the call site.

## `#[must_use]` Policy (Rule 17)

All public functions and methods that return a value and have no side effects
must be annotated `#[must_use]`.  This includes constructors (`new`,
`with_capacity`), accessors (`len`, `is_empty`, `get_*`), and pure queries.
Without the annotation, a caller can silently discard the return value — which
for these functions is always a bug, since the call has no other effect.

The `clippy::must_use_candidate` lint enforces this at `warn` level
(promoted to error by `#![deny(warnings)]`).

## `# Errors` Doc Section

All public fallible methods (`-> Result<T>`) must include an `# Errors` section
in their doc comment. Each bullet uses the format:

    /// # Errors
    /// Returns [`MIError::Config`] if the model type is unsupported.
    /// Returns [`MIError::Model`] on weight loading failure.

Rules:
- Start each bullet with `Returns` followed by the variant in rustdoc link
  syntax, e.g., `` [`MIError::Config`] ``.
- Follow with `if` (condition), `on` (event), or `when` (circumstance).
- Use the concrete variant name, not the generic `MIError`.
- One bullet per distinct error path.

## Error Message Wording

Error strings passed to `MIError` variants follow two patterns:

- **External failures** (I/O, serde, network): `"failed to <verb>: {e}"`
  > Example: `MIError::Config(format!("failed to parse config: {e}"))`
- **Validation failures** (range, shape, lookup): `"<noun> <problem> (<context>)"`
  > Example: `MIError::Config(format!("source_layer {src} out of range (max {max})"))`
  > Example: `MIError::Hook(format!("hook point {point:?} not captured"))`

Rules:
- Use lowercase, no trailing period.
- Include the offending value and the valid range or constraint when applicable.
- Wrap external errors with `: {e}`, not `.to_string()`.

## `# Memory` Doc Section

Public methods that load large files (>100 MB, typically safetensors decoder
files) must include a `# Memory` section documenting:

1. **Peak allocation** — how much memory the method allocates at its peak.
2. **Residency** — whether the large allocation lives on CPU, GPU, or both.
3. **Lifetime** — whether the allocation is dropped before the method returns
   or persists in the returned value.

Format:

    /// # Memory
    /// Loads one decoder file (~2 GB) to CPU per source layer. Each file is
    /// dropped before loading the next. Peak: ~2 GB CPU.

## OOM-safe Decoder Loading Pattern

When loading large safetensors files (decoder weights, encoder weights),
follow the 7-step pattern to bound peak memory:

1. `ensure_path()` — resolve the file path (may trigger download).
2. `fs::read(&path)` — read the entire file into a `Vec<u8>` on CPU.
3. `SafeTensors::deserialize(&bytes)` — zero-copy parse of the byte buffer.
4. Extract the tensor view and build a candle `Tensor` on CPU.
5. Slice or narrow to the needed subset.
6. `drop(bytes)` (or let it go out of scope) — free the raw file buffer
   **before** loading the next file.
7. Optionally `.to_device(device)` if GPU computation follows.

The key invariant is: **at most one large file buffer is alive at any time**.
This bounds peak memory to roughly 1× the largest decoder file (~2 GB for
Gemma 2 2B CLTs).

> Example location: `CrossLayerTranscoder::score_features_by_decoder_projection`

## HashMap Grouping Idiom

When operations must be batched by a key (e.g., grouping features by source
layer to load each decoder file only once), use the `Entry` API:

```rust
let mut by_source: HashMap<usize, Vec<Item>> = HashMap::new();
for item in items {
    by_source.entry(item.key()).or_default().push(item);
}
```

Rules:
- Name the map `by_<grouping_key>` (e.g., `by_source`, `by_layer`).
- Use `.entry(key).or_default().push()` — never `if let Some` + `else insert`.
- Iterate the map to perform the batched operation (one file load per key).