littlefs2-rust 0.1.1

# Architecture Notes

This document records how the Rust implementation currently relates to
upstream littlefs. It is a working engineering map, not a user manual. Keep it
updated when a write path changes its transaction shape or when a test starts
depending on a new invariant.

## Compatibility Goal

The project is moving in two phases:

1. Produce littlefs v2.1 images that upstream C littlefs can mount and read.
2. Converge the Rust transaction model toward littlefs' own algorithms where
   correctness depends on more than final visible state.

The first phase lets us build useful behavior incrementally. The second phase
is required for relocation, orphan handling, global move state, wear leveling,
and exact crash semantics.

## Strictly Shared Format

These pieces are not design space for the Rust implementation. They must stay
aligned with `littlefs_c/SPEC.md` and are continuously checked by mounting the
same bytes with upstream C:

- metadata pair selection by revision and sequence ordering
- metadata commit encoding: XOR tags, big-endian tag words, CRC tags, and
  program-size padding
- visible directory folding: create/delete id shifting, names, structs, and
  user attributes
- superblock entry id 0 in the root pair
- inline file structs and CTZ file structs
- CTZ block layout and skip pointers
- hardtail directory chains
- NOR programming semantics: writes can clear bits, erase is needed for 0-to-1

When a C oracle disagrees with Rust on these bytes, the Rust code is wrong.

## Configuration Semantics

`Config` is intentionally only physical geometry: `block_size` and
`block_count`. The C-like runtime and formatting knobs live in
`FilesystemOptions`, which is accepted by `format_device_with_options`,
`mount_device_with_options`, `mount_device_mut_with_options`, and
`ImageBuilder::new_with_options`.

The first configuration-parity slice wires the fields that already affect
visible behavior or memory shape:

- `read_size`, `prog_size`, `cache_size`, and `lookahead_size` are validated
  with C-style divisibility rules; `prog_size` feeds metadata commit padding,
  and cache size feeds read-only and mutable block caches.
- `block_cycles: Option<u32>` maps Rust `None` to C's `-1` disabled setting and
  initializes mounted metadata relocation cadence.
- `name_max`, `file_max`, and `attr_max` are written into the superblock during
  format. Mount options act as maximum supported limits, so an image advertising
  larger limits is rejected.
- `InlineMax` represents C's overloaded `inline_max` sentinels explicitly:
  default derives `min(cache_size, attr_max, metadata_limit / 8)`, disabled
  maps to a zero inline threshold, and limit validates the explicit bound.

The remaining configuration gaps are algorithmic, not just API surface:
`lookahead_size` does not yet drive a persistent checkpoint allocator,
`metadata_max` is used for inline-limit derivation and validation but not full
compaction policy, `compact_thresh` is not exposed yet, and the mutable dirty
program buffer is still a pragmatic Rust cache rather than an exact C cache
state machine.

## Current Rust-Native Design Choices

Some implementation strategies are deliberately simpler than littlefs C. These
are acceptable only while tests describe the boundary and the gap is recorded:

- The mounted allocator uses an in-memory used-block bitmap with a lookahead
  window. It rebuilds from the visible graph after commits that free storage and
  starts remounted scans at C's metadata-CRC allocation seed instead of always
  scanning from block 0. Metadata pair callers receive the same post-compaction
  order that C publishes in `DIRSTRUCT` records. Internally, C's `lfs_dir_alloc`
  fills a temporary pair backwards and `lfs_dir_compact` swaps it after writing
  pair[1]; Rust starts at the already-committed abstraction. The allocator is
  not yet littlefs' full persistent checkpoint policy. Bad blocks discovered
  while writing unreachable CTZ data, streaming data, relocated metadata halves,
  or new split-tail pairs stay reserved in the live allocator across visible
  graph rebuilds, so a successful transaction does not immediately reuse a
  candidate that just failed. The reservation is intentionally session-local; a
  future remount starts from the disk-visible graph just like C.
- Existing-file append, truncate replacement, and write-only partial overwrite
  through `open_file` can stream in the CTZ cases that do not require
  read-your-writes caching. Partial overwrite keeps patch ranges, reads old
  data in block-sized chunks during flush, overlays patches, and streams a
  replacement CTZ chain before the exposing metadata commit. Read/write handles
  and arbitrary-position extension still fall back to buffering.
- Cross-parent file and directory rename currently produce a C-compatible final
  state using separate create/unlink commits. This is not littlefs' global
  atomic move-state machinery.
- Directory handles are now borrowing cursors. They store the directory head and
  logical cursor position, then fold metadata on each `read` instead of keeping
  a `Vec<DirEntry>` snapshot. This trades large-directory iteration speed for
  low memory until a pair-local cursor can preserve both.
- File sync exists as a documented public boundary in `docs/sync_semantics.md`,
  with retry coverage for buffered writes, CTZ append, write-only partial
  overwrite, explicit sync retry, and dropped dirty handles. Sync faults inside
  lower-level split/relocation/orphan transactions remain open.
- Read-only and mutable block-device mounts are no longer full-image resident.
  `Filesystem::mount_device(&device)` borrows the device and lets the existing
  parser fetch metadata and CTZ blocks through `BlockDevice::read` on demand.
  Public read-only mounts use four 256-byte chunked read-cache slots by default,
  so cache memory no longer scales with `block_size`; mutable mounts use two
  256-byte parser chunks. The mutable `BlockCache` keeps two 256-byte read
  chunks and a dirty-range program buffer rather than a full block-sized program
  image. `FileBlockDevice::prog` and `erase` also use 256-byte temporary chunks
  so desktop resource tests do not count a full-block helper allocation.
  `MetadataPair` stores only the committed active-block prefix and exposes
  `visit_entries`,
  `fold_dir`, and `find_name` helpers instead of retaining a permanent
  `Vec<Entry>` or requiring path lookup to allocate a full `Vec<FileRecord>`
  per directory component. Commit CRCs are folded from the committed prefix on
  demand for allocation seeding, so metadata pairs no longer retain a separate
  CRC vector. Targeted lookup avoids cloning attrs or inline data for
  nonmatching names, and allocator mount scans use metadata storage refs instead
  of full `FileRecord` values. The public `read_dir_with`, `walk_with`, and
  `read_attr_into` methods let embedded callers stream common read results
  without forcing the collecting `Vec` wrappers. The mutable writer now owns
  its block device in a boxed slot and uses the same borrowed-device read view,
  rather than keeping a resident sparse metadata image. `Filesystem::mount(&[u8])`
  remains the fixture/corruption-test API for callers that already have a
  complete image in memory. `FilesystemMut::as_filesystem()` should be treated
  as a metadata view; mounted payload reads should use
  `FilesystemMut::read_file` or file handles. Mounted mutation APIs no longer
  use hidden whole-image `ImageEditor` fallbacks; a path shape without native
  transaction coverage returns `Error::Unsupported`. `ImageEditor` remains the
  explicit offline editor for tests and host tools that already own a complete
  image. The current resource guard keeps 32 MiB read-only and mutable
  mount/read peaks below 512 KiB; the
  latest 32 MiB resource-mode samples are now in the intended embedded band:
  mutable empty mount retained about 9.7 KiB, populated mutable mount peaked
  about 21.0 KiB, mounted 128 KiB streaming write peaked about 29.7 KiB
  including the evaluator's 4 KiB caller buffer, and read-file-into added about
  2.1 KiB over the caller-owned output buffer. CTZ range reads are offset-aware,
  read only the
  requested data/skip-pointer byte ranges from devices, and read-only file
  handles cache their resolved immutable file data, so small repeated reads
  avoid prefix scans, whole-block CTZ materialization, and repeated path lookup.
  Create-only writer open-time validation no longer opens an `ImageEditor` over
  a full device image.

These shortcuts should shrink over time. They must not silently expand.

## Write Ordering Invariants

Mounted writes should preserve old-or-new visibility across recorded block
operations whenever the operation is intended to be atomic.

| Operation shape | Ordering rule |
|---|---|
| CTZ create or overwrite | Program newly allocated CTZ blocks before the metadata commit that points at them. |
| Directory create | Initialize the child metadata pair before the parent `DIRSTRUCT` commit exposes it. |
| Split directory | Program the new tail pair while unreachable, then commit the source pair hardtail. |
| Pair compaction | Write the compacted alternate side with a higher revision; the old active side remains mountable until the compacted commit is valid. |
| Pair relocation | Program the newly relocated metadata block first, then update the parent or predecessor pointer. Root-child pair relocation also writes a pending orphan count before the parent pointer update and repairs the half-orphan thread edge before returning. |
| Bad newly allocated blocks | If an unreachable CTZ data block, streaming data block, relocated metadata half, or split tail pair reports `Error::Corrupt`, abandon that candidate, reserve it in the live allocator, allocate a replacement, rebuild CTZ skip pointers / relocation pointer updates / split tail payloads, and only then publish the metadata pointer. |
| Split-chain pressure rebalance | If a split-chain update/attr no longer fits after local compaction, write newly packed tail pairs while unreachable, then publish a compacted head that points at the rebuilt chain. |

Tests should record snapshots around these operations with
`RecordingBlockDevice` and assert every snapshot is mountable and visibly old or
new.

## Split Directory Model

The reader follows hardtail chains for root and child directories. The mounted
writer now also treats root and child split chains as ordered by lexical name
range:

- Create target selection walks the chain and picks the pair whose current
  range owns the new name.
- A first split compacts visible entries into two balanced halves, writes the
  new tail first, and then commits the source pair to its alternate side with a
  hardtail pointer.
- If a later create targets a non-tail pair and append-plus-compaction no longer
  fits, the same split compaction inserts a new pair before the old hardtail
  successor. The inserted pair inherits the old tail edge, so the chain remains
  ordered and reachable only after the source pair's hardtail commit lands.
- Updates, deletes, renames, attrs, and directory operations locate the pair
  that contains the target entry before appending.
- If an update or attr grows a pair so much that local compaction still cannot
  fit, Rust currently chooses a cost-effective whole-chain rebuild: collect the
  visible records across the split chain, apply the target-pair update, pack
  records into fresh tail pairs, and publish the new chain by committing the
  head last. This preserves C-readable final state and crash ordering, but it
  is not an exact clone of C's adjacent-pair balancing choices.
- Block-cycle relocation now follows C littlefs more closely by relocating one
  half of a metadata pair rather than allocating an entirely disjoint pair. The
  new pair keeps the old active block and adds one freshly allocated block, so
  power-loss recovery can detect a half-orphan through the shared block.
- The relocation trigger uses C's adjusted odd interval:
  `(rev + 1) % ((block_cycles + 1) | 1) == 0`. This matters even for tests with
  `block_cycles = 1`: a fresh `rev=1` pair does not relocate immediately, which
  prevents the non-terminating relocation pattern upstream explicitly avoids.
- Root and non-root split-tail relocation prepares the relocated tail first and
  then updates the predecessor hardtail. Metadata-only updates such as attrs,
  deletes, and same-directory renames can use the same relocation path. CTZ
  updates program new data blocks before the relocated metadata pair is
  exposed. Directory creation initializes the new child pair before exposing
  relocated parent metadata.
- Root-child directory-pair relocation is a two-step visible repair shape:
  prepare the relocated child pair, update the root directory entry together
  with a pending orphan count, then retarget the predecessor softtail to the
  relocated pair and clear the orphan count. Mutable mount runs the same
  half-orphan repair if power loss happens after the parent pointer update.

This is closer to littlefs than the earlier append-only tail split and now
covers create overflow plus update/attr pressure. The remaining
`lfs_dir_splittingcompact` gap is mostly fidelity of internal adjacent-pair
choices, not final C-readable behavior for the covered operations.

## C Reference Areas

The next areas should be implemented by reading and mapping the corresponding C
state machine, not by inventing a merely plausible Rust-native variant:

- remaining exact open/path errno gaps, mostly operation-specific
  NoSpace/FBIG boundaries in larger images and any rare flags real callers need
- sync/fsync behavior inside lower-level split, relocation, pending-move, and
  orphan transactions
- read/write partial-overwrite cache semantics and broader mixed write/sync
  parity
- pending move/orphan repair when rename, replacement, split, and relocation
  interleave
- remaining `lfs_dir_relocatingcommit` and related relocation / `block_cycles`
  behavior, especially pointer-update bad-block handling and combined
  split/compact/relocate state transitions
- `lfs_dir_splittingcompact` adjacent-pair policy where matching C's internal
  layout is worth the extra complexity
- global move state and orphan handling
- cross-parent atomic rename
- persistent allocator checkpoint and bad-block policy across remounts

For these areas, implementation notes should name the relevant C function and
record any intentional deviation.

## Test Strategy

Every meaningful filesystem behavior should use real storage bytes:

- Unit tests are acceptable for local pure logic such as allocator cursor and
  cache behavior, but they must use the real `MemoryBlockDevice` when block
  semantics matter.
- Black-box API tests should use only public APIs, remount the image, and ask C
  littlefs to read/list/getattr the final state.
- Power-loss tests should record every erase/prog snapshot and assert old-or-new
  visibility.
- Fault-matrix tests should inject real `prog`, `erase`, and `sync` errors,
  mutable-remount the resulting image, retry or observe the operation as already
  committed, and ask C to verify the converged final bytes. Bad-block tests use
  real block-device wrappers that return `Error::Corrupt` from `erase` or
  `prog`, never mocks, and then verify the replacement block is visible through
  upstream C littlefs.
- Large scenario tests should combine features that are already individually
  supported, periodically remount, and finally compare Rust and C manifests.
- A small Rust-side visible-state model is allowed in tests. It is not a
  littlefs mock; it records user-visible expected files, directories, and
  attributes while the actual filesystem operations still run on real block
  devices and are checked by C.

## Current Completion Estimate

Approximate, subjective status:

| Target | Completion |
|---|---:|
| Read existing littlefs v2.1 images | 82-87% |
| Single-threaded mounted filesystem | 82-86% |
| Embedded-oriented filesystem with stronger failure semantics | 65-72% |
| Full littlefs behavioral alignment | 48-53% |

The next work should bias toward reliability and convergence rather than adding
more surface API: persistent allocator/checkpoint policy across remounts,
remaining relocation state-machine fidelity where C parity is worth the
complexity, and broader performance/corruption audits.