# Architecture Notes
This document records how the Rust implementation currently relates to
upstream littlefs. It is a working engineering map, not a user manual. Keep it
updated when a write path changes its transaction shape or when a test starts
depending on a new invariant.
## Compatibility Goal
The project is moving in two phases:
1. Produce littlefs v2.1 images that upstream C littlefs can mount and read.
2. Converge the Rust transaction model toward littlefs' own algorithms where
correctness depends on more than final visible state.
The first phase lets us build useful behavior incrementally. The second phase
is required for relocation, orphan handling, global move state, wear leveling,
and exact crash semantics.
## Strictly Shared Format
These pieces are not design space for the Rust implementation. They must stay
aligned with `littlefs_c/SPEC.md` and are continuously checked by mounting the
same bytes with upstream C:
- metadata pair selection by revision and sequence ordering
- metadata commit encoding: XOR tags, big-endian tag words, CRC tags, and
program-size padding
- visible directory folding: create/delete id shifting, names, structs, and
user attributes
- superblock entry id 0 in the root pair
- inline file structs and CTZ file structs
- CTZ block layout and skip pointers
- hardtail directory chains
- NOR programming semantics: writes can clear bits, erase is needed for 0-to-1
When a C oracle disagrees with Rust on these bytes, the Rust code is wrong.
## Configuration Semantics
`Config` is intentionally only physical geometry: `block_size` and
`block_count`. The C-like runtime and formatting knobs live in
`FilesystemOptions`, which is accepted by `format_device_with_options`,
`mount_device_with_options`, `mount_device_mut_with_options`, and
`ImageBuilder::new_with_options`.
The first configuration-parity slice wires the fields that already affect
visible behavior or memory shape:
- `read_size`, `prog_size`, `cache_size`, and `lookahead_size` are validated
with C-style divisibility rules; `prog_size` feeds metadata commit padding,
and cache size feeds read-only and mutable block caches.
- `block_cycles: Option<u32>` maps Rust `None` to C's `-1` disabled setting and
initializes mounted metadata relocation cadence.
- `name_max`, `file_max`, and `attr_max` are written into the superblock during
format. Mount options act as maximum supported limits, so an image advertising
larger limits is rejected.
- `InlineMax` represents C's overloaded `inline_max` sentinels explicitly:
default derives `min(cache_size, attr_max, metadata_limit / 8)`, disabled
maps to a zero inline threshold, and limit validates the explicit bound.
The remaining configuration gaps are algorithmic, not just API surface:
`lookahead_size` does not yet drive a persistent checkpoint allocator,
`metadata_max` is used for inline-limit derivation and validation but not full
compaction policy, `compact_thresh` is not exposed yet, and the mutable dirty
program buffer is still a pragmatic Rust cache rather than an exact C cache
state machine.
## Current Rust-Native Design Choices
Some implementation strategies are deliberately simpler than littlefs C. These
are acceptable only while tests describe the boundary and the gap is recorded:
- The mounted allocator uses an in-memory used-block bitmap with a lookahead
window. It rebuilds from the visible graph after commits that free storage and
starts remounted scans at C's metadata-CRC allocation seed instead of always
scanning from block 0. Metadata pair callers receive the same post-compaction
order that C publishes in `DIRSTRUCT` records. Internally, C's `lfs_dir_alloc`
fills a temporary pair backwards and `lfs_dir_compact` swaps it after writing
pair[1]; Rust starts at the already-committed abstraction. The allocator is
not yet littlefs' full persistent checkpoint policy. Bad blocks discovered
while writing unreachable CTZ data, streaming data, relocated metadata halves,
or new split-tail pairs stay reserved in the live allocator across visible
graph rebuilds, so a successful transaction does not immediately reuse a
candidate that just failed. The reservation is intentionally session-local; a
future remount starts from the disk-visible graph just like C.
- Existing-file append, truncate replacement, and write-only partial overwrite
through `open_file` can stream in the CTZ cases that do not require
read-your-writes caching. Partial overwrite keeps patch ranges, reads old
data in block-sized chunks during flush, overlays patches, and streams a
replacement CTZ chain before the exposing metadata commit. Read/write handles
and arbitrary-position extension still fall back to buffering.
- Cross-parent file and directory rename currently produce a C-compatible final
state using separate create/unlink commits. This is not littlefs' global
atomic move-state machinery.
- Directory handles are now borrowing cursors. They store the directory head and
logical cursor position, then fold metadata on each `read` instead of keeping
a `Vec<DirEntry>` snapshot. This trades large-directory iteration speed for
low memory until a pair-local cursor can preserve both.
- File sync exists as a documented public boundary in `docs/sync_semantics.md`,
with retry coverage for buffered writes, CTZ append, write-only partial
overwrite, explicit sync retry, and dropped dirty handles. Sync faults inside
lower-level split/relocation/orphan transactions remain open.
- Read-only and mutable block-device mounts are no longer full-image resident.
`Filesystem::mount_device(&device)` borrows the device and lets the existing
parser fetch metadata and CTZ blocks through `BlockDevice::read` on demand.
Public read-only mounts use four 256-byte chunked read-cache slots by default,
so cache memory no longer scales with `block_size`; mutable mounts use two
256-byte parser chunks. The mutable `BlockCache` keeps two 256-byte read
chunks and a dirty-range program buffer rather than a full block-sized program
image. `FileBlockDevice::prog` and `erase` also use 256-byte temporary chunks
so desktop resource tests do not count a full-block helper allocation.
`MetadataPair` stores only the committed active-block prefix and exposes
`visit_entries`,
`fold_dir`, and `find_name` helpers instead of retaining a permanent
`Vec<Entry>` or requiring path lookup to allocate a full `Vec<FileRecord>`
per directory component. Commit CRCs are folded from the committed prefix on
demand for allocation seeding, so metadata pairs no longer retain a separate
CRC vector. Targeted lookup avoids cloning attrs or inline data for
nonmatching names, and allocator mount scans use metadata storage refs instead
of full `FileRecord` values. The public `read_dir_with`, `walk_with`, and
`read_attr_into` methods let embedded callers stream common read results
without forcing the collecting `Vec` wrappers. The mutable writer now owns
its block device in a boxed slot and uses the same borrowed-device read view,
rather than keeping a resident sparse metadata image. `Filesystem::mount(&[u8])`
remains the fixture/corruption-test API for callers that already have a
complete image in memory. `FilesystemMut::as_filesystem()` should be treated
as a metadata view; mounted payload reads should use
`FilesystemMut::read_file` or file handles. Mounted mutation APIs no longer
use hidden whole-image `ImageEditor` fallbacks; a path shape without native
transaction coverage returns `Error::Unsupported`. `ImageEditor` remains the
explicit offline editor for tests and host tools that already own a complete
image. The current resource guard keeps 32 MiB read-only and mutable
mount/read peaks below 512 KiB; the
latest 32 MiB resource-mode samples are now in the intended embedded band:
mutable empty mount retained about 9.7 KiB, populated mutable mount peaked
about 21.0 KiB, mounted 128 KiB streaming write peaked about 29.7 KiB
including the evaluator's 4 KiB caller buffer, and read-file-into added about
2.1 KiB over the caller-owned output buffer. CTZ range reads are offset-aware,
read only the
requested data/skip-pointer byte ranges from devices, and read-only file
handles cache their resolved immutable file data, so small repeated reads
avoid prefix scans, whole-block CTZ materialization, and repeated path lookup.
Create-only writer open-time validation no longer opens an `ImageEditor` over
a full device image.
These shortcuts should shrink over time. They must not silently expand.
## Write Ordering Invariants
Mounted writes should preserve old-or-new visibility across recorded block
operations whenever the operation is intended to be atomic.
| CTZ create or overwrite | Program newly allocated CTZ blocks before the metadata commit that points at them. |
| Directory create | Initialize the child metadata pair before the parent `DIRSTRUCT` commit exposes it. |
| Split directory | Program the new tail pair while unreachable, then commit the source pair hardtail. |
| Pair compaction | Write the compacted alternate side with a higher revision; the old active side remains mountable until the compacted commit is valid. |
| Pair relocation | Program the newly relocated metadata block first, then update the parent or predecessor pointer. Root-child pair relocation also writes a pending orphan count before the parent pointer update and repairs the half-orphan thread edge before returning. |
| Bad newly allocated blocks | If an unreachable CTZ data block, streaming data block, relocated metadata half, or split tail pair reports `Error::Corrupt`, abandon that candidate, reserve it in the live allocator, allocate a replacement, rebuild CTZ skip pointers / relocation pointer updates / split tail payloads, and only then publish the metadata pointer. |
| Split-chain pressure rebalance | If a split-chain update/attr no longer fits after local compaction, write newly packed tail pairs while unreachable, then publish a compacted head that points at the rebuilt chain. |
Tests should record snapshots around these operations with
`RecordingBlockDevice` and assert every snapshot is mountable and visibly old or
new.
## Split Directory Model
The reader follows hardtail chains for root and child directories. The mounted
writer now also treats root and child split chains as ordered by lexical name
range:
- Create target selection walks the chain and picks the pair whose current
range owns the new name.
- A first split compacts visible entries into two balanced halves, writes the
new tail first, and then commits the source pair to its alternate side with a
hardtail pointer.
- If a later create targets a non-tail pair and append-plus-compaction no longer
fits, the same split compaction inserts a new pair before the old hardtail
successor. The inserted pair inherits the old tail edge, so the chain remains
ordered and reachable only after the source pair's hardtail commit lands.
- Updates, deletes, renames, attrs, and directory operations locate the pair
that contains the target entry before appending.
- If an update or attr grows a pair so much that local compaction still cannot
fit, Rust currently chooses a cost-effective whole-chain rebuild: collect the
visible records across the split chain, apply the target-pair update, pack
records into fresh tail pairs, and publish the new chain by committing the
head last. This preserves C-readable final state and crash ordering, but it
is not an exact clone of C's adjacent-pair balancing choices.
- Block-cycle relocation now follows C littlefs more closely by relocating one
half of a metadata pair rather than allocating an entirely disjoint pair. The
new pair keeps the old active block and adds one freshly allocated block, so
power-loss recovery can detect a half-orphan through the shared block.
- The relocation trigger uses C's adjusted odd interval:
`(rev + 1) % ((block_cycles + 1) | 1) == 0`. This matters even for tests with
`block_cycles = 1`: a fresh `rev=1` pair does not relocate immediately, which
prevents the non-terminating relocation pattern upstream explicitly avoids.
- Root and non-root split-tail relocation prepares the relocated tail first and
then updates the predecessor hardtail. Metadata-only updates such as attrs,
deletes, and same-directory renames can use the same relocation path. CTZ
updates program new data blocks before the relocated metadata pair is
exposed. Directory creation initializes the new child pair before exposing
relocated parent metadata.
- Root-child directory-pair relocation is a two-step visible repair shape:
prepare the relocated child pair, update the root directory entry together
with a pending orphan count, then retarget the predecessor softtail to the
relocated pair and clear the orphan count. Mutable mount runs the same
half-orphan repair if power loss happens after the parent pointer update.
This is closer to littlefs than the earlier append-only tail split and now
covers create overflow plus update/attr pressure. The remaining
`lfs_dir_splittingcompact` gap is mostly fidelity of internal adjacent-pair
choices, not final C-readable behavior for the covered operations.
## C Reference Areas
The next areas should be implemented by reading and mapping the corresponding C
state machine, not by inventing a merely plausible Rust-native variant:
- remaining exact open/path errno gaps, mostly operation-specific
NoSpace/FBIG boundaries in larger images and any rare flags real callers need
- sync/fsync behavior inside lower-level split, relocation, pending-move, and
orphan transactions
- read/write partial-overwrite cache semantics and broader mixed write/sync
parity
- pending move/orphan repair when rename, replacement, split, and relocation
interleave
- remaining `lfs_dir_relocatingcommit` and related relocation / `block_cycles`
behavior, especially pointer-update bad-block handling and combined
split/compact/relocate state transitions
- `lfs_dir_splittingcompact` adjacent-pair policy where matching C's internal
layout is worth the extra complexity
- global move state and orphan handling
- cross-parent atomic rename
- persistent allocator checkpoint and bad-block policy across remounts
For these areas, implementation notes should name the relevant C function and
record any intentional deviation.
## Test Strategy
Every meaningful filesystem behavior should use real storage bytes:
- Unit tests are acceptable for local pure logic such as allocator cursor and
cache behavior, but they must use the real `MemoryBlockDevice` when block
semantics matter.
- Black-box API tests should use only public APIs, remount the image, and ask C
littlefs to read/list/getattr the final state.
- Power-loss tests should record every erase/prog snapshot and assert old-or-new
visibility.
- Fault-matrix tests should inject real `prog`, `erase`, and `sync` errors,
mutable-remount the resulting image, retry or observe the operation as already
committed, and ask C to verify the converged final bytes. Bad-block tests use
real block-device wrappers that return `Error::Corrupt` from `erase` or
`prog`, never mocks, and then verify the replacement block is visible through
upstream C littlefs.
- Large scenario tests should combine features that are already individually
supported, periodically remount, and finally compare Rust and C manifests.
- A small Rust-side visible-state model is allowed in tests. It is not a
littlefs mock; it records user-visible expected files, directories, and
attributes while the actual filesystem operations still run on real block
devices and are checked by C.
## Current Completion Estimate
Approximate, subjective status:
| Read existing littlefs v2.1 images | 82-87% |
| Single-threaded mounted filesystem | 82-86% |
| Embedded-oriented filesystem with stronger failure semantics | 65-72% |
| Full littlefs behavioral alignment | 48-53% |
The next work should bias toward reliability and convergence rather than adding
more surface API: persistent allocator/checkpoint policy across remounts,
remaining relocation state-machine fidelity where C parity is worth the
complexity, and broader performance/corruption audits.