// Copyright 2026 wolfy <wolfy@shitwolfymakes.com>
// SPDX-License-Identifier: Apache-2.0
//! matrix256v1 — reference Rust implementation of the filesystem-walk
//! fingerprint. Every regular file under the walk root contributes one
//! (relative-path, size) record to a SHA-256 hash. The walk and
//! serialization logic here must stay in lockstep with the normative spec
//! in `SPEC.md`
//! (<https://github.com/shitwolfymakes/matrix256/blob/main/SPEC.md>).
//! If one changes, the other must too.
use std::fs;
use std::io;
use ;
use ;
use unicode_normalization::UnicodeNormalization;
/// The matrix256 algorithm version this module implements (spec §5).
/// Distinct from a crate or package version; future algorithm versions
/// will be added as sibling submodules with their own `VERSION` constants.
pub const VERSION: &str = "1";
/// A regular file selected for matrix256v1 fingerprinting.
///
/// Returned by [`walk`] for callers that want to inspect or display the
/// entry list. [`fingerprint`] consumes these internally, and only their
/// `relative` and `size` fields contribute to the digest.
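The struct body is not part of this excerpt. As a hypothetical sketch, assume `Entry` also carries a display path (`path`, a field name invented here) alongside `relative` and `size`. The `record` method, also hypothetical, shows that only `relative` and `size` reach the per-entry serialization of spec §2.5, assuming the size is written as plain decimal ASCII.

```rust
use std::path::PathBuf;

/// Hypothetical shape of a walk entry; the real field set is not shown in
/// this excerpt. Only `relative` and `size` feed the digest.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    /// Full on-disk path, useful for display. NOT hashed (assumed field).
    pub path: PathBuf,
    /// Canonical root-relative path ('/'-joined, lossy-decoded, NFC).
    pub relative: String,
    /// File size in bytes.
    pub size: u64,
}

impl Entry {
    /// Serialize this entry as `<path-bytes> 0x00 <size-ascii> 0x0A`.
    pub fn record(&self) -> Vec<u8> {
        let mut buf = Vec::with_capacity(self.relative.len() + 22);
        buf.extend_from_slice(self.relative.as_bytes());
        buf.push(0x00);
        // Plain decimal ASCII digits (assumed reading of spec §2.5).
        buf.extend_from_slice(self.size.to_string().as_bytes());
        buf.push(0x0A);
        buf
    }
}
```

Two entries found under different mount points but with the same relative path and size serialize identically, which is what keeps the digest independent of where the tree is mounted.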
/// Compute the matrix256v1 digest of the filesystem rooted at `root`.
///
/// Walks the tree, sorts entries by UTF-8 path bytes (spec §2.4), and feeds
/// each entry's serialization (`<path-bytes> 0x00 <size-ascii> 0x0A`, spec §2.5)
/// into SHA-256 (spec §2.6). Returns the digest as 64 lowercase hex digits.
/// Returns the underlying `io::Error` if any directory or file metadata can't
/// be read; matrix256v1 is all-or-nothing per spec §3.
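A std-only sketch of the byte stream `fingerprint` hands to SHA-256, assuming the entries have already been reduced to `(relative, size)` pairs. `hash_input` and `to_lower_hex` are hypothetical names. The hashing step itself appears only as a comment, assuming the `sha2` crate's `Sha256`, because SHA-256 is not in std.

```rust
/// Hypothetical helper: build the exact bytes spec §2.6 feeds to SHA-256.
/// Sorting happens here, so the caller's order never matters.
pub fn hash_input(mut entries: Vec<(String, u64)>) -> Vec<u8> {
    // Compare raw UTF-8 bytes (spec §2.4), not a locale-aware collation.
    entries.sort_by(|a, b| a.0.as_bytes().cmp(b.0.as_bytes()));
    let mut stream = Vec::new();
    for (relative, size) in &entries {
        stream.extend_from_slice(relative.as_bytes()); // <path-bytes>
        stream.push(0x00);                             // separator
        stream.extend_from_slice(size.to_string().as_bytes()); // <size-ascii>
        stream.push(0x0A);                             // record terminator
    }
    stream
}

/// Render digest bytes as lowercase hex (64 chars for a 32-byte digest).
pub fn to_lower_hex(digest: &[u8]) -> String {
    digest.iter().map(|b| format!("{:02x}", b)).collect()
}

// Assumed final step with the `sha2` crate (not std):
//     use sha2::{Digest, Sha256};
//     let digest = Sha256::digest(&hash_input(entries));
//     let hex = to_lower_hex(&digest);
```

Sorting on `as_bytes()` keeps the order identical across platforms and locales; for example "Z" (0x5A) sorts before "a" (0x61).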
/// Collect every regular file under `root`, sorted by UTF-8 path bytes.
///
/// Directories are skipped (their existence is implied by the relative paths
/// of contained files), as are symbolic links (not followed, not emitted)
/// and other non-file entries (devices, sockets, FIFOs). Returns an
/// `io::Error` on any metadata failure — matrix256v1 is all-or-nothing per
/// spec §3.
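A sketch of the per-entry decision described above, using `fs::symlink_metadata`, which reports a symbolic link as a link instead of following it to its target. `Action` and `classify` are hypothetical names; the real walker may use `DirEntry::file_type` instead, which also does not follow symlinks.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// What the walk does with one directory entry (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Regular file: emit it with this size in bytes.
    Emit(u64),
    /// Directory: recurse into it, but emit nothing for it.
    Descend,
    /// Symlink, device, socket, FIFO, ...: ignore entirely.
    Skip,
}

/// Decide how to treat `path` without following symlinks.
pub fn classify(path: &Path) -> io::Result<Action> {
    // Any metadata failure propagates and aborts the walk (spec §3).
    let meta = fs::symlink_metadata(path)?;
    let ft = meta.file_type();
    Ok(if ft.is_symlink() {
        Action::Skip // not followed, not emitted
    } else if ft.is_file() {
        Action::Emit(meta.len())
    } else if ft.is_dir() {
        Action::Descend
    } else {
        Action::Skip // devices, sockets, FIFOs
    })
}
```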
/// Walk `current`, accumulating into `out`. `ancestors` is the chain of
/// root-relative path components leading to `current`; each recursive call
/// pushes its component before descending and pops on the way out, so
/// `Entry::relative` can be built from `ancestors` directly without ever
/// computing a relative path from an absolute one (which would need a
/// `strip_prefix` call whose `Result` must be handled even though it cannot
/// fail here).
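A std-only sketch of the ancestors technique, assuming entries reduced to `(relative, size)` pairs and leaving NFC out. Each directory pushes its own lossily decoded name before recursing and pops it afterwards, so a file's relative path is just the ancestors joined with '/' plus its own name. The signature is a guess at the shape described here, not the module's actual `scan`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical recursive walker: append (relative path, size) for every
/// regular file under `current`. Output is unsorted; sorting is the caller's
/// job. NFC normalization (spec §2.2) is omitted to stay std-only.
pub fn scan(
    current: &Path,
    ancestors: &mut Vec<String>,
    out: &mut Vec<(String, u64)>,
) -> io::Result<()> {
    for dirent in fs::read_dir(current)? {
        let dirent = dirent?;
        // `DirEntry::file_type` does not follow symlinks.
        let ft = dirent.file_type()?;
        // Lossy decode per component: invalid sequences become U+FFFD.
        let name = dirent.file_name().to_string_lossy().into_owned();
        if ft.is_dir() {
            ancestors.push(name);
            let result = scan(&dirent.path(), ancestors, out);
            ancestors.pop(); // restore the chain before propagating errors
            result?;
        } else if ft.is_file() {
            let size = dirent.metadata()?.len();
            let mut relative = ancestors.join("/");
            if !relative.is_empty() {
                relative.push('/');
            }
            relative.push_str(&name);
            out.push((relative, size));
        }
        // Symlinks and special files fall through and are skipped.
    }
    Ok(())
}
```

Empty directories produce no output, consistent with directories being implied only by the files they contain.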
/// Build the canonical UTF-8 byte sequence for the file whose root-relative
/// path is `components`: '/'-joined, U+FFFD substitution for invalid
/// sequences (already done at component capture via `to_string_lossy`),
/// NFC-normalized.
///
/// `to_string_lossy` (called in `scan`) provides the U+FFFD substitution
/// required by spec §2.2: "paths that cannot be represented as valid
/// Unicode are encoded as UTF-8 with the Unicode replacement character ...
/// substituted for each invalid code unit." On Unix it replaces invalid
/// UTF-8 sequences in the raw filename bytes; on Windows it replaces unpaired
/// UTF-16 surrogates.
///
/// The `.nfc()` step (from the `unicode-normalization` crate) applies spec
/// §2.2's Unicode Normalization Form C requirement. NFC is not in std;
/// see the README for the dep justification.
///
/// Decode-then-join is equivalent to join-then-decode here because the
/// separator '/' is the single ASCII byte 0x2F, which is never part of a
/// multi-byte UTF-8 sequence, so no sequence can span a component boundary;
/// and NFC distributes over '/' (U+002F has canonical combining class 0 and
/// appears in no canonical (de)composition mapping).
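The decoding half of the equivalence argument can be checked directly. The hypothetical functions below take raw component bytes (as on Unix) and apply `String::from_utf8_lossy` in both orders, mirroring what `to_string_lossy` does with Unix filenames. NFC is omitted because it needs the `unicode-normalization` crate; with that crate, the last step would be `.nfc().collect::<String>()`.

```rust
/// Hypothetical: lossily decode each component, then join with '/'.
pub fn decode_then_join(components: &[&[u8]]) -> String {
    components
        .iter()
        .map(|c| String::from_utf8_lossy(c).into_owned())
        .collect::<Vec<_>>()
        .join("/")
}

/// Hypothetical: join the raw bytes with b'/' first, then decode once.
pub fn join_then_decode(components: &[&[u8]]) -> String {
    let joined: Vec<u8> = components.join(&b'/');
    String::from_utf8_lossy(&joined).into_owned()
}
```

Splitting the two-byte sequence for "é" (0xC3 0xA9) across a component boundary leaves two invalid fragments in either order, because the 0x2F byte between them cannot continue the 0xC3 lead byte; both functions therefore yield `"caf\u{FFFD}/\u{FFFD}x"`.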