//! Semantic canonicalisation for opaque-payload hash equality.
//!
//! Helpers that normalise payload bytes so two semantically-equivalent
//! programs hash to the same value under `crate::ir::Program::hash`.
//! These concerns are separate from wire-format endianness; they live
//! here so extension authors can opt into canonicalisation without
//! mixing it with the byte-level encode/decode primitives.
/// Canonicalise an inline regex-flag set so payloads encoding the same
/// semantic regex hash equal under `crate::ir::Program::hash`.
///
/// Rust/ICU-style inline regex literals of the shape `(?flags)pattern`
/// accept the flag characters in any order: `(?mi)` and `(?im)`
/// describe the same regex, but their differing byte sequences hash
/// distinctly, defeating CSE and cache lookups (F-IR-02 / F-IR-37).
/// Extension authors who encode a regex literal into an opaque payload
/// MUST call this helper on the flag string before framing so two
/// semantically-equal regexes reach the encoder as the same bytes.
///
/// The returned string is the deduplicated, sorted flag set. The input
/// may contain any UTF-8 characters; deduplication is by `char` equality.
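// A minimal sketch of the helper documented above, not necessarily the
// crate's actual implementation. The name `canonical_regex_flags`, the
// `&str -> String` signature, and sorting by `char` code-point order are
// assumptions made for illustration; the docs only state that the result
// is the deduplicated, sorted flag set.
pub fn canonical_regex_flags(flags: &str) -> String {
    // Collect the flag characters so they can be reordered.
    let mut chars: Vec<char> = flags.chars().collect();
    // Sort so that `mi` and `im` become the same sequence.
    chars.sort_unstable();
    // `dedup` removes adjacent repeats; after sorting, all repeats are adjacent.
    chars.dedup();
    chars.into_iter().collect()
}
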
/// Canonicalise an `f32` bit pattern for opaque-payload hash equality.
///
/// `-0.0` and `+0.0` compare equal under IEEE-754 `==` but have
/// different bit patterns, so a naive `to_le_bytes` encoder makes them
/// hash distinctly. Extension authors who want two programs that differ
/// only by the sign of zero to hash equal call this helper before
/// [`super::endian::push_f32`]. Non-zero floats and every NaN bit pattern are
/// returned unchanged: sign-of-zero is the only "semantically equal but
/// bit-distinct" shape common enough to justify canonicalising at the
/// wire layer, so every other value keeps its distinct identity.
///
/// See F-IR-37 in CRITIQUE_IR_SOUNDNESS_2026-04-22.md for the full
/// rationale. The 64-bit companion is [`canonical_f64_zero`]. An
/// extension that wants sign-of-zero canonicalisation at both widths
/// should use the helper matching each literal's width, so the f32 and
/// f64 encodings do not drift apart.
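// A minimal sketch of the behaviour documented above, not necessarily the
// crate's actual implementation. The name comes from the docs; the
// `f32 -> f32` signature is an assumption made for illustration.
pub fn canonical_f32_zero(value: f32) -> f32 {
    // `-0.0 == 0.0` holds under IEEE-754, so this branch matches both zeros
    // and nothing else: NaN compares unequal to every value, including 0.0.
    if value == 0.0 {
        0.0
    } else {
        // Every non-zero float and every NaN passes through untouched.
        value
    }
}
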
/// Canonicalise an `f64` bit pattern for opaque-payload hash equality.
///
/// Same contract as [`canonical_f32_zero`] but for the 64-bit width:
/// the `-0.0f64` bit pattern (`0x8000_0000_0000_0000`) normalises to
/// `+0.0f64` (`0x0000_0000_0000_0000`) so two programs that differ
/// only by the sign of zero hash equal under `Program::hash`. Every
/// other f64 (every signalling and quiet NaN of either sign, every
/// subnormal, every other non-zero finite value) is returned unchanged,
/// so bit patterns that carry IEEE-754 semantic meaning, such as NaN
/// payloads, are preserved.
///
/// CRITIQUE_FIX_REVIEW_2026-04-23 Finding #11: without this helper,
/// f64 opaque extensions had no canonicalisation path, so two
/// semantically-equal programs differing only in the sign of an f64
/// zero would hash distinctly and defeat CSE / cache lookups.
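// A minimal sketch of the behaviour documented above, not necessarily the
// crate's actual implementation. The name comes from the docs; the
// `f64 -> f64` signature is an assumption made for illustration.
pub fn canonical_f64_zero(value: f64) -> f64 {
    // Bit pattern of `-0.0f64`: sign bit set, exponent and mantissa all zero.
    const NEG_ZERO_BITS: u64 = 0x8000_0000_0000_0000;
    if value.to_bits() == NEG_ZERO_BITS {
        // Normalise to `+0.0f64`, whose bit pattern is all zeros.
        0.0
    } else {
        // `+0.0`, NaNs, subnormals and all other values pass through untouched.
        value
    }
}
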