# SIMD Admission Policy
`base64-ng` is scalar by default and admits conservative accelerated encode
paths in the `1.2.x` line: std `x86`/`x86_64` AVX-512 VBMI first, then
AVX2, then SSSE3/SSE4.1, plus little-endian std `aarch64` NEON, for Standard
and URL-safe alphabet families. Future SIMD dispatch remains gated
unless a complete SIMD admission evidence package lands in the same release
commit as the active backend change. The crate uses `#![deny(unsafe_code)]` and permits
reviewed `allow(unsafe_code)` exceptions only for audited cleanup in
`src/cleanup.rs`, CT comparison, byte accumulation, CT scan, and CT result-gate
helpers in `src/ct/`, and the private `src/simd/` boundary.
This is a security decision, not a rejection of hardware acceleration. SIMD
must be added only when it can be isolated, tested, and reviewed without
weakening the scalar trust base.
## Version Roadmap
The SIMD roadmap separates implementation evidence from active acceleration:
- `1.1.x` is the SIMD encode foundation and admission-candidate series. Early
checkpoints contain real fixed-block encode prototypes for SSSE3/SSE4.1,
AVX2, AVX-512 VBMI, NEON, and wasm `simd128`, plus scalar-equivalence tests,
generated assembly evidence, register-cleanup review, fuzz expansion, and
admission-tooling updates. Later checkpoints wire admitted encode backends
into public encode APIs while keeping each checkpoint gated by pentest, CI,
and release evidence. GitHub checkpoint tags in this line moved evidence
forward without a matching crates.io publish until the `1.2.0` family sync.
- `1.1.5` adds the public encode backend boundary while still forcing scalar
execution. This gives future accelerated encode admission one reviewed
integration point for `encode_slice`, clear-tail helpers, alloc helpers,
wrapped helpers, and in-place encode. The same checkpoint also adds a
scalar-forced decode backend boundary for symmetry; decode acceleration
remains out of scope until the later decode line.
- `1.1.6` admits std `x86`/`x86_64` SSSE3/SSE4.1 encode dispatch for Standard
and URL-safe alphabet families. It processes fixed 12-byte blocks with vector
code after runtime CPU probing. Scalar remains the fallback for unsupported
CPUs, `no_std`, custom alphabets, in-place encode, line-ending insertion,
and every decode path. Final tail and padding completion use scalar code.
Wrapped encode helpers may use
the admitted backend for their unwrapped staging step when the normal
`encode_slice` admission conditions are met.
- `1.1.7` admits std `x86`/`x86_64` AVX2 encode dispatch for Standard and
URL-safe alphabet families. AVX2 is selected before SSSE3/SSE4.1 when runtime
CPU probing proves `avx2`; otherwise the existing SSSE3/SSE4.1 or scalar
fallback path is used. Final tail and padding completion use scalar code.
Custom alphabets, `no_std`, in-place encode, line-ending insertion, and every
decode path remain scalar. Wrapped encode helpers may use admitted fixed-block encode for their unwrapped staging
step.
- `1.2.0` is the release where encode acceleration became fully working for
the admitted encode scope. Public encode APIs dispatch to admitted
AVX-512 VBMI, AVX2, SSSE3/SSE4.1, or NEON encode backends when runtime policy
and CPU features allow it, and fall back to scalar for unsupported CPUs,
`no_std`, custom alphabets unless separately admitted, in-place encode,
line-ending insertion, legacy profiles, tails, and padding. Wrapped encode
helpers may use admitted SIMD for the unwrapped staging step when they route
through `encode_slice`. Backends without complete evidence remain real
non-dispatchable prototypes.
- `1.2.1` is a documentation/package patch for the released `1.2.0` encode
acceleration scope. It does not admit additional backends.
- `1.2.2` is an encode ergonomics and sanitization hardening patch that adds
explicit infallible encode convenience helpers and tightens fixed-size locked
secret decode cleanup. It does not admit additional backends.
- `1.2.3` updated the optional `base64-ng-sanitization` companion dependency
to `sanitization` `1.2.2` and synced workspace package metadata.
- After the `1.2.x` encode release, pause feature work for a short soak period
so users can report platform-specific encode regressions before decode
acceleration work starts.
- `1.2.x` is the SIMD decode foundation series. Decode prototypes remain
non-dispatchable while invalid-input handling, canonicality, padding, output
retention, error behavior, fuzz coverage, and timing-oriented evidence are
proven against scalar behavior.
- `1.3.0` is the first release that activates SIMD decode acceleration after
the decode evidence line completed and the encode acceleration line remained
stable.
- The admitted `1.3.0` decode backends are std `x86`/`x86_64` AVX-512 VBMI
first, then AVX2, then SSSE3/SSE4.1 strict decode, plus little-endian std
`aarch64` NEON strict decode for Standard and URL-safe alphabet families.
They validate the complete input with the scalar decoder first so public
error shape and indexes remain scalar-compatible, then use fixed 64-byte
AVX-512 VBMI, fixed 32-byte AVX2, fixed 16-byte SSSE3/SSE4.1, or fixed
16-byte NEON encoded blocks where possible. Tails and every unsupported
decode surface remain scalar.
The `1.3.0` decode scope is frozen to strict
Standard and URL-safe decode only, padded and unpadded, through the normal
strict decode backend boundary. Wrapped decode may use admitted strict decode
after scalar line-profile validation and line-ending compaction. Legacy
whitespace decode may use the admitted strict decode boundary after scalar
whitespace compaction. Strict in-place decode may use admitted strict decode
backends only after stack staging. Custom alphabets, bcrypt-style and
`crypt(3)` profiles, `no_std` SIMD dispatch, broader wasm/browser runtime
dispatch, and the `base64_ng::ct` constant-time-oriented secret decode path
remain scalar unless separately admitted with their own evidence package.
The detailed `1.2.3` to `1.3.0` workflow was commit-based rather than
tag-based. Each planned commit was followed by pentest and CI review before the
next implementation commit started. See
[`docs/PLAN.md`](PLAN.md#commit-based-123-to-130-completion-plan) for the
completed sequence and `1.3.0` acceptance criteria.
Patch releases in the `1.1.x` and `1.2.x` series may be small by design. Each
patch should move one evidence boundary forward without changing the active
runtime behavior for that line.
## Current Status
- Default builds compile audited unsafe cleanup, CT barrier, and comparison
helpers; scalar encode/decode remains safe Rust.
- `scripts/validate-unsafe-boundary.sh` verifies that `allow(unsafe_code)` is
confined to the reviewed cleanup, CT, and SIMD helper files.
- `docs/UNSAFE.md` inventories every current unsafe site and its invariants.
- The scalar implementation is the reference behavior.
- Encode and normal strict decode entry points pass through internal backend
boundaries. In-place encode may use admitted encode backends only after
stack staging protects unread input bytes. Strict decode may use the
admitted AVX-512 VBMI, AVX2, or SSSE3/SSE4.1 backend on std x86/x86_64
builds, or the admitted NEON backend on little-endian std AArch64 builds,
with the `simd` feature; every unsupported decode surface still falls back to
scalar.
- With the `simd` feature enabled, the private dispatch scaffold detects
AVX-512 VBMI, AVX2, SSSE3/SSE4.1, NEON, and wasm `simd128` candidates.
Only std `x86`/`x86_64` AVX-512 VBMI, AVX2, SSSE3/SSE4.1, and
little-endian std `aarch64` NEON encode can become active; normal strict
decode may also use the admitted x86/x86_64 or little-endian AArch64 backend
through its separate decode boundary. All other candidates still execute
scalar code.
- Admitted SIMD encode paths run only when the current input can fill at least
one block for the selected backend: 48 bytes for AVX-512 VBMI, 24 bytes for
AVX2, and 12 bytes for SSSE3/SSE4.1 or NEON. Shorter inputs use scalar encode
before SIMD dispatch, and non-block tails remain scalar.
- Public slice, clear-tail, alloc, and wrapped encode helpers route through the
admitted encode boundary. For wrapped encode, SIMD applies only to the
unwrapped Base64 staging step; line-ending insertion remains scalar.
- Public strict `decode_slice`, `decode_slice_clear_tail`, `decode_buffer`, and
alloc strict decode helpers route through the decode boundary. AVX-512 VBMI
decode applies only to full 64-byte encoded blocks after scalar whole-input
validation and falls back to AVX2, SSSE3/SSE4.1, or scalar for shorter
inputs; little-endian AArch64 NEON decode applies only to full 16-byte
encoded blocks after scalar whole-input validation. Public strict decode
supports every valid encoded length; short inputs and non-block tails are
decoded by scalar code. Wrapped decode may use admitted strict decode after
scalar line-profile validation and line-ending compaction. Legacy whitespace
decode may use the admitted strict decode boundary after scalar whitespace
compaction. Strict in-place decode may use admitted strict decode backends
only after stack staging. CT secret decode, custom alphabets, and big-endian
AArch64 remain scalar.
- AVX-512 VBMI encode is admitted for std `x86`/`x86_64` Standard and URL-safe
alphabet families. It uses AVX-512 lane-local byte shuffling, vector
shifts/masks, and VBMI byte permutes over the alphabet table for fixed
48-byte input blocks, then clears ZMM/YMM state before returning. Runtime
dispatch uses `std::is_x86_feature_detected!` and requires `avx512f`,
`avx512bw`, `avx512vl`, and `avx512vbmi`; unsupported CPUs fall back to
AVX2, SSSE3/SSE4.1, or scalar. Final tail and padding completion use scalar
code. Custom alphabets, `no_std`, line-ending insertion, and every decode
surface outside the separate AVX-512/AVX2/SSSE3/SSE4.1/NEON strict decode
admission stay scalar. In-place encode may enter only through stack staging.
- Runtime backend identifiers expose their required CPU feature bundles through
`runtime::Backend::required_cpu_features()`.
- Runtime backend reports include `candidate_required_cpu_features=[...]` in
their stable key/value display output for audit logs.
- Runtime backend reports include `candidate_detection_mode=...` so logs show
whether a SIMD candidate came from runtime CPU feature probing or from
compile-time target features.
- Runtime backend reports expose `snapshot()` for structured audit logging
without parsing formatted strings.
- SSSE3/SSE4.1 encode is admitted for std `x86`/`x86_64` Standard and
URL-safe alphabet families. It uses SSSE3 byte shuffling, SSE lane
shifts/masks, and SSE4.1 byte blending for fixed 12-byte input blocks, then
clears XMM registers before returning. Runtime dispatch uses
`std::is_x86_feature_detected!`; unsupported CPUs execute scalar code.
Custom alphabets, final tail/padding completion, `no_std`, line-ending
insertion, and every decode surface outside the separate
AVX-512/AVX2/SSSE3/SSE4.1/NEON strict decode admission stay scalar.
In-place encode may enter admitted encode backends only through stack
staging.
- AVX2 encode is admitted for std `x86`/`x86_64` Standard and URL-safe alphabet
families. It uses AVX2 lane-local byte shuffling, vector shifts/masks, and
byte blending for fixed 24-byte input blocks, then clears XMM/YMM state
before returning. Runtime dispatch uses `std::is_x86_feature_detected!`;
unsupported CPUs fall back to SSSE3/SSE4.1 or scalar. Final tail and padding
completion use scalar code. Custom alphabets, `no_std`, line-ending
insertion, and every decode surface outside the separate
AVX-512/AVX2/SSSE3/SSE4.1/NEON strict decode admission stay scalar.
In-place encode may enter admitted encode backends only through stack
staging.
- AArch64 NEON encode is admitted for little-endian std `aarch64` Standard and
URL-safe alphabet families. It uses NEON table lookup, vector shifts/masks,
and byte-select alphabet mapping for fixed 12-byte input blocks, then clears
used NEON registers before returning. NEON is mandatory for the admitted
little-endian AArch64 target. Final tail and padding completion use scalar
code. Custom alphabets, big-endian AArch64, 32-bit `arm+neon`, `no_std`,
line-ending insertion, and every decode surface outside the separate
AVX-512/AVX2/SSSE3/SSE4.1/NEON strict decode admission stay scalar. In-place
encode may enter only through stack staging.
- The non-standard encode surface review keeps alphabet and line-wrapping
claims narrow. In-place encode enters admitted encode backends only through
stack staging so overlapping output never overwrites unread input. Bcrypt,
`crypt(3)`, custom alphabets, and other non-Standard-family alphabets remain
scalar because accelerated alphabet mapping has not been separately proven.
Wrapped encode may still use the admitted unwrapped staging step, but
line-ending insertion is scalar. `no_std` runtime dispatch remains scalar.
- wasm `simd128` is admitted in `1.3.3` for wasm32 binaries compiled with
`target-feature=+simd128`, the `simd` feature, and
`allow-wasm32-best-effort-wipe`. The admitted scope is Standard and URL-safe
public encode plus normal strict decode. It uses wasm byte shuffling, vector
shifts/masks, and branchless Standard-family alphabet mapping for fixed
12-byte encode input blocks and fixed 16-byte decode input blocks after
whole-input scalar validation. Custom alphabets remain scalar because
portable wasm SIMD does not provide a direct 64-byte alphabet lookup.
Wasm encode stages vector output and compares it against scalar output before
copying to caller output. Node/V8, Wasmtime, Chromium-family browser,
Firefox/SpiderMonkey, and Safari/WebKit runtime smoke evidence proves active backend reporting,
deterministic length sweeps, independent scalar reference encode checks, malformed-input
rejection, and round trips for the admitted profile; this is not a
browser-wide timing, register-retention, or cleanup guarantee. The
release-facing decision is
tracked in
[WASM_SIMD128_RUNTIME_REVIEW.md](WASM_SIMD128_RUNTIME_REVIEW.md).
- Big-endian and RISC-V acceleration work is tracked as a QEMU-first evidence
path. QEMU user-mode evidence can prove functional correctness and
scalar/fallback behavior for targets such as `s390x-unknown-linux-gnu` and
`riscv64gc-unknown-linux-gnu`, but it is not hardware performance, timing,
microarchitectural, register-retention, or side-channel evidence. Until real
hardware reports are linked, any such backend must be documented as QEMU-tested and community-hardware evidence requested.
- `runtime::backend_report()` reports the active backend, detected candidate,
detection mode, SIMD feature status, security posture, and a
conservative unsafe-boundary posture flag. The flag is true only when the
reserved `simd` feature is disabled; SIMD-enabled builds include additional
private prototype boundaries and must use the release evidence scripts for
boundary validation.
- On `x86`/`x86_64` with `std`, candidate detection uses
`std::is_x86_feature_detected!` runtime CPU probing. On `no_std`, wasm, and
current ARM builds, candidate detection is compile-time target-feature
reporting. A binary compiled with `-C target-feature=+avx2` can therefore
report an AVX2 candidate even if it is deployed on a CPU that cannot execute
AVX2 instructions. Active x86/x86_64 encode dispatch is std runtime-probed
only; active AArch64 NEON encode dispatch is little-endian std-only and
relies on the mandatory AArch64 NEON target contract. Any future `no_std`
SIMD activation must require an explicit caller-side CPU contract or remain
disabled where runtime probing is unavailable.
- `runtime::require_backend_policy()` allows deployments to enforce scalar
execution, disabled SIMD features, or no detected SIMD candidate.
- `BackendPolicy::HighAssuranceScalarOnly` combines scalar execution, disabled
SIMD features, no detected SIMD candidate, unsafe-boundary enforcement, and a
CT result gate classified as an attested hardware speculation barrier. It
rejects targets that report an unattested hardware barrier, ordering fence,
or compiler fence. On AArch64, the crate emits `isb sy` plus CSDB hint code
but reports `hardware-speculation-barrier-unattested` because deployments
must attest whether that hint is effective on their specific core. Builds
using the explicit `base64_ng_aarch64_csdb_attested` cfg report
`hardware-speculation-barrier-build-asserted` so audit logs show the posture
came from deployment evidence rather than a native target guarantee. On RISC-V,
the reported CT gate is intentionally only `ordering-fence`; the base ISA
does not provide a canonical Spectre-v1 speculation barrier, so
platform-level mitigations are required for that threat model.
- Runtime backend, posture, and policy enums provide stable string identifiers
for logs and release evidence.
- Runtime backend reports and policy failures format as stable key/value
strings suitable for CI and audit logs.
- Unit tests compare dispatch behavior against the scalar reference for
canonical inputs, malformed inputs, and undersized output buffers.
- The `simd` feature enables only the admitted std x86/x86_64 AVX-512 VBMI,
AVX2, SSSE3/SSE4.1, little-endian std aarch64 NEON, and narrow wasm
`simd128` encode paths where the platform requirements are met.
- Current `1.2.x` development keeps every non-admitted backend scalar or
prototype-only unless the SIMD admission manifest, scalar differential tests,
fuzz evidence, unsafe inventory, architecture evidence, benchmark evidence,
and release wording are updated together.
- Decode acceleration is higher risk than encode acceleration because the
accelerated path must match scalar behavior for invalid bytes, padding
placement, non-canonical trailing bits, undersized outputs, partial-output
cleanup, and public error behavior. No decode backend may dispatch until
those properties are covered by tests, fuzz evidence, generated-code review,
unsafe inventory, hardware evidence where applicable, and release wording.
- CI checks the reserved `simd` feature in `no_std` mode for x86_64, aarch64,
FreeBSD, wasm32, and Cortex-M targets.
- Performance claims must be backed by local benchmark evidence, not roadmap
language.
Run the same target check locally for every installed target:
```sh
scripts/check_targets.sh
```
Run a specific target:
```sh
scripts/check_targets.sh aarch64-unknown-linux-gnu
```
Compile-check the reserved SIMD feature bundles:
```sh
scripts/check_simd_feature_bundles.sh
```
This does not execute native accelerated code. It proves the reserved AVX2,
AVX-512, SSSE3/SSE4.1, NEON, and wasm `simd128` feature-gated code still
compiles under `no_std` when the corresponding Rust targets are installed. For
wasm `simd128`, it also builds the wasm test binaries with `simd128` enabled so
the admitted fixed-block wasm code is checked; runtime execution is covered by
`scripts/check_wasm_runtime_dispatch.sh` when Node/V8 and Wasmtime are
installed, by `scripts/check_wasm_browser_dispatch.sh` when a Chromium-family
browser is installed, by `scripts/check_wasm_browser_firefox_dispatch.sh`
when Firefox plus `geckodriver` are installed, and by
`scripts/check_wasm_browser_safari_dispatch.sh` on macOS with Safari remote
automation enabled.
Capture local backend and prototype evidence:
```sh
scripts/check_backend_evidence.sh
```
This prints the runtime backend-report test and runs the gated SIMD
scalar-equivalence tests with `--nocapture`, so local CPU evidence is easy to
copy into release notes or issue discussion. On x86/x86_64 hosts with AVX-512
VBMI, AVX2, or SSSE3/SSE4.1, and on aarch64 hosts with NEON, the runtime report
may show admitted encode acceleration as active. On 32-bit ARM, NEON remains
scaffold evidence. The script also writes
`target/release-evidence/backend/MANIFEST.txt` with toolchain metadata,
commands, status values, artifact checksums, and explicit
`prototype_state=real-non-dispatchable` labels for prototype-only backends,
admitted strict decode status labels for AVX-512 VBMI, AVX2, SSSE3/SSE4.1,
NEON, and wasm `simd128`, and
`active_backend_admitted=avx512-vbmi-or-avx2-or-ssse3-sse4.1-or-neon-or-wasm-simd128-encode`
for admitted encode backends. The runtime report also exposes
`BackendReport::active_decode_backend()` so release evidence can distinguish
the narrower AVX-512/AVX2/SSSE3/SSE4.1/NEON/wasm strict decode admission from
the active encode backend.
Capture generated assembly evidence for x86 encode paths:
```sh
scripts/generate_simd_asm_evidence.sh
```
The script emits release test-harness assembly for the admitted AVX-512 VBMI,
AVX2, and SSSE3/SSE4.1 encode/decode paths, then checks for expected vector and
cleanup instructions. When the `aarch64-unknown-linux-gnu` target is installed,
it also emits AArch64 NEON assembly evidence and checks table lookup,
bit-select, decode packing, and cleanup instructions. Cross-host runs record
NEON library assembly and compile evidence; real AArch64 hosts must also run
`scripts/check_aarch64_linux.sh` or `scripts/check_macos.sh` for test-harness
execution evidence.
## Required Before SIMD Code Lands
Any broader wasm `simd128` runtime/browser profile, additional decode backend,
custom alphabet, in-place extension, or additional runtime-dispatch
implementation must include the surface ledger in
[SIMD_NON_STANDARD_SURFACE_REVIEW.md](SIMD_NON_STANDARD_SURFACE_REVIEW.md) and
must include:
- Completion of
[SIMD_ACTIVATION_CHECKLIST.md](SIMD_ACTIVATION_CHECKLIST.md) before the
backend is wired into dispatch.
- The dedicated `src/simd/` boundary for all architecture-specific code.
- Crate-level `deny(unsafe_code)` must continue to reject unsafe outside the
volatile wipe helpers and SIMD module.
- A local safety comment for every unsafe block.
- Deterministic differential tests against scalar encode/decode behavior.
- Fuzz differential coverage for strict and legacy-compatible inputs where
applicable.
- Runtime dispatch tests that prove unsupported CPUs fall back to scalar.
- Miri coverage for scalar and dispatch-level code that Miri can execute.
- Architecture-specific CI evidence or documented local evidence for each
enabled target.
- Benchmark evidence that reports hardware, OS, Rust version, command, and raw
output.
## Admission Gate
`scripts/validate-simd-admission.sh` keeps SIMD dispatch limited to admitted
backends. The gate currently requires:
- `ActiveBackend` to expose only `Scalar` plus the std x86/x86_64 AVX-512
VBMI, AVX2, SSSE3/SSE4.1, little-endian std aarch64 NEON, and narrow wasm
`simd128` encode variants.
- `active_backend()` to return AVX-512 VBMI before AVX2 before SSSE3/SSE4.1
only after std runtime CPU probing, and scalar otherwise.
- No generic SIMD dispatch variants in source.
- `docs/SIMD_ADMISSION.md` to record the admitted AVX-512 VBMI, AVX2,
SSSE3/SSE4.1, NEON, and wasm `simd128` encode scope and keep all other
backends prototype-only.
- Documentation for benchmark evidence, release-note restrictions, and
vector-register retention cleanup strategy to remain packaged.
- The encode admission draft to remain packaged and validated before any future
encode dispatch scope expands beyond the currently admitted native and narrow
wasm backends.
When an accelerated backend is ready for admission, update this gate in the
same commit as the scalar differential tests, fuzz evidence, unsafe inventory,
benchmark evidence, and release notes. For encode acceleration, start from
[SIMD_ENCODE_ADMISSION_DRAFT.md](SIMD_ENCODE_ADMISSION_DRAFT.md) and keep any
backend not fully proven in the candidate-only state.
The draft is guarded by `scripts/validate-simd-encode-admission-draft.sh` so
runtime report expectations, benchmark template fields, release-note precision,
and architecture-specific blockers do not drift while later encode backends
remain pending.
## Dispatch Rules
- Scalar remains the fallback for every build.
- Candidate detection must not imply activation; a detected candidate may still
execute scalar until the accelerated backend is admitted.
- The active non-scalar backends in the `1.2.x` encode line are std
x86/x86_64 AVX-512 VBMI encode, AVX2 encode, SSSE3/SSE4.1 encode, and std
little-endian aarch64 NEON encode for Standard and URL-safe alphabet
families. The
`1.3.0` decode admission is separate: std x86/x86_64 AVX-512 VBMI first,
then AVX2, then SSSE3/SSE4.1, plus little-endian std aarch64 NEON strict
decode only.
- Prototype functions may exercise target-feature and unsafe plumbing without
being eligible for dispatch.
- Runtime CPU detection may be used only behind `std`.
- Compile-time target-feature paths must be explicit and documented.
- Unsupported CPU features must never panic at runtime.
- SIMD paths must preserve strict error indexes, canonical padding rejection,
and output sizing behavior.
## Release Rule
Do not advertise SIMD acceleration in release notes until accelerated code is
actually enabled, tested, and measured for that release.