base64-ng 1.2.1

# SIMD Admission Policy

`base64-ng` is scalar by default and admits conservative accelerated encode
paths in the `1.2.x` line: std `x86`/`x86_64` AVX-512 VBMI first, then
AVX2, then SSSE3/SSE4.1, plus std `aarch64` NEON, for Standard and URL-safe
alphabet families. Future SIMD dispatch remains gated unless a complete SIMD
admission evidence package lands in the same release commit as the active
backend change. The crate uses `#![deny(unsafe_code)]` and permits reviewed
`allow(unsafe_code)` exceptions only for audited cleanup in `src/cleanup.rs`;
CT comparison, byte accumulation, CT scan, and CT result-gate helpers in
`src/ct/`; and the private `src/simd/` boundary.
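
As a rough illustration of how a crate-wide `deny(unsafe_code)` can coexist with narrowly scoped `allow(unsafe_code)` exceptions, the sketch below uses a hypothetical `sketch` module as a stand-in for the crate root (the real crate uses the inner attribute `#![deny(unsafe_code)]`). The module, function names, and wipe routine are assumptions made for illustration and are not the crate's actual code.

```rust
// Hypothetical layout: `sketch` stands in for the crate root.
#[deny(unsafe_code)]
pub mod sketch {
    /// Stand-in for an audited cleanup module: the only place in this sketch
    /// where `unsafe` is permitted.
    #[allow(unsafe_code)]
    pub mod cleanup {
        use core::sync::atomic::{compiler_fence, Ordering};

        /// Overwrites `buf` with zeros using volatile writes.
        pub fn wipe(buf: &mut [u8]) {
            for byte in buf.iter_mut() {
                // SAFETY: `byte` is a valid, aligned, exclusive reference to a
                // `u8`, so a volatile write through it is sound.
                unsafe { core::ptr::write_volatile(byte, 0) };
            }
            // Keep the compiler from moving later accesses ahead of the wipe.
            compiler_fence(Ordering::SeqCst);
        }
    }

    /// Safe code outside the exception. Adding an `unsafe` block here would
    /// fail to compile because `deny(unsafe_code)` still applies.
    pub fn padded_encoded_len(input_len: usize) -> usize {
        (input_len + 2) / 3 * 4
    }
}
```

Calling `sketch::cleanup::wipe(&mut buf)` zeroes the buffer, while everything outside `cleanup` stays under the deny lint. Using `deny` rather than `forbid` is what makes the scoped `allow` possible.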

This is a security decision, not a rejection of hardware acceleration. SIMD
must be added only when it can be isolated, tested, and reviewed without
weakening the scalar trust base.

## Version Roadmap

The SIMD roadmap separates implementation evidence from active acceleration:

- `1.1.x` is the SIMD encode foundation and admission-candidate series. Early
  checkpoints contain real fixed-block encode prototypes for SSSE3/SSE4.1,
  AVX2, AVX-512 VBMI, NEON, and wasm `simd128`, plus scalar-equivalence tests,
  generated assembly evidence, register-cleanup review, fuzz expansion, and
  admission-tooling updates. Later checkpoints wire admitted encode backends
  into public encode APIs while keeping each checkpoint gated by pentest, CI,
  and release evidence. GitHub checkpoint tags in this line moved evidence
  forward without a matching crates.io publish until the `1.2.0` family sync.
- `1.1.5` adds the public encode backend boundary while still forcing scalar
  execution. This gives future accelerated encode admission one reviewed
  integration point for `encode_slice`, clear-tail helpers, alloc helpers,
  wrapped helpers, and in-place encode. The same checkpoint also adds a
  scalar-forced decode backend boundary for symmetry; decode acceleration
  remains out of scope until the later decode line.
- `1.1.6` admits std `x86`/`x86_64` SSSE3/SSE4.1 encode dispatch for Standard
  and URL-safe alphabet families. It processes fixed 12-byte blocks with vector
  code after runtime CPU probing. Scalar remains the fallback for unsupported
  CPUs, `no_std`, custom alphabets, tails, padding, in-place encode,
  line-ending insertion, and every decode path. Wrapped encode helpers may use
  the admitted backend for their unwrapped staging step when the normal
  `encode_slice` admission conditions are met.
- `1.1.7` admits std `x86`/`x86_64` AVX2 encode dispatch for Standard and
  URL-safe alphabet families. AVX2 is selected before SSSE3/SSE4.1 when runtime
  CPU probing proves `avx2`; otherwise the existing SSSE3/SSE4.1 or scalar
  fallback path is used. Tails, padding, custom alphabets, `no_std`, in-place
  encode, line-ending insertion, and every decode path remain scalar. Wrapped
  encode helpers may use admitted fixed-block encode for their unwrapped staging
  step.
- `1.2.0` is the release where encode acceleration became fully working for
  the admitted encode scope. Public encode APIs dispatch to admitted
  AVX-512 VBMI, AVX2, SSSE3/SSE4.1, or NEON encode backends when runtime policy
  and CPU features allow it, and fall back to scalar for unsupported CPUs,
  `no_std`, custom alphabets unless separately admitted, in-place encode,
  line-ending insertion, legacy profiles, tails, and padding. Wrapped encode
  helpers may use admitted SIMD for the unwrapped staging step when they route
  through `encode_slice`. Backends without complete evidence remain real
  non-dispatchable prototypes.
- `1.2.1` is a documentation/package patch for the released `1.2.0` encode
  acceleration scope. It does not admit additional backends.
- After the `1.2.x` encode release, pause feature work for a short soak period
  so users can report platform-specific encode regressions before decode
  acceleration work starts.
- `1.2.x` is the SIMD decode foundation series. Decode prototypes remain
  non-dispatchable while invalid-input handling, canonicality, padding, output
  retention, error behavior, fuzz coverage, and timing-oriented evidence are
  proven against scalar behavior.
- `1.3.0` is the first release that may activate SIMD decode acceleration if
  the `1.2.x` decode evidence line is complete and the encode acceleration line
  has remained stable.

Patch releases in the `1.1.x` and `1.2.x` series may be small by design. Each
patch should move one evidence boundary forward without changing the active
runtime behavior for that line.

## Current Status

- Default builds compile audited unsafe cleanup, CT barrier, and comparison
  helpers; scalar encode/decode remains safe Rust.
- `scripts/validate-unsafe-boundary.sh` verifies that `allow(unsafe_code)` is
  confined to the reviewed cleanup, CT, and SIMD helper files.
- `docs/UNSAFE.md` inventories every current unsafe site and its invariants.
- The scalar implementation is the reference behavior.
- Encode and decode entry points pass through internal backend boundaries.
  Decode and in-place encode are still backed only by the scalar implementation.
- With the `simd` feature enabled, the private dispatch scaffold detects
  AVX-512 VBMI, AVX2, SSSE3/SSE4.1, NEON, and wasm `simd128` candidates.
  Only std `x86`/`x86_64` AVX-512 VBMI, AVX2, SSSE3/SSE4.1, and std
  `aarch64` NEON encode can become active; all other candidates still execute
  scalar code.
- Admitted SIMD encode paths run only when the current input can fill at least
  one block for the selected backend: 48 bytes for AVX-512 VBMI, 24 bytes for
  AVX2, and 12 bytes for SSSE3/SSE4.1 or NEON. Shorter inputs are routed to
  scalar encode before SIMD dispatch is consulted, and the tail left over after
  the last full block is encoded by scalar code.
- Public slice, clear-tail, alloc, and wrapped encode helpers route through the
  admitted encode boundary. For wrapped encode, SIMD applies only to the
  unwrapped Base64 staging step; line-ending insertion remains scalar.
- AVX-512 VBMI encode is admitted for std `x86`/`x86_64` Standard and URL-safe
  alphabet families. It uses AVX-512 lane-local byte shuffling, vector
  shifts/masks, and VBMI byte permutes over the alphabet table for fixed
  48-byte input blocks, then clears ZMM/YMM state before returning. Runtime
  dispatch uses `std::is_x86_feature_detected!` and requires `avx512f`,
  `avx512bw`, `avx512vl`, and `avx512vbmi`; unsupported CPUs fall back to
  AVX2, SSSE3/SSE4.1, or scalar. Custom alphabets, tails, padding, `no_std`,
  in-place encode, line-ending insertion, and decode stay scalar.
- Runtime backend identifiers expose their required CPU feature bundles through
  `runtime::Backend::required_cpu_features()`.
- Runtime backend reports include `candidate_required_cpu_features=[...]` in
  their stable key/value display output for audit logs.
- Runtime backend reports include `candidate_detection_mode=...` so logs show
  whether a SIMD candidate came from runtime CPU feature probing or from
  compile-time target features.
- Runtime backend reports expose `snapshot()` for structured audit logging
  without parsing formatted strings.
- SSSE3/SSE4.1 encode is admitted for std `x86`/`x86_64` Standard and
  URL-safe alphabet families. It uses SSSE3 byte shuffling, SSE lane
  shifts/masks, and SSE4.1 byte blending for fixed 12-byte input blocks, then
  clears XMM registers before returning. Runtime dispatch uses
  `std::is_x86_feature_detected!`; unsupported CPUs execute scalar code.
  Custom alphabets, tails, padding, `no_std`, in-place encode, line-ending
  insertion, and decode stay scalar.
- AVX2 encode is admitted for std `x86`/`x86_64` Standard and URL-safe alphabet
  families. It uses AVX2 lane-local byte shuffling, vector shifts/masks, and
  byte blending for fixed 24-byte input blocks, then clears XMM/YMM state
  before returning. Runtime dispatch uses `std::is_x86_feature_detected!`;
  unsupported CPUs fall back to SSSE3/SSE4.1 or scalar. Custom alphabets, tails,
  padding, `no_std`, in-place encode, line-ending insertion, and decode stay
  scalar.
- AArch64 NEON encode is admitted for std `aarch64` Standard and URL-safe
  alphabet families. It uses NEON table lookup, vector shifts/masks, and
  byte-select alphabet mapping for fixed 12-byte input blocks, then clears used
  NEON registers before returning. NEON is mandatory for the admitted AArch64
  target. Custom alphabets, tails, padding, 32-bit `arm+neon`, `no_std`,
  in-place encode, line-ending insertion, and decode stay scalar.
- An inactive wasm `simd128` fixed-block encode prototype exists behind the
  same boundary as real non-dispatchable vector encode evidence for Standard
  and URL-safe alphabets. It uses wasm byte shuffling, vector shifts/masks, and
  branchless Standard-family alphabet mapping for fixed 12-byte input blocks.
  Custom alphabets remain scalar scaffold paths because portable wasm SIMD does
  not provide a direct 64-byte alphabet lookup. The wasm feature-bundle check
  builds wasm test binaries with `target-feature=+simd128`; this is compile and
  codegen evidence only, not a runtime/JIT timing or register-retention claim.
  Runtime backend selection remains scalar for wasm.
- `runtime::backend_report()` reports the active backend, detected candidate,
  detection mode, SIMD feature status, security posture, and a
  conservative unsafe-boundary posture flag. The flag is true only when the
  reserved `simd` feature is disabled; SIMD-enabled builds include additional
  private prototype boundaries and must use the release evidence scripts for
  boundary validation.
- On `x86`/`x86_64` with `std`, candidate detection uses
  `std::is_x86_feature_detected!` runtime CPU probing. On `no_std`, wasm, and
  current ARM builds, candidate detection is compile-time target-feature
  reporting. A binary compiled with `-C target-feature=+avx2` can therefore
  report an AVX2 candidate even if it is deployed on a CPU that cannot execute
  AVX2 instructions. Active x86/x86_64 encode dispatch is std runtime-probed
  only; active AArch64 NEON encode dispatch is std-only and relies on the
  mandatory AArch64 NEON target contract. Any future `no_std` SIMD activation
  must require an explicit caller-side CPU contract or remain disabled where
  runtime probing is unavailable.
- `runtime::require_backend_policy()` allows deployments to enforce scalar
  execution, disabled SIMD features, or no detected SIMD candidate.
- `BackendPolicy::HighAssuranceScalarOnly` combines scalar execution, disabled
  SIMD features, no detected SIMD candidate, unsafe-boundary enforcement, and a
  CT result gate classified as an attested hardware speculation barrier. It
  rejects targets that report an unattested hardware barrier, ordering fence,
  or compiler fence. On AArch64, the crate emits `isb sy` plus CSDB hint code
  but reports `hardware-speculation-barrier-unattested` because deployments
  must attest whether that hint is effective on their specific core. Builds
  using the explicit `base64_ng_aarch64_csdb_attested` cfg report
  `hardware-speculation-barrier-build-asserted` so audit logs show the posture
  came from deployment evidence rather than a native target guarantee. On RISC-V,
  the reported CT gate is intentionally only `ordering-fence`; the base ISA
  does not provide a canonical Spectre-v1 speculation barrier, so
  platform-level mitigations are required for that threat model.
- Runtime backend, posture, and policy enums provide stable string identifiers
  for logs and release evidence.
- Runtime backend reports and policy failures format as stable key/value
  strings suitable for CI and audit logs.
- Unit tests compare dispatch behavior against the scalar reference for
  canonical inputs, malformed inputs, and undersized output buffers.
- The `simd` feature enables only the admitted std x86/x86_64 AVX-512 VBMI,
  AVX2, SSSE3/SSE4.1, and std aarch64 NEON encode paths where the platform
  requirements are met.
- Current `1.2.x` development keeps every non-admitted backend scalar or
  prototype-only unless the SIMD admission manifest, scalar differential tests,
  fuzz evidence, unsafe inventory, architecture evidence, benchmark evidence,
  and release wording are updated together.
- CI checks the reserved `simd` feature in `no_std` mode for x86_64, aarch64,
  FreeBSD, wasm32, and Cortex-M targets.
- Performance claims must be backed by local benchmark evidence, not roadmap
  language.
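
The block-size rule in the list above (48 bytes for AVX-512 VBMI, 24 for AVX2, 12 for SSSE3/SSE4.1 or NEON, scalar for shorter inputs and tails) can be sketched as a small length calculation. The `Backend` enum and function names here are hypothetical and only illustrate the arithmetic; they are not the crate's internal types.

```rust
/// Hypothetical backend identifiers mirroring the admitted encode backends.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Ssse3Sse41,
    Avx2,
    Avx512Vbmi,
    Neon,
}

/// Input bytes consumed per vector block, using the thresholds listed above.
pub fn block_len(backend: Backend) -> Option<usize> {
    match backend {
        Backend::Avx512Vbmi => Some(48),
        Backend::Avx2 => Some(24),
        Backend::Ssse3Sse41 | Backend::Neon => Some(12),
        Backend::Scalar => None,
    }
}

/// Splits an input length into (bytes handled as full vector blocks,
/// bytes left for scalar encode).
pub fn split_for_backend(backend: Backend, input_len: usize) -> (usize, usize) {
    match block_len(backend) {
        // At least one full block fits: vectorize the block-aligned prefix.
        Some(block) if input_len >= block => {
            let vector_len = input_len - input_len % block;
            (vector_len, input_len - vector_len)
        }
        // Scalar backend, or input shorter than one block: all scalar.
        _ => (0, input_len),
    }
}
```

For example, `split_for_backend(Backend::Avx2, 100)` yields `(96, 4)`: four 24-byte blocks for the vector path and a 4-byte scalar tail. Every block length is a multiple of 3, so the vector prefix never produces padding; padding can only arise in the scalar tail.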

Run the CI target check locally for every installed target:

```sh
scripts/check_targets.sh
```

Run a specific target:

```sh
scripts/check_targets.sh aarch64-unknown-linux-gnu
```

Compile-check the reserved SIMD feature bundles:

```sh
scripts/check_simd_feature_bundles.sh
```

This does not execute accelerated code. It proves the reserved AVX2,
AVX-512, SSSE3/SSE4.1, NEON, and wasm `simd128` feature-gated code still
compiles under `no_std` when the corresponding Rust targets are installed. For
wasm `simd128`, it also builds the wasm test binaries with `simd128` enabled so
the inactive prototype body is checked without requiring a wasm runtime.

Capture local backend and prototype evidence:

```sh
scripts/check_backend_evidence.sh
```

This runs the runtime backend-report test and the gated SIMD
scalar-equivalence tests with `--nocapture`, printing their output so local CPU
evidence is easy to copy into release notes or issue discussion. On x86/x86_64
hosts with AVX-512 VBMI, AVX2, or SSSE3/SSE4.1, and on aarch64 hosts with NEON,
the runtime report may show admitted encode acceleration as active. On 32-bit
ARM, NEON remains scaffold evidence. The script also writes
`target/release-evidence/backend/MANIFEST.txt` with toolchain metadata,
commands, status values, artifact checksums, explicit
`prototype_state=real-non-dispatchable` labels for prototype-only backends, and
`active_backend_admitted=avx512-vbmi-or-avx2-or-ssse3-sse4.1-or-neon-encode`
labels for admitted encode backends.

Capture generated assembly evidence for the admitted encode paths:

```sh
scripts/generate_simd_asm_evidence.sh
```

The script emits release test-harness assembly for the admitted AVX-512 VBMI,
AVX2, and SSSE3/SSE4.1 encode paths, then checks for expected vector and
cleanup instructions. When the `aarch64-unknown-linux-gnu` target is installed,
it also emits AArch64 NEON assembly evidence and checks table lookup,
bit-select, and cleanup instructions.
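
The kind of check described above, scanning emitted assembly for expected instructions, might look roughly like the following. This is a standalone sketch, not the script's logic: the function name is hypothetical, and any mnemonic list passed to it (for example `vpshufb` and `vzeroupper` for an AVX2 path) is an assumption rather than the list the script actually enforces.

```rust
/// Returns the expected mnemonics that never appear as the first token of
/// any line in `asm`. Comment lines and operands are not matched.
pub fn missing_mnemonics<'a>(asm: &str, expected: &[&'a str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|mnemonic| {
            // An instruction line starts with its mnemonic after indentation.
            !asm.lines()
                .any(|line| line.split_whitespace().next() == Some(*mnemonic))
        })
        .collect()
}
```

An empty result means every expected instruction was found at least once. A non-empty result names the instructions whose absence should fail the evidence check.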

## Required Before SIMD Code Lands

Any wasm `simd128`, decode, custom alphabet, in-place, or additional
runtime-dispatch implementation must include:

- Completion of
  [SIMD_ACTIVATION_CHECKLIST.md](SIMD_ACTIVATION_CHECKLIST.md) before the
  backend is wired into dispatch.
- The dedicated `src/simd/` boundary for all architecture-specific code.
- Crate-level `deny(unsafe_code)` that continues to reject unsafe code outside
  the reviewed cleanup, CT, and SIMD helper files.
- A local safety comment for every unsafe block.
- Deterministic differential tests against scalar encode/decode behavior.
- Fuzz differential coverage for strict and legacy-compatible inputs where
  applicable.
- Runtime dispatch tests that prove unsupported CPUs fall back to scalar.
- Miri coverage for scalar and dispatch-level code that Miri can execute.
- Architecture-specific CI evidence or documented local evidence for each
  enabled target.
- Benchmark evidence that reports hardware, OS, Rust version, command, and raw
  output.

## Admission Gate

`scripts/validate-simd-admission.sh` keeps SIMD dispatch limited to admitted
backends. The gate currently requires:

- `ActiveBackend` to expose only `Scalar` plus the std x86/x86_64 AVX-512
  VBMI, AVX2, SSSE3/SSE4.1, and std aarch64 NEON encode variants.
- `active_backend()` to return AVX-512 VBMI before AVX2 before SSSE3/SSE4.1
  only after std runtime CPU probing, and scalar otherwise.
- No accelerated `ActiveBackend::Wasm*` or generic SIMD dispatch variants in
  source.
- `docs/SIMD_ADMISSION.md` to record the admitted AVX-512 VBMI, AVX2,
  SSSE3/SSE4.1, and NEON encode scope and keep all other backends
  prototype-only.
- Documentation for benchmark evidence, release-note restrictions, and
  vector-register retention cleanup strategy to remain packaged.
- The encode admission draft to remain packaged and validated before any future
  encode dispatch scope expands beyond the currently admitted `1.2.x` backends.

When an accelerated backend is ready for admission, update this gate in the
same commit as the scalar differential tests, fuzz evidence, unsafe inventory,
benchmark evidence, and release notes. For encode acceleration, start from
[SIMD_ENCODE_ADMISSION_DRAFT.md](SIMD_ENCODE_ADMISSION_DRAFT.md) and keep any
backend not fully proven in the candidate-only state.
The draft is guarded by `scripts/validate-simd-encode-admission-draft.sh` so
runtime report expectations, benchmark template fields, release-note precision,
and architecture-specific blockers do not drift while later encode backends
remain pending.
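
The ordering the gate expects from `active_backend()` (AVX-512 VBMI before AVX2 before SSSE3/SSE4.1, scalar otherwise, and only after std runtime CPU probing) can be sketched as below. The `CpuFeatures` struct, the `probe` function, and the variant names other than `Scalar` are hypothetical; the local `ActiveBackend` enum is a stand-in, not the crate's type. Only `std::is_x86_feature_detected!` is a real standard-library macro.

```rust
/// Hypothetical snapshot of probed CPU features.
#[derive(Clone, Copy, Debug, Default)]
pub struct CpuFeatures {
    /// True only if avx512f, avx512bw, avx512vl, and avx512vbmi are all present.
    pub avx512_vbmi_bundle: bool,
    pub avx2: bool,
    /// True only if both ssse3 and sse4.1 are present.
    pub ssse3_sse41: bool,
}

/// Local stand-in for the set of x86 encode backends.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ActiveBackend {
    Scalar,
    Ssse3Sse41,
    Avx2,
    Avx512Vbmi,
}

/// Picks the widest backend whose full feature bundle was detected.
pub fn select_backend(features: CpuFeatures) -> ActiveBackend {
    if features.avx512_vbmi_bundle {
        ActiveBackend::Avx512Vbmi
    } else if features.avx2 {
        ActiveBackend::Avx2
    } else if features.ssse3_sse41 {
        ActiveBackend::Ssse3Sse41
    } else {
        ActiveBackend::Scalar
    }
}

/// Runtime probing on x86/x86_64 with `std`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn probe() -> CpuFeatures {
    CpuFeatures {
        avx512_vbmi_bundle: std::is_x86_feature_detected!("avx512f")
            && std::is_x86_feature_detected!("avx512bw")
            && std::is_x86_feature_detected!("avx512vl")
            && std::is_x86_feature_detected!("avx512vbmi"),
        avx2: std::is_x86_feature_detected!("avx2"),
        ssse3_sse41: std::is_x86_feature_detected!("ssse3")
            && std::is_x86_feature_detected!("sse4.1"),
    }
}

/// On other architectures this sketch reports no x86 features at all.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn probe() -> CpuFeatures {
    CpuFeatures::default()
}
```

Keeping `select_backend` separate from `probe` lets tests exercise every priority branch with synthetic feature sets, including the all-false case that must fall back to scalar, without depending on the host CPU.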

## Dispatch Rules

- Scalar remains the fallback for every build.
- Candidate detection must not imply activation; a detected candidate may still
  execute scalar until the accelerated backend is admitted.
- The active non-scalar backends in the `1.2.x` line are std x86/x86_64
  AVX-512 VBMI encode, AVX2 encode, SSSE3/SSE4.1 encode, and std aarch64 NEON
  encode for Standard and URL-safe alphabet families.
- Prototype functions may exercise target-feature and unsafe plumbing without
  being eligible for dispatch.
- Runtime CPU detection may be used only behind `std`.
- Compile-time target-feature paths must be explicit and documented.
- Unsupported CPU features must never panic at runtime.
- SIMD paths must preserve strict error indexes, canonical padding rejection,
  and output sizing behavior.
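
The rule that candidate detection must not imply activation can be modeled as a total function from a detection result to an active backend. The sketch below encodes the conditions stated in this document: x86 backends activate only after std runtime probing, AArch64 NEON activates only with `std`, and everything else executes scalar. All type and function names are hypothetical and do not reflect the crate's internal API.

```rust
/// Hypothetical detected candidates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Candidate {
    None,
    Ssse3Sse41,
    Avx2,
    Avx512Vbmi,
    NeonAarch64,
    NeonArm32,
    WasmSimd128,
}

/// How the candidate was detected.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DetectionMode {
    RuntimeProbe,
    CompileTimeTargetFeature,
}

/// Hypothetical set of backends that may actually execute.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Active {
    Scalar,
    Ssse3Sse41,
    Avx2,
    Avx512Vbmi,
    Neon,
}

/// Maps a detection result to the backend that runs. Every input has an
/// answer, so an unsupported or unadmitted candidate can never panic.
pub fn activate(candidate: Candidate, mode: DetectionMode, has_std: bool) -> Active {
    if !has_std {
        // No runtime probing and no admitted `no_std` SIMD path.
        return Active::Scalar;
    }
    match (candidate, mode) {
        // x86 backends require runtime-probed evidence, not compile-time flags.
        (Candidate::Avx512Vbmi, DetectionMode::RuntimeProbe) => Active::Avx512Vbmi,
        (Candidate::Avx2, DetectionMode::RuntimeProbe) => Active::Avx2,
        (Candidate::Ssse3Sse41, DetectionMode::RuntimeProbe) => Active::Ssse3Sse41,
        // NEON is mandatory on the admitted AArch64 target.
        (Candidate::NeonAarch64, _) => Active::Neon,
        // Compile-time x86 candidates, 32-bit ARM NEON, wasm, and no candidate.
        _ => Active::Scalar,
    }
}
```

Under this model, a binary built with `-C target-feature=+avx2` reports an AVX2 candidate in compile-time mode, and `activate(Candidate::Avx2, DetectionMode::CompileTimeTargetFeature, true)` still returns `Active::Scalar`.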

## Release Rule

Do not advertise SIMD acceleration in release notes until accelerated code is
actually enabled, tested, and measured for that release.