basexx 0.2.0

basexx utilities. base32, base64, base58, base56...
Documentation
# Source Code Review Report for `basexx`

This document provides a comprehensive source code review of the `basexx` Rust library. The project has a high standard of implementation, incorporating SIMD acceleration (SSSE3/AVX2) and arbitrary-precision encoding. However, several critical correctness bugs, soundness violations, and robust/security flaws were uncovered during this review.

---

## 1. Critical & Security Issues

### 1.1. Broken `aligned_data` Feature
* **Location:** [src/lib.rs]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/lib.rs
* **Severity:** **Critical** (Compile Failure)
* **Description:** 
  The library provides an `aligned_data` feature in `Cargo.toml`. However, the module declaration `mod aligned_data;` is commented out in `src/lib.rs`. Building the crate with the `aligned_data` feature enabled (`cargo build --features aligned_data`) fails to compile immediately:
  ```
  error[E0433]: cannot find type `AlignedData64` in this scope
    --> src/ags.rs:97:26
  ```
* **Remediation:** Uncomment `mod aligned_data;` and condition it with `#[cfg(feature = "aligned_data")]`.

---

### 1.2. Unsound Memory Transmutation in `AlignedData` Allocation
* **Location:** [src/aligned_data.rs:25]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/aligned_data.rs#L25 & [src/aligned_data.rs:57]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/aligned_data.rs#L57
* **Severity:** **High / Soundness Violation**
* **Description:** 
  To allocate aligned memory for SIMD operations, the code relies on:
  ```rust
  let mut data = std::mem::transmute::<Vec<AlignedData64>, Vec<u8>>(vec);
  ```
  This is highly **unsound** under the Rust memory model:
  1. `Vec` has a `repr(Rust)` representation, which means layout of `Vec<T>` and `Vec<U>` is not guaranteed to be identical or compatible. The compiler is free to reorder fields.
  2. The capacity of a transmuted vector is not scaled automatically, which can result in incorrect deallocations or assertions. 
* **Remediation:** Avoid transmuting collection types. Use standard layout-based allocations (`std::alloc::alloc`) or libraries like `bytemuck` and `aligned-vec`.

---

### 1.3. Weak/Incorrect Assertion in `AlignedData128::alloc`
* **Location:** [src/aligned_data.rs:59]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/aligned_data.rs#L59
* **Severity:** **High / Potential Buffer Overflow**
* **Description:** 
  In `AlignedData128::alloc`, the assertion checks:
  ```rust
  assert!(size * AD64SZ <= data.capacity());
  ```
  Since `AD64SZ` is `64` and this is `AlignedData128` (which has size `AD128SZ` = `128`), this check is mathematically weaker than the actual length being set:
  ```rust
  data.set_len(size * AD128SZ);
  ```
  If the capacity of `data` is less than `size * 128` but greater than `size * 64`, the assertion will pass, but `set_len` will set the length beyond the allocated capacity, resulting in undefined behavior or a buffer overflow upon subsequent writes.
* **Remediation:** Update the assertion to use the correct constant `AD128SZ`:
  ```rust
  assert!(size * AD128SZ <= data.capacity());
  ```

---

### 1.4. Protocol Deviation / Correctness Bugs in Base58/Base56 Zero-Handling
* **Locations:**
  * [src/base58.rs]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/base58.rs
  * [src/base56.rs]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/base56.rs
  * [src/base32i.rs]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/base32i.rs
* **Severity:** **High / Correctness Bug**
* **Description:** 
  The implementations of `Base58`, `Base56`, and `Base32I` utilize `num-bigint` under the hood. The `to_radix_le` and `to_bytes_le` conversions of `BigUint` treat `0` specially by returning `vec![0]` instead of an empty vector `vec![]`. This leads to two severe bugs:
  1. **Encoding Empty Slice:** `encode(&[])` returns `"1"` (in Base58) or `"2"` (in Base56) instead of the standard empty string `""`.
  2. **Double-counting Leading Zeros in Decode:** Decoding `"1"` (in Base58) or `"2"` (in Base56) returns `vec![0, 0]` instead of `vec![0]`.
  
  The authors added hacky workarounds directly to the test suites (e.g. `tests/more_test1.rs` line 102) to mask this bug:
  ```rust
  let oup = if input == "1" {
      vec![0u8, 0u8]
  } else { ... }
  ```
  In contrast, `Base58B` (Bitcoin-style manual arithmetic) and `Base58R` (Rug-based) correctly encode/decode these zero and empty cases.
* **Remediation:** Adjust the `zero_count` logic and array reconstruction to properly handle cases when the arbitrary precision integer equals `0`, preventing the double-inclusion of the zero digit.

---

### 1.5. Panic on Malformed/Unpadded Inputs in `Base64G::decode`
* **Location:** [src/base64g.rs:185]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/base64g.rs#L185
* **Severity:** **Medium / Robustness & Stability**
* **Description:** 
  When decoding an unpadded Base64 input string of length `3` (or length `4k + 3`), the loop accesses `ina[i_idx + 3]`. Because no bounds check is performed, this leads to an out-of-bounds indexing panic:
  ```rust
  // If in_sz is 3, i_idx + 3 equals ina.len(), which panics.
  if ina[i_idx + 3] == b'=' {
  ```
  Decoders must gracefully return an error (e.g., `DecodeError::InvalidLength`) rather than panicking on invalid inputs.
* **Remediation:** Add proper length validation at the start of `_decode_base64g` and ensure indexing operations are guarded.

---

## 2. Code Quality & SIMD Safety

### 2.1. Undefined Behavior Risks in SSE/AVX2 Helpers
* **Locations:**
  * `_encode_base64_avx2_chunks24` in [src/base64/base64_avx2.rs]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/base64/base64_avx2.rs
  * `_decode_base64_avx2_chunks32` in [src/base64/base64_avx2.rs]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/base64/base64_avx2.rs
  * `_ascii_to_binary_128_avx2_c32_chunks32` in [src/ags/ags_128_avx2.rs]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/src/ags/ags_128_avx2.rs
* **Severity:** **Medium / SIMD Safety**
* **Description:** 
  These helper functions execute AVX2 intrinsics (such as `_mm256_loadu_si256`) but are **not** marked with the `#[target_feature(enable = "avx2")]` attribute. If the compiler decides not to inline these helper functions, it compiles them using the default CPU target feature set (usually SSE2). Executing AVX2 instructions inside non-AVX2 functions is undefined behavior in Rust and can cause compile errors or runtime crashes.
* **Remediation:** Annotate all functions using target-specific intrinsics with the matching `#[target_feature(enable = "avx2")]` attribute.

---

### 2.2. Subtraction Underflow in `tests/utils.rs`
* **Location:** [tests/utils.rs:23]file:///home/hcc/src/rust/MyJam/rel-github/lib-nodep/basexx/tests/utils.rs#L23
* **Severity:** **Low / Test Robustness**
* **Description:** 
  Inside `read_file_ascii`, the code evaluates `v[v.len() - 1]`. If the file is empty (`v.len() == 0`), this evaluates `v[usize::MAX]`, causing an integer subtraction underflow panic in debug mode.
* **Remediation:** Add an empty check before accessing the last element:
  ```rust
  let vv = if !v.is_empty() && v[v.len() - 1] == b'\n' {
  ```

---

## 3. Summary of Recommendations
1. **Unify zero handling** across `BigUint`-based codecs to match the correct behavior showcased by the manual implementation in `Base58B`.
2. **Resolve transmute soundness violations** in `aligned_data` by replacing transmutes with safe allocation primitives.
3. **Annotate SIMD code correctly** to prevent UB and illegal instruction traps on non-inlined function boundaries.

---
Review Date: 2026-05-26
Reviewer: Gemini CLI Agent