bitcoinleveldb-coding 0.1.19

Low-level LevelDB-compatible binary coding primitives for bitcoin-rs: fixed-width little-endian, varint, and length-prefixed encoding/decoding over raw pointers, Strings, and Slices.

# bitcoinleveldb-coding

Low-level, allocation-conscious encoders and decoders for LevelDB-style binary formats used in `bitcoin-rs`. This crate exposes pointer-based primitives for:

- Fixed-width little-endian integers (`u32`, `u64`)
- Varint-encoded integers (`u32`, `u64`)
- Length-prefixed slices
- Lossy conversion from `Slice` to a UTF‑8 `String`

The implementation is intentionally close to the original LevelDB C++ code, with Rust idioms where they do not compromise layout compatibility or performance.

## Design goals

- **Bit-level compatibility with LevelDB**: Encodings are little-endian and follow LevelDB's varint and length-prefix conventions so data can be shared with existing LevelDB implementations.
- **Zero extra allocation in hot paths**: Pointer-based APIs allow writing directly into preallocated buffers and reading from raw memory without intermediate copies.
- **Predictable performance**: Varint encoders use simple branch patterns, and decoders operate in tight loops amenable to inlining and optimization.
- **Logging-friendly**: Functions are instrumented with `trace!`, `debug!`, and `warn!` calls (using the `log` facade or `tracing`-style macros, depending on the parent crate) to aid in debugging complex storage issues.

The crate is primarily intended as an internal component of the `bitcoin-rs` LevelDB port, but it can be used independently wherever LevelDB-like encodings are needed.

## Encoding primitives

### Fixed-width little-endian integers

These functions read/write 32-bit and 64-bit integers in little-endian order directly to/from raw pointers:

```rust
use bitcoinleveldb_coding::{
    encode_fixed32, encode_fixed64,
    decode_fixed32, decode_fixed64,
};

// Write a 32-bit value into an 8-byte buffer
let mut buf = [0u8; 8];
unsafe {
    encode_fixed32(buf.as_mut_ptr(), 0x11223344);
}
assert_eq!(buf[..4], [0x44, 0x33, 0x22, 0x11]);

// Read it back
let v = unsafe { decode_fixed32(buf.as_ptr()) };
assert_eq!(v, 0x11223344);
```

APIs:

- `fn encode_fixed32(dst: *mut u8, value: u32)`
- `fn encode_fixed64(dst: *mut u8, value: u64)`
- `fn decode_fixed32(ptr: *const u8) -> u32`
- `fn decode_fixed64(ptr: *const u8) -> u64`

These functions perform **no bounds checking** and are `unsafe` to call in a memory-safety sense. Callers must guarantee that `dst`/`ptr` points to at least 4 (for 32-bit) or 8 (for 64-bit) valid bytes.
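
For comparison, the same byte layout can be produced with bounds-checked slices. The sketch below is not part of this crate; the `*_sketch` names are hypothetical and only illustrate the little-endian layout using the standard library's `to_le_bytes` / `from_le_bytes`.

```rust
// Hypothetical safe counterparts for illustration; not part of this crate's API.

/// Writes `value` as 4 little-endian bytes at the start of `dst`.
/// Panics if `dst` is shorter than 4 bytes instead of writing out of bounds.
fn encode_fixed32_sketch(dst: &mut [u8], value: u32) {
    dst[..4].copy_from_slice(&value.to_le_bytes());
}

/// Reads a little-endian `u32` from the first 4 bytes of `src`.
fn decode_fixed32_sketch(src: &[u8]) -> u32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&src[..4]);
    u32::from_le_bytes(bytes)
}

/// Writes `value` as 8 little-endian bytes at the start of `dst`.
fn encode_fixed64_sketch(dst: &mut [u8], value: u64) {
    dst[..8].copy_from_slice(&value.to_le_bytes());
}

/// Reads a little-endian `u64` from the first 8 bytes of `src`.
fn decode_fixed64_sketch(src: &[u8]) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&src[..8]);
    u64::from_le_bytes(bytes)
}
```

The slice indexing trades a bounds check for safety; the pointer-based functions above skip that check and rely on the caller instead.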

### Varint encoding

Varint encoding represents an integer using a base-128 scheme:

- Each byte carries 7 bits of payload in the low bits.
- The high bit (bit 7) is a continuation flag: `1` means another byte follows, `0` terminates the varint.

This is identical to the scheme used in LevelDB and many other storage systems. Values in `[0, 2^7)` fit in 1 byte, `[2^7, 2^14)` in 2 bytes, etc.
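
As a worked illustration of the scheme, here is a minimal sketch that appends a varint to a `Vec<u8>`. The function name `put_varint64_sketch` is hypothetical and the `Vec<u8>` target is an assumption made for safety; the crate's own encoders write through raw pointers or `String`.

```rust
/// Appends the base-128 varint encoding of `v` to `out`.
/// Hypothetical helper for illustration; not this crate's API.
fn put_varint64_sketch(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        // Low 7 bits as payload, high bit set: another byte follows.
        out.push(((v & 0x7f) as u8) | 0x80);
        v >>= 7;
    }
    // Final byte: high bit clear terminates the varint.
    out.push(v as u8);
}
```

For example, `300` is `0b10_0101100` in binary: the low 7 bits (`0x2C`) are emitted first with the continuation flag set (`0xAC`), followed by the remaining bits (`0x02`).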

#### Pointer-based varint encoding

```rust
use bitcoinleveldb_coding::{encode_varint32, encode_varint64};

let mut buf = [0u8; 10];
let start = buf.as_mut_ptr();

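// 300 needs two 7-bit groups, so it encodes to 2 bytes
let end32 = unsafe { encode_varint32(start, 300) };
let len32 = unsafe { end32.offset_from(start) as usize };

// Reuses the same buffer, overwriting the previous encoding;
// 1234567890123 needs 41 bits, i.e. six 7-bit groups
let end64 = unsafe { encode_varint64(start, 1234567890123) };
let len64 = unsafe { end64.offset_from(start) as usize };

assert_eq!(len32, 2);
assert_eq!(len64, 6);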
```

APIs:

- `fn encode_varint32(dst: *mut u8, v: u32) -> *mut u8`
- `fn encode_varint64(dst: *mut u8, v: u64) -> *mut u8`

Both functions:

- Assume `dst` points to a buffer with enough capacity: an encoded `u32` occupies at most 5 bytes and an encoded `u64` at most 10 bytes.
- Return a pointer to the first byte *after* the encoded value.

The helper `fn varint_length(v: u64) -> i32` computes the length (in bytes) of the varint encoding of `v`. This is useful when pre-sizing buffers:

```rust
use bitcoinleveldb_coding::varint_length;

let v: u64 = 1_000_000;
let len = varint_length(v);
assert!(len >= 1 && len <= 10);
```

### String-backed varint and fixed-width encoding

Instead of working with raw pointers, you can append encodings directly into `String` buffers. This matches the original LevelDB design, where `std::string` served as a generic byte buffer.

```rust
use bitcoinleveldb_coding::{
    put_varint32, put_varint64,
    put_fixed32, put_fixed64,
};

let mut s = String::new();

unsafe {
    put_varint32(&mut s as *mut String, 1000);
    put_fixed64(&mut s as *mut String, 0x0102_0304_0506_0708);
}

let bytes = s.into_bytes();
// `bytes` now begins with the varint-encoded 1000, followed by 8 LE bytes
```

APIs:

- `fn put_varint32(dst: *mut String, v: u32)`
- `fn put_varint64(dst: *mut String, v: u64)`
- `fn put_fixed32(dst: *mut String, value: u32)`
- `fn put_fixed64(dst: *mut String, value: u64)`

These functions:

- Treat `String` as an opaque byte buffer via `String::as_mut_vec`.
- Append encoded bytes; they do not clear or truncate existing data.
- Expose a raw `*mut String` interface because they are designed to be called from unsafe internals where borrowing rules are already enforced at a higher level.

## Decoding primitives with `Slice`

The crate interoperates with a `Slice` abstraction that behaves like a non-owning byte span with a cursor.

### Varint decoding from pointer ranges

These functions decode varints from `[p, limit)` and either return a pointer to the first byte after the value or `null()` on failure.

```rust
use bitcoinleveldb_coding::{
    encode_varint64,
    get_varint_64ptr,
};

let mut buf = [0u8; 10];
let start = buf.as_mut_ptr();

unsafe {
    // Encode a value, then use the end of the encoding as the decode limit
    let end = encode_varint64(start, 999_999);
    let limit = end;

    let mut out: u64 = 0;
    let p = get_varint_64ptr(start as *const u8, limit as *const u8, &mut out as *mut u64);

    assert!(!p.is_null());
    assert_eq!(out, 999_999);
}
```

APIs:

- `fn get_varint_32ptr(p: *const u8, limit: *const u8, value: *mut u32) -> *const u8`
- `fn get_varint_32ptr_fallback(p: *const u8, limit: *const u8, value: *mut u32) -> *const u8`
- `fn get_varint_64ptr(p: *const u8, limit: *const u8, value: *mut u64) -> *const u8`

`get_varint_32ptr` uses a fast path for single-byte varints, then falls back to the more general `get_varint_32ptr_fallback` for multi-byte values.
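
The fast-path / fallback split can be sketched over a bounds-checked slice, where the end of the slice plays the role of `limit` and `None` plays the role of the null pointer. The `*_sketch` names and the `Option<(u32, usize)>` return shape are assumptions for illustration, not the crate's signatures.

```rust
/// Decodes a varint32 from the front of `input`.
/// Returns the value and the number of bytes consumed, or `None` on
/// truncated or over-long input. Hypothetical analogue, not this crate's API.
fn get_varint32_sketch(input: &[u8]) -> Option<(u32, usize)> {
    // Fast path: a single byte with the continuation bit clear.
    if let Some(&b) = input.first() {
        if b & 0x80 == 0 {
            return Some((b as u32, 1));
        }
    }
    get_varint32_fallback_sketch(input)
}

/// General multi-byte path.
fn get_varint32_fallback_sketch(input: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    // A u32 needs at most 5 groups of 7 bits (shifts 0, 7, 14, 21, 28).
    for (i, &byte) in input.iter().take(5).enumerate() {
        let shift = 7 * i as u32;
        result |= ((byte & 0x7f) as u32) << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    // Ran out of input, or 5 bytes passed without a terminating byte.
    None
}
```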

### Varint decoding from `Slice`

These functions parse a varint at the beginning of a `Slice` and advance the slice on success.

```rust
use bitcoinleveldb_coding::{get_varint32, get_varint64};
use bitcoinleveldb_types::Slice; // pseudoname; use the actual path in the repo

let mut storage = String::new();
unsafe { bitcoinleveldb_coding::put_varint32(&mut storage as *mut String, 12345); }

let bytes = storage.into_bytes();
let mut slice = unsafe { Slice::from_ptr_len(bytes.as_ptr(), bytes.len()) };

let mut out: u32 = 0;
let ok = unsafe { get_varint32(&mut slice as *mut Slice, &mut out as *mut u32) };

assert!(ok);
assert_eq!(out, 12345);
// `slice` has been advanced past the varint
```

APIs:

- `fn get_varint32(input: *mut Slice, value: *mut u32) -> bool`
- `fn get_varint64(input: *mut Slice, value: *mut u64) -> bool`

Semantics:

- On success, return `true`, write the decoded value to `*value`, and call `input.remove_prefix(consumed_bytes)`.
- On failure (overflow or not enough bytes), return `false` and leave `input` unchanged.
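
The "advance on success, leave unchanged on failure" contract can be sketched with a `&mut &[u8]` cursor standing in for `Slice`. This is an assumption made to keep the example self-contained; the function name is hypothetical.

```rust
/// Parses a varint64 at the front of `input`.
/// On success, advances `input` past the varint and returns the value.
/// On failure, returns `None` and leaves `input` untouched.
/// Hypothetical analogue of the `Slice`-based API, for illustration only.
fn get_varint64_cursor_sketch(input: &mut &[u8]) -> Option<u64> {
    // Copy the shared slice so the cursor is only written on success.
    let bytes: &[u8] = *input;
    let mut result: u64 = 0;
    // A u64 needs at most 10 groups of 7 bits.
    for (i, &byte) in bytes.iter().take(10).enumerate() {
        result |= ((byte & 0x7f) as u64) << (7 * i as u32);
        if byte & 0x80 == 0 {
            // Drop the consumed prefix, mirroring `remove_prefix`.
            *input = &bytes[i + 1..];
            return Some(result);
        }
    }
    None
}
```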

## Length-prefixed slices

Length-prefixed slices are encoded as:

1. A `u32` length `L` encoded as varint32.
2. Followed by `L` raw bytes.

This format appears throughout LevelDB metadata (keys, values, and other structures).
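
To make the byte layout concrete, here is a minimal sketch of the encoding into a `Vec<u8>`. The name `put_length_prefixed_sketch`, the `Vec<u8>` target, and the `bool` return are assumptions for illustration; they are not this crate's signatures.

```rust
/// Appends `varint32(data.len())` followed by the raw bytes of `data`.
/// Returns `false` without writing anything if the length does not fit in a `u32`.
/// Hypothetical helper for illustration; not this crate's API.
fn put_length_prefixed_sketch(out: &mut Vec<u8>, data: &[u8]) -> bool {
    if data.len() > u32::MAX as usize {
        return false;
    }
    let mut len = data.len() as u32;
    // Varint32 length prefix: 7 payload bits per byte, high bit = "more follows".
    while len >= 0x80 {
        out.push(((len & 0x7f) as u8) | 0x80);
        len >>= 7;
    }
    out.push(len as u8);
    // Payload bytes, copied verbatim.
    out.extend_from_slice(data);
    true
}
```

An 11-byte payload therefore costs 12 bytes in total, while a 200-byte payload needs a 2-byte prefix (`0xC8 0x01`).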

### Encoding length-prefixed slices

```rust
use bitcoinleveldb_coding::put_length_prefixed_slice;
use bitcoinleveldb_types::Slice; // adjust path to actual crate

let mut s = String::new();
let data = b"hello world";
let slice = unsafe { Slice::from_ptr_len(data.as_ptr(), data.len()) };

unsafe {
    put_length_prefixed_slice(&mut s as *mut String, &slice);
}

// s now holds: varint32(len=11) + b"hello world"
```

API:

- `fn put_length_prefixed_slice(dst: *mut String, value: &Slice)`

Behavior:

- Panics are avoided: if length exceeds `u32::MAX`, the function logs an error and returns early.
- For zero-length slices, only the length varint (0) is written.

### Decoding length-prefixed slices

From a mutable `Slice` cursor:

```rust
use bitcoinleveldb_coding::get_length_prefixed_slice;
use bitcoinleveldb_types::Slice;

// suppose `input` points at a length-prefixed slice
let mut input: Slice = /* ... */;
let mut out: Slice = Slice::default(); // or uninitialized according to actual API

let ok = unsafe { get_length_prefixed_slice(&mut input as *mut Slice, &mut out as *mut Slice) };
if ok {
    // `out` is a view into the original data; `input` is advanced past it
}
```

From raw pointers with an explicit limit:

```rust
use bitcoinleveldb_coding::get_length_prefixed_slice_with_limit;
use bitcoinleveldb_types::Slice;

let buf: &[u8] = /* ... */;
let mut out: Slice = Slice::default();

let next = unsafe {
    get_length_prefixed_slice_with_limit(
        buf.as_ptr(),
        buf.as_ptr().add(buf.len()),
        &mut out as *mut Slice,
    )
};

if !next.is_null() {
    // success; `next` points past the slice
}
```

APIs:

- `fn get_length_prefixed_slice(input: *mut Slice, result: *mut Slice) -> bool`
- `fn get_length_prefixed_slice_with_limit(p: *const u8, limit: *const u8, result: *mut Slice) -> *const u8`

Both validate that the declared length does not exceed the available bytes.
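
The length validation can be sketched over a plain byte slice as follows. The function name and the `(payload, rest)` return shape are hypothetical; the crate's functions report the result through a `Slice` out-parameter instead.

```rust
/// Splits `input` into `(payload, rest)` according to its varint32 length prefix.
/// Returns `None` if the prefix is malformed or declares more bytes than remain.
/// Hypothetical analogue for illustration; not this crate's API.
fn get_length_prefixed_sketch(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let mut len: u32 = 0;
    let mut header_len = None;
    // Decode the varint32 prefix (at most 5 bytes).
    for (i, &byte) in input.iter().take(5).enumerate() {
        len |= ((byte & 0x7f) as u32) << (7 * i as u32);
        if byte & 0x80 == 0 {
            header_len = Some(i + 1);
            break;
        }
    }
    let header_len = header_len?;
    let rest = &input[header_len..];
    let len = len as usize;
    // Reject prefixes that claim more bytes than are available.
    if len > rest.len() {
        return None;
    }
    Some(rest.split_at(len))
}
```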

## Slice to UTF‑8 conversion

For debugging or higher-level string handling, `slice_to_utf8` converts a `Slice` into an owned `String` using `from_utf8_lossy` semantics:

```rust
use bitcoinleveldb_coding::slice_to_utf8;
use bitcoinleveldb_types::Slice;

let bytes = b"example";
let slice = unsafe { Slice::from_ptr_len(bytes.as_ptr(), bytes.len()) };
let s = slice_to_utf8(&slice);
assert_eq!(s, "example");
```

API:

- `fn slice_to_utf8(slice: &Slice) -> String`

Behavior:

- If the slice is empty or has a null data pointer, returns an empty `String`.
- Invalid UTF‑8 sequences are replaced with the Unicode replacement character; this is deliberate to avoid panics in low-level diagnostics.
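
The lossy behavior matches the standard library's `String::from_utf8_lossy`. A minimal sketch over a plain byte slice (the function name is hypothetical, and a `&[u8]` stands in for `Slice`):

```rust
/// Converts raw bytes into an owned `String`, replacing invalid UTF-8
/// sequences with U+FFFD. Hypothetical stand-in for illustration only.
fn bytes_to_utf8_sketch(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

With this sketch, `b"example"` converts unchanged, while the bytes `[b'a', 0xFF, b'b']` become `"a\u{FFFD}b"` rather than causing a panic.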

## Safety and invariants

Almost all functions in this crate must be treated as `unsafe` to call, because they dereference raw pointers or manipulate `String` internals.

Callers must ensure:

- Pointers (`*const u8` / `*mut u8`) point to valid, appropriately sized memory.
- `limit` pointers in decoding functions delimit the actual readable range; `p <= limit` and the region `[p, limit)` must remain valid for the duration of the call.
- `Slice` values obey their own invariants: `data()` and `size()` reflect a valid contiguous region.
- No concurrent mutable aliasing of the same `String` or `Slice` occurs across threads without synchronization.

The crate itself does not attempt to enforce Rust's aliasing rules; it assumes that higher-level abstractions (e.g., the LevelDB table code) orchestrate these invariants.

## Relationship to mathematics and bit-level representation

Varint encoding is effectively a representation of a non-negative integer in base 128 with a self-delimiting prefix code:

- Let `v` be a non-negative integer.
- While `v >= 128`, emit `v mod 128` (7 bits) with the continuation bit set to `1`, then replace `v` with `⌊v / 128⌋`.
- For the final byte, emit the remaining `v` (now less than 128) with continuation bit `0`.

This yields a prefix-free code over `u64` with the following length function:

\[
\ell(v) = 1 + \left\lfloor \log_{128} v \right\rfloor \quad (v > 0), \quad \ell(0) = 1.
\]
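
The length function can be checked numerically. The sketch below computes it twice, once by repeated division and once in closed form from the bit width of `v`; both function names are hypothetical and neither is claimed to be how `varint_length` is implemented.

```rust
/// Length of the varint encoding of `v`, by repeated division by 128.
/// Hypothetical helper for illustration; not this crate's API.
fn varint_length_loop_sketch(mut v: u64) -> usize {
    let mut len = 1;
    while v >= 128 {
        v >>= 7;
        len += 1;
    }
    len
}

/// Same quantity in closed form: the number of 7-bit groups needed to
/// hold the significant bits of `v`, i.e. 1 + floor(log_128 v) for v > 0.
fn varint_length_closed_sketch(v: u64) -> usize {
    if v == 0 {
        return 1;
    }
    let bits = 64 - v.leading_zeros() as usize;
    (bits + 6) / 7
}
```

Both agree at the boundaries: `127` takes 1 byte, `128` takes 2, `16384` takes 3, and `u64::MAX` takes 10.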

By encoding smaller integers with fewer bytes, storage layouts benefit significantly when keys and lengths are typically small (common in LevelDB metadata and in many Bitcoin-related indices).

## Integration within `bitcoin-rs`

This crate lives in the `bitcoin-rs` monorepo and is designed to be used by the LevelDB-compatible storage layer that underpins components such as block indexes, UTXO sets, or other key-value stores.

Typical usage pattern:

1. **Serialize** structured metadata into a `String` or `Vec<u8>` using `put_*` APIs.
2. **Store** that byte sequence in LevelDB or a LevelDB-compatible backend.
3. **Deserialize** on load using `get_*` pointer or `Slice`-based APIs.

Because the encodings match the canonical C++ LevelDB representation, databases can be shared between Rust and C++ nodes without reindexing.
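
The serialize and deserialize steps of the usage pattern can be sketched end to end. The record layout here (a fixed64 sequence number followed by a length-prefixed key) is invented for illustration, as are the function names; the sketch uses `Vec<u8>` and slices rather than this crate's pointer-based API.

```rust
/// Serializes a hypothetical record: fixed64 `seq`, then a length-prefixed `key`.
/// This sketch assumes `key.len()` fits in a `u32`.
fn serialize_record_sketch(seq: u64, key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 5 + key.len());
    out.extend_from_slice(&seq.to_le_bytes());
    let mut len = key.len() as u32;
    while len >= 0x80 {
        out.push(((len & 0x7f) as u8) | 0x80);
        len >>= 7;
    }
    out.push(len as u8);
    out.extend_from_slice(key);
    out
}

/// Parses the record back; returns `None` on truncated or malformed input.
fn deserialize_record_sketch(input: &[u8]) -> Option<(u64, &[u8])> {
    if input.len() < 8 {
        return None;
    }
    let (head, mut rest) = input.split_at(8);
    let mut seq_bytes = [0u8; 8];
    seq_bytes.copy_from_slice(head);
    let seq = u64::from_le_bytes(seq_bytes);

    // Decode the varint32 length prefix one byte at a time.
    let mut len: u32 = 0;
    let mut shift: u32 = 0;
    loop {
        let (&byte, tail) = rest.split_first()?;
        rest = tail;
        len |= ((byte & 0x7f) as u32) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 28 {
            return None; // more than 5 bytes: not a valid varint32
        }
    }

    let len = len as usize;
    if len > rest.len() {
        return None;
    }
    Some((seq, &rest[..len]))
}
```

The bytes returned by `serialize_record_sketch` are what would be handed to the storage backend in step 2.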

## Crate metadata

- **Name:** `bitcoinleveldb-coding`
- **Version:** `0.1.19`
- **Edition:** `2021`
- **License:** MIT
- **Repository:** <https://github.com/klebs6/bitcoin-rs>
- **Authors:** `klebs <none>`

This crate is intended for advanced users who are comfortable reasoning about memory safety, binary layout, and cross-language interoperability.