bitcoinleveldb-versionedit 0.1.19

LevelDB-compatible VersionEdit encoding/decoding and manipulation utilities used by bitcoin-rs for manifest and version metadata management.
# bitcoinleveldb-versionedit

Low-level encoding/decoding and manipulation of LevelDB `VersionEdit` records, extracted from the `bitcoin-rs` project. This crate provides a faithful, byte-for-byte compatible Rust implementation of LevelDB's manifest version-edit logic as used by Bitcoin-like workloads.

---

## Overview

LevelDB stores the evolution of its on-disk state (files per level, sequence-number metadata, compaction pointers, etc.) in a manifest file. Each record in the manifest is a **VersionEdit**: a compact, varint-encoded description of mutations to the logical database version.

This crate implements:

- A `VersionEdit` struct mirroring LevelDB's internal representation
- Deterministic encoding of `VersionEdit` into the manifest binary format
- Robust decoding from manifest records back into a `VersionEdit`
- Convenience APIs to:
  - Track added files (per level)
  - Track deleted files (per level)
  - Maintain compaction pointers
  - Maintain log / sequence-number bookkeeping
  - Derive human-readable debug summaries

It is designed to be interoperable with existing LevelDB/Bitcoin data, focusing on correctness of serialization and deterministic ordering.

This crate is **not** a full LevelDB implementation; it is targeted infrastructure for higher-level components (like `VersionSet` and the full storage engine) in `bitcoin-rs`.

---

## Features

- **Binary compatibility with LevelDB manifest format**
  - Uses varint32/varint64 and length-prefixed slices to match LevelDB's on-disk representation
  - Tags and field semantics match LevelDB's `VersionEdit`:
    - `kComparator` (tag 1)
    - `kLogNumber` (tag 2)
    - `kNextFileNumber` (tag 3)
    - `kLastSequence` (tag 4)
    - `kCompactPointer` (tag 5)
    - `kDeletedFile` (tag 6)
    - `kNewFile` (tag 7)
    - `kPrevLogNumber` (tag 9)

- **Deterministic encoding**
  - Deleted-file entries are sorted by `(level, file_number)` prior to encoding, guaranteeing that
    `encode_to -> decode_from -> encode_to` produces **bit-identical** manifest bytes.

- **Convenient high-level mutation API**
  - `add_file(level, file, size, smallest, largest)`
  - `delete_file(level, file)`
  - `set_comparator_name`, `set_log_number`, `set_prev_log_number`, `set_next_file`, `set_last_sequence`
  - `set_compact_pointer(level, key)`

- **Introspectable**
  - `debug_string()` yields a multi-line, human-readable summary suitable for logging and debugging, including all scalar fields, compaction pointers, deletions, and new files.

- **Safe defaults & state reset**
  - `Default` constructs an empty, "no-op" `VersionEdit` with all `has_` flags cleared.
  - `clear()` and `reset_core_state()` reset the scalar fields, `has_*` flags, and file collections so a `VersionEdit` can be reused; compaction pointers are left in place.

---

## Crate Status

- **License:** MIT
- **Edition:** Rust 2021
- **Repository:** <https://github.com/klebs6/bitcoin-rs>
- **Intended users:** implementers of LevelDB-compatible storage layers, Bitcoin node developers, and systems programmers requiring exact reproduction of LevelDB manifest semantics.

---

## Core Data Structures

### `VersionEdit`

```rust
pub struct VersionEdit  {
    comparator:           String,
    log_number:           u64,
    prev_log_number:      u64,
    next_file_number:     u64,
    last_sequence:        SequenceNumber,
    has_comparator:       bool,
    has_log_number:       bool,
    has_prev_log_number:  bool,
    has_next_file_number: bool,
    has_last_sequence:    bool,
    compact_pointers:     Vec<(i32, InternalKey)>,
    deleted_files:        VersionEditDeletedFileSet,
    new_files:            Vec<(i32, FileMetaData)>,
}

pub type VersionEditDeletedFileSet = HashSet<(i32, u64)>;
```

Conceptually, a `VersionEdit` is a sparse patch to the current logical version:

- **Scalar metadata**
  - `comparator`: name of the key comparator
  - `log_number`: current log file number
  - `prev_log_number`: previous log file number
  - `next_file_number`: global file-number allocator watermark
  - `last_sequence`: maximal sequence number visible after applying this edit
  - `has_*` flags: which of the above are present in this edit

- **Collections**
  - `compact_pointers: Vec<(level, InternalKey)>`
  - `deleted_files: HashSet<(level, file_number)>`
  - `new_files: Vec<(level, FileMetaData)>`

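Together, the collections describe how the edit changes the file layout at each level: which files are removed, which are added, and where the next compaction at a level should resume.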
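
To make the "sparse patch" idea concrete, here is a minimal sketch using only the standard library. `Layout`, `ToyPatch`, and `apply` are hypothetical names invented for this illustration, file metadata is reduced to a bare file number, and the logic is a simplification rather than the crate's (or LevelDB's) actual version-building code.

```rust
use std::collections::{BTreeMap, BTreeSet, HashSet};

/// Toy stand-in for a version: the file numbers present at each level.
pub type Layout = BTreeMap<i32, BTreeSet<u64>>;

/// Toy stand-in for the file collections of a `VersionEdit` (hypothetical type).
#[derive(Default)]
pub struct ToyPatch {
    pub deleted_files: HashSet<(i32, u64)>,
    /// (level, file_number); size and key range are omitted in this sketch.
    pub new_files: Vec<(i32, u64)>,
}

/// Apply deletions first, then additions, and return the resulting layout.
/// The base layout is left unchanged.
pub fn apply(base: &Layout, patch: &ToyPatch) -> Layout {
    let mut next = base.clone();
    for (level, number) in &patch.deleted_files {
        if let Some(files) = next.get_mut(level) {
            files.remove(number);
        }
    }
    for (level, number) in &patch.new_files {
        next.entry(*level).or_default().insert(*number);
    }
    next
}
```

Only the levels and files named in the patch change; everything else in the base layout carries over untouched.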

### Helper functions

These implement the manifest's binary protocol for specific logical units:

```rust
pub fn get_level(input: &mut Slice, level: &mut i32) -> bool { ... }

pub fn get_internal_key(input: &mut Slice, key: &mut InternalKey) -> bool { ... }
```

- `get_level` reads a LevelDB level (0..N) from a varint32-encoded field.
- `get_internal_key` reads a length-prefixed slice and decodes it into an `InternalKey`.
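
The cursor-advancing, `bool`-returning style of these helpers can be sketched over plain byte slices. This is an illustration under stated assumptions, not the crate's code: `&[u8]` stands in for `Slice`, `get_length_prefixed` is a hypothetical helper covering only the first half of `get_internal_key` (the `InternalKey` decoding step is omitted), and the level bound of 7 is taken from LevelDB's C++ `config::kNumLevels` and may not match this crate.

```rust
/// Assumed number of levels; LevelDB's C++ sources use 7 (`config::kNumLevels`).
const NUM_LEVELS: u32 = 7;

/// Decode one varint32 from the front of `input`, advancing it on success.
fn get_varint32(input: &mut &[u8], value: &mut u32) -> bool {
    let bytes: &[u8] = *input;
    let mut result: u32 = 0;
    // A varint32 occupies at most 5 bytes, 7 payload bits each.
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= u32::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            *value = result;
            *input = &bytes[i + 1..];
            return true;
        }
    }
    false
}

/// Read a level and reject values outside `0..NUM_LEVELS`.
pub fn get_level(input: &mut &[u8], level: &mut i32) -> bool {
    let mut v = 0u32;
    if get_varint32(input, &mut v) && v < NUM_LEVELS {
        *level = v as i32;
        true
    } else {
        false
    }
}

/// Read a varint32 length followed by that many bytes.
/// `input` is only advanced if the whole field is present.
pub fn get_length_prefixed<'a>(input: &mut &'a [u8], out: &mut &'a [u8]) -> bool {
    let mut rest: &'a [u8] = *input;
    let mut len = 0u32;
    if !get_varint32(&mut rest, &mut len) || rest.len() < len as usize {
        return false;
    }
    let (head, tail) = rest.split_at(len as usize);
    *out = head;
    *input = tail;
    true
}
```

Each successful call consumes exactly one field from the front of the input, which is what lets a decoder chain these helpers one after another.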

---

## Encoding & Decoding Semantics

### Encoding: `VersionEdit::encode_to`

```rust
impl VersionEdit {
    pub fn encode_to(&self, dst: *mut String) { ... }
}
```

- Accepts a raw pointer to an owned `String` that serves as a byte buffer.
- Serializes the `VersionEdit` fields into the LevelDB manifest wire format:
  - Scalars are emitted only if the corresponding `has_*` flag is true.
  - `compact_pointers`, `deleted_files`, and `new_files` are written sequentially.
- `deleted_files` are pre-sorted:

```rust
let mut deleted_files_sorted: Vec<(i32, u64)> =
    self.deleted_files().iter().copied().collect();
deleted_files_sorted.sort_unstable();
```

This guarantees deterministic encoding irrespective of the internal `HashSet` iteration order.
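
The effect is easy to check in isolation. The sketch below wraps the same three steps in a hypothetical helper, `sorted_deletions`, which is not part of the crate's API. Tuples compare lexicographically, so entries are ordered by level first and by file number second.

```rust
use std::collections::HashSet;

/// Collect a deleted-file set into a vector ordered by (level, file_number).
pub fn sorted_deletions(deleted: &HashSet<(i32, u64)>) -> Vec<(i32, u64)> {
    let mut sorted: Vec<(i32, u64)> = deleted.iter().copied().collect();
    // Entries are unique, so an unstable sort gives the same result as a stable one.
    sorted.sort_unstable();
    sorted
}
```

Two sets holding the same pairs produce the same vector regardless of insertion order or hash seed.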

**Safety model:**
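- The method dereferences the raw pointer in `unsafe` code, so the caller must ensure that:
  - `dst` is non-null and points to a valid `String`
  - The `String` stays alive for the duration of the call
  - No other reference to the `String` is used while the call runs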

A higher-level wrapper can hide the raw pointer, for example by accepting `&mut String` and casting it to `*mut String` internally.
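
One possible shape for such a wrapper is sketched below. It does not call into the crate: `encode_raw` is a hypothetical stand-in that merely has the same raw-pointer shape as `encode_to`, and `encode_into` / `encode_to_owned` are illustrative names.

```rust
/// Hypothetical stand-in with the same shape as `encode_to`:
/// it appends to a `String` reached through a raw pointer.
fn encode_raw(payload: &str, dst: *mut String) {
    assert!(!dst.is_null(), "dst must be non-null");
    // SAFETY: the caller guarantees `dst` points to a valid `String`
    // that nothing else accesses during this call.
    let out: &mut String = unsafe { &mut *dst };
    out.push_str(payload);
}

/// Safe wrapper: a `&mut String` is always non-null, valid, and exclusive,
/// so the pointer requirements hold by construction.
pub fn encode_into(payload: &str, dst: &mut String) {
    encode_raw(payload, dst as *mut String);
}

/// Convenience form that allocates the buffer itself.
pub fn encode_to_owned(payload: &str) -> String {
    let mut buf = String::new();
    encode_into(payload, &mut buf);
    buf
}
```

With this layering, callers never handle a raw pointer, and the single `unsafe` block stays in one small function that can be audited on its own.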

### Decoding: `VersionEdit::decode_from`

```rust
impl VersionEdit {
    pub fn decode_from(&mut self, src: &Slice) -> Status { ... }
}
```

- Resets the core scalar state and file collections before decoding.
- Consumes a copy of the input `Slice` and incrementally parses tagged fields.
- Each tag is matched against the LevelDB tag set; unknown or malformed tags result in a `Status::corruption` with contextual diagnostics.
- Parsed values are routed through the higher-level mutation functions (`set_*`, `add_file`, `delete_file`, `set_compact_pointer`).

The loop structure is essentially:

```rust
while msg.is_none() && get_varint32(&mut input, &mut tag) {
    match tag {
        1 => { /* comparator */ }
        2 => { /* log number */ }
        3 => { /* next file number */ }
        4 => { /* last sequence */ }
        5 => { /* compact pointer */ }
        6 => { /* deleted file */ }
        7 => { /* new file */ }
        9 => { /* prev log number */ }
        _ => { msg = Some("unknown tag"); }
    }
}
```
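
For readers who want to run the dispatch pattern, here is a self-contained sketch over plain bytes. It is a simplification under stated assumptions: `ToyEdit`, `decode`, and `read_scalar` are hypothetical names, errors are plain strings rather than `Status`, and only the integer-valued tags (2, 3, 4, 6, 9) are handled. The comparator, compact-pointer, and new-file tags are omitted, so this toy would report them as unknown.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct ToyEdit {
    pub log_number: Option<u64>,
    pub next_file_number: Option<u64>,
    pub last_sequence: Option<u64>,
    pub prev_log_number: Option<u64>,
    pub deleted_files: Vec<(u32, u64)>,
}

/// Decode one varint64 from the front of `input`, advancing it on success.
fn get_varint64(input: &mut &[u8]) -> Option<u64> {
    let bytes: &[u8] = *input;
    let mut result = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        result |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            *input = &bytes[i + 1..];
            return Some(result);
        }
    }
    None
}

/// Decode a varint that must fit in 32 bits; `input` is untouched on failure.
fn get_varint32(input: &mut &[u8]) -> Option<u32> {
    let mut probe = *input;
    let v = get_varint64(&mut probe)?;
    if v > u64::from(u32::MAX) {
        return None;
    }
    *input = probe;
    Some(v as u32)
}

/// Read one varint64 field into `slot`, or return the name of the failing field.
fn read_scalar(input: &mut &[u8], slot: &mut Option<u64>, what: &'static str) -> Option<&'static str> {
    match get_varint64(input) {
        Some(v) => {
            *slot = Some(v);
            None
        }
        None => Some(what),
    }
}

pub fn decode(mut input: &[u8]) -> Result<ToyEdit, &'static str> {
    let mut edit = ToyEdit::default();
    let mut msg: Option<&'static str> = None;
    while msg.is_none() {
        let tag = match get_varint32(&mut input) {
            Some(tag) => tag,
            None => break,
        };
        match tag {
            2 => msg = read_scalar(&mut input, &mut edit.log_number, "log number"),
            3 => msg = read_scalar(&mut input, &mut edit.next_file_number, "next file number"),
            4 => msg = read_scalar(&mut input, &mut edit.last_sequence, "last sequence number"),
            9 => msg = read_scalar(&mut input, &mut edit.prev_log_number, "previous log number"),
            6 => {
                // A deleted-file entry is a level followed by a file number.
                let entry = get_varint32(&mut input)
                    .and_then(|level| get_varint64(&mut input).map(|n| (level, n)));
                match entry {
                    Some(pair) => edit.deleted_files.push(pair),
                    None => msg = Some("deleted file"),
                }
            }
            _ => msg = Some("unknown tag"),
        }
    }
    // Leftover bytes that do not form a tag are also an error.
    if msg.is_none() && !input.is_empty() {
        msg = Some("invalid tag");
    }
    match msg {
        Some(m) => Err(m),
        None => Ok(edit),
    }
}
```

The loop stops at the first error, so the message always names the first field that failed to parse.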

Post-conditions:
- On success: returns `Status::ok()` and a fully-populated `VersionEdit`.
- On failure: returns a corruption `Status` indicating the failing component. Because state is reset before parsing starts, the `VersionEdit` may hold only the fields parsed before the error, so its contents should not be relied on.

---

## Public API Usage

### Constructing a basic `VersionEdit`

```rust
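use bitcoinleveldb_versionedit::VersionEdit;
use bitcoinleveldb_types::{InternalKey, SequenceNumber};
// `Slice` is used below; this import path is an assumption, adjust it to your setup.
use bitcoinleveldb_slice::Slice;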

fn build_simple_edit() -> VersionEdit {
    let mut edit = VersionEdit::default();

    // set comparator name
    let cmp_name = Slice::from("leveldb.BytewiseComparator".as_bytes());
    edit.set_comparator_name(&cmp_name);

    // log / sequence metadata
    edit.set_log_number(42);
    edit.set_prev_log_number(41);
    edit.set_next_file(1000);
    edit.set_last_sequence(123_456 as SequenceNumber);

    edit
}
```

### Adding a new file

```rust
fn add_new_sstable(
    edit: &mut VersionEdit,
    level: i32,
    file_number: u64,
    file_size: u64,
    smallest: &InternalKey,
    largest: &InternalKey,
) {
    edit.add_file(level, file_number, file_size, smallest, largest);
}
```

**Preconditions (mirroring LevelDB's invariants):**
- `smallest` and `largest` must be the true extremal internal keys in the file.
- The `VersionEdit` must not have been saved yet (the requirement LevelDB documents with a pointer to `VersionSet::SaveTo()`).

### Deleting a file

```rust
fn mark_file_deleted(edit: &mut VersionEdit, level: i32, file_number: u64) {
    edit.delete_file(level, file_number);
}
```

Internally, this records `(level, file_number)` in `deleted_files`; each recorded pair is serialized as one `kDeletedFile` entry.

### Compaction pointers

```rust
fn update_compaction_pointer(
    edit: &mut VersionEdit,
    level: i32,
    key: &InternalKey,
) {
    edit.set_compact_pointer(level, key);
}
```

This denotes the logical *resume key* for future compactions at that level.

### Debugging

```rust
fn log_version_edit(edit: &VersionEdit) {
    println!("{}", edit.debug_string());
}
```

Example output:

```text
VersionEdit {
  Comparator: leveldb.BytewiseComparator
  LogNumber: 42
  PrevLogNumber: 41
  NextFile: 1000
  LastSeq: 123456
  CompactPointer: 1 userkey1@123
  DeleteFile: 2 57
  AddFile: 1 1001 1048576 smallest_key .. largest_key
}
```

### Clearing and reusing a `VersionEdit`

```rust
fn reuse_edit(edit: &mut VersionEdit) {
    // Reset scalar state and file collections; compact_pointers remain.
    edit.clear();

    // Now you can repopulate it with new metadata and file deltas.
}
```

`clear()` simply delegates to `reset_core_state()`, which zeroes scalars, clears `deleted_files` and `new_files`, and resets `has_*` flags.
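
The reset behaviour can be modelled with a reduced struct. The sketch below is a toy under stated assumptions: `ToyState` is a hypothetical type that keeps a single scalar and uses raw bytes in place of `InternalKey`, and it only mirrors the behaviour described above.

```rust
use std::collections::HashSet;

/// Reduced, hypothetical model of the state held by a `VersionEdit`.
#[derive(Default)]
pub struct ToyState {
    pub log_number: u64,
    pub has_log_number: bool,
    pub compact_pointers: Vec<(i32, Vec<u8>)>,
    pub deleted_files: HashSet<(i32, u64)>,
    pub new_files: Vec<(i32, u64)>,
}

impl ToyState {
    /// Zero the scalar, clear its flag, and empty the file collections.
    /// `compact_pointers` is deliberately left untouched.
    pub fn reset_core_state(&mut self) {
        self.log_number = 0;
        self.has_log_number = false;
        self.deleted_files.clear();
        self.new_files.clear();
    }

    /// Delegates to `reset_core_state`, as described above.
    pub fn clear(&mut self) {
        self.reset_core_state();
    }
}
```

If an edit should start without any compaction pointers, constructing a fresh value with `Default` is the simplest route in this model.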

---

## Binary Format Details

This crate encodes/decodes the same schema as canonical LevelDB:

- **Tags** are varint32-encoded integers.
- **Levels** are varint32-encoded unsigned integers, cast to `i32` in-memory.
- **File numbers and sizes** use varint64.
- **Internal keys** are serialized as a length-prefixed slice (`len` as varint32, followed by bytes) and then decoded via `InternalKey::decode_from`.
- **Comparator name** is also a length-prefixed slice of UTF-8 bytes.
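
The primitives in this list can be sketched in a few lines. This is an illustration, not the crate's implementation: the function names are hypothetical, a `Vec<u8>` is used as the output buffer instead of the `String` that `encode_to` takes, and only the `kDeletedFile` record (tag 6) is shown as a complete example.

```rust
/// Append `v` as a base-128 varint: 7 payload bits per byte,
/// high bit set on every byte except the last.
pub fn put_varint64(dst: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        dst.push(((v & 0x7f) as u8) | 0x80);
        v >>= 7;
    }
    dst.push(v as u8);
}

/// A varint32 uses the same byte layout as a varint64 of the same value.
pub fn put_varint32(dst: &mut Vec<u8>, v: u32) {
    put_varint64(dst, u64::from(v));
}

/// Append a varint32 length followed by the raw bytes.
pub fn put_length_prefixed(dst: &mut Vec<u8>, bytes: &[u8]) {
    put_varint32(dst, bytes.len() as u32);
    dst.extend_from_slice(bytes);
}

/// Append one deleted-file record: tag 6, level (varint32), file number (varint64).
pub fn put_deleted_file(dst: &mut Vec<u8>, level: u32, file_number: u64) {
    put_varint32(dst, 6);
    put_varint32(dst, level);
    put_varint64(dst, file_number);
}
```

For example, file number 300 encodes as the two bytes `0xac 0x02`, so deleting file 300 at level 1 yields the four bytes `06 01 ac 02`.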

Serialization order follows the field order of `VersionEdit` and the element order of the `compact_pointers` and `new_files` vectors. The one exception is `deleted_files`, which is sorted explicitly so that the output does not depend on `HashSet` iteration order.

This determinism is critical when one wants to ensure that two logically identical `VersionEdit`s result in the same manifest bytes, which facilitates:

- Reproducible tests
- Content-addressable storage and hashing
- Stable replication and snapshot mechanics across nodes

---

## Relationship to LevelDB and Bitcoin

In LevelDB (and by inheritance, Bitcoin Core's database layout), `VersionEdit` is the backbone for describing structural mutations in the set of SSTables. Bitcoin stores UTXO and block index information in LevelDB-style databases; exact adherence to manifest semantics is mandatory if you need to:

- Read or write existing Bitcoin Core databases
- Implement alternative nodes that share storage layouts
- Perform analysis or replay of historical LevelDB states from archived manifests

This crate intentionally mirrors the C++ LevelDB logic, with additional Rust idioms (e.g., `Default`, strong typing around `Status`, and improved logging).

---

## Safety & Concurrency Considerations

- The mutating APIs take `&mut self`, so each call requires exclusive access, and a `VersionEdit` provides no synchronization of its own. To share one across threads, wrap it in a synchronization primitive (`Mutex`, etc.).
- `encode_to` uses a raw pointer. Incorrect usage can lead to undefined behavior. If you design higher-level APIs on top of this crate, you are encouraged to encapsulate this unsafety in a small, well-tested layer that exposes only safe abstractions.
- `decode_from` assumes the `Slice` length is accurate and validates only the record structure, not integrity or authenticity. For untrusted input, pair it with higher-level validation or checksums.

---

## When to Use This Crate

Use this crate if you need:

- Precise, LevelDB-compatible manifest handling in Rust
- To interoperate with Bitcoin or other LevelDB-based systems at the storage format level
- Deterministic, testable `VersionEdit` encoding/decoding

This crate is probably **too low-level** if you only need a high-level key-value database abstraction; in that case, integrate through whatever higher-layer `VersionSet` or storage API `bitcoin-rs` exposes.

---

## License

This crate is distributed under the **MIT** license, consistent with the `bitcoin-rs` repository.

---

## Provenance

This crate is part of the [`bitcoin-rs`](https://github.com/klebs6/bitcoin-rs) repository and focuses exclusively on the `VersionEdit` component of a LevelDB-compatible storage engine.