lencode 1.0.0

A crate for encoding and decoding variable-length data using the Lencode varint encoding scheme and deduplication
Documentation

📦 lencode

Crates.io docs.rs CI Big Endian CI License: MIT

Compact, fast binary encoding with varints, optional deduplication, and opportunistic zstd compression for bytes and strings. no_std by default, with an opt‑in std feature.

Highlights

  • Fast varints: efficient for small and large integers
  • Optional deduplication: replace repeats with compact IDs for supported types
  • Bytes/strings compression: flagged header + zstd when smaller; high‑entropy data is detected and skipped automatically
  • Bulk encoding: Vec<T> of fixed‑size types (e.g. [u8; 32]) are encoded/decoded via bulk memcpy, not per‑element
  • no_std + alloc: works without std (uses zstd-safe)
  • Derive macros: #[derive(Encode, Decode)] for your types, #[derive(Pack)] for dedupe/bulk types
  • Solana support: feature solana adds v2/v3 SDK types
  • Big-endian ready: CI runs tests on s390x

Install

[dependencies]
lencode = "0.1"

# With standard library types (e.g., Cow)
lencode = { version = "0.1", features = ["std"] }

# With Solana type support (implies std)
lencode = { version = "0.1", features = ["solana"] }

Quick start

Derive and round‑trip

use lencode::prelude::*;

#[derive(Encode, Decode, PartialEq, Debug)]
struct Point { x: u64, y: u64 }

let p = Point { x: 3, y: 5 };
let mut buf = Vec::new();
encode(&p, &mut buf)?;
let q: Point = decode(&mut Cursor::new(&buf))?;
assert_eq!(p, q);

Collections and primitives

use lencode::prelude::*;

let values: Vec<u128> = (0..10).collect();
let mut buf = Vec::new();
encode(&values, &mut buf)?;
let rt: Vec<u128> = decode(&mut Cursor::new(&buf))?;
assert_eq!(values, rt);

Deduplication (optional)

To benefit from deduplication for your own types, implement Pack and the marker traits and pass the encoder/decoder via encode_ext/decode_ext.

use lencode::prelude::*;

#[derive(Clone, Copy, PartialEq, Eq, Debug, Hash)]
struct MyId(u32);

impl Pack for MyId {
    fn pack(&self, w: &mut impl Write) -> Result<usize> { self.0.pack(w) }
    fn unpack(r: &mut impl Read) -> Result<Self> { Ok(Self(u32::unpack(r)?)) }
}
impl DedupeEncodeable for MyId {}
impl DedupeDecodeable for MyId {}

let vals = vec![MyId(42), MyId(7), MyId(42), MyId(7), MyId(42)];

// Encode with deduplication enabled
let mut enc_ctx = EncoderContext::with_dedupe();
let mut buf = Vec::new();
encode_ext(&vals, &mut buf, Some(&mut enc_ctx))?;

// Decode with deduplication enabled
let mut dec_ctx = DecoderContext::with_dedupe();
let roundtrip: Vec<MyId> = decode_ext(&mut Cursor::new(&buf), Some(&mut dec_ctx))?;
assert_eq!(roundtrip, vals);

Compact bytes and strings

&[u8], Vec<u8], VecDeque<u8], &str, and String use a compact flagged header: varint((payload_len << 1) | flag) + payload.

  • flag = 0 → raw bytes/UTF‑8
  • flag = 1 → zstd frame (original size stored inside the frame)

The encoder picks whichever is smaller per value. High‑entropy data (random bytes, encrypted content) is detected via a fast entropy check and skips compression entirely.

Bulk encoding for fixed‑size types

Vec<T> where T has a fixed‑size wire representation (e.g. [u8; 32], or #[repr(transparent)] newtypes over byte arrays) is encoded and decoded via bulk memcpy rather than per‑element iteration. This is handled automatically through Encode::encode_slice / Decode::decode_vec and their Pack counterparts Pack::pack_slice / Pack::unpack_vec.

Custom Pack types can opt in by overriding pack_slice and unpack_vec, or by using #[derive(Pack)] on a #[repr(transparent)] single‑field struct, which generates the bulk overrides automatically:

use lencode::prelude::*;

#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Hash, Pack)]
struct MyPubkey([u8; 32]);

impl DedupeEncodeable for MyPubkey {}
impl DedupeDecodeable for MyPubkey {}

Incremental diff encoding

DiffEncoder/DiffDecoder provide stateful delta encoding for keyed byte blobs. When the same key is re‑encoded, only the diff is emitted. Two strategies are tried automatically and the smaller output is picked:

  • RLE patches — run‑length‑encoded list of changed regions (fast, best for sparse changes)
  • XOR + zstd — XOR old and new blobs, then zstd‑compress the result (best for scattered changes)

Supported byte types: Vec<u8>, &[u8], [u8; N], VecDeque<u8>. Set a key via set_key() on the diff encoder/decoder before each encode_ext/decode_ext call to opt in.

use lencode::prelude::*;
use lencode::context::{EncoderContext, DecoderContext};
use lencode::diff::{DiffEncoder, DiffDecoder};

let key = 1u64;
let mut enc_ctx = EncoderContext { dedupe: None, diff: Some(DiffEncoder::new()) };
let mut dec_ctx = DecoderContext { dedupe: None, diff: Some(DiffDecoder::new()) };

// First encode (full blob)
let data1: Vec<u8> = vec![0xAA; 2048];
let mut buf = Vec::new();
enc_ctx.diff.as_mut().unwrap().set_key(key);
dec_ctx.diff.as_mut().unwrap().set_key(key);
data1.encode_ext(&mut buf, Some(&mut enc_ctx)).unwrap();

// Second encode (only the diff is written)
let mut data2 = data1.clone();
data2[100] = 0xFF;
buf.clear();
enc_ctx.diff.as_mut().unwrap().set_key(key);
dec_ctx.diff.as_mut().unwrap().set_key(key);
data2.encode_ext(&mut buf, Some(&mut enc_ctx)).unwrap();
assert!(buf.len() < data2.len() / 2); // diff is much smaller

let mut cursor = Cursor::new(&buf[..]);
let result: Vec<u8> = Vec::decode_ext(&mut cursor, Some(&mut dec_ctx)).unwrap();
assert_eq!(result, data2);

Writer pre‑allocation

The Write trait provides a reserve(additional) hint. Growable writers like VecWriter use this to pre‑allocate capacity before encoding large collections, reducing intermediate reallocations.

Supported types

  • Primitives: all ints, bool, f32, f64
  • Arrays: [T; N]
  • Option: Option<T>
  • Bytes/strings: &[u8], Vec<u8], VecDeque<u8], &str, String
  • Collections (alloc): Vec<T>, BTreeMap<K,V>, BTreeSet<V>, VecDeque<T>, LinkedList<T>, BinaryHeap<T>
  • Tuples: (T1,) … up to 11 elements
  • std feature: adds support for std::borrow::Cow<'_, T>
  • solana feature: Pubkey, Signature, Hash, messages (legacy/v0), and related v2/v3 types

Note: HashMap/HashSet are not implemented.

Cargo features

  • default: core + no_std (uses alloc)
  • std: enables std adapters and Cow
  • solana: Solana SDK v2 + Agave v3 types (implies std)

Big‑endian and portability

  • Varints are decoded efficiently on little‑endian and portably on big‑endian
  • CI runs tests on s390x-unknown-linux-gnu using cross
  • Pack always uses a stable little‑endian layout

Benchmarks

# Full suite
cargo bench --all-features

# Compare against borsh/bincode
cargo bench --bench roundup --features std

# Diff encoder (RLE vs XOR+zstd strategies)
cargo bench --bench diff_bench --features std

# Solana‑specific
cargo bench --bench solana_bench --features solana

Errors

Errors use lencode::io::Error and map to std::io::Error under std.

use lencode::prelude::*;
use lencode::io::Error;

let mut buf = Vec::new();
match encode(&123u64, &mut buf) {
    Ok(n) => eprintln!("wrote {n} bytes"),
    Err(Error::WriterOutOfSpace) => eprintln!("buffer too small"),
    Err(Error::ReaderOutOfData) => eprintln!("unexpected EOF"),
    Err(Error::InvalidData) => eprintln!("corrupted data"),
    Err(e) => eprintln!("other error: {e}"),
}

Examples

  • examples/size_comparison.rs: space savings on repeated Solana pubkeys
  • examples/versioned_tx_compression.rs: end‑to‑end on Solana versioned transactions

Run with --features solana.

License

MIT