vb 0.2.7

The fastest VByte/Varint encoding library in Rust

vb

The fastest VByte encoding library in the Rust ecosystem.

Encodes 430M integers/sec and decodes 415M integers/sec in the benchmark below: 1.5x to 2.4x faster encoding and 1.2x to 1.9x faster decoding than leb128 and integer-encoding.

Features

  • Blazing Fast: Hand-optimized with loop unrolling, bounds check elimination, and CLZ instructions
  • Variable Byte Encoding: Compresses u64 integers using 1-10 bytes based on magnitude
  • Differential Encoding: Optimizes strictly increasing sequences (requires diff feature)
  • Zero-Copy Decoding: Decode directly from byte slices with offset tracking
  • Minimal Dependencies: Only thiserror for error handling

Installation

[dependencies]
vb = "0.2"

# Or, with differential encoding support (use this line instead of the one above)
vb = { version = "0.2", features = ["diff"] }

Usage

Basic Encoding

use vb::{e_li, d_li};

let numbers: Vec<u64> = vec![0, 127, 128, 16383, 16384, 2097151];

// Encode the whole sequence into one byte buffer
let encoded = e_li(numbers.iter().cloned());
println!("Compressed to {} bytes", encoded.len());

// Decode; unwrap() panics if decoding fails
let decoded = d_li(&encoded).unwrap();
assert_eq!(numbers, decoded);

Differential Encoding

Ideal for strictly increasing sequences such as timestamps, IDs, or offsets. Requires the diff feature.

use vb::{e_diff, d_diff};

let timestamps = vec![1000000, 1000005, 1000010, 1000042];

// Stores only deltas: [1000000, 5, 5, 32]
let encoded = e_diff(&timestamps);

let decoded = d_diff(&encoded).unwrap();
assert_eq!(timestamps, decoded);
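
To make the delta step concrete, here is a minimal, self-contained sketch of the transform described in the comment above. The names to_deltas and from_deltas are hypothetical and are not part of vb's API; whether vb stores the raw gaps exactly this way internally is an assumption based on the [1000000, 5, 5, 32] example.

```rust
/// Replace each value with its gap from the previous value.
/// The first value is kept as-is (gap from an implicit 0).
/// Returns None if the input is not strictly increasing.
fn to_deltas(values: &[u64]) -> Option<Vec<u64>> {
    let mut out = Vec::with_capacity(values.len());
    let mut prev = 0u64;
    for (i, &v) in values.iter().enumerate() {
        if i > 0 && v <= prev {
            return None; // not strictly increasing
        }
        out.push(v - prev);
        prev = v;
    }
    Some(out)
}

/// Inverse transform: a running sum over the gaps.
/// Returns None if the sum overflows u64.
fn from_deltas(deltas: &[u64]) -> Option<Vec<u64>> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut acc = 0u64;
    for &d in deltas {
        acc = acc.checked_add(d)?;
        out.push(acc);
    }
    Some(out)
}
```

The gaps are small numbers, so each one fits in a single VByte byte where the original timestamps would each need three.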

API Reference

| Function | Description |
| --- | --- |
| e(value, buf) | Encode a single u64 and append it to the buffer |
| d(bytes) | Decode a single u64 and return (value, bytes_consumed) |
| e_li(iter) | Encode an iterator of u64 to Vec<u8> |
| d_li(bytes) | Decode bytes to Vec<u64> |
| e_diff(slice) | Encode an increasing sequence with delta compression |
| d_diff(bytes) | Decode a delta-compressed sequence |

Performance

Benchmarked with 10,000 integers (60% small, 30% medium, 10% large):

| Library | Encode (M integers/s) | Decode (M integers/s) |
| --- | --- | --- |
| vb | 430 | 415 |
| leb128 | 289 | 213 |
| integer-encoding | 176 | 349 |

Run benchmarks yourself:

./bench.sh

Design

VByte stores 7 bits of data per byte and uses the most significant bit (MSB) as a continuation flag:

  • MSB = 0: Final byte
  • MSB = 1: More bytes follow
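
The format above can be sketched as a plain reference encoder and decoder. This is an unoptimized illustration, not vb's implementation: the function names are hypothetical, and the group order (least-significant 7 bits first, as in LEB128) is an assumption the text does not state.

```rust
/// Append `value` to `buf`, emitting 7 bits per byte, low bits first.
/// Every byte except the last has its MSB set.
fn encode_u64(mut value: u64, buf: &mut Vec<u8>) {
    while value >= 0x80 {
        buf.push(((value & 0x7F) as u8) | 0x80); // MSB = 1: more bytes follow
        value >>= 7;
    }
    buf.push(value as u8); // MSB = 0: final byte
}

/// Decode one value from the front of `bytes`.
/// Returns (value, bytes consumed), or None if the input is
/// truncated or does not fit in a u64.
fn decode_u64(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        let group = (b & 0x7F) as u64;
        // The 10th byte can only carry the top bit of a u64.
        if i == 9 && group > 1 {
            return None;
        }
        value |= group << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Decode a whole buffer by advancing an offset past each value.
fn decode_all(bytes: &[u8]) -> Option<Vec<u64>> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < bytes.len() {
        let (value, used) = decode_u64(&bytes[offset..])?;
        out.push(value);
        offset += used;
    }
    Some(out)
}
```

With this layout, 128 encodes to the two bytes 0x80 0x01, and the six sample numbers from the Basic Encoding example occupy 12 bytes in total.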

Key optimizations:

  • Fast path: Single-byte values (< 128) skip all loops
  • Loop unrolling: 2-5 byte cases fully unrolled
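  • Bounds check elimination: Uses unsafe pointer arithmetic when at least 10 bytes are available
  • CLZ instruction: leading_zeros() derives the byte count from the bit length without a loop (it typically compiles to a single CLZ/LZCNT instruction)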
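
As a rough illustration of the fast path, unrolling, and CLZ ideas listed above, the sketch below computes the encoded length from leading_zeros() and dispatches on it. It uses only safe code, unrolls just the 2-byte case, and assumes least-significant-group-first byte order; encoded_len and encode_fast are hypothetical names, not vb functions.

```rust
/// Number of bytes needed to encode `value` at 7 data bits per byte.
/// `value | 1` makes zero count as a 1-bit number, so 0 needs 1 byte.
fn encoded_len(value: u64) -> usize {
    let bits = 64 - (value | 1).leading_zeros() as usize;
    (bits + 6) / 7 // ceiling of bits / 7
}

/// Encode by dispatching on the precomputed length.
fn encode_fast(value: u64, buf: &mut Vec<u8>) {
    match encoded_len(value) {
        // Fast path: values below 128 are a single byte, no loop.
        1 => buf.push(value as u8),
        // Unrolled 2-byte case: low 7 bits with MSB set, then the rest.
        2 => buf.extend_from_slice(&[(value as u8) | 0x80, (value >> 7) as u8]),
        // General case: the trip count is known up front.
        n => {
            let mut v = value;
            for _ in 0..n - 1 {
                buf.push((v as u8) | 0x80);
                v >>= 7;
            }
            buf.push(v as u8);
        }
    }
}
```

Because the length is known before any byte is written, a real implementation can reserve space once and write without per-byte checks, which is where unsafe pointer arithmetic would come in.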

Bench

VByte Encoding Benchmark

Comparing varint encoding libraries with 10,000 integers (mixed distribution: 60% small, 30% medium, 10% large).

Results

| Library | Encode (M integers/s) | Decode (M integers/s) |
| --- | --- | --- |
| vb | 430.5 | 414.9 |
| integer-encoding | 176.2 | 348.6 |
| leb128 | 289.2 | 212.9 |

Environment

macOS 26.1 (arm64) · Apple M2 Max · 12 cores · 64.0GB · rustc 1.94.0-nightly (21ff67df1 2025-12-15)


About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet through componentization.

