yykv-layout 0.1.0

Physical data layout and serialization for yykv

yykv-layout (Physical Data Layout)

yykv-layout defines how the YYKV storage engine organizes data on raw disk media. It serializes complex in-memory data objects (DsValue) into compact, aligned, and self-verifying binary page streams, which the rest of the engine relies on for performance and reliability.

Core Design

📑 Unified Page Model

YYKV does not rely on a file system; instead, it manages physical pages directly:

  • Page Specifications: Supports multiple page sizes, such as 4KB (SSD optimized) and 16KB (throughput optimized).
  • Page Types:
    • DataPage: Stores actual KV or row records.
    • IndexPage: Stores B-Tree or HNSW index nodes.
    • MetaPage: Stores tenant metadata and system configurations.
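
As a rough illustration of the page model above, the sketch below models the two page sizes and three page types as Rust enums. The names PageSize, PageType, from_tag, and page_offset, the numeric tags, and the assumption that page 0 starts at byte 0 of the device are all hypothetical; the crate's real types may differ.

```rust
/// Hypothetical page sizes mirroring the two sizes mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageSize {
    Kb4,
    Kb16,
}

impl PageSize {
    /// Size of one page in bytes.
    pub const fn bytes(self) -> usize {
        match self {
            PageSize::Kb4 => 4 * 1024,
            PageSize::Kb16 => 16 * 1024,
        }
    }
}

/// Hypothetical one-byte tags for the three page types (values are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum PageType {
    Data = 1,
    Index = 2,
    Meta = 3,
}

impl PageType {
    /// Maps a raw tag byte back to a page type, rejecting unknown tags.
    pub fn from_tag(tag: u8) -> Option<PageType> {
        match tag {
            1 => Some(PageType::Data),
            2 => Some(PageType::Index),
            3 => Some(PageType::Meta),
            _ => None,
        }
    }
}

/// Byte offset of page `page_no`, assuming page 0 starts at byte 0.
/// Returns None if the multiplication would overflow.
pub fn page_offset(page_no: u64, size: PageSize) -> Option<u64> {
    page_no.checked_mul(size.bytes() as u64)
}
```

Because no file system sits in between, a fixed page size lets the engine turn a page number into a device offset with a single multiplication.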

📦 Variable-Length Binary Storage

  • Compact Serialization: Binary layouts optimized for different data types (Varint, Bit-packing).
  • Zero-Copy Reading: Designed to allow deserialization directly on AlignedBuffer, avoiding memory copies.
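
The text above names Varint as one of the compact encodings but does not specify the exact format. The sketch below assumes the common LEB128 scheme (7 payload bits per byte, least significant group first); encode_varint and decode_varint are hypothetical names, and yykv-layout may use a different varint variant.

```rust
/// Appends `value` to `out` as an unsigned LEB128 varint (assumed format).
pub fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        // Low 7 bits plus a continuation flag in the high bit.
        out.push(((value as u8) & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decodes a varint from the front of `input`.
/// Returns the value and the number of bytes consumed, or None if the
/// input is truncated or does not fit in a u64.
pub fn decode_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        let chunk = u64::from(byte & 0x7F);
        // The tenth byte can only contribute the single top bit of a u64.
        if i == 9 && chunk > 1 {
            return None;
        }
        value |= chunk << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

The decoder borrows the input slice and allocates nothing, which is the property a zero-copy reader needs: values are read in place from the page buffer.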

🛡️ Data Integrity Protection

  • CRC32C Checksum: Each page header includes a cyclic redundancy check to prevent silent data corruption caused by hardware failure.
  • Version Control: The page header contains version information, supporting smooth online upgrades of storage formats.
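
To make the checksum idea concrete, here is a minimal, unoptimized CRC32C (Castagnoli) routine with a seal/verify pair. The polynomial is the standard one; the location of the checksum field (bytes 2..6, little-endian) and the convention of zeroing that field while hashing are assumptions for illustration, as are the names seal_page and verify_page.

```rust
/// Assumed position of the 4-byte checksum inside a page.
const CHECKSUM_RANGE: std::ops::Range<usize> = 2..6;

/// Bitwise CRC32C using the reflected Castagnoli polynomial 0x82F63B78.
/// Production code would use a table-driven or hardware-accelerated version.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Computes the checksum over the page with the checksum field zeroed,
/// then stores it in that field. Panics if the page is shorter than 6 bytes.
pub fn seal_page(page: &mut [u8]) {
    page[CHECKSUM_RANGE].fill(0);
    let sum = crc32c(page);
    page[CHECKSUM_RANGE].copy_from_slice(&sum.to_le_bytes());
}

/// Recomputes the checksum and compares it with the stored value.
/// Copies the page for clarity; a real implementation would hash in place.
pub fn verify_page(page: &[u8]) -> bool {
    if page.len() < CHECKSUM_RANGE.end {
        return false;
    }
    let mut stored = [0u8; 4];
    stored.copy_from_slice(&page[CHECKSUM_RANGE]);
    let mut scratch = page.to_vec();
    scratch[CHECKSUM_RANGE].fill(0);
    crc32c(&scratch) == u32::from_le_bytes(stored)
}
```

A reader would call verify_page right after the page is read from the device and treat a mismatch as corruption rather than attempting to parse the body.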

Physical Layout

Region               Description
File Header (100B)   Contains SQLite-compatible magic number, version, page size, and first free page pointer.
Page Header          Page type (1B), flags (1B), checksum (4B), Lsn (8B).
Page Body            Actual record slots or B-Tree node entries.
Page Trailer         Page footer pointer for assisting page scanning and corruption detection.
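
The page header fields listed above add up to 14 bytes. The sketch below shows one way such a header could be encoded and decoded. The field order follows the listing, but the little-endian byte order, the absence of padding, and the names PageHeader, encode, and decode are assumptions rather than the crate's documented format.

```rust
/// Total header length: type (1) + flags (1) + checksum (4) + LSN (8).
pub const PAGE_HEADER_LEN: usize = 14;

/// Hypothetical in-memory view of the page header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PageHeader {
    pub page_type: u8,
    pub flags: u8,
    pub checksum: u32,
    pub lsn: u64,
}

impl PageHeader {
    /// Serializes the header into a fixed 14-byte array (little-endian assumed).
    pub fn encode(&self) -> [u8; PAGE_HEADER_LEN] {
        let mut buf = [0u8; PAGE_HEADER_LEN];
        buf[0] = self.page_type;
        buf[1] = self.flags;
        buf[2..6].copy_from_slice(&self.checksum.to_le_bytes());
        buf[6..14].copy_from_slice(&self.lsn.to_le_bytes());
        buf
    }

    /// Parses a header from the start of `bytes`; None if fewer than 14 bytes.
    pub fn decode(bytes: &[u8]) -> Option<PageHeader> {
        let bytes = bytes.get(..PAGE_HEADER_LEN)?;
        let mut checksum = [0u8; 4];
        checksum.copy_from_slice(&bytes[2..6]);
        let mut lsn = [0u8; 8];
        lsn.copy_from_slice(&bytes[6..14]);
        Some(PageHeader {
            page_type: bytes[0],
            flags: bytes[1],
            checksum: u32::from_le_bytes(checksum),
            lsn: u64::from_le_bytes(lsn),
        })
    }
}
```

Encoding field by field, instead of casting a struct to bytes, keeps the on-disk format independent of compiler padding and host endianness.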

Core Components

  • LayoutManager: Manages the conversion between memory objects and disk byte streams.
  • Header: Initial metadata parser for physical files/drives.
  • Page: Abstract page object supporting parsing of various physical page types.

Technical Advantages

  • Hardware-Friendly: All data layouts are sector-aligned, so pages can be read and written with Direct IO without intermediate copies.
  • Cross-Modal Consistency: Both vector data and SQL row data follow a unified page management framework.
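
The sector-alignment requirement mentioned above can be sketched with two small helpers. The 512-byte sector size and the names SECTOR_SIZE, align_up, and is_direct_io_request are assumptions for illustration; real devices may report 4096-byte sectors, and Direct IO typically also requires the memory buffer itself to be aligned, which this sketch does not check.

```rust
/// Assumed logical sector size; query the device for the real value.
pub const SECTOR_SIZE: usize = 512;

/// Rounds `len` up to the next multiple of `align`.
/// Returns None if `align` is not a power of two or the result overflows.
pub fn align_up(len: usize, align: usize) -> Option<usize> {
    if !align.is_power_of_two() {
        return None;
    }
    let mask = align - 1;
    len.checked_add(mask).map(|n| n & !mask)
}

/// Checks that an I/O request's offset and length are both sector multiples.
pub fn is_direct_io_request(offset: u64, len: usize, sector: usize) -> bool {
    sector.is_power_of_two()
        && len > 0
        && len % sector == 0
        && offset % (sector as u64) == 0
}
```

Since both 4KB and 16KB pages are multiples of common sector sizes, any page-granular read or write at a page-aligned offset passes this check.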