# Parcode
[crates.io](https://crates.io/crates/parcode) |
[docs.rs](https://docs.rs/parcode) |
[CI](https://github.com/retypeos/parcode/actions/workflows/ci.yml) |
[License](./LICENSE)
---
**High-performance, zero-copy, lazy-loading object storage for Rust.**
`parcode` is an architecture-aware storage system designed for complex, deep data structures. Unlike traditional serialization formats (JSON, Bincode), which treat data as a flat blob, `parcode` preserves the **structure** of your objects on disk.
This enables capabilities previously reserved for complex databases:
* **Lazy Mirrors:** Navigate deep struct hierarchies without loading data from disk.
* **Surgical Access:** Load only the specific field, vector chunk, or map entry you need.
* **O(1) Map Lookups:** Retrieve items from huge `HashMap`s by reading a single shard, without full deserialization.
* **Parallel Speed:** Writes are fully parallelized using a Zero-Copy graph architecture.
---
## The Innovation: Pure Rust Lazy Loading
Most libraries that offer "Lazy Loading" or "Zero-Copy" access (like FlatBuffers or Cap'n Proto) come with a heavy price: **Interface Definition Languages (IDLs)**. You are forced to write separate schema files (`.proto`, `.fbs`), run external compilers, and deal with generated code that doesn't feel like Rust.
**Parcode changes the game.**
We invented a technique we call **"Native Mirroring"**. By simply adding `#[derive(ParcodeObject)]`, Parcode analyzes your Rust structs at compile time and invisibly generates a **Lazy Mirror** API.
| Feature | IDL-based libraries (e.g. FlatBuffers) | Parcode |
| :--- | :--- | :--- |
| **Schema Definition** | External IDL files (`.fbs`) | **Standard Rust Structs** |
| **Build Process** | Requires external CLI (`flatc`) | **Standard `cargo build`** |
| **Refactoring** | Manual sync across files | **IDE Rename / Refactor** |
| **Developer Experience** | Foreign | **Native** |
### How it works
You define your data naturally:
```rust
#[derive(ParcodeObject)]
struct Level {
    name: String,
    #[parcode(chunkable)]
    physics: PhysicsData, // Heavy struct
}
```
Parcode's macro engine automatically generates a **Shadow Type** (`LevelLazy`) that mirrors your structure but replaces heavy fields with **Smart Promises**.
* When you call `reader.read_lazy::<Level>()`, you don't get a `Level`.
* You get a `LevelLazy` handle.
* Accessing `level_lazy.name` is instant (read from header).
* Accessing `level_lazy.physics` returns a **Promise**, not data.
* Only calling `.load()` triggers the disk I/O.
**Result:** The performance of a database with the ergonomics of a standard `struct`.
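
To make the idea of a promise concrete, here is a minimal, std-only sketch of the pattern. It is not Parcode's generated code: `FakeDisk`, `Promise`, `LevelLazy`, and `open_lazy` are hypothetical names written by hand for illustration, the "file" is an in-memory buffer, and `physics` is simplified to a `String`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a file on disk; counts how many bytes were read.
struct FakeDisk {
    bytes: Vec<u8>,
    bytes_read: Cell<usize>,
}

impl FakeDisk {
    fn read(&self, offset: usize, len: usize) -> &[u8] {
        self.bytes_read.set(self.bytes_read.get() + len);
        &self.bytes[offset..offset + len]
    }
}

/// A promise remembers *where* a field lives, not its contents.
struct Promise<'d> {
    disk: &'d FakeDisk,
    offset: usize,
    len: usize,
}

impl<'d> Promise<'d> {
    /// The only operation on the promise that touches the "disk".
    fn load(&self) -> String {
        String::from_utf8_lossy(self.disk.read(self.offset, self.len)).into_owned()
    }
}

/// Hand-written lazy counterpart of `struct Level { name, physics }`.
struct LevelLazy<'d> {
    name: String,         // small field, decoded eagerly from the header
    physics: Promise<'d>, // heavy field, deferred until `.load()`
}

/// Opening reads only the header (here: the first `name_len` bytes).
fn open_lazy(disk: &FakeDisk, name_len: usize) -> LevelLazy<'_> {
    let name = String::from_utf8_lossy(disk.read(0, name_len)).into_owned();
    LevelLazy {
        name,
        physics: Promise {
            disk,
            offset: name_len,
            len: disk.bytes.len() - name_len,
        },
    }
}
```

After `open_lazy`, the `bytes_read` counter equals the header length only; it grows by the size of the physics payload the first time `physics.load()` is called.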
---
## The Principal Feature: Lazy Mirrors
The problem with standard serialization is **"All or Nothing"**. To read `level.config.name`, you usually have to deserialize the entire 500MB `level` file.
**Parcode V3 solves this.** By simply adding `#[derive(ParcodeObject)]`, the library generates a "Shadow Mirror" of your struct. You can traverse this mirror instantly—only metadata is read—and trigger disk I/O only when you actually request the data.
### Example
```rust
use parcode::{Parcode, ParcodeObject, ParcodeReader};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, ParcodeObject)]
struct Level {
    id: u32,      // Local field (Metadata)
    name: String, // Local field (Metadata)
    #[parcode(chunkable)] // Stored in a separate, compressed chunk
    config: LevelConfig,
    #[parcode(chunkable)] // Large collection (sharded automatically)
    assets: Vec<u8>,
}

#[derive(Serialize, Deserialize, ParcodeObject)]
struct LevelConfig {
    version: u8,
    #[parcode(chunkable)]
    metadata: String, // Deeply nested chunk
}

fn main() -> parcode::Result<()> {
    // 1. Open the file (Instant operation, mmaps the content)
    let reader = ParcodeReader::open("level.par")?;

    // 2. Get the Lazy Mirror (Instant operation, reads only the header)
    let level_lazy = reader.read_lazy::<Level>()?;

    // 3. Access local fields directly (Already in memory)
    println!("ID: {}, Name: {}", level_lazy.id, level_lazy.name);

    // 4. Navigate deep without loading!
    // 'config' is a Mirror. Accessing it costs 0 I/O.
    // 'version' is a local field of config. Accessing it costs 0 I/O (eager header load).
    println!("Config Version: {}", level_lazy.config.version);

    // 5. Surgical Load
    // Only NOW do we touch the disk to load the specific 'metadata' chunk.
    // The 2GB 'assets' vector is NEVER loaded.
    let meta = level_lazy.config.metadata.load()?;
    println!("Deep Metadata: {}", meta);

    Ok(())
}
```
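This architecture allows "Cold Starts" in **microseconds** (0.16 ms and 358 µs in the benchmarks below), because opening a file reads only headers and the cost does not grow with the size of the chunked payload.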
---
## O(1) Map Access: The "Database" Mode
Storing a `HashMap` usually means serializing it as a single huge blob. If you need one user from a 1GB user database, you have to read 1GB.
Parcode introduces **Hash Sharding**. By marking a map with `#[parcode(map)]`, the library automatically:
1. Distributes items into buckets based on their hash.
2. Writes each bucket as an independent chunk with a "Micro-Index" (Structure of Arrays).
3. Allows **O(1) retrieval** by reading only the relevant ~4KB shard.
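```rust
use std::collections::HashMap;

#[derive(Serialize, Deserialize, ParcodeObject)]
struct UserDatabase {
    // Stored as one independent chunk and loaded as a whole.
    #[parcode(chunkable)]
    settings: HashMap<String, String>,

    // Sharded by key hash for random access.
    #[parcode(map)]
    users: HashMap<u64, UserProfile>,
}

// Inside a function returning `parcode::Result<()>`, with `reader` opened as in the previous example:
let db = reader.read_lazy::<UserDatabase>()?;
// Reads only the shard that can contain key 88888.
let user = db.users.get(&88888)?.expect("User not found");
```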
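
The three steps above can be sketched with the standard library alone. This is an illustration of hash sharding in general, not Parcode's on-disk format: `shard_of`, `build_shards`, and `get` are hypothetical helpers, shards live in memory instead of in file chunks, and the "micro-index" is reduced to a linear scan of one small bucket.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Pick the shard for a key. Assumes `shard_count > 0`.
/// `DefaultHasher::new()` is deterministic within one build, but its
/// algorithm is unspecified, so a real file format would need a hash
/// function with a fixed, documented algorithm.
fn shard_of<K: Hash>(key: &K, shard_count: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % shard_count as u64) as usize
}

/// "Write" side: split one big map into independent shards.
fn build_shards<V: Clone>(map: &HashMap<u64, V>, shard_count: usize) -> Vec<Vec<(u64, V)>> {
    let mut shards = vec![Vec::new(); shard_count];
    for (key, value) in map {
        shards[shard_of(key, shard_count)].push((*key, value.clone()));
    }
    shards
}

/// "Read" side: only the one shard that can contain the key is inspected.
fn get<'a, V>(shards: &'a [Vec<(u64, V)>], key: u64) -> Option<&'a V> {
    let shard = &shards[shard_of(&key, shards.len())];
    shard.iter().find(|(k, _)| *k == key).map(|(_, v)| v)
}
```

A lookup costs one hash plus a scan of a single bucket, so its cost depends on the bucket size rather than on the total number of entries, provided the number of shards grows with the map.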
---
## Macro Attributes Reference
Control exactly how your data structure maps to disk using `#[parcode(...)]`.
| Attribute | Behavior | Best For |
| :--- | :--- | :--- |
| **(none)** | Field is serialized into the parent's payload. | Small primitives (`u32`, `bool`), short Strings, flags. Access is instant if parent is loaded. |
| `#[parcode(chunkable)]` | Field is stored in its own independent Chunk (node). | Large structs, vectors, or fields you want to load lazily (`.load()`). |
| `#[parcode(map)]` | Field (`HashMap`) is sharded by hash. | Large Dictionaries/Indices where you need random access by key (`.get()`). |
| `#[parcode(compression="lz4")]` | Overrides compression for this specific field/chunk. | Highly compressible data (text, save states). Requires `lz4_flex` feature. |
### Smart Defaults
Parcode is adaptive.
* **Small Vectors:** If a `Vec` marked `chunkable` is tiny (< 4KB), Parcode may inline it or merge chunks to avoid overhead.
* **Small Maps:** If a `HashMap` marked `map` has few items (< 200), Parcode automatically falls back to standard serialization to save space.
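
As a rough picture of what such size-based defaults look like, the sketch below turns the two thresholds quoted above into a decision function. Everything other than the numbers (4KB and 200 items) is an assumption: the `Layout` enum and both functions are hypothetical, and Parcode's real heuristics may weigh more factors.

```rust
/// Thresholds quoted in the text; the decision logic itself is an assumption.
const SMALL_VEC_BYTES: usize = 4 * 1024;
const SMALL_MAP_ITEMS: usize = 200;

#[derive(Debug, PartialEq)]
enum Layout {
    Inline,      // stored inside the parent's payload
    OwnChunk,    // stored as an independent chunk
    SingleBlob,  // map serialized as one ordinary blob
    HashSharded, // map split into hash buckets
}

/// Tiny vectors are not worth the per-chunk overhead.
fn layout_for_vec(byte_len: usize) -> Layout {
    if byte_len < SMALL_VEC_BYTES {
        Layout::Inline
    } else {
        Layout::OwnChunk
    }
}

/// Small maps fall back to plain serialization to save space.
fn layout_for_map(item_count: usize) -> Layout {
    if item_count < SMALL_MAP_ITEMS {
        Layout::SingleBlob
    } else {
        Layout::HashSharded
    }
}
```

The point of the design is that the attribute expresses intent ("this may be large") while the writer is free to pick a cheaper layout when the data turns out to be small.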
---
## Benchmarks vs The World
We benchmarked Parcode V3 against `bincode` (the Rust standard for raw speed) and `sled` (an embedded DB) in a complex game world scenario involving heavy assets (100MB) and metadata lookups.
> **Scenario:** Cold Start of an application reading a massive World State file.
| Operation | Library | Time | Memory | Notes |
| :--- | :--- | :--- | :--- | :--- |
| **Cold Start** (Ready to read metadata) | **Parcode** | **0.16 ms** | **0 MB** | Instant. Only headers read. |
| | Bincode | 97.47 ms | 30 MB | Forced to deserialize everything. |
| **Deep Fetch** (Load 1 asset) | **Parcode** | **3.20 ms** | **3.8 MB** | Loads only the target 1MB chunk. |
| | Bincode | 97.47 ms | 30 MB | Same cost as full load. |
| **Map Lookup** (Find user by ID) | **Parcode** | **0.02 ms** | **0 MB** | **4000x Faster**. Hash Sharding win. |
| | Bincode | 97.47 ms | 30 MB | O(N) scan. |
| **Write Speed** (Throughput) | **Parcode** | ~73 ms | 48 MB | Slightly slower due to graph building (varies with the dataset). |
| | Bincode | ~50 ms | 0.01 MB | Faster but produces monolithic blobs. |
*Benchmarks were run on an NVMe SSD. Parallel throughput scales with cores.*
---
## Real-World Scenario: Heavy Game Assets
Benchmarks are useful, but how does it feel in a real application?
We simulated a Game Engine loading a World State containing two **50MB binary assets** (Skybox and Terrain) plus metadata.
> **Test:** Write to disk, restart application, read metadata (World Name), and load *only* the Skybox.
> *Note: Compression disabled to measure pure architectural efficiency.*
| Operation | Parcode | Bincode | Comparison |
| :--- | :--- | :--- | :--- |
| **Save World** (Write) | **3.37 s** | 8.88 s | **2.6x Faster** (Parallel Writes) |
| **Start Up** (Read Metadata) | **358 µs** | 10.84 s | **~30,000x Faster** (Lazy vs Full) |
| **Load Skybox** (Partial) | 1.33 s | N/A | Granular loading |
| **Total Workflow** | **4.70 s** | 19.72 s | **4.2x Faster** |
**The User Experience Difference:**
* **With Bincode:** The user stares at a frozen loading screen for **10.8 seconds** just to see the level name. Memory usage spikes to load assets that might not even be visible yet.
* **With Parcode:** The application opens **instantly (<1ms)**. The UI populates immediately. The 50MB Skybox streams in smoothly over 1.3 seconds. The Terrain is never loaded if the user doesn't look at it.
---
## Architecture Under the Hood
Parcode treats your data as a **Dependency Graph**, not a byte stream.
1. **Zero-Copy Write:** The serializer "borrows" your data (`&[T]`) instead of cloning it. It builds a graph of `ChunkNode`s representing your struct hierarchy.
2. **Parallel Execution:** Writing is orchestrated by a graph executor. Independent nodes (chunks) are serialized, compressed, and written to disk concurrently.
3. **Physical Layout:** The file format (`.pcode` v4) places children before parents ("Bottom-Up"). This allows the Root Node at the end of the file to contain a table of contents for the entire structure.
4. **Lazy Mirroring:** The `ParcodeObject` macro generates a "Mirror Struct" that holds `ChunkNode` handles instead of data. Accessing `mirror.field` simply returns another Mirror or a Promise, incurring zero I/O cost until the final `.load()`.
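
The write path described above (borrowed inputs, concurrent encoding, children before the root) can be sketched in a few dozen lines of std-only Rust. This is a toy layout invented for illustration, not the actual `.pcode` format: the functions `encode_parallel`, `write_file`, `read_u64`, and `read_chunk` are hypothetical, "encoding" is a plain copy, and the root is a flat table of `(offset, len)` pairs followed by an 8-byte trailer.

```rust
use std::thread;

/// Encode independent chunks concurrently. Each thread only borrows its
/// input slice; a real writer would serialize and compress at this point.
fn encode_parallel(chunks: &[&[u8]]) -> Vec<Vec<u8>> {
    thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| scope.spawn(move || chunk.to_vec()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("encoder thread panicked"))
            .collect()
    })
}

/// Toy layout: [child 0][child 1]...[root table][root offset: u64 LE].
/// Children are written first, so the root can record where each one landed.
fn write_file(chunks: &[&[u8]]) -> Vec<u8> {
    let encoded = encode_parallel(chunks);
    let mut file = Vec::new();
    let mut toc: Vec<(u64, u64)> = Vec::new(); // (offset, len) per child
    for chunk in &encoded {
        toc.push((file.len() as u64, chunk.len() as u64));
        file.extend_from_slice(chunk);
    }
    let root_offset = file.len() as u64;
    for (offset, len) in &toc {
        file.extend_from_slice(&offset.to_le_bytes());
        file.extend_from_slice(&len.to_le_bytes());
    }
    file.extend_from_slice(&root_offset.to_le_bytes());
    file
}

/// Read a little-endian u64 starting at byte `at`.
fn read_u64(file: &[u8], at: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&file[at..at + 8]);
    u64::from_le_bytes(buf)
}

/// Fetch one child by index: trailer -> root table entry -> chunk bytes.
fn read_chunk(file: &[u8], index: usize) -> &[u8] {
    let root_offset = read_u64(file, file.len() - 8) as usize;
    let entry = root_offset + index * 16;
    let offset = read_u64(file, entry) as usize;
    let len = read_u64(file, entry + 8) as usize;
    &file[offset..offset + len]
}
```

Because the root sits at the end, a reader needs only the trailer and one table entry to locate any chunk, which is what makes it possible to skip every chunk that is not requested.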
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
parcode = "0.3.1"
```
To enable LZ4 compression:
```toml
[dependencies]
parcode = { version = "0.3.1", features = ["lz4_flex"] }
```
## License
This project is licensed under the [MIT license](LICENSE).
---
*Built for the Rust community by RetypeOS.*