compress-json-rs
AI-driven Rust port of the JavaScript compress-json library by Beenotung. Store JSON data in a space-efficient compressed form with lossless round-trip compression and decompression.
Table of Contents
- Features
- Installation
- Quick Start
- Usage Examples
- API Reference
- How It Works
- Architecture
- Compression Format
- Configuration
- Helper Functions
- Performance Considerations
- License
Features
- Full JSON Support: Objects, arrays, strings, numbers, booleans, and null
- Value Deduplication: Repeated values stored once with reference keys
- Schema Deduplication: Objects with identical keys share schemas
- Compact Encoding: Numbers encoded in base-62 format
- Type Safety: Zero-copy round-trip using serde_json::Value
- UTF-8 Safe: Full Unicode support for strings
- No Dependencies on Disk/Network: Fast in-memory compression
Installation
Add to your Cargo.toml:
[dependencies]
compress-json-rs = "0.1.0"
serde_json = "1.0"
Quick Start
use compress_json_rs::{compress, decompress};
use serde_json::json;
Usage Examples
Basic Object Compression
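use compress_json_rs::{compress, decompress};
use serde_json::json;
let user = json!({ "name": "Alice", "role": "admin" });
let (values, root) = compress(&user);
// The compressed form is a tuple of:
// - values: Vec<String> - deduplicated value store
// - root: String - key pointing to the root value
println!("Values: {:?}", values);
println!("Root key: {}", root);
// Restore original
let restored = decompress((values, root));
assert_eq!(user, restored);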
Array with Repeated Objects
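use compress_json_rs::{compress, decompress};
use serde_json::json;
// Arrays of objects with similar schemas benefit most from compression
let data = json!([
    { "id": 1, "type": "A" },
    { "id": 2, "type": "B" },
    { "id": 3, "type": "A" }
]);
let compressed = compress(&data);
let restored = decompress(compressed);
assert_eq!(data, restored);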
Serialization for Storage/Transmission
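use compress_json_rs::{compress, decompress, Compressed};
use serde_json::json;
let data = json!({ "name": "Alice", "role": "admin" });
// Compress
let compressed = compress(&data);
// Serialize to JSON string for storage
let json_str = serde_json::to_string(&compressed).unwrap();
println!("{}", json_str);
// Later: deserialize and decompress
let loaded: Compressed = serde_json::from_str(&json_str).unwrap();
let restored = decompress(loaded);
assert_eq!(data, restored);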
Working with Files
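use compress_json_rs::{compress, decompress, Compressed};
use serde_json::{json, Value};
use std::fs;
// Usage (save_compressed and load_compressed are helpers defined by the caller)
let data = json!({ "name": "Alice", "role": "admin" });
save_compressed("data.json", &data).unwrap();
let restored = load_compressed("data.json").unwrap();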
Using Helper Functions
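use compress_json_rs::{trim_undefined, trim_undefined_recursively};
use serde_json::{json, Map, Value};
// Remove null values from objects before compression
let mut data: Map<String, Value> = serde_json::from_value(json!({ "name": "Alice", "age": 30, "email": null })).unwrap();
trim_undefined(&mut data);
// data now only contains "name" and "age"
// Recursively remove nulls from nested objects
let mut nested: Map<String, Value> = serde_json::from_value(json!({ "user": { "name": "Alice", "email": null } })).unwrap();
trim_undefined_recursively(&mut nested);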
API Reference
Core Functions
/// Compressed representation: (values array, root key)
pub type Compressed = (Vec<String>, Key);
/// Key type for value references
pub type Key = String;
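/// Compress a JSON value into its compressed form
pub fn compress(o: &Value) -> Compressed;
/// Decompress a compressed form back into JSON
pub fn decompress(c: Compressed) -> Value;
/// Decode a single key from the values array
pub fn decode(values: &Vec<String>, key: &str) -> Value;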
Lower-Level API
/// Memory structure for compression state
/// Create a new memory instance for compression
;
/// Add a value to memory, returns its reference key
;
/// Convert memory to the values array
;
Helper Functions
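/// Remove keys with null values from an object (shallow)
pub fn trim_undefined(object: &mut Map<String, Value>);
/// Recursively remove keys with null values from nested objects
pub fn trim_undefined_recursively(object: &mut Map<String, Value>);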
Configuration
/// Global configuration for compression behavior
pub const CONFIG: Config;
How It Works
The compression algorithm works by deduplicating values and encoding references using base-62 keys.
Compression Flow
flowchart TD
subgraph Input
A[JSON Value]
end
subgraph Compression Process
B[Create Memory Store]
C{Value Type?}
D[Encode Boolean]
E[Encode Number]
F[Encode String]
G[Process Array]
H[Process Object]
I[Check Value Cache]
J{Cached?}
K[Return Existing Key]
L[Generate New Key]
M[Store Value]
end
subgraph Output
N[Compressed Tuple]
O["(Vec<String>, Key)"]
end
A --> B
B --> C
C -->|bool| D
C -->|number| E
C -->|string| F
C -->|array| G
C -->|object| H
D & E & F --> I
G --> |each element| C
H --> |schema + values| C
I --> J
J -->|yes| K
J -->|no| L
L --> M
M --> K
K --> N
N --> O
Decompression Flow
flowchart TD
subgraph Input
A["Compressed (values, root)"]
end
subgraph Decompression Process
B[Parse Root Key]
C[Lookup Value in Store]
D{Value Prefix?}
E[Decode Boolean]
F[Decode Number]
G[Decode String]
H[Decode Array]
I[Decode Object]
J[Recursive Decode]
end
subgraph Output
K[JSON Value]
end
A --> B
B --> C
C --> D
D -->|"b|"| E
D -->|"n|"| F
D -->|"s|" or none| G
D -->|"a|"| H
D -->|"o|"| I
H --> J
I --> J
J --> C
E & F & G --> K
H & I --> K
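The prefix dispatch in the decompression flow can be sketched with the standard library alone. This is a simplified illustration, not the crate's implementation: the `Json` enum and the `lookup` and `decode` functions below are hypothetical names, keys are parsed as plain decimal indices (identical to base-62 for indices below 10), numbers are parsed as decimal text, a key of `""` or `_` is assumed to stand for null, and the schema is assumed to be a comma-joined key list with an optional `a|` prefix, as in the diagrams in this document.

```rust
/// Minimal JSON value type for this sketch (stands in for serde_json::Value).
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Resolve a key to its stored string. Assumption: keys are decimal indices.
fn lookup<'a>(values: &'a [String], key: &str) -> &'a str {
    let index: usize = key.parse().expect("key must be a decimal index");
    &values[index]
}

/// Decode the value referenced by `key`, dispatching on its prefix.
fn decode(values: &[String], key: &str) -> Json {
    // Assumption: null is represented by the key itself, not by a stored value.
    if key.is_empty() || key == "_" {
        return Json::Null;
    }
    let encoded = lookup(values, key);
    if let Some(rest) = encoded.strip_prefix("b|") {
        return Json::Bool(rest == "T");
    }
    if let Some(rest) = encoded.strip_prefix("n|") {
        return Json::Number(rest.parse().expect("decimal number"));
    }
    if let Some(rest) = encoded.strip_prefix("s|") {
        return Json::Str(rest.to_string());
    }
    if let Some(rest) = encoded.strip_prefix("a|") {
        if rest.is_empty() {
            return Json::Array(Vec::new());
        }
        // Each element is a key that is decoded recursively.
        return Json::Array(rest.split('|').map(|k| decode(values, k)).collect());
    }
    if let Some(rest) = encoded.strip_prefix("o|") {
        let mut keys = rest.split('|');
        // The first key points at the schema; the rest point at field values.
        let schema = lookup(values, keys.next().expect("schema key"));
        let schema = schema.strip_prefix("a|").unwrap_or(schema);
        let names = schema.split(',').map(str::to_string);
        return Json::Object(names.zip(keys.map(|k| decode(values, k))).collect());
    }
    // No prefix: a plain string.
    Json::Str(encoded.to_string())
}

fn main() {
    let values: Vec<String> = ["name,role", "Alice", "admin", "o|0|1|2"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", decode(&values, "3"));
}
```

Arrays and objects recurse back into `decode` for every referenced key, which corresponds to the "Recursive Decode" loop in the flowchart.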
Architecture
Module Structure
graph TB
subgraph Public API
LIB[lib.rs]
end
subgraph Core Modules
CORE[core.rs<br/>compress/decompress]
MEM[memory.rs<br/>value storage]
ENC[encode.rs<br/>type encoding]
end
subgraph Support Modules
NUM[number.rs<br/>base-62 conversion]
BOOL[boolean.rs<br/>bool encoding]
HELP[helpers.rs<br/>utility functions]
CFG[config.rs<br/>configuration]
DBG[debug.rs<br/>error handling]
end
LIB --> CORE
LIB --> MEM
LIB --> HELP
LIB --> CFG
CORE --> MEM
CORE --> ENC
MEM --> ENC
MEM --> NUM
MEM --> CFG
MEM --> DBG
ENC --> NUM
ENC --> BOOL
Memory Structure
classDiagram
class Memory {
-Vec~String~ store
-HashMap~String, String~ value_cache
-HashMap~String, String~ schema_cache
-usize key_count
}
class Compressed {
+Vec~String~ values
+String root_key
}
Memory --> Compressed : produces
note for Memory "Stores encoded values with<br/>deduplication via caches"
note for Compressed "Final output format:<br/>(values, root)"
Compression Format
Value Encoding Prefixes
| Prefix | Type | Example Encoded | Original Value |
|---|---|---|---|
| `b\|T` | Boolean true | `b\|T` | `true` |
| `b\|F` | Boolean false | `b\|F` | `false` |
| `n\|` | Number | `n\|42.5` | `42.5` |
| `s\|` | Escaped string | `s\|n\|123` | `"n\|123"` |
| `a\|` | Array | `a\|0\|1\|2` | Array with refs 0, 1, 2 |
| `o\|` | Object | `o\|0\|1\|2` | Object with schema ref |
| (none) | Plain string | `hello` | `"hello"` |
| `""` or `_` | Null | (empty) | `null` |
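The primitive rows of the prefix table can be illustrated with a small standalone sketch. The function names below are hypothetical and are not the crate's API. Numbers are written as decimal text to match the table's `n|42.5` example, and the escaping rule (add `s|` when a string already starts with a reserved prefix) is inferred from the `s|n|123` row.

```rust
/// Prefixes that mark a typed value in the table.
const RESERVED: [&str; 5] = ["b|", "n|", "s|", "a|", "o|"];

/// Booleans become "b|T" or "b|F".
fn encode_bool(b: bool) -> String {
    if b { "b|T".to_string() } else { "b|F".to_string() }
}

/// Numbers get the "n|" prefix (decimal text in this sketch).
fn encode_number(n: f64) -> String {
    format!("n|{n}")
}

/// Plain strings are stored as-is; a string that would be mistaken for a
/// typed value is escaped with "s|".
fn encode_str(s: &str) -> String {
    if RESERVED.iter().any(|p| s.starts_with(*p)) {
        format!("s|{s}")
    } else {
        s.to_string()
    }
}

/// Reverse of encode_str: drop one leading "s|" if present.
fn decode_str(encoded: &str) -> &str {
    encoded.strip_prefix("s|").unwrap_or(encoded)
}

fn main() {
    println!("{}", encode_bool(true));
    println!("{}", encode_number(42.5));
    println!("{}", encode_str("n|123"));
    println!("{}", encode_str("hello"));
    println!("{}", decode_str("s|n|123"));
}
```

Escaping only when needed keeps the common case (ordinary strings) free of any prefix overhead.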
Key Encoding (Base-62)
Keys are encoded using base-62 for compact representation:
Characters: 0-9 A-Z a-z (62 total)
Examples:
0 -> "0"
9 -> "9"
10 -> "A"
35 -> "Z"
36 -> "a"
61 -> "z"
62 -> "10"
124 -> "20"
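The mapping above can be reproduced with a short standalone sketch. The function names `to_base62` and `from_base62` are illustrative and not necessarily those used inside the crate; the alphabet order (digits, then uppercase, then lowercase) is taken from the character list above.

```rust
/// Digits 0-9, then A-Z, then a-z (62 symbols).
const ALPHABET: &[u8; 62] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/// Encode a non-negative integer as a base-62 string.
fn to_base62(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        // Least significant digit first; reversed below.
        digits.push(ALPHABET[(n % 62) as usize]);
        n /= 62;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}

/// Decode a base-62 string. Returns None for empty input, an invalid
/// character, or a value that does not fit in u64.
fn from_base62(s: &str) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    let mut n: u64 = 0;
    for b in s.bytes() {
        let digit = ALPHABET.iter().position(|&c| c == b)? as u64;
        n = n.checked_mul(62)?.checked_add(digit)?;
    }
    Some(n)
}

fn main() {
    for n in [0u64, 9, 10, 35, 36, 61, 62, 124] {
        let key = to_base62(n);
        assert_eq!(from_base62(&key), Some(n));
        println!("{n} -> {key:?}");
    }
}
```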
Example Compression
graph LR
subgraph Original JSON
A["{
'name': 'Alice',
'role': 'admin'
}"]
end
subgraph Compressed Values Array
B["0: 'name,role' (schema)
1: 'Alice'
2: 'admin'
3: 'o|0|1|2' (object)"]
end
subgraph Compressed Output
C["(['name,role', 'Alice',
'admin', 'o|0|1|2'], '3')"]
end
A --> B
B --> C
Schema Sharing Example
graph TD
subgraph "Input: Array of Objects"
A["[
{ id: 1, type: 'A' },
{ id: 2, type: 'B' },
{ id: 3, type: 'A' }
]"]
end
subgraph "Compressed Values"
B["0: 'a|id,type' // shared schema
1: 'n|1'
2: 'A'
3: 'o|0|1|2' // obj 1
4: 'n|2'
5: 'B'
6: 'o|0|4|5' // obj 2
7: 'n|3'
8: 'o|0|7|2' // obj 3 (reuses 'A')
9: 'a|3|6|8' // root array"]
end
subgraph Benefits
C["✓ Schema 'id,type' stored once
✓ Value 'A' stored once
✓ Minimal storage for repetitive data"]
end
A --> B
B --> C
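The values listing in this diagram can be reproduced by a small deduplicating store. The sketch below is a simplified stand-in under stated assumptions, not the crate's code: `Store`, `Prim`, `intern`, `add_object`, `add_array`, and `sample` are hypothetical names, only flat objects with integer or string fields are handled, string escaping is omitted, and keys are decimal indices (which coincide with base-62 below 10).

```rust
use std::collections::HashMap;

/// Primitive field values used by this sketch.
enum Prim {
    Num(i64),
    Str(&'static str),
}

/// Append-only value store with a cache for deduplication.
#[derive(Default)]
struct Store {
    values: Vec<String>,
    cache: HashMap<String, String>,
}

impl Store {
    /// Return the key of `encoded`, storing it only the first time it is seen.
    fn intern(&mut self, encoded: String) -> String {
        if let Some(key) = self.cache.get(&encoded) {
            return key.clone();
        }
        // Simplification: decimal key instead of base-62.
        let key = self.values.len().to_string();
        self.values.push(encoded.clone());
        self.cache.insert(encoded, key.clone());
        key
    }

    /// Encode one flat object as "o|<schema key>|<value keys...>".
    fn add_object(&mut self, fields: &[(&str, Prim)]) -> String {
        let names: Vec<&str> = fields.iter().map(|(name, _)| *name).collect();
        // Objects with the same key list intern the same schema entry.
        let mut parts = vec![self.intern(format!("a|{}", names.join(",")))];
        for (_, value) in fields {
            let encoded = match value {
                Prim::Num(n) => format!("n|{n}"),
                Prim::Str(s) => s.to_string(),
            };
            parts.push(self.intern(encoded));
        }
        self.intern(format!("o|{}", parts.join("|")))
    }

    /// Encode an array of objects as "a|<object keys...>" and return the root key.
    fn add_array(&mut self, objects: &[Vec<(&str, Prim)>]) -> String {
        let keys: Vec<String> = objects.iter().map(|o| self.add_object(o)).collect();
        self.intern(format!("a|{}", keys.join("|")))
    }
}

/// The three objects from the diagram.
fn sample() -> Vec<Vec<(&'static str, Prim)>> {
    vec![
        vec![("id", Prim::Num(1)), ("type", Prim::Str("A"))],
        vec![("id", Prim::Num(2)), ("type", Prim::Str("B"))],
        vec![("id", Prim::Num(3)), ("type", Prim::Str("A"))],
    ]
}

fn main() {
    let mut store = Store::default();
    let root = store.add_array(&sample());
    for (i, v) in store.values.iter().enumerate() {
        println!("{i}: {v}");
    }
    println!("root = {root}");
}
```

Because every encoded string passes through `intern`, the schema and the repeated value `A` each occupy a single slot no matter how many objects refer to them.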
Configuration
The library uses a compile-time configuration:
pub const CONFIG: Config = Config { sort_key: false, error_on_nan: false, error_on_infinite: false };
Behavior Notes
- NaN and Infinity: By default, these invalid JSON numbers are silently converted to null
- Key Order: Object keys maintain insertion order unless sort_key is enabled
- Unicode: Full UTF-8 support for all string values
Helper Functions
trim_undefined
Removes keys with null values from an object (shallow operation):
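use compress_json_rs::trim_undefined;
use serde_json::{json, Map, Value};
let mut obj: Map<String, Value> = serde_json::from_value(json!({ "a": 1, "b": null, "c": 3 })).unwrap();
trim_undefined(&mut obj);
// obj = { "a": 1, "c": 3 }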
trim_undefined_recursively
Removes null values from nested objects:
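use compress_json_rs::trim_undefined_recursively;
use serde_json::{json, Map, Value};
let mut obj: Map<String, Value> = serde_json::from_value(json!({ "user": { "name": "Alice", "email": null } })).unwrap();
trim_undefined_recursively(&mut obj);
// obj = { "user": { "name": "Alice" } }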
Performance Considerations
Best Use Cases
graph LR
subgraph "High Compression Ratio"
A[Arrays of similar objects]
B[Repeated string values]
C[Nested objects with shared schemas]
end
subgraph "Lower Compression Ratio"
D[Unique primitive values]
E[Deeply nested unique data]
F[Large binary-like strings]
end
A --> G[Excellent]
B --> G
C --> G
D --> H[Moderate]
E --> H
F --> H
Memory Usage
- Compression builds an in-memory store with hash maps for deduplication
- For very large JSON documents, consider streaming or chunked processing
- The compressed format itself is typically 30-70% smaller for repetitive data
Compression Ratio Examples
| Data Type | Typical Ratio |
|---|---|
| API response arrays | 40-60% of original |
| Configuration files | 50-70% of original |
| Unique data | 90-100% of original |
| Highly repetitive | 20-40% of original |
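The table expresses size as a percentage of the original, while the Memory Usage notes speak of the output being "smaller". The two measures are complements, as the small sketch below shows. The function names are illustrative, and the byte counts would come from measuring your own serialized data; the original size is assumed to be non-zero.

```rust
/// Compressed size as a fraction of the original (0.25 means "25% of original").
fn ratio(original_bytes: usize, compressed_bytes: usize) -> f64 {
    compressed_bytes as f64 / original_bytes as f64
}

/// Space saved as a fraction of the original (0.75 means "75% smaller").
fn savings(original_bytes: usize, compressed_bytes: usize) -> f64 {
    1.0 - ratio(original_bytes, compressed_bytes)
}

fn main() {
    // Hypothetical sizes: a 2000-byte document compressed to 500 bytes.
    let (original, compressed) = (2000, 500);
    println!("ratio:   {:.0}% of original", ratio(original, compressed) * 100.0);
    println!("savings: {:.0}% smaller", savings(original, compressed) * 100.0);
}
```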
Testing
Run the test suite:
cargo test
The library includes comprehensive tests covering:
- Number encoding edge cases
- Unicode string handling
- Empty objects and arrays
- Null value handling
- Deeply nested structures
- Schema deduplication
License
Licensed under the BSD-2-Clause license. See LICENSE for details.
Related Projects
- compress-json - Original TypeScript implementation
- serde_json - JSON serialization framework for Rust