rds2rust
A pure Rust library for reading and writing R's RDS (R Data Serialization) files without requiring an R runtime. Inspired by rds2cpp, which provides similar functionality with a C++ implementation.
Features
- Pure Rust implementation - No R runtime required
- Complete RDS format support - Reads and writes the R object types listed below (see Limitations for exceptions)
- Memory efficient - Optimized with string interning, compact attributes, and object deduplication
- Automatic compression - Transparent gzip compression/decompression
- Type safe - Strong Rust types for all R objects
- Zero-copy where possible - Efficient parsing and serialization
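As a rough illustration of what "zero-copy where possible" can mean: R serializes each string as a 4-byte big-endian length followed by the raw bytes, with a length of -1 for NA_character_. The sketch below returns such a string as a &str that borrows directly from the input buffer, so no new String is allocated. It assumes UTF-8 content, and read_str_borrowed is a hypothetical helper, not part of rds2rust's API.

```rust
/// Reads one length-prefixed string (big-endian i32 length, then bytes) and
/// returns it as a &str borrowed from `buf`, plus the unread remainder.
/// Inner None = NA_character_ (length -1); outer None = malformed input.
fn read_str_borrowed(buf: &[u8]) -> Option<(Option<&str>, &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = i32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let rest = &buf[4..];
    if len == -1 {
        return Some((None, rest)); // NA_character_
    }
    if len < 0 {
        return None;
    }
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    let (bytes, remaining) = rest.split_at(len);
    // from_utf8 validates in place; the returned &str points into `buf`
    let s = std::str::from_utf8(bytes).ok()?;
    Some((Some(s), remaining))
}
```

In the real format each string is also preceded by a flags word; this helper only covers the length and bytes that follow it.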
Supported R Types
- Primitive types: NULL, integers, doubles, logicals, characters, raw bytes, complex numbers
- Collections: vectors, lists, pairlists, expression vectors
- Data structures: data frames, matrices, factors (ordered and unordered)
- Object-oriented: S3 objects, S4 objects with slots
- Language objects: formulas, unevaluated expressions, function calls
- Functions: closures, environments, promises, special/builtin functions
- Advanced: reference tracking (REFSXP), ALTREP compact sequences
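For background on how these types are tagged in the file: R's serialization packs each object's type code into the low 8 bits of a 32-bit flags word, followed by an "is object" bit (8), a "has attributes" bit (9), a "has tag" bit (10), and the gp levels field from bit 12 up. A REFSXP (type 255) instead stores a back-reference index in bits 8 and above, where 0 means the index follows as a separate integer. The decoder below is a hypothetical sketch of that layout as defined in R's serialize.c, not rds2rust's internal code.

```rust
/// Decoded fields of an R serialization flags word (illustrative only).
#[derive(Debug, PartialEq)]
struct Flags {
    sexp_type: u8,
    is_object: bool,
    has_attr: bool,
    has_tag: bool,
    levels: u32,
}

const REFSXP: u8 = 255;

fn decode_flags(flags: u32) -> Flags {
    Flags {
        sexp_type: (flags & 0xFF) as u8,
        is_object: flags & (1 << 8) != 0,
        has_attr: flags & (1 << 9) != 0,
        has_tag: flags & (1 << 10) != 0,
        levels: flags >> 12,
    }
}

/// For a REFSXP, returns the packed reference index (1-based).
/// Some(0) means the real index follows as a separate integer in the stream.
fn ref_index(flags: u32) -> Option<u32> {
    if (flags & 0xFF) as u8 != REFSXP {
        return None;
    }
    Some(flags >> 8)
}
```

For example, an integer vector (type 13) carrying a class attribute, such as a factor, has the object and attribute bits set: 13 | 1 << 8 | 1 << 9 = 781.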
Installation
Add this to your Cargo.toml:
```toml
[dependencies]
rds2rust = "0.1"
```
Quick Start
Reading an RDS file
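```rust
use rds2rust::read_rds;
use std::fs;

let data = fs::read("data.rds")?; // hypothetical path
let obj = read_rds(&data)?;
```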
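After decompression, an RDS stream begins with a short header. The sketch below parses it for the XDR binary format, assuming the layout written by R's serialize(): a two-byte format tag "X\n", then big-endian integers for the format version, the writing R version, and the minimum R version able to read it, and, for format version 3 only, a length-prefixed native-encoding string. RdsHeader, parse_header, and unpack_version are hypothetical names for illustration; read_rds handles this step for you.

```rust
#[derive(Debug, PartialEq)]
struct RdsHeader {
    version: i32,
    writer_version: i32,
    min_reader_version: i32,
    native_encoding: Option<String>,
}

/// Reads a big-endian i32 at `*pos` and advances the cursor.
fn be_i32(buf: &[u8], pos: &mut usize) -> Option<i32> {
    let b = buf.get(*pos..*pos + 4)?;
    *pos += 4;
    Some(i32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the header and returns it with the offset where the first object starts.
fn parse_header(buf: &[u8]) -> Option<(RdsHeader, usize)> {
    // "X\n" marks the XDR (big-endian binary) format
    if buf.get(..2)? != b"X\n" {
        return None;
    }
    let mut pos = 2;
    let version = be_i32(buf, &mut pos)?;
    let writer_version = be_i32(buf, &mut pos)?;
    let min_reader_version = be_i32(buf, &mut pos)?;
    let native_encoding = if version == 3 {
        let len = usize::try_from(be_i32(buf, &mut pos)?).ok()?;
        let bytes = buf.get(pos..pos + len)?;
        pos += len;
        Some(String::from_utf8(bytes.to_vec()).ok()?)
    } else {
        None
    };
    Some((RdsHeader { version, writer_version, min_reader_version, native_encoding }, pos))
}

/// R packs versions as major * 65536 + minor * 256 + patch.
fn unpack_version(v: i32) -> (i32, i32, i32) {
    (v / 65536, (v / 256) % 256, v % 256)
}
```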
Writing an RDS file
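```rust
use rds2rust::write_rds;
use std::fs;

// `obj` is an RObject, e.g. one returned by read_rds
let output = write_rds(&obj)?;
fs::write("output.rds", output)?; // hypothetical path
```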
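On the writing side, R's XDR format stores integers as big-endian 4-byte values. As a minimal sketch, assuming an attribute-free integer vector (INTSXP, type code 13), the encoding is the flags word, the length, then each element. write_int_vector is a hypothetical helper; a complete writer also has to emit the stream header, attributes, and references, and compress the result.

```rust
const INTSXP: u32 = 13;

/// Appends an attribute-free integer vector in XDR (big-endian) form.
fn write_int_vector(out: &mut Vec<u8>, values: &[i32]) {
    // Flags word: only the type code (no object, attribute, or tag bits)
    out.extend_from_slice(&INTSXP.to_be_bytes());
    // Length; vectors longer than i32::MAX elements use a different encoding
    out.extend_from_slice(&(values.len() as i32).to_be_bytes());
    for v in values {
        // NA_integer_ is simply i32::MIN, written like any other value
        out.extend_from_slice(&v.to_be_bytes());
    }
}
```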
Working with Data Frames
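```rust
use rds2rust::{read_rds, RObject};
use std::fs;

// Read a data frame
let data = fs::read("data_frame.rds")?; // hypothetical path
let obj = read_rds(&data)?;
if let RObject::DataFrame(df) = obj {
    // inspect `df` here (assumes a tuple variant; check the RObject docs)
}
```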
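One detail worth knowing when inspecting data frames: R usually stores automatic row names compactly, as the integer vector c(NA_integer_, -n) rather than n strings. The hypothetical helper below recovers the row count from either form, using i32::MIN for NA_integer_ as described under Special Values; RowNames is an illustrative type, not an rds2rust one.

```rust
const NA_INTEGER: i32 = i32::MIN;

/// Possible forms of a data frame's row.names attribute (hypothetical type).
enum RowNames {
    Integer(Vec<i32>),
    Character(Vec<String>),
}

/// Number of rows implied by the row.names attribute.
fn n_rows(row_names: &RowNames) -> usize {
    match row_names {
        // Compact form c(NA_integer_, -n): the row count is |n|
        RowNames::Integer(v) if v.len() == 2 && v[0] == NA_INTEGER => v[1].unsigned_abs() as usize,
        // Otherwise there is one entry per row
        RowNames::Integer(v) => v.len(),
        RowNames::Character(v) => v.len(),
    }
}
```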
Working with Factors
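```rust
use rds2rust::{read_rds, RObject};
use std::fs;

let data = fs::read("factor.rds")?; // hypothetical path
let obj = read_rds(&data)?;
if let RObject::Factor(factor) = obj {
    // inspect `factor` here (assumes a tuple variant; check the RObject docs)
}
```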
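Underneath, an R factor is an integer vector of 1-based codes with a levels attribute (a character vector) and a class of "factor", or c("ordered", "factor") for ordered factors. A sketch of mapping codes to labels on plain slices, with NA_integer_ (i32::MIN) becoming None; factor_labels is a hypothetical function, not part of rds2rust.

```rust
const NA_INTEGER: i32 = i32::MIN;

/// Maps 1-based factor codes to level labels; NA or out-of-range codes give None.
fn factor_labels<'a>(codes: &[i32], levels: &'a [String]) -> Vec<Option<&'a str>> {
    codes
        .iter()
        .map(|&code| {
            if code == NA_INTEGER || code < 1 {
                return None;
            }
            levels.get((code - 1) as usize).map(String::as_str)
        })
        .collect()
}
```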
Working with S3/S4 Objects
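```rust
use rds2rust::{read_rds, RObject};
use std::fs;

let data = fs::read("object.rds")?; // hypothetical path
let obj = read_rds(&data)?;

// A single match avoids moving `obj` in a first `if let` and reusing it
match obj {
    // S3 objects
    RObject::S3Object(s3) => { /* handle the S3 object */ }
    // S4 objects
    RObject::S4Object(s4) => { /* handle the S4 object and its slots */ }
    _ => {}
}
```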
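An S3 class is nothing more than the class attribute, a character vector read in order, so an object of class c("tbl_df", "tbl", "data.frame") also counts as a data.frame. The hypothetical helpers below mirror R's inherits() and the first-match rule S3 dispatch uses when choosing a method; they operate on plain string slices rather than rds2rust types.

```rust
/// Mirrors R's inherits(): true if `what` appears anywhere in the class vector.
fn inherits(class_attr: &[&str], what: &str) -> bool {
    class_attr.iter().any(|c| *c == what)
}

/// Returns the first class (in class-vector order) that has a method,
/// which is how S3 dispatch picks an implementation.
fn dispatch<'a>(class_attr: &[&'a str], methods: &[&str]) -> Option<&'a str> {
    class_attr
        .iter()
        .copied()
        .find(|c| methods.iter().any(|m| m == c))
}
```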
Roundtrip: Read and Write
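```rust
use rds2rust::{read_rds, write_rds};
use std::fs;

// Read an RDS file
let input_data = fs::read("input.rds")?; // hypothetical path
let obj = read_rds(&input_data)?;

// Process the data...
// (modify the object as needed)

// Write back to RDS format
let output_data = write_rds(&obj)?;
fs::write("output.rds", &output_data)?;

// Verify roundtrip
let obj2 = read_rds(&output_data)?;
assert_eq!(obj, obj2);
```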
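A caveat when verifying roundtrips of numeric data yourself: under IEEE 754, NaN never equals itself, and R's NA_real_ is a NaN, so a plain == comparison of doubles can report a mismatch even when the bytes were preserved exactly. Whether RObject's PartialEq accounts for this is not stated here. If you compare raw f64 data, comparing bit patterns is one option; bitwise_eq is a hypothetical helper.

```rust
/// Compares two f64 slices bit-for-bit: NaN equals NaN with the same payload,
/// and NA keeps its distinct payload. Note that 0.0 and -0.0 differ bitwise.
fn bitwise_eq(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```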
Type System
The RObject enum represents all possible R object types:
Special Values
R's special values are represented as:
- NA (integers): RObject::NA_INTEGER constant (i32::MIN)
- NA (logicals): Logical::Na enum variant
- NA (real): f64::is_nan() returns true, but it is also true for NaN, so it cannot tell the two apart on its own
- Inf/-Inf: f64::INFINITY and f64::NEG_INFINITY
- NaN: f64::NAN
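Because R's NA_real_ is itself a NaN, distinguishing NA from NaN requires looking at the payload: R marks NA with the value 1954 in the low 32 bits of the double (bit pattern 0x7FF00000000007A2), and its R_IsNA() test checks exactly that. The helpers below are a hypothetical sketch of the same check in Rust and are not taken from rds2rust.

```rust
/// Bit pattern of R's NA_real_: a NaN whose lower 32 bits are 1954.
const NA_REAL_BITS: u64 = 0x7FF0_0000_0000_07A2;

fn na_real() -> f64 {
    f64::from_bits(NA_REAL_BITS)
}

/// Mirrors R's R_IsNA(): a NaN whose low word is 1954.
fn is_na_real(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & 0xFFFF_FFFF) == 1954
}

/// Mirrors R's R_IsNaN(): a NaN that is not NA.
fn is_nan_not_na(x: f64) -> bool {
    x.is_nan() && !is_na_real(x)
}
```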
Memory Optimizations
rds2rust includes several memory optimizations for efficient data processing:
- String Interning - All strings use Arc<str> for automatic deduplication
- Boxed Large Variants - Large enum variants are boxed to reduce memory overhead
- Compact Attributes - SmallVec stores 0-2 attributes inline without heap allocation
- Object Deduplication - Identical objects are automatically shared during parsing
These optimizations provide 20-50% memory reduction for typical RDS files while maintaining zero API overhead.
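String interning, as listed above, means storing each distinct string once and handing out shared Arc<str> handles to it. A minimal sketch of the idea using a HashSet; Interner is a hypothetical type for illustration, not rds2rust's implementation.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hands out one shared Arc<str> per distinct string.
#[derive(Default)]
struct Interner {
    pool: HashSet<Arc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Arc<str> {
        // Arc<str> implements Borrow<str>, so the set can be queried with a &str
        if let Some(existing) = self.pool.get(s) {
            return Arc::clone(existing);
        }
        let new: Arc<str> = Arc::from(s);
        self.pool.insert(Arc::clone(&new));
        new
    }
}
```

Repeated values such as factor labels or column names then cost one allocation plus a pointer per occurrence.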
Performance Tips
Reading Large Files
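```rust
use rds2rust::read_rds;
use std::fs::File;
use std::io::Read;

// read_to_end loads the entire file into memory before parsing
let mut file = File::open("large.rds")?; // hypothetical path
let mut buffer = Vec::new();
file.read_to_end(&mut buffer)?;
let obj = read_rds(&buffer)?;
```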
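Since the whole file ends up in memory either way, one small refinement is to size the buffer from the file's metadata so read_to_end does not have to grow it repeatedly. std::fs::read applies a similar size hint internally, so it is an equally good one-line choice. read_all is a hypothetical std-only helper whose result could be passed to read_rds.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Reads an entire file into a Vec<u8> preallocated to the file's size.
fn read_all(path: &Path) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // The metadata length is only a hint; read_to_end still handles any difference
    let size = file.metadata().map(|m| m.len() as usize).unwrap_or(0);
    let mut buffer = Vec::with_capacity(size);
    file.read_to_end(&mut buffer)?;
    Ok(buffer)
}
```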
Reusing Parsed Objects
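```rust
use rds2rust::RObject;
use std::sync::Arc;

// Wrap a parsed RObject (`obj`, e.g. from read_rds) in Arc for cheap cloning
let obj: Arc<RObject> = Arc::new(obj);
// Cloning only increments the reference count; the object is not copied
let obj2 = Arc::clone(&obj);
```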
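Arc also lets several threads read the same parsed data without copying it. The std-only sketch below uses a Vec<f64> as a stand-in for a parsed numeric column; sharing an RObject the same way would additionally require RObject to be Send + Sync, which is assumed here rather than confirmed. parallel_sum is a hypothetical function.

```rust
use std::sync::Arc;
use std::thread;

/// Sums chunks of shared data on separate threads. Only the Arc handle is
/// cloned per thread; the underlying Vec is never copied.
fn parallel_sum(data: Arc<Vec<f64>>, n_threads: usize) -> f64 {
    let n_threads = n_threads.max(1);
    let chunk = (data.len() + n_threads - 1) / n_threads;
    let handles: Vec<_> = (0..n_threads)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                let start = (i * chunk).min(data.len());
                let end = ((i + 1) * chunk).min(data.len());
                data[start..end].iter().sum::<f64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```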
Limitations
- Write support: All R types can be written except for some complex environment configurations
- Compression formats: Currently supports gzip; bzip2/xz support planned
- ALTREP: Reads ALTREP objects but writes them as regular vectors
- External pointers: Not supported (rarely used in serialized data)
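Since only gzip is currently supported, it can help to identify the container format up front and report bzip2 or xz input clearly instead of failing mid-parse. The magic numbers below are the standard ones for each format (gzip 1F 8B, bzip2 "BZh", xz FD 37 7A 58 5A 00); an uncompressed RDS stream starts directly with a serialization format tag such as "X\n". Container and detect_container are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Container {
    Gzip,
    Bzip2,
    Xz,
    Uncompressed,
    Unknown,
}

fn detect_container(bytes: &[u8]) -> Container {
    if bytes.starts_with(&[0x1F, 0x8B]) {
        Container::Gzip
    } else if bytes.starts_with(b"BZh") {
        Container::Bzip2
    } else if bytes.starts_with(&[0xFD, b'7', b'z', b'X', b'Z', 0x00]) {
        Container::Xz
    } else if bytes.starts_with(b"X\n") || bytes.starts_with(b"A\n") || bytes.starts_with(b"B\n") {
        // Serialization format tags: XDR, ASCII, native binary
        Container::Uncompressed
    } else {
        Container::Unknown
    }
}
```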
Development Status
Current version: 0.1.0
Test coverage: 137 passing tests covering all R object types
Completed phases:
- ✅ All basic R types (NULL, vectors, matrices, data frames)
- ✅ All object-oriented types (S3, S4, factors)
- ✅ All language types (expressions, formulas, closures, environments)
- ✅ All special types (promises, special functions, builtin functions)
- ✅ Reference tracking and ALTREP optimization
- ✅ Complete read/write roundtrip support
- ✅ Memory optimizations (string interning, compact attributes, deduplication)
License
Licensed under:
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)