# onecode-rs
Rust bindings for [ONEcode](https://github.com/thegenemyers/ONEcode), a simple and efficient data representation format for genomic data.
## Overview
ONEcode is a data representation framework designed primarily for genomic data, providing both human-readable ASCII and compressed binary file versions with strongly typed data.
This library provides safe, idiomatic Rust bindings to the ONEcode C library.
## Features
- ✅ Read and write ONE files in both ASCII and binary formats
- ✅ Schema validation and creation
- ✅ Provenance and reference tracking
- ✅ Type-safe access to fields (integers, reals, characters, strings, lists)
- ✅ File navigation and statistics
- ✅ Sequence name extraction from embedded GDB in alignment files
- ✅ RAII-based resource management
- ✅ **Fully thread-safe** - concurrent operations supported
## Requirements
### System Dependencies
This library uses `bindgen` to generate Rust bindings from C headers, which requires clang/libclang:
**Ubuntu/Debian:**
```bash
sudo apt-get install llvm-dev libclang-dev clang
```
**Fedora/RHEL:**
```bash
sudo dnf install clang-devel llvm-devel
```
**macOS:**
```bash
xcode-select --install # Usually already installed
```
**Arch Linux:**
```bash
sudo pacman -S clang
```
For more details, see the [bindgen requirements documentation](https://rust-lang.github.io/rust-bindgen/requirements.html).
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
onecode = { git = "https://github.com/pangenome/onecode-rs" }
```
## Usage
### Reading a ONE file
```rust
use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("data.1seq", None, None, 1)?;

    // Read through the file line by line; '\0' marks end of file
    loop {
        let line_type = file.read_line();
        if line_type == '\0' {
            break; // End of file
        }
        match line_type {
            'S' => {
                // Access DNA sequence data
                println!("Sequence line");
            }
            'I' => {
                // Access the first integer field of an 'I' line
                println!("ID: {}", file.int(0));
            }
            _ => {}
        }
    }
    Ok(())
}
```
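The read loop above can be factored into a reusable function. The sketch below is plain standard-library Rust: `count_line_types` is a hypothetical helper, not part of onecode-rs, and it only assumes that `read_line` returns a `char` with `'\0'` at end of file, as in the example. With a real file you would pass `|| file.read_line()`; here a simulated stream stands in so the block runs on its own.

```rust
use std::collections::HashMap;

/// Drives a sentinel-terminated reader (returns '\0' at end of file) and
/// tallies how many lines of each type it saw. Hypothetical helper.
fn count_line_types<F: FnMut() -> char>(mut read_line: F) -> HashMap<char, usize> {
    let mut counts = HashMap::new();
    loop {
        let line_type = read_line();
        if line_type == '\0' {
            break; // end of file
        }
        *counts.entry(line_type).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Simulated stream of line types standing in for a real OneFile
    let mut fake = vec!['S', 'I', 'S', '\0'].into_iter();
    let counts = count_line_types(|| fake.next().unwrap_or('\0'));
    println!("{:?}", counts.get(&'S')); // prints: Some(2)
}
```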
### Writing a ONE file
```rust
use onecode::{OneFile, OneSchema};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a schema
    let schema_text = "P 3 tst\nO T 1 3 INT\n";
    let schema = OneSchema::from_text(schema_text)?;

    // Open file for writing
    let mut writer = OneFile::open_write_new(
        "output.1tst",
        &schema,
        "tst",
        false, // ASCII format
        1,     // single-threaded
    )?;

    // Add provenance
    writer.add_provenance("myprogram", "1.0", "example command")?;

    // Write data
    writer.set_int(0, 42);
    writer.write_line('T', 0, None);

    // File is automatically closed on drop
    Ok(())
}
```
### Creating schemas from text
```rust
use onecode::OneSchema;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define schema inline
    let schema_text = r#"
P 3 seq
O S 1 3 DNA
D I 1 3 INT
"#;
    let schema = OneSchema::from_text(schema_text)?;
    // Use schema for file operations
    Ok(())
}
```
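Judging from the schema examples in this README, each line-type definition has the form `<kind> <type char> <field count>`, followed by each field type written as a length-prefixed string (`3 INT`, `3 DNA`). The hypothetical helper below, which is not part of onecode-rs, builds such lines so the length prefixes cannot drift out of sync with the type names; consult the ONEcode documentation for the full schema grammar.

```rust
/// Hypothetical helper: formats one schema line such as `O S 1 3 DNA`.
/// The field count comes first, then each type name as `<length> <name>`.
fn schema_line(kind: char, line_type: char, field_types: &[&str]) -> String {
    let mut line = format!("{} {} {}", kind, line_type, field_types.len());
    for ty in field_types {
        // Each type name is a length-prefixed string, e.g. "3 INT"
        line.push_str(&format!(" {} {}", ty.len(), ty));
    }
    line
}

fn main() {
    // Rebuilds the seq schema from the example above
    let schema_text = [
        "P 3 seq".to_string(),
        schema_line('O', 'S', &["DNA"]),
        schema_line('D', 'I', &["INT"]),
    ]
    .join("\n");
    println!("{}", schema_text);
}
```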
### Getting file statistics
```rust
use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = OneFile::open_read("data.1seq", None, None, 1)?;

    // Get statistics for a line type
    let (count, max_length, total_length) = file.stats('S')?;
    println!(
        "Sequences: {}, Max length: {}, Total: {}",
        count, max_length, total_length
    );
    Ok(())
}
```
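Because `stats` returns both a count and a total length, the mean length per item is a single division. This sketch assumes both values are `i64` (the actual return types may differ) and guards against a zero count; `mean_length` is a hypothetical helper, not part of onecode-rs.

```rust
/// Mean item length from the count and total length reported by `stats`.
/// Returns `None` when the count is zero to avoid dividing by zero.
fn mean_length(count: i64, total_length: i64) -> Option<f64> {
    if count == 0 {
        None
    } else {
        Some(total_length as f64 / count as f64)
    }
}

fn main() {
    // e.g. 4 sequences totalling 10 bases
    println!("{:?}", mean_length(4, 10)); // prints: Some(2.5)
}
```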
### Working with alignment files (.1aln) and sequence names
Alignment files can contain embedded genome database (GDB) information, mapping sequence IDs to names:
```rust
use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;

    // Get all sequence names (efficient for multiple lookups)
    let seq_names = file.get_all_sequence_names();
    println!("Found {} sequences", seq_names.len());

    // Read alignments and resolve sequence names
    loop {
        let line_type = file.read_line();
        if line_type == '\0' {
            break;
        }
        if line_type == 'A' {
            let query_id = file.int(0);
            let target_id = file.int(3);
            if let (Some(query_name), Some(target_name)) =
                (seq_names.get(&query_id), seq_names.get(&target_id))
            {
                println!("Alignment: {} vs {}", query_name, target_name);
            }
        }
    }
    Ok(())
}
```
Or look up individual names on demand:
```rust
use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;

    // Get a specific sequence name by ID
    if let Some(name) = file.get_sequence_name(5) {
        println!("Sequence 5: {}", name);
    }
    Ok(())
}
```
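When a name is missing from the embedded GDB, you may still want a printable label rather than silently skipping the alignment. The sketch below assumes the names come back as a `HashMap<i64, String>` keyed by sequence ID, which fits how `get_all_sequence_names()` is used above but is an assumption; `label` and the `#<id>` fallback format are hypothetical.

```rust
use std::collections::HashMap;

/// Resolve a sequence ID to a printable label, falling back to `#<id>`
/// when no name is known for it. Hypothetical helper.
fn label(names: &HashMap<i64, String>, id: i64) -> String {
    names.get(&id).cloned().unwrap_or_else(|| format!("#{}", id))
}

fn main() {
    let mut names = HashMap::new();
    names.insert(0, "chr1".to_string());
    names.insert(1, "chr2".to_string());

    // (query_id, target_id) pairs as they might be read from 'A' lines
    let pairs = [(0, 1), (1, 7)];
    for &(q, t) in &pairs {
        // prints "chr1 vs chr2", then "chr2 vs #7"
        println!("{} vs {}", label(&names, q), label(&names, t));
    }
}
```

Building the full map once suits scans over many alignments; `get_sequence_name` avoids that up-front cost when only a few IDs are needed.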
## API Documentation
Full API documentation is available via cargo doc:
```bash
cargo doc --open
```
Key types:
- `OneFile` - Main file handle for reading/writing ONE files
- `OneSchema` - Schema definition and validation
- `OneError` - Error types
- `OneType` - Field type enumeration
## Building
The library uses `bindgen` to automatically generate bindings from the C headers and `cc` to compile the C library.
```bash
cargo build --release
```
## Testing
All tests pass when run concurrently (Cargo runs tests on parallel threads by default):
```bash
cargo test
```
Test suite includes:
- 9 basic functionality tests
- 3 sequence name extraction tests
- 4 thread-safety stress tests (10-50 concurrent threads)
- 2 doc tests
## Thread Safety
✅ **Fully thread-safe!** The library supports concurrent operations without any restrictions.
The upstream ONEcode C library has been updated with thread-local storage for all global state, making it safe for concurrent use from multiple threads. All operations including schema creation, file reading, and error handling work correctly under concurrent load.
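To illustrate the idea behind thread-local global state, here is a Rust analogue using `thread_local!`. It is not the ONEcode C code: a "last error" slot set in one thread is invisible to another, which is why concurrent callers do not overwrite each other's error state. `set_error` and `last_error` are illustrative names.

```rust
use std::cell::RefCell;
use std::thread;

// Illustrative analogue: each thread gets its own copy of what would
// otherwise be a single global "last error" slot.
thread_local! {
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

fn set_error(msg: &str) {
    LAST_ERROR.with(|e| *e.borrow_mut() = Some(msg.to_string()));
}

fn last_error() -> Option<String> {
    LAST_ERROR.with(|e| e.borrow().clone())
}

fn main() {
    let a = thread::spawn(|| {
        set_error("bad schema");
        last_error()
    });
    let b = thread::spawn(last_error);
    // Thread `a` sees its own error; thread `b` has a separate, empty slot
    println!("{:?} {:?}", a.join().unwrap(), b.join().unwrap()); // prints: Some("bad schema") None
}
```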
## Architecture
The library is organized into several modules:
- `ffi` - Raw FFI bindings generated by bindgen
- `error` - Rust error types and Result wrapper
- `types` - Rust-friendly type definitions
- `file` - Safe `OneFile` wrapper with RAII resource management
- `schema` - `OneSchema` management and validation
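The RAII pattern used by the `file` module can be sketched without the real FFI. In this hypothetical example, `ffi_close` stands in for a C close function and only counts calls; the wrapper closes its handle in `Drop`, so the handle is released on every exit path, including early returns via `?`. This illustrates the pattern, not the crate's actual implementation.

```rust
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls to the hypothetical close function so the effect is observable.
static CLOSED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a C close function; it never dereferences the
/// pointer and only records that it ran.
fn ffi_close(_raw: *mut u8) {
    CLOSED.fetch_add(1, Ordering::SeqCst);
}

/// Owning wrapper around a raw handle: closing happens in `Drop`.
struct Handle {
    raw: *mut u8,
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Skip null handles; close every other handle exactly once
        if !self.raw.is_null() {
            ffi_close(self.raw);
        }
    }
}

fn main() {
    {
        // A dangling but non-null pointer is enough here; it is never read
        let _h = Handle { raw: NonNull::<u8>::dangling().as_ptr() };
        // ... use the handle ...
    } // `_h` goes out of scope here and `drop` runs
    println!("closed {} handle(s)", CLOSED.load(Ordering::SeqCst)); // prints: closed 1 handle(s)
}
```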
## Integration with ONEcode
The C library is included as a git subtree in the `ONEcode/` directory and compiled automatically during the build process.
To update the ONEcode subtree:
```bash
git subtree pull --prefix ONEcode https://github.com/thegenemyers/ONEcode.git main --squash
```
## Performance
- Zero-copy access to data where possible
- Supports parallel reading/writing with configurable thread count
- Binary format provides efficient compression
- Thread-safe without synchronization overhead, since global state in the C library is thread-local rather than shared
## License
This Rust wrapper is licensed under MIT OR Apache-2.0.
The ONEcode C library has its own license - see `ONEcode/` for details.
## Contributing
Contributions are welcome! Please ensure tests pass before submitting PRs:
```bash
cargo test
cargo clippy
cargo fmt
```
## Acknowledgments
ONEcode was developed by Gene Myers and Richard Durbin. This Rust wrapper builds on their excellent work to provide safe, idiomatic Rust bindings.