§emstar
High-performance STAR file I/O library for Rust, optimized for cryo-EM workflows.
§Overview
emstar provides a fast, type-safe, and ergonomic API for reading, writing, and manipulating STAR (Self-defining Text Archival and Retrieval) files. These files are widely used in cryo-electron microscopy software such as RELION to store particle data, optimization parameters, and processing metadata.
§Features
- Fast Parsing: Efficient parsing with lexical number parsing and smart string optimization
- Type-Safe API: Strong Rust typing ensures correctness at compile time
- Comprehensive API: Full CRUD operations for files, data blocks, and individual values
- STAR Format Support: Handles quoted strings, empty values, multi-block files, and edge cases
- Statistics: Compute file statistics, with a streaming mode (stats_streaming) that avoids loading the entire dataset into memory
- Zero-Copy Operations: Efficient columnar storage using Polars DataFrames
§Data Structures
STAR files contain one or more data blocks, each of which can be:
- SimpleBlock: Key-value pairs (e.g., global parameters)
- LoopBlock: Tabular data with columns and rows (e.g., particle coordinates)
§Quick Start
use emstar::{read, write, DataBlock};
// Read a STAR file
let data_blocks = read("particles.star", None)?;
// Access a data block
if let Some(DataBlock::Loop(df)) = data_blocks.get("particles") {
    println!("Found {} particles", df.row_count());
    println!("Columns: {:?}", df.columns());
}
// Write modified data
write(&data_blocks, "output.star", None)?;
§Creating a New STAR File
use emstar::{write, SimpleBlock, LoopBlock, DataBlock, DataValue};
use std::collections::HashMap;
let mut data = HashMap::new();
// Create a simple block using array initialization
let general: SimpleBlock = [
    ("rlnImageSize", DataValue::Integer(256)),
    ("rlnPixelSize", DataValue::Float(1.2)),
].into();
data.insert("general".to_string(), DataBlock::Simple(general));
// Create a loop block using the builder pattern
let particles = LoopBlock::builder()
    .columns(&["rlnCoordinateX", "rlnCoordinateY", "rlnCoordinateZ"])
    .rows(vec![vec![
        DataValue::Float(100.0),
        DataValue::Float(200.0),
        DataValue::Float(50.0),
    ]])
    .build()?;
data.insert("particles".to_string(), DataBlock::Loop(particles));
write(&data, "output.star", None)?;
§Querying Data
use emstar::{read, DataBlock, DataValue};
let data_blocks = read("particles.star", None)?;
if let Some(DataBlock::Loop(particles)) = data_blocks.get("particles") {
    // Get column data
    let x_coords = particles.get_column("rlnCoordinateX").unwrap();
    let y_coords = particles.get_column("rlnCoordinateY").unwrap();
    // Iterate over coordinates
    for (x, y) in x_coords.iter().zip(y_coords.iter()) {
        if let (DataValue::Float(x_val), DataValue::Float(y_val)) = (x, y) {
            println!("Particle at ({}, {})", x_val, y_val);
        }
    }
}
§Computing Statistics
use emstar::stats;
// Get statistics from file (loads entire file into memory)
let file_stats = stats("particles.star")?;
println!("Total blocks: {}", file_stats.n_blocks);
println!("Loop blocks: {}", file_stats.n_loop_blocks);
println!("Total particles: {}", file_stats.total_loop_rows);
§Data Block Operations
§SimpleBlock (Key-Value Pairs)
use emstar::{read, DataBlock};
let data_blocks = read("parameters.star", None)?;
if let Some(DataBlock::Simple(params)) = data_blocks.get("general") {
    // Get a value
    if let Some(value) = params.get("rlnImageSize") {
        println!("Image size: {:?}", value);
    }
    // Set a value
    // params.set("new_key", DataValue::Integer(42));
    // Check if key exists
    if params.contains_key("rlnImageSize") {
        println!("Key exists");
    }
    // Get all keys
    for key in params.keys() {
        println!("Key: {}", key);
    }
}
§LoopBlock (Tabular Data)
use emstar::{read, DataBlock};
let data_blocks = read("particles.star", None)?;
if let Some(DataBlock::Loop(particles)) = data_blocks.get("particles") {
    // Get dimensions
    println!("{} particles with {} columns", particles.row_count(), particles.column_count());
    // Get column names
    let columns = particles.columns();
    println!("Columns: {:?}", columns);
    // Get cell value by index
    if let Some(value) = particles.get(0, 0) {
        println!("First cell: {:?}", value);
    }
    // Get cell value by column name
    if let Some(value) = particles.get_by_name(0, "rlnCoordinateX") {
        println!("First X coordinate: {:?}", value);
    }
    // Iterate over rows
    for (i, row) in particles.iter_rows().enumerate() {
        println!("Row {}: {:?}", i, row);
    }
}
§LoopBlock Builder Pattern
Use the builder pattern for more ergonomic LoopBlock creation:
use emstar::{LoopBlock, DataValue};
let particles = LoopBlock::builder()
    .columns(&["rlnCoordinateX", "rlnCoordinateY", "rlnAnglePsi"])
    .rows(vec![
        vec![DataValue::Float(100.0), DataValue::Float(200.0), DataValue::Float(45.0)],
        vec![DataValue::Float(150.0), DataValue::Float(250.0), DataValue::Float(90.0)],
    ])
    .build()?;
assert_eq!(particles.row_count(), 2);
assert_eq!(particles.column_count(), 3);
§SimpleBlock Array Initialization
Create a SimpleBlock from an array of key-value pairs:
use emstar::{SimpleBlock, DataValue};
let general: SimpleBlock = [
    ("rlnImageSize", DataValue::Integer(256)),
    ("rlnPixelSize", DataValue::Float(1.06)),
    ("rlnVoltage", DataValue::Float(300.0)),
].into();
assert_eq!(general.len(), 3);
§DataBlock Convenience Methods
Access blocks without verbose pattern matching:
use emstar::{read, DataBlock, SimpleBlock, LoopBlock};
let data_blocks = read("particles.star", None)?;
// Using expect methods (panics with message if wrong type)
if let Some(block) = data_blocks.get("general") {
    let general: &SimpleBlock = block.expect_simple("general should be a SimpleBlock");
}
if let Some(block) = data_blocks.get("particles") {
    let particles: &LoopBlock = block.expect_loop("particles should be a LoopBlock");
}
// Using as methods (returns Option)
if let Some(block) = data_blocks.get("general") {
    if let Some(simple) = block.as_simple() {
        // Work with SimpleBlock
    }
}
// Check block type
if let Some(block) = data_blocks.get("particles") {
    if block.is_loop() {
        // It's a LoopBlock
    }
}
§Error Handling
All functions return Result<T, Error>. Common error types:
- Error::FileNotFound - The specified file does not exist
- Error::Io - An I/O error occurred
- Error::Parse - Failed to parse the STAR file
use emstar::{read, Error};
match read("particles.star", None) {
    Ok(data) => println!("Successfully read {} blocks", data.len()),
    Err(Error::FileNotFound(path)) => println!("File not found: {:?}", path),
    Err(Error::Parse { line, message }) => {
        println!("Parse error at line {}: {}", line, message);
    }
    Err(e) => println!("Error: {:?}", e),
}
§Performance Considerations
- Parsing: Uses the lexical crate for fast number parsing
- Memory: LoopBlocks use Polars DataFrames for efficient columnar storage
- String Storage: Uses SmartString for small string optimization
- Statistics: stats_streaming can compute statistics without loading the full file into memory
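To make the streaming idea concrete, here is a minimal sketch of a single-pass counter that holds only one line in memory at a time. It is not emstar's implementation: `StreamCounts` and `count_streaming` are hypothetical names, the field names merely mirror those used in the statistics example above, and the parsing rules are simplified (lines starting with `#` are treated as comments, and any non-key line inside a loop counts as one row).

```rust
use std::io::{self, BufRead};

/// Running totals gathered in one pass (hypothetical type, not emstar's StarStats).
#[derive(Debug, Default, PartialEq)]
struct StreamCounts {
    n_blocks: usize,
    n_loop_blocks: usize,
    total_loop_rows: usize,
}

/// Count blocks and loop rows line by line, without building any table in memory.
/// Assumption: each data row of a loop occupies exactly one line.
fn count_streaming<R: BufRead>(reader: R) -> io::Result<StreamCounts> {
    let mut counts = StreamCounts::default();
    let mut in_loop = false;

    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();

        // Skip blank lines and comments.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }

        if trimmed.starts_with("data_") {
            // A new data block begins; any open loop ends here.
            counts.n_blocks += 1;
            in_loop = false;
        } else if trimmed == "loop_" {
            counts.n_loop_blocks += 1;
            in_loop = true;
        } else if in_loop && !trimmed.starts_with('_') {
            // Inside a loop, lines that are not column names are data rows.
            counts.total_loop_rows += 1;
        }
    }
    Ok(counts)
}
```

Because the function accepts any `BufRead`, it can be fed a `BufReader` wrapped around a `File` for large inputs or a byte slice in tests.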
§STAR File Format
STAR files have the following structure:
data_block_name
_key1 value1
_key2 value2
loop_
_column1 _column2 _column3
value1 value2 value3
value4 value5 value6
- Data blocks start with data_ followed by a name
- Simple blocks contain key-value pairs starting with _
- Loop blocks start with loop_ and contain tabular data
- Values can be unquoted, single-quoted, or double-quoted
- Empty values are represented as "" or ''
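The quoting rules above can be illustrated with a small tokenizer. This is a sketch under simplifying assumptions, not the parser emstar uses: `split_star_values` is a hypothetical helper, a quoted value ends at the first matching quote character, and escapes or multi-line values are not handled.

```rust
/// Split one STAR data line into raw values (hypothetical helper).
/// A value is either a run of non-whitespace characters or a string
/// enclosed in matching single or double quotes.
fn split_star_values(line: &str) -> Vec<String> {
    let mut values = Vec::new();
    let mut chars = line.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace only separates values.
            chars.next();
        } else if c == '\'' || c == '"' {
            // Quoted value: consume the opening quote, then read up to the
            // matching closing quote (or the end of the line).
            chars.next();
            let mut value = String::new();
            for ch in chars.by_ref() {
                if ch == c {
                    break;
                }
                value.push(ch);
            }
            values.push(value);
        } else {
            // Unquoted value: read until the next whitespace character.
            let mut value = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() {
                    break;
                }
                value.push(ch);
                chars.next();
            }
            values.push(value);
        }
    }
    values
}
```

Calling `split_star_values` on a row such as `100.5 'my file.mrc' "" 3` yields four values, the third being the empty string, which shows why quoting is needed for both embedded spaces and empty fields.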
§See Also
- API Documentation - Detailed API reference
- Examples - Example code
- RELION Documentation - Information about STAR files in cryo-EM
Structs§
- LoopBlock - Represents a loop data block (table-like data). Uses Polars DataFrame for efficient columnar storage and operations.
- LoopBlockBuilder - Builder for constructing LoopBlock with a fluent API
- LoopBlockStats - Statistics for a LoopBlock
- ReadOptions - Configuration options for reading STAR files
- SimpleBlock - Represents a simple (non-loop) data block. Contains key-value pairs.
- SimpleBlockStats - Statistics for a SimpleBlock
- StarStats - Comprehensive statistics for a STAR file
- ValidationDetails - Validation details returned by validate()
- WriteOptions - Configuration options for writing STAR files
Enums§
- DataBlock - Represents a data block in a STAR file
- DataBlockStats - Statistics for a DataBlock (either Simple or Loop)
- DataValue - Represents a value in a STAR file
- Error - Error type for emstar operations
Functions§
- list_blocks - List all data blocks with their names and types.
- merge - Merge multiple STAR files into a single output file.
- merge_with_file - Merge data blocks with an existing STAR file.
- read - Read a STAR file from disk.
- stats - Calculate statistics for a STAR file.
- stats_streaming - Calculate streaming statistics for a STAR file without loading all data.
- to_string - Convert data blocks to a STAR format string.
- validate - Validate a STAR file without loading all data into memory.
- write - Write data blocks to a STAR file.
Type Aliases§
- Result - Result type alias for emstar operations