Crate emstar

§emstar

High-performance STAR file I/O library for Rust, optimized for cryo-EM workflows.

§Overview

emstar provides a fast, type-safe, and ergonomic API for reading, writing, and manipulating STAR (Self-defining Text Archival and Retrieval) files. Cryo-electron microscopy software such as RELION uses these files to store particle data, optimization parameters, and processing metadata.

§Features

  • Fast Parsing: Number parsing via the lexical crate and small-string optimization via SmartString
  • Type-Safe API: Strong Rust typing ensures correctness at compile time
  • Comprehensive API: Full CRUD operations for files, data blocks, and individual values
  • STAR Format Support: Handles quoted strings, empty values, multi-block files, and edge cases
  • Statistics: Compute file statistics with stats, or use stats_streaming to avoid loading the entire dataset into memory
  • Zero-Copy Operations: Efficient columnar storage using Polars DataFrames

§Data Structures

STAR files contain one or more data blocks, each of which can be:

  • SimpleBlock: Key-value pairs (e.g., global parameters)
  • LoopBlock: Tabular data with columns and rows (e.g., particle coordinates)
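
The two block shapes can be pictured with a small stand-alone model. The sketch below is illustrative only: `Value`, `KeyValueBlock`, `TableBlock`, and `Block` are hypothetical stand-ins built on the standard library, not emstar's actual `DataValue`, `SimpleBlock`, `LoopBlock`, or `DataBlock` definitions (the real LoopBlock is backed by a Polars DataFrame).

```rust
use std::collections::HashMap;

/// Simplified stand-in for a single STAR value (hypothetical, not emstar's `DataValue`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Text(String),
}

/// Key-value block, e.g. global parameters.
#[derive(Debug)]
struct KeyValueBlock {
    entries: Vec<(String, Value)>,
}

/// Tabular block: named columns plus rows of the same width.
#[derive(Debug)]
struct TableBlock {
    columns: Vec<String>,
    rows: Vec<Vec<Value>>,
}

/// A block is exactly one of the two shapes.
#[derive(Debug)]
enum Block {
    Simple(KeyValueBlock),
    Loop(TableBlock),
}

/// Summarize a block by matching on its shape.
fn describe(block: &Block) -> String {
    match block {
        Block::Simple(kv) => format!("simple block with {} keys", kv.entries.len()),
        Block::Loop(t) => format!(
            "loop block with {} rows x {} columns",
            t.rows.len(),
            t.columns.len()
        ),
    }
}

/// Build a two-block file in memory, keyed by block name.
fn example_file() -> HashMap<String, Block> {
    let general = KeyValueBlock {
        entries: vec![
            ("rlnImageSize".to_string(), Value::Integer(256)),
            ("rlnImageName".to_string(), Value::Text("stack.mrcs".to_string())),
        ],
    };
    let particles = TableBlock {
        columns: vec!["rlnCoordinateX".to_string(), "rlnCoordinateY".to_string()],
        rows: vec![vec![Value::Float(100.0), Value::Float(200.0)]],
    };
    let mut file = HashMap::new();
    file.insert("general".to_string(), Block::Simple(general));
    file.insert("particles".to_string(), Block::Loop(particles));
    file
}
```

Modeling a block as an enum means every consumer has to handle both shapes explicitly, which is the same reason the examples in this page match on `DataBlock::Simple` and `DataBlock::Loop`.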

§Quick Start

use emstar::{read, write, DataBlock};

// Read a STAR file
let data_blocks = read("particles.star", None)?;

// Access a data block
if let Some(DataBlock::Loop(df)) = data_blocks.get("particles") {
    println!("Found {} particles", df.row_count());
    println!("Columns: {:?}", df.columns());
}

// Write modified data
write(&data_blocks, "output.star", None)?;

§Creating a New STAR File

use emstar::{write, SimpleBlock, LoopBlock, DataBlock, DataValue};
use std::collections::HashMap;

let mut data = HashMap::new();

// Create a simple block using array initialization
let general: SimpleBlock = [
    ("rlnImageSize", DataValue::Integer(256)),
    ("rlnPixelSize", DataValue::Float(1.2)),
].into();
data.insert("general".to_string(), DataBlock::Simple(general));

// Create a loop block using the builder pattern
let particles = LoopBlock::builder()
    .columns(&["rlnCoordinateX", "rlnCoordinateY", "rlnCoordinateZ"])
    .rows(vec![vec![
        DataValue::Float(100.0),
        DataValue::Float(200.0),
        DataValue::Float(50.0),
    ]])
    .build()?;

data.insert("particles".to_string(), DataBlock::Loop(particles));

write(&data, "output.star", None)?;

§Querying Data

use emstar::{read, DataBlock, DataValue};

let data_blocks = read("particles.star", None)?;

if let Some(DataBlock::Loop(particles)) = data_blocks.get("particles") {
    // Get column data
    let x_coords = particles.get_column("rlnCoordinateX").unwrap();
    let y_coords = particles.get_column("rlnCoordinateY").unwrap();

    // Iterate over coordinates
    for (x, y) in x_coords.iter().zip(y_coords.iter()) {
        if let (DataValue::Float(x_val), DataValue::Float(y_val)) = (x, y) {
            println!("Particle at ({}, {})", x_val, y_val);
        }
    }
}

§Computing Statistics

use emstar::stats;

// Get statistics from file (loads entire file into memory)
let file_stats = stats("particles.star")?;
println!("Total blocks: {}", file_stats.n_blocks);
println!("Loop blocks: {}", file_stats.n_loop_blocks);
println!("Total particles: {}", file_stats.total_loop_rows);

§Data Block Operations

§SimpleBlock (Key-Value Pairs)

use emstar::{read, DataBlock};

let data_blocks = read("parameters.star", None)?;

if let Some(DataBlock::Simple(params)) = data_blocks.get("general") {
    // Get a value
    if let Some(value) = params.get("rlnImageSize") {
        println!("Image size: {:?}", value);
    }

    // Set a value
    // (needs mutable access to the block; `params` here is a shared reference)
    // params.set("new_key", DataValue::Integer(42));

    // Check if key exists
    if params.contains_key("rlnImageSize") {
        println!("Key exists");
    }

    // Get all keys
    for key in params.keys() {
        println!("Key: {}", key);
    }
}

§LoopBlock (Tabular Data)

use emstar::{read, DataBlock};

let data_blocks = read("particles.star", None)?;

if let Some(DataBlock::Loop(particles)) = data_blocks.get("particles") {
    // Get dimensions
    println!("{} particles with {} columns", particles.row_count(), particles.column_count());

    // Get column names
    let columns = particles.columns();
    println!("Columns: {:?}", columns);

    // Get cell value by index
    if let Some(value) = particles.get(0, 0) {
        println!("First cell: {:?}", value);
    }

    // Get cell value by column name
    if let Some(value) = particles.get_by_name(0, "rlnCoordinateX") {
        println!("First X coordinate: {:?}", value);
    }

    // Iterate over rows
    for (i, row) in particles.iter_rows().enumerate() {
        println!("Row {}: {:?}", i, row);
    }
}

§LoopBlock Builder Pattern

Use the builder pattern for more ergonomic LoopBlock creation:

use emstar::{LoopBlock, DataValue};

let particles = LoopBlock::builder()
    .columns(&["rlnCoordinateX", "rlnCoordinateY", "rlnAnglePsi"])
    .rows(vec![
        vec![DataValue::Float(100.0), DataValue::Float(200.0), DataValue::Float(45.0)],
        vec![DataValue::Float(150.0), DataValue::Float(250.0), DataValue::Float(90.0)],
    ])
    .build()?;

assert_eq!(particles.row_count(), 2);
assert_eq!(particles.column_count(), 3);

§SimpleBlock Array Initialization

Create a SimpleBlock from an array of key-value pairs:

use emstar::{SimpleBlock, DataValue};

let general: SimpleBlock = [
    ("rlnImageSize", DataValue::Integer(256)),
    ("rlnPixelSize", DataValue::Float(1.06)),
    ("rlnVoltage", DataValue::Float(300.0)),
].into();

assert_eq!(general.len(), 3);

§DataBlock Convenience Methods

Access blocks without verbose pattern matching:

use emstar::{read, DataBlock, SimpleBlock, LoopBlock};

let data_blocks = read("particles.star", None)?;

// Using expect methods (panics with message if wrong type)
if let Some(block) = data_blocks.get("general") {
    let general: &SimpleBlock = block.expect_simple("general should be a SimpleBlock");
}
if let Some(block) = data_blocks.get("particles") {
    let particles: &LoopBlock = block.expect_loop("particles should be a LoopBlock");
}

// Using as methods (returns Option)
if let Some(block) = data_blocks.get("general") {
    if let Some(simple) = block.as_simple() {
        // Work with SimpleBlock
    }
}

// Check block type
if let Some(block) = data_blocks.get("particles") {
    if block.is_loop() {
        // It's a LoopBlock
    }
}

§Error Handling

Fallible operations return Result<T, Error>. Common error variants:

  • Error::FileNotFound - The specified file does not exist
  • Error::Io - I/O error occurred
  • Error::Parse - Failed to parse the STAR file

use emstar::{read, Error};

match read("particles.star", None) {
    Ok(data) => println!("Successfully read {} blocks", data.len()),
    Err(Error::FileNotFound(path)) => println!("File not found: {:?}", path),
    Err(Error::Parse { line, message }) => {
        println!("Parse error at line {}: {}", line, message);
    }
    Err(e) => println!("Error: {:?}", e),
}
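
To show how the three variants above might arise, here is a minimal stand-alone sketch. `StarError`, `read_text`, and `parse_pair` are hypothetical names invented for illustration and use only the standard library; emstar's real `Error` enum may carry different fields or additional variants.

```rust
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical mirror of the three variants listed above (not emstar's `Error`).
#[derive(Debug)]
enum StarError {
    FileNotFound(PathBuf),
    Io(io::Error),
    Parse { line: usize, message: String },
}

impl fmt::Display for StarError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StarError::FileNotFound(p) => write!(f, "file not found: {}", p.display()),
            StarError::Io(e) => write!(f, "I/O error: {}", e),
            StarError::Parse { line, message } => {
                write!(f, "parse error at line {}: {}", line, message)
            }
        }
    }
}

impl std::error::Error for StarError {}

/// Lets `?` and `.into()` convert plain I/O errors into `StarError::Io`.
impl From<io::Error> for StarError {
    fn from(e: io::Error) -> Self {
        StarError::Io(e)
    }
}

/// Read a file, reporting a missing file separately from other I/O failures.
fn read_text(path: &Path) -> Result<String, StarError> {
    match std::fs::read_to_string(path) {
        Ok(text) => Ok(text),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            Err(StarError::FileNotFound(path.to_path_buf()))
        }
        Err(e) => Err(e.into()),
    }
}

/// Parse a `_key value` line, attaching the line number to any failure.
fn parse_pair(line_no: usize, line: &str) -> Result<(String, String), StarError> {
    let mut parts = line.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some(key), Some(value)) if key.starts_with('_') => {
            Ok((key.to_string(), value.to_string()))
        }
        _ => Err(StarError::Parse {
            line: line_no,
            message: format!("expected `_key value`, got {:?}", line),
        }),
    }
}
```

Separating the missing-file case from generic I/O errors lets callers print a targeted message, as the match on `Error::FileNotFound` above does.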

§Performance Considerations

  • Parsing: Uses the lexical crate for fast number parsing
  • Memory: LoopBlocks use Polars DataFrames for efficient columnar storage
  • String Storage: Uses SmartString for small string optimization
  • Statistics: stats loads the entire file into memory; stats_streaming computes statistics without loading the full file
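
The idea behind streaming statistics can be sketched with a line-by-line pass that keeps only counters in memory. This is an assumption-laden illustration using the standard library, not the implementation behind `stats_streaming`: `StreamCounts` and `count_streaming` are hypothetical names, and the sketch assumes each loop row sits on a single line and that no data value starts with an underscore.

```rust
use std::io::{self, BufRead};

/// Hypothetical summary type; field names are illustrative only.
#[derive(Debug, Default, PartialEq)]
struct StreamCounts {
    blocks: usize,
    loop_blocks: usize,
    loop_rows: usize,
}

/// Count blocks and loop rows while holding one line in memory at a time.
fn count_streaming<R: BufRead>(reader: R) -> io::Result<StreamCounts> {
    let mut counts = StreamCounts::default();
    let mut in_loop = false;
    for line in reader.lines() {
        let line = line?;
        let text = line.trim();
        if text.is_empty() || text.starts_with('#') {
            continue; // skip blank lines and comments
        }
        if text.starts_with("data_") {
            // A new block header also ends any loop in progress.
            counts.blocks += 1;
            in_loop = false;
        } else if text == "loop_" {
            counts.loop_blocks += 1;
            in_loop = true;
        } else if text.starts_with('_') {
            continue; // key or column tag, not a data row
        } else if in_loop {
            counts.loop_rows += 1;
        }
    }
    Ok(counts)
}
```

Because the function accepts any `BufRead`, it can be fed a `BufReader` around a file for large inputs or a byte slice in tests.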

§STAR File Format

STAR files have the following structure:

data_block_name
_key1 value1
_key2 value2

loop_
_column1 _column2 _column3
value1   value2   value3
value4   value5   value6

  • Data blocks start with data_ followed by a name
  • Simple blocks contain key-value pairs starting with _
  • Loop blocks start with loop_ and contain tabular data
  • Values can be unquoted, single-quoted, or double-quoted
  • Empty values are represented as "" or ''
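
The rules above can be turned into a small line classifier and value splitter. The following is a minimal sketch using only the standard library; `Line`, `classify`, and `split_values` are hypothetical helpers, not emstar functions. It assumes that a quote only opens a quoted value at the start of a token and closes at the next matching quote, which is simpler than what a complete STAR parser has to handle.

```rust
/// Kinds of line in the simplified format described above (hypothetical type).
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Blank,
    BlockHeader(&'a str), // name after the `data_` prefix
    LoopStart,
    Tag(&'a str), // key or column line starting with `_`
    Values,
}

/// Classify one line by its leading marker.
fn classify(line: &str) -> Line<'_> {
    let text = line.trim();
    if text.is_empty() {
        Line::Blank
    } else if let Some(name) = text.strip_prefix("data_") {
        Line::BlockHeader(name)
    } else if text == "loop_" {
        Line::LoopStart
    } else if text.starts_with('_') {
        Line::Tag(text)
    } else {
        Line::Values
    }
}

/// Split a line into values: unquoted, single-quoted, or double-quoted.
fn split_values(line: &str) -> Vec<String> {
    let mut values = Vec::new();
    let mut chars = line.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '"' || c == '\'' {
            chars.next(); // consume the opening quote
            let mut value = String::new();
            // Collect everything up to the matching closing quote.
            for ch in chars.by_ref() {
                if ch == c {
                    break;
                }
                value.push(ch);
            }
            values.push(value); // may be empty, e.g. for "" or ''
        } else {
            let mut value = String::new();
            // Unquoted value: runs until the next whitespace.
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() {
                    break;
                }
                value.push(ch);
                chars.next();
            }
            values.push(value);
        }
    }
    values
}
```

For instance, splitting `100.0 'my file.mrc' "" 42` with this sketch yields four values, the third of which is the empty string.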

Structs§

LoopBlock
Represents a loop data block (table-like data). Uses a Polars DataFrame for efficient columnar storage and operations.
LoopBlockBuilder
Builder for constructing LoopBlock with a fluent API
LoopBlockStats
Statistics for a LoopBlock
ReadOptions
Configuration options for reading STAR files
SimpleBlock
Represents a simple (non-loop) data block. Contains key-value pairs.
SimpleBlockStats
Statistics for a SimpleBlock
StarStats
Comprehensive statistics for a STAR file
ValidationDetails
Validation details returned by validate()
WriteOptions
Configuration options for writing STAR files

Enums§

DataBlock
Represents a data block in a STAR file
DataBlockStats
Statistics for a DataBlock (either Simple or Loop)
DataValue
Represents a value in a STAR file
Error
Error type for emstar operations

Functions§

list_blocks
List all data blocks with their names and types.
merge
Merge multiple STAR files into a single output file.
merge_with_file
Merge data blocks with an existing STAR file.
read
Read a STAR file from disk.
stats
Calculate statistics for a STAR file.
stats_streaming
Calculate streaming statistics for a STAR file without loading all data.
to_string
Convert data blocks to a STAR format string.
validate
Validate a STAR file without loading all data into memory.
write
Write data blocks to a STAR file.

Type Aliases§

Result
Result type alias for emstar operations