emdoc 0.1.1

A fast, lossless SerialEM MDOC parser and writer for cryo-electron microscopy (e.g. cryo-ET tilt-series `.mdoc` files).
# 📦 `emdoc`

**emdoc**: a fast SerialEM MDOC parser and writer for cryo-EM ⚡

- SerialEM + MDOC = emdoc
- for **Rust** & **Python** users

[![Rust](https://img.shields.io/badge/rust-%23000000.svg?style=for-the-badge&logo=rust&logoColor=white)](https://www.rust-lang.org/) [![Crates.io](https://img.shields.io/crates/v/emdoc.svg?style=for-the-badge)](https://crates.io/crates/emdoc) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](./LICENSE)

---

## ✨ Features

| Feature | Description |
|---------|-------------|
| ⚡ **Streaming Parse** | Process gigabyte-scale MDOC files with `BufRead` |
| 🔄 **Lossless Round-trip** | Preserve every character & comment |
| 📍 **Order Preservation** | Field order guaranteed |
| 🔧 **No Schema Lock-in** | Works with any MDOC variant |
| 🎯 **Typed Access** | `get::<f32>()` or `get_checked::<f32>()` for safety |
| 🐍 **Python Integration** | `to_python_dict()`, `to_numpy_arrays()` |
| 📊 **Polars Support** | Zero-copy DataFrame conversion |
| 🛠️ **Normalization APIs** | Clean up inconsistent formatting |
| 🚀 **Zero-copy Streaming** | Visitor pattern for huge files |

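MDOC is a plain-text format: global `Key = value` lines, bracketed title lines such as `[T = ...]`, and one `[ZValue = N]` block per tilt image. The standard-library sketch below is not emdoc's actual parser (`Line`, `classify`, and `count_tilts` are hypothetical names); it only illustrates why a `BufRead`-driven parser needs just one line in memory at a time.

```rust
use std::io::BufRead;

/// One classified MDOC line (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum Line {
    /// `[ZValue = N]`: opens the block for tilt image N.
    ZValue(usize),
    /// Any other bracketed line, e.g. a `[T = ...]` title line.
    Section(String),
    /// A `Key = value` line, with whitespace around key and value trimmed.
    KeyValue(String, String),
    /// Blank or unrecognized lines.
    Other(String),
}

/// Classifies a single line without looking at any other line.
pub fn classify(line: &str) -> Line {
    let trimmed = line.trim();
    if let Some(inner) = trimmed.strip_prefix('[').and_then(|s| s.strip_suffix(']')) {
        if let Some((key, value)) = inner.split_once('=') {
            if key.trim() == "ZValue" {
                if let Ok(z) = value.trim().parse::<usize>() {
                    return Line::ZValue(z);
                }
            }
        }
        return Line::Section(inner.trim().to_string());
    }
    match trimmed.split_once('=') {
        Some((key, value)) => Line::KeyValue(key.trim().to_string(), value.trim().to_string()),
        None => Line::Other(trimmed.to_string()),
    }
}

/// Counts tilt images while holding only the current line in memory.
pub fn count_tilts<R: BufRead>(reader: R) -> std::io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        if let Line::ZValue(_) = classify(&line?) {
            count += 1;
        }
    }
    Ok(count)
}
```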
---

## 📥 Installation

### Cargo.toml

```toml
[dependencies]
emdoc = { version = "0.1.1", features = ["serde"] }

# Optional features
# emdoc = { version = "0.1.1", features = ["serde", "python", "polars"] }
```

### Python

```bash
# Coming soon! For now, build with maturin
maturin develop --features python
```

---

## 🚀 Quick Start

### 1️⃣ Parse a File

```rust
use emdoc::Mdoc;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the entire MDOC into memory
    let mdoc = Mdoc::from_file("data.mdoc")?;
    println!("📊 Found {} tilt images", mdoc.tilt_series().len());

    // Print each header entry (comments and global key-value lines)
    for entry in mdoc.header() {
        println!("🏷️  {:?}", entry);
    }
    Ok(())
}
```

### 2️⃣ Read Typed Fields

```rust
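// Continuing from step 1: `mdoc` is the parsed `Mdoc`
let z0 = mdoc.tilt_image(0).expect("no [ZValue = 0] block");
println!("🔬 Tilt Angle: {:?}", z0.tilt_angle());
println!("🎯 Defocus: {:?}", z0.defocus());
println!("📏 Magnification: {:?}", z0.magnification());

// Type-safe access: Err(FieldError) if the key is missing or not a valid f32
let angle: f32 = z0.get_checked("TiltAngle")?;
println!("TiltAngle as f32: {}", angle);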
```

### 3️⃣ Modify & Save

```rust
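use emdoc::Mdoc;
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut mdoc = Mdoc::from_file("input.mdoc")?;

    // Update a field in the Z = 0 block (returns `false` if nothing was updated)
    mdoc.update_field(0, "Defocus", "-3.5");

    // Add a tilt image at Z = 42 (replaces any existing Z = 42 block)
    let new_image = mdoc.add_tilt_image(42);
    new_image.set("TiltAngle", "45.0");
    new_image.set("Magnification", "50000");

    // Lossless write: unchanged lines keep their original formatting
    mdoc.write_lossless(File::create("output.mdoc")?)?;
    Ok(())
}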
```

---

## 📚 API Reference

### 🏗️ Core Types

| Type | Description | Key Methods / Variants |
|------|-------------|-------------|
| **`Mdoc`** | Root MDOC container | `from_file()`, `write()`, `validate()` |
| **`ZBlock`** | Tilt image metadata block | `get()`, `set()`, `tilt_angle()` |
| **`HeaderEntry`** | Header line enum | `Comment`, `KeyValue`, `Unknown` |
| **`ParseError`** | Parse failures | `InvalidBlock`, `InvalidZValue` |
| **`FieldError`** | Field access errors | `Missing`, `InvalidType` |

---

### 🔧 `Mdoc` Methods

#### 📂 Constructors

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`from_reader`** | 📖 | `from_reader<R: BufRead>(reader: R)` → `Result<Mdoc, ParseError>` | Stream-parse from any `BufRead` |
| **`from_file`** | 💾 | `from_file<P: AsRef<Path>>(path: P)` → `Result<Mdoc, ParseError>` | Parse from a file path |

#### ✏️ Writers

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`write`** | ✏️ | `write<W: Write>(&self, w: W)` → `io::Result<()>` | Write normalized format |
| **`write_lossless`** | 🔄 | `write_lossless<W: Write>(&self, w: W)` → `io::Result<()>` | Preserve original formatting! |

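One way to picture the difference between the two writers: a lossless writer keeps each field's original line and reuses it whenever the value has not changed, while a normalized writer regenerates every line as `Key = value`. This is a minimal standard-library sketch under that assumption, not emdoc's implementation; `Field`, `write_fields_lossless`, and `write_fields_normalized` are hypothetical names.

```rust
use std::io::{self, Write};

/// A parsed field that remembers the line it came from (hypothetical type).
pub struct Field {
    pub key: String,
    pub value: String,
    /// Original line text; `None` for fields added after parsing.
    pub raw: Option<String>,
}

/// True if the value recorded in `raw` still equals `value` (field unedited).
fn raw_value_matches(raw: &str, value: &str) -> bool {
    raw.split_once('=').map_or(false, |(_, v)| v.trim() == value)
}

/// Normalized output: every field is regenerated as `Key = value`.
pub fn write_fields_normalized<W: Write>(fields: &[Field], mut w: W) -> io::Result<()> {
    for f in fields {
        writeln!(w, "{} = {}", f.key, f.value)?;
    }
    Ok(())
}

/// Lossless output: reuse the original line for unedited fields and
/// fall back to `Key = value` only for new or changed ones.
pub fn write_fields_lossless<W: Write>(fields: &[Field], mut w: W) -> io::Result<()> {
    for f in fields {
        match &f.raw {
            Some(raw) if raw_value_matches(raw, &f.value) => writeln!(w, "{}", raw)?,
            _ => writeln!(w, "{} = {}", f.key, f.value)?,
        }
    }
    Ok(())
}
```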
#### 🔍 Accessors

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`header`** | 🏷️ | `header(&self)` → `&[HeaderEntry]` | Get header entries |
| **`tilt_series`** | 📊 | `tilt_series(&self)` → `&[ZBlock]` | Get all Z blocks |
| **`tilt_image`** | 🎯 | `tilt_image(&self, z: usize)` → `Option<&ZBlock>` | Get a specific Z block |
| **`tilt_image_mut`** | 🛠️ | `tilt_image_mut(&mut self, z: usize)` → `Option<&mut ZBlock>` | Get a mutable Z block |

#### 🛠️ Mutators

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`add_tilt_image`** | ➕ | `add_tilt_image(&mut self, z: usize)` → `&mut ZBlock` | Add or replace Z block |
| **`remove_tilt_image`** | 🗑️ | `remove_tilt_image(&mut self, z: usize)` → `bool` | Remove Z block |
| **`update_field`** | 📝 | `update_field(&mut self, z: usize, key: K, value: V)` → `bool` | Set single field |

#### ✅ Validation & Normalization

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`validate`** | ✅ | `validate(&self)` → `Result<(), Vec<ValidationError>>` | Check for duplicate Z values |
| **`normalize_spaces`** | 🧹 | `normalize_spaces(&mut self)` | Collapse runs of spaces |
| **`normalize_format`** | 🎨 | `normalize_format(&mut self)` | Standardize lines to `Key = value` |
| **`capture_raw_values`** | 📸 | `capture_raw_values(&mut self)` | **Deprecated**: raw values are now captured automatically |

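Duplicate-Z detection and `Key = value` normalization are both small set and string operations. A hedged standard-library sketch of the idea follows; `duplicate_z_values` and `normalize_kv_line` are hypothetical helpers (the second is a combined take on `normalize_spaces` and `normalize_format`), not the internals of `validate` or `normalize_format`.

```rust
use std::collections::HashSet;

/// Z values that occur more than once, each reported once, in the order
/// their first repeat is seen (hypothetical helper).
pub fn duplicate_z_values(zs: &[usize]) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut reported = HashSet::new();
    let mut dups = Vec::new();
    for &z in zs {
        // `insert` returns false when the value was already present.
        if !seen.insert(z) && reported.insert(z) {
            dups.push(z);
        }
    }
    dups
}

/// Rewrites `Key   =   a    b` as `Key = a b`: one space on each side of
/// the first `=` and single spaces inside the value. Returns `None` for
/// lines without `=` (hypothetical helper).
pub fn normalize_kv_line(line: &str) -> Option<String> {
    let (key, value) = line.split_once('=')?;
    let words: Vec<&str> = value.split_whitespace().collect();
    Some(format!("{} = {}", key.trim(), words.join(" ")))
}
```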
#### 🔌 Serialization

| Method | Emoji | Feature | Signature | Description |
|--------|-------|---------|-----------|-------------|
| **`to_json`** | 📤 | `serde` | `to_json(&self)` → `Result<String, serde_json::Error>` | Pretty JSON export |
| **`from_json`** | 📥 | `serde` | `from_json(json: &str)` → `Result<Mdoc, serde_json::Error>` | JSON import |

#### 📊 Data Science

| Method | Emoji | Feature | Signature | Description |
|--------|-------|---------|-----------|-------------|
| **`to_polars_df`** | 📈 | `polars` | `to_polars_df(&self)` → `Result<DataFrame, PolarsError>` | Zero-copy DataFrame |
| **`to_python_dict`** | 🐍 | `python` | `to_python_dict(&self)` → `PyResult<PyObject>` | Python dict conversion |
| **`to_numpy_arrays`** | 🔢 | `python` | `to_numpy_arrays(&self)` → `PyResult<(PyObject, PyObject)>` | `(tilt_angles, defocus)` NumPy arrays |

---

### 🔍 `ZBlock` Methods

#### 📖 Readers

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`z`** | #️⃣ | `z(&self)` → `usize` | Get Z value |
| **`get_raw`** | 📝 | `get_raw(&self, key: &str)` → `Option<&str>` | Get raw string value |
| **`get`** | 🎯 | `get<T: FromMdocValue>(&self, key: &str)` → `Option<T>` | Typed access |
| **`get_checked`** | ✅ | `get_checked<T: FromMdocValue>(&self, key: &str)` → `Result<T, FieldError>` | Typed access with error |

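The practical difference between `get` and `get_checked` is how failures surface: `get` presumably folds a missing key and an unparsable value into `None`, while `get_checked` reports which one happened. The sketch below models that with `std::str::FromStr` standing in for `FromMdocValue`; `Block` and the local `FieldError` enum are hypothetical stand-ins for `ZBlock` and emdoc's own `FieldError`.

```rust
use std::str::FromStr;

/// Local mirror of the `Missing` / `InvalidType` variants (hypothetical).
#[derive(Debug, PartialEq)]
pub enum FieldError {
    Missing(String),
    InvalidType { key: String, value: String },
}

/// Stand-in for a `ZBlock`: ordered (key, raw value) pairs.
pub struct Block {
    pub fields: Vec<(String, String)>,
}

impl Block {
    pub fn get_raw(&self, key: &str) -> Option<&str> {
        self.fields.iter().find(|(k, _)| k == key).map(|(_, v)| v.as_str())
    }

    /// Absent and unparsable both collapse to `None`.
    pub fn get<T: FromStr>(&self, key: &str) -> Option<T> {
        self.get_raw(key)?.trim().parse().ok()
    }

    /// Distinguishes a missing key from a value of the wrong type.
    pub fn get_checked<T: FromStr>(&self, key: &str) -> Result<T, FieldError> {
        let raw = self
            .get_raw(key)
            .ok_or_else(|| FieldError::Missing(key.to_string()))?;
        raw.trim().parse().map_err(|_| FieldError::InvalidType {
            key: key.to_string(),
            value: raw.to_string(),
        })
    }
}
```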
#### Convenience Fields

| Method | Emoji | Return Type | Description |
|--------|-------|-------------|-------------|
| **`tilt_angle`** | 📐 | `Option<f32>` | `TiltAngle` field |
| **`defocus`** | 🔬 | `Option<f32>` | `Defocus` field |
| **`magnification`** | 🔍 | `Option<i32>` | `Magnification` field |
| **`subframe_path`** | 📁 | `Option<String>` | `SubFramePath` field |
| **`basename`** | 🏷️ | `Option<String>` | Basename of `SubFramePath` |
| **`stage_position`** | 📍 | `Option<(f32, f32)>` | `StagePosition` parsed as `(x, y)` |
| **`min_max_mean`** | 📊 | `Option<(f32, f32, f32)>` | `MinMaxMean` parsed as `(min, max, mean)` |

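The multi-value convenience fields come down to whitespace-separated number parsing, and `basename` needs some care: SerialEM runs on Windows, so `SubFramePath` values usually contain backslashes. A standard-library sketch of how such parsing could work (hypothetical function names, not emdoc's code):

```rust
/// `StagePosition = x y` -> `(x, y)`; tokens after the second are ignored.
pub fn parse_stage_position(value: &str) -> Option<(f32, f32)> {
    let mut nums = value.split_whitespace().map(str::parse::<f32>);
    let x = nums.next()?.ok()?;
    let y = nums.next()?.ok()?;
    Some((x, y))
}

/// `MinMaxMean = min max mean` -> `(min, max, mean)`; needs exactly three numbers.
pub fn parse_min_max_mean(value: &str) -> Option<(f32, f32, f32)> {
    let nums = value
        .split_whitespace()
        .map(str::parse::<f32>)
        .collect::<Result<Vec<f32>, _>>()
        .ok()?;
    match nums.as_slice() {
        [min, max, mean] => Some((*min, *max, *mean)),
        _ => None,
    }
}

/// Last path component of a `SubFramePath`. Splits on both `\` and `/`,
/// because `std::path::Path` only treats `\` as a separator on Windows.
pub fn subframe_basename(path: &str) -> Option<String> {
    path.trim()
        .rsplit(|c: char| c == '\\' || c == '/')
        .next()
        .filter(|name| !name.is_empty())
        .map(String::from)
}
```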
#### ✏️ Writers

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`set`** | 📝 | `set(&mut self, key: K, value: V)` | Set or add a field |
| **`remove`** | 🗑️ | `remove(&mut self, key: &str)` → `bool` | Remove a single field |
| **`retain_fields`** | 🧹 | `retain_fields<F>(&mut self, f: F)` | Batch removal with a single index rebuild |

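Why `retain_fields` beats repeated `remove` calls: if fields live in an ordered `Vec` plus a key-to-position index, every single removal shifts the vector and rebuilds the index, whereas a batch retain does one pass and one rebuild. This sketch assumes that layout (suggested by the Performance Tips table, not confirmed against emdoc's source); `Fields` is a hypothetical type.

```rust
use std::collections::HashMap;

/// Ordered fields plus a key -> position index (hypothetical layout).
pub struct Fields {
    entries: Vec<(String, String)>,
    index: HashMap<String, usize>,
}

impl Fields {
    pub fn new(entries: Vec<(String, String)>) -> Self {
        let mut f = Fields { entries, index: HashMap::new() };
        f.rebuild_index();
        f
    }

    /// O(n): positions shift after any removal, so the index is recomputed.
    fn rebuild_index(&mut self) {
        self.index.clear();
        for (i, (k, _)) in self.entries.iter().enumerate() {
            self.index.insert(k.clone(), i);
        }
    }

    /// One removal: O(n) shift + O(n) rebuild, so m calls cost O(n * m).
    pub fn remove(&mut self, key: &str) -> bool {
        match self.index.get(key).copied() {
            Some(i) => {
                self.entries.remove(i);
                self.rebuild_index();
                true
            }
            None => false,
        }
    }

    /// Batch removal: one O(n) pass and a single O(n) rebuild.
    pub fn retain_fields<F: FnMut(&str, &str) -> bool>(&mut self, mut keep: F) {
        self.entries.retain(|(k, v)| keep(k, v));
        self.rebuild_index();
    }

    pub fn keys(&self) -> Vec<&str> {
        self.entries.iter().map(|(k, _)| k.as_str()).collect()
    }
}
```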
#### 🔄 Round-trip

| Method | Emoji | Signature | Description |
|--------|-------|-----------|-------------|
| **`has_raw_lines`** | 📸 | `has_raw_lines(&self)` → `bool` | Whether original lines are stored |
| **`get_raw_line`** | 📝 | `get_raw_line(&self, key: &str)` → `Option<&str>` | Get the original line |

---

### 🌊 Streaming APIs

#### Visitor Pattern

```rust
pub trait MdocVisitor {
    fn header(&mut self, entry: &str);      // 🏷️ Header line
    fn begin_tilt(&mut self, z: usize);     // 🎬 Start Z block
    fn field(&mut self, key: &str, value: &str); // 📄 Field
    fn end_tilt(&mut self);                 // 🏁 End Z block
}

pub fn parse_stream<R: BufRead>(
    reader: R,
    visitor: &mut dyn MdocVisitor,
) -> Result<(), ParseError>
```

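To see how a streaming parser might drive these callbacks, here is a self-contained sketch. It redeclares a trait with the same four methods as `MdocVisitor` (named `Visitor` here so it compiles on its own) plus a hypothetical `drive` function; it illustrates the visitor pattern and is not `parse_stream` itself. Lines before the first `[ZValue = N]` go to `header`, and each block is bracketed by `begin_tilt` / `end_tilt`.

```rust
use std::io::{self, BufRead};

/// Same shape as `MdocVisitor`, redeclared locally (hypothetical).
pub trait Visitor {
    fn header(&mut self, entry: &str);
    fn begin_tilt(&mut self, z: usize);
    fn field(&mut self, key: &str, value: &str);
    fn end_tilt(&mut self);
}

/// Hypothetical driver: reads one line at a time and dispatches callbacks.
pub fn drive<R: BufRead>(reader: R, v: &mut dyn Visitor) -> io::Result<()> {
    let mut in_tilt = false;
    for line in reader.lines() {
        let line = line?;
        let t = line.trim();
        // Recognize `[ZValue = N]` section headers.
        let z = t
            .strip_prefix("[ZValue")
            .and_then(|s| s.strip_suffix(']'))
            .and_then(|s| s.trim().strip_prefix('='))
            .and_then(|s| s.trim().parse::<usize>().ok());
        if let Some(z) = z {
            if in_tilt {
                v.end_tilt();
            }
            v.begin_tilt(z);
            in_tilt = true;
        } else if in_tilt {
            // Inside a block: forward `Key = value` lines, skip blank ones.
            if let Some((k, val)) = t.split_once('=') {
                v.field(k.trim(), val.trim());
            }
        } else {
            v.header(&line);
        }
    }
    if in_tilt {
        v.end_tilt();
    }
    Ok(())
}

/// Example visitor: counts lines by kind without storing any of them.
#[derive(Default)]
pub struct Counter {
    pub headers: usize,
    pub tilts: usize,
    pub fields: usize,
}

impl Visitor for Counter {
    fn header(&mut self, _entry: &str) {
        self.headers += 1;
    }
    fn begin_tilt(&mut self, _z: usize) {
        self.tilts += 1;
    }
    fn field(&mut self, _key: &str, _value: &str) {
        self.fields += 1;
    }
    fn end_tilt(&mut self) {}
}
```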
#### Transform API

```rust
pub fn transform<R: BufRead, W: Write>(
    reader: R,
    writer: W,
    f: impl FnMut(&mut FieldEdit),
) -> Result<(), ParseError>

// FieldEdit has: key, value, new_value (set via .set())
```

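A line-level transform can be sketched the same way: parse each `Key = value` line into an edit record, let the closure optionally set a new value, and copy every untouched line through unchanged. `FieldEdit` below is a local stand-in with the `key` / `value` / `set()` shape described in the comment above, and `transform_lines` is hypothetical, not emdoc's `transform`.

```rust
use std::io::{self, BufRead, Write};

/// Local stand-in for `FieldEdit`: the callback reads `key`/`value`
/// and may call `set` to request a replacement value.
pub struct FieldEdit {
    pub key: String,
    pub value: String,
    new_value: Option<String>,
}

impl FieldEdit {
    pub fn set(&mut self, v: impl Into<String>) {
        self.new_value = Some(v.into());
    }
}

/// Hypothetical transform: untouched lines are copied verbatim (line endings
/// normalized to `\n` by `lines()`); edited fields become `Key = new_value`.
pub fn transform_lines<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    mut f: impl FnMut(&mut FieldEdit),
) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        // Bracketed section lines such as `[ZValue = 0]` are never edited.
        let parsed = line
            .split_once('=')
            .filter(|_| !line.trim_start().starts_with('['));
        match parsed {
            Some((k, v)) => {
                let mut edit = FieldEdit {
                    key: k.trim().to_string(),
                    value: v.trim().to_string(),
                    new_value: None,
                };
                f(&mut edit);
                match edit.new_value {
                    Some(nv) => writeln!(writer, "{} = {}", edit.key, nv)?,
                    None => writeln!(writer, "{}", line)?,
                }
            }
            None => writeln!(writer, "{}", line)?,
        }
    }
    Ok(())
}
```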
---

## 🎯 Performance Tips

| Pattern | ✅ Good | ❌ Bad | Why |
|---------|---------|--------|-----|
| **Batch Removal** | `retain_fields()` | Multiple `remove()` calls | One index rebuild, O(n), vs. one per call, O(n×m) |
| **Streaming** | `parse_stream()` | `from_reader()` on huge files | Constant memory, even for GB-scale files |
| **Typed Access** | `get_checked()` | `get().unwrap()` | Proper error handling |
| **Raw Lines** | Auto-captured! 🎉 | Manual `capture_raw_values()` | No-op since v0.1.0 |
| **Validation** | `validate()` before save | Assume valid | Catches duplicates early |

---

## 🔬 Cryo-EM Specific Examples

### 📊 Plot Defocus vs Tilt Angle (Python)

```python
import emdoc
import matplotlib.pyplot as plt

# Load MDOC
mdoc_data = emdoc.Mdoc.from_file("tilt_series.mdoc")
tilt_angles, defocus_values = mdoc_data.to_numpy_arrays()

# 📈 Quick plot
plt.figure(figsize=(10, 6))
plt.scatter(tilt_angles, defocus_values, alpha=0.7, s=50)
plt.xlabel("Tilt Angle (°)")
plt.ylabel("Defocus (μm)")
plt.title("Defocus vs Tilt Angle")
plt.grid(True, alpha=0.3)
plt.show()
```

### 🔄 Streaming Filter (Rust)

```rust
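use std::fs::File;
use std::io::BufReader;

use emdoc::{parse_stream, MdocVisitor};

/// Warns about tilt images whose `TiltAngle` is outside [min_angle, max_angle].
struct TiltAngleFilter {
    min_angle: f32,
    max_angle: f32,
}

impl MdocVisitor for TiltAngleFilter {
    // Header lines are not needed for this filter.
    fn header(&mut self, _entry: &str) {}

    fn begin_tilt(&mut self, z: usize) {
        println!("🎬 Processing Z = {}", z);
    }

    fn field(&mut self, key: &str, value: &str) {
        if key == "TiltAngle" {
            // Skip unparsable values instead of panicking mid-stream.
            if let Ok(angle) = value.trim().parse::<f32>() {
                if angle < self.min_angle || angle > self.max_angle {
                    println!("⚠️  Tilt angle {} out of range!", angle);
                }
            }
        }
    }

    fn end_tilt(&mut self) {}
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open("tilt_series.mdoc")?);
    let mut filter = TiltAngleFilter { min_angle: -60.0, max_angle: 60.0 };
    parse_stream(reader, &mut filter)?;
    Ok(())
}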
```

---

## 🧪 Testing

```bash
# Run tests
cargo test

# Test Python integration
cargo test --features python

# Benchmark parsing
cargo bench --features bench
```

---

## 📜 License

MIT License - see `LICENSE` file for details.

---

## 🤝 Contributing

We love contributions! Please submit a pull request or open an issue on GitHub.