# `emdoc`
**emdoc**: a fast SerialEM MDOC parser and writer for cryo-EM
- serialem + mdoc = emdoc
- for **Rust** & **Python** users

[Rust](https://www.rust-lang.org/) · [crates.io](https://crates.io/crates/mdocfile) · [License](./LICENSE)

---
## Features
| Feature | Description |
|---|---|
| **Streaming Parse** | Process gigabyte-scale MDOC files with `BufRead` |
| **Lossless Round-trip** | Preserve every character & comment |
| **Order Preservation** | Field order guaranteed |
| **No Schema Lock-in** | Works with any MDOC variant |
| **Typed Access** | `get::<f32>()` or `get_checked::<f32>()` for safety |
| **Python Integration** | `to_python_dict()`, `to_numpy_arrays()` |
| **Polars Support** | Zero-copy DataFrame conversion |
| **Normalization APIs** | Clean up inconsistent formatting |
| **Zero-copy Streaming** | Visitor pattern for huge files |
---
## Installation
### Cargo.toml
```toml
[dependencies]
mdoc = { version = "0.1.0", features = ["serde"] }
# Optional features
# mdoc = { version = "0.1.0", features = ["serde", "python", "polars"] }
```
### Python
```bash
# Coming soon! For now, build with maturin
maturin develop --features python
```
---
## Quick Start
### 1. Parse a File
```rust
use emdoc::Mdoc;
// Load entire MDOC into memory
let mdoc = Mdoc::from_file("data.mdoc")?;
println!("Found {} tilt images", mdoc.tilt_series().len());
// Access header
for entry in mdoc.header() {
    println!("{:?}", entry);
}
```
### 2. Read Typed Fields
```rust
let z0 = mdoc.tilt_image(0).unwrap();
println!("Tilt Angle: {:?}", z0.tilt_angle());
println!("Defocus: {:?}", z0.defocus());
println!("Magnification: {:?}", z0.magnification());
// Type-safe access with error handling
let angle: f32 = z0.get_checked("TiltAngle")?;
```
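To make the difference between `get` and `get_checked` concrete, here is a minimal self-contained sketch that does not use the crate. `Block`, `from_pairs`, and the payloads carried by the `FieldError` variants are hypothetical stand-ins chosen for illustration; only the variant names `Missing` and `InvalidType` come from this README.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical stand-in for the crate's `FieldError` (payloads are assumed).
#[derive(Debug, PartialEq)]
enum FieldError {
    Missing(String),
    InvalidType { key: String, raw: String },
}

/// Hypothetical stand-in for a Z block: raw string values keyed by field name.
struct Block {
    fields: HashMap<String, String>,
}

impl Block {
    fn from_pairs(pairs: &[(&str, &str)]) -> Block {
        let fields = pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        Block { fields }
    }

    /// Lenient access: a missing key and an unparsable value both give `None`.
    fn get<T: FromStr>(&self, key: &str) -> Option<T> {
        self.fields.get(key)?.trim().parse().ok()
    }

    /// Strict access: the error says whether the key was absent or malformed.
    fn get_checked<T: FromStr>(&self, key: &str) -> Result<T, FieldError> {
        let raw = self
            .fields
            .get(key)
            .ok_or_else(|| FieldError::Missing(key.to_string()))?;
        raw.trim().parse().map_err(|_| FieldError::InvalidType {
            key: key.to_string(),
            raw: raw.clone(),
        })
    }
}
```

With a block built from `[("TiltAngle", "-60.0"), ("Defocus", "n/a")]`, `get::<f32>("Defocus")` and `get::<f32>("Dose")` both return `None`, while `get_checked` distinguishes `InvalidType` from `Missing`.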
### 3. Modify & Save
```rust
let mut mdoc = Mdoc::from_file("input.mdoc")?;
// Update a field
mdoc.update_field(0, "Defocus", "-3.5");
// Add new tilt image
let new_image = mdoc.add_tilt_image(42);
new_image.set("TiltAngle", "45.0");
new_image.set("Magnification", "50000");
// Lossless write (preserves original formatting)
mdoc.write_lossless(std::fs::File::create("output.mdoc")?)?;
```
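One way a lossless writer can work is to keep the original line next to the parsed value and fall back to re-rendering only for fields that were edited. The sketch below illustrates that idea with the standard library only. `Field` and its methods are hypothetical and are not the crate's internals.

```rust
/// Hypothetical field record: parsed key/value plus the untouched source line.
#[derive(Debug)]
struct Field {
    key: String,
    value: String,
    raw: Option<String>, // `None` once the value has been edited
}

impl Field {
    /// Parse a `Key = Value` line, remembering the exact original text.
    fn parse(line: &str) -> Option<Field> {
        let (key, value) = line.split_once('=')?;
        Some(Field {
            key: key.trim().to_string(),
            value: value.trim().to_string(),
            raw: Some(line.to_string()),
        })
    }

    /// Editing a value invalidates the stored original line.
    fn set(&mut self, value: &str) {
        self.value = value.to_string();
        self.raw = None;
    }

    /// Untouched fields are emitted verbatim; edited ones are re-rendered.
    fn render(&self) -> String {
        match &self.raw {
            Some(raw) => raw.clone(),
            None => format!("{} = {}", self.key, self.value),
        }
    }
}
```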
---
## API Reference
### Core Types
| Type | Description | Key methods / variants |
|---|---|---|
| **`Mdoc`** | Root MDOC container | `from_file()`, `write()`, `validate()` |
| **`ZBlock`** | Tilt image metadata block | `get()`, `set()`, `tilt_angle()` |
| **`HeaderEntry`** | Header line enum | `Comment`, `KeyValue`, `Unknown` |
| **`ParseError`** | Parse failures | `InvalidBlock`, `InvalidZValue` |
| **`FieldError`** | Field access errors | `Missing`, `InvalidType` |
---
### `Mdoc` Methods
#### Constructors
| Method | Signature | Description |
|---|---|---|
| **`from_reader`** | `from_reader<R: BufRead>(reader: R)` → `Result<Mdoc, ParseError>` | Stream parse from any `BufRead` |
| **`from_file`** | `from_file<P: AsRef<Path>>(path: P)` → `Result<Mdoc, ParseError>` | Parse from file path |
#### Writers
| Method | Signature | Description |
|---|---|---|
| **`write`** | `write<W: Write>(&self, w: W)` → `io::Result<()>` | Write normalized format |
| **`write_lossless`** | `write_lossless<W: Write>(&self, w: W)` → `io::Result<()>` | Preserve original formatting |
#### Accessors
| Method | Signature | Description |
|---|---|---|
| **`header`** | `header(&self)` → `&[HeaderEntry]` | Get header entries |
| **`tilt_series`** | `tilt_series(&self)` → `&[ZBlock]` | Get all Z blocks |
| **`tilt_image`** | `tilt_image(&self, z: usize)` → `Option<&ZBlock>` | Get specific Z block |
| **`tilt_image_mut`** | `tilt_image_mut(&mut self, z: usize)` → `Option<&mut ZBlock>` | Get mutable Z block |
#### Mutators
| Method | Signature | Description |
|---|---|---|
| **`add_tilt_image`** | `add_tilt_image(&mut self, z: usize)` → `&mut ZBlock` | Add or replace Z block |
| **`remove_tilt_image`** | `remove_tilt_image(&mut self, z: usize)` → `bool` | Remove Z block |
| **`update_field`** | `update_field(&mut self, z: usize, key: K, value: V)` → `bool` | Set single field |
#### Validation & Normalization
| Method | Signature | Description |
|---|---|---|
| **`validate`** | `validate(&self)` → `Result<(), Vec<ValidationError>>` | Check for duplicate Z values |
| **`normalize_spaces`** | `normalize_spaces(&mut self)` | Collapse multiple spaces |
| **`normalize_format`** | `normalize_format(&mut self)` | Standardize `Key = value;` format |
| **`capture_raw_values`** | `capture_raw_values(&mut self)` | **Deprecated**: raw lines are now captured automatically |
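As a rough illustration of what the duplicate check and the space normalization amount to, the following standalone sketch uses only the standard library. `duplicate_z_values` and `collapse_spaces` are hypothetical helper names, and the exact rules the crate applies (for example, whether leading and trailing whitespace is trimmed) are assumptions.

```rust
use std::collections::HashSet;

/// Return each Z value that occurs more than once, in order of first repeat.
fn duplicate_z_values(zs: &[usize]) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for &z in zs {
        // `insert` returns false when the value was already present.
        if !seen.insert(z) && !dups.contains(&z) {
            dups.push(z);
        }
    }
    dups
}

/// Collapse every run of whitespace into one space and trim both ends.
fn collapse_spaces(line: &str) -> String {
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

For example, `collapse_spaces("TiltAngle   =  -60.0 ")` yields `TiltAngle = -60.0`, and an empty result from `duplicate_z_values` corresponds to a file that would pass the duplicate check.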
#### Serialization
| Method | Feature | Signature | Description |
|---|---|---|---|
| **`to_json`** | `serde` | `to_json(&self)` → `Result<String, serde_json::Error>` | Pretty JSON export |
| **`from_json`** | `serde` | `from_json(json: &str)` → `Result<Mdoc, serde_json::Error>` | JSON import |
#### Data Science
| Method | Feature | Signature | Description |
|---|---|---|---|
| **`to_polars_df`** | `polars` | `to_polars_df(&self)` → `Result<DataFrame, PolarsError>` | Zero-copy DataFrame |
| **`to_python_dict`** | `python` | `to_python_dict(&self)` → `PyResult<PyObject>` | Python dict conversion |
| **`to_numpy_arrays`** | `python` | `to_numpy_arrays(&self)` → `PyResult<(PyObject, PyObject)>` | `(tilt_angles, defocus)` arrays |
---
### `ZBlock` Methods
#### Readers
| Method | Signature | Description |
|---|---|---|
| **`z`** | `z(&self)` → `usize` | Get Z value |
| **`get_raw`** | `get_raw(&self, key: &str)` → `Option<&str>` | Get raw string value |
| **`get`** | `get<T: FromMdocValue>(&self, key: &str)` → `Option<T>` | Typed access |
| **`get_checked`** | `get_checked<T: FromMdocValue>(&self, key: &str)` → `Result<T, FieldError>` | Typed access with error |
#### Convenience Fields
| Method | Returns | Description |
|---|---|---|
| **`tilt_angle`** | `Option<f32>` | `TiltAngle` field |
| **`defocus`** | `Option<f32>` | `Defocus` field |
| **`magnification`** | `Option<i32>` | `Magnification` field |
| **`subframe_path`** | `Option<String>` | `SubFramePath` field |
| **`basename`** | `Option<String>` | Basename of `SubFramePath` |
| **`stage_position`** | `Option<(f32, f32)>` | Parse `StagePosition` into XY |
| **`min_max_mean`** | `Option<(f32, f32, f32)>` | Parse `MinMaxMean` |
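The multi-value helpers boil down to splitting a raw value on whitespace and parsing each part. Below is a small sketch of that, plus a basename helper that accepts both `\` and `/` separators, since `SubFramePath` is often written on Windows acquisition machines. Both function names are hypothetical and the crate's actual parsing rules may differ.

```rust
/// Parse exactly `N` whitespace-separated floats, e.g. `"12.5 -3.25"`.
/// Returns `None` if a part fails to parse or the count is not `N`.
fn parse_floats<const N: usize>(raw: &str) -> Option<[f32; N]> {
    let mut out = [0.0f32; N];
    let mut parts = raw.split_whitespace();
    for slot in out.iter_mut() {
        *slot = parts.next()?.parse().ok()?;
    }
    // Reject trailing extra values.
    if parts.next().is_some() {
        return None;
    }
    Some(out)
}

/// Last path component, treating both `\` and `/` as separators.
fn basename(path: &str) -> Option<&str> {
    let name = path.rsplit(|c| c == '\\' || c == '/').next()?;
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}
```

Under these assumptions a `StagePosition` value maps to `parse_floats::<2>` and a `MinMaxMean` value to `parse_floats::<3>`.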
#### Writers
| Method | Signature | Description |
|---|---|---|
| **`set`** | `set(&mut self, key: K, value: V)` | Set or add field |
| **`remove`** | `remove(&mut self, key: &str)` → `bool` | Remove single field |
| **`retain_fields`** | `retain_fields<F>(&mut self, f: F)` | Batch remove in a single pass |
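The reason a batch removal is cheaper than repeated single removals can be seen in a plain `Vec` of key/value pairs: one `retain` pass visits each field once, while removing keys one by one shifts the tail of the vector every time. This sketch assumes an ordered `Vec<(String, String)>` as storage, which is a guess at the layout and not a description of the crate. `drop_keys` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Remove every field whose key is in `unwanted`, keeping the original order.
/// One pass over the fields; the hash set makes each membership test O(1).
fn drop_keys(fields: &mut Vec<(String, String)>, unwanted: &[&str]) {
    let unwanted: HashSet<&str> = unwanted.iter().copied().collect();
    fields.retain(|(key, _)| !unwanted.contains(key.as_str()));
}
```

With the crate, the equivalent would presumably be a single `retain_fields` call with a predicate on the key, but the closure's exact argument types are not documented here.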
#### Round-trip
| Method | Signature | Description |
|---|---|---|
| **`has_raw_lines`** | `has_raw_lines(&self)` → `bool` | Check if raw lines stored |
| **`get_raw_line`** | `get_raw_line(&self, key: &str)` → `Option<&str>` | Get original line |
---
### Streaming APIs
#### Visitor Pattern
```rust
pub trait MdocVisitor {
    fn header(&mut self, entry: &str);           // Header line
    fn begin_tilt(&mut self, z: usize);          // Start of a Z block
    fn field(&mut self, key: &str, value: &str); // Field inside a Z block
    fn end_tilt(&mut self);                      // End of a Z block
}
pub fn parse_stream<R: BufRead>(
    reader: R,
    visitor: &mut dyn MdocVisitor,
) -> Result<(), ParseError>
```
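To show how a visitor-driven parser can run in constant memory, here is a self-contained sketch of the pattern using only the standard library. It is not the crate's implementation: `Visitor`, `walk`, and `AngleCollector` are hypothetical names, errors are reported as `io::Error` instead of `ParseError`, and the grammar is simplified to `[ZValue = N]` section headers plus `Key = Value` lines.

```rust
use std::io::{self, BufRead};

/// Hypothetical mirror of the visitor trait shown above.
trait Visitor {
    fn header(&mut self, line: &str);
    fn begin_tilt(&mut self, z: usize);
    fn field(&mut self, key: &str, value: &str);
    fn end_tilt(&mut self);
}

/// Read line by line and emit events; only the current line is held in memory.
fn walk<R: BufRead>(reader: R, visitor: &mut dyn Visitor) -> io::Result<()> {
    let mut in_block = false;
    for line in reader.lines() {
        let line = line?;
        let text = line.trim();
        if text.is_empty() {
            continue;
        }
        // Section header such as `[ZValue = 3]`.
        let section = text.strip_prefix('[').and_then(|s| s.strip_suffix(']'));
        if let Some((name, value)) = section.and_then(|s| s.split_once('=')) {
            if name.trim() == "ZValue" {
                let z = value
                    .trim()
                    .parse::<usize>()
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
                if in_block {
                    visitor.end_tilt();
                }
                visitor.begin_tilt(z);
                in_block = true;
                continue;
            }
        }
        match (in_block, text.split_once('=')) {
            (true, Some((key, value))) => visitor.field(key.trim(), value.trim()),
            // Simplification: everything else is reported as a header line.
            _ => visitor.header(text),
        }
    }
    if in_block {
        visitor.end_tilt();
    }
    Ok(())
}

/// Example visitor: collects tilt angles and counts completed blocks.
#[derive(Default)]
struct AngleCollector {
    angles: Vec<f32>,
    blocks: usize,
}

impl Visitor for AngleCollector {
    fn header(&mut self, _line: &str) {}
    fn begin_tilt(&mut self, _z: usize) {}
    fn field(&mut self, key: &str, value: &str) {
        if key == "TiltAngle" {
            if let Ok(angle) = value.parse::<f32>() {
                self.angles.push(angle);
            }
        }
    }
    fn end_tilt(&mut self) {
        self.blocks += 1;
    }
}
```

Because `&[u8]` implements `BufRead`, the sketch can be exercised on an in-memory string with `walk(text.as_bytes(), &mut AngleCollector::default())`.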
#### Transform API
```rust
pub fn transform<R: BufRead, W: Write>(
    reader: R,
    writer: W,
    f: impl FnMut(&mut FieldEdit),
) -> Result<(), ParseError>
// FieldEdit exposes `key`, `value`, and `new_value` (set via `.set()`)
```
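The idea behind a reader-to-writer transform can be sketched without the crate: copy each line through unchanged unless a callback supplies a replacement value. `rewrite_values` and its callback shape are hypothetical and differ from the `FieldEdit` interface above. Note that this sketch normalizes line endings to `\n`, so it is not lossless for CRLF files.

```rust
use std::io::{self, BufRead, Write};

/// Stream `Key = Value` lines from `reader` to `writer`.
/// `f(key, value)` returns `Some(new_value)` to rewrite a field or `None` to keep it.
fn rewrite_values<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    mut f: impl FnMut(&str, &str) -> Option<String>,
) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        let replaced = line.split_once('=').and_then(|(k, v)| {
            let key = k.trim();
            // Leave section headers such as `[ZValue = 0]` untouched.
            if key.starts_with('[') {
                return None;
            }
            f(key, v.trim()).map(|new| format!("{} = {}", key, new))
        });
        match replaced {
            Some(new_line) => writeln!(writer, "{}", new_line)?,
            None => writeln!(writer, "{}", line)?,
        }
    }
    Ok(())
}
```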
---
## Performance Tips
| Tip | Prefer | Instead of | Why |
|---|---|---|---|
| **Batch Removal** | `retain_fields()` | Multiple `remove()` calls | Single index rebuild: O(n) vs O(n×m) |
| **Streaming** | `parse_stream()` | `from_reader()` on huge files | Constant memory for GB files |
| **Typed Access** | `get_checked()` | `get().unwrap()` | Proper error handling |
| **Raw Lines** | Auto-captured | Manual `capture_raw_values()` | No-op since v0.1.0 |
| **Validation** | `validate()` before save | Assuming the file is valid | Catches duplicates early |
---
## Cryo-EM Specific Examples
### Plot Defocus vs Tilt Angle (Python)
```python
import emdoc
import matplotlib.pyplot as plt
# Load MDOC
mdoc_data = emdoc.Mdoc.from_file("tilt_series.mdoc")
tilt_angles, defocus_values = mdoc_data.to_numpy_arrays()
# Quick plot
plt.figure(figsize=(10, 6))
plt.scatter(tilt_angles, defocus_values, alpha=0.7, s=50)
plt.xlabel("Tilt Angle (°)")
plt.ylabel("Defocus (μm)")
plt.title("Defocus vs Tilt Angle")
plt.grid(True, alpha=0.3)
plt.show()
```
### Streaming Filter (Rust)
```rust
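use std::fs::File;
use std::io::BufReader;
use emdoc::{parse_stream, MdocVisitor};

/// Warns about tilt angles outside `[min_angle, max_angle]`.
struct TiltAngleFilter {
    min_angle: f32,
    max_angle: f32,
}

impl MdocVisitor for TiltAngleFilter {
    fn header(&mut self, _entry: &str) {} // header lines are ignored
    fn begin_tilt(&mut self, z: usize) {
        println!("Processing Z = {}", z);
    }
    fn field(&mut self, key: &str, value: &str) {
        if key == "TiltAngle" {
            // Skip values that do not parse instead of panicking
            if let Ok(angle) = value.trim().parse::<f32>() {
                if angle < self.min_angle || angle > self.max_angle {
                    println!("Tilt angle {} out of range!", angle);
                }
            }
        }
    }
    fn end_tilt(&mut self) {}
}

// Assumes `ParseError` implements `std::error::Error`; the range is an example.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open("tilt_series.mdoc")?);
    let mut filter = TiltAngleFilter { min_angle: -60.0, max_angle: 60.0 };
    parse_stream(reader, &mut filter)?;
    Ok(())
}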
```
---
## Testing
```bash
# Run tests
cargo test
# Test Python integration
cargo test --features python
# Benchmark parsing
cargo bench --features bench
```
---
## License
MIT License. See the `LICENSE` file for details.

---
## Contributing
We love contributions! Please submit a pull request or open an issue on GitHub.