mzdata 0.64.1

A library to read mass spectrometry data formats and a data model for mass spectra
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Instructions

- Never make commits on my behalf.
- Do not guess. Always ask for clarification.
- If something does not look correct, ask about it.

## Commands

**Test suite** (runs `cargo nextest` with the feature set listed under Key features):
```
just test
```

**Run a single test** (replace `<test_name>` with the test function name):
```
cargo nextest run --lib --features nalgebra,parallelism,mzsignal,zlib-ng-compat,thermo,async,numpress,bruker_tdf,imzml <test_name>
```

**Docs** (quick, no deps):
```
just quick-docs
```

**Docs** (full, all features):
```
just docs
```

**Test coverage**:
```
just test-coverage
```

## Architecture

### Crate layout

- **`mzdata`** (root, `src/`) — the main library; module paths below such as `io/mzml/` are relative to `src/`
- **`crates/mzdata-spectra/`** — a thin re-export crate that exposes only the spectrum data model, without any I/O
- **`crates/pymzdata/`** — PyO3 Python bindings

### `src/io/` — format I/O

Each supported format lives in its own submodule with a `Reader` and often a `Writer` type. The concrete reader types are generally parameterized as `ReaderType<R, C, D>`, where `R` is the backing `Read + Seek` source, `C` the centroid peak type, and `D` the deconvoluted peak type.

- `io/mzml/` — mzML and indexedmzML (default on)
- `io/mgf/` — MGF (default on)
- `io/imzml/` — imzML (feature `imzml`)
- `io/mzmlb/` — mzMLb / HDF5 (feature `mzmlb`)
- `io/thermo/` — Thermo RAW (feature `thermo`)
- `io/tdf/` — Bruker TDF (feature `bruker_tdf`)
- `io/infer_format/` — format detection, and `MZReaderType` + `MZReader` dispatch
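The `ReaderType<R, C, D>` shape can be illustrated with a std-only sketch. `LineReaderType`, `LineReader`, `CentroidPeak` and `DeconvolutedPeak` are hypothetical placeholders, not mzdata types. `R` is any `Read + Seek` byte source, such as a `std::fs::File` or an in-memory `Cursor`. `C` and `D` are carried as `PhantomData` because, in this sketch, they only select which peak types a reader would produce.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek, SeekFrom};
use std::marker::PhantomData;

// Hypothetical placeholder peak types (not mzdata's).
#[derive(Debug, Default)]
pub struct CentroidPeak;
#[derive(Debug, Default)]
pub struct DeconvolutedPeak;

/// Sketch of the `ReaderType<R, C, D>` shape: `R` is the byte source,
/// `C` and `D` only select which peak types the reader would produce.
pub struct LineReaderType<R: Read + Seek, C, D> {
    handle: R,
    _peaks: PhantomData<(C, D)>,
}

/// Alias that fills in default peak types, leaving only the source generic.
pub type LineReader<R> = LineReaderType<R, CentroidPeak, DeconvolutedPeak>;

impl<R: Read + Seek, C, D> LineReaderType<R, C, D> {
    pub fn new(handle: R) -> Self {
        LineReaderType { handle, _peaks: PhantomData }
    }

    /// `Seek` enables random access: rewind, then read the first line.
    pub fn first_line(&mut self) -> io::Result<String> {
        self.handle.seek(SeekFrom::Start(0))?;
        let mut line = String::new();
        BufReader::new(&mut self.handle).read_line(&mut line)?;
        Ok(line.trim_end().to_string())
    }
}

fn main() -> io::Result<()> {
    // An in-memory buffer stands in for a file; `std::fs::File` is also `Read + Seek`.
    let data = b"BEGIN IONS\nTITLE=example\nEND IONS\n".to_vec();
    let mut reader: LineReader<Cursor<Vec<u8>>> = LineReaderType::new(Cursor::new(data));
    println!("{}", reader.first_line()?);
    Ok(())
}
```

Because `R` is generic, the same reader code can be exercised against an in-memory `Cursor` in tests and against a `File` elsewhere.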

**Format dispatch** (`io/infer_format/dispatch.rs`): `MZReaderType<R, C, D>` is a `#[non_exhaustive]` enum in which each variant wraps one concrete reader type and is gated behind that format's feature flag. `MZReader<File>` is the canonical type alias for the common case. Open it via `MZReader::open_path(...)`, which requires importing the `MZFileReader` trait from `mzdata::prelude`.
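A std-only sketch of the same dispatch pattern follows. `AnyReader`, the placeholder reader structs and the extension-based `open_by_extension` are hypothetical; real format detection may also inspect file contents. In the sketch:

- each variant wraps one concrete reader;
- the `thermo` variant and its match arms exist only when that feature is enabled;
- `#[non_exhaustive]` obliges code in other crates to add a wildcard arm when matching, so new formats can be added without a breaking change.

```rust
use std::path::Path;

// Hypothetical placeholder readers (not mzdata types).
#[derive(Debug)]
pub struct MzMLReader;
#[derive(Debug)]
pub struct MgfReader;
#[cfg(feature = "thermo")]
#[derive(Debug)]
pub struct ThermoRawReader;

/// Sketch of a feature-gated dispatch enum in the spirit of `MZReaderType`.
/// `#[non_exhaustive]` makes matches in *other* crates include a `_` arm.
#[non_exhaustive]
#[derive(Debug)]
pub enum AnyReader {
    MzML(MzMLReader),
    Mgf(MgfReader),
    // Exists only when compiled with the `thermo` feature.
    #[cfg(feature = "thermo")]
    ThermoRaw(ThermoRawReader),
}

impl AnyReader {
    /// Methods on the enum forward to, or describe, the wrapped reader.
    pub fn format_name(&self) -> &'static str {
        match self {
            AnyReader::MzML(_) => "mzML",
            AnyReader::Mgf(_) => "MGF",
            #[cfg(feature = "thermo")]
            AnyReader::ThermoRaw(_) => "Thermo RAW",
        }
    }
}

/// Hypothetical detection by file extension only.
pub fn open_by_extension(path: &Path) -> Option<AnyReader> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "mzml" => Some(AnyReader::MzML(MzMLReader)),
        "mgf" => Some(AnyReader::Mgf(MgfReader)),
        #[cfg(feature = "thermo")]
        "raw" => Some(AnyReader::ThermoRaw(ThermoRawReader)),
        _ => None,
    }
}

fn main() {
    match open_by_extension(Path::new("example.mzML")) {
        Some(reader) => println!("{}", reader.format_name()),
        None => println!("unknown format"),
    }
}
```

In real code, opening a file goes through `MZReader::open_path`, as described above; the sketch only mirrors the enum-with-cfg-variants structure.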

The `mz_read!` / `mz_write!` macros (in `io/shorthand.rs`) are zero-overhead alternatives that avoid the `enum` dispatch, but the reader they bind is only usable inside the macro's block. Avoid them outside of throwaway code.
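The scope restriction follows from how such macros work. This is a hypothetical `with_reader!` sketch, not the real `mz_read!` signature:

- the body is pasted once per format, so inside each copy `reader` has a concrete type and calls are statically dispatched;
- the binding cannot leave the block, because its type differs from arm to arm;
- only the body's result escapes, and it must have the same type in every arm.

The reader types and their counts are placeholders.

```rust
/// Hypothetical behaviour shared by the placeholder readers.
pub trait CountSpectra {
    fn count(&self) -> usize;
}

// Placeholder concrete readers (not mzdata types); counts are dummy values.
pub struct MzMLReader;
pub struct MgfReader;

impl CountSpectra for MzMLReader {
    fn count(&self) -> usize {
        3
    }
}
impl CountSpectra for MgfReader {
    fn count(&self) -> usize {
        5
    }
}

/// Sketch of a `mz_read!`-style macro: `$body` is expanded once per arm, and in
/// each expansion `$reader` is bound to a different concrete type.
macro_rules! with_reader {
    ($format:expr, $reader:ident => $body:block) => {
        match $format {
            "mzml" => {
                let $reader = MzMLReader;
                $body
            }
            "mgf" => {
                let $reader = MgfReader;
                $body
            }
            other => panic!("unsupported format: {}", other),
        }
    };
}

fn main() {
    // The reader never escapes the block; only the `usize` result does.
    let n = with_reader!("mgf", reader => { reader.count() });
    println!("{}", n);
}
```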

### `src/spectrum/` — spectrum data model

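Three spectrum representation layers (encoded raw arrays, centroid peaks, deconvoluted peaks), plus `MultiLayerSpectrum`, which can hold all three. The types are convertible into one another: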

| Type | Contents |
|---|---|
| `RawSpectrum` | `SpectrumDescription` + `BinaryArrayMap` (encoded byte arrays) |
| `CentroidSpectrumType<C>` | `SpectrumDescription` + decoded centroid m/z peak list |
| `DeconvolutedSpectrumType<D>` | `SpectrumDescription` + decoded peak list of deconvoluted neutral masses with charge states |
| `MultiLayerSpectrum<C, D>` | `SpectrumDescription` + optional arrays + optional centroid peaks + optional deconvoluted peaks |

`MultiLayerSpectrum` is the standard interchange type. `Spectrum` and `CentroidSpectrum` are type aliases that fill in the default peak types; `RawSpectrum` takes no peak type parameters.
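The optional layers can be pictured with a small std-only sketch. `MultiLayer`, `Arrays` and the peak structs are hypothetical simplifications, not mzdata types. The conversion shown assumes the raw arrays already hold centroided data and simply zips them into peaks. Producing centroids from profile data would need real peak picking, which is not shown.

```rust
/// Hypothetical stand-ins for default peak types.
#[derive(Debug, Clone, PartialEq)]
pub struct CentroidPeak {
    pub mz: f64,
    pub intensity: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DeconvolutedPeak {
    pub neutral_mass: f64,
    pub intensity: f32,
    pub charge: i32,
}

/// Raw signal arrays, already decoded for simplicity.
#[derive(Debug, Clone, Default)]
pub struct Arrays {
    pub mz: Vec<f64>,
    pub intensity: Vec<f32>,
}

/// Sketch of a multi-layer spectrum: every layer is optional.
#[derive(Debug, Clone, Default)]
pub struct MultiLayer {
    pub id: String,
    pub arrays: Option<Arrays>,
    pub peaks: Option<Vec<CentroidPeak>>,
    pub deconvoluted_peaks: Option<Vec<DeconvolutedPeak>>,
}

impl MultiLayer {
    /// Fill the centroid layer from the raw arrays. Assumes the arrays are
    /// already centroided; no peak picking happens here.
    pub fn centroids_from_arrays(&mut self) -> Result<(), String> {
        let arrays = self.arrays.as_ref().ok_or("no raw arrays")?;
        if arrays.mz.len() != arrays.intensity.len() {
            return Err("m/z and intensity arrays differ in length".to_string());
        }
        let peaks: Vec<CentroidPeak> = arrays
            .mz
            .iter()
            .zip(arrays.intensity.iter())
            .map(|(&mz, &intensity)| CentroidPeak { mz, intensity })
            .collect();
        self.peaks = Some(peaks);
        Ok(())
    }
}

fn main() {
    let mut spectrum = MultiLayer {
        id: "scan=1".to_string(),
        arrays: Some(Arrays { mz: vec![100.0, 200.5], intensity: vec![10.0, 20.0] }),
        ..Default::default()
    };
    spectrum.centroids_from_arrays().expect("arrays present");
    println!("{:?}", spectrum.peaks);
}
```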

`BinaryArrayMap` (`spectrum/bindata/map.rs`) stores `DataArray` values keyed by `ArrayType` (m/z, intensity, charge, ion mobility, etc.) in encoded/compressed form; arrays are decoded lazily.
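Lazy decoding amounts to keeping bytes around and decoding only on request. This is a hypothetical sketch: `ArrayKind`, `EncodedArray` and `ArrayMap` are not mzdata names.

- Arrays are stored as little-endian `f64` bytes and decoded only when `get_decoded` is called.
- The base64 and compression layers used by mzML are omitted.
- The sketch re-decodes on every call rather than caching the result.

```rust
use std::collections::HashMap;

/// Hypothetical subset of array kinds used as map keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ArrayKind {
    Mz,
    Intensity,
    Charge,
}

/// An array kept in encoded form: little-endian bytes of `f64` values.
pub struct EncodedArray {
    bytes: Vec<u8>,
}

impl EncodedArray {
    pub fn encode(values: &[f64]) -> Self {
        let mut bytes = Vec::with_capacity(values.len() * 8);
        for v in values {
            bytes.extend_from_slice(&v.to_le_bytes());
        }
        EncodedArray { bytes }
    }

    /// Decoding happens only when the caller asks for the values.
    pub fn decode(&self) -> Vec<f64> {
        self.bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                f64::from_le_bytes(buf)
            })
            .collect()
    }
}

#[derive(Default)]
pub struct ArrayMap {
    arrays: HashMap<ArrayKind, EncodedArray>,
}

impl ArrayMap {
    pub fn add(&mut self, kind: ArrayKind, values: &[f64]) {
        self.arrays.insert(kind, EncodedArray::encode(values));
    }

    pub fn get_decoded(&self, kind: ArrayKind) -> Option<Vec<f64>> {
        self.arrays.get(&kind).map(|a| a.decode())
    }
}

fn main() {
    let mut map = ArrayMap::default();
    map.add(ArrayKind::Mz, &[100.25, 250.5]);
    map.add(ArrayKind::Intensity, &[1000.0, 250.0]);
    // Nothing has been decoded until this call.
    println!("{:?}", map.get_decoded(ArrayKind::Mz));
}
```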

`SpectrumGroup<C, D, S>` pairs one optional MS1 precursor spectrum with a `Vec` of MSn product spectra. Readers expose a grouping iterator via `SpectrumSource`.
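A simplified grouping pass is sketched below. It assumes each MSn spectrum directly follows the MS1 spectrum it belongs to, in acquisition order. `Spec`, `Group` and `group_spectra` are hypothetical names. A real implementation may need to buffer spectra to cope with interleaved acquisitions, which this sketch does not do.

```rust
/// Hypothetical minimal spectrum record.
#[derive(Debug, Clone, PartialEq)]
pub struct Spec {
    pub id: String,
    pub ms_level: u8,
}

/// Sketch of a `SpectrumGroup`-like pairing.
#[derive(Debug, Default, PartialEq)]
pub struct Group {
    pub precursor: Option<Spec>,
    pub products: Vec<Spec>,
}

/// Each MS1 starts a new group; following MSn spectra become its products.
/// MSn spectra seen before any MS1 form a group with no precursor.
pub fn group_spectra<I: IntoIterator<Item = Spec>>(spectra: I) -> Vec<Group> {
    let mut groups = Vec::new();
    let mut current = Group::default();
    for s in spectra {
        if s.ms_level == 1 {
            if current.precursor.is_some() || !current.products.is_empty() {
                groups.push(std::mem::replace(&mut current, Group::default()));
            }
            current.precursor = Some(s);
        } else {
            current.products.push(s);
        }
    }
    if current.precursor.is_some() || !current.products.is_empty() {
        groups.push(current);
    }
    groups
}

fn main() {
    let spectra = vec![
        Spec { id: "s1".to_string(), ms_level: 1 },
        Spec { id: "s2".to_string(), ms_level: 2 },
        Spec { id: "s3".to_string(), ms_level: 1 },
    ];
    for g in group_spectra(spectra) {
        println!("{:?}", g);
    }
}
```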

### `src/io/traits/` — reader/writer traits

- `SpectrumSource<C, D, S>` — base trait; extends `Iterator<Item=S>`, provides `get_spectrum_by_id/index/time`
- `RandomAccessSpectrumIterator` — adds `seek` and `start_from_*` methods
- `MZFileReader` — adds `open_path` (must be in scope to call `.open_path`)
- `SpectrumWriter` — base writing trait
- `MSDataFileMetadata` — file-level metadata (instrument config, data processing, file description, software list)

All traits intended for day-to-day use are re-exported from `mzdata::prelude`; `use mzdata::prelude::*;` brings them all into scope.
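Two Rust rules drive this layout:

- trait methods only resolve when the trait is in scope;
- default methods let one required method power several lookups.

In the hypothetical sketch below, `SpectrumSourceLike`, `VecReader` and `Spec` are not mzdata names. `get_spectrum_by_id` and `get_spectrum_by_time` are built as linear scans over `get_spectrum_by_index`, an assumption made for brevity rather than how mzdata resolves them. The trait is re-exported from a `prelude` module so a single glob import enables its methods.

```rust
mod traits {
    /// Hypothetical minimal spectrum record.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Spec {
        pub index: usize,
        pub id: String,
        pub start_time: f64,
    }

    /// Sketch of a `SpectrumSource`-like trait: an iterator plus random access.
    pub trait SpectrumSourceLike: Iterator<Item = Spec> {
        fn spectrum_count(&self) -> usize;
        fn get_spectrum_by_index(&mut self, index: usize) -> Option<Spec>;

        /// Default method: linear scan by native ID (assumed strategy).
        fn get_spectrum_by_id(&mut self, id: &str) -> Option<Spec> {
            for i in 0..self.spectrum_count() {
                if let Some(s) = self.get_spectrum_by_index(i) {
                    if s.id == id {
                        return Some(s);
                    }
                }
            }
            None
        }

        /// Default method: the spectrum whose start time is closest to `time`.
        fn get_spectrum_by_time(&mut self, time: f64) -> Option<Spec> {
            let mut best: Option<Spec> = None;
            for i in 0..self.spectrum_count() {
                if let Some(s) = self.get_spectrum_by_index(i) {
                    let closer = match best {
                        None => true,
                        Some(ref b) => (s.start_time - time).abs() < (b.start_time - time).abs(),
                    };
                    if closer {
                        best = Some(s);
                    }
                }
            }
            best
        }
    }
}

mod prelude {
    // Re-export so that one glob import brings the trait methods into scope.
    pub use super::traits::SpectrumSourceLike;
}

mod readers {
    use super::traits::{Spec, SpectrumSourceLike};

    /// Hypothetical in-memory reader.
    pub struct VecReader {
        pub spectra: Vec<Spec>,
        pub pos: usize,
    }

    impl Iterator for VecReader {
        type Item = Spec;
        fn next(&mut self) -> Option<Spec> {
            let s = self.spectra.get(self.pos).cloned();
            self.pos += 1;
            s
        }
    }

    impl SpectrumSourceLike for VecReader {
        fn spectrum_count(&self) -> usize {
            self.spectra.len()
        }
        fn get_spectrum_by_index(&mut self, index: usize) -> Option<Spec> {
            self.spectra.get(index).cloned()
        }
    }
}

// Without this import, the `get_spectrum_by_*` calls below would not resolve.
use prelude::*;
use readers::VecReader;
use traits::Spec;

fn main() {
    let mut reader = VecReader {
        spectra: vec![
            Spec { index: 0, id: "scan=1".to_string(), start_time: 0.5 },
            Spec { index: 1, id: "scan=2".to_string(), start_time: 1.0 },
        ],
        pos: 0,
    };
    println!("{:?}", reader.get_spectrum_by_id("scan=2"));
    println!("{:?}", reader.get_spectrum_by_time(0.6));
}
```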

### `src/params.rs` — CV parameter system

`Param` represents a single PSI-MS ontology term with an optional value. `ParamDescribed` (from prelude) provides `.params()` / `.add_param()` etc. on any type decorated with CV terms. The `curie!(MS:XXXXXXX)` macro constructs a CV reference inline. The compressed OBO file lives at `cv/psi-ms.obo.gz`.
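`curie!(MS:1000511)` is valid macro input because it tokenizes as an identifier, a colon and an integer literal. The sketch below shows one way such a macro and a `ParamDescribed`-style trait could fit together. `Curie`, `Param`, `ParamDescribedLike` and `Scan` are hypothetical simplifications; the real mzdata macro and types may differ. MS:1000511 is the PSI-MS term for *ms level*.

```rust
/// Hypothetical CV reference: vocabulary prefix plus numeric accession.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Curie {
    pub cv: &'static str,
    pub accession: u32,
}

/// Sketch of a `curie!`-style macro: `MS:1000511` is matched as an
/// identifier, a `:` token and an integer literal.
macro_rules! curie {
    ($cv:ident : $acc:literal) => {
        Curie { cv: stringify!($cv), accession: $acc }
    };
}

/// Hypothetical parameter: a name, an optional CV reference and an optional value.
#[derive(Debug, Clone, PartialEq)]
pub struct Param {
    pub name: String,
    pub curie: Option<Curie>,
    pub value: Option<String>,
}

/// Sketch of a `ParamDescribed`-style trait: implementors expose their
/// parameter list and get the helper methods as defaults.
pub trait ParamDescribedLike {
    fn params(&self) -> &[Param];
    fn params_mut(&mut self) -> &mut Vec<Param>;

    fn add_param(&mut self, param: Param) {
        self.params_mut().push(param);
    }

    fn get_param_by_curie(&self, curie: &Curie) -> Option<&Param> {
        self.params().iter().find(|p| p.curie.as_ref() == Some(curie))
    }
}

/// A toy type decorated with CV parameters.
#[derive(Debug, Default)]
pub struct Scan {
    params: Vec<Param>,
}

impl ParamDescribedLike for Scan {
    fn params(&self) -> &[Param] {
        &self.params
    }
    fn params_mut(&mut self) -> &mut Vec<Param> {
        &mut self.params
    }
}

fn main() {
    let mut scan = Scan::default();
    scan.add_param(Param {
        name: "ms level".to_string(),
        curie: Some(curie!(MS:1000511)),
        value: Some("2".to_string()),
    });
    println!("{:?}", scan.get_param_by_curie(&curie!(MS:1000511)));
}
```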

### `src/meta/` — file metadata

Holds `InstrumentConfiguration`, `DataProcessing`, `Software`, `FileDescription`, `Sample`, and `ScanSettings`. All readers implement `MSDataFileMetadata` to expose these.

## Key features

Feature flags control which formats and backends are compiled in. The test suite enables:
`nalgebra,parallelism,mzsignal,zlib-ng-compat,thermo,async,numpress,bruker_tdf,imzml`

Signal processing (`mzsignal` crate) requires one linear-algebra backend: `nalgebra` (pure Rust, preferred), `openblas`, `netlib`, or `intel-mkl`.

## Testing conventions

- Unit tests live in the same file as the code they test, inside a `#[cfg(test)] mod tests { ... }` submodule (standard Rust co-location).
- Test data files are in `test/data/`.
- The `imzml` module keeps its tests in a separate `src/io/imzml/tests.rs` file (declared as `mod tests;`).
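For reference, the co-location convention looks like this in a minimal std-only sketch. `tic` is a hypothetical helper (total ion current) used only to have something to test.

```rust
/// Hypothetical helper, present only to have something to test.
pub fn tic(intensities: &[f32]) -> f32 {
    intensities.iter().sum()
}

fn main() {
    println!("{}", tic(&[1.0, 2.0, 3.5]));
}

// Co-located unit tests, compiled only for test builds
// (`cargo test` / `cargo nextest run`).
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn tic_sums_intensities() {
        assert_eq!(tic(&[1.0, 2.0, 3.5]), 6.5);
        assert_eq!(tic(&[]), 0.0);
    }
}

// A module that keeps its tests in a separate file instead declares
// `mod tests;` (as `io/imzml/` does) and puts the body in `tests.rs`.
```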