openproteo-core
Part of the OpenProteo stack for proteomics raw-file access.
openproteo-core is the vendor-neutral foundation: records, the SpectrumSource trait, a canonical mzML 1.1.0 writer, an Arrow bridge, and the cross-vendor conformance suite. It is consumed by every reader in the stack - OpenTFRaw (Thermo), OpenTimsTDF (Bruker), OpenWRaw (Waters) - and by the openproteo-io umbrella.
Shared foundation for the open Rust mass-spec parsers (opentfraw, opentimstdf, openwraw).
Defines the vendor-neutral records, the SpectrumSource trait every
parser implements, a canonical mzML 1.1.0 writer (with optional indexed
mzML output), an optional Apache Arrow bridge, and a cross-vendor
conformance harness.
- MSRV: 1.85
- License: Apache-2.0
- #![forbid(unsafe_code)]
Each vendor crate stays a complete standalone tool: a user pulls in
opentfraw alone and gets parsing and mzML export. This crate is
the shared vocabulary that keeps the three parsers in lock-step.
Install
[dependencies]
openproteo-core = "0.1"
# Or, with the optional zero-copy Arrow RecordBatch builder for spectra:
openproteo-core = { version = "0.1", features = ["arrow"] }
Quick example
Implement SpectrumSource and write a valid mzML document:
use ;
let mut src = MySource ;
let mut out = create?;
write_indexed_mzml?;
# Ok::
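The exact signature of SpectrumSource is not reproduced in this README, so the following is only a minimal, self-contained sketch of the pattern it describes: a parser implements one shared trait, and a generic consumer streams spectra from it into any writer. ToySpectrum, ToySource, InMemorySource and write_summary are hypothetical stand-ins invented for illustration; they are not part of openproteo-core, and the output is a plain text summary rather than mzML.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified record (NOT the crate's SpectrumRecord).
struct ToySpectrum {
    id: String,
    ms_level: u8,
    peaks: Vec<(f64, f32)>, // (m/z, intensity) pairs
}

/// Hypothetical stand-in for the shared trait every parser would implement.
trait ToySource {
    /// Returns the next spectrum, or None once the run is exhausted.
    fn next_spectrum(&mut self) -> Option<ToySpectrum>;
}

/// A trivial "parser" that serves spectra already held in memory.
struct InMemorySource {
    queue: std::vec::IntoIter<ToySpectrum>,
}

impl ToySource for InMemorySource {
    fn next_spectrum(&mut self) -> Option<ToySpectrum> {
        self.queue.next()
    }
}

/// Generic consumer: works with any ToySource and any writer.
/// Writes one tab-separated line per spectrum and returns the spectrum count.
fn write_summary<S: ToySource, W: Write>(src: &mut S, out: &mut W) -> io::Result<usize> {
    let mut count = 0;
    while let Some(s) = src.next_spectrum() {
        writeln!(out, "{}\tms{}\t{} peaks", s.id, s.ms_level, s.peaks.len())?;
        count += 1;
    }
    Ok(count)
}
```

Because the consumer depends only on the trait, the same function serves every source that implements it, which is the role the shared trait plays across the vendor parsers.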
Public API
| Symbol | Module | Purpose |
|---|---|---|
| SpectrumRecord | types | Decoded spectrum: id, ms level, polarity, rt, peaks, precursor. |
| PrecursorInfo | types | Selected / isolated precursor, charge, activation, scan window. |
| ChromatogramRecord | types | TIC / BPC / SRM trace. |
| RunMetadata | types | Run-level CV terms: instrument, source format, native id format. |
| CvTerm | types | A PSI-MS controlled-vocabulary term. |
| Polarity, Analyzer, ScanMode, MsPower, Activation | enums | Standard enumerations. |
| MobilityArrayKind | enums | Per-peak inverse-mobility / drift-time array kind. |
| SpectrumSource | source | Trait every parser implements. |
| write_mzml | mzml | Stream a SpectrumSource to a plain mzML 1.1.0 document. |
| write_indexed_mzml | mzml | Same, with <indexList> + SHA-1 footer for byte-offset indexing. |
| conformance::assert_source_invariants | conformance | Check a live SpectrumSource for cross-vendor invariants. |
| conformance::assert_iter_invariants | conformance | Same, but from any IntoIterator<Item = SpectrumRecord>. |
| arrow::SpectrumBatchBuilder | arrow (feat) | Zero-copy builder for arrow_array::RecordBatch from a spectrum stream. |
| arrow::spectrum_record_schema | arrow (feat) | The canonical Arrow schema. |
| Error | error | Aggregate thiserror-based error type. |
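To illustrate the byte-offset indexing that write_indexed_mzml is described as providing, here is a rough sketch of the underlying idea: wrap the output in a byte-counting writer, note the offset at which each spectrum element begins, and append an index list afterwards. CountingWriter and write_toy_indexed are hypothetical names, not the crate's implementation. The element names follow the indexed mzML wrapper layout, but the emitted document is a toy and not schema-valid, and the SHA-1 checksum footer is omitted because the standard library has no SHA-1.

```rust
use std::io::{self, Write};

/// Wraps any writer and counts the bytes that pass through it.
struct CountingWriter<W: Write> {
    inner: W,
    written: u64,
}

impl<W: Write> CountingWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, written: 0 }
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.written += n as u64; // count only bytes actually accepted
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Writes a toy indexed document and returns the (id, byte offset) pairs
/// recorded for each spectrum element.
fn write_toy_indexed<W: Write>(out: W, ids: &[&str]) -> io::Result<Vec<(String, u64)>> {
    let mut w = CountingWriter::new(out);
    let mut offsets = Vec::with_capacity(ids.len());

    writeln!(w, "<indexedmzML>")?;
    writeln!(w, "<mzML>")?;
    for &id in ids {
        // Record where the '<' of this spectrum element will land.
        offsets.push((id.to_string(), w.written));
        writeln!(w, "<spectrum id=\"{}\"/>", id)?;
    }
    writeln!(w, "</mzML>")?;

    // The index itself starts here; readers seek to this offset first.
    let index_list_offset = w.written;
    writeln!(w, "<indexList count=\"1\">")?;
    writeln!(w, "<index name=\"spectrum\">")?;
    for (id, off) in &offsets {
        writeln!(w, "<offset idRef=\"{}\">{}</offset>", id, off)?;
    }
    writeln!(w, "</index>")?;
    writeln!(w, "</indexList>")?;
    writeln!(w, "<indexListOffset>{}</indexListOffset>", index_list_offset)?;
    writeln!(w, "</indexedmzML>")?;
    w.flush()?;
    Ok(offsets)
}
```

A reader can then seek directly to any recorded offset and begin parsing at that spectrum, instead of scanning the file from the start.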
Conformance harness
The conformance module enforces the cross-vendor invariants every parser must satisfy:
- monotonic spectrum indices,
- non-negative, non-decreasing retention times,
- equal-length m/z and intensity arrays,
- equal-length mobility arrays (when present),
- MS-level / polarity sanity,
- precursor presence on MSn spectra.
Failures surface as ConformanceError variants
(PeakArrayLengthMismatch, MobilityArrayLengthMismatch,
RetentionTimeNonMonotonic, and others).
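As a rough illustration of what checks of this kind involve, the sketch below validates a stream of records in a single pass. It is not the crate's conformance module: ToyRecord, Violation, toy and check_invariants are hypothetical, "monotonic" is assumed to mean strictly increasing indices, and the polarity check is left out.

```rust
/// Hypothetical stand-in for a decoded spectrum (NOT the crate's SpectrumRecord).
#[derive(Debug, Clone)]
struct ToyRecord {
    index: usize,
    ms_level: u8,
    rt_seconds: f64,
    mz: Vec<f64>,
    intensity: Vec<f32>,
    mobility: Option<Vec<f64>>,
    has_precursor: bool,
}

/// Hypothetical error type; each variant carries the offending spectrum index.
#[derive(Debug, PartialEq)]
enum Violation {
    IndexNotIncreasing(usize),
    RetentionTimeInvalid(usize),
    RetentionTimeDecreasing(usize),
    PeakArrayLengthMismatch(usize),
    MobilityArrayLengthMismatch(usize),
    InvalidMsLevel(usize),
    MissingPrecursor(usize),
}

/// Convenience constructor for examples: n_peaks dummy peaks, no mobility array.
fn toy(index: usize, ms_level: u8, rt_seconds: f64, n_peaks: usize, has_precursor: bool) -> ToyRecord {
    ToyRecord {
        index,
        ms_level,
        rt_seconds,
        mz: vec![100.0; n_peaks],
        intensity: vec![1.0; n_peaks],
        mobility: None,
        has_precursor,
    }
}

/// Checks the stream in one pass. Returns the number of spectra seen,
/// or the first violation encountered.
fn check_invariants<I>(spectra: I) -> Result<usize, Violation>
where
    I: IntoIterator<Item = ToyRecord>,
{
    let mut prev: Option<(usize, f64)> = None;
    let mut seen = 0;
    for s in spectra {
        let at = s.index;
        // Retention time must be a real, non-negative number.
        if s.rt_seconds.is_nan() || s.rt_seconds < 0.0 {
            return Err(Violation::RetentionTimeInvalid(at));
        }
        if let Some((prev_index, prev_rt)) = prev {
            if s.index <= prev_index {
                return Err(Violation::IndexNotIncreasing(at));
            }
            if s.rt_seconds < prev_rt {
                return Err(Violation::RetentionTimeDecreasing(at));
            }
        }
        if s.mz.len() != s.intensity.len() {
            return Err(Violation::PeakArrayLengthMismatch(at));
        }
        if let Some(mobility) = &s.mobility {
            if mobility.len() != s.mz.len() {
                return Err(Violation::MobilityArrayLengthMismatch(at));
            }
        }
        if s.ms_level == 0 {
            return Err(Violation::InvalidMsLevel(at));
        }
        // MSn spectra (level 2 and above) must carry precursor information.
        if s.ms_level >= 2 && !s.has_precursor {
            return Err(Violation::MissingPrecursor(at));
        }
        prev = Some((s.index, s.rt_seconds));
        seen += 1;
    }
    Ok(seen)
}
```

Stopping at the first violation keeps the sketch short; a fuller harness might instead collect every failure before reporting.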
The vendor2mzml validate subcommand in
OpenProteo runs this
harness on any vendor input or pre-existing mzML.
To assemble a local corpus for running the harness across vendors, see the shared corpus schema and fetcher described in OpenProteo/docs/CORPUS.md.
Feature flags
| Flag | Default | Effect |
|---|---|---|
| arrow | no | Enables arrow_array::RecordBatch building from spectra. |
Ecosystem
                openproteo-core   (this crate: types + trait + mzML writer)
                       ^
          +------------+------------+
          |            |            |
      opentfraw   opentimstdf   openwraw      (vendor parsers)
          |            |            |
          +------------+------------+
                       v
                 openproteo-io   (umbrella: detect_format, collect, to_mzml)
                       |
          +------------+------------+
          |                         |
    vendor2mzml CLI        openproteo (Python metapackage)
- Unified docs hub: https://github.com/Sigilweaver/OpenProteo
- Umbrella crate (workspace): https://github.com/Sigilweaver/OpenProteo
- Vendor parsers: opentfraw, opentimstdf, openwraw.
Roadmap
See ROADMAP.md for the multi-phase plan.
Changelog
See CHANGELOG.md.
License
Apache-2.0