mzdata provides basic access to raw and processed mass spectrometry data formats in
Rust.
For a guide, see the tutorial section.
The library currently supports reading:
- MGF files using MGFReader in mzdata::io::mgf
- mzML & indexedmzML files using MzMLReader in mzdata::io::mzml
- mzMLb files using MzMLbReader in mzdata::io::mzmlb, if the mzmlb feature is enabled
- Thermo RAW files using ThermoRawReader in mzdata::io::thermo, if the thermo feature is enabled
- Bruker TDF files using TDFSpectrumReader in mzdata::io::tdf, if the bruker_tdf feature is enabled
and writing:
- MGF files using MGFWriter in mzdata::io::mgf
- mzML & indexedmzML files using MzMLWriter in mzdata::io::mzml
- mzMLb files using MzMLbWriter in mzdata::io::mzmlb, if the mzmlb feature is enabled
The format of a file, and whether or not it is gzip-compressed, can be inferred from a path or an io::Read using io::infer_format and io::infer_from_stream, respectively.
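To make the idea of format inference concrete, here is a minimal standard-library sketch. It is not how io::infer_format or io::infer_from_stream are implemented; the names GuessedFormat, guess_from_path, and is_gzip_stream are hypothetical and exist only for illustration. It guesses a format from a file extension and detects gzip by its two-byte magic number (0x1f 0x8b), restoring the stream position afterwards.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a format enum; not mzdata's own type.
#[derive(Debug, PartialEq, Clone, Copy)]
enum GuessedFormat {
    Mgf,
    MzML,
    Unknown,
}

/// Guess the format and gzip status from a path's extension(s) alone.
fn guess_from_path(path: &str) -> (GuessedFormat, bool) {
    let lower = path.to_ascii_lowercase();
    // A trailing ".gz" marks compression; the format is read from what remains.
    let (stem, gzipped) = match lower.strip_suffix(".gz") {
        Some(rest) => (rest, true),
        None => (lower.as_str(), false),
    };
    let format = if stem.ends_with(".mgf") {
        GuessedFormat::Mgf
    } else if stem.ends_with(".mzml") {
        GuessedFormat::MzML
    } else {
        GuessedFormat::Unknown
    };
    (format, gzipped)
}

/// Peek at the first two bytes of a seekable stream to check for the gzip
/// magic number, then rewind to where the stream started.
fn is_gzip_stream<R: Read + Seek>(stream: &mut R) -> io::Result<bool> {
    let start = stream.stream_position()?;
    let mut magic = [0u8; 2];
    let is_gz = match stream.read_exact(&mut magic) {
        Ok(()) => magic == [0x1f, 0x8b],
        // Fewer than two bytes available: cannot be gzip.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => false,
        Err(e) => return Err(e),
    };
    stream.seek(SeekFrom::Start(start))?;
    Ok(is_gz)
}
```

Rewinding after the peek matters because the same stream is normally handed to a reader next, which expects to start at the beginning.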
Conventional dispatch is possible through MZReader. The mz_read macro provides a convenient means of working with
a value with zero added overhead, but with a limited scope. The mz_write macro is the equivalent for opening a writer.
There are additional tools for dealing with file format dispatch in MassSpectrometryReadWriteProcess.
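The difference between dispatching through a single wrapper type and dispatching through a macro can be sketched with toy types. Everything below is hypothetical (Source, MgfLike, MzmlLike, AnyReader, with_reader) and only illustrates the general pattern; it does not reproduce the definitions of MZReader or mz_read. The enum stores one of several readers and matches on every call, while the macro expands the caller's body once per concrete type, so the reader is statically typed but only exists inside that body.

```rust
/// Hypothetical trait standing in for a spectrum source abstraction.
trait Source {
    fn format_name(&self) -> &'static str;
}

struct MgfLike;
struct MzmlLike;

impl Source for MgfLike {
    fn format_name(&self) -> &'static str {
        "MGF"
    }
}

impl Source for MzmlLike {
    fn format_name(&self) -> &'static str {
        "mzML"
    }
}

/// Conventional dispatch: one enum wraps every concrete reader and can be
/// stored, returned, and passed around like any other value.
enum AnyReader {
    Mgf(MgfLike),
    Mzml(MzmlLike),
}

impl AnyReader {
    fn open(path: &str) -> AnyReader {
        if path.ends_with(".mgf") {
            AnyReader::Mgf(MgfLike)
        } else {
            AnyReader::Mzml(MzmlLike)
        }
    }
}

impl Source for AnyReader {
    fn format_name(&self) -> &'static str {
        // Each call pays for a match on the variant.
        match self {
            AnyReader::Mgf(r) => r.format_name(),
            AnyReader::Mzml(r) => r.format_name(),
        }
    }
}

/// Macro dispatch: the body is duplicated for each concrete type, so inside
/// the body the reader has a known static type. The reader cannot escape it.
macro_rules! with_reader {
    ($path:expr, $reader:ident => $body:expr) => {{
        let path: &str = $path;
        if path.ends_with(".mgf") {
            let $reader = MgfLike;
            $body
        } else {
            let $reader = MzmlLike;
            $body
        }
    }};
}
```

A call such as `with_reader!("run.mzML", r => r.format_name())` evaluates the body against whichever concrete type was selected, whereas `AnyReader::open("run.mzML")` returns a value that can outlive the call site.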
The library also includes a set of representation layers for spectra in mzdata::spectrum.
§Example
use std::fs;
use mzdata::prelude::*;
use mzpeaks::Tolerance;
use mzdata::MZReader;
use mzdata::spectrum::SignalContinuity;

let reader = MZReader::open_path("./test/data/small.mzML").unwrap();
for spectrum in reader {
    println!("Scan {} => BP {}", spectrum.id(), spectrum.peaks().base_peak().mz);

    if spectrum.signal_continuity() == SignalContinuity::Centroid {
        let peak_picked = spectrum.into_centroid().unwrap();
        println!("Matches for 579.155: {:?}",
            peak_picked.peaks.all_peaks_for(
                579.155, Tolerance::Da(0.02)
            )
        );
    }
}

It uses mzpeaks to represent peaks and peak lists, and re-exports the basic types. While the high-level
types are templated on simple peak types, more complex, application-specific peak types can be substituted.
See mzdata::spectrum::bindata for more information about how to directly convert
data arrays to peak lists.
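As a rough illustration of what "templated on peak types" means in practice, the sketch below defines a tolerance search that is generic over any peak type exposing an m/z value. The trait HasMz, the two peak structs, and the function peaks_within are all hypothetical and are not part of mzpeaks or mzdata; the search assumes the peak list is sorted by m/z and uses an absolute tolerance in Daltons.

```rust
/// Hypothetical trait: anything with an m/z coordinate can be searched.
trait HasMz {
    fn mz(&self) -> f64;
}

/// A simple peak carrying only m/z and intensity.
#[derive(Debug, Clone, PartialEq)]
struct SimplePeak {
    mz: f64,
    intensity: f32,
}

/// An application-specific peak carrying extra information.
#[derive(Debug, Clone, PartialEq)]
struct AnnotatedPeak {
    mz: f64,
    intensity: f32,
    charge: i32,
}

impl HasMz for SimplePeak {
    fn mz(&self) -> f64 {
        self.mz
    }
}

impl HasMz for AnnotatedPeak {
    fn mz(&self) -> f64 {
        self.mz
    }
}

/// Return the contiguous run of peaks whose m/z lies within `tol_da` of
/// `query`. Assumes `peaks` is sorted by ascending m/z.
fn peaks_within<P: HasMz>(peaks: &[P], query: f64, tol_da: f64) -> &[P] {
    let tol = tol_da.abs();
    // First index whose m/z is not below the lower bound.
    let lo = peaks.partition_point(|p| p.mz() < query - tol);
    // First index whose m/z is above the upper bound.
    let hi = peaks.partition_point(|p| p.mz() <= query + tol);
    &peaks[lo..hi]
}
```

Because the search only depends on the trait, the same function works unchanged for SimplePeak and AnnotatedPeak, which is the property that lets a richer peak type be substituted without rewriting the surrounding code.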
§Traits
The library makes heavy use of traits to abstract over the implementation details of different file formats.
These traits are included in mzdata::prelude. It also imports mzpeaks::prelude.
§Features
mzdata provides many optional features, some of which are self-contained, while others layer functionality.
§mzsignal-related
TLDR: Unless you are already using ndarray-linalg in your dependency graph, you should enable mzsignal + nalgebra.
The mzsignal crate provides signal processing, peak picking, and feature finding functionality. Part of this behavior requires a linear algebra implementation, and mzsignal is flexible about which one it uses: either nalgebra, a pure Rust library that is self-contained but optimized for small matrices, or ndarray-linalg, which requires an external LAPACK library to be available at build time or run time, outside the basic Rust ecosystem. Enabling the mzsignal feature requires one of the following features:
- nalgebra - No external dependencies.
- openblas - Requires OpenBLAS (see https://crates.io/crates/ndarray-linalg)
- intel-mkl - Requires Intel’s Math Kernel Library (see https://crates.io/crates/ndarray-linalg)
- netlib - Requires NETLIB (see https://crates.io/crates/ndarray-linalg)
§File Formats
mzdata supports reading several file formats, some of which add large dependencies and can be opted into or out of.
| Feature | File Format | Dependency |
|---|---|---|
| mzmlb | mzMLb | HDF5 C shared library at runtime or statically linked with hdf5-rs, possibly a C compiler |
| thermo | Thermo-Fisher RAW Format | .NET runtime at build time and runtime, possibly a C compiler |
| bruker_tdf | Bruker TDF Format | SQLite3 C library at runtime or statically linked with rusqlite, requires mzsignal for flattening spectra |
Additionally, mzML and MGF are supported by default, but they can be disabled by skipping default features and not enabling the mzml and mgf features.
To complicate matters, the hdf5_static feature, when combined with mzmlb, statically links the HDF5 C library and zlib together to avoid symbol collisions with other compression libraries used by mzdata.
§Compression
mzdata uses flate2 to compress and decompress zlib-type compressed streams, but there are several different backends available with different tradeoffs in speed and build convenience:
- zlib - The historical implementation. Faster than miniz_oxide and consistently produces the best compression. Requires a nearly ubiquitous C library at build time.
- zlib-ng-compat - The fastest, with compression that is often the best or nearly the best. Requires a C library or a C compiler at build time.
- zlib-ng - C library dependency, I encountered build errors but your mileage may vary. Requires a C library or a C compiler at build time.
- miniz_oxide - Pure Rust backend, the slowest in practice.
mzdata was also a test-bed for some experimental compression techniques.
- zstd - Enables layered Zstandard and byte shuffling + dictionary encoding methods.
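For readers unfamiliar with byte shuffling, the following is a generic sketch of the technique using only the standard library. It is not the encoding that the zstd feature actually writes, whose details are not described here; the function names shuffle_f64 and unshuffle_f64 are hypothetical. The idea is to regroup the bytes of an array of floats so that all first bytes come first, then all second bytes, and so on, which tends to place similar bytes next to each other before a general-purpose compressor runs.

```rust
/// Transpose the bytes of `values` so that byte plane 0 of every value comes
/// first, then byte plane 1, and so on (little-endian byte order).
fn shuffle_f64(values: &[f64]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * 8];
    for (i, v) in values.iter().enumerate() {
        for (b, byte) in v.to_le_bytes().iter().enumerate() {
            out[b * n + i] = *byte;
        }
    }
    out
}

/// Invert `shuffle_f64`. Returns `None` if the input length is not a
/// multiple of eight bytes.
fn unshuffle_f64(bytes: &[u8]) -> Option<Vec<f64>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    let n = bytes.len() / 8;
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let mut buf = [0u8; 8];
        for (b, slot) in buf.iter_mut().enumerate() {
            *slot = bytes[b * n + i];
        }
        out.push(f64::from_le_bytes(buf));
    }
    Some(out)
}
```

For smoothly varying data such as sorted m/z arrays, the high-order byte planes change slowly, so the shuffled buffer typically contains long runs that compress well.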
§Async I/O
mzdata uses synchronous I/O by default, but includes code for some async options:
- async_partial - Implements trait-level asynchronous versions of the spectrum reading traits and implementations for mzML, MGF, and Thermo RAW files using tokio, but doesn’t enable the tokio/fs module, which carries additional requirements that are not compatible with all platforms.
- async - Enables async_partial and tokio/fs.
§PROXI
mzdata includes PROXI clients for fetching spectra from supporting servers on the internet using USIs.
- proxi - Provides a synchronous client in mzdata::io::proxi and adds mzdata::io::usi::USI::download_spectrum_blocking
- async-proxi - Provides an asynchronous client in mzdata::io::proxi and adds mzdata::io::usi::USI::download_spectrum_async
§Other
- serde - Enables serde serialization and deserialization for most library types that aren’t directly connected to an I/O device.
- parallelism - Enables rayon parallel iterators on a small number of internal operations to speed up some operations relating to decompression and signal processing. This is unlikely to be noticeable in most cases. More benefit is had by simply processing multiple spectra in parallel using rayon’s bridging adapters.
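The pattern of processing multiple spectra in parallel amounts to feeding a sequential iterator to a pool of workers. The sketch below shows that pattern with standard-library scoped threads instead of rayon, purely so that it has no dependencies; with rayon the same shape would normally be written with its par_bridge adapter. The function parallel_map_unordered is hypothetical, and the integers in the usage note stand in for spectra.

```rust
use std::sync::Mutex;
use std::thread;

/// Pull items from a sequential iterator on several worker threads, apply
/// `f` to each, and collect the results. Output order is not preserved.
fn parallel_map_unordered<I, R, F>(iter: I, n_workers: usize, f: F) -> Vec<R>
where
    I: Iterator + Send,
    R: Send,
    F: Fn(I::Item) -> R + Sync,
{
    let source = Mutex::new(iter);
    let results = Mutex::new(Vec::new());
    thread::scope(|scope| {
        for _ in 0..n_workers.max(1) {
            scope.spawn(|| loop {
                // Hold the lock only long enough to take the next item.
                let next = source.lock().unwrap().next();
                match next {
                    Some(item) => {
                        // The expensive work happens outside any lock.
                        let out = f(item);
                        results.lock().unwrap().push(out);
                    }
                    None => break,
                }
            });
        }
    });
    results.into_inner().unwrap()
}
```

For example, `parallel_map_unordered(1..=100u64, 4, |x| x * x)` returns the hundred squares in an arbitrary order. Reading stays sequential behind the lock, which matches the situation where a file reader yields spectra one at a time but the per-spectrum processing dominates the cost.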
Re-exports§
- pub use crate::io::MZReader;
- pub use crate::io::MZReaderBuilder;
- pub use crate::io::mgf::MGFReader;
- pub use crate::io::mgf::MGFWriter;
- pub use crate::io::mzml::MzMLReader;
- pub use crate::io::mzml::MzMLWriter;
- pub use crate::io::mzmlb::MzMLbReader;
- pub use crate::io::mzmlb::MzMLbWriter;
- pub use crate::io::mzmlb::MzMLbWriterBuilder;
- pub use crate::params::Param;
- pub use crate::params::ParamList;
- pub use crate::spectrum::CentroidSpectrum;
- pub use crate::spectrum::RawSpectrum;
- pub use crate::spectrum::Spectrum;
- pub use mzpeaks;
- pub use mzsignal;
Modules§
- io - Reading and writing mass spectrometry data file formats and abstractions over them.
- meta - Metadata describing mass spectrometry data files and their contents.
- params - Elements of controlled vocabularies used to describe mass spectra and their components.
- prelude - A set of foundational traits used throughout the library.
- spectrum - The data structures and components that represent a mass spectrum and how to access their data.
- tutorial - A series of written introductions to specific topics in mzdata
- utils
Macros§
- curie
- cvmap
- delegate_impl_metadata_trait - Delegates the implementation of MSDataFileMetadata to a member. Passing an extra level extended token implements the optional methods.
- find_param_method
- impl_metadata_trait - Assumes a field for the non-Option facets of the MSDataFileMetadata implementation are present. Passing an extra level extended token implements the optional methods.
- impl_param_described - Implement the ParamDescribed trait for type $t, referencing a params member of type Vec<Param>.
- impl_param_described_deferred - Implement the ParamDescribed trait for type $t, referencing a params member that is an Option<Vec<Param>> that will lazily be initialized automatically when it is accessed mutably.
- mz_read - A macro that dynamically works out how to get a SpectrumSource-derived object from a path or io::Read + io::Seek boxed object. This is meant to be a convenience for working with a scoped file reader without penalty.
- mz_write - A macro that dynamically works out how to get a SpectrumWriter from a path or io::Write boxed object.