Serialize and deserialize NumPy’s *.npy binary format.
§Overview
NPY is a simple binary data format. It stores the type, shape and endianness information in a header, which is followed by a flat binary data field. This crate offers a simple, mostly type-safe way to read and write *.npy files. Files are handled using iterators, so they don’t need to fit in memory.
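To make the header/data split concrete, here is a minimal std-only sketch that separates the two parts of an NPY byte buffer by hand. It is not how npyz is implemented. The names `RawNpy`, `split_npy` and `example_file` are hypothetical, and the layout (a `\x93NUMPY` magic string, two version bytes, a little-endian header length, then the header text) follows the published NPY format description.

```rust
/// The two parts of an NPY file, borrowed from the input buffer.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct RawNpy<'a> {
    version: (u8, u8),
    /// The header text: a Python dict literal with `descr`,
    /// `fortran_order` and `shape` keys.
    header: &'a str,
    /// The flat binary data field that follows the header.
    data: &'a [u8],
}

fn split_npy(bytes: &[u8]) -> Result<RawNpy<'_>, String> {
    const MAGIC: &[u8] = b"\x93NUMPY";
    if bytes.len() < 10 || &bytes[..6] != MAGIC {
        return Err("not an NPY file".to_string());
    }
    let version = (bytes[6], bytes[7]);
    // Version 1.x stores the header length as a little-endian u16;
    // versions 2.x and 3.x use a little-endian u32.
    let (header_len, header_start) = match version.0 {
        1 => (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10),
        2 | 3 => {
            let raw = bytes.get(8..12).ok_or("truncated header length")?;
            (u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize, 12)
        }
        v => return Err(format!("unsupported major version {}", v)),
    };
    let header_end = header_start + header_len;
    let header_bytes = bytes
        .get(header_start..header_end)
        .ok_or("truncated header")?;
    let header = std::str::from_utf8(header_bytes).map_err(|e| e.to_string())?;
    Ok(RawNpy { version, header, data: &bytes[header_end..] })
}

/// Builds a tiny version-1.0 file holding two little-endian f64 values,
/// purely so the sketch has something to parse.
fn example_file() -> Vec<u8> {
    let mut header =
        String::from("{'descr': '<f8', 'fortran_order': False, 'shape': (2,), }");
    // The header is padded with spaces and ends in a newline so that the
    // data starts at an aligned offset (64 bytes is assumed here).
    while (10 + header.len() + 1) % 64 != 0 {
        header.push(' ');
    }
    header.push('\n');

    let mut bytes = b"\x93NUMPY\x01\x00".to_vec();
    bytes.extend_from_slice(&(header.len() as u16).to_le_bytes());
    bytes.extend_from_slice(header.as_bytes());
    for value in &[1.0f64, 3.5] {
        bytes.extend_from_slice(&value.to_le_bytes());
    }
    bytes
}
```

Calling `split_npy(&example_file())` yields the header text and a 16-byte data slice. In practice `npyz::NpyFile` does this parsing, and the type checking on top of it, for you.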
§Optional cargo features
No features are enabled by default. Here is the list of existing features:
- There are a couple of features which enable support for serialization/deserialization of foreign types. These require opt-in because they can be stability hazards; a major version bump of npyz may introduce a major version bump of one of these crates. (NOTE: to ease this issue somewhat, npyz will re-export the versions of the crates it uses.)
  - "complex" enables the use of num_complex::Complex.
  - "half" enables the use of half::f16.
  - "arrayvec" enables the use of arrayvec::ArrayVec and arrayvec::ArrayString as alternatives to Vec and String for some string types.
- "derive" enables derives of traits for working with structured arrays.
- "npz" enables adapters for working with NPZ files (including scipy sparse matrices), adding a public dependency on the zip crate. This requires opt-in because zip has a fair number of transitive dependencies. (Note that some npz-related helper functions are available even without the feature.)
§Reading
Let’s create a simple *.npy file in Python:
import numpy as np
a = np.array([1, 3.5, -6, 2.3])
np.save('test-data/plain.npy', a)
Now we can load it in Rust using NpyFile:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("test-data/plain.npy")?;
    // Note: In addition to byte slices, this accepts any io::Read
    let npy = npyz::NpyFile::new(&bytes[..])?;
    for number in npy.data::<f64>()? {
        let number = number?;
        eprintln!("{}", number);
    }
    Ok(())
}
And we can see our data:
1
3.5
-6
2.3
§Inspecting properties of the array
NpyFile provides methods that let you inspect the array.
fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("test-data/c-order.npy")?;
    let data = npyz::NpyFile::new(&bytes[..])?;
    assert_eq!(data.shape(), &[2, 3, 4]);
    assert_eq!(data.order(), npyz::Order::C);
    assert_eq!(data.strides(), &[12, 4, 1]);
    // convenience method for reading to vec
    println!("{:?}", data.into_vec::<f64>());
    Ok(())
}
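The strides asserted above are counted in elements rather than bytes, and they follow directly from the shape and the axis order. The sketch below shows one way to derive them for a contiguous array. `Order` here is a hypothetical local stand-in for `npyz::Order` so that the example has no dependencies, and `strides` and `flat_index` are illustrative helpers, not part of the crate.

```rust
/// Hypothetical stand-in for `npyz::Order`, so the sketch needs no dependencies.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Order {
    C,
    Fortran,
}

/// Computes strides, in units of elements, for a contiguous array.
fn strides(shape: &[u64], order: Order) -> Vec<u64> {
    let mut strides = vec![1; shape.len()];
    match order {
        // C order: the last axis varies fastest.
        Order::C => {
            for i in (0..shape.len().saturating_sub(1)).rev() {
                strides[i] = strides[i + 1] * shape[i + 1];
            }
        }
        // Fortran order: the first axis varies fastest.
        Order::Fortran => {
            for i in 1..shape.len() {
                strides[i] = strides[i - 1] * shape[i - 1];
            }
        }
    }
    strides
}

/// Position of a multi-dimensional index within the flat data field.
fn flat_index(index: &[u64], strides: &[u64]) -> u64 {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For the shape `[2, 3, 4]` used above, `strides(&[2, 3, 4], Order::C)` gives `[12, 4, 1]`, matching the assertion, while Fortran order gives `[1, 2, 6]`.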
§Writing
The primary interface for writing npy files is the WriterBuilder trait.
use npyz::WriterBuilder;

fn main() -> std::io::Result<()> {
    // Any io::Write is supported. For this example we'll
    // use Vec<u8> to serialize in-memory.
    let mut out_buf = vec![];
    let mut writer = {
        npyz::WriteOptions::new()
            .default_dtype()
            .shape(&[2, 3])
            .writer(&mut out_buf)
            .begin_nd()?
    };
    writer.push(&100)?;
    writer.push(&101)?;
    writer.push(&102)?;
    // you can also write multiple items at once
    writer.extend(vec![200, 201, 202])?;
    writer.finish()?;
    eprintln!("{:02x?}", out_buf);
    Ok(())
}
§Supported dtypes
A complete description of the supported NumPy dtypes and the corresponding Rust types can be found in the crate::type_matchup_docs module.
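For orientation, simple dtypes are written as Array Interface type-strings such as `<f8`: the first character gives the endianness, the second the type, and the rest the size. The sketch below splits such a string by hand. The types and the `parse_type_str` function are hypothetical stand-ins, not the crate's `TypeStr` parser, and the sketch deliberately rejects strings with a units suffix such as `<M8[ns]`.

```rust
/// Hypothetical stand-in for the endianness character of a type-string.
#[derive(Debug, PartialEq)]
enum Endianness {
    Little,     // '<'
    Big,        // '>'
    Irrelevant, // '|' (single-byte or opaque types)
}

/// A simple type-string broken into its three parts. (Illustrative only.)
#[derive(Debug, PartialEq)]
struct SimpleTypeStr {
    endianness: Endianness,
    type_char: char,
    size: u64,
}

/// Parses strings of the form `<f8`, `>i4` or `|V16`.
/// Returns `None` for anything else, including units suffixes.
fn parse_type_str(s: &str) -> Option<SimpleTypeStr> {
    let mut chars = s.chars();
    let endianness = match chars.next()? {
        '<' => Endianness::Little,
        '>' => Endianness::Big,
        '|' => Endianness::Irrelevant,
        _ => return None,
    };
    let type_char = chars.next()?;
    if !type_char.is_ascii_alphabetic() {
        return None;
    }
    // Whatever remains must be a plain decimal size.
    let size = chars.as_str().parse().ok()?;
    Some(SimpleTypeStr { endianness, type_char, size })
}
```

With this sketch, `parse_type_str("<f8")` describes the little-endian 8-byte float that the Reading example decodes as `f64`.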
§Working with ndarray
Using the ndarray crate? No problem! At this time, npyz provides no conversion API, but one can easily be written:
// Example of parsing to an array with fixed NDIM.
fn to_array_3<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::Array3<T> {
    use ndarray::ShapeBuilder;
    let shape = match shape[..] {
        [i1, i2, i3] => [i1 as usize, i2 as usize, i3 as usize],
        _ => panic!("expected 3D array"),
    };
    let true_shape = shape.set_f(order == npyz::Order::Fortran);
    ndarray::Array3::from_shape_vec(true_shape, data)
        .unwrap_or_else(|e| panic!("shape error: {}", e))
}

// Example of parsing to an array with dynamic NDIM.
fn to_array_d<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::ArrayD<T> {
    use ndarray::ShapeBuilder;
    let shape = shape.into_iter().map(|x| x as usize).collect::<Vec<_>>();
    let true_shape = shape.set_f(order == npyz::Order::Fortran);
    ndarray::ArrayD::from_shape_vec(true_shape, data)
        .unwrap_or_else(|e| panic!("shape error: {}", e))
}

pub fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("test-data/c-order.npy")?;
    let reader = npyz::NpyFile::new(&bytes[..])?;
    let shape = reader.shape().to_vec();
    let order = reader.order();
    let data = reader.into_vec::<i64>()?;
    println!("{:?}", to_array_3(data.clone(), shape.clone(), order));
    println!("{:?}", to_array_d(data.clone(), shape.clone(), order));
    Ok(())
}
Likewise, here is a function that can be used to write an ndarray:
use std::io;
use std::fs::File;
use ndarray::Array;
use npyz::WriterBuilder;

// Example of writing an array with unknown shape. The output is always C-order.
fn write_array<T, S, D>(writer: impl io::Write, array: &ndarray::ArrayBase<S, D>) -> io::Result<()>
where
    T: Clone + npyz::AutoSerialize,
    S: ndarray::Data<Elem=T>,
    D: ndarray::Dimension,
{
    let shape = array.shape().iter().map(|&x| x as u64).collect::<Vec<_>>();
    let c_order_items = array.iter();
    let mut writer = npyz::WriteOptions::new()
        .default_dtype()
        .shape(&shape)
        .writer(writer)
        .begin_nd()?;
    writer.extend(c_order_items)?;
    writer.finish()
}

pub fn main() -> io::Result<()> {
    let array = Array::from_shape_fn((6, 7, 8), |(i, j, k)| 100*i as i32 + 10*j as i32 + k as i32);
    // even weirdly-ordered axes and non-contiguous arrays are fine
    let view = array.view(); // shape (6, 7, 8), C-order
    let view = view.reversed_axes(); // shape (8, 7, 6), fortran order
    let view = view.slice(ndarray::s![.., .., ..;2]); // shape (8, 7, 3), non-contiguous
    assert_eq!(view.shape(), &[8, 7, 3]);
    let mut file = io::BufWriter::new(File::create("examples/output/ndarray.npy")?);
    write_array(&mut file, &view)
}
§Structured arrays
npyz supports structured arrays! Consider the following structured array created in Python:
import numpy as np
a = np.array([(1,2.5,4), (2,3.1,5)], dtype=[('a', 'i4'),('b', 'f4'),('c', 'i8')])
np.save('test-data/structured.npy', a)
To load this in Rust, we need to create a corresponding struct. There are three derivable traits we can define for it:
- Deserialize: enables easy reading of .npy files.
- AutoSerialize: enables easy writing of .npy files (in a default format).
- Serialize: supertrait of AutoSerialize that allows one to specify a custom DType.
Enable the "derive" feature in Cargo.toml, and make sure the field names and types all match up:
// make sure to add `features = ["derive"]` in Cargo.toml!
#[derive(npyz::Deserialize, Debug)]
struct Struct {
    a: i32,
    b: f32,
    c: i64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("test-data/structured.npy")?;
    let npy = npyz::NpyFile::new(&bytes[..])?;
    for row in npy.data::<Struct>()? {
        let row = row?;
        eprintln!("{:?}", row);
    }
    Ok(())
}
The output is:
Struct { a: 1, b: 2.5, c: 4 }
Struct { a: 2, b: 3.1, c: 5 }
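To see why the field names and types must match, it can help to look at what one record of this dtype looks like in the data field. The sketch below decodes records by hand under two assumptions: the file was written on a little-endian machine, and the dtype uses NumPy's default packed layout, so each record is 4 + 4 + 8 = 16 bytes with no padding. `Record`, `decode_record` and `decode_records` are hypothetical names, and this is not how the derive macro is implemented.

```rust
/// Mirrors the dtype [('a', 'i4'), ('b', 'f4'), ('c', 'i8')].
#[derive(Debug, PartialEq)]
struct Record {
    a: i32,
    b: f32,
    c: i64,
}

/// Size of one packed record in bytes (assumes no alignment padding).
const RECORD_SIZE: usize = 4 + 4 + 8;

/// Decodes a single little-endian record, or returns `None` if the
/// slice is not exactly one record long.
fn decode_record(bytes: &[u8]) -> Option<Record> {
    if bytes.len() != RECORD_SIZE {
        return None;
    }
    let mut a = [0u8; 4];
    let mut b = [0u8; 4];
    let mut c = [0u8; 8];
    // Fields are stored back to back, in declaration order.
    a.copy_from_slice(&bytes[0..4]);
    b.copy_from_slice(&bytes[4..8]);
    c.copy_from_slice(&bytes[8..16]);
    Some(Record {
        a: i32::from_le_bytes(a),
        b: f32::from_le_bytes(b),
        c: i64::from_le_bytes(c),
    })
}

/// Decodes every complete record in a flat data field.
fn decode_records(data: &[u8]) -> Vec<Record> {
    data.chunks_exact(RECORD_SIZE)
        .filter_map(decode_record)
        .collect()
}
```

If a field were declared as `i64` where the file holds `i4`, the offsets of every later field would shift, which is the kind of mismatch the type checks in `npy.data::<Struct>()` are there to catch.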
§.npz files
- To work with .npz files in general, see the npz module.
- To work with scipy.sparse matrices, see the sparse module.
Re-exports§
pub use num_complex;
pub use arrayvec;
pub use half;
Modules§
- npz: Utilities for working with npz files.
- sparse: Tools for reading and writing Scipy sparse matrices in NPZ format.
- type_matchup_docs: DType to/from rust type documentation.
- write_options: Types and traits related to the implementation of WriteOptions.
Structs§
- DTypeError: Indicates that a particular rust type does not support serialization or deserialization as a given DType.
- Field: A field of a structured array dtype.
- FixedSizeBytes: Wrapper around [u8; N] that can serialize as |VN. The size must match exactly.
- NpyData (deprecated): Legacy type for reading npy files.
- NpyFile: Object for reading an npy file.
- NpyHeader: Represents the parsed header portion of an npy file.
- NpyReader: Iterator returned by NpyFile::data which reads elements of type T from the data portion of an NPY file.
- NpyWriter: Interface for writing an NPY file to a data stream.
- ParseTypeStrError: Error type returned by <TypeStr as FromStr>::parse.
- TypeStr: Represents an Array Interface type-string.
- WriteOptions: Represents an almost-empty configuration for an NpyWriter.
Enums§
- DType: Representation of a Numpy type.
- Endianness: Represents the first character in a TypeStr, which describes endianness.
- Order: Order of axes in a file.
- TimeUnits: Represents the units of the m and M datatypes in a TypeStr.
- TypeChar: Represents the second character in a TypeStr.
Traits§
- AutoSerialize: Subtrait of Serialize for types which have a reasonable default DType.
- Deserialize: Trait that permits reading a type from an .npy file.
- Serialize: Trait that permits writing a type to an .npy file.
- TypeRead: Like some sort of for<R: io::Read> Fn(R) -> io::Result<T>.
- TypeReadDyn: The proper trait to use for trait objects of TypeRead.
- TypeWrite: Like some sort of for<W: io::Write> Fn(W, &T) -> io::Result<()>.
- TypeWriteDyn: The proper trait to use for trait objects of TypeWrite.
- WriterBuilder: Trait that provides methods on WriteOptions.
Functions§
- to_file (deprecated): Serialize an iterator over a struct to an NPY file.
- to_file_1d: Serialize an iterator over a struct to an NPY file.