Serialize and deserialize NumPy's *.npy binary format.
Overview
NPY is a simple binary data format. It stores the type, shape and endianness information in a header, which is followed by a flat binary data field. This crate offers a simple, mostly type-safe way to read and write *.npy files. Files are handled using iterators, so they don’t need to fit in memory.
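To make the header-plus-data layout concrete, here is a minimal std-only sketch that assembles a version 1.0 .npy file in memory for a C-order array of little-endian f64. It follows the published NPY format description: a magic string, two version bytes, a little-endian u16 header length, then an ASCII Python-dict header padded with spaces and ending in a newline. The function name write_npy_f64 is hypothetical and not part of npyz. Padding the preamble plus header to a multiple of 64 bytes is an assumption that matches what current NumPy writes.

```rust
/// Hypothetical helper (not npyz API): build a complete NPY v1.0 file
/// holding little-endian f64 values in C order.
/// Assumes `shape` multiplies out to `data.len()` and the header fits in a u16.
fn write_npy_f64(shape: &[u64], data: &[f64]) -> Vec<u8> {
    assert_eq!(shape.iter().product::<u64>() as usize, data.len());

    // Python tuple syntax: a 1-D shape needs a trailing comma, e.g. "(4,)".
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let shape_str = if shape.len() == 1 {
        format!("({},)", dims[0])
    } else {
        format!("({})", dims.join(", "))
    };
    let mut header = format!(
        "{{'descr': '<f8', 'fortran_order': False, 'shape': {}, }}",
        shape_str
    );

    // Preamble = magic (6 bytes) + version (2) + header length (2) = 10 bytes.
    // Pad with spaces so preamble + header + final '\n' is a multiple of 64.
    let unpadded = 10 + header.len() + 1;
    let padding = (64 - unpadded % 64) % 64;
    header.push_str(&" ".repeat(padding));
    header.push('\n');

    let mut out = Vec::with_capacity(10 + header.len() + data.len() * 8);
    out.extend_from_slice(b"\x93NUMPY");
    out.push(1); // major version
    out.push(0); // minor version
    out.extend_from_slice(&(header.len() as u16).to_le_bytes());
    out.extend_from_slice(header.as_bytes());
    // The data section is just the elements back to back, little-endian here.
    for x in data {
        out.extend_from_slice(&x.to_le_bytes());
    }
    out
}
```

Because the data section is a flat run of fixed-size elements, a reader only needs the header to know how to walk it. That property is what lets npyz stream elements through an iterator.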
Optional cargo features
No features are enabled by default. Here is the list of existing features:
- There are a couple of features which enable support for serialization/deserialization of foreign types. These require opt-in because they can be stability hazards; a major version bump of npyz may introduce a major version bump of one of these crates. (NOTE: to ease this issue somewhat, npyz will re-export the versions of the crates it uses.)
  - "complex" enables the use of num_complex::Complex.
  - "arrayvec" enables the use of arrayvec::ArrayVec and arrayvec::ArrayString as alternatives to Vec and String for some string types.
- "derive" enables derives of traits for working with structured arrays.
- "npz" enables adapters for working with NPZ files (including scipy sparse matrices), adding a public dependency on the zip crate. This requires opt-in because zip has a fair number of transitive dependencies. (Note that some npz-related helper functions are available even without the feature.)
Reading
Let’s create a simple *.npy file in Python:
import numpy as np
a = np.array([1, 3.5, -6, 2.3])
np.save('test-data/plain.npy', a)
Now, we can load it in Rust using NpyFile:
fn main() -> Result<(), Box<dyn std::error::Error>> {
let bytes = std::fs::read("test-data/plain.npy")?;
// Note: In addition to byte slices, this accepts any io::Read
let npy = npyz::NpyFile::new(&bytes[..])?;
for number in npy.data::<f64>()? {
let number = number?;
eprintln!("{}", number);
}
Ok(())
}
And we can see our data:
1
3.5
-6
2.3
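For comparison, the reading side can be sketched without npyz. The hypothetical parse_header and read_f64s below handle only version 1.0 files whose descr is a simple typestr, not a structured-dtype list. They also locate keys with plain string search rather than a real Python-literal parser. Treat them as an illustration of the format, not as a replacement for NpyFile.

```rust
/// Minimal parsed view of an NPY v1.0 header (hypothetical, not npyz API).
#[derive(Debug, PartialEq)]
struct Header {
    descr: String,
    fortran_order: bool,
    shape: Vec<u64>,
    data_offset: usize,
}

/// Return the text after `key`, up to (not including) the first `end`.
fn field<'a>(text: &'a str, key: &str, end: char) -> Result<&'a str, String> {
    let start = text.find(key).ok_or_else(|| format!("missing {}", key))? + key.len();
    let rest = &text[start..];
    let stop = rest.find(end).ok_or_else(|| format!("unterminated {}", key))?;
    Ok(&rest[..stop])
}

fn parse_header(bytes: &[u8]) -> Result<Header, String> {
    if bytes.len() < 10 || &bytes[..6] != b"\x93NUMPY" {
        return Err("not an NPY file".into());
    }
    if bytes[6] != 1 {
        return Err(format!("unsupported major version {}", bytes[6]));
    }
    // Version 1.0 stores the header length as a little-endian u16.
    let len = u16::from_le_bytes([bytes[8], bytes[9]]) as usize;
    let text = bytes.get(10..10 + len).ok_or("truncated header")?;
    let text = std::str::from_utf8(text).map_err(|e| e.to_string())?;

    let descr = field(text, "'descr': '", '\'')?.to_string();
    let fortran_order = field(text, "'fortran_order': ", ',')?.trim() == "True";
    // "(4,)" -> ["4", ""]; empty pieces come from the trailing comma.
    let shape = field(text, "'shape': (", ')')?
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.parse::<u64>().map_err(|e| e.to_string()))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Header { descr, fortran_order, shape, data_offset: 10 + len })
}

/// Decode the data section, assuming the dtype is little-endian f64 ("<f8").
fn read_f64s(bytes: &[u8]) -> Result<Vec<f64>, String> {
    let h = parse_header(bytes)?;
    if h.descr != "<f8" {
        return Err(format!("expected <f8, found {}", h.descr));
    }
    let count: u64 = h.shape.iter().product();
    let data = &bytes[h.data_offset..];
    if (data.len() as u64) < count * 8 {
        return Err("truncated data".into());
    }
    Ok(data
        .chunks_exact(8)
        .take(count as usize)
        .map(|c| {
            let mut b = [0u8; 8];
            b.copy_from_slice(c);
            f64::from_le_bytes(b)
        })
        .collect())
}
```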
Inspecting properties of the array
NpyFile provides methods that let you inspect the array.
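The strides asserted for c-order.npy below ([12, 4, 1] for shape [2, 3, 4]) are counted in elements, not bytes. In C order each stride is the product of the dimensions to its right. In Fortran order it is the product of the dimensions to its left. Here is a small std-only sketch of that rule. The helpers element_strides and flat_index are hypothetical names, not npyz functions.

```rust
/// Element strides for `shape` (hypothetical helper, not npyz API).
/// C order: the last axis varies fastest. Fortran order: the first axis does.
fn element_strides(shape: &[u64], fortran_order: bool) -> Vec<u64> {
    let n = shape.len();
    let mut strides = vec![1u64; n];
    if fortran_order {
        // stride[i] = shape[0] * ... * shape[i - 1]
        for i in 1..n {
            strides[i] = strides[i - 1] * shape[i - 1];
        }
    } else {
        // stride[i] = shape[i + 1] * ... * shape[n - 1]
        for i in (0..n.saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
    }
    strides
}

/// Position of a multi-index within the flat data section.
fn flat_index(index: &[u64], strides: &[u64]) -> u64 {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```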
fn main() -> std::io::Result<()> {
let bytes = std::fs::read("test-data/c-order.npy")?;
let data = npyz::NpyFile::new(&bytes[..])?;
assert_eq!(data.shape(), &[2, 3, 4]);
assert_eq!(data.order(), npyz::Order::C);
assert_eq!(data.strides(), &[12, 4, 1]);
// convenience method for reading to vec
println!("{:?}", data.into_vec::<i64>()?);
Ok(())
}
Writing
The primary interface for writing npy files is the WriterBuilder trait.
use npyz::WriterBuilder;
fn main() -> std::io::Result<()> {
// Any io::Write is supported. For this example we'll
// use Vec<u8> to serialize in-memory.
let mut out_buf = vec![];
let mut writer = {
npyz::WriteOptions::new()
.default_dtype()
.shape(&[2, 3])
.writer(&mut out_buf)
.begin_nd()?
};
writer.push(&100)?;
writer.push(&101)?;
writer.push(&102)?;
// you can also write multiple items at once
writer.extend(vec![200, 201, 202])?;
writer.finish()?;
eprintln!("{:02x?}", out_buf);
Ok(())
}
Supported dtypes
A complete description of the supported numpy dtypes and the corresponding rust types
can be found in the crate::type_matchup_docs module.
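In the header, a simple dtype is written as a NumPy typestr. This has three parts: a byte-order character ('<' little-endian, '>' big-endian, '|' not applicable), a kind character (such as 'i', 'u', 'f', 'b'), and the item size in bytes. The std-only sketch below parses such a string and uses it to decode an i32 in either byte order. The names parse_typestr and decode_i32 are hypothetical, and the '=' (native order) character is deliberately not handled. For which Rust types npyz actually accepts for each dtype, the type_matchup_docs module remains the authority.

```rust
/// Byte-order character of a NumPy typestr.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ByteOrder {
    Little,        // '<'
    Big,           // '>'
    NotApplicable, // '|', used for single-byte types such as "|u1"
}

/// A parsed simple typestr such as "<i4", ">f8" or "|u1" (hypothetical type).
#[derive(Debug, PartialEq)]
struct TypeStrParts {
    order: ByteOrder,
    kind: char,
    size: usize,
}

fn parse_typestr(s: &str) -> Option<TypeStrParts> {
    let mut chars = s.chars();
    let order = match chars.next()? {
        '<' => ByteOrder::Little,
        '>' => ByteOrder::Big,
        '|' => ByteOrder::NotApplicable,
        _ => return None, // '=' (native) and malformed input are rejected
    };
    let kind = chars.next()?;
    let size = chars.as_str().parse().ok()?;
    Some(TypeStrParts { order, kind, size })
}

/// Decode one i32 whose on-disk representation is described by `typestr`.
fn decode_i32(bytes: [u8; 4], typestr: &str) -> Option<i32> {
    let t = parse_typestr(typestr)?;
    if t.kind != 'i' || t.size != 4 {
        return None;
    }
    match t.order {
        ByteOrder::Little => Some(i32::from_le_bytes(bytes)),
        ByteOrder::Big => Some(i32::from_be_bytes(bytes)),
        ByteOrder::NotApplicable => None,
    }
}
```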
Working with ndarray
Using the ndarray crate? No problem!
At the time of writing, npyz does not provide a conversion API, but one is easy to write:
// Example of parsing to an array with fixed NDIM.
fn to_array_3<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::Array3<T> {
use ndarray::ShapeBuilder;
let shape = match shape[..] {
[i1, i2, i3] => [i1 as usize, i2 as usize, i3 as usize],
_ => panic!("expected 3D array"),
};
let true_shape = shape.set_f(order == npyz::Order::Fortran);
ndarray::Array3::from_shape_vec(true_shape, data)
.unwrap_or_else(|e| panic!("shape error: {}", e))
}
// Example of parsing to an array with dynamic NDIM.
fn to_array_d<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::ArrayD<T> {
use ndarray::ShapeBuilder;
let shape = shape.into_iter().map(|x| x as usize).collect::<Vec<_>>();
let true_shape = shape.set_f(order == npyz::Order::Fortran);
ndarray::ArrayD::from_shape_vec(true_shape, data)
.unwrap_or_else(|e| panic!("shape error: {}", e))
}
pub fn main() -> std::io::Result<()> {
let bytes = std::fs::read("test-data/c-order.npy")?;
let reader = npyz::NpyFile::new(&bytes[..])?;
let shape = reader.shape().to_vec();
let order = reader.order();
let data = reader.into_vec::<i64>()?;
println!("{:?}", to_array_3(data.clone(), shape.clone(), order));
println!("{:?}", to_array_d(data.clone(), shape.clone(), order));
Ok(())
}
Likewise, here is a function that can be used to write an ndarray:
use std::io;
use std::fs::File;
use ndarray::Array;
use npyz::WriterBuilder;
// Example of writing an array with unknown shape. The output is always C-order.
fn write_array<T, S, D>(writer: impl io::Write, array: &ndarray::ArrayBase<S, D>) -> io::Result<()>
where
T: Clone + npyz::AutoSerialize,
S: ndarray::Data<Elem=T>,
D: ndarray::Dimension,
{
let shape = array.shape().iter().map(|&x| x as u64).collect::<Vec<_>>();
let c_order_items = array.iter();
let mut writer = npyz::WriteOptions::new().default_dtype().shape(&shape).writer(writer).begin_nd()?;
writer.extend(c_order_items)?;
writer.finish()
}
pub fn main() -> io::Result<()> {
let array = Array::from_shape_fn((6, 7, 8), |(i, j, k)| 100*i as i32 + 10*j as i32 + k as i32);
// even weirdly-ordered axes and non-contiguous arrays are fine
let view = array.view(); // shape (6, 7, 8), C-order
let view = view.reversed_axes(); // shape (8, 7, 6), fortran order
let view = view.slice(ndarray::s![.., .., ..;2]); // shape (8, 7, 3), non-contiguous
assert_eq!(view.shape(), &[8, 7, 3]);
let mut file = io::BufWriter::new(File::create("examples/output/ndarray.npy")?);
write_array(&mut file, &view)
}
Structured arrays
npyz supports structured arrays! Consider the following structured array created in Python:
import numpy as np
a = np.array([(1,2.5,4), (2,3.1,5)], dtype=[('a', 'i4'),('b', 'f4'),('c', 'i8')])
np.save('test-data/structured.npy', a)
To load this in Rust, we need to create a corresponding struct. There are three derivable traits we can define for it:
- Deserialize: enables easy reading of .npy files.
- AutoSerialize: enables easy writing of .npy files (in a default format).
- Serialize: supertrait of AutoSerialize that allows one to specify a custom DType.
Enable the "derive" feature in Cargo.toml,
and make sure the field names and types all match up:
// make sure to add `features = ["derive"]` in Cargo.toml!
#[derive(npyz::Deserialize, Debug)]
struct Struct {
a: i32,
b: f32,
c: i64,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let bytes = std::fs::read("test-data/structured.npy")?;
let npy = npyz::NpyFile::new(&bytes[..])?;
for row in npy.data::<Struct>()? {
let row = row?;
eprintln!("{:?}", row);
}
Ok(())
}
The output is:
Struct { a: 1, b: 2.5, c: 4 }
Struct { a: 2, b: 3.1, c: 5 }
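Each row of this array occupies 16 packed bytes: a 4-byte i32, a 4-byte f32, then an 8-byte i64, laid back to back because NumPy does not insert alignment padding unless asked to. The sketch below decodes such rows by hand, assuming the fields are little-endian ('<i4', '<f4', '<i8'). Row, ROW_SIZE, decode_row and decode_rows are hypothetical names. They illustrate the layout that the derived Deserialize handles for you and are not how npyz implements it.

```rust
/// Hand-written stand-in for a row of dtype
/// [('a', '<i4'), ('b', '<f4'), ('c', '<i8')] (hypothetical type).
#[derive(Debug, PartialEq)]
struct Row {
    a: i32,
    b: f32,
    c: i64,
}

/// Packed row size: 4 (i4) + 4 (f4) + 8 (i8), no padding between fields.
const ROW_SIZE: usize = 4 + 4 + 8;

/// Decode one packed, little-endian row from exactly ROW_SIZE bytes.
fn decode_row(chunk: &[u8]) -> Row {
    let mut a = [0u8; 4];
    let mut b = [0u8; 4];
    let mut c = [0u8; 8];
    a.copy_from_slice(&chunk[0..4]);
    b.copy_from_slice(&chunk[4..8]);
    c.copy_from_slice(&chunk[8..16]);
    Row {
        a: i32::from_le_bytes(a),
        b: f32::from_le_bytes(b),
        c: i64::from_le_bytes(c),
    }
}

/// Split the data section into rows; trailing partial rows are ignored.
fn decode_rows(data: &[u8]) -> Vec<Row> {
    data.chunks_exact(ROW_SIZE).map(decode_row).collect()
}
```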
.npz files
- To work with .npz files in general, see the npz module.
- To work with scipy.sparse matrices, see the sparse module.
Re-exports
pub use num_complex;
pub use arrayvec;