Crate npyz

Serialize and deserialize NumPy’s *.npy binary format.

Overview

NPY is a simple binary data format. It stores the type, shape and endianness information in a header, which is followed by a flat binary data field. This crate offers a simple, mostly type-safe way to read and write *.npy files. Files are handled using iterators, so they don’t need to fit in memory.
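
To make the layout concrete, here is a minimal sketch of reading the fixed-size preamble that precedes the header. It follows the published NumPy format description: the magic string `\x93NUMPY`, two version bytes, then a little-endian header length that is 2 bytes wide in version 1.0 and 4 bytes wide in versions 2.0 and 3.0. `Preamble` and `parse_preamble` are hypothetical names used only for illustration; they are not part of npyz, which handles all of this for you.

```rust
/// Fixed-size preamble at the start of an NPY file (illustrative type).
#[derive(Debug, PartialEq)]
struct Preamble {
    major: u8,
    minor: u8,
    /// Length in bytes of the ASCII header dictionary that follows.
    header_len: usize,
    /// Offset at which the header dictionary begins.
    header_start: usize,
}

/// Parses the magic string, version and header length.
/// Returns `None` if the bytes do not look like an NPY file.
fn parse_preamble(bytes: &[u8]) -> Option<Preamble> {
    const MAGIC: &[u8] = b"\x93NUMPY";
    if bytes.len() < 10 || !bytes.starts_with(MAGIC) {
        return None;
    }
    let (major, minor) = (bytes[6], bytes[7]);
    let (header_len, header_start) = match major {
        // Version 1.0 stores the header length as a little-endian u16.
        1 => (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10),
        // Versions 2.0 and 3.0 widen it to a little-endian u32.
        2 | 3 => {
            if bytes.len() < 12 {
                return None;
            }
            let raw = [bytes[8], bytes[9], bytes[10], bytes[11]];
            (u32::from_le_bytes(raw) as usize, 12)
        }
        _ => return None,
    };
    Some(Preamble { major, minor, header_len, header_start })
}
```

The flat data field then starts at `header_start + header_len`. Everything between those two offsets is the textual header that records the type, shape and order.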

Optional cargo features

No features are enabled by default. Here is the list of existing features:

  • "complex" enables the use of num_complex::Complex. This requires opt-in because it is a stability hazard; num_complex sometimes undergoes major semver version bumps and it is your responsibility to make sure that your code and npyz are using the same version.
  • "derive" enables derives of traits for working with structured arrays. This will add a build-time dependency on common proc macro utilities (syn, quote).
  • "npz" enables adapters for working with NPZ files (including scipy sparse matrices), adding a public dependency on the zip crate. This requires opt-in because zip has a fair number of transitive dependencies. (note that some npz-related helper functions are available even without the feature)

Reading

Let’s create a simple *.npy file in Python:

import numpy as np
a = np.array([1, 3.5, -6, 2.3])
np.save('test-data/plain.npy', a)

Now, we can load it in Rust using NpyFile:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("test-data/plain.npy")?;

    // Note: In addition to byte slices, this accepts any io::Read
    let npy = npyz::NpyFile::new(&bytes[..])?;
    for number in npy.data::<f64>()? {
        let number = number?;
        eprintln!("{}", number);
    }
    Ok(())
}

And we can see our data:

1
3.5
-6
2.3

Inspecting properties of the array

NpyFile provides methods that let you inspect the array.

fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("test-data/c-order.npy")?;

    let data = npyz::NpyFile::new(&bytes[..])?;
    assert_eq!(data.shape(), &[2, 3, 4]);
    assert_eq!(data.order(), npyz::Order::C);
    assert_eq!(data.strides(), &[12, 4, 1]);

    // convenience method for reading to vec
    println!("{:?}", data.into_vec::<f64>());
    Ok(())
}
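
The strides asserted above are counted in elements, not bytes, and follow directly from the shape and the order. The sketch below shows that arithmetic using only the standard library. `AxisOrder` is a local stand-in for npyz::Order and `contiguous_strides` is a hypothetical helper; neither is the crate's implementation.

```rust
/// Local stand-in for the two axis orders an NPY file can declare.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AxisOrder {
    C,
    Fortran,
}

/// Computes element strides for a contiguous array of the given shape.
fn contiguous_strides(shape: &[u64], order: AxisOrder) -> Vec<u64> {
    let mut strides = vec![1; shape.len()];
    match order {
        // C order: the last axis varies fastest, so accumulate right to left.
        AxisOrder::C => {
            for i in (0..shape.len().saturating_sub(1)).rev() {
                strides[i] = strides[i + 1] * shape[i + 1];
            }
        }
        // Fortran order: the first axis varies fastest, so accumulate left to right.
        AxisOrder::Fortran => {
            for i in 1..shape.len() {
                strides[i] = strides[i - 1] * shape[i - 1];
            }
        }
    }
    strides
}
```

For the shape `[2, 3, 4]` used above, C order gives `[12, 4, 1]`, while the same shape in Fortran order would give `[1, 2, 6]`.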

Writing

The primary interface for writing npy files is the WriterBuilder trait.

use npyz::WriterBuilder;

fn main() -> std::io::Result<()> {
    // Any io::Write is supported.  For this example we'll
    // use Vec<u8> to serialize in-memory.
    let mut out_buf = vec![];
    let mut writer = {
        npyz::WriteOptions::new()
            .default_dtype()
            .shape(&[2, 3])
            .writer(&mut out_buf)
            .begin_nd()?
    };

    writer.push(&100)?;
    writer.push(&101)?;
    writer.push(&102)?;
    // you can also write multiple items at once
    writer.extend(vec![200, 201, 202])?;
    writer.finish()?;

    eprintln!("{:02x?}", out_buf);
    Ok(())
}
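
If you are curious what the first bytes of such an output look like, the following sketch builds a version 1.0 preamble and header by hand, as described in the NumPy format documentation. It is an illustration of the file layout only: `build_header_v1` is a hypothetical function, and the bytes npyz actually emits may differ in details such as padding or key order.

```rust
/// Builds the preamble and header of a version 1.0 NPY file.
/// `descr` is a NumPy type string such as "<i4"; the array is declared C-order.
fn build_header_v1(descr: &str, shape: &[u64]) -> Vec<u8> {
    // Shapes use Python tuple syntax: `()`, `(6,)`, `(2, 3)`.
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let shape_str = match dims.len() {
        1 => format!("({},)", dims[0]),
        _ => format!("({})", dims.join(", ")),
    };
    let mut dict = format!(
        "{{'descr': '{}', 'fortran_order': False, 'shape': {}, }}",
        descr, shape_str
    );

    // Pad with spaces so that preamble + header (including the final
    // newline) is a multiple of 64 bytes.
    const PREAMBLE_LEN: usize = 10; // magic (6) + version (2) + length (2)
    let unpadded = PREAMBLE_LEN + dict.len() + 1;
    let padding = (64 - unpadded % 64) % 64;
    dict.push_str(&" ".repeat(padding));
    dict.push('\n');

    let mut out = b"\x93NUMPY\x01\x00".to_vec();
    // Assumes the header fits in a u16, which holds for small dtypes and shapes.
    out.extend_from_slice(&(dict.len() as u16).to_le_bytes());
    out.extend_from_slice(dict.as_bytes());
    out
}
```

Calling `build_header_v1("<i4", &[2, 3])` yields 128 bytes; the six `i32` values from the example would follow as 24 bytes of little-endian data.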

Working with ndarray

Using the ndarray crate? No problem! At this time, no conversion API is provided by npyz, but one can easily be written:

// Example of parsing to an array with fixed NDIM.
fn to_array_3<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::Array3<T> {
    use ndarray::ShapeBuilder;

    let shape = match shape[..] {
        [i1, i2, i3] => [i1 as usize, i2 as usize, i3 as usize],
        _  => panic!("expected 3D array"),
    };
    let true_shape = shape.set_f(order == npyz::Order::Fortran);

    ndarray::Array3::from_shape_vec(true_shape, data)
        .unwrap_or_else(|e| panic!("shape error: {}", e))
}

// Example of parsing to an array with dynamic NDIM.
fn to_array_d<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::ArrayD<T> {
    use ndarray::ShapeBuilder;

    let shape = shape.into_iter().map(|x| x as usize).collect::<Vec<_>>();
    let true_shape = shape.set_f(order == npyz::Order::Fortran);

    ndarray::ArrayD::from_shape_vec(true_shape, data)
        .unwrap_or_else(|e| panic!("shape error: {}", e))
}

pub fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("test-data/c-order.npy")?;
    let reader = npyz::NpyFile::new(&bytes[..])?;
    let shape = reader.shape().to_vec();
    let order = reader.order();
    let data = reader.into_vec::<i64>()?;

    println!("{:?}", to_array_3(data.clone(), shape.clone(), order));
    println!("{:?}", to_array_d(data.clone(), shape.clone(), order));
    Ok(())
}
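
The `set_f` call in the helpers above only tells ndarray how to interpret the flat buffer; no data is moved. If you want to see what the two orders mean for the buffer itself, the sketch below copies a Fortran-order buffer into C order using only the standard library. `fortran_to_c` is a hypothetical helper written for this illustration and is not part of npyz or ndarray.

```rust
/// Reorders a flat Fortran-order (column-major) buffer into C order (row-major).
fn fortran_to_c<T: Clone>(data: &[T], shape: &[usize]) -> Vec<T> {
    let len: usize = shape.iter().product();
    assert_eq!(data.len(), len, "data length must match shape");

    // Fortran-order strides: the first axis varies fastest.
    let mut f_strides = vec![1usize; shape.len()];
    for i in 1..shape.len() {
        f_strides[i] = f_strides[i - 1] * shape[i - 1];
    }

    let mut out = Vec::with_capacity(len);
    let mut index = vec![0usize; shape.len()];
    for _ in 0..len {
        // Locate the current multi-index inside the Fortran-order buffer.
        let offset: usize = index.iter().zip(&f_strides).map(|(i, s)| i * s).sum();
        out.push(data[offset].clone());

        // Advance the multi-index in C order: the last axis varies fastest.
        for axis in (0..shape.len()).rev() {
            index[axis] += 1;
            if index[axis] < shape[axis] {
                break;
            }
            index[axis] = 0;
        }
    }
    out
}
```

For the 2x3 matrix with rows `[1, 2, 3]` and `[4, 5, 6]`, the Fortran-order buffer is `[1, 4, 2, 5, 3, 6]`, and `fortran_to_c` returns `[1, 2, 3, 4, 5, 6]`.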

Likewise, here is a function that can be used to write an ndarray:

use std::io;
use std::fs::File;

use ndarray::Array;
use npyz::WriterBuilder;

// Example of writing an array with unknown shape.  The output is always C-order.
fn write_array<T, S, D>(writer: impl io::Write, array: &ndarray::ArrayBase<S, D>) -> io::Result<()>
where
    T: Clone + npyz::AutoSerialize,
    S: ndarray::Data<Elem=T>,
    D: ndarray::Dimension,
{
    let shape = array.shape().iter().map(|&x| x as u64).collect::<Vec<_>>();
    let c_order_items = array.iter();

    let mut writer = npyz::WriteOptions::new().default_dtype().shape(&shape).writer(writer).begin_nd()?;
    writer.extend(c_order_items)?;
    writer.finish()
}

pub fn main() -> io::Result<()> {
    let array = Array::from_shape_fn((6, 7, 8), |(i, j, k)| 100*i as i32 + 10*j as i32 + k as i32);
    // even weirdly-ordered axes and non-contiguous arrays are fine
    let view = array.view(); // shape (6, 7, 8), C-order
    let view = view.reversed_axes(); // shape (8, 7, 6), fortran order
    let view = view.slice(ndarray::s![.., .., ..;2]); // shape (8, 7, 3), non-contiguous
    assert_eq!(view.shape(), &[8, 7, 3]);

    let mut file = io::BufWriter::new(File::create("examples/output/ndarray.npy")?);
    write_array(&mut file, &view)
}

Structured arrays

npyz supports structured arrays! Consider the following structured array created in Python:

import numpy as np
a = np.array([(1,2.5,4), (2,3.1,5)], dtype=[('a', 'i4'),('b', 'f4'),('c', 'i8')])
np.save('test-data/structured.npy', a)

To load this in Rust, we need to create a corresponding struct. There are three derivable traits we can define for it:

  • Deserialize — Enables easy reading of .npy files.
  • AutoSerialize — Enables easy writing of .npy files. (in a default format)
  • Serialize — Supertrait of AutoSerialize that allows one to specify a custom DType.

Enable the "derive" feature in Cargo.toml, and make sure the field names and types all match up:

// make sure to add `features = ["derive"]` in Cargo.toml!
#[derive(npyz::Deserialize, Debug)]
struct Struct {
    a: i32,
    b: f32,
    c: i64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("test-data/structured.npy")?;

    let npy = npyz::NpyFile::new(&bytes[..])?;
    for row in npy.data::<Struct>()? {
        let row = row?;
        eprintln!("{:?}", row);
    }
    Ok(())
}

The output is:

Struct { a: 1, b: 2.5, c: 4 }
Struct { a: 2, b: 3.1, c: 5 }
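
For intuition about why the field names and types must match, here is a rough sketch of what decoding one record of this dtype involves. It assumes the file was saved on a little-endian machine, so the fields are stored as `<i4`, `<f4` and `<i8`, packed back to back with no padding (NumPy's default for unaligned structured dtypes). `Record` and `decode_record` are hypothetical names; this is not the code generated by the derive macro.

```rust
/// Mirrors the packed dtype [('a', '<i4'), ('b', '<f4'), ('c', '<i8')].
#[derive(Debug, PartialEq)]
struct Record {
    a: i32,
    b: f32,
    c: i64,
}

/// Size of one packed record: 4 + 4 + 8 bytes.
const RECORD_SIZE: usize = 16;

/// Decodes one little-endian record; returns `None` if `bytes` is too short.
fn decode_record(bytes: &[u8]) -> Option<Record> {
    if bytes.len() < RECORD_SIZE {
        return None;
    }
    let mut a = [0u8; 4];
    let mut b = [0u8; 4];
    let mut c = [0u8; 8];
    // Fields appear in declaration order, each at a fixed byte offset.
    a.copy_from_slice(&bytes[0..4]);
    b.copy_from_slice(&bytes[4..8]);
    c.copy_from_slice(&bytes[8..16]);
    Some(Record {
        a: i32::from_le_bytes(a),
        b: f32::from_le_bytes(b),
        c: i64::from_le_bytes(c),
    })
}
```

A struct whose field widths differed from the dtype (for example `c: i32`) would read from the wrong offsets, which is why a mismatch between the struct and the dtype has to be treated as an error rather than silently accepted.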

.npz files

  • To work with .npz files in general, see the npz module.
  • To work with scipy.sparse matrices see the sparse module.

Re-exports

pub use num_complex;
pub use zip;

Modules

Utilities for working with npz files.

Tools for reading and writing Scipy sparse matrices in NPZ format.

Types and traits related to the implementation of WriteOptions.

Structs

Indicates that a particular rust type does not support serialization or deserialization as a given DType.

A field of a structured array dtype

NpyData (Deprecated)

Legacy type for reading npy files.

Object for reading an npy file.

Iterator returned by NpyFile::data which reads elements of type T from the data portion of an NPY file.

Interface for writing an NPY file to a data stream.

Error type returned by <TypeStr as FromStr>::parse.

Represents an Array Interface type-string.

Represents an almost-empty configuration for an NpyWriter.

Enums

Representation of a Numpy type

Order of axes in a file.

Traits

Subtrait of Serialize for types which have a reasonable default DType.

Trait that permits reading a type from an .npy file.

Trait that permits writing a type to an .npy file.

Like some sort of for<R: io::Read> Fn(R) -> io::Result<T>.

The proper trait to use for trait objects of TypeRead.

Like some sort of for<W: io::Write> Fn(W, &T) -> io::Result<()>.

The proper trait to use for trait objects of TypeWrite.

Trait that provides methods on WriteOptions.

Functions

to_file (Deprecated)

Serialize an iterator over a struct to an NPY file.

Serialize an iterator over a struct to an NPY file.

Type Definitions

OutFile (Deprecated)

NpyWriter that writes an entire file.

Derive Macros