1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
#![warn(missing_docs)]

/*!
Serialize and deserialize the NumPy's
[*.npy binary format](https://docs.scipy.org/doc/numpy-dev/neps/npy-format.html).

# Overview

[**NPY**](https://docs.scipy.org/doc/numpy-dev/neps/npy-format.html) is a simple binary data format.
It stores the type, shape and endianness information in a header,
which is followed by a flat binary data field. This crate offers a simple, mostly type-safe way to
read and write *.npy files. Files are handled using iterators, so they don't need to fit in memory.

## Optional cargo features

No features are enabled by default.  Here is the list of existing features:

* **`"complex"`** enables the use of [`num_complex::Complex`].
  This requires opt-in because it is a stability hazard; `num_complex` sometimes
  undergoes major semver version bumps and it is your responsibility to make sure
  that your code and `npyz` are using the same version.
* **`"derive"`** enables derives of traits for working with structured arrays.
  This will add a build-time dependency on common proc macro utilities (`syn`, `quote`).
* **`"npz"`** enables adapters for working with NPZ files
  (including scipy sparse matrices),
  adding a public dependency on the `zip` crate.
  This requires opt-in because `zip` has a fair number of transitive dependencies.
  (note that some npz-related helper functions are available even without the feature)

## Reading

Let's create a simple *.npy file in Python:

```python
import numpy as np
a = np.array([1, 3.5, -6, 2.3])
np.save('test-data/plain.npy', a)
```

Now, we can load it in Rust using [`NpyFile`]:

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("test-data/plain.npy")?;

    // Note: In addition to byte slices, this accepts any io::Read
    let npy = npyz::NpyFile::new(&bytes[..])?;
    for number in npy.data::<f64>()? {
        let number = number?;
        eprintln!("{}", number);
    }
    Ok(())
}
```

And we can see our data:

```text
1
3.5
-6
2.3
```

### Inspecting properties of the array

[`NpyFile`] provides methods that let you inspect the array.

```rust
fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("test-data/c-order.npy")?;

    let data = npyz::NpyFile::new(&bytes[..])?;
    assert_eq!(data.shape(), &[2, 3, 4]);
    assert_eq!(data.order(), npyz::Order::C);
    assert_eq!(data.strides(), &[12, 4, 1]);

    // convenience method for reading to vec
    println!("{:?}", data.into_vec::<f64>());
    Ok(())
}
```

## Writing

The primary interface for writing npy files is the [`WriterBuilder`] trait.

```rust
use npyz::WriterBuilder;

fn main() -> std::io::Result<()> {
    // Any io::Write is supported.  For this example we'll
    // use Vec<u8> to serialize in-memory.
    let mut out_buf = vec![];
    let mut writer = {
        npyz::WriteOptions::new()
            .default_dtype()
            .shape(&[2, 3])
            .writer(&mut out_buf)
            .begin_nd()?
    };

    writer.push(&100)?;
    writer.push(&101)?;
    writer.push(&102)?;
    // you can also write multiple items at once
    writer.extend(vec![200, 201, 202])?;
    writer.finish()?;

    eprintln!("{:02x?}", out_buf);
    Ok(())
}
```

## Working with `ndarray`

Using the [`ndarray`](https://docs.rs/ndarray) crate?  No problem!
At the time, no conversion API is provided by `npyz`, but one can easily be written:

```rust
// Example of parsing to an array with fixed NDIM.
fn to_array_3<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::Array3<T> {
    use ndarray::ShapeBuilder;

    let shape = match shape[..] {
        [i1, i2, i3] => [i1 as usize, i2 as usize, i3 as usize],
        _  => panic!("expected 3D array"),
    };
    let true_shape = shape.set_f(order == npyz::Order::Fortran);

    ndarray::Array3::from_shape_vec(true_shape, data)
        .unwrap_or_else(|e| panic!("shape error: {}", e))
}

// Example of parsing to an array with dynamic NDIM.
fn to_array_d<T>(data: Vec<T>, shape: Vec<u64>, order: npyz::Order) -> ndarray::ArrayD<T> {
    use ndarray::ShapeBuilder;

    let shape = shape.into_iter().map(|x| x as usize).collect::<Vec<_>>();
    let true_shape = shape.set_f(order == npyz::Order::Fortran);

    ndarray::ArrayD::from_shape_vec(true_shape, data)
        .unwrap_or_else(|e| panic!("shape error: {}", e))
}

pub fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("test-data/c-order.npy")?;
    let reader = npyz::NpyFile::new(&bytes[..])?;
    let shape = reader.shape().to_vec();
    let order = reader.order();
    let data = reader.into_vec::<i64>()?;

    println!("{:?}", to_array_3(data.clone(), shape.clone(), order));
    println!("{:?}", to_array_d(data.clone(), shape.clone(), order));
    Ok(())
}
```

Likewise, here is a function that can be used to write an ndarray:

```rust
use std::io;
use std::fs::File;

use ndarray::Array;
use npyz::WriterBuilder;

// Example of writing an array with unknown shape.  The output is always C-order.
fn write_array<T, S, D>(writer: impl io::Write, array: &ndarray::ArrayBase<S, D>) -> io::Result<()>
where
    T: Clone + npyz::AutoSerialize,
    S: ndarray::Data<Elem=T>,
    D: ndarray::Dimension,
{
    let shape = array.shape().iter().map(|&x| x as u64).collect::<Vec<_>>();
    let c_order_items = array.iter();

    let mut writer = npyz::WriteOptions::new().default_dtype().shape(&shape).writer(writer).begin_nd()?;
    writer.extend(c_order_items)?;
    writer.finish()
}

pub fn main() -> io::Result<()> {
    let array = Array::from_shape_fn((6, 7, 8), |(i, j, k)| 100*i as i32 + 10*j as i32 + k as i32);
    // even weirdly-ordered axes and non-contiguous arrays are fine
    let view = array.view(); // shape (6, 7, 8), C-order
    let view = view.reversed_axes(); // shape (8, 7, 6), fortran order
    let view = view.slice(ndarray::s![.., .., ..;2]); // shape (8, 7, 3), non-contiguous
    assert_eq!(view.shape(), &[8, 7, 3]);

    let mut file = io::BufWriter::new(File::create("examples/output/ndarray.npy")?);
    write_array(&mut file, &view)
}
```

## Structured arrays

`npyz` supports structured arrays!  Consider the following structured array created in Python:

```python
import numpy as np
a = np.array([(1,2.5,4), (2,3.1,5)], dtype=[('a', 'i4'),('b', 'f4'),('c', 'i8')])
np.save('test-data/simple.npy', a)
```

To load this in Rust, we need to create a corresponding struct.
There are three derivable traits we can define for it:

* [`Deserialize`] — Enables easy reading of `.npy` files.
* [`AutoSerialize`] — Enables easy writing of `.npy` files. (in a default format)
* [`Serialize`] — Supertrait of `AutoSerialize` that allows one to specify a custom [`DType`].

**Enable the `"derive"` feature in `Cargo.toml`,**
and make sure the field names and types all match up:
*/

// It is not currently possible in Cargo.toml to specify that an optional dependency should
// also be a dev-dependency.  Therefore, we discretely remove this example when generating
// doctests, so that:
//    - It always appears in documentation (`cargo doc`)
//    - It is only tested when the feature is present (`cargo test --features derive`)
#![cfg_attr(any(not(doctest), feature="derive"), doc = r##"
```
// make sure to add `features = ["derive"]` in Cargo.toml!
#[derive(npyz::Deserialize, Debug)]
struct Struct {
    a: i32,
    b: f32,
    c: i64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("test-data/structured.npy")?;

    let npy = npyz::NpyFile::new(&bytes[..])?;
    for row in npy.data::<Struct>()? {
        let row = row?;
        eprintln!("{:?}", row);
    }
    Ok(())
}
```
"##)]
/*!
The output is:

```text
Array { a: 1, b: 2.5, c: 4 }
Array { a: 2, b: 3.1, c: 5 }
```

## `.npz` files

* To work with `.npz` files in general, see the [`npz` module][`npz`].
* To work with `scipy.sparse` matrices see the [`sparse` module][`sparse`].

*/

// Reexport the macros.
#[cfg(feature = "derive")] pub use npyz_derive::*;

mod header;
mod read;
mod write;
mod type_str;
mod serialize;
#[cfg(feature = "npz")]
mod npz_feature;

pub mod npz;
#[cfg(feature = "npz")]
pub mod sparse;

// Expose public dependencies
#[cfg(feature = "num-complex")]
pub use num_complex;
#[cfg(feature = "zip")]
pub use zip;

pub use header::{DType, Field};
#[allow(deprecated)]
pub use read::{NpyData, NpyFile, NpyReader, Order};
#[allow(deprecated)]
pub use write::{to_file, to_file_1d, OutFile, NpyWriter, write_options, WriteOptions, WriterBuilder};
pub use serialize::{Serialize, Deserialize, AutoSerialize};
pub use serialize::{TypeRead, TypeWrite, TypeWriteDyn, TypeReadDyn, DTypeError};
pub use type_str::{TypeStr, ParseTypeStrError};