tfrecord 0.8.0

Serialize and deserialize TFRecord data format from TensorFlow
# tfrecord-rust

The crate provides the functionality to serialize and deserialize TFRecord data format from TensorFlow.

- Provides both a high-level `Example` type and low-level `Vec<u8>` byte serialization and deserialization.
- Supports **async/await** syntax and works with [futures-rs](https://github.com/rust-lang/futures-rs).
- Interoperates with [serde](https://crates.io/crates/serde), [image](https://crates.io/crates/image), [ndarray](https://crates.io/crates/ndarray) and [tch](https://crates.io/crates/tch).
- TensorBoard support.
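
For context on what the low-level byte API reads and writes: a TFRecord file is a sequence of framed records, each made of a little-endian `u64` length, a masked CRC-32C of the length, the payload, and a masked CRC-32C of the payload. The sketch below illustrates that framing using only the standard library. The function names (`crc32c`, `masked_crc32c`, `encode_record`, `decode_record`) are made up for this illustration and are not this crate's API; the crate handles the framing for you.

```rust
use std::convert::{TryFrom, TryInto};

/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the lowest bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
fn masked_crc32c(data: &[u8]) -> u32 {
    crc32c(data).rotate_right(15).wrapping_add(0xA282_EAD8)
}

/// Frames one payload as: length (u64 LE), masked CRC of the length,
/// payload bytes, masked CRC of the payload.
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let len = (payload.len() as u64).to_le_bytes();
    let mut out = Vec::with_capacity(payload.len() + 16);
    out.extend_from_slice(&len);
    out.extend_from_slice(&masked_crc32c(&len).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&masked_crc32c(payload).to_le_bytes());
    out
}

/// Splits one record off the front of `bytes` and verifies both checksums.
/// Returns the payload and the remaining bytes, or `None` if the input is
/// truncated or a checksum does not match.
fn decode_record(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    if bytes.len() < 12 {
        return None;
    }
    let (len_bytes, rest) = bytes.split_at(8);
    let (len_crc, rest) = rest.split_at(4);
    if masked_crc32c(len_bytes) != u32::from_le_bytes(len_crc.try_into().ok()?) {
        return None;
    }
    let len = usize::try_from(u64::from_le_bytes(len_bytes.try_into().ok()?)).ok()?;
    if rest.len() < len.checked_add(4)? {
        return None;
    }
    let (payload, rest) = rest.split_at(len);
    let (data_crc, rest) = rest.split_at(4);
    if masked_crc32c(payload) != u32::from_le_bytes(data_crc.try_into().ok()?) {
        return None;
    }
    Some((payload, rest))
}
```

Calling `decode_record` repeatedly on the returned remainder walks a whole file one record at a time. For `Example` records, the payload is a serialized protobuf message, which is the part the high-level API decodes.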

## Cargo Features

**Module features**

- `full`: Enable all features.
- `async_`: Enable async/await feature.
- `dataset`: Enable the dataset API that can load records from multiple TFRecord files.
- `summary`: Enable the summary and event types and writers, mainly for TensorBoard.

**Third-party crate support features**

- `with-serde`: Enable support with [serde](https://crates.io/crates/serde) crate.
- `with-image`: Enable support with [image](https://crates.io/crates/image) crate.
- `with-ndarray`: Enable support with [ndarray](https://crates.io/crates/ndarray) crate.
- `with-tch`: Enable support with [tch](https://crates.io/crates/tch) crate.


## Documentation

See [docs.rs](https://docs.rs/tfrecord/) for the API.

## Example

### File reading example

This snippet is copied from [examples/tfrecord\_info.rs](examples/tfrecord_info.rs).

```rust
use tfrecord::{Error, ExampleReader, Feature, RecordReaderInit};

fn main() -> Result<(), Error> {
    // use init pattern to construct the tfrecord reader
    // (`INPUT_TFRECORD_PATH` is defined elsewhere in the example file, not in this snippet)
    let reader: ExampleReader<_> = RecordReaderInit::default().open(&*INPUT_TFRECORD_PATH)?;

    // print header
    println!("example_no\tfeature_no\tname\ttype\tsize");

    // enumerate examples
    for (example_index, result) in reader.enumerate() {
        let example = result?;

        // enumerate features in an example
        for (feature_index, (name, feature)) in example.into_iter().enumerate() {
            print!("{}\t{}\t{}\t", example_index, feature_index, name);

            match feature {
                Feature::BytesList(list) => {
                    println!("bytes\t{}", list.len());
                }
                Feature::FloatList(list) => {
                    println!("float\t{}", list.len());
                }
                Feature::Int64List(list) => {
                    println!("int64\t{}", list.len());
                }
                Feature::None => {
                    println!("none");
                }
            }
        }
    }

    Ok(())
}
```

### Work with async/await syntax

The snippet from [examples/tfrecord\_info\_async.rs](examples/tfrecord_info_async.rs) demonstrates the integration with [async-std](https://github.com/async-rs/async-std).

```rust
use futures::stream::TryStreamExt;
use tfrecord::{Error, Feature, RecordStreamInit};

pub async fn _main() -> Result<(), Error> {
    // use init pattern to construct the tfrecord stream
    let stream = RecordStreamInit::default()
        .examples_open(&*INPUT_TFRECORD_PATH)
        .await?;

    // print header
    println!("example_no\tfeature_no\tname\ttype\tsize");

    // enumerate examples
    stream
        .try_fold(0, |example_index, example| {
            async move {
                // enumerate features in an example
                for (feature_index, (name, feature)) in example.into_iter().enumerate() {
                    print!("{}\t{}\t{}\t", example_index, feature_index, name);

                    match feature {
                        Feature::BytesList(list) => {
                            println!("bytes\t{}", list.len());
                        }
                        Feature::FloatList(list) => {
                            println!("float\t{}", list.len());
                        }
                        Feature::Int64List(list) => {
                            println!("int64\t{}", list.len());
                        }
                        Feature::None => {
                            println!("none");
                        }
                    }
                }

                Ok(example_index + 1)
            }
        })
        .await?;

    Ok(())
}
```

### Work with TensorBoard

This is a simplified version of [examples/tensorboard.rs](examples/tensorboard.rs) that writes summary data to the `log_dir` directory. After running the example, run `tensorboard --logdir log_dir` to view the results in TensorBoard.

```rust
// `Result` and `IMAGE_URLS` are not defined in this snippet; they come in through `super::*`
use super::*;
use rand::seq::SliceRandom;
use rand_distr::{Distribution, Normal};
use std::{f32::consts::PI, io, thread, time::Duration};
use tfrecord::EventWriterInit;

pub fn _main() -> Result<()> {
    // path prefix for the event files, placed under the log_dir directory
    let prefix = "log_dir/my_prefix";

    // download image files
    println!("downloading images...");
    let images = IMAGE_URLS
        .iter()
        .cloned()
        .map(|url| {
            let mut bytes = vec![];
            io::copy(&mut ureq::get(url).call().into_reader(), &mut bytes)?;
            let image = image::load_from_memory(bytes.as_ref())?;
            Ok(image)
        })
        .collect::<Result<Vec<_>>>()?;

    // init writer
    let mut writer = EventWriterInit::from_prefix(prefix, None)?;
    let mut rng = rand::thread_rng();

    // loop
    for step in 0..30 {
        println!("step: {}", step);

        // scalar
        {
            let value: f32 = (step as f32 * PI / 8.0).sin();
            writer.write_scalar("scalar", step, value)?;
        }

        // histogram
        {
            let normal = Normal::new(-20.0, 50.0).unwrap();
            let values = normal
                .sample_iter(&mut rng)
                .take(1024)
                .collect::<Vec<f32>>();
            writer.write_histogram("histogram", step, values)?;
        }

        // image
        {
            let image = images.choose(&mut rng).unwrap();
            writer.write_image("image", step, image)?;
        }

        thread::sleep(Duration::from_millis(100));
    }

    Ok(())
}

```

### More examples

To read values from the event files used by TensorBoard, see the [event reader](examples/event_reader) example.

More examples can be found in the [examples](examples) and [tests](tests) directories.

## Notice on TensorFlow Updates

The crate compiles ProtocolBuffer code that was pre-generated from TensorFlow. If TensorFlow is updated or you need custom patches, run the code generation manually. See the [Generate ProtocolBuffer code from TensorFlow](#generate-protocolbuffer-code-from-tensorflow) section for details.

The build script can obtain the TensorFlow source code in several ways, selected by the `TFRECORD_BUILD_METHOD` environment variable. The generated code is placed under the `prebuild_src` directory. The examples below show each method.

### Build from a source tarball

```sh
export TFRECORD_BUILD_METHOD="src_file:///home/myname/tensorflow-2.2.0.tar.gz"
cargo build --release --features serde,generate_protobuf_src  # with serde
cargo build --release --features generate_protobuf_src        # without serde
```

### Build from a source directory

```sh
export TFRECORD_BUILD_METHOD="src_dir:///home/myname/tensorflow-2.2.0"
cargo build --release --features serde,generate_protobuf_src  # with serde
cargo build --release --features generate_protobuf_src        # without serde
```

### Build from a URL

```sh
export TFRECORD_BUILD_METHOD="url://https://github.com/tensorflow/tensorflow/archive/v2.2.0.tar.gz"
cargo build --release --features serde,generate_protobuf_src  # with serde
cargo build --release --features generate_protobuf_src        # without serde
```

### Build from installed TensorFlow on system

The build script searches the `${install_prefix}/include/tensorflow` directory for protobuf files.

```sh
export TFRECORD_BUILD_METHOD="install_prefix:///usr"
cargo build --release --features serde,generate_protobuf_src  # with serde
cargo build --release --features generate_protobuf_src        # without serde
```
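
The four `TFRECORD_BUILD_METHOD` values above share one shape: a method name, then `://`, then a path or URL. As a rough illustration of how such a value can be split, here is a minimal sketch using only the standard library. The `BuildMethod` enum and `parse_build_method` function are hypothetical names invented for this example; this is not the crate's actual build script.

```rust
use std::path::PathBuf;

/// Hypothetical model of the four documented build methods.
#[derive(Debug, PartialEq)]
enum BuildMethod {
    /// `src_file://<path to source tarball>`
    SrcFile(PathBuf),
    /// `src_dir://<path to source directory>`
    SrcDir(PathBuf),
    /// `url://<download URL>`
    Url(String),
    /// `install_prefix://<prefix containing include/tensorflow>`
    InstallPrefix(PathBuf),
}

/// Splits the value at the first `://` and maps the method name to a variant.
/// Returns `None` for an unknown method name or an empty argument.
fn parse_build_method(value: &str) -> Option<BuildMethod> {
    let (method, arg) = value.split_once("://")?;
    if arg.is_empty() {
        return None;
    }
    match method {
        "src_file" => Some(BuildMethod::SrcFile(PathBuf::from(arg))),
        "src_dir" => Some(BuildMethod::SrcDir(PathBuf::from(arg))),
        "url" => Some(BuildMethod::Url(arg.to_string())),
        "install_prefix" => Some(BuildMethod::InstallPrefix(PathBuf::from(arg))),
        _ => None,
    }
}
```

Splitting at the first `://` explains the triple slash in the path forms: `src_dir:///home/myname/tensorflow-2.2.0` is the method `src_dir` followed by the absolute path `/home/myname/tensorflow-2.2.0`. Likewise, `url://https://...` keeps the full `https://...` URL as its argument.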

## License

MIT license. See the [LICENSE](LICENSE) file for the full text.