zTensor File Format
Version 0.1.0
zTensor is a binary format for storing large multi-dimensional arrays (tensors), designed for efficient, safe, and flexible access. It supports raw and compressed (zstd) encodings, quantized and sparse layouts, and is extensible.
For dense tensors stored with encoding: "raw" and matching endianness, zTensor enables zero-copy access to tensor data.
File Layout
+-------------------------------+
| Magic Number (8 bytes) |
+-------------------------------+
| Tensor Blob 0 (aligned) |
+-------------------------------+
| Padding (if needed) |
+-------------------------------+
| Tensor Blob 1 (aligned) |
+-------------------------------+
| ... |
+-------------------------------+
| CBOR Metadata Array |
+-------------------------------+
| CBOR Array Size (8 bytes) |
+-------------------------------+
- Magic Number: ASCII "ZTEN0001" at offset 0.
- Tensor Blobs: Each tensor's data, starting at a 64-byte aligned offset. No per-blob headers. Padding (undefined value, usually zero) is inserted as needed.
- CBOR Metadata Array: At the end of the file, a CBOR-encoded array of metadata maps (one per tensor).
- CBOR Array Size: Last 8 bytes, little-endian uint64, gives the size of the CBOR array.
Tensor Metadata (CBOR Map)
Each tensor's metadata is a CBOR map with these required fields:
name(string): Tensor name.offset(uint64): Absolute file offset to tensor data (multiple of 64).size(uint64): On-disk size in bytes (compressed if applicable).dtype(string): Data type (see below).shape(array): Array of dimensions (empty for scalar).encoding(string): Data encoding (see below).layout(string): "dense" (default) or "sparse". For sparse, see below.
Optional fields:
data_endianness(string): "little" or "big" (default: little, for raw multi-byte types).checksum(string): e.g., "crc32c:0x1234ABCD" or "sha256:...".- Custom fields are allowed; unknown keys are ignored by readers.
Supported Data Types (dtype)
float64,float32,float16,bfloat16int64,int32,int16,int8uint64,uint32,uint16,uint8bool
Supported Encodings (encoding)
raw: Direct binary dump of tensor elements (withdata_endiannessif multi-byte).zstd: Zstandard-compressed data.sizeis compressed size.
Layouts
dense(default): Standard contiguous tensor data.sparse: Data is stored in a sparse format. The metadata map must include asparse_formatfield (e.g., "csr", "coo") and any additional fields required to describe the sparse structure (such as index arrays, indptr, etc.).
Index Reading
To read the index:
- Read the last 8 bytes for the CBOR array size.
- Seek backwards by that amount to read the CBOR metadata array.
As a result, the start offset of the metadata is: (file size) - (size of the metadata) - (8 byte).
Zero-Tensor Files
A valid zTensor file may contain zero tensors:
- 8 bytes magic, 1 byte empty CBOR array (
0x80), 8 bytes size (0x01...00). - Total: 17 bytes.
Extensibility
- New
dtype,encoding, andlayoutvalues may be added in future versions. - Custom metadata fields are allowed; unknown fields are ignored.
Notes
- All offsets are absolute and account for the 8-byte magic number.
- All tensor data blobs are 64-byte aligned.
- No per-blob headers; all metadata is in the CBOR array at the end of the file.
- For
encoding: "raw"and multi-bytedtype, data is little-endian unlessdata_endiannessis specified. - For sparse tensors, the metadata must fully describe the sparse structure.