Support for the Arrow IPC Format
The Arrow IPC format defines how to read and write RecordBatches to/from a file or stream of bytes.
This format can be used to serialize and deserialize data to files and over the network.
There are two variants of the IPC format:
- IPC Streaming Format: Supports streaming data sources, implemented by StreamReader and StreamWriter.
- IPC File Format: Supports random access, implemented by FileReader and FileWriter.
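The difference between the two variants can be illustrated with a toy model using only the standard library. This is a sketch of the idea, not the Arrow wire layout: messages here are opaque byte payloads framed by a 4-byte length, and the `BlockEntry` index loosely stands in for the block entries that the file format keeps in its footer. The names `write_messages`, `read_sequential`, `read_random`, and `BlockEntry` are hypothetical and do not exist in this crate.

```rust
/// Location of one payload inside the byte buffer (hypothetical stand-in for a footer block).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlockEntry {
    offset: usize,
    len: usize,
}

/// Append each payload as `[u32 little-endian length][bytes]` and record where it landed.
fn write_messages(payloads: &[Vec<u8>]) -> (Vec<u8>, Vec<BlockEntry>) {
    let mut buf = Vec::new();
    let mut index = Vec::new();
    for p in payloads {
        buf.extend_from_slice(&(p.len() as u32).to_le_bytes());
        index.push(BlockEntry { offset: buf.len(), len: p.len() });
        buf.extend_from_slice(p);
    }
    (buf, index)
}

/// Streaming-style read: walk the buffer front to back, one payload at a time.
fn read_sequential(buf: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some(prefix) = buf.get(pos..pos + 4) {
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(prefix);
        let len = u32::from_le_bytes(len_bytes) as usize;
        match buf.get(pos + 4..pos + 4 + len) {
            Some(payload) => out.push(payload),
            None => break, // truncated input: stop rather than panic
        }
        pos += 4 + len;
    }
    out
}

/// File-style read: jump straight to payload `i` using the index, without scanning.
fn read_random<'a>(buf: &'a [u8], index: &[BlockEntry], i: usize) -> Option<&'a [u8]> {
    let entry = index.get(i)?;
    buf.get(entry.offset..entry.offset + entry.len)
}

fn main() {
    let payloads = vec![b"batch-0".to_vec(), b"batch-1".to_vec(), b"batch-2".to_vec()];
    let (buf, index) = write_messages(&payloads);
    assert_eq!(read_sequential(&buf).len(), 3);
    assert_eq!(read_random(&buf, &index, 2), Some(&b"batch-2"[..]));
}
```

A streaming reader only needs the bytes seen so far, which suits sockets and pipes. A random-access reader needs the index, so the whole file must be written before it can be read this way.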
Modules
- convert: Utilities for converting between IPC types and native Arrow types
- gen: Generated code
- reader: Arrow IPC File and Stream Readers
- writer: Arrow IPC File and Stream Writers
Structs
- Binary: Opaque binary data
- BinaryArgs
- BinaryBuilder
- BinaryView: Logically the same as Binary, but the internal representation uses a view struct that contains the string length and either the string’s entire data inline (for small strings) or an inlined prefix, an index of another buffer, and an offset pointing to a slice in that buffer (for non-small strings).
- BinaryViewArgs
- BinaryViewBuilder
- Block
- BodyCompression: Optional compression for the memory buffers constituting IPC message bodies. Intended for use with RecordBatch but could be used for other message types.
- BodyCompressionArgs
- BodyCompressionBuilder
- BodyCompressionMethod: Provided for forward compatibility in case we need to support different strategies for compressing the IPC message body (like whole-body compression rather than buffer-level) in the future.
- Bool
- BoolArgs
- BoolBuilder
- Buffer
- CompressionType
- Date: Date is either a 32-bit or 64-bit signed integer type representing an elapsed time since UNIX epoch (1970-01-01), stored in either of two units:
- DateArgs
- DateBuilder
- DateUnit
- Decimal: Exact decimal value represented as an integer value in two’s complement. Currently only 128-bit (16-byte) and 256-bit (32-byte) integers are used. The representation uses the endianness indicated in the Schema.
- DecimalArgs
- DecimalBuilder
- DictionaryBatch: For sending dictionary encoding information. Any Field can be dictionary-encoded, but in this case none of its children may be dictionary-encoded. There is one vector / column per dictionary, but that vector / column may be spread across multiple dictionary batches by using the isDelta flag.
- DictionaryBatchArgs
- DictionaryBatchBuilder
- DictionaryEncoding
- DictionaryEncodingArgs
- DictionaryEncodingBuilder
- DictionaryKind
- Duration
- DurationArgs
- DurationBuilder
- Endianness
- Feature: Represents Arrow Features that might not have full support within implementations. This is intended to be used in two scenarios:
- Field
- FieldArgs
- FieldBuilder
- FieldNode
- FixedSizeBinary
- FixedSizeBinaryArgs
- FixedSizeBinaryBuilder
- FixedSizeList
- FixedSizeListArgs
- FixedSizeListBuilder
- FloatingPoint
- FloatingPointArgs
- FloatingPointBuilder
- Footer
- FooterArgs
- FooterBuilder
- Int
- IntArgs
- IntBuilder
- Interval
- IntervalArgs
- IntervalBuilder
- IntervalUnit
- KeyValue
- KeyValueArgs
- KeyValueBuilder
- LargeBinary: Same as Binary, but with 64-bit offsets, allowing extremely large data values to be represented.
- LargeBinaryArgs
- LargeBinaryBuilder
- LargeList: Same as List, but with 64-bit offsets, allowing extremely large data values to be represented.
- LargeListArgs
- LargeListBuilder
- LargeListView: Same as ListView, but with 64-bit offsets and sizes, allowing extremely large data values to be represented.
- LargeListViewArgs
- LargeListViewBuilder
- LargeUtf8: Same as Utf8, but with 64-bit offsets, allowing extremely large data values to be represented.
- LargeUtf8Args
- LargeUtf8Builder
- List
- ListArgs
- ListBuilder
- ListView: Represents the same logical types that List can, but contains offsets and sizes allowing for writes in any order and sharing of child values among list values.
- ListViewArgs
- ListViewBuilder
- Map: A Map is a logical nested type that is represented as
- MapArgs
- MapBuilder
- Message
- MessageArgs
- MessageBuilder
- MessageHeader
- MessageHeaderUnionTableOffset
- MetadataVersion
- Null: These are stored in the flatbuffer in the Type union below
- NullArgs
- NullBuilder
- Precision
- RecordBatch: A data header describing the shared memory layout of a “record” or “row” batch. Some systems call this a “row batch” internally and others a “record batch”.
- RecordBatchArgs
- RecordBatchBuilder
- RunEndEncoded: Contains two child arrays, run_ends and values. The run_ends child array must be a 16/32/64-bit integer array which encodes the indices at which the run with the value in each corresponding index in the values child array ends. Like list/struct types, the value array can be of any type.
- RunEndEncodedArgs
- RunEndEncodedBuilder
- Schema
- SchemaArgs
- SchemaBuilder
- SparseMatrixCompressedAxis
- SparseMatrixIndexCSX: Compressed Sparse format, that is matrix-specific.
- SparseMatrixIndexCSXArgs
- SparseMatrixIndexCSXBuilder
- SparseTensor
- SparseTensorArgs
- SparseTensorBuilder
- SparseTensorIndex
- SparseTensorIndexCOO
- SparseTensorIndexCOOArgs
- SparseTensorIndexCOOBuilder
- SparseTensorIndexCSF: Compressed Sparse Fiber (CSF) sparse tensor index.
- SparseTensorIndexCSFArgs
- SparseTensorIndexCSFBuilder
- SparseTensorIndexUnionTableOffset
- Struct_: A Struct_ in the flatbuffer metadata is the same as an Arrow Struct (according to the physical memory layout). We used Struct_ here as Struct is a reserved word in Flatbuffers.
- Struct_Args
- Struct_Builder
- Tensor
- TensorArgs
- TensorBuilder
- TensorDim
- TensorDimArgs
- TensorDimBuilder
- Time: Time is either a 32-bit or 64-bit signed integer type representing an elapsed time since midnight, stored in one of four units: seconds, milliseconds, microseconds or nanoseconds.
- TimeArgs
- TimeBuilder
- TimeUnit
- Timestamp: Timestamp is a 64-bit signed integer representing an elapsed time since a fixed epoch, stored in one of four units: seconds, milliseconds, microseconds or nanoseconds, and is optionally annotated with a timezone.
- TimestampArgs
- TimestampBuilder
- Type
- TypeUnionTableOffset
- Union: A union is a complex type with children in Field. By default, ids in the type vector refer to the offsets in the children. Optionally, typeIds provides an indirection between the child offset and the type id: for each child, typeIds[offset] is the id used in the type vector.
- UnionArgs
- UnionBuilder
- UnionMode
- Utf8: Unicode with UTF-8 encoding
- Utf8Args
- Utf8Builder
- Utf8View: Logically the same as Utf8, but the internal representation uses a view struct that contains the string length and either the string’s entire data inline (for small strings) or an inlined prefix, an index of another buffer, and an offset pointing to a slice in that buffer (for non-small strings).
- Utf8ViewArgs
- Utf8ViewBuilder
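The view struct described for BinaryView and Utf8View can be sketched with the standard library alone. The sketch below assumes the 16-byte layout given in the Arrow columnar format specification: a 4-byte length, followed either by up to 12 bytes of inline data or by a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset. The 12-byte threshold and field order come from that specification rather than from this page, and `make_view` and `resolve` are hypothetical helper names, not functions of this crate.

```rust
/// Assumed inline capacity in bytes (per the Arrow columnar format specification).
const INLINE_CAP: usize = 12;

/// Build a 16-byte view; values too long to inline are appended to `buffer`.
fn make_view(value: &[u8], buffer_index: u32, buffer: &mut Vec<u8>) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= INLINE_CAP {
        // Small value: store all bytes inside the view itself.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Larger value: keep a 4-byte prefix and point into a separate buffer.
        let offset = buffer.len() as u32;
        buffer.extend_from_slice(value);
        view[4..8].copy_from_slice(&value[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Read a little-endian u32 from exactly four bytes.
fn read_u32(bytes: &[u8]) -> u32 {
    let mut tmp = [0u8; 4];
    tmp.copy_from_slice(bytes);
    u32::from_le_bytes(tmp)
}

/// Resolve a view back to the bytes it refers to.
fn resolve<'a>(view: &'a [u8; 16], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = read_u32(&view[..4]) as usize;
    if len <= INLINE_CAP {
        &view[4..4 + len]
    } else {
        let buffer_index = read_u32(&view[8..12]) as usize;
        let offset = read_u32(&view[12..16]) as usize;
        &buffers[buffer_index][offset..offset + len]
    }
}

fn main() {
    let mut data = Vec::new();
    let short = make_view(b"arrow", 0, &mut data);
    let long = make_view(b"a string longer than twelve bytes", 0, &mut data);
    let buffers = vec![data];
    assert_eq!(resolve(&short, &buffers), &b"arrow"[..]);
    assert_eq!(resolve(&long, &buffers), &b"a string longer than twelve bytes"[..]);
    // The inlined prefix lets comparisons reject many mismatches without touching the buffer.
    assert_eq!(&long[4..8], &b"a st"[..]);
}
```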
Enums
- BinaryOffset
- BinaryViewOffset
- BodyCompressionOffset
- BoolOffset
- DateOffset
- DecimalOffset
- DictionaryBatchOffset
- DictionaryEncodingOffset
- DurationOffset
- FieldOffset
- FixedSizeBinaryOffset
- FixedSizeListOffset
- FloatingPointOffset
- FooterOffset
- IntOffset
- IntervalOffset
- KeyValueOffset
- LargeBinaryOffset
- LargeListOffset
- LargeListViewOffset
- LargeUtf8Offset
- ListOffset
- ListViewOffset
- MapOffset
- MessageOffset
- NullOffset
- RecordBatchOffset
- RunEndEncodedOffset
- SchemaOffset
- SparseMatrixIndexCSXOffset
- SparseTensorIndexCOOOffset
- SparseTensorIndexCSFOffset
- SparseTensorOffset
- Struct_Offset
- TensorDimOffset
- TensorOffset
- TimeOffset
- TimestampOffset
- UnionOffset
- Utf8Offset
- Utf8ViewOffset
Constants
- ENUM_MAX_BODY_COMPRESSION_METHOD: Deprecated
- ENUM_MAX_COMPRESSION_TYPE: Deprecated
- ENUM_MAX_DATE_UNIT: Deprecated
- ENUM_MAX_DICTIONARY_KIND: Deprecated
- ENUM_MAX_ENDIANNESS: Deprecated
- ENUM_MAX_FEATURE: Deprecated
- ENUM_MAX_INTERVAL_UNIT: Deprecated
- ENUM_MAX_MESSAGE_HEADER: Deprecated
- ENUM_MAX_METADATA_VERSION: Deprecated
- ENUM_MAX_PRECISION: Deprecated
- ENUM_MAX_SPARSE_MATRIX_COMPRESSED_AXIS: Deprecated
- ENUM_MAX_SPARSE_TENSOR_INDEX: Deprecated
- ENUM_MAX_TIME_UNIT: Deprecated
- ENUM_MAX_TYPE: Deprecated
- ENUM_MAX_UNION_MODE: Deprecated
- ENUM_MIN_BODY_COMPRESSION_METHOD: Deprecated
- ENUM_MIN_COMPRESSION_TYPE: Deprecated
- ENUM_MIN_DATE_UNIT: Deprecated
- ENUM_MIN_DICTIONARY_KIND: Deprecated
- ENUM_MIN_ENDIANNESS: Deprecated
- ENUM_MIN_FEATURE: Deprecated
- ENUM_MIN_INTERVAL_UNIT: Deprecated
- ENUM_MIN_MESSAGE_HEADER: Deprecated
- ENUM_MIN_METADATA_VERSION: Deprecated
- ENUM_MIN_PRECISION: Deprecated
- ENUM_MIN_SPARSE_MATRIX_COMPRESSED_AXIS: Deprecated
- ENUM_MIN_SPARSE_TENSOR_INDEX: Deprecated
- ENUM_MIN_TIME_UNIT: Deprecated
- ENUM_MIN_TYPE: Deprecated
- ENUM_MIN_UNION_MODE: Deprecated
- ENUM_VALUES_BODY_COMPRESSION_METHOD: Deprecated
- ENUM_VALUES_COMPRESSION_TYPE: Deprecated
- ENUM_VALUES_DATE_UNIT: Deprecated
- ENUM_VALUES_DICTIONARY_KIND: Deprecated
- ENUM_VALUES_ENDIANNESS: Deprecated
- ENUM_VALUES_FEATURE: Deprecated
- ENUM_VALUES_INTERVAL_UNIT: Deprecated
- ENUM_VALUES_MESSAGE_HEADER: Deprecated
- ENUM_VALUES_METADATA_VERSION: Deprecated
- ENUM_VALUES_PRECISION: Deprecated
- ENUM_VALUES_SPARSE_MATRIX_COMPRESSED_AXIS: Deprecated
- ENUM_VALUES_SPARSE_TENSOR_INDEX: Deprecated
- ENUM_VALUES_TIME_UNIT: Deprecated
- ENUM_VALUES_TYPE: Deprecated
- ENUM_VALUES_UNION_MODE: Deprecated
Functions
- finish_footer_buffer
- finish_message_buffer
- finish_schema_buffer
- finish_size_prefixed_footer_buffer
- finish_size_prefixed_message_buffer
- finish_size_prefixed_schema_buffer
- finish_size_prefixed_sparse_tensor_buffer
- finish_size_prefixed_tensor_buffer
- finish_sparse_tensor_buffer
- finish_tensor_buffer
- root_as_footer: Verifies that a buffer of bytes contains a Footer and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_footer_unchecked.
- root_as_footer_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a Footer and returns it.
- root_as_footer_with_opts: Verifies, with the given options, that a buffer of bytes contains a Footer and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_footer_unchecked.
- root_as_message: Verifies that a buffer of bytes contains a Message and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_message_unchecked.
- root_as_message_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a Message and returns it.
- root_as_message_with_opts: Verifies, with the given options, that a buffer of bytes contains a Message and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_message_unchecked.
- root_as_schema: Verifies that a buffer of bytes contains a Schema and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_schema_unchecked.
- root_as_schema_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a Schema and returns it.
- root_as_schema_with_opts: Verifies, with the given options, that a buffer of bytes contains a Schema and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_schema_unchecked.
- root_as_sparse_tensor: Verifies that a buffer of bytes contains a SparseTensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_sparse_tensor_unchecked.
- root_as_sparse_tensor_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a SparseTensor and returns it.
- root_as_sparse_tensor_with_opts: Verifies, with the given options, that a buffer of bytes contains a SparseTensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_sparse_tensor_unchecked.
- root_as_tensor: Verifies that a buffer of bytes contains a Tensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_tensor_unchecked.
- root_as_tensor_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a Tensor and returns it.
- root_as_tensor_with_opts: Verifies, with the given options, that a buffer of bytes contains a Tensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_tensor_unchecked.
- size_prefixed_root_as_footer: Verifies that a buffer of bytes contains a size prefixed Footer and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use size_prefixed_root_as_footer_unchecked.
- size_prefixed_root_as_footer_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a size prefixed Footer and returns it.
- size_prefixed_root_as_footer_with_opts: Verifies, with the given verifier options, that a buffer of bytes contains a size prefixed Footer and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_footer_unchecked.
- size_prefixed_root_as_message: Verifies that a buffer of bytes contains a size prefixed Message and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use size_prefixed_root_as_message_unchecked.
- size_prefixed_root_as_message_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a size prefixed Message and returns it.
- size_prefixed_root_as_message_with_opts: Verifies, with the given verifier options, that a buffer of bytes contains a size prefixed Message and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_message_unchecked.
- size_prefixed_root_as_schema: Verifies that a buffer of bytes contains a size prefixed Schema and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use size_prefixed_root_as_schema_unchecked.
- size_prefixed_root_as_schema_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a size prefixed Schema and returns it.
- size_prefixed_root_as_schema_with_opts: Verifies, with the given verifier options, that a buffer of bytes contains a size prefixed Schema and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_schema_unchecked.
- size_prefixed_root_as_sparse_tensor: Verifies that a buffer of bytes contains a size prefixed SparseTensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use size_prefixed_root_as_sparse_tensor_unchecked.
- size_prefixed_root_as_sparse_tensor_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a size prefixed SparseTensor and returns it.
- size_prefixed_root_as_sparse_tensor_with_opts: Verifies, with the given verifier options, that a buffer of bytes contains a size prefixed SparseTensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_sparse_tensor_unchecked.
- size_prefixed_root_as_tensor: Verifies that a buffer of bytes contains a size prefixed Tensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use size_prefixed_root_as_tensor_unchecked.
- size_prefixed_root_as_tensor_unchecked ⚠: Assumes, without verification, that a buffer of bytes contains a size prefixed Tensor and returns it.
- size_prefixed_root_as_tensor_with_opts: Verifies, with the given verifier options, that a buffer of bytes contains a size prefixed Tensor and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use root_as_tensor_unchecked.
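The "size prefixed" variants above refer to buffers that start with their own length. As a rough illustration of that framing, the sketch below assumes the FlatBuffers convention of a 4-byte little-endian unsigned prefix holding the size of the bytes that follow. The payload here is opaque bytes rather than a real FlatBuffer, and `add_size_prefix` and `split_size_prefixed` are hypothetical helpers, not functions of this crate. The crate's own size_prefixed_* functions are expected to receive the buffer with its prefix still attached.

```rust
/// Width of the assumed size prefix: a little-endian u32.
const SIZE_PREFIX_LEN: usize = 4;

/// Prepend the payload length so a reader can find the message boundary.
fn add_size_prefix(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(SIZE_PREFIX_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Split a buffer into the payload announced by its prefix and the bytes after it.
/// Returns `None` when the buffer is shorter than the prefix claims.
fn split_size_prefixed(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < SIZE_PREFIX_LEN {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[..SIZE_PREFIX_LEN]);
    let len = u32::from_le_bytes(len_bytes) as usize;
    let end = SIZE_PREFIX_LEN.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    Some((&buf[SIZE_PREFIX_LEN..end], &buf[end..]))
}

fn main() {
    // Two framed payloads back to back, as they might appear in a byte stream.
    let mut stream = add_size_prefix(b"first");
    stream.extend_from_slice(&add_size_prefix(b"second"));

    let (a, rest) = split_size_prefixed(&stream).expect("first frame");
    let (b, rest) = split_size_prefixed(rest).expect("second frame");
    assert_eq!(a, &b"first"[..]);
    assert_eq!(b, &b"second"[..]);
    assert!(rest.is_empty());
}
```

The prefix is what lets several messages share one buffer: without it, a reader has no way to tell where one root table ends and the next begins.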