Module parquet

Apache Parquet columnar writer for packet export.

Uses the parquet crate’s column-writer API to produce typed, columnar Parquet files without requiring the Arrow library.

§Schema

message schema {
  REQUIRED INT64  timestamp_ns;
  OPTIONAL BINARY src_ip      (STRING);
  OPTIONAL BINARY dst_ip      (STRING);
  OPTIONAL INT32  src_port;
  OPTIONAL INT32  dst_port;
  OPTIONAL INT32  protocol;
  OPTIONAL INT64  flow_id;
  REQUIRED INT32  caplen;
  REQUIRED INT32  origlen;
  OPTIONAL INT32  tcp_flags;
  OPTIONAL BINARY payload;
}
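
As a rough illustration of how this schema maps onto Rust values, the sketch below mirrors each column with a struct field (`Option` for OPTIONAL columns) and splits an optional column into the dense value list and definition levels that a column-oriented writer consumes for flat OPTIONAL columns (level 1 for a present value, 0 for a null). `PacketRow` and `split_optional` are hypothetical names invented for this example; they are not taken from this module.

```rust
/// Hypothetical row type mirroring the schema above; not the module's own type.
#[derive(Debug, Clone, Default)]
pub struct PacketRow {
    pub timestamp_ns: i64,        // REQUIRED INT64
    pub src_ip: Option<String>,   // OPTIONAL BINARY (STRING)
    pub dst_ip: Option<String>,   // OPTIONAL BINARY (STRING)
    pub src_port: Option<i32>,    // OPTIONAL INT32
    pub dst_port: Option<i32>,    // OPTIONAL INT32
    pub protocol: Option<i32>,    // OPTIONAL INT32
    pub flow_id: Option<i64>,     // OPTIONAL INT64
    pub caplen: i32,              // REQUIRED INT32
    pub origlen: i32,             // REQUIRED INT32
    pub tcp_flags: Option<i32>,   // OPTIONAL INT32
    pub payload: Option<Vec<u8>>, // OPTIONAL BINARY
}

/// Hypothetical helper: split one OPTIONAL column into its non-null values
/// and one definition level per row (1 = value present, 0 = null).
pub fn split_optional<T: Clone>(column: &[Option<T>]) -> (Vec<T>, Vec<i16>) {
    let mut values = Vec::new();
    let mut def_levels = Vec::with_capacity(column.len());
    for cell in column {
        match cell {
            Some(v) => {
                values.push(v.clone());
                def_levels.push(1);
            }
            None => def_levels.push(0),
        }
    }
    (values, def_levels)
}
```

For three rows whose `src_port` values are `Some(443)`, `None`, and `Some(53)`, `split_optional` returns the values `[443, 53]` with definition levels `[1, 0, 1]`. REQUIRED columns such as `timestamp_ns` need no definition levels because every row carries a value.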

Row groups contain up to [BATCH_SIZE] rows. When compress_payload is true, ZSTD compression is applied to all columns (Parquet has no per-field compression; ZSTD on a binary column containing payloads yields the largest savings).
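
The row-group batching described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the sink's actual implementation: `RowGroupBuffer` and `EXAMPLE_BATCH_SIZE` are hypothetical names, and the value 4 is chosen only to keep the example small (the module's real BATCH_SIZE is not stated here).

```rust
/// Hypothetical stand-in for the module's BATCH_SIZE; the real value is not shown here.
pub const EXAMPLE_BATCH_SIZE: usize = 4;

/// Hypothetical buffer that accumulates rows and releases them one row group at a time.
pub struct RowGroupBuffer<T> {
    pending: Vec<T>,
}

impl<T> RowGroupBuffer<T> {
    pub fn new() -> Self {
        Self { pending: Vec::with_capacity(EXAMPLE_BATCH_SIZE) }
    }

    /// Add one row. Returns a full batch once EXAMPLE_BATCH_SIZE rows are buffered,
    /// which the caller would then write out as a single row group.
    pub fn push(&mut self, row: T) -> Option<Vec<T>> {
        self.pending.push(row);
        if self.pending.len() >= EXAMPLE_BATCH_SIZE {
            let fresh = Vec::with_capacity(EXAMPLE_BATCH_SIZE);
            Some(std::mem::replace(&mut self.pending, fresh))
        } else {
            None
        }
    }

    /// Hand back any remaining rows as a final, possibly short, row group.
    pub fn finish(self) -> Option<Vec<T>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending)
        }
    }
}
```

Pushing ten rows through this buffer and then calling `finish` yields row groups of 4, 4, and 2 rows, so no group exceeds the configured limit and only the last one can be short.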

Structs§

ParquetSink
Streaming Parquet writer that implements PacketSink.

Functions§

column_repetitions
Check which physical type the schema assigns to each column index. Used only for sanity testing; not called in production.