Crate dsq_formats


dsq-formats: File format support for dsq

This crate reads and writes structured data formats, including CSV, TSV, Parquet, JSON, JSON Lines, Arrow, and Avro, with Excel and ORC available for output only.

§Features

  • Format Detection: Automatic format detection from file extensions and content
  • Unified Interface: Consistent reader/writer traits across all formats
  • Performance: Optimized implementations using Polars DataFrames
  • Extensibility: Easy to add new formats with macro-based boilerplate reduction
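Content-based detection can be illustrated with a small, self-contained sketch. It is not the crate's `detect_format_from_content`, whose signature is not shown on this page; the `SniffedFormat` enum and `sniff_format` function are hypothetical names introduced only for illustration. The magic bytes used here are the published file signatures for Parquet, Arrow IPC files, Avro object container files, and ORC.

```rust
/// Hypothetical stand-in for the crate's `DataFormat` enum (assumption,
/// limited to the formats this sketch can recognize from content).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SniffedFormat {
    Parquet,
    Arrow,
    Avro,
    Orc,
    Json,
}

/// Guess a format from the leading bytes of a file.
/// Returns `None` when no known signature matches.
pub fn sniff_format(bytes: &[u8]) -> Option<SniffedFormat> {
    // Binary formats start with fixed magic bytes.
    if bytes.starts_with(b"PAR1") {
        return Some(SniffedFormat::Parquet);
    }
    if bytes.starts_with(b"ARROW1") {
        return Some(SniffedFormat::Arrow);
    }
    if bytes.starts_with(b"Obj\x01") {
        return Some(SniffedFormat::Avro);
    }
    if bytes.starts_with(b"ORC") {
        return Some(SniffedFormat::Orc);
    }
    // Text heuristic: a JSON document starts with `{` or `[`
    // once leading whitespace is skipped.
    let first = bytes.iter().copied().find(|b| !b.is_ascii_whitespace());
    match first {
        Some(b'{') | Some(b'[') => Some(SniffedFormat::Json),
        _ => None,
    }
}
```

A first-byte heuristic like this cannot tell JSON from JSON Lines or CSV from TSV, which is one reason to combine content sniffing with the file extension.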

§Supported Formats

§Input Formats

  • CSV (.csv) - Comma-separated values with customizable options
  • TSV (.tsv) - Tab-separated values
  • Parquet (.parquet) - Columnar storage with compression
  • JSON (.json) - Standard JSON arrays and objects
  • JSON Lines (.jsonl, .ndjson) - Newline-delimited JSON
  • Arrow (.arrow) - Apache Arrow IPC format
  • Avro (.avro) - Apache Avro serialization

§Output Formats

All input formats plus:

  • Excel (.xlsx) - Microsoft Excel format
  • ORC (.orc) - Optimized Row Columnar format
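The extension table above can be sketched as a lookup function. The `ListedFormat` enum and its methods are hypothetical and only mirror the lists on this page; they are not the crate's `DataFormat` API. Case-insensitive matching of extensions is an assumption of this sketch.

```rust
use std::path::Path;

/// Hypothetical mirror of the formats listed above (not the crate's `DataFormat`).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ListedFormat {
    Csv,
    Tsv,
    Parquet,
    Json,
    JsonLines,
    Arrow,
    Avro,
    Excel,
    Orc,
}

impl ListedFormat {
    /// Map a file extension to a format.
    /// Assumption: extensions are matched case-insensitively.
    pub fn from_path(path: &Path) -> Option<Self> {
        let ext = path.extension()?.to_str()?.to_ascii_lowercase();
        match ext.as_str() {
            "csv" => Some(Self::Csv),
            "tsv" => Some(Self::Tsv),
            "parquet" => Some(Self::Parquet),
            "json" => Some(Self::Json),
            "jsonl" | "ndjson" => Some(Self::JsonLines),
            "arrow" => Some(Self::Arrow),
            "avro" => Some(Self::Avro),
            "xlsx" => Some(Self::Excel),
            "orc" => Some(Self::Orc),
            _ => None,
        }
    }

    /// Excel and ORC appear only in the output list, so they are write-only here.
    pub fn can_read(self) -> bool {
        !matches!(self, Self::Excel | Self::Orc)
    }

    /// Every listed format can be written.
    pub fn can_write(self) -> bool {
        true
    }
}
```

Keeping the read/write capability on the enum lets a caller reject an unsupported input path before any file is opened.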

§Architecture

The format system is built around:

  • DataFormat - Enum representing all supported formats
  • DataReader / DataWriter - Traits for reading/writing data
  • Format-specific implementations with consistent option structs
  • Macros to reduce boilerplate for new format implementations
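The trait-plus-option-struct layout described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the crate's implementation: `Table`, `TableReader`, `TableWriter`, `DelimitedOptions`, and `convert` are all hypothetical names, the in-memory `Table` stands in for a Polars DataFrame, and the delimited parser is deliberately naive (no quoting or escaping).

```rust
/// Hypothetical in-memory table; the real crate works with Polars DataFrames.
#[derive(Debug, PartialEq, Clone)]
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Hypothetical reader trait: each format turns its input into a `Table`.
pub trait TableReader {
    fn read(&self, input: &str) -> Result<Table, String>;
}

/// Hypothetical writer trait: each format serializes a `Table`.
pub trait TableWriter {
    fn write(&self, table: &Table) -> String;
}

/// One option struct covers both CSV and TSV in this sketch.
pub struct DelimitedOptions {
    pub delimiter: char,
}

impl TableReader for DelimitedOptions {
    fn read(&self, input: &str) -> Result<Table, String> {
        let mut lines = input.lines();
        // The first line is treated as the header row.
        let header = lines.next().ok_or_else(|| "empty input".to_string())?;
        let columns: Vec<String> = header.split(self.delimiter).map(String::from).collect();
        let mut rows = Vec::new();
        for (i, line) in lines.enumerate() {
            let row: Vec<String> = line.split(self.delimiter).map(String::from).collect();
            // Reject ragged rows instead of silently padding them.
            if row.len() != columns.len() {
                return Err(format!(
                    "row {} has {} fields, expected {}",
                    i + 1,
                    row.len(),
                    columns.len()
                ));
            }
            rows.push(row);
        }
        Ok(Table { columns, rows })
    }
}

impl TableWriter for DelimitedOptions {
    fn write(&self, table: &Table) -> String {
        let sep = self.delimiter.to_string();
        let mut out = table.columns.join(sep.as_str());
        out.push('\n');
        for row in &table.rows {
            out.push_str(&row.join(sep.as_str()));
            out.push('\n');
        }
        out
    }
}

/// Convert between formats using only the shared traits.
pub fn convert(
    reader: &dyn TableReader,
    writer: &dyn TableWriter,
    input: &str,
) -> Result<String, String> {
    Ok(writer.write(&reader.read(input)?))
}
```

Because `convert` depends only on the two traits, adding a format in this sketch means adding one option struct and two trait implementations, with no change to the calling code.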

Re-exports§

pub use error::Error;
pub use error::FormatError;
pub use error::Result;
pub use format::detect_format_from_content;
pub use format::DataFormat;
pub use format::FormatOptions;
pub use reader::FormatReadOptions;
pub use reader::ReadOptions;
pub use writer::AvroCompression;
pub use writer::CompressionLevel;
pub use writer::CsvEncoding;
pub use writer::FormatWriteOptions;
pub use writer::OrcCompression;
pub use writer::WriteOptions;

Modules§

adt
ADT (ASCII Delimited Text) format reading and writing
csv
CSV format reading and writing
error
Error types and result handling
format
File format detection and metadata
json
JSON format reading and writing
parquet
Parquet format reading and writing
reader
Generic data reader interface
writer
Generic data writer interface

Structs§

BuildInfo
Build information structure

Constants§

BUILD_INFO
Build information for dsq-formats
VERSION
Version information