docs.rs failed to build dsq-formats-0.2.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Visit the last successful build:
dsq-formats-0.1.0
dsq-formats
File format support for DSQ - handles reading and writing various data formats.
Overview
dsq-formats reads and writes several structured data formats. It serves as the I/O layer for DSQ, converting between file formats and DSQ's internal data representations.
Features
- Multiple formats: CSV, JSON, JSON Lines, JSON5 (read-only), Parquet, Avro, Arrow IPC
- Format detection: Automatic format detection based on file extension, magic bytes, and content
- Streaming support: Efficient processing of large files
- Schema inference: Automatic schema detection for structured data
- Flexible options: Configurable parsing and writing options
- Error handling: Detailed error messages for format issues
Installation
Add this to your Cargo.toml:
[dependencies]
dsq-formats = "0.1"
Enable specific formats:
[dependencies]
dsq-formats = { version = "0.1", features = ["csv", "json", "parquet"] }
Usage
Reading CSV Files
use read_csv_file;
Writing JSON
use write_json_file;
use *;
Reading Parquet
use read_parquet_file;
Format Detection
use detect_format;
Custom Options
use ;
Supported Formats
CSV (Comma-Separated Values)
- Read: Yes
- Write: Yes
- Features: Custom delimiters, headers, quotes, null values
- Streaming: Yes
JSON
- Read: Yes (standard JSON and JSON Lines)
- Write: Yes
- Features: Pretty printing, compact format
- Streaming: Yes (JSON Lines)
JSON5
- Read: Yes
- Write: No
- Features: Comments, trailing commas, unquoted keys
- Streaming: No
Parquet
- Read: Yes
- Write: Yes
- Features: Compression, column pruning, predicate pushdown
- Streaming: Yes (with chunking)
Avro
- Read: Yes
- Write: Yes
- Features: Schema evolution, compression
- Streaming: Yes
Arrow IPC
- Read: Yes
- Write: Yes
- Features: Zero-copy reads, compression
- Streaming: Yes
Format Detection
The library can automatically detect file formats based on:
- File extension
- Magic bytes (file signature)
- Content analysis
use detect_format;
let format = detect_format?;
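The three detection signals listed above can be combined in a few lines. The following is a minimal sketch using only the standard library, not the crate's actual implementation: the `Format` enum, the `sniff_format` function, and the extension table are hypothetical names chosen for illustration, and the real `detect_format` signature is not shown in this README. The magic bytes are the published signatures of Parquet (`PAR1`), the Arrow IPC file format (`ARROW1`), and Avro object container files (`Obj` followed by byte 1).

```rust
use std::path::Path;

/// Formats named in this README. This enum is a hypothetical stand-in,
/// not the type exported by dsq-formats.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Csv,
    Json,
    JsonLines,
    Json5,
    Parquet,
    Avro,
    ArrowIpc,
}

/// Guess a format from the first bytes of a file (`head`) and its path.
/// Returns `None` when no signal matches.
pub fn sniff_format(path: &Path, head: &[u8]) -> Option<Format> {
    // 1. Magic bytes: binary formats start with a fixed signature.
    if head.starts_with(b"PAR1") {
        return Some(Format::Parquet);
    }
    if head.starts_with(b"ARROW1") {
        return Some(Format::ArrowIpc);
    }
    if head.starts_with(b"Obj\x01") {
        return Some(Format::Avro);
    }

    // 2. File extension, compared case-insensitively.
    //    The extension list is an assumption made for this sketch.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("csv") => return Some(Format::Csv),
        Some("json") => return Some(Format::Json),
        Some("jsonl") | Some("ndjson") => return Some(Format::JsonLines),
        Some("json5") => return Some(Format::Json5),
        Some("parquet") => return Some(Format::Parquet),
        Some("avro") => return Some(Format::Avro),
        Some("arrow") | Some("ipc") => return Some(Format::ArrowIpc),
        _ => {}
    }

    // 3. Content analysis: a crude look at the first non-whitespace byte.
    let first = head.iter().copied().find(|b| !b.is_ascii_whitespace())?;
    match first {
        b'{' | b'[' => Some(Format::Json),
        _ if head.contains(&b',') => Some(Format::Csv),
        _ => None,
    }
}
```

This sketch checks magic bytes before the extension because a signature is harder to get wrong than a file name. Whether dsq-formats uses the same priority is not stated here.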
Configuration Options
Each format supports various configuration options:
CSV Options
- delimiter: Field separator character
- has_header: Whether first row contains headers
- quote_char: Character for quoting fields
- null_values: List of strings to interpret as NULL
- skip_rows: Number of rows to skip
- encoding: Character encoding
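To make the effect of these options concrete, here is a small sketch of how `delimiter`, `quote_char`, and `null_values` might interact when splitting a single line. The `CsvOptions` struct, its field types, its default values, and the `split_line` helper are all assumptions made for illustration; the crate's real options type may differ, and `encoding` is left out to keep the example short.

```rust
/// Hypothetical options struct mirroring the option names listed above.
/// Field types and defaults are assumptions, not the crate's definitions.
#[derive(Debug, Clone)]
pub struct CsvOptions {
    pub delimiter: char,
    pub has_header: bool,
    pub quote_char: char,
    pub null_values: Vec<String>,
    pub skip_rows: usize,
}

impl Default for CsvOptions {
    fn default() -> Self {
        Self {
            delimiter: ',',
            has_header: true,
            quote_char: '"',
            null_values: vec![String::new()],
            skip_rows: 0,
        }
    }
}

/// Split one line into fields. `None` marks an unquoted field that matched
/// one of `null_values`. Handles quoted fields and doubled quotes, but not
/// fields that span several lines.
pub fn split_line(line: &str, opts: &CsvOptions) -> Vec<Option<String>> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut was_quoted = false;
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        if in_quotes {
            if c == opts.quote_char {
                if chars.peek() == Some(&opts.quote_char) {
                    // A doubled quote inside quotes is a literal quote.
                    current.push(c);
                    chars.next();
                } else {
                    in_quotes = false;
                }
            } else {
                current.push(c);
            }
        } else if c == opts.quote_char {
            in_quotes = true;
            was_quoted = true;
        } else if c == opts.delimiter {
            fields.push(finish(&mut current, &mut was_quoted, opts));
        } else {
            current.push(c);
        }
    }
    fields.push(finish(&mut current, &mut was_quoted, opts));
    fields
}

/// Take the accumulated field text and apply the null check.
fn finish(current: &mut String, was_quoted: &mut bool, opts: &CsvOptions) -> Option<String> {
    let value = std::mem::take(current);
    let quoted = std::mem::replace(was_quoted, false);
    // Quoted fields are kept as text even if they match a null marker.
    if !quoted && opts.null_values.iter().any(|n| n == &value) {
        None
    } else {
        Some(value)
    }
}
```

With `delimiter: ';'` and `null_values: ["NA"]`, the line `a;"b;c";NA` splits into the fields `a`, `b;c`, and a null.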
JSON Options
- pretty: Pretty-print output
- indent: Indentation level
- null_handling: How to handle null values
Parquet Options
- compression: Compression algorithm (snappy, gzip, lz4, zstd)
- row_group_size: Rows per row group
- statistics: Whether to compute column statistics
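As an illustration of how these three options could be modeled, the sketch below parses the compression names listed above into an enum and shows how `row_group_size` determines the number of row groups in a file. `Compression`, `ParquetOptions`, and `row_group_count` are hypothetical names; the crate's own types and defaults are not documented in this README.

```rust
use std::str::FromStr;

/// The four compression algorithms named above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Snappy,
    Gzip,
    Lz4,
    Zstd,
}

impl FromStr for Compression {
    type Err = String;

    /// Parse a compression name, ignoring ASCII case.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "snappy" => Ok(Self::Snappy),
            "gzip" => Ok(Self::Gzip),
            "lz4" => Ok(Self::Lz4),
            "zstd" => Ok(Self::Zstd),
            other => Err(format!("unknown compression: {other}")),
        }
    }
}

/// Hypothetical options struct; field types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParquetOptions {
    pub compression: Compression,
    pub row_group_size: usize,
    pub statistics: bool,
}

impl ParquetOptions {
    /// Number of row groups needed to hold `rows` rows (ceiling division).
    /// Returns 0 if `row_group_size` is 0, which this sketch treats as unset.
    pub fn row_group_count(&self, rows: usize) -> usize {
        if self.row_group_size == 0 {
            return 0;
        }
        (rows + self.row_group_size - 1) / self.row_group_size
    }
}
```

Smaller row groups produce more groups per file, which can help readers skip data but adds per-group overhead; the right value depends on the workload.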
API Documentation
For detailed API documentation, see docs.rs/dsq-formats.
Performance
Format readers and writers are optimized for:
- Large file handling with streaming
- Memory-efficient processing
- Parallel parsing where applicable
- Zero-copy operations for compatible formats
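To show what streaming means in practice for a line-oriented format such as JSON Lines, here is a minimal standard-library sketch. It hands each record to a callback while reusing one buffer, so memory use is bounded by the longest line rather than by the file size. The function name `for_each_line_record` is hypothetical, and the sketch does not parse the JSON itself; it only illustrates the pattern and is not how dsq-formats is necessarily implemented.

```rust
use std::io::{self, BufRead};

/// Call `on_record` for every non-empty line and return how many records
/// were seen. One `String` buffer is reused across all lines.
pub fn for_each_line_record<R: BufRead>(
    mut reader: R,
    mut on_record: impl FnMut(&str),
) -> io::Result<usize> {
    let mut buf = String::new();
    let mut count = 0;
    loop {
        buf.clear();
        // `read_line` returns Ok(0) at end of input.
        if reader.read_line(&mut buf)? == 0 {
            break;
        }
        // Strip the line terminator (handles both "\n" and "\r\n").
        let record = buf.trim_end_matches(&['\r', '\n'][..]);
        if record.trim().is_empty() {
            continue; // skip blank lines between records
        }
        on_record(record);
        count += 1;
    }
    Ok(count)
}
```

Passing a `std::io::BufReader` wrapped around a `std::fs::File` as `reader` processes a file of any size with the same small memory footprint.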
Contributing
Contributions are welcome! To add support for new formats:
- Create a new module for the format
- Implement read/write functions
- Add format detection logic
- Include tests with sample data
- Update documentation
See CONTRIBUTING.md for more details.
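For orientation, the steps above might look roughly like this for a toy tab-separated format. Everything in the sketch is hypothetical: the `Table` alias stands in for DSQ's internal representation (which this README does not describe), and the function names and signatures are not taken from the crate. Follow CONTRIBUTING.md and the existing format modules for the real conventions.

```rust
use std::io::{self, BufRead, Write};

/// Rows of string cells; a stand-in for DSQ's internal data representation.
pub type Table = Vec<Vec<String>>;

/// Read function: parse tab-separated rows from any buffered reader.
pub fn read_tsv<R: BufRead>(reader: R) -> io::Result<Table> {
    reader
        .lines()
        .map(|line| line.map(|l| l.split('\t').map(str::to_owned).collect::<Vec<String>>()))
        .collect()
}

/// Write function: emit one tab-separated line per row.
pub fn write_tsv<W: Write>(mut writer: W, table: &Table) -> io::Result<()> {
    for row in table {
        writeln!(writer, "{}", row.join("\t"))?;
    }
    Ok(())
}

/// Detection logic: treat input as TSV if its first line contains a tab.
pub fn looks_like_tsv(head: &str) -> bool {
    head.lines().next().map_or(false, |l| l.contains('\t'))
}
```

A round-trip test (write a small table, read it back, compare) is a simple way to cover the "tests with sample data" step.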
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.