rust-data-processing 0.3.6

Schema-first ingestion (CSV, JSON, Parquet, Excel) into an in-memory DataSet, plus Polars-backed pipelines, SQL, profiling, validation, and map/reduce-style processing.

rust-data-processing

Phase 3 scope: Rust core with Python (PyO3) and Java (Panama) bindings, agent-ready JSON FFI, and shared batch/streaming connectors

Rust library: schema-first ingestion (CSV, JSON, Parquet, Excel with Cargo features) into an in-memory DataSet. Beyond ingestion it offers Polars-backed pipelines, optional SQL, profiling, validation, and map/reduce-style processing, along with Phase 2 export (JSONL, train/test splits), UTF-8 privacy transforms and summaries, median aggregations, Arrow interop, and incremental ingest helpers. Phase 3 adds JVM bindings (Panama / rdp_jvm_sys, Maven + Gradle) with JSON parity FFI for agents and orchestration.

Infographic: Phase 3 — one Rust engine; Python (PyO3 / PyPI) and Java (Panama / Maven + Gradle) bindings; Phase 1–2 ingest → DataSet → pipelines, SQL, profile, validate; agent-ready JSON in/out; shared connectors (Postgres, S3, Kafka, Snowflake).

Limits (masking / “PII”): UTF-8 transforms and validation checks are mechanical helpers only; callers supply policy and must not treat outputs as legal guarantees. See Planning/P2_E6_PRIVACY_POLICY.md in the repository.
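To make "mechanical helper" concrete, here is a minimal sketch of a character-aware mask written against the standard library only. The function name mask_keep_last and its parameters are hypothetical and are not part of this crate's API; the point is that the helper only rewrites characters, while the caller decides what to mask and how much to keep.

```rust
/// Hypothetical helper for illustration; not this crate's API.
/// Replaces every character except the last `keep_last` with `mask`.
/// It counts Unicode scalar values (`char`s) rather than bytes, so
/// multi-byte UTF-8 text is never split in the middle of a character.
fn mask_keep_last(input: &str, keep_last: usize, mask: char) -> String {
    let total = input.chars().count();
    // Number of leading characters to hide; zero if the input is short.
    let masked = total.saturating_sub(keep_last);
    input
        .chars()
        .enumerate()
        .map(|(i, c)| if i < masked { mask } else { c })
        .collect()
}
```

A call such as mask_keep_last(value, 4, '*') hides everything but the last four characters. Whether four visible characters is acceptable for a given field is a policy decision the function cannot make.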

This file is the crate README shown on crates.io and at the top of docs.rs (Rust-only). The repository’s README.md is the full monorepo overview (including Python).

Documentation

  • Rust API (module tree): use the crate index on this docs.rs page (left sidebar).
  • Repository: github.com/scorpio-datalake/rust-data-processing
  • Markdown API overview: API.md (shipped in this crate)
  • Rust examples & cookbook: docs/rust/README.md
  • Python package (PyPI): pypi.org/project/rust-data-processing
  • JVM bindings (Maven Central): docs/java/README.md (Panama / rdp_jvm_sys)
  • Python runnable examples (HTML): GitHub Pages (examples)
  • HTML site (Rust + Python pages): GitHub Pages (home). Rust (rustdoc): crate index on Pages (or docs.rs); Python (pdoc): module root. See Setup if the site is empty.

Quick start (Rust)

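use rust_data_processing::ingestion::{ingest_from_path, IngestionOptions};
use rust_data_processing::types::{DataType, Field, Schema};

fn main() {
    // Declare the expected columns and their types up front.
    let schema = Schema::new(vec![
        Field::new("id", DataType::Int64),
        Field::new("name", DataType::Utf8),
    ]);

    // Ingest the file against that schema with default options.
    // `expect` panics on failure; handle the Result explicitly in real code.
    let _ds = ingest_from_path("path/to/data.csv", &schema, &IngestionOptions::default())
        .expect("ingest");
}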

More patterns: docs/rust/README.md.
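For readers new to the term, "schema-first" means the column names and types are declared before any data is read, and input that does not match is rejected instead of being guessed at. The sketch below illustrates that idea with the standard library only. Everything in it (ColType, Value, ParseError, parse_with_schema) is hypothetical and is not how this crate is implemented; the crate's actual error types and mismatch behaviour are not described in this README.

```rust
/// Column types for this sketch only (not the crate's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Int64,
    Utf8,
}

/// A typed cell produced by parsing.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int64(i64),
    Utf8(String),
}

/// Ways the input can disagree with the declared schema.
#[derive(Debug, PartialEq)]
enum ParseError {
    HeaderMismatch { expected: Vec<String>, found: Vec<String> },
    WrongColumnCount { line: usize, expected: usize, found: usize },
    BadInt { line: usize, column: String, raw: String },
}

/// Parse comma-separated text against a schema declared up front.
/// Splitting on ',' is naive (no quoting or escaping); enough for a sketch.
fn parse_with_schema(
    text: &str,
    schema: &[(&str, ColType)],
) -> Result<Vec<Vec<Value>>, ParseError> {
    let mut lines = text.lines().enumerate();

    // The header must list exactly the declared column names, in order.
    let header: Vec<String> = match lines.next() {
        Some((_, h)) => h.split(',').map(|s| s.trim().to_string()).collect(),
        None => Vec::new(),
    };
    let expected: Vec<String> = schema.iter().map(|(n, _)| n.to_string()).collect();
    if header != expected {
        return Err(ParseError::HeaderMismatch { expected, found: header });
    }

    let mut rows = Vec::new();
    for (idx, line) in lines {
        if line.trim().is_empty() {
            continue; // skip blank lines
        }
        let line_no = idx + 1; // 1-based line number for error messages
        let cells: Vec<&str> = line.split(',').map(str::trim).collect();
        if cells.len() != schema.len() {
            return Err(ParseError::WrongColumnCount {
                line: line_no,
                expected: schema.len(),
                found: cells.len(),
            });
        }
        // Convert each cell according to the declared type of its column.
        let mut row = Vec::with_capacity(cells.len());
        for (cell, (name, ty)) in cells.iter().zip(schema) {
            let value = match ty {
                ColType::Int64 => {
                    let n = cell.parse::<i64>().map_err(|_| ParseError::BadInt {
                        line: line_no,
                        column: name.to_string(),
                        raw: cell.to_string(),
                    })?;
                    Value::Int64(n)
                }
                ColType::Utf8 => Value::Utf8(cell.to_string()),
            };
            row.push(value);
        }
        rows.push(row);
    }
    Ok(rows)
}
```

With a schema of [("id", ColType::Int64), ("name", ColType::Utf8)], the text "id,name\n1,Ada\n" parses to one typed row, while "id,name\nx,Ada\n" fails with BadInt on line 2 because "x" does not fit the declared Int64 column.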

Features (Cargo)

  • default: includes sql (Polars-backed SQL via polars-sql).
  • excel: Excel workbook ingestion (calamine).
  • sql: Polars SQL (on by default; use default-features = false to drop).
  • db_connectorx: optional DB → Arrow → DataSet.
  • arrow / serde_arrow: Arrow interop helpers.

Full list: Cargo.toml [features].

License

MIT OR Apache-2.0 - see LICENSE-MIT and LICENSE-APACHE.