alimentar has moved to aprender-data.
This crate re-exports aprender-data for backward compatibility.
New code should depend on aprender-data directly.
Modules§
- async_prefetch
- Async prefetch for parallel I/O in streaming datasets.
- backend
- Storage backends for alimentar.
- cli
- CLI module for the alimentar command-line interface: data loading, distribution, and tooling.
- dataloader
- DataLoader for batched iteration over datasets.
- dataset
- Dataset types for alimentar.
- datasets
- Canonical ML dataset loaders
- drift
- Data drift detection for ML pipelines
- error
- Error types for alimentar.
- federated
- Federated Split Coordination for Privacy-Preserving ML
- format
- Alimentar Dataset Format (.ald)
- imbalance
- Imbalanced dataset detection for ML pipelines
- mmap
- Memory-mapped dataset for efficient large file access.
- parallel
- Parallel data loading with multi-worker support.
- quality
- Data quality assessment for ML pipelines
- registry
- Dataset registry for sharing and discovery.
- repl
- Interactive REPL for alimentar (ALIM-SPEC-006)
- serve
- WASM Serve Module - Browser-based data serving and sharing
- sketch
- Sketch-based statistics for distributed/federated drift detection
- split
- Dataset splitting utilities
- streaming
- Streaming dataset for lazy/chunked data loading.
- tensor
- Tensor conversion utilities for ML framework integration.
- transform
- Data transforms for alimentar.
- tui
- TUI dataset viewer module.
- weighted
- Weighted DataLoader for importance sampling.
Structs§
- ArrowDataset
- An in-memory dataset backed by Arrow RecordBatches.
- AsyncPrefetchBuilder
- Builder for creating async prefetch datasets.
- AsyncPrefetchDataset
- A streaming dataset with async prefetch for parallel I/O.
- Cast
- A transform that casts columns to different data types.
- Centroid
- A centroid in a T-Digest (mean and weight)
- Chain
- A chain of transforms applied in sequence.
- ClassDistribution
- Distribution of classes in a dataset
- ColumnDrift
- Per-column drift result
- ColumnQuality
- Quality statistics for a single column
- CsvOptions
- Options for CSV parsing.
- DDSketch
- DDSketch for distribution estimation
- DataLoader
- A data loader that provides batched iteration over a dataset.
- DataSketch
- Serializable data sketch containing distribution summaries
- DatasetSplit
- Dataset split with optional validation set
- DatasetViewer
- A scrollable table view for displaying Arrow datasets
- DistributedDriftDetector
- Distributed drift detector using sketches
- DriftDetector
- Statistical drift detector
- DriftReport
- Overall drift detection report
- Drop
- A transform that drops (removes) specified columns from a RecordBatch.
- FederatedSplitCoordinator
- Federated split coordination (no raw data leaves nodes)
- FillNull
- A transform that fills null values in specified columns.
- Filter
- A transform that filters rows based on a predicate.
- Fim
- Fill-in-the-Middle transform for code training data.
- FimTokens
- Configuration for FIM sentinel tokens.
- GlobalSplitReport
- Report on global split quality across all nodes
- ImbalanceDetector
- Detector for class imbalance in datasets
- ImbalanceMetrics
- Metrics for measuring class imbalance
- ImbalanceReport
- Report from imbalance analysis
- JsonOptions
- Options for JSON/JSONL parsing.
- Map
- A transform that applies a function to each RecordBatch.
- MmapDataset
- A memory-mapped dataset backed by a Parquet file.
- MmapDatasetBuilder
- Builder for configuring MmapDataset options.
- NodeSplitInstruction
- Instructions for a node to execute its split
- NodeSplitManifest
- Per-node split manifest (shared with coordinator, no raw data)
- NodeSummary
- Summary for a single node
- Normalize
- A transform that normalizes numeric columns.
- ParallelDataLoader
- Parallel data loader with multi-worker support.
- ParallelDataLoaderBuilder
- Builder for parallel data loader configuration.
- QualityChecker
- Data quality checker
- QualityProfile
- Quality profile for customizing scoring rules per data type.
- QualityReport
- Overall data quality report
- RecordBatch
- A two-dimensional batch of column-oriented data with a defined schema.
- Rename
- A transform that renames columns in a RecordBatch.
- RowDetailView
- Row detail view widget for displaying a single record
- Sample
- A transform that randomly samples rows from a RecordBatch.
- Schema
- Describes the meta-data of an ordered sequence of relative types.
- SchemaInspector
- Schema inspector widget for displaying dataset schema
- Select
- A transform that selects specific columns from a RecordBatch.
- Shuffle
- A transform that shuffles rows in a RecordBatch.
- SketchDriftResult
- Result of distributed drift comparison
- Skip
- A transform that skips the first N rows from a RecordBatch.
- Sort
- A transform that sorts rows by one or more columns.
- SyncPrefetchDataset
- Synchronous wrapper for async prefetch that works with DataLoader.
- TDigest
- T-Digest for streaming quantile estimation
- Take
- A transform that takes the first N rows from a RecordBatch.
- TextColumnStats
- Statistics for a text (string) column, useful for ML classification audits.
- Unique
- A transform that removes duplicate rows based on specified columns.
- WeightedDataLoader
- A data loader that samples with per-sample weights.
Enums§
- DatasetAdapter
- Adapter providing uniform access to Arrow datasets for TUI rendering
- DriftSeverity
- Severity of detected drift
- DriftTest
- Statistical tests for drift detection
- Error
- Errors that can occur in alimentar operations.
- FederatedSplitStrategy
- Strategy for federated/distributed splitting
- FillStrategy
- Strategy for filling null values.
- FimFormat
- FIM format variant.
- ImbalanceRecommendation
- Recommendation for handling imbalanced data
- ImbalanceSeverity
- Severity of class imbalance
- NormMethod
- Normalization method for numeric columns.
- QualityIssue
- Types of data quality issues
- ResampleStrategy
- Strategy for resampling an imbalanced dataset.
- SketchType
- Type of sketch algorithm
- SortOrder
- Sort order for the Sort transform.
- SplitQualityIssue
- Quality issues that can be detected in federated splits
- TuiError
- TUI-specific error type
Traits§
- Dataset
- A dataset that can be iterated over.
- Transform
- A transform that can be applied to RecordBatches.
Functions§
- resample
- Resample a classification dataset to address class imbalance.
- sqrt_inverse_weights
- Compute sqrt-inverse class weights for weighted loss.
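As an illustration of what sqrt-inverse class weighting means, here is a minimal standalone sketch. It assumes one common formulation, weight = sqrt(total_samples / class_count), so rarer classes get larger weights but less aggressively than plain inverse-frequency weighting. The helper name `sqrt_inverse_weights_sketch`, its signature, and the absence of any normalization are assumptions made for this example; the signature and exact formula of the crate's own `sqrt_inverse_weights` are not shown on this page and may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not the crate's API): maps each class label to
/// sqrt(total_samples / class_count).
fn sqrt_inverse_weights_sketch(labels: &[u32]) -> BTreeMap<u32, f64> {
    // Count how many samples belong to each class.
    let mut counts: BTreeMap<u32, usize> = BTreeMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }

    // Convert counts into sqrt-inverse-frequency weights.
    let total = labels.len() as f64;
    counts
        .into_iter()
        .map(|(label, count)| (label, (total / count as f64).sqrt()))
        .collect()
}

fn main() {
    // Eight samples of class 0 and two samples of class 1.
    let labels: [u32; 10] = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1];
    let weights = sqrt_inverse_weights_sketch(&labels);
    for (label, weight) in &weights {
        println!("class {label}: weight {weight:.3}");
    }
}
```

With this formulation the minority class always receives the larger weight, and an empty label slice yields an empty map rather than a division by zero.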