Crate alimentar

alimentar has moved to aprender-data.

This crate re-exports aprender-data for backward compatibility. New code should depend on aprender-data directly.

Modules

async_prefetch
Async prefetch for parallel I/O in streaming datasets.
backend
Storage backends for alimentar.
cli
Command-line interface for alimentar: data loading, distribution, and tooling.
dataloader
DataLoader for batched iteration over datasets.
dataset
Dataset types for alimentar.
datasets
Canonical ML dataset loaders
drift
Data drift detection for ML pipelines
error
Error types for alimentar.
federated
Federated Split Coordination for Privacy-Preserving ML
format
Alimentar Dataset Format (.ald)
imbalance
Imbalanced dataset detection for ML pipelines
mmap
Memory-mapped dataset for efficient large file access.
parallel
Parallel data loading with multi-worker support.
quality
Data quality assessment for ML pipelines
registry
Dataset registry for sharing and discovery.
repl
Interactive REPL for alimentar (ALIM-SPEC-006)
serve
WASM Serve Module - Browser-based data serving and sharing
sketch
Sketch-based statistics for distributed/federated drift detection
split
Dataset splitting utilities
streaming
Streaming dataset for lazy/chunked data loading.
tensor
Tensor conversion utilities for ML framework integration.
transform
Data transforms for alimentar.
tui
TUI dataset viewer module.
weighted
Weighted DataLoader for importance sampling.

Structs

ArrowDataset
An in-memory dataset backed by Arrow RecordBatches.
AsyncPrefetchBuilder
Builder for creating async prefetch datasets.
AsyncPrefetchDataset
A streaming dataset with async prefetch for parallel I/O.
Cast
A transform that casts columns to different data types.
Centroid
A centroid in a T-Digest (mean and weight)
Chain
A chain of transforms applied in sequence.
ClassDistribution
Distribution of classes in a dataset
ColumnDrift
Per-column drift result
ColumnQuality
Quality statistics for a single column
CsvOptions
Options for CSV parsing.
DDSketch
DDSketch for distribution estimation
DataLoader
A data loader that provides batched iteration over a dataset.
DataSketch
Serializable data sketch containing distribution summaries
DatasetSplit
Dataset split with optional validation set
DatasetViewer
A scrollable table view for displaying Arrow datasets
DistributedDriftDetector
Distributed drift detector using sketches
DriftDetector
Statistical drift detector
DriftReport
Overall drift detection report
Drop
A transform that drops (removes) specified columns from a RecordBatch.
FederatedSplitCoordinator
Federated split coordination (no raw data leaves nodes)
FillNull
A transform that fills null values in specified columns.
Filter
A transform that filters rows based on a predicate.
Fim
Fill-in-the-Middle transform for code training data.
FimTokens
Configuration for FIM sentinel tokens.
GlobalSplitReport
Report on global split quality across all nodes
ImbalanceDetector
Detector for class imbalance in datasets
ImbalanceMetrics
Metrics for measuring class imbalance
ImbalanceReport
Report from imbalance analysis
JsonOptions
Options for JSON/JSONL parsing.
Map
A transform that applies a function to each RecordBatch.
MmapDataset
A memory-mapped dataset backed by a Parquet file.
MmapDatasetBuilder
Builder for configuring MmapDataset options.
NodeSplitInstruction
Instructions for a node to execute its split
NodeSplitManifest
Per-node split manifest (shared with coordinator, no raw data)
NodeSummary
Summary for a single node
Normalize
A transform that normalizes numeric columns.
ParallelDataLoader
Parallel data loader with multi-worker support.
ParallelDataLoaderBuilder
Builder for parallel data loader configuration.
QualityChecker
Data quality checker
QualityProfile
Quality profile for customizing scoring rules per data type.
QualityReport
Overall data quality report
RecordBatch
A two-dimensional batch of column-oriented data with a defined schema.
Rename
A transform that renames columns in a RecordBatch.
RowDetailView
Row detail view widget for displaying a single record
Sample
A transform that randomly samples rows from a RecordBatch.
Schema
Describes the meta-data of an ordered sequence of relative types.
SchemaInspector
Schema inspector widget for displaying dataset schema
Select
A transform that selects specific columns from a RecordBatch.
Shuffle
A transform that shuffles rows in a RecordBatch.
SketchDriftResult
Result of distributed drift comparison
Skip
A transform that skips the first N rows from a RecordBatch.
Sort
A transform that sorts rows by one or more columns.
SyncPrefetchDataset
Synchronous wrapper for async prefetch that works with DataLoader.
TDigest
T-Digest for streaming quantile estimation
Take
A transform that takes the first N rows from a RecordBatch.
TextColumnStats
Statistics for a text (string) column — useful for ML classification audits.
Unique
A transform that removes duplicate rows based on specified columns.
WeightedDataLoader
A data loader that samples with per-sample weights.

Enums

DatasetAdapter
Adapter providing uniform access to Arrow datasets for TUI rendering
DriftSeverity
Severity of detected drift
DriftTest
Statistical tests for drift detection
Error
Errors that can occur in alimentar operations.
FederatedSplitStrategy
Strategy for federated/distributed splitting
FillStrategy
Strategy for filling null values.
FimFormat
FIM format variant.
ImbalanceRecommendation
Recommendation for handling imbalanced data
ImbalanceSeverity
Severity of class imbalance
NormMethod
Normalization method for numeric columns.
QualityIssue
Types of data quality issues
ResampleStrategy
Strategy for resampling an imbalanced dataset.
SketchType
Type of sketch algorithm
SortOrder
Sort order for the Sort transform.
SplitQualityIssue
Quality issues that can be detected in federated splits
TuiError
TUI-specific error type

Traits

Dataset
A dataset that can be iterated over.
Transform
A transform that can be applied to RecordBatches.

Functions

resample
Resample a classification dataset to address class imbalance.
sqrt_inverse_weights
Compute sqrt-inverse class weights for weighted loss.
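
The idea behind sqrt-inverse class weighting can be sketched in plain Rust without the crate. This is an illustration only: the helper name `sqrt_inverse_weights_sketch`, its signature, and the rescaling so that the weights average to 1.0 are assumptions, not the actual behavior of the crate's `sqrt_inverse_weights`.

```rust
/// Hypothetical helper (not the crate's implementation): maps per-class
/// sample counts to weights proportional to 1 / sqrt(count).
fn sqrt_inverse_weights_sketch(counts: &[usize]) -> Vec<f64> {
    // Raw weight is 1 / sqrt(count); classes with no samples get weight 0.
    let raw: Vec<f64> = counts
        .iter()
        .map(|&n| if n == 0 { 0.0 } else { 1.0 / (n as f64).sqrt() })
        .collect();

    let sum: f64 = raw.iter().sum();
    if sum == 0.0 {
        // No samples at all: nothing to rescale.
        return raw;
    }

    // Assumed normalization: rescale so the weights average to 1.0.
    let scale = counts.len() as f64 / sum;
    raw.into_iter().map(|w| w * scale).collect()
}

fn main() {
    // A 9:1 imbalance yields a 1:3 weight ratio, gentler than the 1:9
    // ratio that plain inverse-frequency weighting would give.
    let weights = sqrt_inverse_weights_sketch(&[900, 100]);
    println!("{weights:?}");
}
```

Taking the square root dampens the correction, so rare classes are up-weighted without letting a handful of samples dominate the loss.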

Type Aliases

Result
Result type alias for alimentar operations.
SchemaRef
A reference-counted reference to a Schema.
TuiResult
Result type for TUI operations