Streaming Ingester for Memory-Efficient File Encoding
This module provides a streaming API for ingesting large files without loading them entirely into memory. It processes data in chunks, encoding each chunk as it arrives and bundling the results progressively into the root engram.
§Why Streaming?
For large files (>100MB), loading everything into memory before encoding:
- Causes memory pressure and potential OOM for multi-GB files
- Increases latency (must read entire file before any encoding starts)
- Wastes resources (chunks are independent, can encode in parallel)
Streaming ingestion solves these by:
- Processing fixed-size chunks as they arrive
- Encoding each chunk immediately
- Maintaining bounded memory usage (~chunk_size * pipeline_depth)
- Enabling early error detection
§Architecture
┌─────────────┐     ┌───────────────────┐     ┌──────────────────┐
│ Data Source │────▶│ StreamingIngester │────▶│ VersionedEmbrFS  │
│  (Read/io)  │     │ (chunk + encode)  │     │ (store + bundle) │
└─────────────┘     └───────────────────┘     └──────────────────┘
                              │
                              ├── Chunk Buffer (4KB default)
                              ├── Correction Accumulator
                              └── Progressive Hash

§Example
use embeddenator_fs::streaming::StreamingIngester;
use std::fs::File;
use std::io::BufReader;
let file = File::open("large_file.bin")?;
let reader = BufReader::new(file);
let mut ingester = StreamingIngester::builder(&fs)
    .with_chunk_size(8192)
    .with_path("large_file.bin")
    .build()?; // complete the builder before ingesting
ingester.ingest_reader(reader)?;
let result = ingester.finalize()?;

§Structs
- StreamingDecodeProgress - Progress information for streaming decode
- StreamingDecoder - Streaming decoder for memory-efficient file reading
- StreamingDecoderBuilder - Builder for configuring streaming decode with options
- StreamingIngester - Streaming ingester for memory-efficient file encoding
- StreamingIngesterBuilder - Builder for configuring streaming ingestion
- StreamingProgress - Progress information for streaming ingestion
- StreamingResult - Result of a completed streaming ingestion