Module streaming

Streaming Ingester for Memory-Efficient File Encoding

This module provides a streaming API for ingesting large files without loading them entirely into memory. It processes data in chunks, encoding each chunk as it arrives and progressively bundling the results into the root engram.

§Why Streaming?

For large files (>100MB), loading everything into memory before encoding:

  • Causes memory pressure and potential OOM for multi-GB files
  • Increases latency (must read entire file before any encoding starts)
  • Wastes resources (chunks are independent and could be encoded in parallel)

Streaming ingestion addresses these problems by:

  • Processing fixed-size chunks as they arrive
  • Encoding each chunk immediately
  • Maintaining bounded memory usage (~chunk_size * pipeline_depth)
  • Enabling early error detection
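
The chunking idea above can be sketched with the standard library alone. This is an illustration, not this module's implementation: `for_each_chunk` and `memory_bound` are hypothetical helpers, and the pipeline depth of 4 is an assumed value chosen only to show the arithmetic behind the `chunk_size * pipeline_depth` bound.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` in fixed-size chunks and passes each
/// chunk (with its index) to `on_chunk`. A single buffer is reused, so memory
/// use does not grow with the size of the input.
fn for_each_chunk<R, F>(mut reader: R, chunk_size: usize, mut on_chunk: F) -> io::Result<usize>
where
    R: Read,
    F: FnMut(usize, &[u8]) -> io::Result<()>,
{
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut index = 0;
    loop {
        // A single read() may return fewer bytes than requested, so keep
        // reading until the buffer is full or the source is exhausted.
        let mut filled = 0;
        while filled < chunk_size {
            match reader.read(&mut buf[filled..]) {
                Ok(0) => break, // end of input
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e), // surfaces errors as soon as they occur
            }
        }
        if filled == 0 {
            break;
        }
        on_chunk(index, &buf[..filled])?;
        index += 1;
        if filled < chunk_size {
            break; // short final chunk
        }
    }
    Ok(index)
}

/// Hypothetical helper: rough upper bound on buffered bytes.
fn memory_bound(chunk_size: usize, pipeline_depth: usize) -> usize {
    chunk_size.saturating_mul(pipeline_depth)
}

fn main() -> io::Result<()> {
    let data = vec![7u8; 10_000]; // stands in for a large file
    let mut sizes = Vec::new();
    let chunks = for_each_chunk(&data[..], 4096, |_, chunk| {
        sizes.push(chunk.len());
        Ok(())
    })?;
    println!("{chunks} chunks with sizes {sizes:?}");
    println!("approximate bound: {} bytes", memory_bound(4096, 4));
    Ok(())
}
```

Because the callback returns a `Result`, a failure on any chunk stops the loop immediately, which is the early error detection mentioned in the list.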

§Architecture

┌─────────────┐     ┌───────────────────┐     ┌──────────────────┐
│ Data Source │────▶│ StreamingIngester │────▶│ VersionedEmbrFS  │
│ (Read/io)   │     │ (chunk + encode)  │     │ (store + bundle) │
└─────────────┘     └───────────────────┘     └──────────────────┘
                             │
                             ├── Chunk Buffer (4KB default)
                             ├── Correction Accumulator
                             └── Progressive Hash
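
To make the "Progressive Hash" component concrete, here is a minimal sketch of a hash that is updated one chunk at a time and yields the same digest as hashing the whole input at once. The source does not say which hash algorithm the ingester uses; 64-bit FNV-1a is used here purely as a simple stand-in, and the `ProgressiveHash` type below is hypothetical.

```rust
/// Hypothetical incremental hasher using 64-bit FNV-1a as a stand-in.
/// The algorithm actually used by the ingester is not stated in the docs.
struct ProgressiveHash {
    state: u64,
}

impl ProgressiveHash {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    fn new() -> Self {
        Self { state: Self::OFFSET_BASIS }
    }

    /// Folds one chunk into the running state; chunk boundaries do not
    /// affect the result because bytes are processed strictly in order.
    fn update(&mut self, chunk: &[u8]) {
        for &byte in chunk {
            self.state ^= u64::from(byte);
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

fn main() {
    let data: Vec<u8> = (0..10_000u32).map(|i| (i % 251) as u8).collect();

    // Hash everything in one call.
    let mut whole = ProgressiveHash::new();
    whole.update(&data);

    // Hash the same bytes in 4 KB chunks, as a streaming reader would.
    let mut streamed = ProgressiveHash::new();
    for chunk in data.chunks(4096) {
        streamed.update(chunk);
    }

    assert_eq!(whole.finish(), streamed.finish());
    println!("digest: {:016x}", streamed.finish());
}
```

The point of the design is that the digest is available as soon as the last chunk has been seen, without a second pass over the data.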

§Example

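use embeddenator_fs::streaming::StreamingIngester;
use std::fs::File;
use std::io::BufReader;

// Wrap the file in a buffered reader so it is read incrementally.
let file = File::open("large_file.bin")?;
let reader = BufReader::new(file);

// `fs` is assumed to be an existing VersionedEmbrFS (see the architecture
// diagram); its construction is not shown here.
let mut ingester = StreamingIngester::builder(&fs)
    .with_chunk_size(8192) // overrides the 4KB default chunk buffer
    .with_path("large_file.bin");

// Stream the data through the ingester, then finish the ingestion.
ingester.ingest_reader(reader)?;
let result = ingester.finalize()?;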

§Structs

StreamingDecodeProgress
Progress information for streaming decode
StreamingDecoder
Streaming decoder for memory-efficient file reading
StreamingDecoderBuilder
Builder for configuring streaming decode with options
StreamingIngester
Streaming ingester for memory-efficient file encoding
StreamingIngesterBuilder
Builder for configuring streaming ingestion
StreamingProgress
Progress information for streaming ingestion
StreamingResult
Result of a completed streaming ingestion