Module streaming_reader

Streaming reader for XLSX files with optimized memory usage

This module provides a reader that processes data row-by-row with an iterator interface.

Memory Usage:

  • Shared Strings Table (SST): Loaded fully (~3-5 MB for typical files)
  • Worksheet XML: Loaded fully from ZIP (uncompressed size)
  • Total memory ≈ SST + Uncompressed XML size

Important Notes:

  • XLSX files are compressed. An 86 MB file may contain 1.2 GB of uncompressed XML
  • For small to medium files (< 100 MB): Memory usage is reasonable
  • For large files with huge XML: Memory usage is roughly the uncompressed XML size
  • Still faster than calamine (no style parsing) and uses an optimized SST
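
The memory estimate above (SST plus uncompressed worksheet XML) can be worked through with the figures quoted in the notes. The following is a minimal sketch of that arithmetic only. The function names are hypothetical and not part of this module, and the 5 MB SST size is just the upper end of the typical range given above.

```rust
/// Rough peak-memory estimate for the loading strategy described above:
/// the shared strings table plus the fully decompressed worksheet XML.
/// Both inputs are sizes in bytes. This is an approximation, not a measurement.
/// (Hypothetical helper, not part of the module's API.)
fn estimate_peak_memory_bytes(sst_bytes: u64, uncompressed_xml_bytes: u64) -> u64 {
    sst_bytes.saturating_add(uncompressed_xml_bytes)
}

/// Ratio of uncompressed XML size to the on-disk (compressed) file size.
/// (Hypothetical helper, not part of the module's API.)
fn expansion_ratio(file_bytes: u64, uncompressed_xml_bytes: u64) -> f64 {
    uncompressed_xml_bytes as f64 / file_bytes as f64
}
```

With a 5 MB SST and 1.2 GB of XML (decimal units), `estimate_peak_memory_bytes(5_000_000, 1_200_000_000)` gives 1,205,000,000 bytes, and `expansion_ratio(86_000_000, 1_200_000_000)` is a little under 14. In other words, the on-disk file size is a poor predictor of memory use.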

Trade-offs:

  • Only supports simple XLSX files (no complex formatting)
  • Sequential read only (can’t jump to random rows)
  • Best for: Fast iteration, simple data extraction, no formatting needs
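
To illustrate the sequential-read trade-off, here is a small sketch of a forward-only row source. `SequentialRows` and `seek_forward` are hypothetical names invented for this example, and rows are modeled as `Vec<String>` as an assumption. The module's real iterator types may differ. The point is only that reaching row N means consuming every row before it, and that earlier rows cannot be revisited.

```rust
/// Hypothetical forward-only row source (not the module's actual type).
/// Wraps any iterator that yields rows as vectors of cell strings.
struct SequentialRows<I: Iterator<Item = Vec<String>>> {
    inner: I,
    /// 0-based index of the next row the inner iterator will yield.
    next_index: usize,
}

impl<I: Iterator<Item = Vec<String>>> SequentialRows<I> {
    fn new(inner: I) -> Self {
        Self { inner, next_index: 0 }
    }

    /// Advance to row `target` (0-based) by discarding the rows before it.
    /// Returns `None` if `target` has already been passed or lies past the end.
    fn seek_forward(&mut self, target: usize) -> Option<Vec<String>> {
        if target < self.next_index {
            // A sequential reader cannot rewind to an earlier row.
            return None;
        }
        let skip = target - self.next_index;
        // `nth(skip)` consumes `skip` rows and returns the one after them.
        let row = self.inner.nth(skip)?;
        self.next_index = target + 1;
        Some(row)
    }
}
```

A caller that needs random access would have to collect the rows it cares about into its own buffer while iterating.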

Structs

RowIterator
Iterator over rows in a worksheet. Streams XML data from the ZIP without loading the entire worksheet into memory
RowStructIterator
Iterator wrapper that returns Row structs instead of Vec for backward compatibility with the old calamine-based API
StreamingReader
Streaming reader for XLSX files
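
The wrapper idea behind RowStructIterator can be sketched as a plain iterator adapter. Everything below is an assumption made for illustration: the fields of `Row`, the `RowStructAdapter` name, and the use of `Vec<String>` as the raw row type are hypothetical, since this page does not show the real signatures.

```rust
/// Hypothetical row type. The real `Row` struct's fields are not shown here.
#[derive(Debug, PartialEq)]
struct Row {
    /// 0-based position of the row in iteration order.
    index: usize,
    cells: Vec<String>,
}

/// Hypothetical adapter that turns an iterator of raw cell vectors
/// into an iterator of `Row` structs.
struct RowStructAdapter<I> {
    inner: I,
    next_index: usize,
}

impl<I> RowStructAdapter<I> {
    fn new(inner: I) -> Self {
        Self { inner, next_index: 0 }
    }
}

impl<I: Iterator<Item = Vec<String>>> Iterator for RowStructAdapter<I> {
    type Item = Row;

    fn next(&mut self) -> Option<Row> {
        // Pull one raw row and wrap it; stop when the inner iterator ends.
        let cells = self.inner.next()?;
        let row = Row { index: self.next_index, cells };
        self.next_index += 1;
        Some(row)
    }
}
```

Because the adapter only wraps each item as it passes through, it adds no buffering of its own and keeps the row-by-row behavior of the underlying iterator.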