seqpacker 0.1.2

High-performance sequence packing for LLM training

This crate is the Rust core of SeqPacker. It provides 11 bin-packing algorithms that pack variable-length sequences into fixed-size bins, reducing padding waste from 20-40% to 1-5%.

Python bindings are available via the seqpacker PyPI package.
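
To make the padding-waste figures concrete, here is a minimal standard-library sketch that compares padding every sequence to the longest one against packing the same sequences into fixed-size bins. The helper `padding_waste` and the sample lengths are illustrative assumptions, not part of the crate's API. For these eight lengths (3900 tokens in total), padding to the longest sequence wastes roughly half of all slots, while 4 bins of 1024 slots (the minimum possible, since 3900 tokens cannot fit in 3 such bins) leave just under 5% padding.

```rust
/// Share of token slots that are padding when `rows` rows of `row_len` slots
/// hold sequences with the given lengths. Hypothetical helper for illustration.
fn padding_waste(lengths: &[usize], rows: usize, row_len: usize) -> f64 {
    let used: usize = lengths.iter().sum();
    1.0 - used as f64 / (rows * row_len) as f64
}

fn main() {
    let lengths = [1000, 800, 600, 500, 400, 300, 200, 100];

    // Unpacked: one sequence per row, each row padded to the longest sequence.
    let longest = lengths.iter().copied().max().unwrap_or(0);
    let unpacked = padding_waste(&lengths, lengths.len(), longest);

    // Packed: the same sequences placed in 4 bins of 1024 slots each.
    let packed = padding_waste(&lengths, 4, 1024);

    println!("unpacked: {:.1}% padding", unpacked * 100.0);
    println!("packed:   {:.1}% padding", packed * 100.0);
}
```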

Quick Start

use seqpacker::{Packer, PackStrategy, Sequence};

let packer = Packer::new(2048)
    .with_strategy(PackStrategy::OptimizedBestFitDecreasing);

let sequences = vec![
    Sequence::new(0, 500),
    Sequence::new(1, 600),
    Sequence::new(2, 400),
    Sequence::new(3, 1000),
];

let result = packer.pack(sequences).unwrap();
println!("Bins: {}", result.packs.len());
println!("Efficiency: {:.2}%", result.metrics.efficiency * 100.0);

Pack from lengths

use seqpacker::{Packer, PackStrategy};

let packer = Packer::new(1024)
    .with_strategy(PackStrategy::FirstFitDecreasing);

let result = packer.pack_lengths(&[1000, 800, 600, 500, 400, 300, 200, 100]).unwrap();

for (i, pack) in result.packs.iter().enumerate() {
    let ids: Vec<usize> = pack.sequences.iter().map(|s| s.id).collect();
    println!("Pack {}: items {:?}, used {}/{}", i, ids, pack.used_capacity(), pack.capacity);
}
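
For readers who want to see what a first-fit-decreasing pass does with these lengths, the following is a minimal sketch using only the standard library. It is not the crate's implementation: the function name `first_fit_decreasing` is hypothetical, and it scans bins linearly rather than using whatever data structure seqpacker uses internally.

```rust
/// Minimal first-fit-decreasing sketch (linear scan over open bins).
/// Returns each bin as a list of indices into `lengths`, or `None` if any
/// length exceeds `capacity`. Hypothetical helper, not the seqpacker API.
fn first_fit_decreasing(lengths: &[usize], capacity: usize) -> Option<Vec<Vec<usize>>> {
    // A sequence longer than an empty bin can never be placed.
    if lengths.iter().any(|&len| len > capacity) {
        return None;
    }

    // Visit indices longest-first; the stable sort keeps ties in input order.
    let mut order: Vec<usize> = (0..lengths.len()).collect();
    order.sort_by(|&a, &b| lengths[b].cmp(&lengths[a]));

    let mut bins: Vec<Vec<usize>> = Vec::new();
    let mut remaining: Vec<usize> = Vec::new(); // free space per bin

    for idx in order {
        let len = lengths[idx];
        // Take the first bin with enough free space, else open a new one.
        match remaining.iter().position(|&free| free >= len) {
            Some(b) => {
                bins[b].push(idx);
                remaining[b] -= len;
            }
            None => {
                bins.push(vec![idx]);
                remaining.push(capacity - len);
            }
        }
    }
    Some(bins)
}

fn main() {
    let lengths = [1000, 800, 600, 500, 400, 300, 200, 100];
    let bins = first_fit_decreasing(&lengths, 1024).expect("all lengths fit");
    for (i, bin) in bins.iter().enumerate() {
        let used: usize = bin.iter().map(|&id| lengths[id]).sum();
        println!("Pack {}: items {:?}, used {}/1024", i, bin, used);
    }
}
```

With these inputs the sketch fills four bins. The crate's own output may order bins or items differently.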

Streaming

For online / bounded-space packing, use StreamPacker with NextFit or Harmonic:

use seqpacker::{StreamPacker, StreamStrategy};

let mut stream = StreamPacker::new(2048, StreamStrategy::NextFit);

let lengths = [500, 600, 1500, 400];

for &len in &lengths {
    for completed_pack in stream.add(len).unwrap() {
        println!("Completed: {} items, {}/{} used", completed_pack.len(), completed_pack.used_capacity(), completed_pack.capacity);
    }
}

// Flush remaining
for pack in stream.finish() {
    println!("Remaining: {} items", pack.len());
}
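
The next-fit rule behind this kind of streaming can be sketched in a few lines of standard-library Rust. The `NextFitStream` type below is a hypothetical stand-in written for illustration. It is not the crate's `StreamPacker`, and its return types are assumptions chosen to keep the example small. It keeps exactly one bin open, which is what makes the approach bounded-space.

```rust
/// Hypothetical next-fit streamer; not the seqpacker `StreamPacker`.
struct NextFitStream {
    capacity: usize,
    current: Vec<usize>, // lengths held by the single open bin
    used: usize,         // slots used in the open bin
}

impl NextFitStream {
    fn new(capacity: usize) -> Self {
        Self { capacity, current: Vec::new(), used: 0 }
    }

    /// Adds one length. Returns the bin that was closed, if the item did not
    /// fit into the open bin, or an error if it can never fit.
    fn add(&mut self, len: usize) -> Result<Option<Vec<usize>>, String> {
        if len > self.capacity {
            return Err(format!("length {} exceeds capacity {}", len, self.capacity));
        }
        let mut closed = None;
        if self.used + len > self.capacity {
            // Next-fit never revisits a bin: emit it and start a fresh one.
            closed = Some(std::mem::take(&mut self.current));
            self.used = 0;
        }
        self.current.push(len);
        self.used += len;
        Ok(closed)
    }

    /// Flushes the open bin, if it holds anything.
    fn finish(self) -> Option<Vec<usize>> {
        if self.current.is_empty() { None } else { Some(self.current) }
    }
}

fn main() {
    let mut stream = NextFitStream::new(2048);
    for len in [500, 600, 1500, 400] {
        if let Some(bin) = stream.add(len).expect("length fits capacity") {
            println!("Completed: {:?}", bin);
        }
    }
    if let Some(bin) = stream.finish() {
        println!("Remaining: {:?}", bin);
    }
}
```

Because only the open bin is retained, memory use is independent of how many sequences have been seen, at the cost of a looser packing than the offline strategies.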

Algorithms

11 bin-packing algorithms from O(n) online to near-optimal offline:

| Algorithm | Enum variant | Time | Approx. ratio | Best for |
|---|---|---|---|---|
| NextFit | NextFit | O(n) | 2.0 | Memory-constrained streaming |
| FirstFit | FirstFit | O(n log B) | 1.7 | Online baseline |
| BestFit | BestFit | O(n log B) | 1.7 | Tighter online packing |
| WorstFit | WorstFit | O(n log B) | 2.0 | Even distribution |
| FirstFitDecreasing | FirstFitDecreasing | O(n log n) | 1.22 | Good offline default |
| BestFitDecreasing | BestFitDecreasing | O(n log n) | 1.22 | Tighter offline packing |
| FirstFitShuffle | FirstFitShuffle | O(n log n) | ~1.3 | Training randomness |
| ModifiedFFD | ModifiedFirstFitDecreasing | O(n log n) | 1.18 | Mixed-size distributions |
| OptimizedBFD | OptimizedBestFitDecreasing | O(n log n) | 1.22 | Default (recommended) |
| ParallelOBFD | ParallelOptimizedBestFitDecreasing | O(n log n) | 1.22 | Large datasets (multi-threaded) |
| Harmonic-K | Harmonic | O(n) | ~1.69 | Bounded-space online |

Select an algorithm via PackStrategy:

use seqpacker::{Packer, PackStrategy};

let packer = Packer::new(2048)
    .with_strategy(PackStrategy::ModifiedFirstFitDecreasing);
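
As an illustration of where an O(n log B) bound for the fit-based strategies can come from, the sketch below implements best-fit with an ordered map from free space to bins, so each placement is a single ordered lookup. This is an assumption about one reasonable approach, not a description of seqpacker's internals, and the function name `best_fit` is hypothetical. Sorting the input longest-first before calling it would give a best-fit-decreasing variant.

```rust
use std::collections::BTreeMap;

/// Best-fit sketch: each item goes into the bin with the least free space
/// that still fits it. Returns the bin index chosen for each item in input
/// order, or `None` if an item exceeds `capacity`. Hypothetical helper.
fn best_fit(lengths: &[usize], capacity: usize) -> Option<Vec<usize>> {
    // Free space -> indices of bins with exactly that much free space.
    let mut by_free: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    let mut assignment = Vec::with_capacity(lengths.len());
    let mut bin_count = 0;

    for &len in lengths {
        if len > capacity {
            return None;
        }
        // Smallest free-space key that is >= len: an O(log B) lookup.
        let found = by_free.range(len..).next().map(|(&free, _)| free);
        let (bin, free) = match found {
            Some(free) => {
                let bucket = by_free.get_mut(&free).expect("key just observed");
                let bin = bucket.pop().expect("buckets are never left empty");
                if bucket.is_empty() {
                    by_free.remove(&free);
                }
                (bin, free)
            }
            None => {
                // No open bin fits: open a new one with full capacity.
                bin_count += 1;
                (bin_count - 1, capacity)
            }
        };
        // Re-file the bin under its reduced free space.
        by_free.entry(free - len).or_default().push(bin);
        assignment.push(bin);
    }
    Some(assignment)
}

fn main() {
    // First-fit would place 300 in bin 0 and then need a third bin for 400;
    // best-fit places 300 in the tighter bin 1 and needs only two bins.
    let assignment = best_fit(&[600, 700, 300, 400], 1024).expect("all lengths fit");
    println!("bin per item: {:?}", assignment);
}
```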

Performance

SeqPacker matches the packing efficiency of comparable libraries while running significantly faster:

| Comparison | Speedup | Efficiency |
|---|---|---|
| vs LightBinPack (C++) | ~1.2-1.5x faster | Equal (98.76%) |
| vs greedy_ffd (Python) | ~400x faster | Equal |
| vs binpacking (Python) | ~1,700x faster | Equal |

See the interactive benchmark dashboard for detailed results.

License

MIT