# SeqPacker

Training LLMs on variable-length sequences? Naive padding wastes 20-40% of GPU compute. SeqPacker packs sequences into fixed-size bins, reaching 95-99% utilization with 11 bin-packing algorithms, from O(n) streaming to near-optimal offline.
- 11 algorithms — NF, FF, BF, WF, FFD, BFD, FFS, MFFD, OBFD, OBFDP, HK
- Streaming API — bounded-space packing with incremental output
- PyTorch integration — GPU-ready tensors out of the box
- NumPy zero-copy — pass arrays directly, no conversion overhead
- Cross-platform — Linux, macOS, Windows; Python 3.9-3.13
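To make the padding-waste figure above concrete, here is a standalone back-of-envelope comparison in plain Python (no SeqPacker needed; the sequence lengths are made up for illustration):

```python
# Eight variable-length sequences, padded to the batch max vs. packed
# into fixed-capacity bins with a hand-rolled first-fit-decreasing pass.
lengths = [512, 1024, 256, 2048, 768, 384, 1536, 896]
capacity = max(lengths)  # 2048

padded_tokens = len(lengths) * capacity   # every row padded to the max length
useful_tokens = sum(lengths)
print(f"padding utilization: {useful_tokens / padded_tokens:.1%}")  # 45.3%

# First-fit-decreasing: sort descending, place each item in the first bin it fits.
bins = []
for length in sorted(lengths, reverse=True):
    for b in bins:
        if sum(b) + length <= capacity:
            b.append(length)
            break
    else:
        bins.append([length])
print(f"packed utilization:  {useful_tokens / (len(bins) * capacity):.1%}")  # 90.6%
```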
## Installation

```bash
# Python (pip)
pip install seqpacker

# Python (uv)
uv add seqpacker

# Rust (crate name assumed to match the package name)
cargo add seqpacker
```
## Quick Start

### Python

```python
import seqpacker

# Pack 8 sequences with the given token lengths into bins of capacity 21.
# Function and attribute names here are illustrative; see the API docs.
lengths = [21, 13, 12, 10, 9, 4, 3, 8]
result = seqpacker.pack(lengths, capacity=21)

print(result.bins)         # [[0], [1, 7], [2, 4], [3, 5, 6]]
print(result.utilization)  # 0.952...
```
### Rust

```rust
// Type and argument names are illustrative; check the crate docs for the exact API.
use seqpacker::Packer;

let packer = Packer::new(21)
    .with_strategy("obfd");
let result = packer.pack_lengths(&[21, 13, 12, 10, 9, 4, 3, 8]).unwrap();
println!("{:?}", result.bins);
```
## Algorithms

11 bin-packing algorithms, from O(n) online to near-optimal offline:

| Algorithm | Short | Time | Approx. Ratio | Best For |
|---|---|---|---|---|
| NextFit | nf | O(n) | 2.0 | Memory-constrained streaming |
| FirstFit | ff | O(n log B) | 1.7 | Online baseline |
| BestFit | bf | O(n log B) | 1.7 | Tighter online packing |
| WorstFit | wf | O(n log B) | 2.0 | Even distribution |
| FirstFitDecreasing | ffd | O(n log n) | 1.22 | Good offline default |
| BestFitDecreasing | bfd | O(n log n) | 1.22 | Tighter offline packing |
| FirstFitShuffle | ffs | O(n log n) | ~1.3 | Training randomness |
| ModifiedFFD | mffd | O(n log n) | 1.18 | Mixed-size distributions |
| OptimizedBFD | obfd | O(n log n) | 1.22 | Default (recommended) |
| ParallelOBFD | obfdp | O(n log n) | 1.22 | Large datasets (multi-threaded) |
| Harmonic-K | hk | O(n) | ~1.69 | Bounded-space online |
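The spread between the online and offline approximation ratios can be seen with two tiny reference implementations in plain Python (textbook versions for illustration, not SeqPacker's code): next-fit on an adversarial alternating input opens almost twice as many bins as first-fit-decreasing on the same items.

```python
def next_fit(items, capacity):
    """Online next-fit: keep one open bin; close it when the next item won't fit."""
    bins, current, used = [], [], 0
    for item in items:
        if used + item > capacity:
            bins.append(current)
            current, used = [], 0
        current.append(item)
        used += item
    if current:
        bins.append(current)
    return bins

def first_fit_decreasing(items, capacity):
    """Offline FFD: sort descending, place each item in the first bin it fits."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

# Adversarial order for next-fit: alternating half-capacity and tiny items.
items = [5, 1] * 10
print(len(next_fit(items, 10)))              # 10 bins
print(len(first_fit_decreasing(items, 10)))  # 6 bins
```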
```python
# Use any algorithm by its short name (default: obfd).
# Parameter and function names below are illustrative.
result = seqpacker.pack(lengths, capacity=21, strategy="ffd")
result = seqpacker.pack(lengths, capacity=21, strategy="nf")

# List all available strategies
print(seqpacker.strategies())
```
## Usage Modes

### Batch Packing

Pack all sequences at once. Best for offline dataset preprocessing. All 11 algorithms are available.

```python
# `tokenized_dataset` and the `pack` signature are illustrative.
lengths = [len(seq) for seq in tokenized_dataset]
result = seqpacker.pack(lengths, capacity=2048)
```
### Streaming

Feed sequences one at a time; completed packs are emitted incrementally. Only the bounded-space algorithms are supported: NextFit (nf) and Harmonic-K (hk).

```python
# Class and method names are an illustrative sketch of the streaming API.
packer = seqpacker.StreamingPacker(capacity=2048, strategy="nf")
for length in lengths:
    for pack in packer.add(length):   # completed packs emitted as they fill
        save(pack)
for pack in packer.flush():           # flush remaining open bins
    save(pack)
```
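As a standalone sketch of what bounded-space streaming means (plain Python, independent of SeqPacker's actual streaming classes), a next-fit packer only ever holds the current open bin in memory and yields each bin of sequence indices as it closes:

```python
def stream_next_fit(lengths, capacity):
    """Next-fit streaming packer: O(1) state, yields each bin of indices as it closes."""
    current, used = [], 0
    for i, length in enumerate(lengths):
        if used + length > capacity:
            yield current            # completed pack emitted as it fills
            current, used = [], 0
        current.append(i)
        used += length
    if current:
        yield current                # flush the final, partially filled bin

packs = list(stream_next_fit([5, 1, 5, 1, 5], capacity=10))
print(packs)  # [[0, 1], [2, 3], [4]]
```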
### Buffer + Batch

Accumulate sequences into a buffer and pack periodically. Requires no special library support -- just call pack() on each buffer. All algorithms are available.

```python
# Sketch: the buffer size and the `pack` signature are illustrative.
def buffered_pack(lengths_iter, capacity=2048, buffer_size=10_000):
    buffer = []
    for length in lengths_iter:
        buffer.append(length)
        if len(buffer) >= buffer_size:
            yield seqpacker.pack(buffer, capacity=capacity)
            buffer = []
    if buffer:
        yield seqpacker.pack(buffer, capacity=capacity)
```
## PyTorch Integration

`seqpacker.torch_utils` provides helpers for converting pack results into GPU-ready tensors. Torch is not a dependency -- import it only when you need it.

```python
# Helper and field names below are illustrative; see the torch_utils docs.
from seqpacker import torch_utils

lengths = [len(ids) for ids in token_id_lists]
result = seqpacker.pack(lengths, capacity=2048)
```

Or convert a PackResult directly:

```python
batch = torch_utils.to_batch(result, token_id_lists)  # helper name illustrative
# batch.input_ids, batch.cu_seqlens, batch.position_ids, batch.labels, batch.attention_mask
```
## NumPy Support

Both list and NumPy array inputs are supported, with zero-copy for NumPy arrays:

```python
import numpy as np
import seqpacker

# Function and parameter names are illustrative.
lengths = np.array([21, 13, 12, 10, 9, 4, 3, 8], dtype=np.int64)
result = seqpacker.pack(lengths, capacity=21)

# Flat NumPy output for maximum performance (function name illustrative)
bin_ids, offsets = seqpacker.pack_flat(lengths, capacity=21)
```
## Performance

SeqPacker matches competitors' packing efficiency while running significantly faster:
| Comparison | Speedup | Efficiency |
|---|---|---|
| vs LightBinPack (C++) | ~1.2-1.5x faster | Equal (98.76%) |
| vs greedy_ffd (Python) | ~400x faster | Equal |
| vs binpacking (Python) | ~1,700x faster | Equal |
| vs prtpy (Python) | ~1,900x faster | Equal |
Benchmarked on 10,000 sequences across real-world datasets (Alpaca, UltraChat, C4). See the interactive benchmark dashboard for detailed results.
## Contributing

See CONTRIBUTING.md for setup instructions and the development workflow.

## License
MIT