seqpacker 0.1.0

Training LLMs on variable-length sequences? Naive padding wastes 20-40% of GPU compute. SeqPacker packs sequences into fixed-size bins, achieving 95-99% utilization with 11 bin-packing algorithms — from O(n) streaming to near-optimal offline.

11 algorithms — NF, FF, BF, WF, FFD, BFD, FFS, MFFD, OBFD, OBFDP, HK
Streaming API — bounded-space packing with incremental output
PyTorch integration — GPU-ready tensors out of the box
NumPy zero-copy — pass arrays directly, no conversion overhead
Cross-platform — Linux, macOS, Windows; Python 3.9-3.13

Installation

# Python (pip)
pip install seqpacker

# Python (uv)
uv add seqpacker

# Rust
cargo add seqpacker

Quick Start

Python

from seqpacker import pack_sequences

lengths = [1000, 800, 600, 500, 400, 300, 200, 100]
result = pack_sequences(lengths, capacity=1024)

print(result.bins)        # [[0], [1, 7], [2, 4], [3, 5, 6]]
print(result.efficiency)  # 0.952...
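
The efficiency figure above can be checked by hand. The sketch below assumes that efficiency means real tokens divided by total bin capacity (an assumption about the metric, not a documented definition) and compares it with padding every sequence to 1024 tokens. The bins are copied from the printed output; no seqpacker call is involved.

```python
lengths = [1000, 800, 600, 500, 400, 300, 200, 100]
capacity = 1024
bins = [[0], [1, 7], [2, 4], [3, 5, 6]]  # copied from the output above

total_tokens = sum(lengths)

# Padding: one row of `capacity` slots per sequence.
padded_efficiency = total_tokens / (len(lengths) * capacity)

# Packing: one row of `capacity` slots per bin.
packed_efficiency = total_tokens / (len(bins) * capacity)

# Sanity check: no bin exceeds the capacity.
bin_loads = [sum(lengths[i] for i in b) for b in bins]
assert all(load <= capacity for load in bin_loads)

print(f"padded: {padded_efficiency:.3f}")  # padded: 0.476
print(f"packed: {packed_efficiency:.3f}")  # packed: 0.952
```

On this toy input, packing halves the number of rows (4 instead of 8) for the same 3900 tokens.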

Rust

use seqpacker::{Packer, PackStrategy};

let packer = Packer::new(1024)
    .with_strategy(PackStrategy::OptimizedBestFitDecreasing);

let result = packer.pack_lengths(&[1000, 800, 600, 500, 400, 300, 200, 100]).unwrap();
println!("Efficiency: {:.2}%", result.metrics.efficiency * 100.0);

Algorithms

11 bin-packing algorithms, from O(n) online to near-optimal offline:

Algorithm	Short	Time	Approx. Ratio	Best For
NextFit	`nf`	O(n)	2.0	Memory-constrained streaming
FirstFit	`ff`	O(n log B)	1.7	Online baseline
BestFit	`bf`	O(n log B)	1.7	Tighter online packing
WorstFit	`wf`	O(n log B)	2.0	Even distribution
FirstFitDecreasing	`ffd`	O(n log n)	1.22	Good offline default
BestFitDecreasing	`bfd`	O(n log n)	1.22	Tighter offline packing
FirstFitShuffle	`ffs`	O(n log n)	~1.3	Training randomness
ModifiedFFD	`mffd`	O(n log n)	1.18	Mixed-size distributions
OptimizedBFD	`obfd`	O(n log n)	1.22	Default (recommended)
ParallelOBFD	`obfdp`	O(n log n)	1.22	Large datasets (multi-threaded)
Harmonic-K	`hk`	O(n)	~1.69	Bounded-space online

Approx. Ratio is the asymptotic worst-case ratio of bins used to the optimal bin count (lower is better); 1.22 is 11/9 and 1.18 is 71/60.

from seqpacker import Packer

# Use any algorithm by short name (default: obfd)
packer = Packer(capacity=2048, strategy="obfd")
result = packer.pack([500, 600, 400, 1000])

# List all available strategies
print(Packer.strategies())
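
To make the table concrete, here is a minimal pure-Python sketch of first-fit decreasing. It is an illustration of the textbook algorithm, not seqpacker's implementation: the function name `first_fit_decreasing` is hypothetical, and it scans bins linearly for clarity, so it is slower than the O(n log n) listed above.

```python
def first_fit_decreasing(lengths, capacity):
    """Illustrative FFD: returns bins as lists of original indices."""
    if any(n > capacity for n in lengths):
        raise ValueError("sequence longer than capacity")

    # Visit sequences from longest to shortest, remembering original indices.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

    bins = []       # bins[b] holds the indices placed in bin b
    remaining = []  # remaining[b] is the free space left in bin b

    for i in order:
        for b, free in enumerate(remaining):
            if lengths[i] <= free:  # first bin with enough room wins
                bins[b].append(i)
                remaining[b] -= lengths[i]
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([i])
            remaining.append(capacity - lengths[i])
    return bins


bins = first_fit_decreasing([500, 600, 400, 1000], capacity=2048)
print(bins)  # [[3, 1, 2], [0]]
```

Best-fit variants differ only in the inner loop: instead of taking the first bin with room, they take the bin that would be left with the least free space.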

Usage Modes

Batch Packing

Pack all sequences at once. Best for offline dataset preprocessing. All 11 algorithms available.

from seqpacker import Packer

packer = Packer(capacity=2048, strategy="obfd")
result = packer.pack(sequence_lengths)

for pack in result.packs:
    print(pack.sequence_ids, pack.lengths, pack.used)

print(f"Efficiency: {result.efficiency:.2%}")
print(f"Packs: {result.num_bins}")

Streaming

Feed sequences one at a time. Completed packs are emitted incrementally. Only bounded-space algorithms are supported: NextFit (nf) and Harmonic-K (hk).

from seqpacker import StreamPacker

sp = StreamPacker(capacity=2048, strategy="nf")

for length in dataset_lengths:
    for pack in sp.add(length):
        process(pack)  # completed packs emitted as they fill

for pack in sp.finish():
    process(pack)      # flush remaining
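
Why NextFit suits streaming can be seen in a small pure-Python sketch. The class `NextFitStream` below is a hypothetical stand-in written for illustration, not seqpacker's `StreamPacker`; it only mirrors the add/finish shape shown above. Next-Fit keeps a single open bin, so memory stays constant no matter how long the stream is.

```python
class NextFitStream:
    """Hypothetical stand-in for a streaming packer, using Next-Fit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.current = []   # indices in the single open bin
        self.used = 0       # tokens already in the open bin
        self.next_id = 0    # running sequence index

    def add(self, length):
        """Add one sequence; return a list of packs closed by this call."""
        if length > self.capacity:
            raise ValueError("sequence longer than capacity")
        completed = []
        if self.used + length > self.capacity:
            # The open bin cannot take this sequence: close it for good.
            completed.append(self.current)
            self.current, self.used = [], 0
        self.current.append(self.next_id)
        self.used += length
        self.next_id += 1
        return completed

    def finish(self):
        """Flush the open bin, if it holds anything."""
        completed = [self.current] if self.current else []
        self.current, self.used = [], 0
        return completed


stream = NextFitStream(capacity=1024)
emitted = []
for length in [500, 400, 300, 1000, 24]:
    emitted.extend(stream.add(length))
emitted.extend(stream.finish())
print(emitted)  # [[0, 1], [2], [3, 4]]
```

The trade-off is visible in the output: the bin holding sequence 2 is closed with only 300 of 1024 tokens used, which an offline algorithm could have filled further.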

Buffer + Batch

Accumulate sequences into a buffer and pack periodically. Requires no special library support -- just call pack() on each buffer. All algorithms available.

from seqpacker import Packer

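def packed_stream(dataset_stream, buffer_size=10_000):
    """Yield packs from a stream, packing one buffer at a time."""
    packer = Packer(capacity=2048, strategy="obfd")
    buffer = []

    for sample in dataset_stream:
        buffer.append(len(sample["input_ids"]))
        if len(buffer) >= buffer_size:
            # Buffer is full: pack it, emit the packs, start over.
            yield from packer.pack(buffer).packs
            buffer.clear()

    if buffer:
        # Pack whatever is left once the stream ends.
        yield from packer.pack(buffer).packs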

PyTorch Integration

seqpacker.torch_utils provides helpers for converting pack results into GPU-ready tensors. Torch is not a dependency -- import only when you need it.

from seqpacker.torch_utils import packed_collate_fn
from torch.utils.data import DataLoader

collate = packed_collate_fn(capacity=2048, strategy="obfd")
loader = DataLoader(dataset, collate_fn=collate, batch_size=256)

for batch in loader:
    outputs = model(
        input_ids=batch.input_ids,
        position_ids=batch.position_ids,
        labels=batch.labels,
    )

Or convert a PackResult directly:

from seqpacker import pack_sequences
from seqpacker.torch_utils import pack_result_to_tensors

result = pack_sequences(lengths, capacity=2048)
batch = pack_result_to_tensors(result=result, token_ids=token_ids)
# batch.input_ids, batch.cu_seqlens, batch.position_ids, batch.labels, batch.attention_mask
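
For readers new to packed batches, the sketch below shows how two of these fields are commonly derived for a single pack. It assumes `cu_seqlens` holds cumulative sequence boundaries starting at 0 (the convention used by variable-length attention kernels) and that `position_ids` restart at 0 for each sequence. Both are assumptions about the field semantics, and `pack_metadata` is a hypothetical helper, not part of seqpacker.

```python
from itertools import accumulate


def pack_metadata(seq_lengths):
    """Return (cu_seqlens, position_ids) for one pack of sequences."""
    # Boundaries: sequence k occupies tokens cu_seqlens[k]:cu_seqlens[k + 1].
    cu_seqlens = [0, *accumulate(seq_lengths)]
    # Positions restart at 0 so each sequence is encoded as if it stood alone.
    position_ids = [pos for n in seq_lengths for pos in range(n)]
    return cu_seqlens, position_ids


cu_seqlens, position_ids = pack_metadata([3, 2, 4])
print(cu_seqlens)    # [0, 3, 5, 9]
print(position_ids)  # [0, 1, 2, 0, 1, 0, 1, 2, 3]
```

The boundaries are what allow an attention kernel to keep tokens of one sequence from attending to tokens of its neighbours in the same pack.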

NumPy Support

Both Python lists and NumPy arrays are accepted as input; NumPy arrays are read zero-copy:

import numpy as np
from seqpacker import Packer

packer = Packer(capacity=2048)
lengths = np.array([500, 600, 400, 1000], dtype=np.int64)
result = packer.pack(lengths)

# Flat NumPy output for maximum performance
items_flat, bin_offsets = packer.pack_flat(lengths)
bins = np.split(items_flat, bin_offsets)

Performance

SeqPacker matches the packing efficiency of comparable libraries while running significantly faster:

Comparison	Speedup	Efficiency
vs LightBinPack (C++)	~1.2-1.5x faster	Equal (98.76%)
vs greedy_ffd (Python)	~400x faster	Equal
vs binpacking (Python)	~1,700x faster	Equal
vs prtpy (Python)	~1,900x faster	Equal

Benchmarked on 10,000 sequences across real-world datasets (Alpaca, UltraChat, C4). See the interactive benchmark dashboard for detailed results.

Contributing

See CONTRIBUTING.md for setup instructions and development workflow.

make install       # Install dependencies
make build-dev     # Build the Rust extension
make test          # Run all tests (400 Rust + 249 Python)
make help          # See all commands

License

MIT