s4-codec (Python bindings)
In-process GPU/CPU compression from Python — no S4 gateway required.
Wraps the same Rust s4-codec crate that powers the S4
S3-compatible storage gateway, so a Python notebook / Airflow task / Spark
UDF can compress and decompress with the exact same byte format as
objects sitting in an S4 bucket.
Install
For GPU (nvCOMP) codecs you currently have to build from source, because the wheel needs to be linked against your CUDA toolchain. See Build from source below.
Example
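```python
import gzip

import s4_codec  # import name assumed from the package name

codec = s4_codec.CpuZstd(level=3)
data = b"example payload " * 10_000  # illustrative payload
compressed, original_size, crc32c = codec.compress(data)
restored = codec.decompress(compressed, original_size, crc32c)
assert restored == data

# RFC 1952 gzip output, decodable by any standard `gunzip`-aware client.
gz_bytes, *_ = s4_codec.CpuGzip().compress(b"hello")
assert gzip.decompress(gz_bytes) == b"hello"
```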
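The RFC 1952 claim can be checked from the consumer side with only the standard library. The sketch below uses gzip.compress as a stand-in producer so it runs without the wheel; looks_like_gzip is a hypothetical helper, not part of the binding. With the wheel installed, the first element of the tuple returned by CpuGzip.compress() would be passed in the same way.

```python
import gzip


def looks_like_gzip(blob: bytes) -> bool:
    """Check the fixed RFC 1952 member header: magic 1f 8b, method 8 (deflate)."""
    return len(blob) >= 10 and blob[0] == 0x1F and blob[1] == 0x8B and blob[2] == 8


# Stand-in producer (assumption: not the S4 codec, just stdlib gzip).
blob = gzip.compress(b"hello")

assert looks_like_gzip(blob)
assert gzip.decompress(blob) == b"hello"
```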
API
| Class / function | Purpose |
|---|---|
| CpuZstd(level: int = 3) | CPU zstd, level 1..=22. |
| CpuGzip(level: int = 6) | CPU gzip (RFC 1952), level 0..=9. |
| <codec>.compress(data: bytes) -> (bytes, int, int) | Returns (compressed, original_size, crc32c). |
| <codec>.decompress(data, original_size, crc32c) -> bytes | Inverse of compress. |
| gpu_available() -> bool | True iff the wheel was built with --features nvcomp-gpu and a CUDA-capable GPU is reachable. |
The (original_size, crc32c) pair corresponds to the
ChunkManifest.original_size / ChunkManifest.crc32c fields the Rust
crate uses. Store both values alongside the compressed payload (e.g. as
JSON sidecar fields), because decompress() needs them back.
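A minimal sketch of the sidecar idea, assuming a JSON document with two keys named after the manifest fields. The helper names (stand_in_compress, to_sidecar, from_sidecar) are hypothetical, and the stand-in uses zlib with CRC-32 rather than CRC32C so that the example runs without the wheel. Only the shape of the returned tuple mirrors <codec>.compress().

```python
import json
import zlib


def stand_in_compress(data: bytes) -> tuple[bytes, int, int]:
    """Stand-in with the same return shape as <codec>.compress().

    Uses zlib and CRC-32 (not CRC32C) purely so this sketch runs anywhere.
    """
    return zlib.compress(data), len(data), zlib.crc32(data)


def to_sidecar(original_size: int, crc32c: int) -> str:
    """Serialize the two manifest values as a JSON sidecar document."""
    return json.dumps({"original_size": original_size, "crc32c": crc32c})


def from_sidecar(text: str) -> tuple[int, int]:
    """Read the values back in the order decompress() expects them."""
    doc = json.loads(text)
    return int(doc["original_size"]), int(doc["crc32c"])


payload = b"example payload " * 1_000
compressed, original_size, checksum = stand_in_compress(payload)
sidecar = to_sidecar(original_size, checksum)

# Later, possibly in another process: load the sidecar and verify the payload.
size_back, checksum_back = from_sidecar(sidecar)
restored = zlib.decompress(compressed)
assert len(restored) == size_back
assert zlib.crc32(restored) == checksum_back
```

With the real binding, size_back and checksum_back would be passed straight to decompress() as its second and third arguments.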
Build from source
# CPU-only wheel
# GPU wheel — requires NVCOMP_HOME pointed at an extracted nvCOMP redist
# tarball, plus a CUDA toolchain (nvcc) on the build host.
maturin develop installs the wheel into the current virtualenv for
iterative development.
Running tests
GPU codecs require a --features nvcomp-gpu build:
The pytest suite covers CPU codec round-trips, RFC 1952 gzip compatibility,
GIL-release threading, version inheritance, and the exception class
hierarchy that mirrors the CodecError variants (v0.8.5 #85). A separate
Rust-side test (tests/version_matches_workspace.rs) guards the workspace
version inheritance.
Error handling
The binding raises a subclass tree per CodecError variant so callers can
branch programmatically instead of string-matching:
| Exception class | CodecError variant | Base class |
|---|---|---|
| S4Error | (base + TruncatedStream) | ValueError |
| S4CrcMismatchError | CrcMismatch | S4Error |
| S4SizeMismatchError | SizeMismatch | S4Error |
| S4CodecMismatchError | CodecMismatch | S4Error |
| S4UnregisteredCodecError | UnregisteredCodec | S4Error |
| S4ManifestSizeExceedsLimitError | ManifestSizeExceedsLimit | S4Error |
| S4ManifestSizeMismatchError | ManifestSizeMismatch | S4Error |
| S4BackendError | Backend / Join | RuntimeError |
| S4IoError | Io | OSError |
S4Error inherits from ValueError for backward compat with code that
caught the previous flat ValueError mapping. S4BackendError and
S4IoError deliberately escape that hierarchy so existing retry-on-IOError
middleware continues to fire on the right class.
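To illustrate how this hierarchy affects except ordering, the sketch below declares local mirrors of four of the classes with the base classes listed in the table, so it runs without the wheel. The decide helper and its "refetch" / "reject" / "retry" outcomes are hypothetical. In real code the classes would come from the binding rather than being declared locally.

```python
class S4Error(ValueError):
    """Local mirror of the base class (also used for TruncatedStream)."""


class S4CrcMismatchError(S4Error):
    """Local mirror of the CrcMismatch mapping."""


class S4BackendError(RuntimeError):
    """Local mirror of the Backend / Join mapping."""


class S4IoError(OSError):
    """Local mirror of the Io mapping."""


def decide(action) -> str:
    """Run `action` and map the exception family to a caller decision."""
    try:
        action()
    except S4CrcMismatchError:
        return "refetch"  # most specific class first
    except S4Error:
        return "reject"   # any other codec-level failure
    except (S4BackendError, S4IoError):
        return "retry"    # these sit outside the ValueError tree
    return "ok"


def raiser(exc):
    """Build a zero-argument callable that raises `exc`."""
    def _raise():
        raise exc
    return _raise


assert decide(raiser(S4CrcMismatchError("crc"))) == "refetch"
assert decide(raiser(S4Error("truncated"))) == "reject"
assert decide(raiser(S4IoError("disk"))) == "retry"
assert decide(lambda: None) == "ok"

# Legacy handlers that catch ValueError still see the codec-level classes,
# but not the backend or I/O ones.
assert issubclass(S4CrcMismatchError, ValueError)
assert not issubclass(S4IoError, ValueError)
assert not issubclass(S4BackendError, ValueError)
```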
Workspace integration
The crate ships a cdylib only and uses PyO3's extension-module
feature, so cargo check -p s4-codec-py and cargo build --workspace
succeed on a CI runner with no Python development headers installed —
no libpython link is performed; the Python interpreter that loads the
.so provides those symbols at runtime.
If you ever see a link error like
undefined reference to PyExc_…, drop pyo3/extension-module from the
features and you'll get the diagnostic build that does link libpython.
Threading / GIL
Both CpuZstd.compress() and CpuGzip.compress() (and their decompress()
counterparts) release the Python GIL while running, so other Python threads
make progress concurrently. This is safe for:
- Django / Flask workers
- ASGI / asyncio event loops (use asyncio.to_thread() to wrap the blocking call)
- multi-threaded data pipelines
Example (asyncio):
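```python
import asyncio

import s4_codec  # import name assumed from the package name

async def compress_async(data: bytes):
    codec = s4_codec.CpuZstd()
    compressed, original_size, crc32c = await asyncio.to_thread(codec.compress, data)
    return compressed, original_size, crc32c
```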
Note: the methods themselves are synchronous — they don't return awaitables. The GIL release means another Python thread can run during the compress; it doesn't make the call async-aware.
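For the multi-threaded pipeline case, a thread pool is one way to take advantage of the GIL release. This is a sketch under assumptions: compress_chunks is a hypothetical helper, and zlib.compress stands in for a codec's compress method so the example runs without the wheel. With the real binding each result would be a (compressed, original_size, crc32c) tuple instead of plain bytes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(compress, chunks, max_workers=4):
    """Apply a blocking compress callable to each chunk on a thread pool.

    Results come back in input order because Executor.map preserves it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress, chunks))


chunks = [bytes([i]) * 100_000 for i in range(8)]
# Stand-in callable; with the wheel installed, pass a codec's bound
# compress method instead.
results = compress_chunks(zlib.compress, chunks)

assert [zlib.decompress(r) for r in results] == chunks
```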
Supported codecs
| Codec | Default | Requires --features nvcomp-gpu |
|---|---|---|
| CpuZstd | ✓ | — |
| CpuGzip | ✓ | — |
| NvcompZstd | — | ✓ + CUDA 12.x at runtime |
| NvcompBitcomp | — | ✓ + CUDA 12.x at runtime |
| NvcompGDeflate | — | ✓ + CUDA 12.x at runtime |
Call gpu_available() at runtime to confirm a CUDA-capable GPU is present
before constructing a GPU codec. A wheel built with --features nvcomp-gpu
still loads on a host with no GPU, but it raises at codec construction time.
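The check-then-construct pattern can be sketched with injected callables so it runs without the wheel. pick_codec and the stand-ins below are hypothetical. With the binding installed, the three arguments would be gpu_available, a GPU codec class, and a CPU codec class; GPU constructor arguments are not documented here, so none are assumed.

```python
def pick_codec(gpu_available, make_gpu_codec, make_cpu_codec):
    """Return a GPU codec only when the runtime probe succeeds, else a CPU codec."""
    if gpu_available():
        return make_gpu_codec()
    return make_cpu_codec()


# Stand-ins that imitate a GPU-enabled wheel running on a host without a GPU.
def no_gpu():
    return False


def gpu_codec_that_raises():
    raise RuntimeError("no CUDA device")


# The probe returns False, so the failing GPU constructor is never called.
assert pick_codec(no_gpu, gpu_codec_that_raises, lambda: "cpu-codec") == "cpu-codec"
assert pick_codec(lambda: True, lambda: "gpu-codec", lambda: "cpu-codec") == "gpu-codec"
```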
Publishing status
- PyPI publish is manual (no CI automation as of v0.8.5):
- Workspace version inheritance was fixed in v0.8.5 #82 — the published wheel version now matches the gateway version.
- target/wheels/ is gitignored; never commit .whl files.
License
Apache-2.0 — same as the rest of the S4 project.