# S4 — Squished S3
Drop-in S3-compatible storage gateway with GPU-accelerated transparent compression. Cuts your AWS S3 bill 50–80% without changing a single line of application code.
## What is S4?
S4 (Squished S3) is an S3-compatible storage gateway written in Rust that sits between your applications (boto3 / aws-sdk / aws-cli / Spark / Trino / DuckDB / anything S3) and your real S3 bucket — and transparently compresses every object with GPU codecs (NVIDIA nvCOMP zstd / Bitcomp / gANS) or CPU zstd before storing it.
```
                 endpoint: s4.example.com
your application ───────────────────────▶ S4 (this project)
(boto3, Spark,                             │
 Trino, ...)                               ▼
                                 (compress with GPU)
                                           │
                                           ▼
                                  AWS S3 (real bucket)
```
- No app changes: same S3 wire protocol, same SigV4 auth, same SDK calls
- Transparent: PUT compresses, GET decompresses; clients see the original bytes
- No lock-in: stop the gateway, read your bucket directly with aws-cli
## Why S4?
| Problem | Solution |
|---|---|
| Your S3 bill grows linearly with data, but most data is ≥3× compressible | S4 compresses on the way in, so you pay only for the squished bytes |
| Your apps don't compress data themselves (and you don't want to change them) | S4 is a wire-compatible drop-in — just change --endpoint-url |
| Existing object-storage compressors (MinIO S2, Garage zstd) are CPU-only | S4 supports nvCOMP GPU codecs — Bitcomp gives 3.6–7.5× on integer columns |
| Analytics workloads need byte-range reads | S4 supports Range GET via sidecar frame index (parquet/ORC reader compatible) |
## Quick Start

### 60-second local trial (Docker, CPU-only)
Use any S3 client (aws-cli, boto3, Spark, anything): just point it at the S4 endpoint. To see the compression at work, inspect the stored objects directly on MinIO through its own, separate endpoint.
### Try with GPU compression (NVIDIA nvCOMP)
Requires the NVIDIA Container Toolkit and a CUDA-capable GPU. See docker-compose.gpu.yml for details.
### Build from source

For GPU support, point `NVCOMP_HOME` at your nvCOMP SDK before building:

```sh
NVCOMP_HOME=/path/to/nvcomp
```
## How it Compares
| Feature | S4 | MinIO (built-in S2) | Garage | Wasabi / B2 | AWS S3 |
|---|---|---|---|---|---|
| S3 API compatibility | ✅ Full | ✅ Full | ⚠️ Subset | ✅ Full | ✅ Native |
| GPU compression | ✅ nvCOMP zstd / Bitcomp / gANS | ❌ | ❌ | ❌ | ❌ |
| CPU compression | ✅ zstd 1–22 | ⚠️ S2 only | ✅ zstd 1–22 | ❌ | ❌ |
| Auto codec selection | ✅ entropy + magic-byte sampling | ❌ | ❌ | — | — |
| Range GET on compressed | ✅ via sidecar frame index | n/a | n/a | ✅ | ✅ |
| Streaming I/O | ✅ TTFB ms-class, ~10 MiB peak | ✅ | ✅ | ✅ | ✅ |
| Acts as gateway to existing S3 | ✅ | ❌ (gateway mode removed) | ❌ | ❌ | n/a |
| License | Apache-2.0 | AGPLv3 / commercial | AGPLv3 | proprietary | proprietary |
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                            S4 server                             │
│  ┌──────────────────┐   ┌─────────────────┐   ┌────────────────┐ │
│  │  s3s framework   │ → │    S4Service    │ → │ s3s_aws::Proxy │ │ ──▶ backend (AWS S3 / MinIO)
│  │  (HTTP + SigV4)  │   │ (compress hook) │   │  (aws-sdk-s3)  │ │
│  └──────────────────┘   └────────┬────────┘   └────────────────┘ │
│                                  ▼                               │
│  ┌─────────────────────────────────────────────────────────┐     │
│  │  s4-codec::CodecRegistry (multi-codec dispatch by id)   │     │
│  │   ├─ Passthrough   (no compression)                     │     │
│  │   ├─ CpuZstd       (zstd-rs, streaming)                 │     │
│  │   ├─ NvcompZstd    (nvCOMP, GPU, batch)                 │     │
│  │   └─ NvcompBitcomp (nvCOMP, integer columns)            │     │
│  └─────────────────────────────────────────────────────────┘     │
│  ┌─────────────────────────────────────────────────────────┐     │
│  │  s4-codec::CodecDispatcher                              │     │
│  │   ├─ AlwaysDispatcher                                   │     │
│  │   └─ SamplingDispatcher (entropy + 14 magic bytes)      │     │
│  └─────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────┘
      ▲          ▲            ▲              ▲
      │          │            │              │
   /health    /ready      /metrics      OTLP traces
   (probe)    (probe)   (Prometheus)  (Jaeger / X-Ray)
```
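The SamplingDispatcher above decides, per object, whether compression is worth attempting. As an illustrative Python sketch of the general idea (the magic-byte list and the 7.5-bit entropy threshold here are assumptions, not S4's actual 14-signature table):

```python
# Illustrative sketch of entropy + magic-byte codec dispatch.
# The magic list and the 7.5 bits/byte threshold are assumptions,
# not S4's actual tables.
import math
from collections import Counter

ALREADY_COMPRESSED_MAGIC = [
    b"\x1f\x8b",          # gzip
    b"\x28\xb5\x2f\xfd",  # zstd
    b"PK\x03\x04",        # zip
    b"\x89PNG",           # png
]

def shannon_entropy(sample: bytes) -> float:
    """Bits per byte of the sample (0.0 .. 8.0)."""
    if not sample:
        return 0.0
    counts = Counter(sample)
    n = len(sample)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def pick_codec(head: bytes, sample_size: int = 4096) -> str:
    sample = head[:sample_size]
    # Already-compressed containers: recompressing wastes cycles.
    if any(sample.startswith(m) for m in ALREADY_COMPRESSED_MAGIC):
        return "passthrough"
    # Near-random data (high entropy) will not compress either.
    if shannon_entropy(sample) > 7.5:
        return "passthrough"
    return "cpu-zstd"
```

For example, a gzip body routes to `passthrough`, while plain text routes to `cpu-zstd`.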
## Production Features

### Streaming I/O
- Streaming GET for non-multipart `cpu-zstd`/`passthrough` objects: TTFB ms-class, memory ≈ zstd window + 64 KiB buffer
- Streaming PUT for the same codecs: input never fully buffered, peak memory ≈ compressed size (5 GB → ~50 MB at 100× ratio)
- Multipart per-part compression: each part compressed and frame-encoded (`S4F2` magic), per-frame codec dispatch (mixed codecs in one object)
- Range GET via sidecar `<key>.s4index`: only the needed compressed bytes are fetched from the backend, decoded, and sliced. Falls back to a full read when the sidecar is absent.
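The bounded-memory streaming claim can be sketched in Python; zlib stands in for zstd below purely because it is in the stdlib, and the 64 KiB chunk size mirrors the buffer mentioned above:

```python
# Sketch of streaming compression with bounded memory: the input is
# consumed chunk by chunk and never fully buffered. zlib stands in
# for zstd here only because it is in the Python stdlib.
import zlib
from typing import Iterable, Iterator

CHUNK = 64 * 1024  # 64 KiB read buffer

def compress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    comp = zlib.compressobj(level=3)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()

def chunked(data: bytes, size: int = CHUNK) -> Iterator[bytes]:
    for i in range(0, len(data), size):
        yield data[i : i + size]
```

A 5 GB upload would flow through this in 64 KiB pieces; only the compressed output accumulates.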
### Observability
- `/health` — liveness probe, always 200 OK
- `/ready` — readiness probe, runs `ListBuckets` against the backend
- `/metrics` — Prometheus text format (`s4_requests_total{op,codec,result}`, `s4_bytes_in_total`, `s4_bytes_out_total`, `s4_request_latency_seconds`)
- Structured JSON logs (`--log-format json`) with per-request fields: `op`, `bucket`, `key`, `codec`, `bytes_in`, `bytes_out`, `ratio`, `latency_ms`, `ok`
- OpenTelemetry traces (`--otlp-endpoint http://collector:4317`) — each PUT/GET is emitted as an `s4.put_object` / `s4.get_object` span with semantic attributes; export to Jaeger / Tempo / Grafana / AWS X-Ray.
### Data Integrity
- CRC32C stored per-object (single PUT) or per-frame (multipart), verified on GET
- `copy_object` is S4-aware: the source's `s4-*` metadata is preserved across `MetadataDirective: REPLACE` (prevents silent corruption of the destination)
- Zstd decompression-bomb hardening: `Decoder` + `take(manifest.original_size + margin)` caps memory regardless of an attacker-controlled manifest claim
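For reference, a minimal pure-Python CRC32C (Castagnoli) sketch of the per-object checksum; S4 computes this in Rust, and production Python code would use a hardware-accelerated package:

```python
# Minimal software CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
# Sketch only: S4 computes this in Rust; this table-driven version is
# for illustration.

_TABLE = []
for i in range(256):
    crc = i
    for _ in range(8):
        crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    _TABLE.append(crc)

def crc32c(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

The value is stored as `x-amz-meta-s4-crc32c` and recomputed over the decoded bytes on GET.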
### S3 API coverage (45+ ops)
- Compression hook: `put_object`, `get_object`, `upload_part`
- Range GET: full S3 spec (`bytes=N-M`, `bytes=-N`, `bytes=N-`)
- Multipart: `create_multipart_upload`, `upload_part`, `complete_multipart_upload`, `abort_multipart_upload`, `list_parts`, `list_multipart_uploads`
- Phase 2 delegations (passthrough): ACL, Tagging, Lifecycle, Versioning, Replication, CORS, Encryption, Logging, Notification, Website, Object Lock, Public Access Block, ...
- Hidden: `*.s4index` sidecars are filtered from `list_objects[_v2]` responses
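The three Range forms resolve to absolute offsets as in this sketch (inclusive HTTP semantics; error handling simplified):

```python
# Sketch: resolve the three S3/HTTP Range forms against an object size.
# Returns an inclusive (start, end) pair per HTTP Range semantics.
def resolve_range(spec: str, size: int) -> tuple[int, int]:
    assert spec.startswith("bytes=")
    lo, _, hi = spec[len("bytes="):].partition("-")
    if lo == "":                      # bytes=-N  -> last N bytes
        n = int(hi)
        return (max(size - n, 0), size - 1)
    if hi == "":                      # bytes=N-  -> from N to the end
        return (int(lo), size - 1)
    return (int(lo), min(int(hi), size - 1))  # bytes=N-M
```

For a 1000-byte object, `bytes=-100` resolves to `(900, 999)` and `bytes=500-` to `(500, 999)`.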
## Testing & Validation
| Tier | What runs | Where | Pass count |
|---|---|---|---|
| Unit + integration | parsers, registry, blob helpers, S3 trait | every push (CI) | 51 |
| proptest fuzz | 38 properties × 256–10K cases (push), × 1M (nightly) | every push + nightly | 38 |
| bolero coverage-guided | 7 targets, libfuzzer engine | nightly (matrix, 30 min × 5) | 7 |
| fuzz canary | proves fuzz framework is alive | every push | 3 |
| Docker MinIO E2E | full HTTP wire + SigV4 against real MinIO | nightly | 10 |
| GPU codec E2E | real CUDA, nvCOMP zstd/Bitcomp roundtrip | manual (`--features nvcomp-gpu`) | 4 |
| Soak / load | 24h sustained load, RSS / FD / connection leak detection | manual (`scripts/soak/run.sh`) | continuous |
99 default tests + 10 ignored E2E + 4 GPU + 3 canary = 116+ tests, plus a PROPTEST_CASES=10000 stress run on every push (~73 s, 380K fuzz cases) and 1M cases × 38 properties nightly (~6 h, 38M+ fuzz cases).
Two real bugs already caught by the fuzz infrastructure:

- `FrameIter` infinite loop on 1-byte input (DoS) — fixed with `fused: bool`
- `cpu_zstd::decompress` could OOM on an attacker-controlled manifest claim — fixed with `Decoder` + `take(limit)`
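The OOM fix follows a general hardening pattern: cap what the decoder may produce, no matter what the manifest claims. A Python sketch, with zlib standing in for zstd and an illustrative margin value:

```python
# Sketch of decompression-bomb hardening: cap decoder output at the
# manifest-claimed original size plus a small margin, so a lying
# manifest cannot force an unbounded allocation. zlib stands in for zstd.
import zlib

def bounded_decompress(payload: bytes, claimed_size: int, margin: int = 4096) -> bytes:
    limit = claimed_size + margin
    d = zlib.decompressobj()
    out = d.decompress(payload, limit)   # at most `limit` bytes produced
    if d.unconsumed_tail or not d.eof:
        raise ValueError("output exceeded manifest-claimed size")
    return out
```

An honest manifest round-trips; an understated one raises instead of allocating the full bomb.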
Environment variables for local stress and GPU runs:

```sh
PROPTEST_CASES=100000
NVCOMP_HOME=...
```
## Configuration
| CLI flag | Default | Description |
|---|---|---|
| `--endpoint-url` | (required) | Backend S3 endpoint (e.g. `https://s3.us-east-1.amazonaws.com`) |
| `--host` | `127.0.0.1` | Bind host |
| `--port` | `8014` | Bind port |
| `--domain` | (none) | Domain for virtual-hosted-style requests |
| `--codec` | `cpu-zstd` | Default codec: `passthrough`, `cpu-zstd`, `nvcomp-zstd`, `nvcomp-bitcomp` |
| `--zstd-level` | `3` | CPU zstd compression level (1–22) |
| `--dispatcher` | `sampling` | `always` (use `--codec`) or `sampling` (entropy + magic byte) |
| `--log-format` | `pretty` | `pretty` (terminal) or `json` (CloudWatch / fluent-bit) |
| `--otlp-endpoint` | (none) | OpenTelemetry OTLP gRPC endpoint |
| `--service-name` | `s4` | OTel resource `service.name` |
AWS credentials are read from the standard AWS chain (AWS_ACCESS_KEY_ID /
AWS_SECRET_ACCESS_KEY / AWS_PROFILE / IAM role on EC2).
## On-the-wire Format
S4 stores data as either:
### Single PUT (non-framed, used for one-shot `put_object`)
S3 metadata holds the manifest:

```
x-amz-meta-s4-codec: passthrough | cpu-zstd | nvcomp-zstd | ...
x-amz-meta-s4-original-size: <decoded bytes>
x-amz-meta-s4-compressed-size: <stored bytes>
x-amz-meta-s4-crc32c: <CRC32C of original bytes>
```
Object body is the raw compressed bytes.
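Composing that manifest might look like the following sketch, where zlib stands in for the `cpu-zstd` codec and `zlib.crc32` stands in for CRC32C (both substitutions are for stdlib self-containment, and the hex encoding of the checksum is also an assumption):

```python
# Sketch: build the S4 single-PUT manifest metadata for an object body.
# zlib stands in for zstd, and zlib.crc32 for CRC32C; the hex encoding
# of the checksum is an assumption of this sketch.
import zlib

def make_manifest(original: bytes) -> tuple[bytes, dict[str, str]]:
    body = zlib.compress(original, level=3)
    metadata = {
        "x-amz-meta-s4-codec": "cpu-zstd",
        "x-amz-meta-s4-original-size": str(len(original)),
        "x-amz-meta-s4-compressed-size": str(len(body)),
        "x-amz-meta-s4-crc32c": f"{zlib.crc32(original):08x}",  # CRC32C in real S4
    }
    return body, metadata
```

On GET the gateway reverses this: decompress the body, recompute the checksum, and compare against the stored manifest.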
### Multipart (framed, `S4F2` magic, per-part compression)
```
x-amz-meta-s4-multipart: true
x-amz-meta-s4-codec: <default codec for the object>
```
Object body is a sequence of:

```
┌───────────────────── 28-byte frame header ─────────────────────┐
│ "S4F2" │ codec_id u32 │ orig u64 │ comp u64 │ crc32c u32 │ payload (comp bytes)
└────────────────────────────────────────────────────────────────┘

(optional)
┌──── padding ────┐
│ "S4P1" │ len u64 │ <len zero bytes>
└─────────────────┘
```
A sidecar object `<key>.s4index` (binary, `S4IX` magic) maps decompressed byte ranges to compressed byte offsets — used by Range GET to fetch only the needed bytes from S3.
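Conceptually the index lookup is an interval search: find the frames whose decompressed ranges overlap the request. A sketch, with an illustrative entry layout rather than the real `S4IX` binary encoding:

```python
# Sketch of the sidecar frame-index lookup: which frames must be fetched
# to serve an uncompressed byte range. The FrameEntry layout is
# illustrative, not the real S4IX binary encoding.
import bisect
from dataclasses import dataclass

@dataclass
class FrameEntry:
    uncomp_off: int   # offset of this frame's data in the decoded object
    uncomp_len: int
    comp_off: int     # offset of the frame in the stored (compressed) object
    comp_len: int

def frames_for_range(index: list[FrameEntry], start: int, end: int) -> list[FrameEntry]:
    """Frames covering decoded bytes [start, end] inclusive; index is sorted."""
    offs = [e.uncomp_off for e in index]
    first = bisect.bisect_right(offs, start) - 1
    out = []
    for e in index[max(first, 0):]:
        if e.uncomp_off > end:
            break
        out.append(e)
    return out
```

The gateway then issues backend byte-range reads for only those frames' `comp_off`/`comp_len` spans.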
## Project Status
- Phase 1 + 2.0 + 2.1 complete (24 commits, 116+ tests, fuzz / soak / OTel / Prometheus all wired)
- Production-ready for log archival, data lake, parquet/ORC analytics
- Known limitations / Phase 2.2 plans:
  - GPU streaming compress (currently bytes-buffered via the batch API): per-chunk pipeline + framed-everywhere unification
  - Multipart final-part padding trim (typical workloads unaffected; up to 5 MiB overhead per object on highly compressible last parts)
  - `upload_part_copy` byte-range awareness (currently passes through)
  - Single-PUT sidecar (currently multipart-only)
## Contributing
Pull requests are welcome! See CONTRIBUTING.md for the development setup, coding conventions, and the test/fuzz/soak protocol.
By contributing, you agree your contributions will be licensed under Apache-2.0 (no separate CLA required).
## Security
Found a vulnerability? Please do not open a public issue. Instead, follow SECURITY.md for coordinated disclosure.
## License
Licensed under the Apache License, Version 2.0 (LICENSE).
See NOTICE for third-party attributions including the vendored
ferro-compress (Apache-2.0 OR MIT) and the optional NVIDIA nvCOMP SDK
(proprietary, BYO).
"S4" and "Squished S3" are unregistered trademarks of abyo software 合同会社.
"Amazon S3" and "AWS" are trademarks of Amazon.com, Inc. S4 is not
affiliated with, endorsed by, or sponsored by Amazon.
## Authors
- abyo software 合同会社 — sponsoring organization, commercial AMI distribution
- masumi-ryugo — original author / maintainer
Looking for the Japanese-language version? → README.ja.md