S4 — Squished S3

Drop-in S3-compatible storage gateway with GPU-accelerated transparent compression. Cuts your AWS S3 bill 50–80% without changing a single line of application code.

Japanese README → README.ja.md


What is S4?

S4 (Squished S3) is an S3-compatible storage gateway written in Rust that sits between your applications (boto3 / aws-sdk / aws-cli / Spark / Trino / DuckDB / anything S3) and your real S3 bucket — and transparently compresses every object with GPU codecs (NVIDIA nvCOMP zstd / Bitcomp / gANS) or CPU zstd before storing it.

                        endpoint: s4.example.com
   your application ──────────────────────────▶  S4 (this project)
   (boto3, Spark,                                       │
    Trino, ...)                                         ▼
                                            (compress with GPU)
                                                        │
                                                        ▼
                                                 AWS S3 (real bucket)
  • No app changes: same S3 wire protocol, same SigV4 auth, same SDK calls
  • Transparent: PUT compresses, GET decompresses; clients see the original bytes
  • No lock-in: stop the gateway, read your bucket directly with aws-cli

Why S4?

| Problem | Solution |
|---|---|
| Your S3 bill grows linearly with data, but most data is ≥3× compressible | S4 compresses on the way in, charging you only for the squished bytes |
| Your apps don't compress data themselves (and you don't want to change them) | S4 is a wire-compatible drop-in — just change --endpoint-url |
| Existing object-storage compressors (MinIO S2, Garage zstd) are CPU-only | S4 supports nvCOMP GPU codecs — Bitcomp gives 3.6–7.5× on integer columns |
| Analytics workloads need byte-range reads | S4 supports Range GET via a sidecar frame index (parquet/ORC reader compatible) |

Quick Start

60-second local trial (Docker, CPU-only)

git clone https://github.com/abyo-software/s4 && cd s4
docker compose up -d                    # MinIO + S4 server on localhost:8014

# Use any S3 client. aws-cli is shown below; only the endpoint URL changes.
aws --endpoint-url http://localhost:8014 s3 mb s3://demo
aws --endpoint-url http://localhost:8014 s3 cp big.log s3://demo/big.log
aws --endpoint-url http://localhost:8014 s3 cp s3://demo/big.log -    # round-trips the original bytes

# Inspect the compressed object directly on MinIO (different endpoint):
aws --endpoint-url http://localhost:9000 s3 cp s3://demo/big.log big.log.compressed
ls -la big.log big.log.compressed        # the .compressed file is much smaller
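
Optionally, inspect the manifest S4 stores as object metadata (keys documented under "On-the-wire Format" below; the output shape shown in the comment is abbreviated):

aws --endpoint-url http://localhost:9000 s3api head-object --bucket demo --key big.log
# → "Metadata": { "s4-codec": "cpu-zstd", "s4-original-size": "...", "s4-crc32c": "..." }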

Try with GPU compression (NVIDIA nvCOMP)

# Requires NVIDIA Container Toolkit + a CUDA-capable GPU
docker compose -f docker-compose.gpu.yml up -d
aws --endpoint-url http://localhost:8014 s3 cp parquet-file.parq s3://demo/

See docker-compose.gpu.yml for details.

Build from source

cargo build --release --workspace                       # CPU-only
NVCOMP_HOME=/path/to/nvcomp cargo build --release --workspace --features s4-server/nvcomp-gpu

target/release/s4 --endpoint-url https://s3.us-east-1.amazonaws.com \
    --host 0.0.0.0 --port 8014 --codec cpu-zstd --log-format json

How it Compares

| Feature | S4 | MinIO (built-in S2) | Garage | Wasabi / B2 | AWS S3 |
|---|---|---|---|---|---|
| S3 API compatibility | ✅ Full | ✅ Full | ⚠️ Subset | ✅ Full | ✅ Native |
| GPU compression | ✅ nvCOMP zstd / Bitcomp / gANS | | | | |
| CPU compression | ✅ zstd 1–22 | ⚠️ S2 only | ✅ zstd 1–22 | | |
| Auto codec selection | ✅ entropy + magic-byte sampling | | | | |
| Range GET on compressed | ✅ via sidecar frame index | | | n/a | n/a |
| Streaming I/O | ✅ TTFB ms-class, ~10 MiB peak | | | | |
| Acts as gateway to existing S3 | ✅ | ❌ (gateway mode removed) | n/a | | |
| License | Apache-2.0 | AGPLv3 / commercial | AGPLv3 | proprietary | proprietary |

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                          S4 server                               │
│  ┌──────────────────┐  ┌─────────────────┐  ┌────────────────┐   │
│  │ s3s framework    │→ │ S4Service       │→ │ s3s_aws::Proxy │ → │ → backend (AWS S3 / MinIO)
│  │ (HTTP + SigV4)   │  │ (compress hook) │  │ (aws-sdk-s3)   │   │
│  └──────────────────┘  └────────┬────────┘  └────────────────┘   │
│                                 ▼                                │
│  ┌─────────────────────────────────────────────────────────┐     │
│  │ s4-codec::CodecRegistry  (multi-codec dispatch by id)   │     │
│  │   ├─ Passthrough          (no compression)              │     │
│  │   ├─ CpuZstd              (zstd-rs, streaming)          │     │
│  │   ├─ NvcompZstd           (nvCOMP, GPU, batch)          │     │
│  │   └─ NvcompBitcomp        (nvCOMP, integer columns)     │     │
│  └─────────────────────────────────────────────────────────┘     │
│  ┌─────────────────────────────────────────────────────────┐     │
│  │ s4-codec::CodecDispatcher                               │     │
│  │   ├─ AlwaysDispatcher                                   │     │
│  │   └─ SamplingDispatcher  (entropy + 14 magic bytes)     │     │
│  └─────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────┘
        ▲              ▲              ▲                ▲
        │              │              │                │
   /health         /ready         /metrics         OTLP traces
   (probe)        (probe)       (Prometheus)       (Jaeger / X-Ray)
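
The dispatcher's choice is observable from the outside. A minimal sketch, assuming the Quick Start deployment and that the SamplingDispatcher routes high-entropy input (e.g. a JPEG) to passthrough:

aws --endpoint-url http://localhost:8014 s3 cp photo.jpg s3://demo/photo.jpg
aws --endpoint-url http://localhost:9000 s3api head-object --bucket demo --key photo.jpg
# expect "s4-codec": "passthrough" in the returned Metadata for incompressible input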

Production Features

Streaming I/O

  • Streaming GET for non-multipart cpu-zstd / passthrough objects: TTFB ms-class, memory ≈ zstd window + 64 KiB buffer
  • Streaming PUT for the same codecs: input never fully buffered, peak memory ≈ compressed size (5 GB → ~50 MB at 100× ratio)
  • Multipart per-part compression: each part compressed and frame-encoded (S4F2 magic), per-frame codec dispatch (mixed codecs in one object)
  • Range GET via sidecar <key>.s4index: only the needed compressed bytes are fetched from the backend, decoded, and sliced. Falls back to a full read when the sidecar is absent. See the example below.
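
A byte-range read through S4 looks like any other S3 range request; a sketch using the Quick Start bucket (names assumed):

aws --endpoint-url http://localhost:8014 s3api get-object \
    --bucket demo --key big.log --range bytes=0-1048575 first-mib.bin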

Observability

  • /health — liveness probe, always 200 OK
  • /ready — readiness probe, runs ListBuckets against the backend
  • /metrics — Prometheus text format (s4_requests_total{op,codec,result}, s4_bytes_in_total, s4_bytes_out_total, s4_request_latency_seconds)
  • Structured JSON logs (--log-format json) with per-request fields: op, bucket, key, codec, bytes_in, bytes_out, ratio, latency_ms, ok
  • OpenTelemetry traces (--otlp-endpoint http://collector:4317) — each PUT/GET emitted as s4.put_object / s4.get_object span with semantic attributes; export to Jaeger / Tempo / Grafana / AWS X-Ray.
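
All three HTTP endpoints can be exercised with curl (port 8014 assumed from the Quick Start):

curl -s http://localhost:8014/health             # liveness: always 200 OK
curl -s http://localhost:8014/ready              # readiness: ListBuckets against the backend
curl -s http://localhost:8014/metrics | grep s4_requests_total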

Data Integrity

  • CRC32C stored per-object (single PUT) or per-frame (multipart), verified on GET
  • S4-aware copy_object: the source's s4-* metadata is preserved across MetadataDirective: REPLACE (prevents silent corruption of the destination)
  • Zstd decompression bomb hardening: Decoder + take(manifest.original_size + margin) caps memory regardless of an attacker-controlled manifest claim
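
A quick end-to-end integrity check from the command line (Quick Start names assumed; cmp exits non-zero on any byte difference):

aws --endpoint-url http://localhost:8014 s3 cp s3://demo/big.log roundtrip.log
cmp big.log roundtrip.log && echo "bytes identical"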

S3 API coverage (45+ ops)

  • Compression hook: put_object, get_object, upload_part
  • Range GET: full S3 spec (bytes=N-M, bytes=-N, bytes=N-)
  • Multipart: create_multipart_upload, upload_part, complete_multipart_upload, abort_multipart_upload, list_parts, list_multipart_uploads
  • Phase 2 delegations (passthrough): ACL, Tagging, Lifecycle, Versioning, Replication, CORS, Encryption, Logging, Notification, Website, Object Lock, Public Access Block, ...
  • Hidden: *.s4index sidecars are filtered from list_objects[_v2] responses
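
To exercise the multipart (framed) path, upload something larger than the client's multipart threshold. A sketch, assuming aws-cli's default 8 MiB threshold and a compressible input:

seq 1 20000000 > ints.txt                # ~150 MB of highly compressible integers
aws --endpoint-url http://localhost:8014 s3 cp ints.txt s3://demo/ints.txt
# aws-cli splits the upload into parts; each part is compressed into its own S4F2 frame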

Testing & Validation

| Tier | What runs | Where | Pass count |
|---|---|---|---|
| Unit + integration | parsers, registry, blob helpers, S3 trait | every push (CI) | 51 |
| proptest fuzz | 38 properties × 256–10K cases (push), × 1M (nightly) | every push + nightly | 38 |
| bolero coverage-guided | 7 targets, libfuzzer engine | nightly (matrix, 30 min × 5) | 7 |
| fuzz canary | proves fuzz framework is alive | every push | 3 |
| Docker MinIO E2E | full HTTP wire + SigV4 against real MinIO | nightly | 10 |
| GPU codec E2E | real CUDA, nvCOMP zstd/Bitcomp roundtrip | manual (--features nvcomp-gpu) | 4 |
| Soak / load | 24 h sustained load, RSS / FD / connection leak detection | manual (scripts/soak/run.sh) | continuous |

99 default tests + 10 ignored E2E + 4 GPU + canary = 116+ tests, plus PROPTEST_CASES=10000 stress run on every push (~73 sec, 380K fuzz cases), 1M cases × 38 properties nightly (~6 h, 38M+ fuzz cases).

Two real bugs already caught by the fuzz infrastructure:

  1. FrameIter infinite-loop on 1-byte input (DoS) — fixed with fused: bool
  2. cpu_zstd::decompress could OOM on an attacker-controlled manifest claim — fixed with Decoder + take(limit)

Run the test tiers locally:

cargo test --workspace                   # default
cargo test --workspace -- --ignored --test-threads=1   # E2E (Docker required)
PROPTEST_CASES=100000 cargo test --workspace --release --test fuzz_parsers --test fuzz_server --test fuzz_advanced
NVCOMP_HOME=... cargo test --workspace --features s4-server/nvcomp-gpu -- --ignored
./scripts/soak/run.sh                    # 24 h soak (Marketplace pre-release)

Configuration

| CLI flag | Default | Description |
|---|---|---|
| --endpoint-url | (required) | Backend S3 endpoint (e.g. https://s3.us-east-1.amazonaws.com) |
| --host | 127.0.0.1 | Bind host |
| --port | 8014 | Bind port |
| --domain | (none) | Domain for virtual-hosted-style requests |
| --codec | cpu-zstd | Default codec: passthrough, cpu-zstd, nvcomp-zstd, nvcomp-bitcomp |
| --zstd-level | 3 | CPU zstd compression level (1–22) |
| --dispatcher | sampling | always (use --codec) or sampling (entropy + magic bytes) |
| --log-format | pretty | pretty (terminal) or json (CloudWatch / fluent-bit) |
| --otlp-endpoint | (none) | OpenTelemetry OTLP gRPC endpoint |
| --service-name | s4 | OTel resource service.name |

AWS credentials are read from the standard AWS chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_PROFILE / IAM role on EC2).
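
A launch sketch combining the flags above (all values illustrative):

AWS_PROFILE=prod target/release/s4 \
    --endpoint-url https://s3.us-east-1.amazonaws.com \
    --codec cpu-zstd --zstd-level 6 --dispatcher sampling \
    --log-format json --otlp-endpoint http://collector:4317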

On-the-wire Format

S4 stores data as either:

Single PUT (non-framed, used for one-shot put_object)

S3 metadata holds the manifest:

x-amz-meta-s4-codec:           passthrough | cpu-zstd | nvcomp-zstd | ...
x-amz-meta-s4-original-size:   <decoded bytes>
x-amz-meta-s4-compressed-size: <stored bytes>
x-amz-meta-s4-crc32c:          <CRC32C of original bytes>

Object body is the raw compressed bytes.
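
Because the cpu-zstd codec streams through zstd-rs, the stored body should be a standard zstd frame (an assumption; verify on your deployment), so a single-PUT object stays readable even with S4 stopped. A sketch against the Quick Start MinIO backend:

aws --endpoint-url http://localhost:9000 s3 cp s3://demo/big.log - | zstd -d | head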

Multipart (framed, S4F2 magic, per-part compression)

x-amz-meta-s4-multipart: true
x-amz-meta-s4-codec:     <default codec for the object>

Object body is a sequence of:

┌────────────────── 28-byte frame header ──────────────────┐
│ "S4F2" │ codec_id u32 │ orig u64 │ comp u64 │ crc32c u32 │  payload (comp bytes)
└───────────────────────────────────────────────────────────┘

(optional) ┌──── padding ────┐
           │ "S4P1" │ len u64 │ <len zero bytes>
           └─────────────────┘

A sidecar object <key>.s4index (binary, S4IX magic) maps decompressed byte ranges to compressed byte offsets — used by Range GET to fetch only the needed bytes from S3.

Project Status

  • Phase 1 + 2.0 + 2.1 complete (24 commits, 116+ tests, fuzz / soak / OTel / Prometheus all wired)
  • Production-ready for log archival, data lake, parquet/ORC analytics
  • Known limitations / Phase 2.2 plans:
    • GPU streaming compress (currently bytes-buffered, batch-API): per-chunk pipeline + framed-everywhere unification
    • Multipart final-part padding trim (typical workloads not affected; up to 5 MiB overhead per object on highly-compressible last parts)
    • upload_part_copy byte-range awareness (currently passes through)
    • Single-PUT sidecar (currently multipart-only)

Contributing

Pull requests are welcome! See CONTRIBUTING.md for the development setup, coding conventions, and the test/fuzz/soak protocol.

By contributing, you agree your contributions will be licensed under Apache-2.0 (no separate CLA required).

Security

Found a vulnerability? Please do not open a public issue. Instead, follow SECURITY.md for coordinated disclosure.

License

Licensed under the Apache License, Version 2.0 (LICENSE). See NOTICE for third-party attributions including the vendored ferro-compress (Apache-2.0 OR MIT) and the optional NVIDIA nvCOMP SDK (proprietary, BYO).

"S4" and "Squished S3" are unregistered trademarks of abyo software 合同会社. "Amazon S3" and "AWS" are trademarks of Amazon.com, Inc. S4 is not affiliated with, endorsed by, or sponsored by Amazon.

Authors

  • abyo software 合同会社 — sponsoring organization, commercial AMI distribution
  • masumi-ryugo — original author / maintainer

Looking for the Japanese-language version? → README.ja.md