flow-fcs-compress
Column-oriented compression codecs and container formats for FCS flow cytometry data.
Overview
FCS files store event data row-major (all parameters for one event contiguous), which is optimal for acquisition but suboptimal for analysis — reading a single channel requires touching every row. flow-fcs-compress provides column-major codecs that exploit per-channel statistical structure for 2.5–6× compression while enabling parallel, single-channel decode at 0.5–3 GB/s.
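The layout difference can be made concrete with a small sketch. The function below is illustrative only and is not part of this crate's API; the name `rows_to_columns` and its signature are assumptions. It transposes a row-major event buffer into one contiguous vector per channel, which is the shape the column-major codecs operate on.

```rust
/// Transpose row-major event data (all parameters of one event contiguous)
/// into one contiguous column per parameter.
/// Hypothetical helper for illustration; not the crate's API.
fn rows_to_columns(rows: &[f32], n_params: usize) -> Vec<Vec<f32>> {
    assert!(n_params > 0, "need at least one parameter");
    assert_eq!(rows.len() % n_params, 0, "partial event at end of buffer");
    let n_events = rows.len() / n_params;

    // One output vector per channel, pre-sized for the event count.
    let mut columns: Vec<Vec<f32>> = (0..n_params)
        .map(|_| Vec::with_capacity(n_events))
        .collect();

    // Walk the buffer one event at a time and scatter its values.
    for event in rows.chunks_exact(n_params) {
        for (column, &value) in columns.iter_mut().zip(event) {
            column.push(value);
        }
    }
    columns
}
```

Once the data is in this form, reading one channel touches only that channel's bytes, and each column can be encoded or decoded independently of the others.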
Features
| Feature | Description |
|---|---|
| multithread (default) | Rayon-parallel encode/decode |
| pco-backend | Alternative lossless codec via pco (Piecewise Coding) |
| lz4-baseline | LZ4 frame codec for comparison benchmarking |
Codecs
| Codec | ID | Fidelity | Ratio | Encode | Decode (1T) | Decode (MT) |
|---|---|---|---|---|---|---|
| Mode A | LosslessF32 | Lossless (bit-exact f32) | 2.5–3.2× | 400–450 MB/s | 0.7–1.0 GB/s | 2–3 GB/s |
| Mode B | AdcBitpack | Lossless (bit-exact f32) | 2–4× | 1.0–1.1 GB/s | 1.5–2.5 GB/s | 3–5 GB/s |
| Mode C | LogQuant | Lossy (≤ 0.09% rel error, ±0.5 ADC bin) | 4–6× | 350–400 MB/s | 2–3 GB/s | 4–6 GB/s |
| Pco | LosslessF32Pco | Lossless (bit-exact f32) | 2.8–3.5× | 200–250 MB/s | 0.8–1.2 GB/s | 2–3 GB/s |
| LZ4 | Lz4Baseline | Lossless (bit-exact f32) | 1.5–2× | 350–400 MB/s | 2–3 GB/s | 4–6 GB/s |
- Mode A (byte-stream-split + zstd): best ratio among lossless codecs for floating-point channels.
- Mode B (bit-reservoir bitpack): packs values at ADC resolution ($PnR bits). Fastest encoder; ~3.5× faster decode than Mode A.
- Mode C (arcsinh + fixed-point quantize): lossy; precision loss is ≤ ±0.5 of the least-significant ADC bit (sub-0.1% relative error away from zero). Uses a sinh LUT for decode when bit width ≤ 14. Appropriate when downstream analysis already applies arcsinh transforms.
- Pco: highest ratio on integer-stored-as-f32 data (common in 16-bit instruments), slower encode.
- LZ4: fast baseline with modest ratio; useful for streaming/transient storage.
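To illustrate the byte-stream split that Mode A applies before zstd, here is a minimal standard-library sketch. The function names are assumptions rather than the crate's API, and the general-purpose compression stage is omitted. The idea is that byte k of every f32 goes into stream k, so the slowly varying sign and exponent bytes end up adjacent and compress better than interleaved floats.

```rust
/// Split f32 values into four byte streams: stream k holds byte k
/// (little-endian order) of every value.
/// Hypothetical helper for illustration; not the crate's API.
fn byte_stream_split(values: &[f32]) -> [Vec<u8>; 4] {
    let mut streams: [Vec<u8>; 4] =
        std::array::from_fn(|_| Vec::with_capacity(values.len()));
    for v in values {
        for (stream, byte) in streams.iter_mut().zip(v.to_le_bytes()) {
            stream.push(byte);
        }
    }
    streams
}

/// Inverse transform: re-interleave the four streams into f32 values.
fn byte_stream_join(streams: &[Vec<u8>; 4]) -> Vec<f32> {
    let n = streams[0].len();
    assert!(
        streams.iter().all(|s| s.len() == n),
        "streams must have equal length"
    );
    (0..n)
        .map(|i| {
            f32::from_le_bytes([
                streams[0][i],
                streams[1][i],
                streams[2][i],
                streams[3][i],
            ])
        })
        .collect()
}
```

The transform only permutes bytes, so the round trip is bit-exact. Any compression gain comes from the entropy coder that runs on each stream afterwards.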
Throughput measured on Apple M1 Max (10-core), 80–1024 MB datasets. 1T = single-threaded, MT = rayon parallel.
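Packing at ADC resolution, as Mode B does, can be sketched with a 64-bit bit reservoir. This is a simplified illustration under stated assumptions: values have already been converted from f32 to non-negative integers that fit in `bits` bits, bits are emitted LSB-first, and the names `pack_bits` and `unpack_bits` are hypothetical. The crate's actual bit order and framing may differ.

```rust
/// Pack each value into `bits` bits, LSB-first, using a u64 reservoir.
/// Hypothetical helper for illustration; not the crate's API.
fn pack_bits(values: &[u32], bits: u32) -> Vec<u8> {
    assert!((1..=32).contains(&bits));
    let mut out = Vec::with_capacity((values.len() * bits as usize + 7) / 8);
    let mut reservoir: u64 = 0; // pending bits not yet written out
    let mut filled: u32 = 0; // number of valid bits in the reservoir
    for &v in values {
        debug_assert!(bits == 32 || v < (1u32 << bits), "value exceeds bit width");
        reservoir |= (v as u64) << filled;
        filled += bits;
        // Drain whole bytes as soon as they are available.
        while filled >= 8 {
            out.push(reservoir as u8);
            reservoir >>= 8;
            filled -= 8;
        }
    }
    if filled > 0 {
        out.push(reservoir as u8); // final partial byte, zero-padded
    }
    out
}

/// Inverse of `pack_bits`: read `count` values of `bits` bits each.
fn unpack_bits(packed: &[u8], bits: u32, count: usize) -> Vec<u32> {
    assert!((1..=32).contains(&bits));
    let mask: u64 = (1u64 << bits) - 1;
    let mut out = Vec::with_capacity(count);
    let mut reservoir: u64 = 0;
    let mut filled: u32 = 0;
    let mut bytes = packed.iter();
    for _ in 0..count {
        // Refill until one full value is available.
        while filled < bits {
            let b = *bytes.next().expect("packed buffer too short");
            reservoir |= (b as u64) << filled;
            filled += 8;
        }
        out.push((reservoir & mask) as u32);
        reservoir >>= bits;
        filled -= bits;
    }
    out
}
```

For a 10-bit channel this stores each event in 10 bits instead of 32. The result is lossless only when every f32 in the channel is an exact integer within the ADC range.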
Container Formats
.fcz Native Container
Memory-mapped, chunk-indexed format for zero-copy random access:
use ;
use ;
// Write
let mut writer = create?;
writer.set_fcs_text?;
let ch_idx = writer.add_channel?;
writer.write_chunk?;
writer.finish?;
// Read
let reader = open?;
reader.warm_cache; // prefault pages for benchmarking
let fsc_a = reader.read_full_channel?;
// Parallel decode all channels
let mut buffers = vec!;
reader.decode_all_par?;
Inline FCS Payload
Embeds compressed column data inside a standard FCS file's DATA segment with a $COMPRESSION = FCZ1 keyword:
use ;
let payload = encode_inline?;
// payload bytes go into the FCS DATA segment
let decoded = decode_inline?;
for ch in &decoded
Auto Codec Selection
use pick_codec;
let codec_id = pick_codec;
// Never selects a lossy codec unless allow_lossy = true
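The kind of decision an auto-selector makes can be sketched as follows. This is a hypothetical heuristic and not the crate's `pick_codec` implementation. The `CodecChoice` enum, the `pick_codec_sketch` name, its parameters, and the decision rules are all assumptions; only the variant names echo the codec IDs in the table above. The one property taken from the source is that a lossy codec is chosen only when `allow_lossy` is true.

```rust
/// Hypothetical stand-in for a codec identifier (not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodecChoice {
    LosslessF32,
    AdcBitpack,
    LogQuant,
}

/// Illustrative heuristic: prefer bit-packing when every value is a
/// non-negative integer that fits in `adc_bits`; use the lossy codec only
/// when the caller opts in; otherwise fall back to lossless f32.
fn pick_codec_sketch(values: &[f32], adc_bits: u32, allow_lossy: bool) -> CodecChoice {
    // f32 represents integers exactly only up to 2^24.
    assert!((1..=24).contains(&adc_bits));
    let limit = 2f32.powi(adc_bits as i32);

    // `is_sign_positive` rejects -0.0, which would not survive an integer
    // round trip bit-exactly; NaN and infinities fail the `< limit` test.
    let integral = values
        .iter()
        .all(|&v| v.is_sign_positive() && v < limit && v.fract() == 0.0);

    if integral {
        CodecChoice::AdcBitpack
    } else if allow_lossy {
        CodecChoice::LogQuant
    } else {
        CodecChoice::LosslessF32
    }
}
```

A real selector would likely also weigh the expected ratio and throughput per channel; this sketch shows only the fidelity gating.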
Architecture
┌─────────────────────────────────────────────┐
│ Container layer (.fcz / inline FCS) │
│ - Chunk indexing, mmap, parallel I/O │
├─────────────────────────────────────────────┤
│ Codec layer (ColumnCodec trait) │
│ - encode_chunk / decode_chunk │
│ - Per-channel, per-chunk granularity │
├─────────────────────────────────────────────┤
│ Transform layer (pre-processing) │
│ - Byte-stream split (f32 → 4 streams) │
│ - Arcsinh log-space mapping │
└─────────────────────────────────────────────┘
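The arcsinh log-space mapping in the transform layer can be illustrated with a small quantizer sketch. Everything here is an assumption made for illustration: the `ArcsinhQuantizer` type, the cofactor parameter, the signed-code layout, and the step-size formula are not taken from the crate. The sketch maps values through asinh, rounds to a fixed-point code, and inverts with sinh on decode.

```rust
/// Hypothetical arcsinh fixed-point quantizer (not the crate's API).
struct ArcsinhQuantizer {
    cofactor: f64, // linear-to-log transition scale
    step: f64,     // width of one code in asinh space
}

impl ArcsinhQuantizer {
    /// `max_abs` is the largest magnitude to represent; `bits` is the
    /// signed code width, so codes span ±(2^(bits-1) - 1).
    fn new(cofactor: f64, max_abs: f64, bits: u32) -> Self {
        assert!(cofactor > 0.0 && max_abs > 0.0 && (2..=31).contains(&bits));
        let max_code = ((1u64 << (bits - 1)) - 1) as f64;
        let step = (max_abs / cofactor).asinh() / max_code;
        Self { cofactor, step }
    }

    /// Map to asinh space and round to the nearest code.
    /// Assumes |x| <= max_abs; larger inputs produce out-of-range codes.
    fn encode(&self, x: f32) -> i32 {
        ((x as f64 / self.cofactor).asinh() / self.step).round() as i32
    }

    /// Invert the mapping with sinh.
    fn decode(&self, code: i32) -> f32 {
        (self.cofactor * (code as f64 * self.step).sinh()) as f32
    }
}
```

Rounding bounds the error at half a step in asinh space, which is why this style of quantization fits data that will be arcsinh-transformed downstream anyway. For narrow code widths, `decode` could be replaced by a precomputed table indexed by code.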
Scope
This crate owns:
- Column-oriented compression codecs for f32 event data
- .fcz container format (write, read, mmap, parallel decode)
- Inline FCS DATA-segment compression payload
- Pre-compression transforms (byte-stream split, arcsinh)
- Codec auto-selection based on channel characteristics
- (Future) Streaming encode for acquisition pipelines
- (Future) Parquet sidecar integration
It does not own: FCS file parsing/writing (see flow-fcs), analysis algorithms, or visualization.
Benchmarks
# Codec microbenchmarks (Criterion)
# Full-file benchmarks (requires FCS test data)
Tests
37 unit tests covering codec roundtrips, chunk splitting, container I/O, transform correctness, and auto-selection logic.
ISAC Proposal
This crate includes a draft proposal for the ISAC FCS Working Group to standardize compression and column-major layout in the FCS specification. See docs/isac-proposal.md.
License
MIT