linear-srgb

Fast, SIMD-accelerated sRGB↔linear conversion for image processing pipelines.
Handles f32, f64, u8, and u16 data. Supports in-place RGBA with alpha
preservation, fused premultiply/unpremultiply, custom gamma, and extended range.
no_std compatible.
[dependencies]
linear-srgb = "0.6"
# Optional: BT.709, PQ (HDR10), and HLG transfer functions
linear-srgb = { version = "0.6", features = ["transfer"] }
Quick Start
Use the slice functions. They're SIMD-accelerated (AVX-512, AVX2, SSE4.1, NEON, WASM SIMD128) with automatic runtime CPU dispatch — typically 4–16x faster than scalar loops.
use linear_srgb::default::*;
// f32 slices (in-place, SIMD-accelerated)
let mut values = vec![0.5f32; 10000];
srgb_to_linear_slice(&mut values);
linear_to_srgb_slice(&mut values);
For RGBA data, use the _rgba_ variants — they convert only the RGB channels
and leave alpha untouched. This matters: alpha is linear by definition, so
applying the sRGB transfer function to it is a bug.
use linear_srgb::default::*;
// RGBA f32: alpha channel preserved, only RGB converted
let mut rgba = vec![0.5f32, 0.5, 0.5, 0.75];
srgb_to_linear_rgba_slice(&mut rgba);
assert_eq!(rgba[3], 0.75); // alpha untouched
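To make the alpha rule concrete, here is a minimal scalar sketch of what an RGBA in-place decode does. It is not the crate's implementation: `srgb_to_linear_textbook` and `decode_rgba_in_place` are hypothetical names, and the transfer function uses the IEC textbook constants with `powf` rather than the crate's SIMD rational polynomial.

```rust
/// IEC 61966-2-1 textbook sRGB decode for one channel (illustrative only).
fn srgb_to_linear_textbook(v: f32) -> f32 {
    if v <= 0.04045 {
        v / 12.92
    } else {
        ((v + 0.055) / 1.055).powf(2.4)
    }
}

/// Hypothetical helper: decode R, G, B of every pixel in place and skip alpha.
fn decode_rgba_in_place(pixels: &mut [f32]) {
    for px in pixels.chunks_exact_mut(4) {
        for c in &mut px[..3] {
            *c = srgb_to_linear_textbook(*c);
        }
        // px[3] (alpha) is already linear, so it is deliberately left alone.
    }
}

fn main() {
    let mut rgba = vec![0.5_f32, 0.5, 0.5, 0.5];
    decode_rgba_in_place(&mut rgba);
    // RGB move to roughly 0.214; alpha stays at 0.5.
    println!("{:?}", rgba);
}
```

Running the transfer function over all four channels would have turned the 0.5 alpha into roughly 0.214 as well, which is the bug the `_rgba_` variants avoid.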
Type conversions
Convert directly between integer sRGB and linear f32 without intermediate steps.
use linear_srgb::default::*;
// u8 sRGB → linear f32 (LUT-based, extremely fast)
let srgb_bytes: Vec<u8> = vec![0, 128, 255];
let mut linear = vec![0.0f32; 3];
srgb_u8_to_linear_slice(&srgb_bytes, &mut linear);
// linear f32 → sRGB u8 (SIMD-accelerated)
let mut srgb_out = vec![0u8; 3];
linear_to_srgb_u8_slice(&linear, &mut srgb_out);
// RGBA u8 → linear f32 (alpha passed through as a/255, not sRGB-decoded)
let rgba_bytes = vec![255u8, 128, 0, 128];
let mut rgba_linear = vec![0.0f32; 4];
srgb_u8_to_linear_rgba_slice(&rgba_bytes, &mut rgba_linear);
// u16 support too: decode via 65536-entry LUT (30-40× faster than polynomial)
let mut u16_linear = vec![0.0f32; 65536];
let srgb_u16: Vec<u16> = (0..65536u32).map(|i| i as u16).collect();
srgb_u16_to_linear_slice(&srgb_u16, &mut u16_linear);
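The LUT-based u8 decode and the a/255 alpha rule can be sketched in a few lines. This is an illustration under assumptions, not the crate's code: `build_u8_lut` and `decode_rgba_u8` are hypothetical helpers, and the table is filled from the IEC textbook formula.

```rust
/// Textbook sRGB decode, evaluated in f64 (illustrative only).
fn srgb_to_linear_textbook(v: f64) -> f64 {
    if v <= 0.04045 {
        v / 12.92
    } else {
        ((v + 0.055) / 1.055).powf(2.4)
    }
}

/// Hypothetical helper: one f32 per possible u8 code, computed once.
fn build_u8_lut() -> [f32; 256] {
    let mut lut = [0.0_f32; 256];
    for (i, slot) in lut.iter_mut().enumerate() {
        *slot = srgb_to_linear_textbook(i as f64 / 255.0) as f32;
    }
    lut
}

/// Hypothetical helper: RGB go through the table, alpha is only normalised.
fn decode_rgba_u8(lut: &[f32; 256], src: &[u8], dst: &mut [f32]) {
    assert_eq!(src.len(), dst.len());
    for (s, d) in src.chunks_exact(4).zip(dst.chunks_exact_mut(4)) {
        d[0] = lut[s[0] as usize];
        d[1] = lut[s[1] as usize];
        d[2] = lut[s[2] as usize];
        d[3] = s[3] as f32 / 255.0; // alpha: a/255, no transfer function
    }
}

fn main() {
    let lut = build_u8_lut();
    let src = [255_u8, 0, 128, 128];
    let mut dst = [0.0_f32; 4];
    decode_rgba_u8(&lut, &src, &mut dst);
    println!("{:?}", dst);
}
```

After the table is built, each colour channel costs a single indexed load, which is why the u8 path needs no arithmetic per pixel beyond the alpha division.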
u16 encode: exact vs fast
Two paths for linear f32 → sRGB u16, depending on whether you need perfect roundtrip or maximum throughput:
use linear_srgb::default::*;
let linear = srgb_u16_to_linear(32768); // LUT decode (always fast, exact)
// Exact roundtrip (polynomial, ~89 Mops/s)
let exact = linear_to_srgb_u16(linear);
// Fast encode (sqrt-indexed LUT, ~609 Mops/s, max ±1 level)
let fast = linear_to_srgb_u16_fast(linear);
Slice variants: linear_to_srgb_u16_slice / linear_to_srgb_u16_slice_fast,
linear_to_srgb_u16_rgba_slice / linear_to_srgb_u16_rgba_slice_fast.
Premultiplied alpha (fused, single-pass)
Convert between sRGB straight-alpha and linear premultiplied alpha in one SIMD pass — no intermediate buffer, no second memory traversal.
use linear_srgb::default::*;
// sRGB straight → linear premultiplied (f32 in-place)
let mut rgba = vec![0.8f32, 0.4, 0.2, 0.5];
srgb_to_linear_premultiply_rgba_slice(&mut rgba);
// linear premultiplied → sRGB straight (f32 in-place)
unpremultiply_linear_to_srgb_rgba_slice(&mut rgba);
// Also available as u8→f32 and f32→u8:
// srgb_u8_to_linear_premultiply_rgba_slice(&srgb_bytes, &mut linear_premul);
// unpremultiply_linear_to_srgb_u8_rgba_slice(&linear_premul, &mut srgb_out);
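As a scalar sketch of what "fused" means here: each pixel is decoded and multiplied by alpha in the same loop iteration, so the buffer is traversed once. The function names below are hypothetical, the textbook `powf` formulas stand in for the crate's polynomials, and mapping zero-alpha pixels to black on the way back is an assumption of this sketch, not documented crate behaviour.

```rust
fn decode(v: f32) -> f32 {
    if v <= 0.04045 { v / 12.92 } else { ((v + 0.055) / 1.055).powf(2.4) }
}

fn encode(l: f32) -> f32 {
    if l <= 0.0031308 { l * 12.92 } else { 1.055 * l.powf(1.0 / 2.4) - 0.055 }
}

/// sRGB straight alpha -> linear premultiplied, one pass over the buffer.
fn srgb_to_linear_premultiply(pixels: &mut [f32]) {
    for px in pixels.chunks_exact_mut(4) {
        let a = px[3];
        for c in &mut px[..3] {
            *c = decode(*c) * a; // decode and multiply in the same step
        }
    }
}

/// Linear premultiplied -> sRGB straight alpha, one pass over the buffer.
fn unpremultiply_linear_to_srgb(pixels: &mut [f32]) {
    for px in pixels.chunks_exact_mut(4) {
        let a = px[3];
        for c in &mut px[..3] {
            // Assumption: fully transparent pixels become black.
            let straight = if a > 0.0 { *c / a } else { 0.0 };
            *c = encode(straight);
        }
    }
}

fn main() {
    let mut rgba = vec![0.8_f32, 0.4, 0.2, 0.5];
    srgb_to_linear_premultiply(&mut rgba);
    unpremultiply_linear_to_srgb(&mut rgba);
    println!("{:?}", rgba); // close to the original straight-alpha values
}
```

The order matters: premultiplication happens in linear light, after decoding, and unpremultiplication happens before encoding.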
Single values
When you only need one value at a time (not a batch):
use linear_srgb::default::*;
// f32: rational polynomial (≤10 ULP max, perfectly monotonic)
let linear = srgb_to_linear(0.5f32);
let srgb = linear_to_srgb(linear);
// u8: LUT-based, zero math
let linear = srgb_u8_to_linear(128u8);
let srgb_byte = linear_to_srgb_u8(linear);
// u16: LUT-based
let linear = srgb_u16_to_linear(32768u16);
let srgb_u16 = linear_to_srgb_u16(linear);
Extended range (cross-gamut, HDR)
For values outside [0, 1] from gamut matrix conversions (P3→sRGB, BT.2020→sRGB):
use linear_srgb::default::*;
// SIMD-accelerated, sign-preserving (CSS Color 4)
// Uses 6/6 rational polynomials: no powf, pure SIMD
let mut values = vec![-0.25f32, 0.5, 1.5];
srgb_to_linear_extended_slice(&mut values);
linear_to_srgb_extended_slice(&mut values);
The extended slice functions use purpose-fitted 6/6 rational polynomials with wider domains than the clamped path's 4/4. The sRGB→linear (S2L) polynomial covers |encoded| ≤ 8 at u8 precision and |encoded| ≤ ~4.2 at u16 precision. The linear→sRGB (L2S) polynomial covers |linear| ≤ 64 at u16 precision.
For exact powf-based extended range (scalar, any range):
use linear_srgb::precise::*;
let linear = srgb_to_linear_extended(-0.25f32);
let srgb = linear_to_srgb_extended(linear);
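The sign-preserving rule from CSS Color 4 is simply "apply the curve to the magnitude, then restore the sign". A minimal scalar sketch, assuming the textbook constants and using the hypothetical name `srgb_to_linear_signed`:

```rust
/// Textbook sRGB decode for non-negative input; no clamping above 1.0.
fn decode_magnitude(v: f32) -> f32 {
    if v <= 0.04045 {
        v / 12.92
    } else {
        ((v + 0.055) / 1.055).powf(2.4)
    }
}

/// Hypothetical helper: odd-symmetric extension of the sRGB decode curve.
fn srgb_to_linear_signed(v: f32) -> f32 {
    decode_magnitude(v.abs()).copysign(v)
}

fn main() {
    for v in [-0.5_f32, 0.0, 0.5, 2.0] {
        println!("{v} -> {}", srgb_to_linear_signed(v));
    }
}
```

Negative inputs mirror their positive counterparts, and inputs above 1.0 keep following the power segment instead of being clamped, which is what keeps out-of-gamut values recoverable after a matrix conversion.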
Precise (powf) conversions
For maximum accuracy:
use linear_srgb::precise::*;
// f32: exact powf, C0-continuous (6 ULP max)
let linear = srgb_to_linear(0.5f32);
let srgb = linear_to_srgb(linear);
// f64 high-precision
let linear = srgb_to_linear_f64(0.5f64);
Custom gamma
For pure power-law gamma (no linear toe segment) — gamma 2.2, 1.8, etc.:
use linear_srgb::default::*;
let linear = gamma_to_linear(0.5f32, 2.2);
let encoded = linear_to_gamma(linear, 2.2);
// SIMD-accelerated slices
let mut values = vec![0.5f32; 10000];
gamma_to_linear_slice(&mut values, 2.2);
// Fused premultiply/unpremultiply also available:
// gamma_to_linear_premultiply_rgba_slice(&mut rgba, 2.2);
// unpremultiply_linear_to_gamma_rgba_slice(&mut rgba, 2.2);
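For reference, a pure power-law curve has no piecewise structure at all, which is the difference from sRGB. The sketch below uses hypothetical names (`gamma_decode`, `gamma_encode`), and clamping negative input to zero is an assumption of this sketch, since `powf` on a negative base with a fractional exponent returns NaN.

```rust
/// Pure power-law decode: linear = encoded^gamma.
fn gamma_decode(encoded: f32, gamma: f32) -> f32 {
    encoded.max(0.0).powf(gamma) // negative input clamped (assumption)
}

/// Pure power-law encode: encoded = linear^(1/gamma).
fn gamma_encode(linear: f32, gamma: f32) -> f32 {
    linear.max(0.0).powf(1.0 / gamma)
}

fn main() {
    let linear = gamma_decode(0.5, 2.2);
    let back = gamma_encode(linear, 2.2);
    println!("0.5 -> {linear} -> {back}");
}
```

Because there is no linear toe, the decode slope goes to zero at the origin, so very dark values behave differently from the sRGB curve even when the mid-tones look similar.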
HDR transfer functions (transfer feature)
BT.709, PQ (ST 2084 / HDR10), and HLG (ARIB STD-B67) — scalar and SIMD.
linear-srgb = { version = "0.6", features = ["transfer"] }
use linear_srgb::tf::*;
let linear = pq_to_linear(0.5f32); // PQ (HDR10) → linear
let pq = linear_to_pq(linear);
let linear = hlg_to_linear(0.5f32); // HLG → linear
let linear = bt709_to_linear(0.5f32); // BT.709 → linear
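For readers unfamiliar with PQ, here is a standalone sketch of the SMPTE ST 2084 curve using the constants from the standard. The names `pq_eotf` and `pq_inverse_eotf` are hypothetical, and the normalisation (1.0 linear = 10,000 cd/m²) is the standard's convention; whether the crate scales its linear values the same way is not stated here, so treat that as an assumption.

```rust
// SMPTE ST 2084 constants.
const M1: f64 = 2610.0 / 16384.0;
const M2: f64 = 2523.0 / 4096.0 * 128.0;
const C1: f64 = 3424.0 / 4096.0;
const C2: f64 = 2413.0 / 4096.0 * 32.0;
const C3: f64 = 2392.0 / 4096.0 * 32.0;

/// PQ signal in [0, 1] -> linear light, where 1.0 means 10,000 cd/m².
fn pq_eotf(signal: f64) -> f64 {
    let p = signal.max(0.0).powf(1.0 / M2);
    let num = (p - C1).max(0.0);
    let den = C2 - C3 * p;
    (num / den).powf(1.0 / M1)
}

/// Linear light (1.0 = 10,000 cd/m²) -> PQ signal in [0, 1].
fn pq_inverse_eotf(linear: f64) -> f64 {
    let y = linear.max(0.0).powf(M1);
    ((C1 + C2 * y) / (1.0 + C3 * y)).powf(M2)
}

fn main() {
    // 100 cd/m² (0.01 of peak) lands near the middle of the signal range.
    println!("{}", pq_inverse_eotf(0.01));
    println!("{}", pq_eotf(1.0));
}
```

Note how much of the signal range PQ spends on dark values: 1% of peak luminance already maps to roughly half of the code range.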
LUT for custom bit depths
use ;
// 16-bit linearization (65536 entries)
let lut = new;
let linear = lut.lookup;
// Interpolated encoding
let encode_lut = new;
let srgb = lut_interp_linear_float;
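The idea behind both table kinds can be sketched without the crate. The helper names below (`build_linearization_lut`, `build_encode_lut`, `lookup_interp`) are hypothetical and are not the `lut` module's API; the tables are filled from the textbook sRGB formulas.

```rust
fn decode(v: f64) -> f64 {
    if v <= 0.04045 { v / 12.92 } else { ((v + 0.055) / 1.055).powf(2.4) }
}

fn encode(l: f64) -> f64 {
    if l <= 0.0031308 { l * 12.92 } else { 1.055 * l.powf(1.0 / 2.4) - 0.055 }
}

/// One linear f32 per integer code at the given bit depth (e.g. 10, 12, 16).
fn build_linearization_lut(bits: u32) -> Vec<f32> {
    let max = ((1u32 << bits) - 1) as f64;
    (0..(1u32 << bits)).map(|i| decode(i as f64 / max) as f32).collect()
}

/// Encode table sampled uniformly over linear [0, 1].
fn build_encode_lut(entries: usize) -> Vec<f32> {
    let max = (entries - 1) as f64;
    (0..entries).map(|i| encode(i as f64 / max) as f32).collect()
}

/// Linear interpolation between the two nearest table entries.
fn lookup_interp(table: &[f32], x: f32) -> f32 {
    let pos = x.clamp(0.0, 1.0) * (table.len() - 1) as f32;
    let i = pos.floor() as usize;
    let j = (i + 1).min(table.len() - 1);
    let t = pos - i as f32;
    table[i] * (1.0 - t) + table[j] * t
}

fn main() {
    let lin10 = build_linearization_lut(10);
    println!("10-bit code 512 -> {}", lin10[512]);
    let enc = build_encode_lut(4096);
    println!("linear 0.5 -> {}", lookup_interp(&enc, 0.5));
}
```

A uniformly sampled encode table is least accurate near zero, where the sRGB curve is steepest, so interpolation quality depends heavily on table size and indexing scheme.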
API Summary
| Data | Function |
|---|---|
| &mut [f32] | srgb_to_linear_slice / linear_to_srgb_slice |
| RGBA &mut [f32] | srgb_to_linear_rgba_slice / linear_to_srgb_rgba_slice |
| RGBA f32 premultiply | srgb_to_linear_premultiply_rgba_slice / unpremultiply_linear_to_srgb_rgba_slice |
| &[u8] ↔ &mut [f32] | srgb_u8_to_linear_slice / linear_to_srgb_u8_slice |
| RGBA &[u8] ↔ &mut [f32] | srgb_u8_to_linear_rgba_slice / linear_to_srgb_u8_rgba_slice |
| RGBA u8↔f32 premultiply | srgb_u8_to_linear_premultiply_rgba_slice / unpremultiply_linear_to_srgb_u8_rgba_slice |
| &[u16] ↔ &mut [f32] | srgb_u16_to_linear_slice / linear_to_srgb_u16_slice |
| Extended range &mut [f32] | srgb_to_linear_extended_slice / linear_to_srgb_extended_slice |
| Custom gamma &mut [f32] | gamma_to_linear_slice / linear_to_gamma_slice |
| Custom gamma RGBA premul | gamma_to_linear_premultiply_rgba_slice / unpremultiply_linear_to_gamma_rgba_slice |
| Single f32 | srgb_to_linear / linear_to_srgb |
| Single u8 | srgb_u8_to_linear / linear_to_srgb_u8 |
| Single u16 | srgb_u16_to_linear / linear_to_srgb_u16 |
| Exact powf f32/f64 | precise::srgb_to_linear / precise::linear_to_srgb |
| Extended range (scalar) | precise::srgb_to_linear_extended / precise::linear_to_srgb_extended |
All functions live in linear_srgb::default unless noted.
Accuracy
Transfer function constants
All code paths use C0-continuous constants derived from the moxcms reference implementation. These adjust the IEC 61966-2-1 offset from 0.055 to 0.055011 and the threshold from 0.04045 to 0.03929, making the piecewise transfer function mathematically continuous (~2.3e-9 gap eliminated).
At u8 precision the two constant sets produce identical values. At u16, the max difference is ~1 LSB near the threshold. See docs/iec.md for a detailed comparison.
For interop with software that uses the original IEC textbook constants, enable
the iec feature for linear_srgb::iec::srgb_to_linear /
linear_srgb::iec::linear_to_srgb.
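The discontinuity in the textbook constants is easy to measure directly. The sketch below evaluates both branches of the IEC 61966-2-1 decode at the 0.04045 threshold in f64; `iec_threshold_gap` is a hypothetical name, and only the textbook constants are used because the adjusted constants quoted above are rounded.

```rust
/// Difference between the power branch and the linear branch of the
/// textbook sRGB decode, both evaluated exactly at the threshold.
fn iec_threshold_gap() -> f64 {
    let threshold = 0.04045_f64;
    let linear_branch = threshold / 12.92;
    let power_branch = ((threshold + 0.055) / 1.055).powf(2.4);
    power_branch - linear_branch
}

fn main() {
    // A C0-continuous constant set would make this exactly zero
    // (up to rounding); the textbook set leaves a tiny step.
    println!("gap at threshold: {:e}", iec_threshold_gap());
}
```

The step is far below u8 or u16 resolution, which is consistent with the two constant sets agreeing at u8 precision.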
Accuracy summary (exhaustive f32 sweep)
Exhaustive f32 sweep (all ~1B values in [0, 1]) against f64 reference. "SIMD" rows measured via the actual dispatched SIMD path (f32 FMA evaluation). "Scalar" rows use f64 intermediate precision.
| Path | Max ULP | Avg ULP | Monotonic | Fitted domain |
|---|---|---|---|---|
| default s→l (4/4 scalar) | 8 | 0.18 | yes | [0, 1] |
| default l→s (4/4 scalar) | 10 | 0.32 | yes | [0, 1] |
| default s→l (4/4 SIMD) | 4 | 0.09 | yes | [0, 1] |
| default l→s (4/4 SIMD) | 5 | 0.10 | yes | [0, 1] |
| extended_slice s→l (6/6 SIMD) | 8* | 0.12 | yes | [0, 8] |
| extended_slice l→s (6/6 SIMD) | 8* | 0.17 | yes | [0, 64] |
| precise s→l (powf) | 6 | 0.1 | yes | unbounded |
| precise l→s (powf) | 3 | 0.1 | yes | unbounded |
*The 6/6 extended polynomials use larger coefficients to cover a wider domain, which costs ~2 ULP vs the clamped 4/4 in a narrow band near the piecewise threshold (0.04–0.05). Affects < 0.1% of values; avg ULP is comparable.
What does 10 ULP mean in practice? 1 ULP (unit in the last place) is the spacing between adjacent f32 values at a given magnitude. At 0.5 that's ~6e-8, so 10 ULP ≈ 6e-7, about 6 decimal digits of precision. At 0.01, 1 ULP is ~9e-10, so 10 ULP ≈ 1e-8. For 8-bit output this error is thousands of times smaller than one output level (1/255 ≈ 3.9e-3). For 16-bit output (1/65535 ≈ 1.5e-5) it is still more than an order of magnitude smaller. In both cases it is invisible.
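The ULP figures can be checked with nothing but the bit representation of f32. In this sketch `ulp_at` and `ulp_distance` are hypothetical helpers that are only valid for finite, positive inputs.

```rust
/// Spacing between x and the next representable f32 above it.
/// Valid for finite, positive x below f32::MAX.
fn ulp_at(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1) - x
}

/// Number of representable f32 values between two positive finite floats.
fn ulp_distance(a: f32, b: f32) -> u32 {
    a.to_bits().abs_diff(b.to_bits())
}

fn main() {
    println!("1 ULP at 0.5  = {:e}", ulp_at(0.5));
    println!("1 ULP at 0.01 = {:e}", ulp_at(0.01));
    println!("10 ULP at 0.5 = {:e}", 10.0 * ulp_at(0.5));
    println!("one u8 level  = {:e}", 1.0_f32 / 255.0);
    println!("one u16 level = {:e}", 1.0_f32 / 65535.0);
}
```

For positive floats the bit patterns sort in the same order as the values, which is why a ULP error can be computed as a plain integer difference.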
Reference: C0-continuous f64 powf. The scalar rational polynomial evaluates in f64 intermediate precision, guaranteeing perfect monotonicity (zero reversals across all ~1B f32 values in [0, 1]). SIMD paths use f32 evaluation for throughput and are also monotonic within each segment.
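A monotonicity sweep of this kind can be sketched as follows. It is not the crate's test harness: `count_reversals` is a hypothetical helper, and the function under test is a textbook `powf` decode evaluated in f64. The loop relies on the fact that the f32 bit patterns from 0 to `1.0f32.to_bits()` (0x3F80_0000, about 1.07 billion values) enumerate every float in [0, 1] in increasing order.

```rust
/// Textbook decode with f64 intermediate precision, rounded to f32.
fn decode(v: f32) -> f32 {
    let v = v as f64;
    let l = if v <= 0.04045 { v / 12.92 } else { ((v + 0.055) / 1.055).powf(2.4) };
    l as f32
}

/// Walk f32 values in [0, 1] in increasing order, visiting every
/// `stride`-th bit pattern, and count outputs that go backwards.
/// `stride` must be non-zero; `stride == 1` is the exhaustive sweep.
fn count_reversals(f: impl Fn(f32) -> f32, stride: usize) -> u64 {
    let one = 1.0_f32.to_bits();
    let mut prev = f(0.0);
    let mut reversals = 0_u64;
    for bits in (0..=one).step_by(stride) {
        let y = f(f32::from_bits(bits));
        if y < prev {
            reversals += 1;
        }
        prev = y;
    }
    reversals
}

fn main() {
    // A coarse stride keeps this quick; use 1 for the full ~1B-value sweep.
    println!("reversals: {}", count_reversals(decode, 4096));
}
```

A strided run only samples the domain, so it can miss reversals between adjacent floats; the zero-reversal claim above refers to the exhaustive case.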
Feature Flags
- std (default): Required for runtime SIMD dispatch
- avx512 (default): AVX-512 code paths (16-wide f32)
- transfer: BT.709, PQ, HLG transfer functions (scalar + SIMD)
- iec: IEC 61966-2-1 textbook sRGB functions for legacy interop
- alt: Alternative/experimental implementations for benchmarking

# no_std (requires alloc for LUT generation)
linear-srgb = { version = "0.6", default-features = false }
Module Organization
- default: Recommended API. Rational polynomial for f32, LUT for integers, SIMD for slices.
- precise: Exact powf() conversions with C0-continuous constants (not IEC textbook). f32/f64, extended range.
- lut: Lookup tables for custom bit depths (10-bit, 12-bit, 16-bit).
- tf: Transfer functions: BT.709, PQ, HLG. Requires the transfer feature.
- iec: IEC 61966-2-1 textbook constants for legacy interop. Requires the iec feature.
- tokens: Inlineable #[rite] functions for embedding in SIMD pipelines (see below).
Embedding in SIMD Pipelines (tokens module)
If you're writing your own SIMD code with archmage,
the tokens module provides #[rite] functions that inline directly into your
#[arcane] functions — zero dispatch overhead.
use x8;
use arcane;
Image tech I maintain
| State of the art codecs* | zenjpeg · zenpng · zenwebp · zengif · zenavif (rav1d-safe · zenrav1e · zenavif-parse · zenavif-serialize) · zenjxl (jxl-encoder · zenjxl-decoder) · zentiff · zenbitmaps · heic · zenraw · zenpdf · ultrahdr · mozjpeg-rs · webpx |
| Compression | zenflate · zenzop |
| Processing | zenresize · zenfilters · zenquant · zenblend |
| Metrics | zensim · fast-ssim2 · butteraugli · resamplescope-rs · codec-eval · codec-corpus |
| Pixel types & color | zenpixels · zenpixels-convert · linear-srgb · garb |
| Pipeline | zenpipe · zencodec · zencodecs · zenlayout · zennode |
| ImageResizer | ImageResizer (C#) — 24M+ NuGet downloads across all packages |
| Imageflow | Image optimization engine (Rust) — .NET · node · go — 9M+ NuGet downloads across all packages |
| Imageflow Server | The fast, safe image server (Rust+C#) — 552K+ NuGet downloads, deployed by Fortune 500s and major brands |
* as of 2026
General Rust awesomeness
archmage · magetypes · enough · whereat · zenbench · cargo-copter
And other projects · GitHub @imazen · GitHub @lilith · lib.rs/~lilith · NuGet (over 30 million downloads / 87 packages)
License
MIT OR Apache-2.0
AI-Generated Code Notice
Developed with Claude (Anthropic). All code has been reviewed and benchmarked, but verify critical paths for your use case.