linear-srgb
Fast, SIMD-accelerated sRGB↔linear conversion for image processing pipelines.
Handles f32, f64, u8, and u16 data. Supports in-place RGBA with alpha
preservation, fused premultiply/unpremultiply, custom gamma, and extended range.
no_std compatible.
[dependencies]
linear-srgb = "0.6"
# Optional: BT.709, PQ (HDR10), and HLG transfer functions
linear-srgb = { version = "0.6", features = ["transfer"] }
Quick Start
Use the slice functions. They're SIMD-accelerated (AVX-512, AVX2, SSE4.1, NEON, WASM SIMD128) with automatic runtime CPU dispatch — typically 4–16x faster than scalar loops.
use linear_srgb::default::*;
// f32 slices (in-place, SIMD-accelerated)
let mut values = vec!;
srgb_to_linear_slice(&mut values);
linear_to_srgb_slice(&mut values);
For RGBA data, use the _rgba_ variants — they convert only the RGB channels
and leave alpha untouched. This matters: alpha is linear by definition, so
applying the sRGB transfer function to it is a bug.
use linear_srgb::default::*;
// RGBA f32 — alpha channel preserved, only RGB converted
let mut rgba = vec!;
srgb_to_linear_rgba_slice(&mut rgba);
assert_eq!; // alpha untouched
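To make the alpha rule concrete, here is a scalar reference sketch of what an alpha-preserving RGBA conversion does. It is not the crate's implementation: `decode_srgb` and `srgb_to_linear_rgba_reference` are hypothetical names, and the decode assumes the IEC 61966-2-1 textbook constants with a plain `powf` rather than the crate's rational polynomial and SIMD paths.

```rust
/// Scalar sRGB decode using the IEC 61966-2-1 textbook constants.
/// Illustrative only; the crate itself uses adjusted constants.
fn decode_srgb(s: f32) -> f32 {
    if s <= 0.04045 {
        s / 12.92
    } else {
        ((s + 0.055) / 1.055).powf(2.4)
    }
}

/// Hypothetical reference: decode R, G, B of every pixel in place and
/// leave the fourth component (alpha) exactly as it was.
fn srgb_to_linear_rgba_reference(rgba: &mut [f32]) {
    for px in rgba.chunks_exact_mut(4) {
        for c in &mut px[..3] {
            *c = decode_srgb(*c);
        }
        // px[3] is alpha: already linear, so it is not touched.
    }
}
```

Running the plain slice function over interleaved RGBA data would instead push every fourth value through the transfer curve, which is the bug described above.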
Type conversions
Convert directly between integer sRGB and linear f32 without intermediate steps.
use linear_srgb::default::*;
// u8 sRGB → linear f32 (LUT-based, extremely fast)
let srgb_bytes: = vec!;
let mut linear = vec!;
srgb_u8_to_linear_slice(&srgb_bytes, &mut linear);
// linear f32 → sRGB u8 (SIMD-accelerated)
let mut srgb_out = vec!;
linear_to_srgb_u8_slice(&linear, &mut srgb_out);
// RGBA u8 → linear f32 (alpha passed through as a/255, not sRGB-decoded)
let rgba_bytes = vec!;
let mut rgba_linear = vec!;
srgb_u8_to_linear_rgba_slice(&rgba_bytes, &mut rgba_linear);
// u16 support too
let mut u16_linear = vec!;
let srgb_u16: = .map.collect;
srgb_u16_to_linear_slice(&srgb_u16, &mut u16_linear);
Premultiplied alpha (fused, single-pass)
Convert between sRGB straight-alpha and linear premultiplied alpha in one SIMD pass — no intermediate buffer, no second memory traversal.
use linear_srgb::default::*;
// sRGB straight → linear premultiplied (f32 in-place)
let mut rgba = vec!;
srgb_to_linear_premultiply_rgba_slice(&mut rgba);
// linear premultiplied → sRGB straight (f32 in-place)
unpremultiply_linear_to_srgb_rgba_slice(&mut rgba);
// Also available as u8→f32 and f32→u8:
// srgb_u8_to_linear_premultiply_rgba_slice(&srgb_bytes, &mut linear_premul);
// unpremultiply_linear_to_srgb_u8_rgba_slice(&linear_premul, &mut srgb_out);
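For readers who want to see what "fused" means here, the following scalar sketch does the decode and the alpha multiply in the same loop iteration, so each pixel is read and written once. All function names are hypothetical, the IEC textbook constants are an assumption, and mapping fully transparent pixels to 0 on the way back is one possible convention, not necessarily the crate's.

```rust
fn decode_srgb(s: f32) -> f32 {
    if s <= 0.04045 { s / 12.92 } else { ((s + 0.055) / 1.055).powf(2.4) }
}

fn encode_srgb(l: f32) -> f32 {
    if l <= 0.0031308 { l * 12.92 } else { 1.055 * l.powf(1.0 / 2.4) - 0.055 }
}

/// sRGB straight alpha -> linear premultiplied, one pass over the buffer.
fn srgb_to_linear_premultiply_reference(rgba: &mut [f32]) {
    for px in rgba.chunks_exact_mut(4) {
        let a = px[3];
        for c in &mut px[..3] {
            *c = decode_srgb(*c) * a; // decode and premultiply together
        }
    }
}

/// Linear premultiplied -> sRGB straight alpha, one pass over the buffer.
fn unpremultiply_linear_to_srgb_reference(rgba: &mut [f32]) {
    for px in rgba.chunks_exact_mut(4) {
        let a = px[3];
        for c in &mut px[..3] {
            // Assumption: alpha == 0 yields 0 instead of dividing by zero.
            let straight = if a > 0.0 { *c / a } else { 0.0 };
            *c = encode_srgb(straight);
        }
    }
}
```

The unfused alternative would call a convert function and then a separate premultiply function, walking the buffer twice.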
Single values
When you only need one value at a time (not a batch):
use linear_srgb::default::*;
// f32 — rational polynomial (≤14 ULP max, perfectly monotonic)
let linear = srgb_to_linear;
let srgb = linear_to_srgb;
// u8 — LUT-based, zero math
let linear = srgb_u8_to_linear;
let srgb_byte = linear_to_srgb_u8;
// u16 — LUT-based
let linear = srgb_u16_to_linear;
let srgb_u16 = linear_to_srgb_u16;
Precise (powf) conversions
For maximum accuracy or extended-range (HDR/ICC) workflows:
use linear_srgb::precise::*;
// f32 — exact powf, C0-continuous (6 ULP max)
let linear = srgb_to_linear;
let srgb = linear_to_srgb;
// f64 high-precision
let linear = srgb_to_linear_f64;
// Extended range — no clamping, for cross-gamut / scRGB pipelines
let linear = srgb_to_linear_extended;
let srgb = linear_to_srgb_extended;
Custom gamma
For pure power-law gamma (no linear toe segment) — gamma 2.2, 1.8, etc.:
use linear_srgb::default::*;
let linear = gamma_to_linear;
let encoded = linear_to_gamma;
// SIMD-accelerated slices
let mut values = vec!;
gamma_to_linear_slice(&mut values, 2.2);
// Fused premultiply/unpremultiply also available:
// gamma_to_linear_premultiply_rgba_slice(&mut rgba, 2.2);
// unpremultiply_linear_to_gamma_rgba_slice(&mut rgba, 2.2);
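As a point of comparison with the piecewise sRGB curve, a pure power-law gamma is a single `powf` with no linear segment near zero. The sketch below uses hypothetical `_reference` names and clamps negative inputs to 0, which is an assumption made here to keep `powf` well defined; the crate's handling of out-of-range values may differ.

```rust
/// Pure power-law decode: linear = encoded^gamma.
fn gamma_to_linear_reference(encoded: f32, gamma: f32) -> f32 {
    encoded.max(0.0).powf(gamma)
}

/// Pure power-law encode: encoded = linear^(1/gamma).
fn linear_to_gamma_reference(linear: f32, gamma: f32) -> f32 {
    linear.max(0.0).powf(1.0 / gamma)
}

/// Apply the decode to a whole slice in place (scalar loop, no SIMD).
fn gamma_to_linear_slice_reference(values: &mut [f32], gamma: f32) {
    for v in values.iter_mut() {
        *v = gamma_to_linear_reference(*v, gamma);
    }
}
```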
HDR transfer functions (transfer feature)
BT.709, PQ (ST 2084 / HDR10), and HLG (ARIB STD-B67) — scalar and SIMD.
linear-srgb = { version = "0.6", features = ["transfer"] }
use linear_srgb::tf::*;
let linear = pq_to_linear; // PQ (HDR10) → linear
let pq = linear_to_pq;
let linear = hlg_to_linear; // HLG → linear
let linear = bt709_to_linear; // BT.709 → linear
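For orientation, this is a scalar f64 sketch of the PQ curve using the constants published in SMPTE ST 2084. The function names are hypothetical, and the normalization is an assumption: here linear 1.0 corresponds to 10000 cd/m², which may not match the scaling the crate's `tf` functions use.

```rust
// ST 2084 constants, written as the exact rational values from the standard.
const M1: f64 = 2610.0 / 16384.0;
const M2: f64 = 2523.0 / 4096.0 * 128.0;
const C1: f64 = 3424.0 / 4096.0;
const C2: f64 = 2413.0 / 4096.0 * 32.0;
const C3: f64 = 2392.0 / 4096.0 * 32.0;

/// PQ signal in [0, 1] -> linear light in [0, 1] (1.0 = 10000 cd/m², assumed).
fn pq_eotf_reference(signal: f64) -> f64 {
    let p = signal.clamp(0.0, 1.0).powf(1.0 / M2);
    let num = (p - C1).max(0.0);
    let den = C2 - C3 * p;
    (num / den).powf(1.0 / M1)
}

/// Linear light in [0, 1] -> PQ signal in [0, 1].
fn pq_inverse_eotf_reference(linear: f64) -> f64 {
    let p = linear.clamp(0.0, 1.0).powf(M1);
    ((C1 + C2 * p) / (1.0 + C3 * p)).powf(M2)
}
```

Under this normalization, 100 cd/m² (linear 0.01) lands a little above the middle of the PQ signal range, which is why SDR content occupies roughly the lower half of a PQ signal.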
LUT for custom bit depths
use ;
// 16-bit linearization (65536 entries)
let lut = new;
let linear = lut.lookup;
// Interpolated encoding
let encode_lut = new;
let srgb = lut_interp_linear_float;
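The idea behind a linearization table for an arbitrary bit depth can be sketched with the standard library alone: one entry per integer code, each holding the decoded linear value. `build_linearization_lut_reference` is a hypothetical name, the IEC textbook constants are an assumption, and the crate's table types and their constructors are not reproduced here.

```rust
/// Scalar sRGB decode in f64 (IEC textbook constants, illustrative only).
fn decode_srgb(s: f64) -> f64 {
    if s <= 0.04045 { s / 12.92 } else { ((s + 0.055) / 1.055).powf(2.4) }
}

/// Build a table with 2^bits entries mapping an integer sRGB code to linear f32.
fn build_linearization_lut_reference(bits: u32) -> Vec<f32> {
    assert!((1..=16).contains(&bits), "sketch supports 1 to 16 bits");
    let max_code = (1u32 << bits) - 1;
    (0..=max_code)
        .map(|code| decode_srgb(code as f64 / max_code as f64) as f32)
        .collect()
}

/// Lookup is a plain index; out-of-range codes panic like any slice index.
fn lookup_reference(lut: &[f32], code: usize) -> f32 {
    lut[code]
}
```

A 10-bit table has 1024 entries (4 KiB of f32), a 16-bit table has 65536 entries (256 KiB).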
API Summary
| Data | Function |
|---|---|
| &mut [f32] | srgb_to_linear_slice / linear_to_srgb_slice |
| RGBA &mut [f32] | srgb_to_linear_rgba_slice / linear_to_srgb_rgba_slice |
| RGBA f32 premultiply | srgb_to_linear_premultiply_rgba_slice / unpremultiply_linear_to_srgb_rgba_slice |
| &[u8] ↔ &mut [f32] | srgb_u8_to_linear_slice / linear_to_srgb_u8_slice |
| RGBA &[u8] ↔ &mut [f32] | srgb_u8_to_linear_rgba_slice / linear_to_srgb_u8_rgba_slice |
| RGBA u8↔f32 premultiply | srgb_u8_to_linear_premultiply_rgba_slice / unpremultiply_linear_to_srgb_u8_rgba_slice |
| &[u16] ↔ &mut [f32] | srgb_u16_to_linear_slice / linear_to_srgb_u16_slice |
| Custom gamma &mut [f32] | gamma_to_linear_slice / linear_to_gamma_slice |
| Custom gamma RGBA premul | gamma_to_linear_premultiply_rgba_slice / unpremultiply_linear_to_gamma_rgba_slice |
| Single f32 | srgb_to_linear / linear_to_srgb |
| Single u8 | srgb_u8_to_linear / linear_to_srgb_u8 |
| Single u16 | srgb_u16_to_linear / linear_to_srgb_u16 |
| Exact powf f32/f64 | precise::srgb_to_linear / precise::linear_to_srgb |
| Extended range f32 | precise::srgb_to_linear_extended / precise::linear_to_srgb_extended |
All functions live in linear_srgb::default unless noted.
Accuracy
Transfer function constants
All code paths use C0-continuous constants derived from the moxcms reference implementation. These adjust the IEC 61966-2-1 offset from 0.055 to 0.055011 and the threshold from 0.04045 to 0.03929, making the piecewise transfer function mathematically continuous (~2.3e-9 gap eliminated).
At u8 precision the two constant sets produce identical values. At u16, the max difference is ~1 LSB near the threshold. See docs/iec.md for a detailed comparison.
For interop with software that uses the original IEC textbook constants, enable
the iec feature for linear_srgb::iec::srgb_to_linear /
linear_srgb::iec::linear_to_srgb.
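The size of the discontinuity in the textbook constants can be checked with a few lines of f64 arithmetic. The helper below is a hypothetical sketch that evaluates both branches of the piecewise decode at the threshold and returns their difference; the 12.92 slope and 2.4 exponent are the standard sRGB values.

```rust
/// Jump at the segment boundary of a piecewise sRGB decode:
/// power branch minus linear branch, both evaluated at `threshold`.
fn boundary_gap(threshold: f64, offset: f64) -> f64 {
    let linear_branch = threshold / 12.92;
    let power_branch = ((threshold + offset) / (1.0 + offset)).powf(2.4);
    power_branch - linear_branch
}

/// Gap for the IEC 61966-2-1 textbook constants (threshold 0.04045, offset 0.055).
fn iec_boundary_gap() -> f64 {
    boundary_gap(0.04045, 0.055)
}
```

With the textbook constants this should come out at a few parts in 10^9, in line with the ~2.3e-9 figure quoted above. The adjusted constants are given here only to 5 or 6 significant digits, which is not enough precision to reproduce a zero gap, so the sketch does not attempt it.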
Accuracy summary (exhaustive f32 sweep)
| Path | Max ULP | Avg ULP | Monotonic |
|---|---|---|---|
| default s→l (rational poly) | 11 | ~0.5 | yes |
| default l→s (rational poly) | 14 | ~0.4 | yes |
| precise s→l (powf) | 6 | ~0.1 | yes |
| precise l→s (powf) | 3 | ~0.1 | yes |
What does 14 ULP mean in practice? 1 ULP (unit in the last place) is the spacing between adjacent f32 values at a given magnitude. At 0.5 that spacing is ~6e-8, so 14 ULP ≈ 8e-7, about 6 decimal digits of precision. At 0.01, 14 ULP is ~1.3e-8. For 8-bit output this error is invisible: it is thousands of times smaller than one output level (1/255 ≈ 3.9e-3). For 16-bit output it is still more than ten times smaller than one level (1/65535 ≈ 1.5e-5).
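The ULP figures can be reproduced directly from the f32 bit representation. This is a small sketch with hypothetical helper names; both helpers assume positive, finite inputs, where consecutive bit patterns are consecutive floats.

```rust
/// Size of 1 ULP at `x`: the distance from `x` to the next larger f32.
/// Assumes `x` is positive and finite.
fn ulp_at(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1) - x
}

/// Number of representable f32 steps between two positive finite values.
fn ulp_distance(a: f32, b: f32) -> u32 {
    a.to_bits().abs_diff(b.to_bits())
}

/// Absolute error corresponding to `n` ULPs at magnitude `x`.
fn ulps_to_abs_error(n: u32, x: f32) -> f32 {
    n as f32 * ulp_at(x)
}
```

For example, `ulps_to_abs_error(14, 0.5)` gives the worst-case absolute error of the default path near mid-gray, and `ulp_distance` is the kind of metric an accuracy sweep would apply to a fast result and its reference.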
Reference: C0-continuous f64 powf. The scalar rational polynomial evaluates in f64 intermediate precision, guaranteeing perfect monotonicity (zero reversals across all ~1B f32 values in [0, 1]). SIMD paths use f32 evaluation for throughput and are also monotonic within each segment.
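A monotonicity sweep of this kind can be sketched as a loop over f32 bit patterns, since positive floats sort in the same order as their bits. The code below is an assumption about how such a check could be written, not the crate's test harness: `count_reversals` is a hypothetical name, and the function under test is a stand-in f64 `powf` decode with IEC textbook constants.

```rust
/// Stand-in function under test: sRGB decode evaluated in f64, rounded to f32.
fn decode_srgb(s: f32) -> f32 {
    let s = s as f64;
    let l = if s <= 0.04045 { s / 12.92 } else { ((s + 0.055) / 1.055).powf(2.4) };
    l as f32
}

/// Count how often the output decreases while the input walks upward through
/// [0, 1], visiting every `stride`-th f32 bit pattern. stride = 1 is exhaustive.
fn count_reversals(f: impl Fn(f32) -> f32, stride: u32) -> u64 {
    assert!(stride > 0, "stride must be at least 1");
    let one = 1.0f32.to_bits(); // 0x3F80_0000, about 1.07 billion patterns
    let mut prev = f(0.0);
    let mut reversals = 0u64;
    let mut bits = stride;
    while bits <= one {
        let cur = f(f32::from_bits(bits));
        if cur < prev {
            reversals += 1;
        }
        prev = cur;
        bits += stride;
    }
    reversals
}
```

With `stride = 1` the loop performs roughly a billion evaluations, so a larger stride is useful for quick smoke tests and the exhaustive run is better suited to a release build.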
Feature Flags
- std (default): Required for runtime SIMD dispatch
- avx512 (default): AVX-512 code paths (16-wide f32)
- transfer: BT.709, PQ, HLG transfer functions (scalar + SIMD)
- iec: IEC 61966-2-1 textbook sRGB functions for legacy interop
- alt: Alternative/experimental implementations for benchmarking
# no_std (requires alloc for LUT generation)
linear-srgb = { version = "0.6", default-features = false }
Module Organization
- default: Recommended API. Rational polynomial for f32, LUT for integers, SIMD for slices.
- precise: Exact powf() conversions with C0-continuous constants (not IEC textbook). f32/f64, extended range.
- lut: Lookup tables for custom bit depths (10-bit, 12-bit, 16-bit).
- tf: Transfer functions: BT.709, PQ, HLG. Requires the transfer feature.
- iec: IEC 61966-2-1 textbook constants for legacy interop. Requires the iec feature.
- tokens: Inlineable #[rite] functions for embedding in SIMD pipelines (see below).
Embedding in SIMD Pipelines (tokens module)
If you're writing your own SIMD code with archmage,
the tokens module provides #[rite] functions that inline directly into your
#[arcane] functions — zero dispatch overhead.
use x8;
use arcane;
License
MIT OR Apache-2.0
AI-Generated Code Notice
Developed with Claude (Anthropic). All code has been reviewed and benchmarked, but verify critical paths for your use case.