§8-bit and sub-byte floating point types for Rust
This crate implements microfloat types for Rust, including common 8-bit
formats and sub-byte 4-bit and 6-bit formats. Microfloats are a subset of
minifloat formats.
8-bit floating point representations:
- f8e3m4 - signed E3M4, bias 3, IEEE-like NaN/Inf.
- f8e4m3 - signed E4M3, bias 7, IEEE-like NaN/Inf.
- f8e4m3b11fnuz - signed E4M3, bias 11, finite-only, unsigned zero.
- f8e4m3fn - signed E4M3, bias 7, finite-only, signed outer NaNs.
- f8e4m3fnuz - signed E4M3, bias 8, finite-only, unsigned zero.
- f8e5m2 - signed E5M2, bias 15, IEEE-like NaN/Inf.
- f8e5m2fnuz - signed E5M2, bias 16, finite-only, unsigned zero.
- f8e8m0fnu - unsigned E8M0 scale, bias 127, no zero, single NaN.
Microscaling (MX) sub-byte floating point representations:
- f4e2m1fn - signed 4-bit E2M1, bias 1, finite-only, saturating.
- f6e2m3fn - signed 6-bit E2M3, bias 1, finite-only, saturating.
- f6e3m2fn - signed 6-bit E3M2, bias 3, finite-only, saturating.
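Each format above is described by an exponent width, a mantissa width, and a bias, which together determine how a bit pattern maps to a value. As a rough illustration only, the sketch below decodes finite values from those three parameters using the usual IEEE-style rule: an implicit leading one for normal values, and subnormals when the exponent field is zero. The names decode_finite and pow2 are hypothetical helpers, not part of this crate's API, and NaN/Inf encodings are deliberately ignored.

```rust
/// 2^k built exactly from f32 bits. Valid for k in -126..=127.
fn pow2(k: i32) -> f32 {
    f32::from_bits(((k + 127) as u32) << 23)
}

/// Decode the finite value of a small float from its field widths and bias.
/// Hypothetical helper: special encodings (NaN/Inf) are NOT handled here.
fn decode_finite(bits: u8, exp_bits: u32, man_bits: u32, bias: i32) -> f32 {
    let b = bits as u32;
    let man = (b & ((1u32 << man_bits) - 1)) as f32;
    let exp = ((b >> man_bits) & ((1u32 << exp_bits) - 1)) as i32;
    // The sign bit sits directly above the exponent field.
    let negative = (b >> (exp_bits + man_bits)) & 1 == 1;
    // Fraction contributed by the mantissa field, in [0, 1).
    let frac = man / (1u32 << man_bits) as f32;
    let magnitude = if exp == 0 {
        // Subnormal: no implicit leading one, fixed exponent of 1 - bias.
        frac * pow2(1 - bias)
    } else {
        // Normal: implicit leading one, exponent of exp - bias.
        (1.0 + frac) * pow2(exp - bias)
    };
    if negative { -magnitude } else { magnitude }
}
```

Under these assumptions, decode_finite(0x38, 4, 3, 7) gives 1.0 for E4M3 with bias 7, and decode_finite(0x07, 2, 1, 1) gives 6.0, the largest E2M1 magnitude.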
In type suffixes, f means finite-only with no infinities, n means the format has a
special NaN encoding, uz means unsigned zero with no distinct negative zero encoding,
and u means unsigned.
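As an illustration of how the suffixes combine, f8e8m0fnu is finite-only, has a NaN encoding, and is unsigned. The sketch below assumes the OCP Microscaling convention, in which a byte e represents 2^(e - 127) and 0xFF is the single NaN. The function decode_e8m0 is a hypothetical helper written for this example, not this crate's API.

```rust
/// Decode an E8M0 scale byte under the assumed convention:
/// value = 2^(bits - 127), with 0xFF reserved for NaN.
fn decode_e8m0(bits: u8) -> f32 {
    match bits {
        0xFF => f32::NAN,
        // 2^-127 lies below the normal f32 range, so build it as a subnormal.
        0x00 => f32::from_bits(0x0040_0000),
        // Otherwise the byte is exactly the f32 biased exponent field.
        e => f32::from_bits((e as u32) << 23),
    }
}
```

Because there is no sign bit and no mantissa, every non-NaN byte decodes to a positive power of two, which is why the format has no zero.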
This crate is modeled to be compatible with the microfloat types in the
ml-dtypes Python package.
For broader minifloat types such as f16 and bf16, use the
half crate; microfloat is heavily inspired by
half.
§Usage
The float types attempt to match existing Rust floating point type functionality where
possible, and provide conversion operations, classification, formatting, parsing,
arithmetic operations, and common math operations. Calculations are performed through
f32 and rounded back to the target format.
use microfloat::f8e4m3;
let x = f8e4m3::from_f32(1.5);
let y = f8e4m3::from_f32(2.0);
let z = x + y;
assert_eq!(z.to_f32(), 3.5);

This crate provides no_std support.
Requires Rust 1.85 or greater.
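The "compute through f32, then round back" approach mentioned above can be sketched without the crate. The example below is a simplified model for an E4M3 layout with bias 7 in which 0x7F and 0xFF are NaN. It rounds by brute-force search over all finite codes, breaking ties toward the code with an even low bit. The functions pow2, decode, encode, and add are hypothetical, and clamping out-of-range inputs to the largest finite value is an assumption made here; the crate's actual rounding and overflow behavior may differ.

```rust
/// 2^k built exactly from f32 bits. Valid for k in -126..=127.
fn pow2(k: i32) -> f32 {
    f32::from_bits(((k + 127) as u32) << 23)
}

/// Decode an E4M3 (bias 7) byte, assuming 0x7F and 0xFF are the NaN encodings.
fn decode(bits: u8) -> f32 {
    if bits & 0x7F == 0x7F {
        return f32::NAN;
    }
    let frac = (bits & 0x07) as f32 / 8.0;
    let exp = ((bits >> 3) & 0x0F) as i32;
    let mag = if exp == 0 {
        frac * pow2(-6) // subnormal
    } else {
        (1.0 + frac) * pow2(exp - 7) // normal
    };
    if bits & 0x80 != 0 { -mag } else { mag }
}

/// Round an f32 to the nearest finite code by exhaustive search.
/// Ties go to the code whose low bit is zero (round-half-to-even).
/// Assumption: magnitudes beyond the largest finite value clamp to it.
fn encode(x: f32) -> u8 {
    if x.is_nan() {
        return 0x7F;
    }
    let sign = if x.is_sign_negative() { 0x80 } else { 0x00 };
    let mag = x.abs();
    let mut best = 0u8;
    let mut best_err = f32::INFINITY;
    for code in 0u8..=0x7E {
        let err = (decode(code) - mag).abs();
        if err < best_err || (err == best_err && code & 1 == 0) {
            best = code;
            best_err = err;
        }
    }
    sign | best
}

/// Add two encoded values: widen to f32, add, round back to 8 bits.
fn add(a: u8, b: u8) -> u8 {
    encode(decode(a) + decode(b))
}
```

In this model, adding the codes for 1.5 and 2.0 yields the code for 3.5, matching the usage example. Adding 1.0 and 0.0625 yields 1.0, because the exact sum 1.0625 lies halfway between two representable values and the tie resolves to the even code.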
§Optional Features
- serde - Implement Serialize and Deserialize traits for the float types. This adds a dependency on the serde crate.
- num-traits - Enable ToPrimitive, FromPrimitive, Num, NumCast, FloatCore, Signed, Bounded, Zero, and One trait implementations from the num-traits crate.
- bytemuck - Enable Zeroable and Pod trait implementations from the bytemuck crate.
- rand_distr - Enable sampling from distributions like StandardUniform and StandardNormal from the rand_distr crate.
- rkyv - Enable zero-copy serialization support with the rkyv crate.
§Testing
Compatibility with ml-dtypes is tested by generated fixtures in tests/fixtures/.
These fixtures validate conversions, classifications, arithmetic, and math methods.
Structs§
- f4e2m1fn - Signed 4-bit E2M1 MX finite-only type with bias 1, stored in a byte.
- f6e2m3fn - Signed 6-bit E2M3 MX finite-only type with bias 1, stored in a byte.
- f6e3m2fn - Signed 6-bit E3M2 MX finite-only type with bias 3, stored in a byte.
- f8e3m4 - Signed 8-bit E3M4 floating point type with bias 3 and IEEE-like NaN/Inf.
- f8e4m3 - Signed 8-bit E4M3 floating point type with bias 7 and IEEE-like NaN/Inf.
- f8e4m3b11fnuz - Signed 8-bit E4M3 finite-only type with bias 11, unsigned zero, and a single NaN.
- f8e4m3fn - Signed 8-bit E4M3 finite-only type with bias 7 and signed outer NaNs.
- f8e4m3fnuz - Signed 8-bit E4M3 finite-only type with bias 8, unsigned zero, and a single NaN.
- f8e5m2 - Signed 8-bit E5M2 floating point type with bias 15 and IEEE-like NaN/Inf.
- f8e5m2fnuz - Signed 8-bit E5M2 finite-only type with bias 16, unsigned zero, and a single NaN.
- f8e8m0fnu - Unsigned 8-bit E8M0 MX scale format with bias 127, no zero, and a single NaN.