Crate microfloat

8-bit and sub-byte floating point types for Rust

This crate implements microfloat types for Rust, including common 8-bit formats and sub-byte 4-bit and 6-bit formats. Microfloats are a subset of minifloat formats.

8-bit floating point representations:

  • f8e3m4 - signed E3M4, bias 3, IEEE-like NaN/Inf.
  • f8e4m3 - signed E4M3, bias 7, IEEE-like NaN/Inf.
  • f8e4m3b11fnuz - signed E4M3, bias 11, finite-only, unsigned zero.
  • f8e4m3fn - signed E4M3, bias 7, finite-only, signed outer NaNs (0x7F and 0xFF).
  • f8e4m3fnuz - signed E4M3, bias 8, finite-only, unsigned zero.
  • f8e5m2 - signed E5M2, bias 15, IEEE-like NaN/Inf.
  • f8e5m2fnuz - signed E5M2, bias 16, finite-only, unsigned zero.
  • f8e8m0fnu - unsigned E8M0 scale, bias 127, no zero, single NaN.
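The signed formats above share one decoding rule for their ordinary (non-special) codes: a sign bit, an exponent field offset by the bias, and a mantissa with an implicit leading 1, or subnormals when the exponent field is zero. The std-only sketch below illustrates that rule and shows how the bias alone changes what the same byte means. `decode_signed` is a hypothetical helper, not part of the crate's API; it deliberately ignores the per-format special codes (Inf, NaN) and does not apply to the unsigned, zero-less f8e8m0fnu.

```rust
/// Decode a finite, non-special encoding of a signed ExMy format that has a
/// sign bit, an implicit leading 1 for normal numbers, and subnormals when the
/// exponent field is zero. Format-specific special codes (Inf, NaN, and the
/// single NaN of the `fnuz` formats) are deliberately NOT handled here.
fn decode_signed(bits: u8, exp_bits: u32, man_bits: u32, bias: i32) -> f64 {
    let sign = if (bits >> (exp_bits + man_bits)) & 1 == 1 { -1.0f64 } else { 1.0 };
    let exp_field = ((bits >> man_bits) & ((1u8 << exp_bits) - 1)) as i32;
    let man_field = (bits & ((1u8 << man_bits) - 1)) as f64;
    // Mantissa as a fraction in [0, 1).
    let frac = man_field / (1u32 << man_bits) as f64;
    let magnitude = if exp_field == 0 {
        // Subnormal: 0.m * 2^(1 - bias)
        frac * 2f64.powi(1 - bias)
    } else {
        // Normal: 1.m * 2^(exp_field - bias)
        (1.0 + frac) * 2f64.powi(exp_field - bias)
    };
    sign * magnitude
}
```

For example, the byte 0x38 decodes to 1.0 with E4M3 and bias 7, but to 0.5 with bias 8 (as in f8e4m3fnuz) and to 0.0625 with bias 11 (as in f8e4m3b11fnuz).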

Microscaling (MX) sub-byte floating point representations:

  • f4e2m1fn - signed 4-bit E2M1, bias 1, finite-only, saturating.
  • f6e2m3fn - signed 6-bit E2M3, bias 1, finite-only, saturating.
  • f6e3m2fn - signed 6-bit E3M2, bias 3, finite-only, saturating.
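The MX element types are small enough to enumerate directly. The std-only sketch below decodes the 16 E2M1 codes (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), encodes with round-half-to-even while saturating out-of-range inputs to ±6.0, and applies a shared E8M0 scale the way the OCP Microscaling (MX) scheme pairs a block of elements with one power-of-two scale. All function names are hypothetical, not the crate's API; returning `None` for NaN and breaking ties toward the even code are assumptions of this sketch.

```rust
/// Decode the low 4 bits of `code` as E2M1 (bias 1, no Inf or NaN).
fn decode_e2m1(code: u8) -> f32 {
    let sign = if code & 0b1000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((code >> 1) & 0b11) as i32;
    let man = (code & 0b1) as f32;
    let magnitude = if exp == 0 {
        // Subnormal: 0.m * 2^(1 - bias) with bias 1, i.e. 0.0 or 0.5.
        man * 0.5
    } else {
        // Normal: 1.m * 2^(exp - bias).
        (1.0 + man * 0.5) * 2f32.powi(exp - 1)
    };
    sign * magnitude
}

/// Encode `x` as E2M1: clamp the magnitude to the largest finite value (6.0),
/// then pick the nearest code, breaking ties toward the even code
/// (round-half-to-even). NaN has no E2M1 encoding, so it returns None here.
fn encode_e2m1_saturating(x: f32) -> Option<u8> {
    if x.is_nan() {
        return None;
    }
    let sign_bit = if x.is_sign_negative() { 0b1000 } else { 0 };
    let mag = x.abs().min(6.0);
    // Positive codes 0..=7 increase monotonically in value, so a linear scan works.
    let mut best = 0u8;
    for code in 1..8u8 {
        let d_code = (mag - decode_e2m1(code)).abs();
        let d_best = (mag - decode_e2m1(best)).abs();
        if d_code < d_best || (d_code == d_best && code % 2 == 0) {
            best = code;
        }
    }
    Some(sign_bit | best)
}

/// Decode an E8M0 scale: 2^(bits - 127), with 0xFF as the single NaN.
fn decode_e8m0(bits: u8) -> f32 {
    if bits == 0xFF {
        f32::NAN
    } else {
        2f32.powi(bits as i32 - 127)
    }
}

/// Multiply every E2M1 element in a block by the block's shared E8M0 scale.
fn dequantize_block(scale: u8, codes: &[u8]) -> Vec<f32> {
    let s = decode_e8m0(scale);
    codes.iter().map(|&c| s * decode_e2m1(c)).collect()
}
```

With saturation, an input such as 100.0 encodes to the largest code (6.0) instead of overflowing, and a scale byte of 128 (2.0) turns the elements 1.5 and -1.0 into 3.0 and -2.0.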

In type suffixes,

  • f means finite-only with no infinities,
  • n means the format has a special NaN encoding,
  • uz means unsigned zero with no distinct negative zero encoding, and
  • u means unsigned.
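Because the names are systematic, the width, field sizes, and suffix flags can be read straight from a type name. The std-only sketch below parses names such as `f8e4m3fnuz`; `FormatName` and `parse_name` are hypothetical. Handling of an explicit bias such as `b11` is an assumption inferred from the name f8e4m3b11fnuz, since the suffix rules above do not list it.

```rust
/// Fields recovered from a type name such as `f8e4m3fnuz` or `f8e4m3b11fnuz`.
#[derive(Debug, PartialEq)]
struct FormatName {
    width: u32,
    exp_bits: u32,
    man_bits: u32,
    explicit_bias: Option<u32>, // only for names like `...b11...` (assumed convention)
    finite_only: bool,          // `f`
    special_nan: bool,          // `n`
    unsigned_zero: bool,        // `uz`
    unsigned: bool,             // `u` not followed by `z`
}

impl FormatName {
    /// Bits left after exponent and mantissa: 1 for signed formats, 0 for E8M0.
    fn sign_bits(&self) -> u32 {
        self.width.saturating_sub(self.exp_bits + self.man_bits)
    }
}

/// Read a run of ASCII digits from the front of `s`, returning (value, rest).
fn take_number(s: &str) -> Option<(u32, &str)> {
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let value = s[..end].parse().ok()?;
    Some((value, &s[end..]))
}

fn parse_name(name: &str) -> Option<FormatName> {
    let (width, rest) = take_number(name.strip_prefix('f')?)?;
    let (exp_bits, rest) = take_number(rest.strip_prefix('e')?)?;
    let (man_bits, mut rest) = take_number(rest.strip_prefix('m')?)?;
    let mut explicit_bias = None;
    if let Some(after_b) = rest.strip_prefix('b') {
        let (bias, after) = take_number(after_b)?;
        explicit_bias = Some(bias);
        rest = after;
    }
    let mut f = FormatName {
        width, exp_bits, man_bits, explicit_bias,
        finite_only: false, special_nan: false, unsigned_zero: false, unsigned: false,
    };
    // Walk the suffix letters; anything unexpected rejects the name.
    let mut chars = rest.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            'f' => f.finite_only = true,
            'n' => f.special_nan = true,
            'u' if chars.peek() == Some(&'z') => {
                chars.next();
                f.unsigned_zero = true;
            }
            'u' => f.unsigned = true,
            _ => return None,
        }
    }
    Some(f)
}
```

The `sign_bits` check makes the naming consistent: f4e2m1fn leaves one bit for the sign, while f8e8m0fnu spends all eight bits on the exponent, which is why its trailing `u` marks it unsigned.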

This crate is designed to be compatible with the microfloat types in the ml-dtypes Python package. For wider (16-bit) floating point types such as f16 and bf16, use the half crate, which heavily inspired microfloat.

Usage

The float types attempt to match existing Rust floating point type functionality where possible, and provide conversion operations, classification, formatting, parsing, arithmetic operations, and common math operations. Calculations are performed through f32 and rounded back to the target format.

use microfloat::f8e4m3;

let x = f8e4m3::from_f32(1.5);
let y = f8e4m3::from_f32(2.0);
let z = x + y;

assert_eq!(z.to_f32(), 3.5);
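In the example above, 3.5 is exactly representable in E4M3, so the round trip through f32 is lossless. To illustrate what "rounded back to the target format" means when it is not, the std-only sketch below rounds an f32 onto the E4M3 grid. `round_to_format` is a hypothetical helper, not the crate's implementation; it assumes round-half-to-even and IEEE-like overflow to infinity. The E4M3 parameters follow from the format description: 3 mantissa bits, smallest normal exponent 1 - 7 = -6, and largest finite value 240 because the top exponent is reserved for Inf/NaN.

```rust
/// Round an f32 to the nearest value of a signed ExMy format using
/// round-half-to-even (sketch only). `man_bits` is the mantissa width,
/// `min_exp` the smallest normal exponent (1 - bias), and results above
/// `max_finite` overflow to +/-infinity as in an IEEE-like format.
fn round_to_format(x: f32, man_bits: i32, min_exp: i32, max_finite: f32) -> f32 {
    if x.is_nan() || x.is_infinite() || x == 0.0 {
        return x;
    }
    // Unbiased exponent from the f32 bit pattern; clamping to min_exp makes
    // values below the normal range round on the subnormal grid.
    let exp = (((x.to_bits() >> 23) & 0xFF) as i32 - 127).max(min_exp);
    // Spacing between adjacent representable values at this exponent.
    let ulp = 2f32.powi(exp - man_bits);
    let rounded = (x / ulp).round_ties_even() * ulp;
    if rounded.abs() > max_finite {
        f32::INFINITY.copysign(x)
    } else {
        rounded
    }
}
```

With E4M3 parameters, 1.0 + 0.0625 = 1.0625 lies exactly halfway between 1.0 and 1.125 and rounds back to 1.0, while 250.0 rounds past 240 and becomes infinity.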

This crate provides no_std support.

Requires Rust 1.85 or greater.

Optional Features

  • serde - Implement Serialize and Deserialize traits for the float types. This adds a dependency on the serde crate.

  • num-traits - Enable ToPrimitive, FromPrimitive, Num, NumCast, FloatCore, Signed, Bounded, Zero, and One trait implementations from the num-traits crate.

  • bytemuck - Enable Zeroable and Pod trait implementations from the bytemuck crate.

  • rand_distr - Enable sampling from distributions like StandardUniform and StandardNormal from the rand_distr crate.

  • rkyv - Enable zero-copy serialization support with the rkyv crate.

Testing

Compatibility with ml-dtypes is tested by generated fixtures in tests/fixtures/. These fixtures validate conversions, classifications, arithmetic, and math methods.
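Bit-level compatibility checks need a comparison that plain `==` does not give: `NaN == NaN` is false, and `0.0 == -0.0` is true, which would hide a sign-of-zero mismatch in the `uz` formats. The std-only sketch below shows one such comparison. The fixture file format is not described here, so `matches_fixture`, `failing_rows`, and the inline rows are hypothetical and only illustrate the idea; a stricter test might also compare NaN bit patterns.

```rust
/// Compare a computed value against an expected fixture value: any NaN
/// matches any NaN; otherwise the bit patterns must agree exactly, so
/// +0.0 and -0.0 are treated as different.
fn matches_fixture(actual: f32, expected: f32) -> bool {
    if actual.is_nan() || expected.is_nan() {
        return actual.is_nan() && expected.is_nan();
    }
    actual.to_bits() == expected.to_bits()
}

/// Run the function under test on every (input, expected) row and
/// return the inputs whose results disagree.
fn failing_rows(rows: &[(f32, f32)], f: impl Fn(f32) -> f32) -> Vec<f32> {
    rows.iter()
        .filter(|&&(input, expected)| !matches_fixture(f(input), expected))
        .map(|&(input, _)| input)
        .collect()
}
```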

Structs

f4e2m1fn
Signed 4-bit E2M1 MX finite-only type with bias 1, stored in a byte.
f6e2m3fn
Signed 6-bit E2M3 MX finite-only type with bias 1, stored in a byte.
f6e3m2fn
Signed 6-bit E3M2 MX finite-only type with bias 3, stored in a byte.
f8e3m4
Signed 8-bit E3M4 floating point type with bias 3 and IEEE-like NaN/Inf.
f8e4m3
Signed 8-bit E4M3 floating point type with bias 7 and IEEE-like NaN/Inf.
f8e4m3b11fnuz
Signed 8-bit E4M3 finite-only type with bias 11, unsigned zero, and a single NaN.
f8e4m3fn
Signed 8-bit E4M3 finite-only type with bias 7 and signed outer NaNs (0x7F and 0xFF).
f8e4m3fnuz
Signed 8-bit E4M3 finite-only type with bias 8, unsigned zero, and a single NaN.
f8e5m2
Signed 8-bit E5M2 floating point type with bias 15 and IEEE-like NaN/Inf.
f8e5m2fnuz
Signed 8-bit E5M2 finite-only type with bias 16, unsigned zero, and a single NaN.
f8e8m0fnu
Unsigned 8-bit E8M0 MX scale format with bias 127, no zero, and a single NaN.
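The bias and the way each type spends its top encodings together determine its largest finite value. The std-only sketch below computes it from the field widths. The three-way `TopCodes` classification is a simplification inferred from the descriptions above (IEEE-like types and f8e8m0fnu reserve the top exponent; f8e4m3fn reserves only the all-ones mantissa there; the `fnuz` and MX `fn` types reserve nothing in that range) and is not the crate's own API.

```rust
/// How a format spends its top encodings (a simplification for this sketch).
#[derive(Clone, Copy)]
enum TopCodes {
    /// Largest exponent field reserved for Inf/NaN (IEEE-like; also E8M0's 0xFF NaN).
    ExponentReserved,
    /// Only the all-ones mantissa at the largest exponent is NaN (e.g. f8e4m3fn).
    /// Requires at least one mantissa bit.
    AllOnesNan,
    /// No codes reserved in the finite range (`fnuz` and the MX `fn` types).
    NoneReserved,
}

/// Largest finite value of an ExMy format with the given bias and top-code layout.
fn max_finite(exp_bits: u32, man_bits: u32, bias: i32, top: TopCodes) -> f64 {
    let all_ones_exp = (1i32 << exp_bits) - 1;
    let all_ones_man = (1u32 << man_bits) - 1;
    let (exp_field, man_field) = match top {
        TopCodes::ExponentReserved => (all_ones_exp - 1, all_ones_man),
        TopCodes::AllOnesNan => (all_ones_exp, all_ones_man - 1),
        TopCodes::NoneReserved => (all_ones_exp, all_ones_man),
    };
    // Value = 1.m * 2^(exp_field - bias), with m as a fraction of 2^man_bits.
    let frac = man_field as f64 / (1u32 << man_bits) as f64;
    (1.0 + frac) * 2f64.powi(exp_field - bias)
}
```

Under these assumptions the maxima are 240 for f8e4m3, 448 for f8e4m3fn, 240 for f8e4m3fnuz, 30 for f8e4m3b11fnuz, 57344 for both E5M2 types, 15.5 for f8e3m4, 6 for f4e2m1fn, 7.5 for f6e2m3fn, 28 for f6e3m2fn, and 2^127 for f8e8m0fnu.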