ARPFloat is an implementation of arbitrary-precision floating-point data structures and utilities. The library can be used to emulate floating-point operations in software or to create new floating-point data types.
Example
use arpfloat::Float;
use arpfloat::new_float_type;
// Create a new type: 15 exponent bits, 112 significand bits.
type FP128 = new_float_type!(15, 112);
// Use Newton-Raphson to find the square root of 5.
let n = FP128::from_u64(5);
let two = FP128::from_u64(2);
let mut x = n;
for _ in 0..1000 {
    x = (x + (n / x)) / two;
}
println!("fp128: {}", x);
println!("fp64: {}", x.as_f64());
The program above prints:
fp128: 2.2360679774997896964091736687312763
fp64: 2.23606797749979
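The loop in the example applies the update x ← (x + n/x) / 2. For comparison, here is a minimal sketch of the same iteration in native f64, using only the standard library and no arpfloat calls. The function name newton_sqrt is illustrative, and the iteration counts are arbitrary choices made to show how quickly the value settles.

```rust
/// Approximates sqrt(n) with a fixed number of Newton-Raphson steps,
/// starting from x = n (the same starting point as the example above).
fn newton_sqrt(n: f64, iterations: usize) -> f64 {
    let mut x = n;
    for _ in 0..iterations {
        // Average the current guess with n / guess.
        x = (x + n / x) / 2.0;
    }
    x
}

fn main() {
    for steps in [1, 2, 4, 6, 8] {
        println!("{} steps: {}", steps, newton_sqrt(5.0, steps));
    }
    println!("std sqrt: {}", 5.0_f64.sqrt());
}
```

Because the error roughly squares at every step, a handful of iterations already exhausts f64 precision; wider types need only a few more steps to fill their additional digits.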
The library also provides an API that exposes rounding modes and low-level operations.
use arpfloat::{FP128, RoundingMode};
let x = FP128::from_u64(1<<53);
let y = FP128::from_f64(1000.0);
let val = FP128::mul_with_rm(x, y, RoundingMode::NearestTiesToEven);
View the internal representation of numbers:
use arpfloat::FP16;
let fp = FP16::from_i64(15);
let m = fp.get_mantissa();
// Prints FP[+ E=+3 M=11110000000]
fp.dump();
Control the rounding mode for type conversion:
use arpfloat::{FP16, FP32, RoundingMode};
let x = FP32::from_u64(2649);
let b : FP16 = x.cast_with_rm(RoundingMode::Zero);
println!("{}", b); // Prints 2648!
Macros
Creates a new Float<> type with a specific number of bits for the exponent and mantissa.
The macro selects the appropriate size for the underlying storage.
Structs
This is a fixed-size big int implementation that’s used to represent the
significand part of the floating point number.
This is the main data structure of this library. It represents an
arbitrary-precision floating-point number. The data structure is generic
over the EXPONENT and MANTISSA constants, which give the number of bits
dedicated to encoding the exponent and the mantissa, respectively.
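To make the roles of the two constants concrete, here is a minimal standard-library sketch (it does not use arpfloat) that splits a raw bit pattern into a sign, an unbiased exponent, and a significand for a given pair of field widths. It assumes the IEEE 754 interchange layout and a normal value (not zero, subnormal, infinity, or NaN); the function name decode is hypothetical and says nothing about how the library stores its fields internally.

```rust
/// Splits an IEEE 754-style bit pattern into (sign, unbiased exponent, significand).
/// The returned significand includes the implicit leading 1, so the result is
/// only meaningful for normal numbers. Works for formats up to 64 bits wide.
fn decode(bits: u64, exp_bits: u32, man_bits: u32) -> (bool, i64, u64) {
    let sign = (bits >> (exp_bits + man_bits)) & 1 == 1;
    let biased = (bits >> man_bits) & ((1u64 << exp_bits) - 1);
    let bias = (1i64 << (exp_bits - 1)) - 1;
    let fraction = bits & ((1u64 << man_bits) - 1);
    let significand = (1u64 << man_bits) | fraction;
    (sign, biased as i64 - bias, significand)
}

fn main() {
    // 15.0 encoded as a 16-bit float: 0 10010 1110000000.
    let (sign, exp, sig) = decode(0x4B80, 5, 10);
    println!("16-bit: sign={} E={} M={:b}", sign, exp, sig);

    // The same value taken from the native f32 and f64 encodings.
    let (sign, exp, sig) = decode(15.0f32.to_bits() as u64, 8, 23);
    println!("32-bit: sign={} E={} M={:b}", sign, exp, sig);
    let (sign, exp, sig) = decode(15.0f64.to_bits(), 11, 52);
    println!("64-bit: sign={} E={} M={:b}", sign, exp, sig);
}
```

For the 16-bit case the sketch yields exponent 3 and the 11-bit significand 11110000000, the same digits shown in the dump() comment of the earlier example.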
Enums
Defines the supported rounding modes.
See IEEE 754-2019, Section 4.3, "Rounding-direction attributes".
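As a rough illustration of what a rounding mode decides, the sketch below rounds an unsigned integer to a fixed number of significant bits. It uses only the standard library; the Mode enum and round_to_precision are local stand-ins invented for this example, not the library's RoundingMode or any of its functions. With 11 significant bits (10 stored mantissa bits plus the implicit leading bit of a 16-bit float), 2649 cannot be represented exactly, so the mode picks between its neighbours 2648 and 2650.

```rust
/// Illustrative rounding directions (hypothetical, for non-negative inputs only).
#[derive(Clone, Copy)]
enum Mode {
    Zero,              // truncate the dropped bits
    NearestTiesToEven, // round to nearest; on a tie pick the even significand
    Positive,          // round up whenever any dropped bit is set
}

/// Rounds `value` so that it has at most `precision` significant bits.
/// Assumes `precision >= 1` and values small enough that rounding up
/// cannot overflow u64.
fn round_to_precision(value: u64, precision: u32, mode: Mode) -> u64 {
    let width = 64 - value.leading_zeros();
    if width <= precision {
        return value; // already representable
    }
    let shift = width - precision;
    let kept = value >> shift;
    let dropped = value & ((1u64 << shift) - 1);
    let half = 1u64 << (shift - 1);
    let round_up = match mode {
        Mode::Zero => false,
        Mode::Positive => dropped != 0,
        Mode::NearestTiesToEven => dropped > half || (dropped == half && kept & 1 == 1),
    };
    (kept + round_up as u64) << shift
}

fn main() {
    for v in [2649u64, 2651] {
        println!(
            "{}: zero={} nearest-even={} positive={}",
            v,
            round_to_precision(v, 11, Mode::Zero),
            round_to_precision(v, 11, Mode::NearestTiesToEven),
            round_to_precision(v, 11, Mode::Positive),
        );
    }
}
```

Both 2649 and 2651 sit exactly halfway between two representable values, so they show the difference between truncation and ties-to-even: 2649 goes to 2648 under both, while 2651 goes to 2650 when truncated and 2652 when rounded to nearest.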
Type Definitions
Predefined FP16 float with 5 exponent bits and 10 mantissa bits.
Predefined FP32 float with 8 exponent bits and 23 mantissa bits.
Predefined FP64 float with 11 exponent bits and 52 mantissa bits.
Predefined FP128 float with 15 exponent bits and 112 mantissa bits.
Predefined FP256 float with 19 exponent bits and 236 mantissa bits.
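The widths listed above add up to the size in each type's name once a sign bit is included. The following standard-library sketch checks that arithmetic and also prints the exponent bias each width would have under the IEEE 754 convention of 2^(e-1) - 1. The helper names are hypothetical, and the bias formula is an assumption about the encoding rather than something stated on this page.

```rust
/// Total encoded width: one sign bit plus the exponent and mantissa fields.
fn total_bits(exp_bits: u32, man_bits: u32) -> u32 {
    1 + exp_bits + man_bits
}

/// Exponent bias under the IEEE 754 convention (assumed here): 2^(e-1) - 1.
fn bias(exp_bits: u32) -> u64 {
    (1u64 << (exp_bits - 1)) - 1
}

fn main() {
    // (name, exponent bits, mantissa bits) as listed in the type definitions.
    let formats = [
        ("FP16", 5, 10),
        ("FP32", 8, 23),
        ("FP64", 11, 52),
        ("FP128", 15, 112),
        ("FP256", 19, 236),
    ];
    for (name, e, m) in formats {
        println!("{}: {} bits total, bias {}", name, total_bits(e, m), bias(e));
    }
}
```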