pub trait PrimitiveFloat: 'static + Abs<Output = Self> + AbsAssign + Add<Output = Self> + AddAssign<Self> + AddMul<Output = Self> + AddMulAssign<Self, Self> + Ceiling<Output = Self> + CeilingAssign + CeilingLogBase2<Output = i64> + CeilingLogBasePowerOf2<u64, Output = i64> + CheckedFrom<u8> + CheckedFrom<u16> + CheckedFrom<u32> + CheckedFrom<u64> + CheckedFrom<u128> + CheckedFrom<usize> + CheckedFrom<i8> + CheckedFrom<i16> + CheckedFrom<i32> + CheckedFrom<i64> + CheckedFrom<i128> + CheckedFrom<isize> + CheckedInto<u8> + CheckedInto<u16> + CheckedInto<u32> + CheckedInto<u64> + CheckedInto<u128> + CheckedInto<usize> + CheckedInto<i8> + CheckedInto<i16> + CheckedInto<i32> + CheckedInto<i64> + CheckedInto<i128> + CheckedInto<isize> + CheckedLogBase2<Output = i64> + CheckedLogBasePowerOf2<u64, Output = i64> + ConvertibleFrom<u8> + ConvertibleFrom<u16> + ConvertibleFrom<u32> + ConvertibleFrom<u64> + ConvertibleFrom<u128> + ConvertibleFrom<usize> + ConvertibleFrom<i8> + ConvertibleFrom<i16> + ConvertibleFrom<i32> + ConvertibleFrom<i64> + ConvertibleFrom<i128> + ConvertibleFrom<isize> + Copy + Debug + Default + Display + Div<Output = Self> + DivAssign + Display + Floor<Output = Self> + FloorAssign + FloorLogBase2<Output = i64> + FloorLogBasePowerOf2<u64, Output = i64> + FmtRyuString + From<f32> + FromStr + IntegerMantissaAndExponent<u64, i64> + Into<f64> + IsInteger + IsPowerOf2 + Iverson + LowerExp + Min + Max + Mul<Output = Self> + MulAssign<Self> + Named + Neg<Output = Self> + NegAssign + NegativeOne + NextPowerOf2<Output = Self> + NextPowerOf2Assign + One + PartialEq<Self> + PartialOrd<Self> + Pow<i64, Output = Self> + Pow<Self, Output = Self> + PowAssign<i64> + PowAssign<Self> + PowerOf2<i64> + Product + RawMantissaAndExponent<u64, u64> + Rem<Output = Self> + RemAssign<Self> + RoundingFrom<u8> + RoundingFrom<u16> + RoundingFrom<u32> + RoundingFrom<u64> + RoundingFrom<u128> + RoundingFrom<usize> + RoundingFrom<i8> + RoundingFrom<i16> + RoundingFrom<i32> + 
RoundingFrom<i64> + RoundingFrom<i128> + RoundingFrom<isize> + RoundingInto<u8> + RoundingInto<u16> + RoundingInto<u32> + RoundingInto<u64> + RoundingInto<u128> + RoundingInto<usize> + RoundingInto<i8> + RoundingInto<i16> + RoundingInto<i32> + RoundingInto<i64> + RoundingInto<i128> + RoundingInto<isize> + SciMantissaAndExponent<Self, i64> + Sign + Sized + Sqrt<Output = Self> + SqrtAssign + Square<Output = Self> + SquareAssign + Sub<Output = Self> + SubAssign<Self> + SubMul<Output = Self> + SubMulAssign<Self, Self> + Sum<Self> + Two + UpperExp + Zero {
    const WIDTH: u64;
    const MANTISSA_WIDTH: u64;
    const MIN_POSITIVE_SUBNORMAL: Self;
    const MAX_SUBNORMAL: Self;
    const MIN_POSITIVE_NORMAL: Self;
    const MAX_FINITE: Self;
    const NEGATIVE_ZERO: Self;
    const POSITIVE_INFINITY: Self;
    const NEGATIVE_INFINITY: Self;
    const NAN: Self;
    const SMALLEST_UNREPRESENTABLE_UINT: u64;
    const LARGEST_ORDERED_REPRESENTATION: u64;
    const EXPONENT_WIDTH: u64 = Self::WIDTH - Self::MANTISSA_WIDTH - 1;
    const MIN_NORMAL_EXPONENT: i64 = -(1 << (Self::EXPONENT_WIDTH - 1)) + 2;
    const MIN_EXPONENT: i64 = Self::MIN_NORMAL_EXPONENT - (Self::MANTISSA_WIDTH as i64);
    const MAX_EXPONENT: i64 = (1 << (Self::EXPONENT_WIDTH - 1)) - 1;

    fn is_nan(self) -> bool;
    fn is_infinite(self) -> bool;
    fn is_finite(self) -> bool;
    fn is_normal(self) -> bool;
    fn classify(self) -> FpCategory;
    fn to_bits(self) -> u64;
    fn from_bits(v: u64) -> Self;

    fn is_negative_zero(self) -> bool { ... }
    fn abs_negative_zero(self) -> Self { ... }
    fn abs_negative_zero_assign(&mut self) { ... }
    fn next_higher(self) -> Self { ... }
    fn next_lower(self) -> Self { ... }
    fn to_ordered_representation(self) -> u64 { ... }
    fn from_ordered_representation(n: u64) -> Self { ... }
    fn precision(self) -> u64 { ... }
    fn max_precision_for_sci_exponent(exponent: i64) -> u64 { ... }
}

This trait defines functions on primitive float types: f32 and f64.

Many of the functions here concern exponents and mantissas. We define three ways to express a float, each with its own exponent and mantissa. In the following, let $x$ be an arbitrary positive, finite, non-zero, non-NaN float. Let $M$ and $E$ be the mantissa width and exponent width of the floating point type; for f32s these are 23 and 8, and for f64s they are 52 and 11.

In the following we assume that $x$ is positive, but you can easily extend these definitions to negative floats by first taking their absolute value.

raw form

The raw exponent and raw mantissa are the actual bit patterns used to represent the components of $x$. The raw exponent $e_r$ is an integer in $[0, 2^E-2]$ and the raw mantissa $m_r$ is an integer in $[0, 2^M-1]$. Since we are dealing with a nonzero $x$, we forbid $e_r$ and $m_r$ from both being zero. We have

$$ x = \begin{cases} 2^{2-2^{E-1}-M}m_r & \text{if} \quad e_r = 0, \\ 2^{e_r-2^{E-1}+1}(2^{-M}m_r+1) & \textrm{otherwise}, \end{cases} $$

$$ e_r = \begin{cases} 0 & \text{if} \quad x < 2^{2-2^{E-1}}, \\ \lfloor \log_2 x \rfloor + 2^{E-1} - 1 & \textrm{otherwise}, \end{cases} $$

$$ m_r = \begin{cases} 2^{M+2^{E-1}-2}x & \text{if} \quad x < 2^{2-2^{E-1}}, \\ 2^M \left ( \frac{x}{2^{\lfloor \log_2 x \rfloor}}-1\right ) & \textrm{otherwise}. \end{cases} $$
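As a rough illustration of the raw form for f32s, the sketch below reads $m_r$ and $e_r$ out of the bit pattern with the standard library's `to_bits`, and rebuilds $x$ from them using the case formula for $x$. The helper names `raw_parts_f32` and `from_raw_parts_f32` are hypothetical and are not part of this trait.

```rust
const M: u32 = 23; // mantissa width of f32
const E: u32 = 8; // exponent width of f32

/// Splits a positive finite f32 into (raw mantissa, raw exponent).
/// Hypothetical helper, for illustration only.
fn raw_parts_f32(x: f32) -> (u32, u32) {
    assert!(x.is_finite() && x > 0.0);
    let bits = x.to_bits();
    let m_r = bits & ((1 << M) - 1); // low M bits
    let e_r = (bits >> M) & ((1 << E) - 1); // next E bits
    (m_r, e_r)
}

/// Rebuilds x (as an f64, which holds every f32 exactly) from its raw parts.
fn from_raw_parts_f32(m_r: u32, e_r: u32) -> f64 {
    let half_range = 1i32 << (E - 1); // 2^(E-1)
    if e_r == 0 {
        // Subnormal case: 2^(2 - 2^(E-1) - M) * m_r
        f64::from(m_r) * 2f64.powi(2 - half_range - M as i32)
    } else {
        // Normal case: 2^(e_r - 2^(E-1) + 1) * (2^(-M) * m_r + 1)
        2f64.powi(e_r as i32 - half_range + 1)
            * (f64::from(m_r) * 2f64.powi(-(M as i32)) + 1.0)
    }
}
```

For instance, `raw_parts_f32(1.5)` yields a raw mantissa of $2^{22}$ and a raw exponent of 127, and feeding those back into `from_raw_parts_f32` recovers 1.5.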

scientific form

We can write $x = 2^{e_s}m_s$, where $e_s$ is an integer and $m_s$ is a rational number with $1 \leq m_s < 2$. If $x$ is a valid float, the scientific mantissa $m_s$ is always exactly representable as a float of the same type. We have

$$ x = 2^{e_s}m_s, $$

$$ e_s = \lfloor \log_2 x \rfloor, $$

$$ m_s = \frac{x}{2^{\lfloor \log_2 x \rfloor}}. $$
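The scientific form can be sketched with nothing but repeated halving and doubling, both of which are exact here because they only change the exponent. The function name `sci_parts_f64` is hypothetical; this is an illustration of the definition, not the crate's implementation.

```rust
/// Returns (m_s, e_s) such that x = 2^e_s * m_s and 1 <= m_s < 2.
/// Hypothetical helper, for illustration only.
fn sci_parts_f64(x: f64) -> (f64, i64) {
    assert!(x.is_finite() && x > 0.0);
    let (mut m, mut e) = (x, 0i64);
    // Halve until the mantissa drops below 2.
    while m >= 2.0 {
        m /= 2.0;
        e += 1;
    }
    // Double until the mantissa reaches 1 (this also handles subnormals).
    while m < 1.0 {
        m *= 2.0;
        e -= 1;
    }
    (m, e)
}
```

For example, 10.0 decomposes into $m_s = 1.25$ and $e_s = 3$. The loop takes up to about a thousand iterations for extreme inputs, so it favors clarity over speed.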

integer form

We can also write $x = 2^{e_i}m_i$, where $e_i$ is an integer and $m_i$ is an odd integer. We have

$$ x = 2^{e_i}m_i, $$

$e_i$ is the unique integer such that $x/2^{e_i}$ is an odd integer, and

$$ m_i = \frac{x}{2^{e_i}}. $$
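One way to obtain the integer form of an f32 is to write it as an integer significand times a power of 2, and then strip the trailing zero bits from the significand. The sketch below does this with std only; `integer_parts_f32` is a hypothetical name, and the constants are the f32 values of $M = 23$ and $E = 8$.

```rust
/// Returns (m_i, e_i) such that x = 2^e_i * m_i and m_i is odd.
/// Hypothetical helper, for illustration only.
fn integer_parts_f32(x: f32) -> (u32, i32) {
    assert!(x.is_finite() && x > 0.0);
    let bits = x.to_bits();
    let m_r = bits & 0x007f_ffff; // raw mantissa (low 23 bits)
    let e_r = (bits >> 23) as i32; // raw exponent (sign bit is 0 because x > 0)
    // Express x as sig * 2^exp with sig an integer.
    let (sig, exp) = if e_r == 0 {
        (m_r, -149) // subnormal: x = m_r * 2^-149
    } else {
        (m_r | 0x0080_0000, e_r - 150) // normal: x = (2^23 + m_r) * 2^(e_r - 150)
    };
    // Move the trailing zero bits of sig into the exponent, leaving sig odd.
    let tz = sig.trailing_zeros();
    (sig >> tz, exp + tz as i32)
}
```

With this, 12.0 becomes $3 \cdot 2^2$ and 1.5 becomes $3 \cdot 2^{-1}$.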

Required Associated Constants

const WIDTH: u64

The number of bits taken up by the type.

This is $M+E+1$. The three terms in the sum correspond to the width of the mantissa, the width of the exponent, and the sign bit.

  • For f32s, this is 32.
  • For f64s, this is 64.

const MANTISSA_WIDTH: u64

The number of bits taken up by the mantissa.

  • For f32s, this is 23.
  • For f64s, this is 52.

const MIN_POSITIVE_SUBNORMAL: Self

The smallest positive float. This is $2^{2-2^{E-1}-M}$.

  • For f32s, this is $2^{-149}$, or 1.0e-45.
  • For f64s, this is $2^{-1074}$, or 5.0e-324.

const MAX_SUBNORMAL: Self

The largest float in the subnormal range. This is $2^{2-2^{E-1}-M}(2^M-1)$.

  • For f32s, this is $2^{-149}(2^{23}-1)$, or 1.1754942e-38.
  • For f64s, this is $2^{-1074}(2^{52}-1)$, or 2.225073858507201e-308.

const MIN_POSITIVE_NORMAL: Self

The smallest positive normal float. This is $2^{2-2^{E-1}}$.

  • For f32s, this is $2^{-126}$, or 1.1754944e-38.
  • For f64s, this is $2^{-1022}$, or 2.2250738585072014e-308.

const MAX_FINITE: Self

The largest finite float. This is $2^{2^{E-1}-1}(2-2^{-M})$.

  • For f32s, this is $2^{127}(2-2^{-23})$, or 3.4028235e38.
  • For f64s, this is $2^{1023}(2-2^{-52})$, or 1.7976931348623157e308.

const SMALLEST_UNREPRESENTABLE_UINT: u64

The smallest positive integer that cannot be represented as a float. This is $2^{M+1}+1$.

  • For f32s, this is $2^{24}+1$, or 16777217.
  • For f64s, this is $2^{53}+1$, or 9007199254740993.

const LARGEST_ORDERED_REPRESENTATION: u64

If you list all floats in increasing order, excluding NaN and giving negative and positive zero separate adjacent spots, this is the index of the last element, positive infinity. It is $2^{M+1}(2^E-1)+1$.

  • For f32s, this is $2^{32}-2^{24}+1$, or 4278190081.
  • For f64s, this is $2^{64}-2^{53}+1$, or 18437736874454810625.
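The two integer-valued constants above can be checked for f32 with a few lines of std-only Rust. The function names below are hypothetical; the formulas are the ones stated in the descriptions, with $M = 23$ and $E = 8$.

```rust
const M: u32 = 23; // mantissa width of f32
const E: u32 = 8; // exponent width of f32

/// 2^(M+1) + 1: the first positive integer that f32 cannot hold exactly.
fn smallest_unrepresentable_uint() -> u64 {
    (1u64 << (M + 1)) + 1
}

/// 2^(M+1) * (2^E - 1) + 1: the index of positive infinity in the ordered list.
fn largest_ordered_representation() -> u64 {
    (1u64 << (M + 1)) * ((1u64 << E) - 1) + 1
}

/// True if n survives a round trip through f32 unchanged.
/// Only meaningful for n well below u64::MAX, since `as` casts saturate.
fn is_exactly_representable(n: u64) -> bool {
    (n as f32) as u64 == n
}
```

Here `is_exactly_representable(16777216)` holds, while `is_exactly_representable(16777217)` does not, because the cast rounds to the nearest f32.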

Provided Associated Constants

const EXPONENT_WIDTH: u64 = Self::WIDTH - Self::MANTISSA_WIDTH - 1

The number of bits taken up by the exponent.

  • For f32s, this is 8.
  • For f64s, this is 11.

const MIN_NORMAL_EXPONENT: i64 = -(1 << (Self::EXPONENT_WIDTH - 1)) + 2

The smallest possible exponent of a float in the normal range. Any floats with smaller exponents are subnormal and thus have reduced precision. This is $2-2^{E-1}$.

  • For f32s, this is -126.
  • For f64s, this is -1022.

const MIN_EXPONENT: i64 = Self::MIN_NORMAL_EXPONENT - (Self::MANTISSA_WIDTH as i64)

The smallest possible exponent of a float. This is $2-2^{E-1}-M$.

  • For f32s, this is -149.
  • For f64s, this is -1074.

const MAX_EXPONENT: i64 = (1 << (Self::EXPONENT_WIDTH - 1)) - 1

The largest possible exponent of a float. This is $2^{E-1}-1$.

  • For f32s, this is 127.
  • For f64s, this is 1023.
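The provided constants are all derived from WIDTH and MANTISSA_WIDTH. The sketch below repeats the default definitions from the trait declaration as free-standing constants, with the f32 values plugged in, so the arithmetic can be followed step by step.

```rust
// f32 values of the two required constants.
const WIDTH: u64 = 32;
const MANTISSA_WIDTH: u64 = 23;

// Same expressions as the trait's defaults, minus the `Self::` prefix.
const EXPONENT_WIDTH: u64 = WIDTH - MANTISSA_WIDTH - 1; // 32 - 23 - 1
const MIN_NORMAL_EXPONENT: i64 = -(1 << (EXPONENT_WIDTH - 1)) + 2; // -128 + 2
const MIN_EXPONENT: i64 = MIN_NORMAL_EXPONENT - (MANTISSA_WIDTH as i64); // -126 - 23
const MAX_EXPONENT: i64 = (1 << (EXPONENT_WIDTH - 1)) - 1; // 128 - 1
```

Evaluating these gives 8, -126, -149, and 127, matching the f32 values listed above.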

Required Methods

Provided Methods

fn is_negative_zero(self) -> bool

Tests whether self is negative zero.

Worst-case complexity

Constant time and additional memory.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;

assert!((-0.0).is_negative_zero());
assert!(!0.0.is_negative_zero());
assert!(!1.0.is_negative_zero());
assert!(!f32::NAN.is_negative_zero());
assert!(!f32::POSITIVE_INFINITY.is_negative_zero());
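A dedicated test is useful because ordinary comparison cannot tell the two zeros apart: under IEEE 754 rules, `-0.0 == 0.0` is true. The sketch below shows one possible way to perform the check for f32 by comparing bit patterns. The name `is_negative_zero_f32` is hypothetical, and this is not necessarily how the provided method is implemented.

```rust
/// Returns true only for the bit pattern of -0.0 (sign bit set, all other bits clear).
/// Hypothetical helper, for illustration only.
fn is_negative_zero_f32(x: f32) -> bool {
    // `x == -0.0` would also accept +0.0, so compare the raw bits instead.
    x.to_bits() == (-0.0f32).to_bits()
}
```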

fn abs_negative_zero(self) -> Self

If self is negative zero, returns positive zero; otherwise, returns self.

Worst-case complexity

Constant time and additional memory.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;
use malachite_base::num::float::NiceFloat;

assert_eq!(NiceFloat((-0.0).abs_negative_zero()), NiceFloat(0.0));
assert_eq!(NiceFloat(0.0.abs_negative_zero()), NiceFloat(0.0));
assert_eq!(NiceFloat(1.0.abs_negative_zero()), NiceFloat(1.0));
assert_eq!(NiceFloat((-1.0).abs_negative_zero()), NiceFloat(-1.0));
assert_eq!(NiceFloat(f32::NAN.abs_negative_zero()), NiceFloat(f32::NAN));

fn abs_negative_zero_assign(&mut self)

If self is negative zero, replaces it with positive zero; otherwise, leaves self unchanged.

Worst-case complexity

Constant time and additional memory.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;
use malachite_base::num::float::NiceFloat;

let mut f = -0.0;
f.abs_negative_zero_assign();
assert_eq!(NiceFloat(f), NiceFloat(0.0));

let mut f = 0.0;
f.abs_negative_zero_assign();
assert_eq!(NiceFloat(f), NiceFloat(0.0));

let mut f = 1.0;
f.abs_negative_zero_assign();
assert_eq!(NiceFloat(f), NiceFloat(1.0));

let mut f = -1.0;
f.abs_negative_zero_assign();
assert_eq!(NiceFloat(f), NiceFloat(-1.0));

let mut f = f32::NAN;
f.abs_negative_zero_assign();
assert_eq!(NiceFloat(f), NiceFloat(f32::NAN));

fn next_higher(self) -> Self

Returns the smallest float larger than self.

Passing -0.0 returns 0.0; passing NaN or positive infinity panics.

Worst-case complexity

Constant time and additional memory.

Panics

Panics if self is NaN or positive infinity.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;
use malachite_base::num::float::NiceFloat;

assert_eq!(NiceFloat((-0.0f32).next_higher()), NiceFloat(0.0));
assert_eq!(NiceFloat(0.0f32.next_higher()), NiceFloat(1.0e-45));
assert_eq!(NiceFloat(1.0f32.next_higher()), NiceFloat(1.0000001));
assert_eq!(NiceFloat((-1.0f32).next_higher()), NiceFloat(-0.99999994));
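The behavior described above can be reproduced for f32 by stepping the bit pattern, since for a fixed sign the bit patterns of floats are ordered by magnitude. This is a minimal sketch under that observation; `next_higher_f32` is a hypothetical name and the crate's actual implementation may differ.

```rust
/// Returns the smallest f32 larger than x, treating -0.0 and 0.0 as distinct neighbors.
/// Hypothetical helper, for illustration only.
fn next_higher_f32(x: f32) -> f32 {
    assert!(!x.is_nan() && x != f32::INFINITY);
    let bits = x.to_bits();
    if bits == 0x8000_0000 {
        0.0 // -0.0 steps up to +0.0
    } else if bits >> 31 == 0 {
        f32::from_bits(bits + 1) // non-negative: magnitude grows
    } else {
        f32::from_bits(bits - 1) // negative: magnitude shrinks
    }
}
```

Stepping up from `f32::MAX` lands on positive infinity, because the bit pattern of infinity directly follows that of the largest finite float.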

fn next_lower(self) -> Self

Returns the largest float smaller than self.

Passing 0.0 returns -0.0; passing NaN or negative infinity panics.

Worst-case complexity

Constant time and additional memory.

Panics

Panics if self is NaN or negative infinity.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;
use malachite_base::num::float::NiceFloat;

assert_eq!(NiceFloat(0.0f32.next_lower()), NiceFloat(-0.0));
assert_eq!(NiceFloat((-0.0f32).next_lower()), NiceFloat(-1.0e-45));
assert_eq!(NiceFloat(1.0f32.next_lower()), NiceFloat(0.99999994));
assert_eq!(NiceFloat((-1.0f32).next_lower()), NiceFloat(-1.0000001));

fn to_ordered_representation(self) -> u64

Maps self to an integer. The map preserves ordering, and adjacent floats are mapped to adjacent integers.

Negative infinity is mapped to 0, and positive infinity is mapped to the largest value, LARGEST_ORDERED_REPRESENTATION. Negative and positive zero are mapped to distinct adjacent values. Passing in NaN panics.

The inverse operation is from_ordered_representation.

Worst-case complexity

Constant time and additional memory.

Panics

Panics if self is NaN.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;

assert_eq!(f32::NEGATIVE_INFINITY.to_ordered_representation(), 0);
assert_eq!((-0.0f32).to_ordered_representation(), 2139095040);
assert_eq!(0.0f32.to_ordered_representation(), 2139095041);
assert_eq!(1.0f32.to_ordered_representation(), 3204448257);
assert_eq!(
    f32::POSITIVE_INFINITY.to_ordered_representation(),
    4278190081
);
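A mapping with these properties can be sketched for f32 as follows. Negative floats are counted down from the index of -0.0, and non-negative floats are counted up from the index of 0.0. The name `to_ordered_f32` is hypothetical, and the code is only meant to reproduce the documented values, not the crate's source.

```rust
const INF_BITS: u64 = 0x7f80_0000; // bit pattern of +infinity, as an integer

/// Maps a non-NaN f32 to its index in the ordered list of all f32s.
/// Hypothetical helper, for illustration only.
fn to_ordered_f32(x: f32) -> u64 {
    assert!(!x.is_nan());
    let bits = x.to_bits();
    let magnitude = u64::from(bits & 0x7fff_ffff); // bits without the sign
    if bits >> 31 == 1 {
        // Negative: larger magnitude means smaller index; -infinity maps to 0.
        INF_BITS - magnitude
    } else {
        // Non-negative: 0.0 sits right after -0.0.
        INF_BITS + 1 + magnitude
    }
}
```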

fn from_ordered_representation(n: u64) -> Self

Maps a non-negative integer, less than or equal to LARGEST_ORDERED_REPRESENTATION, to a float. The map preserves ordering, and adjacent integers are mapped to adjacent floats.

Zero is mapped to negative infinity, and LARGEST_ORDERED_REPRESENTATION is mapped to positive infinity. Negative and positive zero are produced by two distinct adjacent integers. NaN is never produced.

The inverse operation is to_ordered_representation.

Worst-case complexity

Constant time and additional memory.

Panics

Panics if n is greater than LARGEST_ORDERED_REPRESENTATION.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;

assert_eq!(f32::from_ordered_representation(0), f32::NEGATIVE_INFINITY);
assert_eq!(f32::from_ordered_representation(2139095040), -0.0f32);
assert_eq!(f32::from_ordered_representation(2139095041), 0.0f32);
assert_eq!(f32::from_ordered_representation(3204448257), 1.0f32);
assert_eq!(
    f32::from_ordered_representation(4278190081),
    f32::POSITIVE_INFINITY
);
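The inverse direction can be sketched for f32 in the same spirit: indices up to that of -0.0 produce negative floats, and the rest produce non-negative floats. The name `from_ordered_f32` is hypothetical, and the constant is the f32 bit pattern of positive infinity.

```rust
const INF_BITS: u64 = 0x7f80_0000; // bit pattern of +infinity, as an integer

/// Maps an index in [0, 2 * INF_BITS + 1] back to the corresponding f32.
/// Hypothetical helper, for illustration only.
fn from_ordered_f32(n: u64) -> f32 {
    assert!(n <= 2 * INF_BITS + 1);
    if n <= INF_BITS {
        // Negative half: index 0 is -infinity, index INF_BITS is -0.0.
        f32::from_bits(0x8000_0000 | (INF_BITS - n) as u32)
    } else {
        // Non-negative half: index INF_BITS + 1 is +0.0.
        f32::from_bits((n - INF_BITS - 1) as u32)
    }
}
```

Because `-0.0 == 0.0` under ordinary comparison, checking that index 2139095040 really yields negative zero requires comparing bit patterns or the sign.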

fn precision(self) -> u64

Returns the precision of a nonzero finite floating-point number.

The precision is the number of significant bits of the integer mantissa. For example, the floats with precision 1 are the powers of 2, those with precision 2 are 3 times a power of 2, those with precision 3 are 5 or 7 times a power of 2, and so on.

Worst-case complexity

Constant time and additional memory.

Panics

Panics if self is zero, infinite, or NaN.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;

assert_eq!(1.0.precision(), 1);
assert_eq!(2.0.precision(), 1);
assert_eq!(3.0.precision(), 2);
assert_eq!(1.5.precision(), 2);
assert_eq!(1.234f32.precision(), 23);
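Following the definition above, the precision of an f32 can be computed as the bit length of its significand once trailing zero bits are discarded. The sketch below does this with std only; `precision_f32` is a hypothetical name and not the crate's implementation.

```rust
/// Number of significant bits in the integer mantissa of a nonzero finite f32.
/// Hypothetical helper, for illustration only.
fn precision_f32(x: f32) -> u32 {
    assert!(x.is_finite() && x != 0.0);
    let bits = x.to_bits();
    let m_r = bits & 0x007f_ffff; // raw mantissa
    let e_r = (bits >> 23) & 0xff; // raw exponent (sign bit masked off)
    // Normal floats carry an implicit leading 1 bit; subnormals do not.
    let sig = if e_r == 0 { m_r } else { m_r | 0x0080_0000 };
    // Bit length of sig, minus the trailing zeros that belong to the exponent.
    32 - sig.leading_zeros() - sig.trailing_zeros()
}
```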

fn max_precision_for_sci_exponent(exponent: i64) -> u64

Given a scientific exponent, returns the largest possible precision for a float with that exponent.

See the documentation of the precision function for a definition of precision.

For exponents greater than or equal to MIN_NORMAL_EXPONENT, the maximum precision is one more than the mantissa width. For smaller exponents (corresponding to the subnormal range), the precision is lower.

Worst-case complexity

Constant time and additional memory.

Panics

Panics if exponent is less than MIN_EXPONENT or greater than MAX_EXPONENT.

Examples
use malachite_base::num::basic::floats::PrimitiveFloat;

assert_eq!(f32::max_precision_for_sci_exponent(0), 24);
assert_eq!(f32::max_precision_for_sci_exponent(127), 24);
assert_eq!(f32::max_precision_for_sci_exponent(-149), 1);
assert_eq!(f32::max_precision_for_sci_exponent(-148), 2);
assert_eq!(f32::max_precision_for_sci_exponent(-147), 3);
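The rule can be sketched for f32 as a two-case function. The normal-range case is stated in the description above. The subnormal case, where each step below MIN_NORMAL_EXPONENT costs one bit, is an assumption inferred from the documented examples rather than taken from the crate's source. The function name and the free-standing constants are hypothetical.

```rust
// f32 values of the relevant constants.
const MANTISSA_WIDTH: i64 = 23;
const MIN_NORMAL_EXPONENT: i64 = -126;
const MIN_EXPONENT: i64 = -149;
const MAX_EXPONENT: i64 = 127;

/// Largest precision an f32 with the given scientific exponent can have.
/// Hypothetical helper, for illustration only.
fn max_precision_for_sci_exponent_f32(exponent: i64) -> u64 {
    assert!((MIN_EXPONENT..=MAX_EXPONENT).contains(&exponent));
    if exponent >= MIN_NORMAL_EXPONENT {
        // Normal range: M mantissa bits plus the implicit leading bit.
        (MANTISSA_WIDTH + 1) as u64
    } else {
        // Subnormal range (assumed): one bit at MIN_EXPONENT, one more per step up.
        (exponent - MIN_EXPONENT + 1) as u64
    }
}
```

Under this sketch, the exponent just below the normal range, -127, allows 23 bits, which agrees with the largest subnormal having raw mantissa $2^{23}-1$.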

Implementations on Foreign Types

Implementors