ucum-units 0.1.0

A total, conformance-tested implementation of UCUM (Unified Code for Units of Measure): parse, validate, analyze, compare and convert units.

ucum-units

A friendly, complete Rust implementation of UCUM, the Unified Code for Units of Measure: the standard used across healthcare, science, and data interchange to write units precisely and unambiguously.

With ucum-units you can parse unit codes, check whether they're valid, work out their dimensions, ask whether two units are comparable, and convert values between them, including temperatures and logarithmic units like decibels.

use ucum::{analyze, convert, is_comparable};

// Convert between units, including temperatures.
let ft = convert(1.0, "[ft_i]", "m").unwrap();
println!("{ft}");                                       // => 0.3048
let body = convert(98.6, "[degF]", "Cel").unwrap();
println!("{body}");                                     // => 37.0 (98.6 °F in Celsius)

// Check commensurability before converting.
println!("{}", is_comparable("kg/m3", "mg/L").unwrap()); // => true  (both densities)
println!("{}", is_comparable("kg", "m").unwrap());       // => false (mass vs length)

// Inspect a unit's scale to the canonical base units.
let km = analyze("km").unwrap();
println!("{}", km.factor);                              // => 1000.0 (1 km = 1000 m)

What is UCUM, and why would I use it?

A unit like “milligrams per deciliter” gets written a dozen different ways in the wild: mg/dL, mg/dl, MG/DL, milligram/deciliter, mg.dL-1. That ambiguity is fine for humans and a disaster for software: two systems exchanging lab results, sensor readings, or dosages need to agree on exactly what a unit is before they can compare or convert values.

UCUM (the Unified Code for Units of Measure) fixes this by giving every unit an unambiguous, machine-parseable code. It is built from a small grammar (a fixed set of base atoms, metric prefixes, and operators), so any code, however exotic, has exactly one meaning. It is the unit standard used by HL7/FHIR, IEEE 11073, and much of healthcare and laboratory data interchange.

You do not need to memorize UCUM to use this crate. If you already have unit codes (from a FHIR resource, a device, a spreadsheet), pass them straight in. If you're writing them yourself, the five-minute tutorial below is all you need.

Installation

cargo add ucum-units

The crate is dependency-light (just thiserror at runtime) and contains no unsafe code.

What you can do

| Task | Function |
| --- | --- |
| Parse a unit into an AST | parse(expr) |
| Check a unit is well-formed and known | validate(expr) |
| Get the dimension + conversion factor/offset | analyze(expr) |
| Ask whether two units are comparable | is_comparable(a, b) |
| Convert a value between units | convert(value, from, to) |
| Normalize a unit string for display | canonical(expr) |
| Get a readable name (mm → (millimeter)) | display_name(expr) |

Everything is a plain function, deterministic, and thread-safe. The lookup tables are immutable and shared, so calls are cheap and safe to make from anywhere.

UCUM in five minutes

Every UCUM code is built from four ingredients. Once you can spot them, you can read and write almost any unit.

1. Atoms are the base vocabulary, the actual units: m (meter), g (gram), s (second), mol (mole), K (kelvin), L (liter), Pa (pascal), min (minute). Case matters: m is the meter, whereas M is the mega prefix.

2. Prefixes attach to the front of a metric atom to scale it. k = kilo, m = milli, u = micro, n = nano, M = mega, d = deci, c = centi.

| Code | Reads as |
| --- | --- |
| km | kilometer |
| mg | milligram |
| uL | microliter |
| kPa | kilopascal |
| nmol | nanomole |

⚠️ A prefix on its own is not a unit. M means the mega prefix, not the meter; the meter is m. This is the single most common newcomer surprise.
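As a rough illustration of how a prefix scales an atom, the sketch below maps the seven prefixes listed above to their powers of ten. Both `prefix_factor` and `to_unprefixed` are hypothetical helpers written for this example; they are not part of the crate's API, and UCUM defines many more prefixes than these.

```rust
/// Power-of-ten factor for the prefixes listed above.
/// Hypothetical helper for illustration; UCUM defines many more prefixes.
fn prefix_factor(prefix: &str) -> Option<f64> {
    match prefix {
        "M" => Some(1e6),  // mega
        "k" => Some(1e3),  // kilo
        "d" => Some(1e-1), // deci
        "c" => Some(1e-2), // centi
        "m" => Some(1e-3), // milli
        "u" => Some(1e-6), // micro
        "n" => Some(1e-9), // nano
        _ => None,         // not one of the prefixes covered here
    }
}

/// Express `value` of a prefixed unit in the unprefixed unit,
/// e.g. 2.5 with prefix "k" (2.5 km) becomes 2500 (m).
fn to_unprefixed(value: f64, prefix: &str) -> Option<f64> {
    prefix_factor(prefix).map(|factor| value * factor)
}
```

Note that the lookup is case-sensitive, which is exactly why m (milli) and M (mega) can coexist.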

3. Operators combine atoms into compound units:

  • . multiplies: N.m is a newton-meter.
  • / divides: m/s is meters per second; mg/dL is milligrams per deciliter.
  • A leading / makes a reciprocal: /min is “per minute”.
  • A trailing number is an exponent: m2 is square meters, s-1 is per second, m3 is cubic meters. (UCUM writes the exponent as a suffix, not with ^.)

So a force, kg·m/s², is written kg.m/s2, and an acceleration is m/s2.
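To make the suffix-exponent rule concrete, here is a small sketch that splits a simple term such as m2 or s-1 into its symbol and exponent. The `split_exponent` function is hypothetical and deliberately narrow: it ignores bracketed atoms, annotations, and the 10*n notation, and it is not how the crate's parser is implemented.

```rust
/// Split a simple UCUM term such as "m2" or "s-1" into (symbol, exponent).
/// A term with no trailing number has an implicit exponent of 1.
/// Illustrative only: bracketed atoms and annotations are not handled.
fn split_exponent(term: &str) -> Option<(&str, i32)> {
    let bytes = term.as_bytes();

    // Walk backwards over the trailing run of ASCII digits.
    let mut start = bytes.len();
    while start > 0 && bytes[start - 1].is_ascii_digit() {
        start -= 1;
    }

    if start == bytes.len() {
        // No trailing digits: implicit exponent of 1 (empty input is rejected).
        return if term.is_empty() { None } else { Some((term, 1)) };
    }

    // An optional sign sits directly before the digits.
    let sign_start = if start > 0 && (bytes[start - 1] == b'-' || bytes[start - 1] == b'+') {
        start - 1
    } else {
        start
    };

    if sign_start == 0 {
        return None; // a bare number such as "2" has no symbol
    }

    let exponent = term[sign_start..].parse::<i32>().ok()?;
    Some((&term[..sign_start], exponent))
}
```

With this helper, "m2" splits into ("m", 2) and "s-1" into ("s", -1), while a plain "kg" comes back with exponent 1.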

4. Brackets and braces handle the special cases:

  • [...] wraps customary, named, or “non-metric” units that don't follow the prefix rules: [ft_i] (international foot), [gal_us] (US gallon), [degF] (degree Fahrenheit), [in_i] (inch), [lb_av] (pound), [pH]. When a unit looks like a word or proper name, it usually lives in brackets.
  • {...} is a free-text annotation, a human note that carries no dimensional meaning. mg{total}, /min{beats}, and ng/mL{IgG} are dimensionally identical to mg, /min, and ng/mL; the brace text is along for the ride.
  • (...) groups terms, exactly like in arithmetic: kg/(m.s) is kilograms per (meter·second), which is different from kg/m.s.
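Because annotations carry no dimensional meaning, removing them leaves the dimensional part of a code unchanged. The sketch below shows that idea with a hypothetical `strip_annotations` helper; it is an illustration only and says nothing about how the crate handles annotations internally.

```rust
/// Remove every `{...}` annotation from a UCUM code.
/// Returns None when a brace is unbalanced or nested. Illustrative only.
fn strip_annotations(code: &str) -> Option<String> {
    let mut out = String::with_capacity(code.len());
    let mut inside = false; // true while we are between '{' and '}'

    for ch in code.chars() {
        match (ch, inside) {
            ('{', false) => inside = true,
            ('}', true) => inside = false,
            // A nested '{' or a stray '}' is treated as malformed here.
            ('{', true) | ('}', false) => return None,
            (_, true) => {}             // annotation text is dropped
            (_, false) => out.push(ch), // everything else is kept
        }
    }

    if inside { None } else { Some(out) }
}
```

Applied to the examples above, mg{total} becomes mg, /min{beats} becomes /min, and ng/mL{IgG} becomes ng/mL.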

A couple more building blocks you'll meet:

  • 1 is the dimensionless unity, a pure ratio. % (percent) and [pH] build on it.
  • 10*6 is UCUM's scientific notation for powers of ten, common in lab counts like 10*6/uL (millions per microliter).
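The power-of-ten notation is easy to read mechanically. As a minimal sketch, the hypothetical `power_of_ten` helper below turns a term like 10*6 into its numeric factor; it handles only the `10*n` spelling and is not taken from the crate.

```rust
/// Parse UCUM's power-of-ten notation, e.g. "10*6" or "10*-3", into a factor.
/// Hypothetical helper: only the "10*n" spelling is recognized.
fn power_of_ten(term: &str) -> Option<f64> {
    // Everything after "10*" must be a (possibly signed) integer exponent.
    let exponent: i32 = term.strip_prefix("10*")?.parse().ok()?;
    Some(10f64.powi(exponent))
}
```

So a count reported as 4.5 in 10*6/uL is 4.5 times one million per microliter.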

That's the whole grammar. Putting it together:

| UCUM code | Plain English |
| --- | --- |
| mg/dL | milligrams per deciliter |
| mmol/L | millimoles per liter |
| km/h | kilometers per hour |
| kg.m/s2 | kilogram-meters per second² (a newton) |
| mm[Hg] | millimeters of mercury (blood pressure) |
| 10*6/uL | millions per microliter (a cell count) |
| ng/mL{IgG} | nanograms per milliliter, annotated “IgG” |

Common gotchas

UCUM is precise, which means a few spellings can surprise newcomers:

| You write | It means |
| --- | --- |
| m vs M | meter vs the mega prefix (M on its own isn't a unit) |
| ft | femto·tonne (a mass!); the foot is [ft_i] |
| m2, s-1 | exponents are suffixes, e.g. m3/s, s-1 |
| [ft_i], [gal_us] | customary units live in square brackets |
| 1 | the dimensionless unity |
| kg{wet} | {…} is a free-text annotation; ignored dimensionally |
| /s | a leading slash is a reciprocal, i.e. s⁻¹ |
| kPa | prefixes attach to metric units (kg, kPa, mL) |

A guided tour (no conversions)

Conversions get their own section; here's everything else the crate can tell you about a unit code. Suppose someone hands you the string mg/dL and you want to make sense of it.

use ucum::{validate, analyze, canonical, display_name};

// 1. Is it even a real UCUM unit? validate() checks both the grammar and that
//    every atom is known. It never panics; bad input comes back as an Err.
println!("{}", validate("mg/dL").is_ok());    // => true
println!("{}", validate("flurble").is_ok());  // => false (unknown atom)
println!("{}", validate("mg/").is_ok());       // => false (malformed)

// 2. What does it mean in English? Handy for UIs and logs.
println!("{}", display_name("mg/dL").unwrap()); // => (milligram) / (deciliter)
println!("{}", display_name("mm").unwrap());    // => (millimeter)

// 3. What's its canonical spelling? canonical() re-serializes the parse tree,
//    dropping redundant parentheses but keeping meaningful ones.
println!("{}", canonical("((m))").unwrap());    // => m
println!("{}", canonical("kg/(m.s)").unwrap()); // => kg/(m.s)

// 4. What is it, dimensionally? analyze() gives the dimension vector plus the
//    factor to the canonical base units, without converting any value.
let a = analyze("mg/dL").unwrap();
println!("{}", a.is_dimensionless);             // => false
// mg/dL is a mass concentration: mass / volume = g · m⁻³.

analyze is the workhorse: it's how you'd group lab results by what they measure, validate that a device is reporting the unit you expect, or render a unit's meaning, all without performing arithmetic. The Dimensions section below explains the vector it returns.
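Grouping results by what they measure amounts to bucketing unit codes by their dimension vector. The sketch below assumes you have already obtained a `[i8; 7]` vector for each code (for instance from analyze); the `DimVec` alias and `group_by_dimension` function are hypothetical names for this example, and the vectors are passed in by hand so the block stays self-contained.

```rust
use std::collections::BTreeMap;

/// Exponent vector in the seven-slot layout described in the Dimensions section.
type DimVec = [i8; 7];

/// Bucket unit codes by dimension vector, preserving input order per bucket.
/// The caller supplies each code's vector; nothing is parsed here.
fn group_by_dimension<'a>(units: &[(&'a str, DimVec)]) -> BTreeMap<DimVec, Vec<&'a str>> {
    let mut groups: BTreeMap<DimVec, Vec<&'a str>> = BTreeMap::new();
    for &(code, dim) in units {
        groups.entry(dim).or_default().push(code);
    }
    groups
}
```

Given mg/dL and kg/m3 (both mass per volume) alongside m/s, this produces two buckets: one holding the two concentrations and one holding the velocity.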

Working with quantities

Quantity pairs a value with a unit and lets you do dimensional arithmetic:

use ucum::Quantity;

let speed = Quantity::new(100.0, "km").div(&Quantity::new(2.0, "h"));
println!("{}", speed.is_comparable("m/s").unwrap()); // => true

let in_ms = speed.convert_to("m/s").unwrap();
println!("{}", in_ms.value);                         // => 13.888888888888889
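The figure above can be checked by hand: 100 km divided by 2 h is 50 km/h, and 1 km/h is 1000/3600 m/s. The sketch below spells out that arithmetic for purely multiplicative units. `convert_multiplicative` is a hypothetical helper, and the factors are supplied by the caller rather than looked up, so this is a worked calculation rather than the crate's conversion code.

```rust
/// Convert between two commensurable, purely multiplicative units, given each
/// unit's factor to the same canonical base unit.
/// Does not cover affine (temperature) or logarithmic units.
fn convert_multiplicative(value: f64, from_factor: f64, to_factor: f64) -> f64 {
    value * from_factor / to_factor
}

/// Factor of km/h relative to m/s: 1000 m per km, 3600 s per h.
const KM_PER_H_IN_M_PER_S: f64 = 1000.0 / 3600.0;
```

Calling `convert_multiplicative(50.0, KM_PER_H_IN_M_PER_S, 1.0)` gives roughly 13.89, matching the Quantity result.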

Case-insensitive mode

UCUM has a case-sensitive form (c/s, the default for data interchange) and a case-insensitive form (c/i). The free functions use c/s; for c/i, reach for the Ucum facade:

use ucum::Ucum;

let ci = Ucum::case_insensitive();
println!("{}", ci.validate("MOL").is_ok());          // => true (mole)
println!("{}", ci.convert(1.0, "M", "CM").unwrap()); // => 100.0

Dimensions

A dimension is what a unit measures, stripped of the particular unit you chose. A meter and a foot are different units but the same dimension: length. Meters-per-second and miles-per-hour are both length ÷ time. Capturing this is what lets the crate tell you that kg/m3 and mg/dL are comparable (both are mass ÷ volume) while kg and m are not.

UCUM expresses every dimension as a combination of seven base quantities. Dimension records how many powers of each appear, as a fixed exponent vector [i8; 7]:

| index | quantity | base unit | example units using it |
| --- | --- | --- | --- |
| 0 | length | m | m, [ft_i], km |
| 1 | time | s | s, min, h |
| 2 | mass | g | g, kg, [lb_av] |
| 3 | plane angle | rad | rad, deg |
| 4 | temperature | K | K, Cel |
| 5 | electric charge | C | C, A.s |
| 6 | luminous intensity | cd | cd |

So the vector reads off directly. A velocity is length¹·time⁻¹, i.e. the exponent 1 in slot 0 and -1 in slot 1:

use ucum::analyze;

// m/s is velocity: length per time.
println!("{:?}", analyze("m/s").unwrap().dimension);
// => Dimension([1, -1, 0, 0, 0, 0, 0])

// m³/s is volume flow rate.
println!("{:?}", analyze("m3/s").unwrap().dimension);
// => Dimension([3, -1, 0, 0, 0, 0, 0])

// A force (newton = kg·m/s²) is mass¹·length¹·time⁻². Both spellings reduce
// to the same vector, which is exactly why they're interchangeable.
println!("{:?}", analyze("kg.m/s2").unwrap().dimension); // => Dimension([1, -2, 1, 0, 0, 0, 0])
println!("{:?}", analyze("N").unwrap().dimension);       // => Dimension([1, -2, 1, 0, 0, 0, 0])

That last pair shows the key property: many units share one dimension. [ft_i] and m both reduce to [1,0,0,0,0,0,0]; kg/m3 and mg/dL both reduce to [-3,0,1,0,0,0,0].

Printing and comparing

Dimension has a Display that renders the vector back in UCUM base-unit syntax, handy for logs and error messages. The dimensionless dimension prints as 1:

use ucum::analyze;

println!("{}", analyze("m/s").unwrap().dimension);     // => m.s-1
println!("{}", analyze("kg.m/s2").unwrap().dimension); // => m.s-2.g
println!("{}", analyze("%").unwrap().dimension);       // => 1

is_comparable is, under the hood, just an equality check on these vectors (with one extra rule for arbitrary units; see Feature support). That exception aside, two units are comparable if and only if their dimension vectors are identical.
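Both ideas in this section, rendering a vector in base-unit syntax and comparing by vector equality, fit in a few lines. The sketch below reproduces the printed forms shown above from a raw `[i8; 7]`; `render_dimension` and `same_dimension` are hypothetical stand-ins written for illustration, not the crate's Display or is_comparable implementations, and the arbitrary-unit rule is not modeled.

```rust
/// Base-unit symbols in slot order: length, time, mass, plane angle,
/// temperature, electric charge, luminous intensity.
const BASE_UNITS: [&str; 7] = ["m", "s", "g", "rad", "K", "C", "cd"];

/// Render an exponent vector in UCUM base-unit syntax, e.g. "m.s-1".
/// Zero exponents are skipped, an exponent of 1 is left implicit, and the
/// all-zero vector renders as "1".
fn render_dimension(dim: [i8; 7]) -> String {
    let parts: Vec<String> = dim
        .iter()
        .zip(BASE_UNITS)
        .filter(|(exp, _)| **exp != 0)
        .map(|(exp, unit)| {
            if *exp == 1 {
                unit.to_string()
            } else {
                format!("{unit}{exp}")
            }
        })
        .collect();

    if parts.is_empty() {
        "1".to_string()
    } else {
        parts.join(".")
    }
}

/// Two vectors describe the same dimension exactly when they are equal.
fn same_dimension(a: [i8; 7], b: [i8; 7]) -> bool {
    a == b
}
```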

Dimension arithmetic

Dimension is a value type you can manipulate directly. The operations mirror what happens when you multiply or divide units: exponents add, invert, and scale.

use ucum::Dimension;

let length = Dimension([1, 0, 0, 0, 0, 0, 0]);
let time   = Dimension([0, 1, 0, 0, 0, 0, 0]);

let velocity = length.mul(time.inv());        // length · time⁻¹
println!("{velocity:?}");                      // => Dimension([1, -1, 0, 0, 0, 0, 0])

let area = length.powi(2);                     // length²
println!("{area:?}");                          // => Dimension([2, 0, 0, 0, 0, 0, 0])

println!("{}", Dimension::DIMENSIONLESS.is_dimensionless()); // => true

All three operations saturate at the i8 bounds (-128 and 127) instead of overflowing, so Dimension arithmetic can never panic, even on pathological input like m120.m120, whose length exponent simply caps at 127.
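Saturating exponent arithmetic can be sketched with the standard library's saturating integer operations. The `Dim` type below is a hypothetical stand-in, not the crate's Dimension; it clamps at i8::MIN and i8::MAX, and the crate's exact clamping behavior may differ.

```rust
/// Hypothetical stand-in for a seven-slot exponent vector.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Dim([i8; 7]);

impl Dim {
    /// Multiplying units adds exponents slot by slot, clamping on overflow.
    fn mul(self, other: Dim) -> Dim {
        Dim(std::array::from_fn(|i| self.0[i].saturating_add(other.0[i])))
    }

    /// Inverting a unit negates every exponent; -128 clamps to 127.
    fn inv(self) -> Dim {
        Dim(self.0.map(|e| e.saturating_neg()))
    }

    /// Raising a unit to a power scales every exponent, clamping on overflow.
    fn powi(self, n: i8) -> Dim {
        Dim(self.0.map(|e| e.saturating_mul(n)))
    }
}
```

With this sketch, multiplying two vectors whose length exponent is 120 yields 127 in that slot rather than wrapping or panicking.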

A couple of UCUM surprises

Two cases catch people out, because UCUM's notion of “dimension” is narrower than physics' notion of “base quantity”:

  • The mole is dimensionless. UCUM treats mol as a pure count ([0,0,0,0,0,0,0]), not a base quantity of its own. So analyze("mol") is flagged is_dimensionless, and a molar concentration mol/L has the same dimension as a plain inverse volume.
  • Plane angle is not dimensionless here. rad occupies its own slot (index 3), so rad is not comparable with the unity 1, and the steradian sr analyzes as rad2. This makes angles first-class but means is_comparable("rad", "1") returns false.

Feature support

| Feature | Status |
| --- | --- |
| Case-sensitive (c/s) grammar | Supported |
| Case-insensitive (c/i) mode | Supported |
| Full atom + prefix coverage (essence 2.2) | Supported |
| Multiplicative conversion | Supported |
| Affine conversion (Cel, [degF], [degRe]) | Supported |
| Logarithmic conversion (B, dB, Np, [pH], …) | Supported |
| Display-name generation | Supported |
| Quantity arithmetic | Supported |
| Arbitrary units ([iU], [arb'U], …) | Parse & analyze; incommensurable by design, so not convertible |
| Special units inside compound terms (Cel/s) | Parse & analyze; convert reports them as unsupported |
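Affine and logarithmic units differ from multiplicative ones in that a single scale factor is not enough. As a minimal sketch of the underlying formulas (these helper functions are hypothetical and use the standard physical definitions; they are not the crate's code), a Fahrenheit reading goes through kelvin on its way to Celsius, and a decibel value is ten times the base-10 logarithm of a power ratio:

```rust
/// Degrees Fahrenheit to kelvin: K = (F + 459.67) * 5 / 9.
fn fahrenheit_to_kelvin(f: f64) -> f64 {
    (f + 459.67) * 5.0 / 9.0
}

/// Kelvin to degrees Celsius: C = K - 273.15.
fn kelvin_to_celsius(k: f64) -> f64 {
    k - 273.15
}

/// Power ratio to decibels: dB = 10 * log10(ratio).
fn ratio_to_decibel(ratio: f64) -> f64 {
    10.0 * ratio.log10()
}
```

Chaining the first two on 98.6 gives approximately 37.0, the body-temperature example from the introduction. The offset is also why special units inside compound terms such as Cel/s need separate treatment: an offset has no obvious meaning once the unit is divided by something else.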

Reliability

Parsing and analysis are total: for any input at all (valid, malformed, or adversarial) they return a result or a precise error, and never hang, panic, or overflow the stack. This is enforced by step- and depth-bounded parsing and checked continuously with property tests (proptest) and a cargo-fuzz target.
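To illustrate what a depth bound buys you, the sketch below rejects input whose parentheses nest deeper than a fixed limit and reports where the problem is, instead of recursing without limit. The `MAX_DEPTH` value, the `NestError` type, and `check_nesting` are all hypothetical; a recursive-descent parser would carry the same counter through its recursive calls, and the crate's actual limits and error types are not shown here.

```rust
/// Hypothetical nesting limit for this example.
const MAX_DEPTH: usize = 64;

#[derive(Debug, PartialEq)]
enum NestError {
    /// Nesting exceeded MAX_DEPTH at byte offset `pos`.
    TooDeep { pos: usize },
    /// A parenthesis at byte offset `pos` has no partner.
    Unbalanced { pos: usize },
}

/// Check that parentheses balance and never nest deeper than MAX_DEPTH.
/// The loop is iterative, so no input can exhaust the call stack.
fn check_nesting(code: &str) -> Result<(), NestError> {
    let mut depth = 0usize;
    for (pos, byte) in code.bytes().enumerate() {
        match byte {
            b'(' => {
                depth += 1;
                if depth > MAX_DEPTH {
                    return Err(NestError::TooDeep { pos });
                }
            }
            b')' => {
                if depth == 0 {
                    return Err(NestError::Unbalanced { pos });
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    if depth == 0 {
        Ok(())
    } else {
        // Ran out of input with a group still open.
        Err(NestError::Unbalanced { pos: code.len() })
    }
}
```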

Errors are descriptive and carry a byte offset for parse problems, so you can point users straight at the issue:

use ucum::validate;

// Errors are typed; match on the variant, or just print them.
println!("{:?}", validate("m/"));
// => Err(Parse { pos: 2, msg: "unexpected end of input, expected a unit" })
println!("{:?}", validate("flurble"));
// => Err(UnknownAtom { code: "flurble" })
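A byte offset is most useful when turned into something a user can see. As a sketch, the hypothetical `point_at` helper below takes the source string, an offset, and a message (however you extract them from the error) and renders a caret under the offending position. It is a presentation helper for this example and is not provided by the crate.

```rust
/// Render a two-line diagnostic with a caret under byte offset `pos`.
/// Hypothetical helper; `pos` and `msg` would come from a parse error.
fn point_at(src: &str, pos: usize, msg: &str) -> String {
    // Clamp so an end-of-input offset still renders just past the last char.
    let pos = pos.min(src.len());
    // Count characters before `pos` so the caret lines up for non-ASCII input.
    let column = src.char_indices().take_while(|(i, _)| *i < pos).count();
    format!("{src}\n{}^ {msg}", " ".repeat(column))
}
```

For the input m/ with offset 2, this prints the code on one line and, beneath it, a caret two columns in followed by the message.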

Conformance

ucum-units runs against the official UCUM functional test suite and passes all 573 cases: validation, conversion, display-name generation, and quantity multiplication/division. The test suite is vendored and run as part of cargo test.

Benchmarks

A Criterion benchmark suite covers the hot paths (parsing, validation, analysis, conversion, and display-name generation):

cargo bench

As a rough guide on a modern desktop, parsing a simple unit takes tens of nanoseconds and a full conversion a few hundred; the unit tables are built once, lazily, and shared immutably thereafter.
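The "built once, lazily, and shared immutably" pattern can be sketched with `std::sync::OnceLock` from the standard library. The `atom_table` function and its three entries are hypothetical and stand in for the real tables, which this README says are generated from the UCUM essence file; the crate may use a different mechanism.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical lookup table: atom code -> factor to the base unit (meters).
/// Built on first use, then shared read-only by every caller and thread.
fn atom_table() -> &'static HashMap<&'static str, f64> {
    static TABLE: OnceLock<HashMap<&'static str, f64>> = OnceLock::new();
    TABLE.get_or_init(|| {
        // Runs at most once, even if several threads race to get here.
        HashMap::from([("m", 1.0), ("[ft_i]", 0.3048), ("[in_i]", 0.0254)])
    })
}
```

Every call after the first is a cheap pointer read, which is what keeps repeated lookups inexpensive.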

Minimum supported Rust version

Rust 1.89 (edition 2024).

License and attribution

The crate's own source code is under the MIT license (see the LICENSE file).

This crate bundles two data files, each under its own license:

  • vendor/ucum-essence.xml: the machine-readable UCUM definitions, © Regenstrief Institute, Inc. and the UCUM Organization, redistributed verbatim under the UCUM Copyright Notice and License v1.1 (see vendor/UCUM-LICENSE.md). It is unmodified, and is parsed at build time to generate the unit and prefix tables.
  • vendor/UcumFunctionalTests.xml: the conformance test suite, © Grahame Grieve and contributors, under the Eclipse Public License 1.0.

UCUM is a standard of the Regenstrief Institute and the UCUM Organization (https://ucum.org). This project is independent and is not affiliated with or endorsed by them. With thanks to the UCUM maintainers for the specification and the open data that make this crate possible.