fhir-fhirpath
FHIRPath compiler + VM with support for the HL7 R5 FHIRPath test suite. The crate is designed to:
- Parse and compile FHIRPath expressions into a compact bytecode
Plan - Evaluate a
Planagainst JSON-backed FHIR resources - Support both lenient evaluation (pragmatic for JSON) and strict semantics (spec-style validation)
- Provide pipeline visualization (AST / HIR / VM Plan) for debugging and development
This repository also contains a CLI wrapper in crates/cli that exercises the engine.
Quick Start (API)
use ;
use json;
let engine = with_fhir_version?;
let resource = json!;
let root = from_json;
let ctx = new;
// Rooted expression
let result = engine.evaluate_expr?;
// Unrooted expression with type-aware compilation
let result = engine.evaluate_expr?;
# Ok::
CLI
From the workspace root:
# evaluate (resource file)
# evaluate (stdin)
|
# strict semantics (invalid paths error instead of returning empty)
Architecture Overview
The engine implements a classic compiler pipeline:
- Lexer (
src/lexer.rs) → token stream - Parser (
src/parser.rs) → AST (src/ast.rs) - Semantic analysis (
src/analyzer.rs) → HIR (src/hir.rs) - Type resolution pass (
src/typecheck.rs) → typed HIR - Codegen (
src/codegen.rs) → bytecodePlan(src/vm.rs) - VM execution (
src/vm.rs,src/vm/operations.rs,src/vm/functions/*) →Collection
The top-level orchestration lives in src/engine.rs.
Engine and Caching
Engine (src/engine.rs) owns the “global” registries and caches:
TypeRegistry(src/types.rs): System type IDs + helpers used throughout compilation.FunctionRegistry(src/functions.rs): compile-time PHF map from function name →FunctionId+ signature metadata.VariableRegistry(src/variables.rs): assigns numeric IDs to external variables (e.g.%resource).FhirContext(fhir-contextcrate): StructureDefinition access for type inference and validation.LruCache<String, Arc<Plan>>: caches compiled bytecode, keyed by(base_type, expr)when a base type is provided.
Key property: compilation can be expensive; evaluation is intended to be cheap, so Engine::compile() + Engine::evaluate() is the “hot path” for repeated evaluation.
Data Model
Value and Collection
The runtime value model is in src/value.rs:
Valueis anArc<ValueData>for cheap cloning.Collectionis an optimized container for the “mostly singleton” nature of FHIRPath:- small collections use an inline
SmallVec - larger ones switch to an
Arc-backed representation for O(1) clones
- small collections use an inline
ValueData variants cover:
- primitives:
Boolean,Integer,Decimal,String - temporals:
Date,DateTime,Time(each with explicit precision) Quantity(value + unit)Object(JSON object asHashMap<field, Collection>)Empty(FHIRPath empty collection marker; the VM treats{}as an empty collection, not a singletonEmptyvalue)
Temporal Precision and Timezones
Temporals model two distinct aspects:
- Precision (
DatePrecision,DateTimePrecision,TimePrecision) - Timezone presence for
DateTime:timezone_offset: None→ no timezone was specified (unknown/local)Some(0)→ explicitZSome(n)→ fixed offset seconds east of UTC (e.g.+08:00is28800)
This matters for both equality/ordering (comparability rules) and boundary functions.
Context and Variables
Context (src/context.rs) contains:
$this(current focus),$index(iteration index)resource/root(the original evaluation input)variablesfor environment values (e.g.%resource,%context,%rootResource(and legacy%root),%profile,%sct,%loinc, …)strictflag for strict semantic validation
The VM also tracks $total for aggregate() and threads it through nested evaluation contexts.
The Compiler Pipeline in Detail
Lexer
The lexer (src/lexer.rs) produces tokens for:
- identifiers, string/number literals
- operators (
=,!=,<,<=,|,in,contains, …) - temporal literals (e.g.
@2014-01,@2014-01-01T08,@T10:30:00.000) - special invocations (
$this,$index,$total,%resource, …)
Parser (AST)
The parser (src/parser.rs) builds an AST (src/ast.rs) using precedence rules that match the FHIRPath spec.
Notable parsing details:
is/asbind at the correct precedence relative to equality and union.- Temporal literals preserve precision and timezone presence.
- DateTime literals with hour-only forms are normalized to minute precision (
T08→T08:00) to match the FHIR constraints used by the HL7 suite.
Analyzer (AST → HIR)
The analyzer (src/analyzer.rs) is the semantic lowering step:
- Assigns expression types (
ExprType) and cardinality ranges where possible. - Resolves function names to
FunctionIdusingFunctionRegistry. - Rewrites syntactic sugar:
- collection literals
{ a, b, c }become chained unions (a | b | c)
- collection literals
- Produces a compact HIR (
src/hir.rs) better suited for codegen and execution.
HIR nodes include:
- literals, variables, unary/binary ops
- paths (
base + [segments]) where a segment is a field, choice segment, or index - function calls and method calls
- higher-order forms like
where,select,repeat,aggregate,exists(predicate), andall(predicate)
TypePass (HIR Type Resolution)
After analysis, a second pass (src/typecheck.rs) uses FhirContext to:
- Resolve FHIR element types along paths
- Validate path segments when a base type is provided
- Improve output types for downstream operations
This pass is where “compile-time strictness” comes from: passing a base type (e.g. Some("Patient")) enables StructureDefinition-backed validation.
Code Generation (HIR → Plan)
CodeGenerator (src/codegen.rs) emits a Plan (src/vm.rs):
opcodes: the bytecode instruction stream- constant pool: literal
Values - segment pool: field names, stored as interned
Arc<str> - type specifier pool: for
is/asoperations - function ID table and nested
subplansfor closures
Important design choices:
- Many higher-order functions compile to dedicated opcodes (
Where,Select,Repeat,Aggregate,Exists,All,Iif) so the VM can evaluate subplans lazily and with correct scoping for$this/$index/$total. - Standalone function calls compile with an implicit base of
$this(rather than “empty”) to match FHIRPath behavior.
VM Execution Model
The VM (src/vm.rs) is a stack machine:
- Each opcode consumes/produces
Collections. - Binary operators delegate to
src/vm/operations.rs. - Functions dispatch through
src/vm/functions.rs(split intosrc/vm/functions/*modules).
Opcode Model (Mental Reference)
The VM stack holds Collections (not single Values). Most opcodes follow “push result collection”.
Core opcodes:
PushConst: push a literal from the constant pool.ValueData::Emptyis treated specially and becomes an empty collection.LoadThis,LoadIndex,LoadTotal: load$this,$index,$total.Navigate(segment_id): field navigation (dot access).Index(i): positional indexing ([i]).CallBinary(impl_id): run a binary operator implementation (e.g.Eq,Union,DivInt, …).CallFunction(func_id, argc): call a normal function with an explicit base (for method calls the base is the receiver collection).
Higher-order opcodes execute nested Plans (“subplans”) with a derived Context:
Where(subplan_id): for each item in input, run predicate subplan with$this=item,$index=index, and retain matches.Select(subplan_id): map each item via projection subplan and flatten.Repeat(subplan_id): iterative projection with cycle detection.Aggregate(subplan_id, init_subplan_id?): fold left with$totalset and correct$indexpropagation.Exists(subplan_id?):exists()optionally with predicate.All(subplan_id):all(predicate)with spec behavior (all({}) == true).Iif(true_plan, false_plan, else_plan?): lazy branch evaluation. Important: branch VMs preserve$indexand$total.
If you’re debugging a behavioral difference, it’s often easiest to visualize the VM plan and then reason about stack effects.
Navigation (Navigate)
Navigate is responsible for “dot access” across collections. Key behaviors:
- Navigating a field over a collection flattens results.
- Missing fields return empty.
- In strict semantic mode (
Context.strict == true), unknown fields error when the base collection was non-empty.
Choice Types
FHIR choice elements are represented as [x] in StructureDefinitions but appear as expanded properties in JSON (e.g. valueQuantity, valueString, …).
This implementation supports two modes:
- Strict mode: direct access to choice expansions like
valueQuantityis rejected (encourages spec expressions likevalue.ofType(Quantity)). - Lenient mode: the VM can still find JSON choice expansions at runtime when compile-time validation is not enforced.
This is why CLI behavior depends on whether it compiles with a base type and/or strict semantics.
Operators and Semantics
src/vm/operations.rs implements the core operator semantics:
- arithmetic (
+,-,*,/,div,mod) - equality and equivalence (
=,!=,~,!~) - ordering (
<,<=,>,>=) - boolean logic and collection membership (
in,contains) - set-like behavior (
|union deduplicates)
Some semantics are subtle:
- Many operators are defined on singleton inputs; otherwise they may return empty (or error) per spec rules.
- Temporal comparability depends on precision and timezone presence.
- Equivalence (
~) includes special handling for strings (case/whitespace normalization), quantities (unit conversion + least-precise rounding), and complex types.
Type Operations: is, as, ofType
Type operations are implemented across:
- compile-time parsing/type-op nodes (
src/parser.rs,src/hir.rs) - runtime checks (
src/vm.rsforTypeIs/TypeAs, andsrc/vm/functions/type_helpers.rs)
Important behavior:
- Unqualified names can refer to System types (e.g.
Boolean,Integer) or FHIR types depending on context. is(T)supports inheritance checks for FHIR types (e.g.Age is Quantity).as(T)andofType(T)are implemented as exact type filters/casts (no inheritance), which matches HL7 suite expectations for subtype-vs-supertype edge cases.- Type specifiers are validated:
- unknown unqualified names like
string1produce execution errors - unknown
System.*types do not necessarily error (some HL7 cases expect non-matching rather than failure)
- unknown unqualified names like
Temporal Strings in FHIR JSON
FHIR JSON represents dates/dateTimes/times as strings. This engine keeps JSON strings as ValueData::String by default (no eager parsing).
To still support correct temporal comparisons in common cases (e.g. Period.start <= Period.end), src/temporal_parse.rs provides:
parse_temporal_pair()which tries to interpret two strings as date/dateTime/time values- lenient dateTime parsing with timezone handling
The comparison/equality operators consult this helper when comparing String vs String.
Function System
Registry
src/functions.rs provides a compile-time PHF map from function name → ID and metadata:
min_args,max_args- a conservative
return_type(TypeId::Unknownfor polymorphic functions)
Dispatch
At runtime, src/vm/functions.rs dispatches by numeric FunctionId into modules:
vm/functions/existence.rs:empty,exists,all,allTrue,subsetOf, …vm/functions/filtering.rs:ofType,extension, …vm/functions/conversion.rs:toInteger,toDecimal,toString, …vm/functions/string.rs,math.rs,utility.rs, …
Higher-order functions that require closures often compile to opcodes rather than “normal” functions, to preserve correct scope and laziness.
resolve() and Reference Handling
Engine can be constructed with a custom ResourceResolver (src/resolver.rs), which is used by the resolve() function. This allows integration with:
- database-backed reference resolution
- bundle-contained resources
- custom lookup strategies
If no resolver is configured, resolve() returns empty (or errors, depending on strictness and function behavior).
Feature Flags
From crates/fhir-fhirpath/Cargo.toml:
- Default features include
regex,base64,hex,html-escape(enables parts of the string/function surface). xml-supportadds XML evaluation helpers via the optionalfhir-formatdependency.
Debugging and Visualization
The crate supports visualizing compiler IR stages via src/visualize.rs:
- AST / HIR / Plan renderers
- output formats: ASCII tree, Mermaid, DOT
The CLI subcommand visualize wraps this capability.
Testing
This repo includes the HL7 R5 FHIRPath suite runner:
- Tests:
crates/fhir-fhirpath/tests/hl7/test_hl7_suite.rs - Source suite:
fhir-test-cases/r5/fhirpath/tests-fhir-r5.xml
Run the full suite:
Run a specific group:
HL7_TEST_GROUP=testPlus
Strengths
- End-to-end pipeline: full compiler + bytecode VM, not an AST interpreter.
- Fast repeated evaluation:
Plancaching and a compact opcode set. - Good debuggability: IR visualization (AST/HIR/Plan).
- Practical JSON support: runtime navigation works naturally over JSON objects/arrays.
- Temporal correctness: explicit precision and timezone presence enable spec-style comparability and boundary operations.
- HIR/VM specializations: dedicated opcodes for higher-order functions preserve correct
$this/$index/$totalscoping and laziness.
Weaknesses / Trade-offs
- Strict vs lenient split is nuanced: compile-time base type validation and runtime strict semantics interact with JSON choice expansions; users must choose the right mode.
- Not all FHIRPath features are implemented: some spec functions and terminology-dependent operations may be missing or stubbed (terminology services are out of scope).
- FHIR JSON temporals are strings: temporal semantics for strings rely on heuristic parsing (
src/temporal_parse.rs), which is pragmatic but not a full “typed JSON model”. - Type inference is best-effort: the engine depends on
FhirContextavailability; with incomplete contexts, typing falls back to unknowns. - Complex type equivalence is hard: deep equivalence across arbitrary FHIR datatypes can be expensive and subtle; current rules aim to match HL7 suite expectations but may need refinement for edge cases.
Contributing Notes
- Start with
src/engine.rsto follow the pipeline. - Add/adjust functions by extending:
src/functions.rs(registry metadata)src/vm/functions.rsand appropriate module undersrc/vm/functions/
- Use
visualizeoutput early when debugging parser precedence or HIR lowering.