shape: a portable static type system for JSON-compatible data
This library implements a Rust-based type system that can represent any kind of JSON data, offering type-theoretic operations like simplification, acceptance testing and validation, child shape selection, union and intersection shapes, delayed shape binding, namespaces, automatic name propagation, copy-on-write semantics, error handling, and more.
The term "shape" is used here as a very close synonym to "type" but with a focus on structured JSON value types rather than abstract, object-oriented, functional or higher-order types. The core idea is that both shape and type denote an abstract (possibly infinite) set of possible values that satisfy a given shape/type, and we can ask interesting questions about the relationships between these sets of values. For example, when we simplify a shape, we are "asking" whether another smaller representation of the shape exists that denotes the same set of possible values.
[!CAUTION] This library is still in early-stage development, so you should not expect its API to be fully stable until the 1.0.0 release.
Installation
This crate provides a library, so installation means adding it as a dependency to your Cargo.toml file.
Documentation
See the cargo doc-generated documentation for detailed information about the Shape struct and ShapeCase enum.
The Shape struct
pub type Ref<T> = Arc<T>;
To support recombinations of shapes and their subshapes, the top-level Shape struct wraps a reference-counted ShapeCase enum variant. Reference counting not only simplifies sharing subtrees among different Shape structures, but also prevents rustc from complaining about the Shape struct referring to itself without indirection.
The meta field stores metadata about the shape, such as source code locations where the shape was defined or derived from. This metadata is preserved when shapes are combined or transformed.
The Shape struct implements Hash and PartialEq based on its case field, ignoring the metadata.
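As a rough illustration of equality and hashing that look only at the structural case, here is a minimal standalone sketch. It does not use this crate: Case, MiniShape, mini and distinct_count are hypothetical stand-ins, and the real Shape and ShapeCase types are richer.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Hypothetical stand-in for ShapeCase (only two variants for brevity).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Case {
    Int,
    String,
}

// Hypothetical stand-in for Shape: a structural case plus metadata.
#[derive(Debug, Clone)]
struct MiniShape {
    case: Arc<Case>,
    meta: Arc<Vec<&'static str>>, // e.g. source locations
}

impl PartialEq for MiniShape {
    fn eq(&self, other: &Self) -> bool {
        // Metadata is deliberately left out of the comparison.
        self.case == other.case
    }
}

impl Eq for MiniShape {}

impl Hash for MiniShape {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only what `eq` compares, so equal values hash equally.
        self.case.hash(state);
    }
}

fn mini(case: Case, meta: &[&'static str]) -> MiniShape {
    MiniShape { case: Arc::new(case), meta: Arc::new(meta.to_vec()) }
}

// Counts structurally distinct shapes by collecting them into a HashSet.
fn distinct_count(shapes: Vec<MiniShape>) -> usize {
    shapes.into_iter().collect::<HashSet<_>>().len()
}
```

Two MiniShape values with the same case but different meta compare equal, so a plain HashSet keeps only one of them and drops the other's metadata. That is the situation the MergeSet section later addresses.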
Copy-on-write semantics
The shape library leverages Arc (Atomically Reference Counted) pointers to provide efficient copy-on-write semantics for Shape instances. Since shapes are immutable after creation, multiple Shape instances can safely share references to the same underlying ShapeCase data without copying.
pub type Ref<T> = Arc<T>;
When a shape needs to be modified (such as during name propagation or metadata updates), the library uses Arc::make_mut to perform copy-on-write operations. This ensures that:
- Shapes are only copied when actually modified, not when simply accessed
- Multiple shapes can share common subtrees efficiently
- Thread safety is maintained through Arc's atomic reference counting
- Memory usage is minimized through structural sharing
This copy-on-write approach reduces memory usage when working with large shape hierarchies or when creating many similar shapes that share common substructures.
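The behavior of Arc::make_mut can be seen in isolation with the standard library alone. In this small sketch, Meta and add_name are hypothetical names chosen for illustration and are not part of this crate.

```rust
use std::sync::Arc;

// Hypothetical metadata payload.
#[derive(Debug, Clone, PartialEq)]
struct Meta {
    names: Vec<String>,
}

// Adds a name, cloning the inner Meta only if another Arc still shares it.
fn add_name(meta: &mut Arc<Meta>, name: &str) {
    Arc::make_mut(meta).names.push(name.to_string());
}
```

If the Arc is shared, make_mut clones the inner value and points the caller's Arc at the clone, leaving other holders untouched. If the Arc is uniquely owned, the value is mutated in place with no copy.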
Obtaining a Shape
Instead of tracking whether a given ShapeCase has been simplified or not, we can simply mandate that Shape always wraps simplified shapes.
This invariant is enforced by restricting how Shape instances can be (publicly) created: all Shape instances must come from calling the Shape::new method with a simplified ShapeCase and associated metadata.
The impl IntoIterator<Item = ...> parameters are intended to allow maximum flexibility of iterator argument passing style, including [shape1, shape2, ...], vector slices, etc.
Shape validation and acceptance testing
The library provides two primary methods for checking shape compatibility:
- shape.validate(&other) -> Option<ShapeMismatch> - Returns detailed error information when shapes don't match, including a hierarchy of causes explaining exactly where validation failed.
- shape.accepts(&other) -> bool - A convenience method that returns true if validation succeeds (equivalent to shape.validate(&other).is_none()).
# use ;
let expected = string;
let received = int;
// Quick boolean check
assert!;
// Detailed validation information
let mismatch = expected.validate;
assert_eq!;
For example, a Shape::one union shape accepts any member shape of the union:
let int_string_union = one;
assert!;
assert!;
assert!;
// Using validate for more details
let mismatch = int_string_union.validate;
assert!; // Float doesn't match Int or String
Error satisfaction
A ShapeCase::Error variant generally represents a failure of shape processing, but it can also optionally report Some(partial) shape information in cases when there is a likely best guess at what the shape should be.
An Error with Some(partial) behaves as much like that partial shape as possible - it accepts the same shapes, supports the same field/item access, etc. However, unlike regular shapes, errors are never deduplicated during simplification, ensuring each error's diagnostic message is preserved.
This partial: Option<Shape> field allows errors to provide guidance (potentially with chains of multiple errors) without interfering with the accepts logic.
let error = error_with_partial;
assert!;
assert!;
assert!;
assert!;
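To make the "behaves like its partial shape" idea concrete, here is a standalone toy model that does not use this crate. ErrCase and accepts are hypothetical. Treating an error without a partial shape as accepting nothing and satisfying nothing is an assumption of this sketch, not a statement about the library.

```rust
// Hypothetical miniature of a shape enum with an error variant.
#[derive(Debug, Clone, PartialEq)]
enum ErrCase {
    Int,
    Float,
    String,
    One(Vec<ErrCase>),
    Error { message: String, partial: Option<Box<ErrCase>> },
}

// Returns true when `received` satisfies `expected` in this toy model.
fn accepts(expected: &ErrCase, received: &ErrCase) -> bool {
    match (expected, received) {
        // A received error is judged by its partial shape, following chains.
        (_, ErrCase::Error { partial: Some(p), .. }) => accepts(expected, p),
        // An expected error accepts whatever its partial shape accepts.
        (ErrCase::Error { partial: Some(p), .. }, _) => accepts(p, received),
        // Assumption: an error with no partial shape matches nothing.
        (ErrCase::Error { partial: None, .. }, _)
        | (_, ErrCase::Error { partial: None, .. }) => false,
        // A union accepts anything that one of its members accepts.
        (ErrCase::One(members), _) => members.iter().any(|m| accepts(m, received)),
        // Everything else must match exactly.
        _ => expected == received,
    }
}
```

Because the error arms recurse into the partial shape, an error whose partial is itself an error resolves through the whole chain until a non-error shape decides the outcome.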
The null singleton and the None shape
ShapeCase::Null represents the singleton null value found in JSON. It accepts only itself and no other shapes, except unions that allow null as a member, or errors that wrap null as a partial shape.
ShapeCase::None represents the absence of a value, and is often used to represent optional values. Like null, None is satisfied by (accepts) only itself and no other shapes (except unions that include None as a member, or errors that wrap None as a partial shape for some reason).
When either null or None participate in a Shape::one union shape, they are usually preserved (other than being deduplicated) because they represent distinct possibilities. However, ::Null and ::None do have a noteworthy difference of behavior when simplifying ::All intersection shapes.
When null participates in a ShapeCase::All intersection shape, it "poisons" the intersection and causes the whole thing to simplify to null. This allows a single intersection member shape to override the whole intersection, which is useful for reporting certain kinds of error conditions (especially in GraphQL).
By contrast, None does not poison intersections, but is simply ignored. This makes sense if you think of Shape::all intersections as merging their member shapes: when you merge None with another shape, you get the other shape back, because None imposes no additional expectations.
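The two intersection rules described above can be sketched as a small standalone function. AllCase and simplify_all are hypothetical and do not reflect the crate's real simplification code. Returning None for an intersection with no remaining members is an assumption of this sketch.

```rust
// Hypothetical miniature of a shape enum with intersections.
#[derive(Debug, Clone, PartialEq)]
enum AllCase {
    Null,
    None,
    Int,
    String,
    All(Vec<AllCase>),
}

// Simplifies the members of an intersection in this toy model.
fn simplify_all(members: Vec<AllCase>) -> AllCase {
    // Rule 1: null poisons the whole intersection.
    if members.contains(&AllCase::Null) {
        return AllCase::Null;
    }
    // Rule 2: None imposes no expectations, so it is dropped.
    // Structurally equal members are also deduplicated.
    let mut kept: Vec<AllCase> = Vec::new();
    for member in members {
        if member != AllCase::None && !kept.contains(&member) {
            kept.push(member);
        }
    }
    match kept.len() {
        0 => AllCase::None, // assumption: nothing left means None
        1 => kept.remove(0),
        _ => AllCase::All(kept),
    }
}
```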
Namespaces and named shapes
The shape library provides a namespace system for managing collections of named shapes and enabling recursive type definitions. Namespaces solve two important problems: they allow shapes to reference themselves (creating recursive types), and they provide automatic name propagation throughout shape hierarchies.
Basic namespace usage
A Namespace is a collection of named Shapes that supports a two-stage lifecycle:
# use ;
// Create an unfinalized namespace
let mut namespace = new;
// Insert a shape with a name - this triggers automatic name propagation
let id_shape = one;
let named_shape = namespace.insert;
// The shape now has names throughout its hierarchy
assert_eq!;
assert_eq!;
// Finalize the namespace to resolve all name references
let final_namespace = namespace.finalize;
Automatic name propagation
When a shape is inserted into a namespace, the library automatically propagates derived child names to all nested shapes within the hierarchy. This means that every field, array element, and nested structure receives contextual naming based on its position:
# use ;
let user_shape = object;
let mut namespace = new;
let named_user = namespace.insert;
// Names are automatically propagated to nested shapes:
// - User.name for the name field
// - User.age for the age field
// - User.contacts for the contacts array
// - User.contacts.* for elements in the contacts array
assert_eq!;
Names can only be assigned to Shapes through Namespace insertion.
Recursive type definitions
Namespaces enable recursive type definitions through ShapeCase::Name references. A shape can reference other shapes in the namespace by name, including itself:
// Define a recursive JSON type that can contain arrays and objects of itself
let json_shape = one;
let mut namespace = new;
namespace.insert;
let final_namespace = namespace.finalize;
let json_type = final_namespace.get.unwrap;
// The resolved type can now handle recursive structures
assert!;
assert!;
assert!;
Namespace finalization
Namespaces have two distinct phases:
- Namespace<NotFinal>: Allows mutations like insert() and extend()
- Namespace<Final>: Read-only with all ShapeCase::Name references resolved
The finalization process:
- Ensures exclusive ownership of all shape references for thread safety
- Resolves ShapeCase::Name references by updating their WeakScope to provide weak references to all named shapes
- Allows recursive type self-references that expand incrementally/lazily to avoid infinite cycles
- Prevents Arc memory leaks by using only weak references (WeakScope) from ShapeCase::Name to the target Shape defined in some Namespace<Final>
let mut namespace = new;
namespace.insert;
namespace.extend; // Can merge with other namespaces
let final_namespace = namespace.finalize; // No more mutations allowed
let my_type = final_namespace.get; // Resolved shape available
This namespace system, combined with automatic name propagation and copy-on-write semantics, enables complex type modeling scenarios while maintaining performance and thread safety.
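The two-phase lifecycle and the use of weak references can be illustrated with a standalone typestate sketch built on the standard library. MiniNamespace and its methods are hypothetical, shapes are modeled as plain strings, and nothing here reflects the crate's actual implementation.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;
use std::sync::{Arc, Weak};

// Marker types for the two phases (hypothetical stand-ins).
struct NotFinal;
struct Final;

// A toy namespace whose shapes are just strings.
struct MiniNamespace<State> {
    shapes: HashMap<String, Arc<String>>,
    _state: PhantomData<State>,
}

impl MiniNamespace<NotFinal> {
    fn new() -> Self {
        MiniNamespace { shapes: HashMap::new(), _state: PhantomData }
    }

    // Mutation is only available before finalization.
    fn insert(&mut self, name: &str, shape: &str) {
        self.shapes.insert(name.to_string(), Arc::new(shape.to_string()));
    }

    // Consumes the mutable namespace and returns the read-only one.
    fn finalize(self) -> MiniNamespace<Final> {
        MiniNamespace { shapes: self.shapes, _state: PhantomData }
    }
}

impl MiniNamespace<Final> {
    fn get(&self, name: &str) -> Option<Arc<String>> {
        self.shapes.get(name).cloned()
    }

    // A by-name reference holds only a Weak pointer, so it cannot keep
    // its target alive or take part in a strong reference cycle.
    fn weak_ref(&self, name: &str) -> Option<Weak<String>> {
        self.shapes.get(name).map(|shape| Arc::downgrade(shape))
    }
}
```

Because insert exists only on MiniNamespace<NotFinal> and finalize takes self by value, mutating a finalized namespace is a compile-time error rather than a runtime check. Once the finalized namespace is dropped, any outstanding weak reference simply fails to upgrade.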
Metadata management and MergeSet
The shape library implements a metadata management system that tracks provenance information (source locations and names) separately from the logical structure of shapes (as defined by the ShapeCase enum). This separation allows shapes to be compared and hashed based purely on their structural content, while preserving all metadata for debugging, error reporting, and tooling purposes.
The challenge: preserving metadata during simplification
One risk of using only logical structure for equality/hashing and excluding metadata occurs when storing such data structures in a set or map. Structurally equivalent shapes are generally deduplicated by these data structures, typically clobbering/discarding metadata from one side of the collision.
Since Shape deduplication is a common and important simplification step, when shapes are combined or simplified, the library uses a data structure called a MergeSet to ensure metadata is merged from all contributing sources. For example, the null shape may end up in a ShapeCase::One union from multiple different sources, carrying different names, locations, and other metadata:
# use ;
let mut ns = new;
// Insert shapes with different names - this gives them name metadata
let nullable_id = ns.insert;
let nullable_str = ns.insert;
let optional = ns.insert;
// Each union contains a null with different name metadata
assert_eq!;
assert_eq!;
assert_eq!;
// Now create a union of these already-named shapes
let combined = one;
// The union deduplicates structurally identical shapes.
// All three nulls merge into one, but it retains all the names via MergeSet.
assert_eq!;
// With names shown, the output reveals how the shapes were deduplicated:
// The null has accumulated all three names via MergeSet
assert_eq!;
The MetaMergeable trait
The MetaMergeable trait enables efficient in-place merging of metadata between structurally identical shapes. When duplicate shapes are detected during operations like union or intersection formation, their metadata can be combined without duplicating the structural information.
Both Shape and Name implement MetaMergeable, allowing them to:
- Combine location sets from multiple sources
- Merge name information across shape transformations
- Preserve all provenance data during simplification
- Perform copy-on-write updates only when metadata actually changes
MergeSet: deduplication with metadata preservation
MergeSet<T> is a specialized collection that combines the deduplication properties of IndexSet with automatic metadata merging. When a duplicate item is inserted, instead of being ignored, its metadata is merged with the existing item:
# use Shape;
# use Location;
let loc1 = new;
let loc2 = new;
let loc3 = new;
// Create shapes with different source locations but identical structure
let shapes = ;
let union = one;
// The resulting union contains two distinct shapes (String and Int),
// but the String shape now has location information from both loc1 and loc3
assert_eq!;
This system is used internally by:
- Shape::one() - Unions that deduplicate member shapes while preserving all metadata
- Shape::all() - Intersections that merge compatible shapes and their provenance
- Namespace operations - When inserting named Shapes into a Namespace<NotFinal>, multiple shapes added with the same name are merged using Shape::all, which often leads to merging the metadata of ShapeCase-equivalent shapes in the resulting intersection
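The merge-on-duplicate behavior described in this section can be approximated with a standalone sketch using only the standard library. MetaMerge, MiniMergeSet and Tagged are hypothetical names. The real MetaMergeable trait and MergeSet type may differ in both signature and implementation.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical trait: a structural key plus a way to absorb metadata.
trait MetaMerge {
    type Key: Hash + Eq;
    fn key(&self) -> Self::Key;
    // Returns true if merging changed this item's metadata.
    fn merge_meta_from(&mut self, other: &Self) -> bool;
}

// Insertion-ordered set that merges metadata instead of dropping duplicates.
struct MiniMergeSet<T: MetaMerge> {
    items: Vec<T>,
    index: HashMap<T::Key, usize>, // structural key -> position in `items`
}

impl<T: MetaMerge> MiniMergeSet<T> {
    fn new() -> Self {
        Self { items: Vec::new(), index: HashMap::new() }
    }

    fn insert(&mut self, item: T) {
        let existing = self.index.get(&item.key()).copied();
        match existing {
            // Duplicate structure: keep the first item, absorb the metadata.
            Some(pos) => {
                self.items[pos].merge_meta_from(&item);
            }
            // New structure: record its position and store it.
            None => {
                self.index.insert(item.key(), self.items.len());
                self.items.push(item);
            }
        }
    }
}

// Hypothetical item: a structural name plus source locations.
#[derive(Debug, Clone)]
struct Tagged {
    case: &'static str,
    locations: Vec<&'static str>,
}

impl MetaMerge for Tagged {
    type Key = &'static str;

    fn key(&self) -> Self::Key {
        self.case
    }

    fn merge_meta_from(&mut self, other: &Self) -> bool {
        let mut changed = false;
        for loc in &other.locations {
            if !self.locations.contains(loc) {
                self.locations.push(*loc);
                changed = true;
            }
        }
        changed
    }
}
```

Returning a bool from the merge step is one way a caller could skip copy-on-write work when nothing actually changed, which matches the last bullet in the MetaMergeable list above.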
The ShapeVisitor trait
The ShapeVisitor trait provides a visitor pattern for traversing and analyzing Shape structures. It enables custom logic for processing different ShapeCase variants while maintaining full control over traversal decisions, especially around recursive name resolution.
Visitor methods and default fallback
The trait defines visit methods for each ShapeCase variant, all with default implementations that delegate to a default() method.
Each visit method receives both the complete Shape (with metadata) and the inner data specific to that variant. This design allows visitors to access both structural information and provenance data during traversal.
Critical design decision: name resolution
An important design choice of ShapeVisitor lies in how it handles ShapeCase::Name variants. The visitor does NOT automatically resolve name bindings, even when a WeakScope is available. This design prevents infinite loops when traversing recursive type definitions:
// In the visitor implementation:
Name => visitor.visit_name,
// Notice: no automatic call to weak.upgrade(name) here!
The comment in the source code explains:
// It's tempting to perform that resolution here unconditionally,
// but then the visitor could fall into an infinitely deep cycle, so
// we have to let the implementation of visit_name decide whether to
// proceed further.
This decision gives visit_name implementations complete control over whether and how to resolve named references, enabling cycle-aware traversal strategies.
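A cycle-aware traversal of this kind can be sketched without the crate. In the following standalone example, Node, NodeVisitor, visit and NameCollector are all hypothetical, and a plain HashMap stands in for the scope of named shapes. The dispatcher never resolves names itself. The visitor decides, and it uses a visited set to stop at names it has already followed.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical miniature shape tree with by-name references.
#[derive(Debug, Clone)]
enum Node {
    Int,
    Array(Box<Node>),
    Name(String),
}

// Every visit method falls back to `default` unless overridden.
trait NodeVisitor {
    type Output;
    fn default(&mut self, node: &Node) -> Self::Output;
    fn visit_int(&mut self, node: &Node) -> Self::Output {
        self.default(node)
    }
    fn visit_array(&mut self, node: &Node, _item: &Node) -> Self::Output {
        self.default(node)
    }
    fn visit_name(&mut self, node: &Node, _name: &str) -> Self::Output {
        self.default(node)
    }
}

// Dispatches on the variant; names are handed over unresolved.
fn visit<V: NodeVisitor>(node: &Node, visitor: &mut V) -> V::Output {
    match node {
        Node::Int => visitor.visit_int(node),
        Node::Array(item) => visitor.visit_array(node, item),
        Node::Name(name) => visitor.visit_name(node, name),
    }
}

// Follows each name at most once and records names missing from the scope.
struct NameCollector<'a> {
    scope: &'a HashMap<String, Node>,
    seen: HashSet<String>,
    unbound: Vec<String>,
}

impl<'a> NodeVisitor for NameCollector<'a> {
    type Output = ();

    fn default(&mut self, _node: &Node) {}

    fn visit_array(&mut self, _node: &Node, item: &Node) {
        visit(item, self)
    }

    fn visit_name(&mut self, _node: &Node, name: &str) {
        // Already followed this name: stop here to break the cycle.
        if !self.seen.insert(name.to_string()) {
            return;
        }
        let scope = self.scope;
        match scope.get(name) {
            Some(target) => visit(target, self),
            None => self.unbound.push(name.to_string()),
        }
    }
}
```

With a scope in which a name refers to an array of itself, the traversal terminates after following the name once, and any name absent from the scope ends up in unbound.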
Example: detecting unbound names
Here's a practical visitor that finds ShapeCase::Name variants that cannot be resolved in their WeakScope, which is useful for validating shape definitions before namespace finalization:
# use ;
# use HashSet;
;
// Usage example:
# /*
let mut detector = UnboundNameDetector::new();
shape.visit_shape(&mut detector)?;
let unbound = detector.into_unbound_names();
if !unbound.is_empty() {
println!("Found unbound names: {:?}", unbound);
}
# */
Use cases
The ShapeVisitor trait enables:
- Reference validation: Detecting unresolved ShapeCase::Name references before namespace finalization
- Shape analysis: Collecting metrics about shape complexity, depth, or variant distribution
- Custom serialization: Converting shapes to alternative formats while handling recursive references appropriately
- Constraint checking: Validating shapes against custom rules or schemas
- Transformation: Building modified shape trees with cycle-aware logic
- Documentation generation: Extracting shape information for API documentation while handling recursive types safely
The key insight is that ShapeVisitor provides a safe foundation for complex shape analysis by requiring explicit decisions about name resolution, preventing the infinite loops that would otherwise occur with recursive type definitions.
JSON validation and conversion
The library provides direct support for validating JSON data against shapes and converting JSON values into their corresponding shape representations.
Converting JSON to shapes
# use Shape;
# use json;
let json = json!;
let json_shape = from_json;
assert_eq!;
Validating JSON against shapes
# use Shape;
# use json;
let expected_shape = object;
let valid_json = json!;
let invalid_json = json!;
assert!;
assert!;
// Get detailed mismatch information
let mismatch = expected_shape.validate_json;
assert!;
Child shape navigation
Shapes provide methods to navigate to the shapes of nested fields and array elements.
Field and item access
# use Shape;
let object_shape = object;
// Access field shapes
assert_eq!;
assert_eq!;
assert_eq!;
let array_shape = array;
// Access array element shapes
assert_eq!;
assert_eq!;
assert_eq!;
Field access on arrays
When accessing a field on an array shape, the field access is mapped over all array elements:
# use Shape;
let users_array = array;
// Accessing a field on an array maps it over elements
let names_shape = users_array.field;
assert_eq!;
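The mapping of field access over array elements can be sketched in a few lines of standalone Rust. NavCase and field are hypothetical, arrays are modeled as homogeneous lists only, and returning None for missing fields or non-object shapes is an assumption of this sketch rather than documented library behavior.

```rust
use std::collections::BTreeMap;

// Hypothetical miniature shape enum for navigation.
#[derive(Debug, Clone, PartialEq)]
enum NavCase {
    None,
    Int,
    String,
    Object(BTreeMap<String, NavCase>),
    Array(Box<NavCase>), // element shape of a homogeneous list
}

// Looks up a field, mapping the lookup over array elements.
fn field(shape: &NavCase, name: &str) -> NavCase {
    match shape {
        NavCase::Object(fields) => fields.get(name).cloned().unwrap_or(NavCase::None),
        // An array of T yields an array of T's field shape.
        NavCase::Array(item) => NavCase::Array(Box::new(field(item, name))),
        // Assumption: other shapes have no fields.
        _ => NavCase::None,
    }
}
```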
Special shape types
Unknown shape
Shape::unknown() accepts any shape and subsumes all other members when it appears in a union:
# use Shape;
let unknown = unknown;
// Unknown accepts everything
assert!;
assert!;
assert!;
// Unknown subsumes all other shapes in unions
let union_with_unknown = one;
assert_eq!;
Dict shape
Shape::dict() represents objects with arbitrary string keys but consistent value types:
# use Shape;
let string_dict = dict;
// Accepts objects with any keys, as long as values are strings
let dict_instance = object;
assert!;
Record shape
Shape::record() represents objects with exactly the specified fields and no others:
# use Shape;
let user_record = record;
// Record shapes have no rest/wildcard - only exact fields
assert_eq!;
assert_eq!;
Location tracking
The Location system tracks where shapes originate in source text:
# use ;
let loc1 = new;
let loc2 = new;
let shape_with_location = string;
// Locations are preserved through operations
let union = one;
// The union's string shape now has both locations
Pretty printing
Shapes provide human-readable string representations:
# use Shape;
let complex_shape = object;
assert_eq!;
// With names (when inserted into a namespace)
# use Namespace;
let mut ns = new;
let named = ns.insert;
assert_eq!;
Error shapes
Error shapes represent validation failures and can carry partial shape information. An Error with Some(partial) shape behaves as much like that partial shape as possible (for acceptance testing, field access, etc.), except that it won't be deduplicated during simplification - each error remains distinct to preserve diagnostic information:
# use Shape;
let error = error;
let error_with_partial = error_with_partial;
// Errors with partial shapes satisfy shapes that accept the partial
assert!;
assert!;
// Multiple errors with the same partial shape remain distinct in unions
let error1 = error_with_partial;
let error2 = error_with_partial;
let union = one;
// Both errors are preserved, not deduplicated like regular shapes would be
// Errors can be nested in structures
let object_with_error = object;
// Errors can chain: an Error with another Error as its partial
let inner_error = error_with_partial;
let outer_error = error_with_partial;
// This creates a chain of errors with diagnostic information at each level
assert!; // Accepts int due to chain
Shape validation hierarchy
The Shape::validate() method returns Option<ShapeMismatch>, providing detailed, hierarchical information about validation failures. Each ShapeMismatch contains a causes: Vec<ShapeMismatch> field that creates a tree of errors, allowing precise identification of where validation failed in complex nested structures.
ShapeMismatch structure
# use ;
When validation fails, the returned ShapeMismatch captures not just the top-level mismatch, but also the specific sub-shapes that caused the failure. This hierarchical structure helps debug complex shape validation issues.
Example: object field validation
Here's a real example from the test suite showing how field-level mismatches appear in the causes array:
# use ;
let object_a_bool_b_int = record;
let object_a_int_b_bool = record;
// Validation fails with detailed causes for each field mismatch
assert_eq!;
This hierarchical error structure allows tools and error reporters to:
- Show exactly which fields or array elements failed validation
- Provide context-aware error messages at each level
- Navigate from high-level structural mismatches down to specific value differences
- Build detailed validation reports for complex nested data structures
The causes field can be empty for leaf-level mismatches (like primitive type differences), or contain multiple entries for compound shapes where several sub-validations failed simultaneously.
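The tree of causes can be modeled with a standalone sketch that does not depend on this crate. RecCase, Mismatch and validate are hypothetical. The expected and received field names are assumptions, since only the causes field is described above. Ignoring extra fields in the received record is also a simplification of this sketch.

```rust
// Hypothetical miniature shape enum with record shapes.
#[derive(Debug, Clone, PartialEq)]
enum RecCase {
    None,
    Bool,
    Int,
    Record(Vec<(String, RecCase)>),
}

// Hypothetical mismatch node; `causes` holds nested field mismatches.
#[derive(Debug, Clone, PartialEq)]
struct Mismatch {
    expected: RecCase,
    received: RecCase,
    causes: Vec<Mismatch>,
}

// Returns None on success, or a mismatch tree describing each failure.
fn validate(expected: &RecCase, received: &RecCase) -> Option<Mismatch> {
    let causes = match (expected, received) {
        (RecCase::Record(exp_fields), RecCase::Record(rec_fields)) => {
            let mut causes = Vec::new();
            for (name, exp_field) in exp_fields {
                match rec_fields.iter().find(|(n, _)| n == name) {
                    // Field present: collect its mismatch, if any.
                    Some((_, rec_field)) => causes.extend(validate(exp_field, rec_field)),
                    // Field missing: validate against the absence of a value.
                    None => causes.extend(validate(exp_field, &RecCase::None)),
                }
            }
            if causes.is_empty() {
                return None;
            }
            causes
        }
        // Leaf shapes match only when structurally equal.
        _ if expected == received => return None,
        // Leaf-level mismatch: no nested causes.
        _ => Vec::new(),
    };
    Some(Mismatch {
        expected: expected.clone(),
        received: received.clone(),
        causes,
    })
}
```

Validating a record of a: Bool, b: Int against a record of a: Int, b: Bool yields one top-level mismatch with two entries in causes, one per field, each with an empty causes list of its own.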