Crate symbolic_mgu

Crate symbolic_mgu 

Source
Expand description

A library to explore the most general unifier of mathematical formulas.

Ἔξεστι περὶ τούτου μηδὲν ὑπολαμβάνειν καὶ μὴ ὀχλεῖσθαι τῇ ψυχῇ· αὐτὰ γὰρ τὰ πράγματα οὐκ ἔχει φύσιν ποιητικὴν τῶν ἡμετέρων κρίσεων.

It’s possible to form no opinion on this and not be troubled, because things themselves don’t force our judgments.

— Marcus Aurelius, Meditations, VI.52


§What This Library Is

symbolic-mgu is a Rust crate for experimenting with unification in symbolic logic. It provides a way to represent logical formulae as structured objects, then apply operations like unification, contraction, and application in order to explore the space of consequences that follow from given axioms.

Rather than telling the machine what to prove, you give it rules, and the library shows you what can be unified.

The library is intentionally minimal, focusing on correctness over convenience. It enforces structural constraints (type matching, arity checking, occurs check) but makes no assumptions about semantics or presentation. For example, while operators like “and” are commutative and associative, the library doesn’t provide variadic syntax—you must choose whether And(α, β, γ) maps to And(And(α, β), γ) or And(α, And(β, γ)). This keeps the system simple, trustworthy, and free of semantic assumptions that would limit its applicability to different logical frameworks.


§Why It Exists

§From Stoicism to Frege to Meredith

The history of logic moved from persuasive arguments (Stoics, Aristotle)
→ to late 19th century symbolic notation (Frege, 1879)
→ to 20th century mechanical proof systems (Hilbert, Meredith).

Frege gave us a formal notation in which logic could be carried out and checked without reliance on human judgment. This paved the way for mechanical reasoning systems, where proofs emerge from rules applied to symbols. Around 1900, following Frege’s formal notation and Hilbert’s program, the meaning of “proof” changed from a persuasive argument expressed in natural language to a rigorously formal sequence of symbolic steps governed by explicit rules — something that could itself be studied and eventually mechanized.

Meredith, with his principle of condensed detachment, showed that even complex derivations could be reduced to a simple operation of substitution and inference.

Here, we take the next step, following Meredith: generalizing to unifications that don’t always return to the rule of detachment (modus ponens). With a most general unifier (MGU), the machine can search the space of consequences from a given set of axioms — not by prejudging what matters, but by systematically exploring what follows.

This library and its examples demonstrate how symbolic logic and automated proof search can be expressed in code, letting the machine discover paths through the landscape of formal reasoning.

The goal is not to automate every proof. We do not provide a way to find a particular proof among the possibilities you supply. Instead, the library offers a precise, programmable framework for exploring how formulae transform under unification — and, given unlimited time and space, for uncovering all reachable proofs.


§What You Can Do With It

§Core Unification Operations

  • Define STATEMENTs: assertions with hypotheses and distinctness constraints
  • Unify terms using Robinson’s algorithm with occurs checking
  • Use APPLY to connect one STATEMENT’s hypothesis with another’s assertion
  • Use CONTRACT to unify two hypotheses inside a STATEMENT
  • Put STATEMENTs into canonical form for easier comparison of conveyed meaning
  • Check inclusion (S₁ ⊇ S₂) and identity (S₁ ≡ S₂) between STATEMENTs

§Automated Theorem Discovery

  • Enumerate all Boolean terms up to a specified depth using the search module
  • Test whether a formula is a tautology by exhaustive evaluation
  • Find counterexamples by searching for assignments that falsify a claim
  • Systematically explore the space of derivable theorems from given axioms
  • Memory-efficient iteration handles millions of terms without exhausting memory

§Error Handling & Validation

  • Inspect error conditions with fine-grained error types:
    • Failed unification with descriptive messages
    • Distinctness constraint violations
    • Type mismatches between slots and trees
    • Index and arity errors

§Example Applications

Build theorem provers, proof assistants, or logic explorers:

  • Implement condensed detachment systems (Meredith-style)
  • Search for shortest proofs of propositional tautologies
  • Discover new theorems reachable from axiom systems
  • Validate proof steps in formal verification workflows
  • Generate and test logical equivalences

This keeps the math rigorous, but gives you the tools to build higher-level reasoning systems in Rust.


§Who It’s For

  • Rust developers curious about logic programming and unification.
  • Logicians / proof theorists who want a programmable lab for testing rules.
  • Anyone exploring automated reasoning who prefers clarity and simplicity over heavyweight theorem provers.

§Our System

Every STATEMENT is a triple of an assertion SENTENCE, an enumerated, possibly empty, list of distinct hypothesis SENTENCEs, and a DISTINCTNESS graph where the nodes are Metavariables used in the STATEMENT and any edge between two Metavariables means no substitution of a tree into one Metavariable may equal or include part of what was substituted for the other.

The hypotheses of a STATEMENT are stored as an ordered list but semantically treated as a set for equality; operations like CONTRACT and APPLY require indices into that list.

STATEMENT A is included in STATEMENT B when there is an admissible mapping of the Metavariables of B that makes its SENTENCEs identical to those of A (modulo a possible renumbering of any hypotheses) and makes the edges a subset of the edges of A.

When A and B mutually include each other, the mapping is of Metavariables to Metavariables and we say A and B are identical.

Given an ordering on Metavariables of each type and an ordering on NODEs, it follows there is a canonical relabeling of the Metavariables and reordering of the hypotheses, which creates an identical STATEMENT with minimal lexicographic ordering, which we call its canonical form.

Likewise, given two STATEMENTs A and B, there is a C that is identical to B but with all Metavariables disjoint from those of A.

Example: “Modus Ponens”, a named rule, is a STATEMENT with

  • an assertion, “β”,
  • hypothesis 1, “Implies(α, β)”,
  • hypothesis 2, “α”, and a
  • DISTINCTNESS graph with no edges.

(Implies, α and β, obviously, are of TYPE Boolean.)

Let’s define CONTRACT(A, n, m), which takes STATEMENT A and attempts to unify the contents of hypotheses numbered n and m, and if that is possible, return the form with minimal lexicographic ordering.

Let’s define APPLY(A, n, B), which attempts to unify the contents of hypothesis numbered n of STATEMENT A with the assertion of STATEMENT B. If that is possible, the result has the remapped assertion of A and hypotheses consisting of the (distinct) union of those that remain from A and any from B.

§Formal expression of the main content

§DISTINCTNESS Graph

Given a finite set of metavariables V, a DISTINCTNESS graph is an undirected graph G = (V, E) where E ⊆ {{x, y} | x ≠ y ∈ V}

An edge {x, y} ∈ E enforces: No substitution may assign trees to x and y that share any metavariable in common.

§STATEMENT

A STATEMENT is a triple (A; H; D) where:

  • A is the assertion, a SENTENCE.
  • H = [H₁, …, Hₙ] is a list of hypotheses, each a SENTENCE.
  • D is a DISTINCTNESS graph over the set of all metavariables used in A and in any Hᵢ.

We require that A, each Hᵢ, and D be well-formed as per the above definitions.

§Substitution and Unification

A substitution, σ is a partial function mapping metavariables to trees of matching TYPE.

We write τσ for the tree resulting from applying substitution σ recursively to all metavariables in τ.

Two trees, τ₁ and τ₂ are unifiable iff there exists a substitution, σ, such that τ₁σ = τ₂σ.

A substitution must be acyclic: no metavariable may map to a tree containing itself (the occurs check).

Note: The occurs check prevents cyclic substitutions, but does not restrict the construction of self-referential expressions. Terms like Equals(A, A) or Contains(A, A) are syntactically valid. The system enforces structural well-formedness and type constraints, but does not validate semantic correctness—that responsibility belongs to the user’s choice of axioms. For example, proving Equals(Union(x, y), Union(y, x)) is not the system’s responsibility; the user must supply such proofs from their chosen axioms. Distinctness checks operate syntactically (shared metavariables), not semantically (semantic equivalence). This intentional fragility to syntactic variation keeps the system applicable to diverse logical frameworks.

A most general unifier (MGU) is a unifier, σ, such that any other unifier factors through σ (that is, any other unifier σ’ can be expressed as σ followed by another substitution).

§Inclusion and Identity of STATEMENTs

Let S₁ = (A₁; H₁; D₁) and S₂ = (A₂; H₂; D₂).

We say S₁ includes S₂ (written S₁ ⊇ S₂) iff there exists any substitution σ such that:

  • A₁σ = A₂ (S₁’s assertion, after substitution, equals S₂’s assertion);
  • For each hᵢ in H₁, hᵢσ equals some hⱼ in H₂ (S₁’s hypotheses, after substitution, form a subset of S₂’s); and
  • D₂ is a super-graph of D₁σ (S₂’s distinctness constraints include all of S₁’s constraints after transformation). Note: Each edge {v₁, v₂} in D₁ transforms to edges between each metavariable in v₁σ and each metavariable in v₂σ. When a metavariable is substituted with a 0-arity node, it contributes no metavariables to the transformed graph. When substituted with a tree containing multiple metavariables, one original edge may become many edges.

Note: Multiple substitutions may satisfy these conditions. Any one suffices to demonstrate inclusion.

Intuitively: S₁ ⊇ S₂ means S₁ is more general and S₂ is more specific. The substitution σ specializes S₁ into S₂.

Equivalently, we can write S₂ ⊆ S₁ (S₂ is included in S₁), meaning the same relation from the opposite perspective.

We say S₁ and S₂ are identical iff S₁ ⊇ S₂ and S₂ ⊇ S₁ (mutual inclusion via variable renaming).

§Example of Inclusion

Consider:

  • S₁ = (P → P; []; {}) — An axiom with no hypotheses or distinctness constraints
  • S₂ = ((x=x → y=y) → (x=x → y=y); [x ∈ ℕ]; {x≠y}) — A statement with hypothesis and distinctness

We have S₁ ⊇ S₂ (the axiom includes the statement) because:

  • Substitution σ = {P ↦ (x=x → y=y)} transforms S₁ into S₂:
  • A₁σ = (P → P)σ = ((x=x → y=y) → (x=x → y=y)) = A₂
  • H₁σ = []σ = [] ⊆ [x ∈ ℕ] = H₂
  • D₁σ = {}σ = {} ⊆ {x≠y} = D₂

The axiom is more general (fewer constraints), the statement is more specific (more constraints).

§Canonical Form

A canonical (normalized) form of a STATEMENT is one where:

  • Metavariables are renamed to be minimal under a fixed total ordering on 𝑀ₜ (this ordering is externally defined in each implementation),
  • Hypotheses are reordered to be lexicographically minimal under that metavariable ordering, and
  • Two STATEMENTs are equal if their canonical forms are identical.

Note on Canonicalization: Canonicalization is an optional operation that does not affect the meaning (logical content) of a STATEMENT. It provides a unique representative for α-equivalent statements, useful for:

  • Comparing statements for semantic equality (mutual inclusion)
  • Deduplication in proof databases
  • Consistent presentation to users

Canonicalization may be computationally expensive (factorial in the number of hypotheses), so it is typically performed:

  • Once, after a sequence of operations completes (not after each intermediate step)
  • On-demand, when comparison or storage requires it
  • Explicitly by the user or high-level operations (like condensed detachment)

The operations CONTRACT, APPLY, and APPLY_MULTIPLE return results that are logically correct but not necessarily in canonical form. Users may call the canonicalize operation explicitly when needed.

§Relabeling Disjointness

Given two STATEMENTs, S₁ and S₂, a disjoint copy S₂’ of S₂ is obtained by renaming all metavariables in S₂ so that they are disjoint from those in S₁.

§Operation: CONTRACT

Given S = (A; H; D) and indices n ≠ m,

CONTRACT(S, n, m):

  • Attempts to unify Hₙ and Hₘ.
  • If unification fails or violates a constraint in D, returns an error state.
  • Else, applies the resulting MGU σ to all parts of S.
  • Returns the STATEMENT with:
    • Updated assertion
    • Hypotheses with the nth and mth removed and the unified hypothesis added once. Note: the substitution may cause other hypothesis pairs to become identical, which are also eliminated (as hypotheses are treated semantically as a set).
    • A transformed DISTINCTNESS graph : for each edge {v₁, v₂} in D, if v₁σ and v₂σ contain metavariables, add edges between all pairs of distinct metavariables from v₁σ and v₂σ.

§Operation: APPLY

Given two STATEMENTs, S₁ = (A₁; H₁; D₁) and S₂ = (A₂; H₂; D₂), and index n:

APPLY(S₁, n, S₂):

  • Relabel S₂ → S₂’, to be disjoint from S₁.
  • Attempt to unify H₁[n] with the assertion of S₂’.
  • If unification fails or violates distinctness requirements, return an error.
  • Else, apply the MGU σ to:
    • A₁, all H₁ (excluding H₁[n]), and all H₂’
  • Merge these as the new hypothesis list (eliminating duplicates, as hypotheses are treated semantically as a set)
  • Merge DISTINCTNESS graphs under σ: for each edge {v₁, v₂} in D₁ or D₂, if v₁σ and v₂σ contain metavariables, add edges between all pairs of distinct metavariables from v₁σ and v₂σ
  • Return a new STATEMENT (not necessarily in canonical form; call canonicalize explicitly if needed).

§Simple Axioms and Empty Structures

A STATEMENT with no hypotheses and no distinctness constraints (D has no edges) is called a simple axiom. For example, the Simp axiom Implies(α, Implies(β, α)) is a simple axiom.

Constants are NODEs of arity 0. There is no empty SENTENCE.

§Practical Considerations

§Hypothesis Ordering: Operational vs. Semantic

Hypotheses are stored as an ordered list but treated semantically as a set:

  • Operationally: The operations APPLY(S₁, n, S₂) and CONTRACT(S, n, m) require integer indices to identify which hypotheses to unify. The order matters for specifying which operation to perform.

  • Semantically: Two statements that differ only in hypothesis ordering are considered identical (they mutually include each other). The ordering carries no logical meaning.

This creates a tension: indices are necessary to perform operations, but meaningless to interpret results.

§Indices are Ephemeral

Hypothesis indices should not be considered stable across operations:

  • APPLY(S₁, n, S₂) consumes hypothesis n from S₁, shifts remaining indices, and may add hypotheses from S₂
  • Canonicalization reorders hypotheses to achieve minimal lexicographic ordering
  • After any operation, the mapping from “old index” to “new index” is not preserved

Applications should not rely on indices persisting; they are purely operational handles for the current statement.

§Why Canonical Form Matters

Because hypothesis permutations are semantically equivalent but operationally distinct, a database of statements could accumulate many copies of the “same” statement with different hypothesis orderings. Canonical form prevents this:

  • Converts all equivalent statements to a single normal form
  • Enables efficient deduplication and equality checking
  • Provides a stable representation for storage and comparison

Without canonicalization, a proof search application might redundantly explore permutations of previously-discovered statements.

§Application Design Pattern

The intended use case is an application that maintains a database of canonical statements with proof receipts:

  1. Start with a set of axioms (already in canonical form)
  2. Apply APPLY and CONTRACT operations to generate new statements
  3. Canonicalize each result before adding to the database
  4. Store a receipt showing how each statement was derived (which axioms, which operations)
  5. Check new statements for inclusion in existing statements before adding (avoid redundancy)

This pattern supports automated theorem discovery while managing the computational cost of maintaining canonical forms.

§Computational Cost

Canonicalization is an idealized process with no guaranteed efficient algorithm:

  • Finding minimal lexicographic ordering may require trying all hypothesis permutations
  • With many hypotheses (especially those unconnected to the assertion), this becomes costly
  • The current implementation attempts full canonicalization; applications with performance constraints may need to evaluate trade-offs

For practical proof search, limiting the number of hypotheses or using heuristics to prune the search space may be necessary.

§Scholarly Context and Mathematical Foundations

§Mathematical Background

The symbolic-mgu crate implements typed symbolic unification—the process of finding most general unifiers (MGUs) between symbolic expressions—and supports inference via condensed detachment. These mechanisms, first formalized by Robinson (1965) and Meredith (1968), remain central to the automation of logical reasoning. The project’s goal is to expose the algebraic and computational essence of unification in a safe, modular Rust design.

§Reproducible Experiments

Each API component is designed for reproducibility and inspection. Developers can reproduce or compare unifiers deterministically, trace factory operations, and serialize internal representations. This approach aims to make symbolic reasoning experiments both verifiable and auditable, aligning with open-science standards in formal methods.

§Relation to Prior Work

The symbolic-mgu crate situates itself in a lineage beginning with Robinson’s (1965) introduction of unification as the computational core of resolution, extended through Meredith’s (1953) development of condensed detachment as a minimal, self-contained inference rule. Where traditional implementations have been largely untyped and imperative, this project re-casts those foundations in a strongly-typed, ownership-safe setting. Rust’s type system enforces invariants—such as variable distinctness, substitution immutability, and occurs-check soundness—that earlier theorem provers expressed only informally. The result is both a research platform for exploring symbolic reasoning and a reproducible, audit-friendly implementation that demonstrates how modern programming languages can make the classical machinery of unification precise, efficient, and verifiable.

Unlike large monolithic provers such as Prover9 or minimalist goal-directed systems like leanCoP, symbolic-mgu focuses narrowly on the algebra of unification itself rather than full proof search. Its aim is to make the unifier and substitution machinery transparent, typed, and modular—objects that can be inspected, serialized, or extended by other reasoning engines. In this sense, the crate serves as a research kernel: it exposes the mechanisms underlying condensed detachment and term rewriting without committing to a particular logic or search strategy. That separation of concerns—between symbolic correctness and procedural heuristics—makes the project a bridge between classical logic programming theory and modern typed functional design.

§Mathematical References and Books for Further Reading

§Historical & Philosophical Foundations

  • Marcus Aurelius (167). Meditations Stoic philosophy emphasizing rational analysis—a precursor to treating mathematical reasoning as procedure rather than intuition.

  • Frege, G. (1879). Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens (Concept Script, a formal language of pure thought modelled upon that of arithmetic).

  • van Heijenoort, J. (1967). From Frege to Gödel, a source book in mathematical logic, 1879–1931 Contains a translation of Frege (1879).

§Classical Logic & Proof Theory

  • Smullyan, R. (1968) First-Order Logic Also available as a 1995 Dover Publications reprint.

  • Schütte, K. (1977). Proof Theory

  • Smith, P. (2020). An Introduction to Formal Logic, 2nd ed.

§Condensed Detachment

§Unification Theory and Algorithms

  • Robinson, J. A. (1965). A Machine-Oriented Logic Based on the Resolution Principle. Journal of the ACM, 12(1), 23–41. Available at https://dl.acm.org/doi/abs/10.1145/321250.321253

  • Plotkin, G. D. (1972). Building-in Equational Theories. Machine Intelligence 7, Edinburgh University Press, 73–90.

  • Martelli, A., & Montanari, U. (1982). An Efficient Unification Algorithm. ACM Transactions on Programming Languages and Systems, 4(2), 258–282. Available at https://dl.acm.org/doi/abs/10.1145/357162.357169

§Automated Theorem Proving: Textbooks

  • Chang & Lee (1973). Symbolic Logic and Mechanical Theorem Proving

  • Wos, L., Overbeek, R., Lusk, E., & Boyle, J. (1992). Automated Reasoning: Introduction and Applications (2nd ed.). McGraw-Hill, Inc., New York.

  • Fitting, M. (1996). First-Order Logic and Automated Theorem Proving

§Automated Theorem Proving: Systems and Implementations

§Broader Perspectives

  • Hofstadter, D. (1980). Gödel, Escher, Bach: An Eternal Golden Braid

  • Stillwell, J. (2022). The Story of Proof: Logic and the History of Mathematics

§Citation and Scholarly Use

If you use symbolic-mgu in published research, please cite it as a software artifact alongside foundational works like Robinson (1965), for example:

symbolic-mgu (v0.1.x) — a Rust library for typed symbolic unification and condensed detachment. Available at crates.io/crates/symbolic-mgu.

§Acknowledgments

Thank you to the Metamath community.

Editorial and explanatory materials in this project were prepared with the assistance of OpenAI’s ChatGPT (GPT-5) and Anthropic’s Claude (Claude Sonnet 4.5), used under the direction and review of the project maintainer. All design, implementation, and interpretive decisions remain those of the human contributors. This acknowledgment is included for scholarly transparency in keeping with best practices for AI-assisted research and technical documentation.

Modules§

bool_eval
Boolean evaluation of terms.
logic
Helper functions for building fundamental logic statements.
metamath
Metamath database parser with UTF-8 and filesystem abstraction support.
search
Term enumeration for automated search.

Macros§

byte_try_from_signed
Implements fallible conversions (TryFrom) from signed integer types larger than 8 bits into a target enum or new type represented by u8.
byte_try_from_unsigned
Implements fallible conversions (TryFrom) from unsigned integer types larger than 8 bits into a target enum or new type represented by u8.
dlgt0
Generates a wrapper method that delegates its call to an inner field.
enforce_bigger_than_byte
Check that the single type supplied is a legal integer larger than a byte.
enforce_primitive_type
Check that the single type supplied is a primitive type.
enum0
Helper macro for defining enums with rich, structured documentation.
last_ident
Returns the last identifier in a whitespace-separated sequence.
ub_prim_impl
Implements the UnsignedBits trait for primitive unsigned integer types.

Structs§

Color
Simple color representation supporting both terminal and HTML output.
DistinctnessGraph
An undirected graph.
EnumTermFactory
A simple factory for creating EnumTerm instances.
MetaByte
A byte-sized metavariable representation.
MetaByteFactory
A very simple example of a factory for a very simple metavariable.
NodeByteFactory
Factory for creating NodeByte instances.
NormalizingSubstitution
A wrapper around Substitution that maintains the normal form invariant.
Pair
An unordered pair of two distinct elements, stored internally in a canonical order.
ParametricMetavariable
Generic metavariable with pluggable types, decorators, and character sets.
SimpleGraph
A set of Pair elements is a simple graph.
SimpleTypeFactory
Stateless factory for constructing SimpleType instances.
Statement
The primary object representing an axiom, inference rule, or statement of a theorem.
Substitution
A substitution mapping metavariables to terms.
WideCharSet
Character set for WideMetavariable-style display.
WideMetavariableFactory
Stateless factory for creating WideMetavariable instances.

Enums§

AsciiMetaVar
Named ASCII metavariable variants.
EnumTerm
A simple implementation of Term based straightforwardly on supplied Metavariable and Node implementations.
MguError
The common error type of the entire crate.
MguErrorType
The type of MGU error, automatically generated from MguError variants.
NodeByte
Selected 222 Node types.
Prime
Prime notation decoration (φ, φ′, φ″, φ‴).
SimpleType
The type of a Metavariable, Node, or Term. Said SimpleType is a Boolean, Setvar, or Class.

Constants§

WIDE_BOOLEANS
Lowercase Greek letters used for Boolean metavariables.
WIDE_BOOLEANS_ASCII
ASCII representations for Boolean metavariables.
WIDE_CLASSES
Italic Latin uppercase letters used for Class metavariables.
WIDE_CLASSES_ASCII
ASCII representations for Class metavariables.
WIDE_SETVARS
Italic Latin lowercase letters used for Setvar metavariables.
WIDE_SETVARS_ASCII
ASCII representations for Setvar metavariables.

Traits§

Decorator
Trait for decoration styles that can be applied to metavariables.
Metavariable
Trait encapsulating behavior of the metavariable type.
MetavariableFactory
Factory for creating Metavariable instances
Node
Trait for nodes representing operations in logical terms.
NodeFactory
Factory for creating Node instances
OutputFormatter
Trait for formatting logical objects to various output representations.
Term
Trait to create tree-like structure of Nodes and Metavariables while enforcing the constraints of the Types and the fixed arities of the children of the Nodes.
TermFactory
Factory for creating Term instances
Type
Trait for type systems in formal logic
TypeCore
Dyn-safe prototype Trait for type systems in formal logic
TypeFactory
Factory for constructing Type instances.

Functions§

apply_substitution
Apply a substitution to a term, replacing metavariables with their mapped terms.
check_clique
Check the essential constraint on Cliques.
check_decomposition
Check the essential constraint on a Decomposition.
get_formatter
Get a formatter by name.
get_type_color
Get color for a type by name.
get_type_color_from_trait
Get color for a type using the Type trait.
occurs_check
Check if a metavariable occurs anywhere in a term (occurs check).
register_formatter
Register a custom formatter.
register_type_color
Register a color for a type.
unify
Compute the most general unifier of two terms.

Type Aliases§

Clique
Shorthand notation for a complete graph with ½(n²−n) edges for the n distinct vertices.
Decomposition
A presentation of the cliques of a simple graph.
WideMetavariable
A metavariable implementation with unlimited index space.