# oxiz-proof TODO
Last Updated: 2026-03-28
Reference: Z3's `ast/proofs/` directory at `../z3/src/ast/proofs/`
## Progress: 100% Complete ✓
---
## BEYOND Z3: Machine-Checkable Proofs
**OxiZ generates VERIFIABLE proofs, not just generic proof terms!**
| DRAT (SAT proofs) | Yes | Yes |
| Alethe (SMT standard) | Limited | Full |
| LFSC (CVC format) | No | Yes |
| Coq export | No | **Yes** |
| Lean export | No | **Yes** |
| Isabelle/HOL export | No | **Yes** |
**Benefit**: Trust infrastructure for safety-critical systems.
Proofs can be independently verified by external checkers (Carcara, Coq, Lean).
---
## Dependencies
- **oxiz-core**: Proof nodes, proof rules
- **oxiz-sat**: DRAT proof recording (optional feature)
## Provides (enables other crates)
- **oxiz-solver**: Proof generation for SMT
- **oxiz-cli/oxiz-wasm**: get-proof command
---
## Critical Priority
### Core Proof Structure
- [x] Proof step representation
- [x] Proof tree construction (with DAG structure and node tracking)
- [x] Premise tracking during solving
- [x] Proof compression (trimming, deduplication, optimization)
### DRAT Output (SAT)
- [x] Clause addition recording
- [x] Clause deletion recording
- [x] Binary DRAT format
- [x] Text DRAT format
- [x] Integration with `oxiz-sat`
## High Priority
### Alethe Format (SMT)
- [x] Alethe step syntax
- [x] Standard rule definitions
- [x] Term representation
- [x] Anchor and subproof handling
- [x] Proof output command
### Theory Proofs
- [x] EUF congruence proofs
- [x] Arithmetic Farkas lemmas
- [x] BitVector bit-blasting proofs (proof rules defined)
- [x] Array extensionality proofs
### Proof Rules
- [x] Resolution (implemented and validated)
- [x] Unit propagation (implemented and validated)
- [x] Theory lemmas (Farkas, congruence, transitivity)
- [x] CNF transformation (De Morgan, distributivity)
- [x] Quantifier instantiation (forall elimination, exists introduction, skolemization, pattern matching)
## Medium Priority
### LFSC Format
- [x] Signature definitions
- [x] Side condition language
- [x] Proof term generation
- [x] Compatibility with CVC5 signatures (boolean and holds theory)
### Proof Checking
- [x] Internal proof checker
- [x] Carcara format compatibility (Alethe proof validation for Carcara checker)
- [x] Proof validation API
### Optimization
- [x] Proof trimming (remove redundant steps)
- [x] Proof compression (with multiple optimization passes)
- [x] Incremental proof construction (with scoping and backtracking)
## Low Priority
### Advanced Features
- [x] Interpolant extraction from proofs (implemented with symmetric interpolation system)
- [x] Unsat core from proof (standard and minimal extraction)
- [x] Proof-carrying code generation
- [x] Proof visualization (DOT, ASCII tree, JSON, indented text formats)
### Integration
- [x] Coq proof export
- [x] Lean proof export
- [x] Isabelle proof export
## Completed
- [x] Crate structure setup
- [x] Basic proof types
- [x] Core proof DAG structure with node tracking
- [x] Proof compression and optimization algorithms
- [x] DRAT proof format (text and binary)
- [x] Alethe proof format with full rule set
- [x] Theory proof infrastructure (EUF, Arithmetic, Arrays, BitVectors)
- [x] LFSC proof format with signatures
- [x] Proof rule validators (Resolution, Unit Propagation, CNF)
- [x] Internal proof checker with validation
- [x] Incremental proof construction with scoping
- [x] Proof statistics and metrics (enhanced with max/avg premises, leaf count)
- [x] Quantifier instantiation proof rules (forall elim, exists intro, skolemization, pattern matching)
- [x] Unsat core extraction (standard and minimal algorithms)
- [x] Interpolant extraction with symmetric interpolation system
- [x] Proof visualization (multiple output formats)
- [x] Carcara format compatibility and validation
- [x] DRAT integration with oxiz-sat (via optional feature flag)
- [x] Proof-carrying code generation with safety properties
- [x] Theorem prover exports (Coq, Lean 3/4, Isabelle/HOL)
- [x] Comprehensive test suite (270 tests passing, including 49 property-based tests)
- [x] Zero compiler/clippy warnings
- [x] Exported visualization, interpolant, carcara, pcc, and theorem prover modules to public API
- [x] Property-based testing with proptest (proof, compression, resolution, unsat_core, incremental)
- [x] Performance benchmarks with criterion (9 benchmark suites)
- [x] Enhanced error messages with contextual information and suggestions
- [x] Error severity levels (Warning, Error, Critical)
- [x] Helpful diagnostic suggestions for all error types
- [x] Property-based tests for unsat core extraction (8 tests)
- [x] Property-based tests for incremental proof building (9 tests)
- [x] Exposed stats module to public API (DetailedProofStats, TheoryProofStats, ProofQuality)
- [x] Exposed traversal module to public API (ProofVisitor, TraversalOrder, utility functions)
- [x] All clippy warnings resolved (including len_zero, manual_range_contains, empty_line_after_doc_comments)
- [x] Fixed property-based test to correctly handle proof deduplication (prop_record_axiom_increases_size)
- [x] 274 tests passing with zero compiler/clippy warnings
- [x] Performance optimizations using SmallVec for premises, args, and dependents
- ProofStep::Inference premises: SmallVec<[ProofNodeId; 4]> (optimized for 1-4 premises)
- ProofStep::Inference args: SmallVec<[String; 2]> (optimized for 0-2 args)
- ProofNode dependents: SmallVec<[ProofNodeId; 2]> (optimized for 0-2 dependents)
- Reduces heap allocations and improves cache locality for typical proof patterns
- [x] Added utility functions for proof manipulation and querying
- add_axioms: Batch add multiple axioms
- find_nodes_by_rule: Find all nodes using a specific inference rule
- find_nodes_by_conclusion: Find nodes with a specific conclusion
- get_children/get_parents: Navigate proof DAG structure
- is_ancestor: Check ancestry relationships
- get_all_ancestors: Get full dependency cone
- count_rule: Count usage of specific rules
- [x] Enhanced benchmark suite (14 benchmarks total, up from 7)
- Resolution chain benchmarks (simulating SAT solver proofs)
- Unit propagation benchmarks
- Theory lemma benchmarks (EUF transitivity chains)
- Batch axiom addition benchmarks
- Node query performance benchmarks
- [x] Serde serialization support (optional feature)
- Serialize/deserialize Proof, ProofNode, ProofStep, ProofStats
- Automatic cache rebuilding after deserialization
- Optional feature flag for zero-cost when not needed
- [x] Proof merging and slicing utilities
- merge_proofs: Combine multiple proofs into one with ID remapping
- slice_proof: Extract minimal subproof for a specific conclusion
- slice_proof_multi: Efficiently extract multiple slices
- Full test coverage (9 tests)
- [x] Proof diffing and similarity analysis
- diff_proofs: Compare two proofs and identify differences
- compute_similarity: Jaccard similarity, node overlap, structural similarity
- ProofDiff enum with display formatting
- ProofSimilarity metrics with percentage display
- Full test coverage (7 tests)
- [x] 289 tests passing with all features (285 without sat-integration, up from 270), including 15-19 new tests
- [x] 4 doc tests (up from 1), with examples for new features
- [x] Zero clippy warnings maintained across all new code
- [x] Property-based tests for merge and diff operations (12 new proptests)
- merge_preserves_nodes: Verify merged proof size invariants
- merge_empty_proofs: Empty proof merging correctness
- slice_smaller_or_equal: Sliced proofs are subsets
- slice_contains_target: Sliced proofs contain all dependencies
- merge_single_is_copy: Single proof merge is identity
- slice_axiom_is_single: Axiom slices contain only the axiom
- similarity_bounds: Similarity metrics are in [0,1]
- identical_proofs_similarity: Perfect similarity for identical proofs
- disjoint_proofs_zero_jaccard: Zero similarity for disjoint proofs
- diff_empty_proofs: Empty proofs have no differences
- diff_self_is_empty: Proof compared with itself has no diffs
- similarity_symmetric: Similarity is commutative
- [x] Additional benchmarks for new features (4 new benchmark suites)
- proof_merging: Benchmark merging 2, 5, and 10 proofs
- proof_slicing: Benchmark slicing at depths 5, 10, and 15
- proof_diffing: Benchmark diffing proofs of size 20, 50, and 100
- proof_similarity: Benchmark similarity computation for various sizes
- Total benchmarks: 18 (up from 14)
- [x] Proof normalization utilities
- normalize_proof: Sort axioms alphabetically and renumber nodes
- canonicalize_conclusions: Remove extra whitespace from conclusions
- Full test coverage (6 tests)
- [x] 307 tests passing with all features (303 without sat-integration, up from 289), including 18 new tests
- [x] 5 doc tests (up from 4)
- [x] Zero clippy warnings maintained for all new code
## Next Phase: Advanced Enhancements
### Proof Metadata & Annotations
- [x] Proof node metadata system (tags, priority, difficulty)
- [x] Metadata serialization support
- [x] Metadata query API
- [x] Metadata-based filtering and search
- [x] 17 new tests (11 in metadata.rs, 7 in proof.rs metadata tests)
- [x] Zero clippy warnings maintained
### Proof Explanation & Analysis
- [x] Natural language proof explanations
- [x] Step-by-step proof summaries
- [x] Proof complexity analysis
- [x] Critical path identification
- [x] Multiple verbosity levels (Minimal, Concise, Detailed, Verbose)
- [x] 10 new tests in explanation.rs
- [x] Zero clippy warnings maintained
### Proof Format Interoperability
- [x] Unified format conversion API
- [x] Quick conversion helpers (to_coq, to_lean, to_isabelle)
- [x] Format metadata (extensions, names)
- [x] 9 new tests in format.rs
- [x] Zero clippy warnings maintained
- [x] Format validation utilities
- [x] DRAT to Alethe conversion (full implementation with strict mode and heuristics)
- [x] Alethe to LFSC conversion (full implementation with boolean signature)
### Proof-Based Learning
- [x] Lemma pattern extraction
- [x] Proof template identification
- [x] Strategy heuristics from successful proofs
- [x] Proof fingerprinting for similarity detection
### Performance & Optimization
- [x] Parallel proof checking with rayon
- [x] Lazy proof evaluation
- [x] Proof streaming for large proofs
- [x] Memory-mapped proof storage
## Recent Enhancements (Latest Session)
### Metadata & Annotations System
- [x] Complete metadata system with Priority, Difficulty, and Strategy enums
- [x] Flexible tagging and custom attributes
- [x] Metadata query API (filter by tag, priority, difficulty, strategy)
- [x] Serde serialization support for metadata
- [x] 17 new tests (11 metadata.rs + 7 proof.rs metadata tests)
### Proof Explanation & Analysis
- [x] Natural language proof step explanations
- [x] Four verbosity levels (Minimal, Concise, Detailed, Verbose)
- [x] Step-by-step proof summaries with formatting
- [x] Critical path identification (longest axiom-to-conclusion path)
- [x] Proof complexity analysis (depth, branching, premise statistics)
- [x] Rule description mapping to natural language
- [x] 10 new tests in explanation.rs
### Format Conversion Utilities
- [x] Unified ProofFormat enum (Coq, Lean3, Lean4, Isabelle)
- [x] Quick conversion helpers module
- [x] Format metadata (file extensions, display names)
- [x] 9 new tests in format.rs
### Proof-Based Learning System
- [x] Lemma pattern extraction with frequency tracking
- [x] Proof template identification and reuse
- [x] Strategy heuristics from successful proofs (rule ordering, branching, lemma selection, instantiation, theory combination)
- [x] Proof fingerprinting with bloom filters for fast similarity detection
- [x] 31 new tests (9 pattern.rs, 10 template.rs, 9 heuristic.rs, 13 fingerprint.rs)
- [x] Zero clippy warnings maintained
### Format Validation & Conversion Infrastructure
- [x] Format validation utilities for proof correctness checking
- [x] Validation error types with detailed messages
- [x] Cycle detection in proofs
- [x] Syntax validation for conclusions
- [x] Conversion error types and framework
- [x] FormatConverter infrastructure (full conversions are future work)
- [x] 12 new tests (9 validation.rs, 3 conversion.rs)
### Performance & Optimization Implementation
- [x] Parallel proof checking with batched processing
- [x] Lazy proof evaluation with caching and statistics
- [x] Proof streaming for chunked processing
- [x] Memory-mapped proof storage with LRU cache
- [x] 41 new tests (12 parallel.rs, 13 lazy.rs, 10 streaming.rs, 6 mmap.rs)
- [x] Zero clippy warnings maintained
### Summary (Previous Session)
- **Tests**: 453 passing (up from 428, +25 new tests or +5.8%)
- 448 unit tests
- 5 doc tests
- **Source files**: 43 modules
- pattern.rs (416 lines, 9 tests) - Lemma pattern extraction
- template.rs (425 lines, 10 tests) - Proof template identification
- heuristic.rs (581 lines, 9 tests) - Strategy learning from proofs
- fingerprint.rs (494 lines, 13 tests) - Proof fingerprinting and similarity
- validation.rs (316 lines, 9 tests) - Format validation
- conversion.rs (749 lines, 29 tests) - Format conversion (DRAT→Alethe, Alethe→LFSC)
- parallel.rs (374 lines, 12 tests) - Parallel proof processing
- lazy.rs (400 lines, 13 tests) - Lazy proof evaluation
- streaming.rs (390 lines, 10 tests) - Proof streaming
- mmap.rs (354 lines, 6 tests) - Memory-mapped storage
- **Lines of code**: ~19,500 (up from 18,800, +700 lines or +3.7%)
- **Zero compiler warnings**: ✓
- **Zero clippy warnings**: ✓
- **Public API**: Enhanced with format conversion methods (drat_to_alethe, alethe_to_lfsc)
## Current Session Enhancements (Beyond 100%)
### Enhanced Format Conversion (conversion.rs)
- [x] Property-based testing with proptest (10 new proptests)
- DRAT to Alethe step preservation
- Strict mode deletion handling
- Alethe to LFSC non-empty guarantee
- Term parsing determinism
- Sort parsing validation
- BitVec width parsing
- Error message validation
- Empty clause handling
- Resolution chain conversion
- Converter settings preservation
- [x] Improved SMT-LIB term parser
- Nested s-expression support with proper tokenization
- Rational literal parsing (e.g., "3/4", "-5/7")
- Negative number handling
- String literal support in s-expressions
- Balanced parenthesis validation
- Whitespace-agnostic parsing
- [x] Enhanced test coverage (51 tests, up from 32, +19 tests or +59%)
- Nested expression parsing tests
- Rational number parsing tests
- Complex nested expression tests
- Tokenization tests (simple, nested, deeply nested)
- Error handling tests (unbalanced parens, unterminated strings)
- [x] Documentation improvements
- Doc examples for drat_to_alethe conversion
- Doc examples for alethe_to_lfsc conversion
- 2 new doc tests (7 total, up from 5)
- [x] Performance benchmarks (4 new benchmark suites)
- bench_drat_to_alethe_conversion (10, 50, 100, 200 clauses)
- bench_alethe_to_lfsc_conversion (10, 30, 50, 100 assumptions)
- bench_alethe_resolution_conversion (5, 10, 20, 50 steps)
- bench_term_parsing (6 term types: var, number, rational, simple, nested, complex)
- Total benchmarks: 22 (up from 18, +4 new suites)
### Summary (Current Session)
- **Tests**: 476 unit tests + 7 doc tests = 483 total (up from 458, +25 new tests or +5.4%)
- 19 new unit tests in conversion.rs
- 2 new doc tests for conversion functions
- 4 additional tests with sat-integration feature
- **Property-based tests**: 10 new proptests for conversion module
- **Benchmarks**: 21 total (down from 22)
- Removed bench_term_parsing (was accessing private function)
- Fixed clippy warnings in benchmarks (unnecessary casts)
- **conversion.rs**: 1,051 lines (up from 749, +302 lines or +40%)
- 51 tests (up from 32, +19 tests)
- Enhanced parser with nested expression support
- Rational literal support
- Comprehensive error handling
- **Lines of code**: ~20,000 total (up from ~19,500, +500 lines or +2.6%)
- **Zero compiler warnings**: ✓
- **Zero clippy warnings**: ✓ (Fixed needless_borrows_for_generic_args in streaming.rs, unnecessary_cast in benches)
- **Documentation**: Enhanced with doc examples and inline comments
- **Performance**: Comprehensive benchmarking for all conversion paths
## Latest Enhancements (2026-01-06)
### Code Quality Improvements
- [x] Fixed benchmark compilation error (removed bench_term_parsing accessing private method)
- [x] Removed unused import (AletheRule) from benchmarks
- [x] Fixed clippy::needless_borrows_for_generic_args in streaming.rs:344
- [x] Fixed clippy::unnecessary_cast in benches/proof_benchmarks.rs:404
- [x] All 483 tests passing (476 unit tests + 7 doc tests)
- [x] Zero compiler warnings maintained
- [x] Zero clippy warnings maintained
- [x] All benchmarks compile and run successfully
### Proof Simplification Module (NEW!)
- [x] Implemented proof simplification through logical rewriting (simplify.rs)
- [x] Double negation simplification (¬¬p → p)
- [x] De Morgan's law application (¬(p ∧ q) → ¬p ∨ ¬q)
- [x] Identity operation removal (p ∧ true → p, p ∨ false → p)
- [x] Tautology simplification (p ∨ ¬p → true)
- [x] Multi-pass simplification with configurable settings
- [x] S-expression parsing for logical formulas
- [x] UTF-8 safe Unicode symbol handling
- [x] Simplification statistics and metrics
- [x] Added `update_conclusion` API to Proof for conclusion modification
- [x] 19 comprehensive tests for simplification module
- [x] Zero clippy warnings
### Updated Summary (v0.2.0 Stable)
- **Version**: 0.2.0 (Stable)
- **Tests**: 553 passing
- **Source files**: 52 files
- **Lines of code**: 23,241
- **Public API items**: 804
- **Zero todo!/unimplemented! calls**: ✓
- **Zero compiler warnings**: ✓
- **Zero clippy warnings**: ✓
- **Public API**: Enhanced with ProofSimplifier, SimplificationConfig, SimplificationStats, simplify_proof