oxiz-proof 0.2.2

Proof generation and checking for OxiZ SMT solver
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
# oxiz-proof TODO

Last Updated: 2026-03-28

Reference: Z3's `ast/proofs/` directory at `../z3/src/ast/proofs/`

## Progress: 100% Complete ✓

---

## BEYOND Z3: Machine-Checkable Proofs

**OxiZ generates VERIFIABLE proofs, not just generic proof terms!**

| Feature | Z3 | OxiZ |
|---------|-----|------|
| DRAT (SAT proofs) | Yes | Yes |
| Alethe (SMT standard) | Limited | Full |
| LFSC (CVC format) | No | Yes |
| Coq export | No | **Yes** |
| Lean export | No | **Yes** |
| Isabelle/HOL export | No | **Yes** |

**Benefit**: Trust infrastructure for safety-critical systems.
Proofs can be independently verified by external checkers (Carcara, Coq, Lean).

---

## Dependencies
- **oxiz-core**: Proof nodes, proof rules
- **oxiz-sat**: DRAT proof recording (optional feature)

## Provides (enables other crates)
- **oxiz-solver**: Proof generation for SMT
- **oxiz-cli/oxiz-wasm**: get-proof command

---

## Critical Priority

### Core Proof Structure
- [x] Proof step representation
- [x] Proof tree construction (with DAG structure and node tracking)
- [x] Premise tracking during solving
- [x] Proof compression (trimming, deduplication, optimization)

### DRAT Output (SAT)
- [x] Clause addition recording
- [x] Clause deletion recording
- [x] Binary DRAT format
- [x] Text DRAT format
- [x] Integration with `oxiz-sat`

## High Priority

### Alethe Format (SMT)
- [x] Alethe step syntax
- [x] Standard rule definitions
- [x] Term representation
- [x] Anchor and subproof handling
- [x] Proof output command

### Theory Proofs
- [x] EUF congruence proofs
- [x] Arithmetic Farkas lemmas
- [x] BitVector bit-blasting proofs (proof rules defined)
- [x] Array extensionality proofs

### Proof Rules
- [x] Resolution (implemented and validated)
- [x] Unit propagation (implemented and validated)
- [x] Theory lemmas (Farkas, congruence, transitivity)
- [x] CNF transformation (De Morgan, distributivity)
- [x] Quantifier instantiation (forall elimination, exists introduction, skolemization, pattern matching)

## Medium Priority

### LFSC Format
- [x] Signature definitions
- [x] Side condition language
- [x] Proof term generation
- [x] Compatibility with CVC5 signatures (boolean and holds theory)

### Proof Checking
- [x] Internal proof checker
- [x] Carcara format compatibility (Alethe proof validation for Carcara checker)
- [x] Proof validation API

### Optimization
- [x] Proof trimming (remove redundant steps)
- [x] Proof compression (with multiple optimization passes)
- [x] Incremental proof construction (with scoping and backtracking)

## Low Priority

### Advanced Features
- [x] Interpolant extraction from proofs (implemented with symmetric interpolation system)
- [x] Unsat core from proof (standard and minimal extraction)
- [x] Proof-carrying code generation
- [x] Proof visualization (DOT, ASCII tree, JSON, indented text formats)

### Integration
- [x] Coq proof export
- [x] Lean proof export
- [x] Isabelle proof export

## Completed

- [x] Crate structure setup
- [x] Basic proof types
- [x] Core proof DAG structure with node tracking
- [x] Proof compression and optimization algorithms
- [x] DRAT proof format (text and binary)
- [x] Alethe proof format with full rule set
- [x] Theory proof infrastructure (EUF, Arithmetic, Arrays, BitVectors)
- [x] LFSC proof format with signatures
- [x] Proof rule validators (Resolution, Unit Propagation, CNF)
- [x] Internal proof checker with validation
- [x] Incremental proof construction with scoping
- [x] Proof statistics and metrics (enhanced with max/avg premises, leaf count)
- [x] Quantifier instantiation proof rules (forall elim, exists intro, skolemization, pattern matching)
- [x] Unsat core extraction (standard and minimal algorithms)
- [x] Interpolant extraction with symmetric interpolation system
- [x] Proof visualization (multiple output formats)
- [x] Carcara format compatibility and validation
- [x] DRAT integration with oxiz-sat (via optional feature flag)
- [x] Proof-carrying code generation with safety properties
- [x] Theorem prover exports (Coq, Lean 3/4, Isabelle/HOL)
- [x] Comprehensive test suite (270 tests passing, including 49 property-based tests)
- [x] Zero compiler/clippy warnings
- [x] Exported visualization, interpolant, carcara, pcc, and theorem prover modules to public API
- [x] Property-based testing with proptest (proof, compression, resolution, unsat_core, incremental)
- [x] Performance benchmarks with criterion (9 benchmark suites)
- [x] Enhanced error messages with contextual information and suggestions
- [x] Error severity levels (Warning, Error, Critical)
- [x] Helpful diagnostic suggestions for all error types
- [x] Property-based tests for unsat core extraction (8 tests)
- [x] Property-based tests for incremental proof building (9 tests)
- [x] Exposed stats module to public API (DetailedProofStats, TheoryProofStats, ProofQuality)
- [x] Exposed traversal module to public API (ProofVisitor, TraversalOrder, utility functions)
- [x] All clippy warnings resolved (including len_zero, manual_range_contains, empty_line_after_doc_comments)
- [x] Fixed property-based test to correctly handle proof deduplication (prop_record_axiom_increases_size)
- [x] 274 tests passing with zero compiler/clippy warnings
- [x] Performance optimizations using SmallVec for premises, args, and dependents
  - ProofStep::Inference premises: SmallVec<[ProofNodeId; 4]> (optimized for 1-4 premises)
  - ProofStep::Inference args: SmallVec<[String; 2]> (optimized for 0-2 args)
  - ProofNode dependents: SmallVec<[ProofNodeId; 2]> (optimized for 0-2 dependents)
  - Reduces heap allocations and improves cache locality for typical proof patterns
- [x] Added utility functions for proof manipulation and querying
  - add_axioms: Batch add multiple axioms
  - find_nodes_by_rule: Find all nodes using a specific inference rule
  - find_nodes_by_conclusion: Find nodes with a specific conclusion
  - get_children/get_parents: Navigate proof DAG structure
  - is_ancestor: Check ancestry relationships
  - get_all_ancestors: Get full dependency cone
  - count_rule: Count usage of specific rules
- [x] Enhanced benchmark suite (14 benchmarks total, up from 7)
  - Resolution chain benchmarks (simulating SAT solver proofs)
  - Unit propagation benchmarks
  - Theory lemma benchmarks (EUF transitivity chains)
  - Batch axiom addition benchmarks
  - Node query performance benchmarks
- [x] Serde serialization support (optional feature)
  - Serialize/deserialize Proof, ProofNode, ProofStep, ProofStats
  - Automatic cache rebuilding after deserialization
  - Optional feature flag for zero-cost when not needed
- [x] Proof merging and slicing utilities
  - merge_proofs: Combine multiple proofs into one with ID remapping
  - slice_proof: Extract minimal subproof for a specific conclusion
  - slice_proof_multi: Efficiently extract multiple slices
  - Full test coverage (9 tests)
- [x] Proof diffing and similarity analysis
  - diff_proofs: Compare two proofs and identify differences
  - compute_similarity: Jaccard similarity, node overlap, structural similarity
  - ProofDiff enum with display formatting
  - ProofSimilarity metrics with percentage display
  - Full test coverage (7 tests)
- [x] 289 tests passing with all features (285 without sat-integration, up from 270), including 15-19 new tests
- [x] 4 doc tests (up from 1), with examples for new features
- [x] Zero clippy warnings maintained across all new code
- [x] Property-based tests for merge and diff operations (12 new proptests)
  - merge_preserves_nodes: Verify merged proof size invariants
  - merge_empty_proofs: Empty proof merging correctness
  - slice_smaller_or_equal: Sliced proofs are subsets
  - slice_contains_target: Sliced proofs contain all dependencies
  - merge_single_is_copy: Single proof merge is identity
  - slice_axiom_is_single: Axiom slices contain only the axiom
  - similarity_bounds: Similarity metrics are in [0,1]
  - identical_proofs_similarity: Perfect similarity for identical proofs
  - disjoint_proofs_zero_jaccard: Zero similarity for disjoint proofs
  - diff_empty_proofs: Empty proofs have no differences
  - diff_self_is_empty: Proof compared with itself has no diffs
  - similarity_symmetric: Similarity is commutative
- [x] Additional benchmarks for new features (4 new benchmark suites)
  - proof_merging: Benchmark merging 2, 5, and 10 proofs
  - proof_slicing: Benchmark slicing at depths 5, 10, and 15
  - proof_diffing: Benchmark diffing proofs of size 20, 50, and 100
  - proof_similarity: Benchmark similarity computation for various sizes
  - Total benchmarks: 18 (up from 14)
- [x] Proof normalization utilities
  - normalize_proof: Sort axioms alphabetically and renumber nodes
  - canonicalize_conclusions: Remove extra whitespace from conclusions
  - Full test coverage (6 tests)
- [x] 307 tests passing with all features (303 without sat-integration, up from 289), including 18 new tests
- [x] 5 doc tests (up from 4)
- [x] Zero clippy warnings maintained for all new code

## Next Phase: Advanced Enhancements

### Proof Metadata & Annotations
- [x] Proof node metadata system (tags, priority, difficulty)
- [x] Metadata serialization support
- [x] Metadata query API
- [x] Metadata-based filtering and search
- [x] 17 new tests (11 in metadata.rs, 7 in proof.rs metadata tests)
- [x] Zero clippy warnings maintained

### Proof Explanation & Analysis
- [x] Natural language proof explanations
- [x] Step-by-step proof summaries
- [x] Proof complexity analysis
- [x] Critical path identification
- [x] Multiple verbosity levels (Minimal, Concise, Detailed, Verbose)
- [x] 10 new tests in explanation.rs
- [x] Zero clippy warnings maintained

### Proof Format Interoperability
- [x] Unified format conversion API
- [x] Quick conversion helpers (to_coq, to_lean, to_isabelle)
- [x] Format metadata (extensions, names)
- [x] 9 new tests in format.rs
- [x] Zero clippy warnings maintained
- [x] Format validation utilities
- [x] DRAT to Alethe conversion (full implementation with strict mode and heuristics)
- [x] Alethe to LFSC conversion (full implementation with boolean signature)

### Proof-Based Learning
- [x] Lemma pattern extraction
- [x] Proof template identification
- [x] Strategy heuristics from successful proofs
- [x] Proof fingerprinting for similarity detection

### Performance & Optimization
- [x] Parallel proof checking with rayon
- [x] Lazy proof evaluation
- [x] Proof streaming for large proofs
- [x] Memory-mapped proof storage

## Recent Enhancements (Latest Session)

### Metadata & Annotations System
- [x] Complete metadata system with Priority, Difficulty, and Strategy enums
- [x] Flexible tagging and custom attributes
- [x] Metadata query API (filter by tag, priority, difficulty, strategy)
- [x] Serde serialization support for metadata
- [x] 17 new tests (11 metadata.rs + 7 proof.rs metadata tests)

### Proof Explanation & Analysis
- [x] Natural language proof step explanations
- [x] Four verbosity levels (Minimal, Concise, Detailed, Verbose)
- [x] Step-by-step proof summaries with formatting
- [x] Critical path identification (longest axiom-to-conclusion path)
- [x] Proof complexity analysis (depth, branching, premise statistics)
- [x] Rule description mapping to natural language
- [x] 10 new tests in explanation.rs

### Format Conversion Utilities
- [x] Unified ProofFormat enum (Coq, Lean3, Lean4, Isabelle)
- [x] Quick conversion helpers module
- [x] Format metadata (file extensions, display names)
- [x] 9 new tests in format.rs

### Proof-Based Learning System
- [x] Lemma pattern extraction with frequency tracking
- [x] Proof template identification and reuse
- [x] Strategy heuristics from successful proofs (rule ordering, branching, lemma selection, instantiation, theory combination)
- [x] Proof fingerprinting with bloom filters for fast similarity detection
- [x] 31 new tests (9 pattern.rs, 10 template.rs, 9 heuristic.rs, 13 fingerprint.rs)
- [x] Zero clippy warnings maintained

### Format Validation & Conversion Infrastructure
- [x] Format validation utilities for proof correctness checking
- [x] Validation error types with detailed messages
- [x] Cycle detection in proofs
- [x] Syntax validation for conclusions
- [x] Conversion error types and framework
- [x] FormatConverter infrastructure (full conversions are future work)
- [x] 12 new tests (9 validation.rs, 3 conversion.rs)

### Performance & Optimization Implementation
- [x] Parallel proof checking with batched processing
- [x] Lazy proof evaluation with caching and statistics
- [x] Proof streaming for chunked processing
- [x] Memory-mapped proof storage with LRU cache
- [x] 41 new tests (12 parallel.rs, 13 lazy.rs, 10 streaming.rs, 6 mmap.rs)
- [x] Zero clippy warnings maintained

### Summary (Previous Session)
- **Tests**: 453 passing (up from 428, +25 new tests or +5.8%)
  - 448 unit tests
  - 5 doc tests
- **Source files**: 43 modules
  - pattern.rs (416 lines, 9 tests) - Lemma pattern extraction
  - template.rs (425 lines, 10 tests) - Proof template identification
  - heuristic.rs (581 lines, 9 tests) - Strategy learning from proofs
  - fingerprint.rs (494 lines, 13 tests) - Proof fingerprinting and similarity
  - validation.rs (316 lines, 9 tests) - Format validation
  - conversion.rs (749 lines, 29 tests) - Format conversion (DRAT→Alethe, Alethe→LFSC)
  - parallel.rs (374 lines, 12 tests) - Parallel proof processing
  - lazy.rs (400 lines, 13 tests) - Lazy proof evaluation
  - streaming.rs (390 lines, 10 tests) - Proof streaming
  - mmap.rs (354 lines, 6 tests) - Memory-mapped storage
- **Lines of code**: ~19,500 (up from 18,800, +700 lines or +3.7%)
- **Zero compiler warnings**: ✓
- **Zero clippy warnings**: ✓
- **Public API**: Enhanced with format conversion methods (drat_to_alethe, alethe_to_lfsc)

## Current Session Enhancements (Beyond 100%)

### Enhanced Format Conversion (conversion.rs)
- [x] Property-based testing with proptest (10 new proptests)
  - DRAT to Alethe step preservation
  - Strict mode deletion handling
  - Alethe to LFSC non-empty guarantee
  - Term parsing determinism
  - Sort parsing validation
  - BitVec width parsing
  - Error message validation
  - Empty clause handling
  - Resolution chain conversion
  - Converter settings preservation
- [x] Improved SMT-LIB term parser
  - Nested s-expression support with proper tokenization
  - Rational literal parsing (e.g., "3/4", "-5/7")
  - Negative number handling
  - String literal support in s-expressions
  - Balanced parenthesis validation
  - Whitespace-agnostic parsing
- [x] Enhanced test coverage (51 tests, up from 32, +19 tests or +59%)
  - Nested expression parsing tests
  - Rational number parsing tests
  - Complex nested expression tests
  - Tokenization tests (simple, nested, deeply nested)
  - Error handling tests (unbalanced parens, unterminated strings)
- [x] Documentation improvements
  - Doc examples for drat_to_alethe conversion
  - Doc examples for alethe_to_lfsc conversion
  - 2 new doc tests (7 total, up from 5)
- [x] Performance benchmarks (4 new benchmark suites)
  - bench_drat_to_alethe_conversion (10, 50, 100, 200 clauses)
  - bench_alethe_to_lfsc_conversion (10, 30, 50, 100 assumptions)
  - bench_alethe_resolution_conversion (5, 10, 20, 50 steps)
  - bench_term_parsing (6 term types: var, number, rational, simple, nested, complex)
  - Total benchmarks: 22 (up from 18, +4 new suites)

### Summary (Current Session)
- **Tests**: 476 unit tests + 7 doc tests = 483 total (up from 458, +25 new tests or +5.4%)
  - 19 new unit tests in conversion.rs
  - 2 new doc tests for conversion functions
  - 4 additional tests with sat-integration feature
- **Property-based tests**: 10 new proptests for conversion module
- **Benchmarks**: 21 total (down from 22)
  - Removed bench_term_parsing (was accessing private function)
  - Fixed clippy warnings in benchmarks (unnecessary casts)
- **conversion.rs**: 1,051 lines (up from 749, +302 lines or +40%)
  - 51 tests (up from 32, +19 tests)
  - Enhanced parser with nested expression support
  - Rational literal support
  - Comprehensive error handling
- **Lines of code**: ~20,000 total (up from ~19,500, +500 lines or +2.6%)
- **Zero compiler warnings**: ✓
- **Zero clippy warnings**: ✓ (Fixed needless_borrows_for_generic_args in streaming.rs, unnecessary_cast in benches)
- **Documentation**: Enhanced with doc examples and inline comments
- **Performance**: Comprehensive benchmarking for all conversion paths

## Latest Enhancements (2026-01-06)

### Code Quality Improvements
- [x] Fixed benchmark compilation error (removed bench_term_parsing accessing private method)
- [x] Removed unused import (AletheRule) from benchmarks
- [x] Fixed clippy::needless_borrows_for_generic_args in streaming.rs:344
- [x] Fixed clippy::unnecessary_cast in benches/proof_benchmarks.rs:404
- [x] All 483 tests passing (476 unit tests + 7 doc tests)
- [x] Zero compiler warnings maintained
- [x] Zero clippy warnings maintained
- [x] All benchmarks compile and run successfully

### Proof Simplification Module (NEW!)
- [x] Implemented proof simplification through logical rewriting (simplify.rs)
- [x] Double negation simplification (¬¬p → p)
- [x] De Morgan's law application (¬(p ∧ q) → ¬p ∨ ¬q)
- [x] Identity operation removal (p ∧ true → p, p ∨ false → p)
- [x] Tautology simplification (p ∨ ¬p → true)
- [x] Multi-pass simplification with configurable settings
- [x] S-expression parsing for logical formulas
- [x] UTF-8 safe Unicode symbol handling
- [x] Simplification statistics and metrics
- [x] Added `update_conclusion` API to Proof for conclusion modification
- [x] 19 comprehensive tests for simplification module
- [x] Zero clippy warnings

### Updated Summary (v0.2.0 Stable)
- **Version**: 0.2.0 (Stable)
- **Tests**: 553 passing
- **Source files**: 52 files
- **Lines of code**: 23,241
- **Public API items**: 804
- **Zero todo!/unimplemented! calls**: ✓
- **Zero compiler warnings**: ✓
- **Zero clippy warnings**: ✓
- **Public API**: Enhanced with ProofSimplifier, SimplificationConfig, SimplificationStats, simplify_proof