Expand description
§OxiLean Parser — Surface Syntax to AST
This crate converts concrete OxiLean source syntax (text) into abstract syntax trees (ASTs) that the elaborator can process into kernel-checkable terms.
§Quick Start
§Parsing an Expression
use oxilean_parse::{Lexer, Parser, SurfaceExpr};
let source = "fun (x : Nat) => x + 1";
let lexer = Lexer::new(source);
let tokens = lexer.tokenize()?;
let mut parser = Parser::new(tokens);
let expr = parser.parse_expr()?;§Parsing a Module
use oxilean_parse::Module;
let source = "def double (n : Nat) : Nat := n + n";
let module = Module::parse_source(source)?;§Architecture Overview
The parser is a three-stage pipeline:
Source Text (.oxilean file)
│
▼
┌──────────────────────┐
│ Lexer │ → Tokenizes text
│ (lexer.rs) │ → Handles UTF-8, comments, strings
└──────────────────────┘
│
▼
Token Stream
│
▼
┌──────────────────────┐
│ Parser │ → Builds AST from tokens
│ (parser_impl.rs) │ → Handles precedence, associativity
│ + Helpers │ → Pattern, macro, tactic parsers
└──────────────────────┘
│
▼
Abstract Syntax Tree (AST)
│
└─→ SurfaceExpr, SurfaceDecl, Module, etc.
└─→ Diagnostic information (errors, warnings)
└─→ Source mapping (for IDE integration)§Key Concepts & Terminology
§Tokens
Basic lexical elements:
- Identifiers:
x,Nat,add_comm, etc. - Keywords:
fun,def,theorem,inductive, etc. - Operators:
+,-,:,=>,|, etc. - Literals: Numbers, strings, characters
- Delimiters:
(,),[,],{,}
§Surface Syntax (SurfaceExpr)
Represents OxiLean code before elaboration:
- Applications:
f x y(function calls) - Lambda:
fun x => body - Pi types:
(x : A) -> B - Matches:
match x with | nil => ... | cons h t => ... - Tactics:
by (tac1; tac2) - Attributes:
@[simp] def foo := ...
§AST vs Kernel Expr
- AST (this crate): Surface syntax with implicit info
- Contains
?(implicit args),_(placeholders) - No type annotations required
- Represents user-written code
- Contains
- Kernel Expr (oxilean-kernel): Type-checked terms
- All types explicit
- All implicit args resolved
- Fully elaborated
§Module Organization
§Core Parsing Modules
| Module | Purpose |
|---|---|
lexer | Tokenization: text → tokens |
tokens | Token and TokenKind definitions |
parser_impl | Main parser: tokens → AST |
command | Command parsing (def, theorem, etc.) |
§AST Definition
| Module | Purpose |
|---|---|
ast_impl | Core AST types: SurfaceExpr, SurfaceDecl, etc. |
pattern | Pattern matching syntax |
literal | Number and string literals |
§Specialized Parsers
| Module | Purpose |
|---|---|
tactic_parser | Tactic syntax: intro, apply, rw, etc. |
macro_parser | Macro definition and expansion |
notation_system | Operator precedence and associativity |
§Diagnostics & Source Mapping
| Module | Purpose |
|---|---|
diagnostic | Error and warning collection |
error_impl | Parse error types and messages |
sourcemap | Source position tracking for IDE |
span_util | Source span utilities |
§Utilities
| Module | Purpose |
|---|---|
prettyprint | AST pretty-printing |
module | Module system and imports |
repl_parser | REPL command parsing |
§Parsing Pipeline Details
§Phase 1: Lexical Analysis (Lexer)
Transforms character stream into token stream:
- Handles Unicode identifiers (UTF-8)
- Recognizes keywords vs identifiers
- Tracks line/column positions (for error reporting)
- Supports:
- Single-line comments:
-- comment - Multi-line comments:
/- -/ - String literals:
"hello" - Number literals:
42,0xFF,3.14
- Single-line comments:
§Phase 2: Syntactic Analysis (Parser)
Transforms token stream into AST:
- Recursive descent: For statements and declarations
- Pratt parsing: For expressions (handles precedence)
- Lookahead(1): LL(1) grammar for predictive parsing
- Error recovery: Continues parsing after errors
§Phase 3: Post-Processing
- Notation expansion: Apply infix/prefix operators
- Macro expansion: Expand syntax sugar
- Span assignment: Map AST nodes to source positions
§Usage Examples
§Example 1: Parse and Pretty-Print
use oxilean_parse::{Lexer, Parser, print_expr};
let source = "(x : Nat) -> Nat";
let mut parser = Parser::from_source(source)?;
let expr = parser.parse_expr()?;
println!("{}", print_expr(&expr));§Example 2: Parse a Definition
use oxilean_parse::{Lexer, Parser, Decl};
let source = "def double (n : Nat) : Nat := n + n";
let mut parser = Parser::from_source(source)?;
let decl = parser.parse_decl()?;
assert!(matches!(decl, Decl::Def { .. }));§Example 3: Collect Diagnostics
use oxilean_parse::DiagnosticCollector;
let mut collector = DiagnosticCollector::new();
// ... parse code ...
for diag in collector.diagnostics() {
println!("{:?}", diag);
}§Operator Precedence
Operators are organized by precedence levels (0-100):
- Level 100 (highest): Projections, applications
- Level 90: Power/exponentiation
- Level 70: Multiplication, division
- Level 65: Addition, subtraction
- Level 50: Comparison (
<,>,=, etc.) - Level 40: Conjunction (
and) - Level 35: Disjunction (
or) - Level 25: Implication (
->) - Level 0 (lowest): Binders (
fun,:, etc.)
Associativity (left/right/non-associative) is per-operator.
§Error Handling
Parser errors include:
- Unexpected token: Parser expected a different token
- Expected
typetoken: Specific token was expected but not found - Unclosed delimiter: Missing closing bracket/paren
- Undeclared operator: Unknown infix operator
- Invalid pattern: Malformed pattern in match/fun
All errors carry:
- Source location (span): File, line, column
- Error message: Human-readable description
- Context: Surrounding code snippet (for IDE tooltips)
§Source Mapping & IDE Integration
The parser builds a source map tracking:
- AST node → source location
- Hover information (for IDE hover tooltips)
- Semantic tokens (for syntax highlighting)
- Reference locations (for “go to definition”)
This enables:
- Accurate error reporting
- IDE language server protocol (LSP) support
- Refactoring tools
§Extensibility
§Adding New Operators
Operators are registered in notation_system:
let notation = Notation {
name: "my_op",
kind: NotationKind::Infix,
level: 60,
associativity: Associativity::Left,
};
notation_table.insert(notation);§Adding New Keywords
Keywords are hardcoded in lexer::keyword_of_string().
Add new keyword, then handle in parser.
§Custom Macros
Macros are parsed by macro_parser and expanded during parsing:
syntax "list" ["[", expr, (",", expr)*, "]"] => ...
macro list_to_cons : list => (...)§Integration with Other Crates
§With oxilean-elab
The elaborator consumes this crate’s AST:
Parser: Source → SurfaceExpr
Elaborator: SurfaceExpr → Kernel Expr (with type checking)§With oxilean-kernel
Kernel types (Name, Level, Literal) are re-exported by parser for convenience.
§Performance Considerations
- Linear parsing: O(n) where n = source length
- Minimal allocations: AST nodes are typically smaller than source
- Single pass: No tokenization+parsing phase, done in parallel
§Further Reading
- ARCHITECTURE.md — System architecture
- BLUEPRINT.md — Formal syntax specification
- Module documentation for specific subcomponents
Re-exports§
pub use ast::SurfaceExpr as OldSurfaceExpr;pub use ast_impl::AstNotationKind;pub use ast_impl::AttributeKind;pub use ast_impl::Binder;pub use ast_impl::BinderKind;pub use ast_impl::CalcStep as AstCalcStep;pub use ast_impl::Constructor;pub use ast_impl::Decl;pub use ast_impl::DoAction;pub use ast_impl::FieldDecl;pub use ast_impl::Literal;pub use ast_impl::Located;pub use ast_impl::MatchArm;pub use ast_impl::Pattern;pub use ast_impl::SortKind;pub use ast_impl::SurfaceExpr;pub use ast_impl::TacticRef;pub use ast_impl::WhereClause;pub use command::Command;pub use command::CommandParser;pub use command::NotationKind;pub use command::OpenItem;pub use diagnostic::Diagnostic;pub use diagnostic::DiagnosticCollector;pub use diagnostic::DiagnosticLabel;pub use diagnostic::Severity;pub use error::ParseError as OldParseError;pub use error_impl::ParseError;pub use error_impl::ParseErrorKind;pub use lexer::Lexer;pub use macro_parser::HygieneInfo;pub use macro_parser::MacroDef;pub use macro_parser::MacroError;pub use macro_parser::MacroErrorKind;pub use macro_parser::MacroExpander;pub use macro_parser::MacroParser;pub use macro_parser::MacroRule;pub use macro_parser::MacroToken;pub use macro_parser::SyntaxDef;pub use macro_parser::SyntaxItem;pub use macro_parser::SyntaxKind;pub use module::Module;pub use module::ModuleRegistry;pub use notation_system::Fixity;pub use notation_system::NotationEntry;pub use notation_system::NotationKind as NotationSystemKind;pub use notation_system::NotationPart;pub use notation_system::NotationTable;pub use notation_system::OperatorEntry;pub use parser_impl::Parser;pub use pattern::MatchClause;pub use pattern::PatternCompiler;pub use prettyprint::print_decl;pub use prettyprint::print_expr;pub use prettyprint::ParensMode;pub use prettyprint::PrettyConfig;pub use prettyprint::PrettyPrinter;pub use repl_parser::is_complete;pub use repl_parser::ReplCommand;pub use repl_parser::ReplParser;pub use sourcemap::EntryKind;pub use sourcemap::HoverInfo;pub use sourcemap::SemanticToken;pub use sourcemap::SourceEntry;pub use sourcemap::SourceMap;pub use sourcemap::SourceMapBuilder;pub use span_util::dummy_span;pub use span_util::merge_spans;pub use span_util::span_contains;pub use span_util::span_len;pub use tactic_parser::CalcStep;pub use tactic_parser::CaseArm;pub use tactic_parser::ConvSide;pub use tactic_parser::RewriteRule;pub use tactic_parser::SimpArgs;pub use tactic_parser::TacticExpr;pub use tactic_parser::TacticParser;pub use tokens::Span;pub use tokens::StringPart;pub use tokens::Token;pub use tokens::TokenKind;
Modules§
- ast
- Auto-generated module structure
- ast_
impl - Auto-generated module structure
- command
- Auto-generated module structure
- diagnostic
- Auto-generated module structure
- error
- Auto-generated module structure
- error_
impl - Auto-generated module structure
- expr_
cache - Auto-generated module structure
- formatter_
adv - Advanced formatter with Wadler-Lindig optimal layout. Auto-generated module structure
- incremental
- Auto-generated module structure
- indent_
tracker - Auto-generated module structure
- keywords
- Reserved keywords and special identifiers in OxiLean.
- lexer
- Auto-generated module structure
- macro_
parser - Auto-generated module structure
- module
- Auto-generated module structure
- notation_
system - Auto-generated module structure
- parser
- Auto-generated module structure
- parser_
impl - Auto-generated module structure
- pattern
- Auto-generated module structure
- prec
- Operator precedence constants for the OxiLean grammar.
- prettyprint
- Auto-generated module structure
- repl_
parser - Auto-generated module structure
- roundtrip
- Auto-generated module structure
- sourcemap
- Auto-generated module structure
- span_
util - Auto-generated module structure
- tactic_
parser - Auto-generated module structure
- token
- Auto-generated module structure
- token_
utils - Utility functions for working with tokens.
- tokens
- Auto-generated module structure
- wasm_
source_ map - Auto-generated module structure
Structs§
- Annotated
- A span annotation: associates a value with its source location.
- Compilation
Unit - A compilation unit consisting of source file, AST, and errors.
- Decl
Table - A declaration table mapping names to their source locations.
- Decl
Table Ext2 - A declaration table mapping names to types.
- Import
Graph - A simple import graph.
- Name
Resolution Table Ext - A simple name resolution table.
- Namespace
Resolver - A namespace resolver.
- Parse
Context - A parse context: wraps a source file plus configuration.
- Parse
Context Ext - A parse context carrying source and flags.
- Parse
Diagnostics - A collection of parse diagnostics (errors and warnings).
- Parse
Flags Ext - A parse flag set for controlling parser behaviour.
- Parse
Stats Ext - A simple parse statistics record.
- Parse
Warning - A parse warning — something suspicious that isn’t a hard error.
- Parser
Config - Configuration options for the parser.
- Phase
Timer Ext - A timing record for a pipeline phase.
- Pipeline
Timings Ext - A collection of phase timers.
- Simple
String Pool Ext - A simple string pool that interns unique strings.
- Source
File - A parsed source file with its path and content.
- Source
File Registry - A source file registry that tracks all loaded files.
- Source
Summary - A source file summary.
- Source
Summary Ext2 - A summary of a source file.
- Syntactic
Hole - Represents a syntactic “hole” in the AST: a position where a term is missing.
- Token
Buffer - A simple token buffer for look-ahead parsing.
- Token
Frequency MapExt - A simple token frequency map.
Enums§
- Compile
Phase Ext - A named phase in the compilation pipeline.
Constants§
- PARSER_
VERSION - The current version of the OxiLean parser.
Functions§
- count_
keyword_ occurrences - Count occurrences of a token kind name in a source string (heuristic, not lexer-based).
- is_
ident_ char - Checks whether a string is a valid identifier character.
- is_
qualified_ name - Check whether a string looks like a qualified name.
- join_
qualified_ name - Join namespace parts and a basename into a qualified name.
- lines_
containing - Returns a list of lines in a source string that contain a given keyword.
- most_
common_ word - Returns the most common word in a source string.
- normalize_
whitespace - Normalize whitespace in a source string: collapse runs of whitespace to single spaces.
- parse_
decl_ str - Parse an OxiLean declaration from a source string.
- parse_
expr_ str - Parse an OxiLean expression from a source string.
- parse_
file - Parse all top-level declarations from an OxiLean source string.
- parse_
source_ file - Parse all top-level declarations from an OxiLean source file.
- rough_
ident_ set - Returns the set of unique identifiers in a source string (heuristic).
- rough_
token_ count - Returns the number of tokens in a source string (whitespace-split).
- source_
preview - Returns a preview of the source (first N chars).
- split_
qualified_ name - Split a qualified name like
Foo.Bar.bazinto namespace + basename. - strip_
namespace_ prefix - Strip a namespace prefix from a qualified name.
- word_
frequency - A simple word frequency counter for source analysis.
- word_
frequency_ ext2 - Returns a word frequency map for the given source string.
Type Aliases§
- Parse
Result - The result of a parse operation: either a value or a parse error.