Skip to main content

Crate oxilean_parse

Crate oxilean_parse 

Source
Expand description

§OxiLean Parser — Surface Syntax to AST

This crate converts concrete OxiLean source syntax (text) into abstract syntax trees (ASTs) that the elaborator can process into kernel-checkable terms.

§Quick Start

§Parsing an Expression

use oxilean_parse::{Lexer, Parser, SurfaceExpr};

let source = "fun (x : Nat) => x + 1";
let lexer = Lexer::new(source);
let tokens = lexer.tokenize()?;
let mut parser = Parser::new(tokens);
let expr = parser.parse_expr()?;

§Parsing a Module

use oxilean_parse::Module;

let source = "def double (n : Nat) : Nat := n + n";
let module = Module::parse_source(source)?;

§Architecture Overview

The parser is a three-stage pipeline:

Source Text (.oxilean file)
    │
    ▼
┌──────────────────────┐
│  Lexer               │  → Tokenizes text
│  (lexer.rs)          │  → Handles UTF-8, comments, strings
└──────────────────────┘
    │
    ▼
Token Stream
    │
    ▼
┌──────────────────────┐
│  Parser              │  → Builds AST from tokens
│  (parser_impl.rs)    │  → Handles precedence, associativity
│  + Helpers           │  → Pattern, macro, tactic parsers
└──────────────────────┘
    │
    ▼
Abstract Syntax Tree (AST)
    │
    └─→ SurfaceExpr, SurfaceDecl, Module, etc.
    └─→ Diagnostic information (errors, warnings)
    └─→ Source mapping (for IDE integration)

§Key Concepts & Terminology

§Tokens

Basic lexical elements:

  • Identifiers: x, Nat, add_comm, etc.
  • Keywords: fun, def, theorem, inductive, etc.
  • Operators: +, -, :, =>, |, etc.
  • Literals: Numbers, strings, characters
  • Delimiters: (, ), [, ], {, }

§Surface Syntax (SurfaceExpr)

Represents OxiLean code before elaboration:

  • Applications: f x y (function calls)
  • Lambda: fun x => body
  • Pi types: (x : A) -> B
  • Matches: match x with | nil => ... | cons h t => ...
  • Tactics: by (tac1; tac2)
  • Attributes: @[simp] def foo := ...

§AST vs Kernel Expr

  • AST (this crate): Surface syntax with implicit info
    • Contains ? (implicit args), _ (placeholders)
    • No type annotations required
    • Represents user-written code
  • Kernel Expr (oxilean-kernel): Type-checked terms
    • All types explicit
    • All implicit args resolved
    • Fully elaborated

§Module Organization

§Core Parsing Modules

ModulePurpose
lexerTokenization: text → tokens
tokensToken and TokenKind definitions
parser_implMain parser: tokens → AST
commandCommand parsing (def, theorem, etc.)

§AST Definition

ModulePurpose
ast_implCore AST types: SurfaceExpr, SurfaceDecl, etc.
patternPattern matching syntax
literalNumber and string literals

§Specialized Parsers

ModulePurpose
tactic_parserTactic syntax: intro, apply, rw, etc.
macro_parserMacro definition and expansion
notation_systemOperator precedence and associativity

§Diagnostics & Source Mapping

ModulePurpose
diagnosticError and warning collection
error_implParse error types and messages
sourcemapSource position tracking for IDE
span_utilSource span utilities

§Utilities

ModulePurpose
prettyprintAST pretty-printing
moduleModule system and imports
repl_parserREPL command parsing

§Parsing Pipeline Details

§Phase 1: Lexical Analysis (Lexer)

Transforms character stream into token stream:

  • Handles Unicode identifiers (UTF-8)
  • Recognizes keywords vs identifiers
  • Tracks line/column positions (for error reporting)
  • Supports:
    • Single-line comments: -- comment
    • Multi-line comments: /- -/
    • String literals: "hello"
    • Number literals: 42, 0xFF, 3.14

§Phase 2: Syntactic Analysis (Parser)

Transforms token stream into AST:

  • Recursive descent: For statements and declarations
  • Pratt parsing: For expressions (handles precedence)
  • Lookahead(1): LL(1) grammar for predictive parsing
  • Error recovery: Continues parsing after errors

§Phase 3: Post-Processing

  • Notation expansion: Apply infix/prefix operators
  • Macro expansion: Expand syntax sugar
  • Span assignment: Map AST nodes to source positions

§Usage Examples

§Example 1: Parse and Pretty-Print

use oxilean_parse::{Lexer, Parser, print_expr};

let source = "(x : Nat) -> Nat";
let mut parser = Parser::from_source(source)?;
let expr = parser.parse_expr()?;
println!("{}", print_expr(&expr));

§Example 2: Parse a Definition

use oxilean_parse::{Lexer, Parser, Decl};

let source = "def double (n : Nat) : Nat := n + n";
let mut parser = Parser::from_source(source)?;
let decl = parser.parse_decl()?;
assert!(matches!(decl, Decl::Def { .. }));

§Example 3: Collect Diagnostics

use oxilean_parse::DiagnosticCollector;

let mut collector = DiagnosticCollector::new();
// ... parse code ...
for diag in collector.diagnostics() {
    println!("{:?}", diag);
}

§Operator Precedence

Operators are organized by precedence levels (0-100):

  • Level 100 (highest): Projections, applications
  • Level 90: Power/exponentiation
  • Level 70: Multiplication, division
  • Level 65: Addition, subtraction
  • Level 50: Comparison (<, >, =, etc.)
  • Level 40: Conjunction (and)
  • Level 35: Disjunction (or)
  • Level 25: Implication (->)
  • Level 0 (lowest): Binders (fun, :, etc.)

Associativity (left/right/non-associative) is per-operator.

§Error Handling

Parser errors include:

  • Unexpected token: Parser expected a different token
  • Expected type token: Specific token was expected but not found
  • Unclosed delimiter: Missing closing bracket/paren
  • Undeclared operator: Unknown infix operator
  • Invalid pattern: Malformed pattern in match/fun

All errors carry:

  • Source location (span): File, line, column
  • Error message: Human-readable description
  • Context: Surrounding code snippet (for IDE tooltips)

§Source Mapping & IDE Integration

The parser builds a source map tracking:

  • AST node → source location
  • Hover information (for IDE hover tooltips)
  • Semantic tokens (for syntax highlighting)
  • Reference locations (for “go to definition”)

This enables:

  • Accurate error reporting
  • IDE language server protocol (LSP) support
  • Refactoring tools

§Extensibility

§Adding New Operators

Operators are registered in notation_system:

let notation = Notation {
    name: "my_op",
    kind: NotationKind::Infix,
    level: 60,
    associativity: Associativity::Left,
};
notation_table.insert(notation);

§Adding New Keywords

Keywords are hardcoded in lexer::keyword_of_string(). Add new keyword, then handle in parser.

§Custom Macros

Macros are parsed by macro_parser and expanded during parsing:

syntax "list" ["[", expr, (",", expr)*, "]"] => ...
macro list_to_cons : list => (...)

§Integration with Other Crates

§With oxilean-elab

The elaborator consumes this crate’s AST:

Parser: Source → SurfaceExpr
Elaborator: SurfaceExpr → Kernel Expr (with type checking)

§With oxilean-kernel

Kernel types (Name, Level, Literal) are re-exported by parser for convenience.

§Performance Considerations

  • Linear parsing: O(n) where n = source length
  • Minimal allocations: AST nodes are typically smaller than source
  • Single pass: No tokenization+parsing phase, done in parallel

§Further Reading

Re-exports§

pub use ast::SurfaceExpr as OldSurfaceExpr;
pub use ast_impl::AstNotationKind;
pub use ast_impl::AttributeKind;
pub use ast_impl::Binder;
pub use ast_impl::BinderKind;
pub use ast_impl::CalcStep as AstCalcStep;
pub use ast_impl::Constructor;
pub use ast_impl::Decl;
pub use ast_impl::DoAction;
pub use ast_impl::FieldDecl;
pub use ast_impl::Literal;
pub use ast_impl::Located;
pub use ast_impl::MatchArm;
pub use ast_impl::Pattern;
pub use ast_impl::SortKind;
pub use ast_impl::SurfaceExpr;
pub use ast_impl::TacticRef;
pub use ast_impl::WhereClause;
pub use command::Command;
pub use command::CommandParser;
pub use command::NotationKind;
pub use command::OpenItem;
pub use diagnostic::Diagnostic;
pub use diagnostic::DiagnosticCollector;
pub use diagnostic::DiagnosticLabel;
pub use diagnostic::Severity;
pub use error::ParseError as OldParseError;
pub use error_impl::ParseError;
pub use error_impl::ParseErrorKind;
pub use lexer::Lexer;
pub use macro_parser::HygieneInfo;
pub use macro_parser::MacroDef;
pub use macro_parser::MacroError;
pub use macro_parser::MacroErrorKind;
pub use macro_parser::MacroExpander;
pub use macro_parser::MacroParser;
pub use macro_parser::MacroRule;
pub use macro_parser::MacroToken;
pub use macro_parser::SyntaxDef;
pub use macro_parser::SyntaxItem;
pub use macro_parser::SyntaxKind;
pub use module::Module;
pub use module::ModuleRegistry;
pub use notation_system::Fixity;
pub use notation_system::NotationEntry;
pub use notation_system::NotationKind as NotationSystemKind;
pub use notation_system::NotationPart;
pub use notation_system::NotationTable;
pub use notation_system::OperatorEntry;
pub use parser_impl::Parser;
pub use pattern::MatchClause;
pub use pattern::PatternCompiler;
pub use prettyprint::print_decl;
pub use prettyprint::print_expr;
pub use prettyprint::ParensMode;
pub use prettyprint::PrettyConfig;
pub use prettyprint::PrettyPrinter;
pub use repl_parser::is_complete;
pub use repl_parser::ReplCommand;
pub use repl_parser::ReplParser;
pub use sourcemap::EntryKind;
pub use sourcemap::HoverInfo;
pub use sourcemap::SemanticToken;
pub use sourcemap::SourceEntry;
pub use sourcemap::SourceMap;
pub use sourcemap::SourceMapBuilder;
pub use span_util::dummy_span;
pub use span_util::merge_spans;
pub use span_util::span_contains;
pub use span_util::span_len;
pub use tactic_parser::CalcStep;
pub use tactic_parser::CaseArm;
pub use tactic_parser::ConvSide;
pub use tactic_parser::RewriteRule;
pub use tactic_parser::SimpArgs;
pub use tactic_parser::TacticExpr;
pub use tactic_parser::TacticParser;
pub use tokens::Span;
pub use tokens::StringPart;
pub use tokens::Token;
pub use tokens::TokenKind;

Modules§

ast
Auto-generated module structure
ast_impl
Auto-generated module structure
command
Auto-generated module structure
diagnostic
Auto-generated module structure
error
Auto-generated module structure
error_impl
Auto-generated module structure
expr_cache
Auto-generated module structure
formatter_adv
Advanced formatter with Wadler-Lindig optimal layout. Auto-generated module structure
incremental
Auto-generated module structure
indent_tracker
Auto-generated module structure
keywords
Reserved keywords and special identifiers in OxiLean.
lexer
Auto-generated module structure
macro_parser
Auto-generated module structure
module
Auto-generated module structure
notation_system
Auto-generated module structure
parser
Auto-generated module structure
parser_impl
Auto-generated module structure
pattern
Auto-generated module structure
prec
Operator precedence constants for the OxiLean grammar.
prettyprint
Auto-generated module structure
repl_parser
Auto-generated module structure
roundtrip
Auto-generated module structure
sourcemap
Auto-generated module structure
span_util
Auto-generated module structure
tactic_parser
Auto-generated module structure
token
Auto-generated module structure
token_utils
Utility functions for working with tokens.
tokens
Auto-generated module structure
wasm_source_map
Auto-generated module structure

Structs§

Annotated
A span annotation: associates a value with its source location.
CompilationUnit
A compilation unit consisting of source file, AST, and errors.
DeclTable
A declaration table mapping names to their source locations.
DeclTableExt2
A declaration table mapping names to types.
ImportGraph
A simple import graph.
NameResolutionTableExt
A simple name resolution table.
NamespaceResolver
A namespace resolver.
ParseContext
A parse context: wraps a source file plus configuration.
ParseContextExt
A parse context carrying source and flags.
ParseDiagnostics
A collection of parse diagnostics (errors and warnings).
ParseFlagsExt
A parse flag set for controlling parser behaviour.
ParseStatsExt
A simple parse statistics record.
ParseWarning
A parse warning — something suspicious that isn’t a hard error.
ParserConfig
Configuration options for the parser.
PhaseTimerExt
A timing record for a pipeline phase.
PipelineTimingsExt
A collection of phase timers.
SimpleStringPoolExt
A simple string pool that interns unique strings.
SourceFile
A parsed source file with its path and content.
SourceFileRegistry
A source file registry that tracks all loaded files.
SourceSummary
A source file summary.
SourceSummaryExt2
A summary of a source file.
SyntacticHole
Represents a syntactic “hole” in the AST: a position where a term is missing.
TokenBuffer
A simple token buffer for look-ahead parsing.
TokenFrequencyMapExt
A simple token frequency map.

Enums§

CompilePhaseExt
A named phase in the compilation pipeline.

Constants§

PARSER_VERSION
The current version of the OxiLean parser.

Functions§

count_keyword_occurrences
Count occurrences of a token kind name in a source string (heuristic, not lexer-based).
is_ident_char
Checks whether a string is a valid identifier character.
is_qualified_name
Check whether a string looks like a qualified name.
join_qualified_name
Join namespace parts and a basename into a qualified name.
lines_containing
Returns a list of lines in a source string that contain a given keyword.
most_common_word
Returns the most common word in a source string.
normalize_whitespace
Normalize whitespace in a source string: collapse runs of whitespace to single spaces.
parse_decl_str
Parse an OxiLean declaration from a source string.
parse_expr_str
Parse an OxiLean expression from a source string.
parse_file
Parse all top-level declarations from an OxiLean source string.
parse_source_file
Parse all top-level declarations from an OxiLean source file.
rough_ident_set
Returns the set of unique identifiers in a source string (heuristic).
rough_token_count
Returns the number of tokens in a source string (whitespace-split).
source_preview
Returns a preview of the source (first N chars).
split_qualified_name
Split a qualified name like Foo.Bar.baz into namespace + basename.
strip_namespace_prefix
Strip a namespace prefix from a qualified name.
word_frequency
A simple word frequency counter for source analysis.
word_frequency_ext2
Returns a word frequency map for the given source string.

Type Aliases§

ParseResult
The result of a parse operation: either a value or a parse error.