Santiago is a lexing and parsing toolkit for Rust. It provides:
- A library for defining any context-free grammar,
- A lexical analysis module,
- Facilities for building interpreters or compilers of the language.
With Santiago you have everything that is needed to build your own programming language!
We are the Rust alternative to GNU Bison, Yacc and Flex.
Usage
This crate is on crates.io and can be used by adding santiago to the dependencies in your project’s Cargo.toml:
[dependencies]
santiago = "*"
Examples
Calculator
For this example we are interested in lexing and parsing the addition of integer numbers like:
10 + 20 + 30
And evaluating it to a single value: 60.
In the process we will create an Abstract Syntax Tree like:
BinaryOperation(vec![
BinaryOperation(vec![
Int(10),
OperatorAdd,
Int(20),
]),
OperatorAdd,
Int(30),
])
So let’s start with a lexer to:
- Group the digits into integers called "INT"
- Capture the plus sign (+) and name it "PLUS"
- Ignore all whitespace
In code this would be:
use santiago::lexer::LexerRules;
pub fn lexer_rules() -> LexerRules {
santiago::lexer_rules!(
// One or more sequential digits from 0 to 9 will be mapped to an "INT"
"DEFAULT" | "INT" = pattern r"[0-9]+";
// A literal "+" will be mapped to "PLUS"
"DEFAULT" | "PLUS" = string "+";
// Whitespace " " will be skipped
"DEFAULT" | "WS" = pattern r"\s" => |lexer| lexer.skip();
)
}
Once we have our rules defined, we can start lexing:
let input = "10 + 20 + 30";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
A Lexeme gives us information like:
- Token kind
- Contents
- Position (line and column number)
In this case we have two kinds of tokens, an INT and a PLUS:
INT "10" (1, 1)
PLUS "+" (1, 4)
INT "20" (1, 6)
PLUS "+" (1, 9)
INT "30" (1, 11)
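To make the listing above concrete, here is a minimal hand-written sketch in plain Rust (standard library only) that produces the same kind, contents, and position triples for this input. It is not Santiago's implementation: `Token` and `lex_sum` are hypothetical names used only for illustration, and the sketch assumes single-line ASCII input.

```rust
/// Hypothetical token type for this sketch (not Santiago's `Lexeme`).
#[derive(Debug, PartialEq)]
pub struct Token {
    pub kind: &'static str,
    pub raw: String,
    /// 1-based (line, column) of the first character.
    pub position: (usize, usize),
}

/// Splits single-line ASCII input into INT and PLUS tokens,
/// skipping whitespace and rejecting anything else.
pub fn lex_sum(input: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut index = 0;
    while index < chars.len() {
        let column = index + 1;
        let c = chars[index];
        if c.is_ascii_digit() {
            // Group consecutive digits into a single INT.
            let start = index;
            while index < chars.len() && chars[index].is_ascii_digit() {
                index += 1;
            }
            tokens.push(Token {
                kind: "INT",
                raw: chars[start..index].iter().collect(),
                position: (1, column),
            });
        } else if c == '+' {
            tokens.push(Token {
                kind: "PLUS",
                raw: "+".to_string(),
                position: (1, column),
            });
            index += 1;
        } else if c.is_whitespace() {
            // Whitespace is skipped and produces no token.
            index += 1;
        } else {
            return Err(format!("unexpected character {:?} at column {}", c, column));
        }
    }
    Ok(tokens)
}
```

Calling `lex_sum("10 + 20 + 30")` yields five tokens whose kinds and columns agree with the listing above. Unlike this sketch, a rule-driven lexer lets you add new token kinds without touching the scanning loop.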
At this point all we are missing is creating a parser.
Let’s create a grammar to recognize the addition of integer numbers:
use santiago::grammar::Grammar;
pub fn grammar() -> Grammar<()> {
santiago::grammar!(
"sum" => rules "sum" "plus" "sum";
"sum" => lexemes "INT";
"plus" => lexemes "PLUS";
)
}
Now we can generate a Parse Tree!
let grammar = grammar();
let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
Which looks like:
---
Γ := rules "sum"
sum := rules "sum" "plus" "sum"
sum := rules "sum" "plus" "sum"
sum := lexemes "INT"
INT "10" (1, 1)
plus := lexemes "PLUS"
PLUS "+" (1, 4)
sum := lexemes "INT"
INT "20" (1, 6)
plus := lexemes "PLUS"
PLUS "+" (1, 9)
sum := lexemes "INT"
INT "30" (1, 11)
---
Γ := rules "sum"
sum := rules "sum" "plus" "sum"
sum := lexemes "INT"
INT "10" (1, 1)
plus := lexemes "PLUS"
PLUS "+" (1, 4)
sum := rules "sum" "plus" "sum"
sum := lexemes "INT"
INT "20" (1, 6)
plus := lexemes "PLUS"
PLUS "+" (1, 9)
sum := lexemes "INT"
INT "30" (1, 11)
Notice that we obtained 2 possible parse trees, since the input can be understood as:
(10 + 20) + 30
10 + (20 + 30)
This happens because we created an
ambiguous grammar,
but this is no problem for Santiago!
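As a rough illustration of how quickly this ambiguity grows, the sketch below counts the distinct parse trees that a rule of the shape `sum => sum plus sum` admits for a given number of operands. It is plain Rust and independent of Santiago; `count_parse_trees` is a hypothetical helper, and the recurrence it uses is the standard one for counting binary trees.

```rust
/// Number of distinct parse trees that a rule of the shape
/// `sum => sum plus sum` admits for an input with `operands` integers.
pub fn count_parse_trees(operands: usize) -> u64 {
    if operands == 0 {
        return 0;
    }
    // trees[n] holds the number of parse trees over n operands.
    let mut trees = vec![0u64; operands + 1];
    trees[1] = 1;
    for n in 2..=operands {
        // Pick which `plus` sits at the root: `left` operands fall on its
        // left side and the remaining `n - left` on its right side.
        for left in 1..n {
            trees[n] += trees[left] * trees[n - left];
        }
    }
    trees[operands]
}
```

For three operands the function returns 2, which matches the two trees printed above. The count grows fast with longer inputs, which is one reason to resolve the ambiguity in the grammar rather than pick a tree afterwards.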
We can remove the ambiguities
by adding associativity constraints to the “plus” rule,
in order to select (10 + 20) + 30
as our source of truth.
In code, we only need to add one line at the end of our previous grammar:
use santiago::grammar::Associativity;
use santiago::grammar::Grammar;
pub fn grammar() -> Grammar<()> {
santiago::grammar!(
"sum" => rules "sum" "plus" "sum";
"sum" => lexemes "INT";
"plus" => lexemes "PLUS";
Associativity::Left => rules "plus";
)
}
And parse again!
let grammar = grammar();
let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
This time our grammar is deterministic and we will always have a single unambiguous Parse Tree:
---
Γ := rules "sum"
sum := rules "sum" "plus" "sum"
sum := rules "sum" "plus" "sum"
sum := lexemes "INT"
INT "10" (1, 1)
plus := lexemes "PLUS"
PLUS "+" (1, 4)
sum := lexemes "INT"
INT "20" (1, 6)
plus := lexemes "PLUS"
PLUS "+" (1, 9)
sum := lexemes "INT"
INT "30" (1, 11)
All we are missing now is evaluating the addition. For this, let’s modify the grammar so that each time a rule matches, it produces a value that can be assembled into an Abstract Syntax Tree:
use santiago::grammar::Associativity;
use santiago::grammar::Grammar;
#[derive(Debug, PartialEq)]
pub enum AST {
// A single integer.
Int(isize),
// A binary operation with three arguments: `left`, `op`, and `right`.
BinaryOperation(Vec<AST>),
// The binary operator for addition (a.k.a. `+`).
OperatorAdd,
}
pub fn grammar() -> Grammar<AST> {
santiago::grammar!(
"sum" => rules "sum" "plus" "sum" =>
AST::BinaryOperation;
"sum" => lexemes "INT" => |lexemes| {
// &str to isize conversion
let value = str::parse::<isize>(&lexemes[0].raw).unwrap();
AST::Int(value)
};
"plus" => lexemes "PLUS" =>
|_| AST::OperatorAdd;
Associativity::Left => rules "plus";
)
}
Now, the next time we parse, we can transform our Parse Tree into an Abstract Syntax Tree by calling Santiago’s built-in function as_abstract_syntax_tree().
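use AST::*;
// The grammar is now unambiguous, so we take the first (and only) parse tree.
let parse_tree = &parse_trees[0];
let ast = parse_tree.as_abstract_syntax_tree();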
assert_eq!(
ast,
BinaryOperation(vec![
BinaryOperation(vec![
Int(10),
OperatorAdd,
Int(20),
]),
OperatorAdd,
Int(30),
]),
)
And now we can traverse this data-structure and compute a result:
pub fn eval(value: &AST) -> isize {
match value {
AST::Int(int) => *int,
AST::BinaryOperation(args) => match &args[1] {
AST::OperatorAdd => eval(&args[0]) + eval(&args[2]),
_ => unreachable!(),
},
_ => unreachable!(),
}
}
Like this:
let ast = parse_tree.as_abstract_syntax_tree();
assert_eq!(eval(&ast), 60);
How nice is that?
We just created:
- Our own programming language (a calculator)
- An interpreter for our language!
Technical details
Lexical Analysis
A Lexer splits an input of characters into small groups of characters with related meaning, while discarding irrelevant characters like whitespace.
For example: 1 + 2 is transformed into: [INT, PLUS, INT].
A lexer analyzes its input by looking for strings which match any of its active rules:
- If it finds more than one match, it takes the one matching the most text.
- If it finds two or more matches of the same length, the rule listed first is chosen.
- A rule is considered active if any of its applicable states matches the current state.
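The first two selection rules above can be sketched in a few lines of plain Rust. This is only an illustration under simplifying assumptions: rules match literal strings only, and `Rule` and `pick_rule` are hypothetical names that are not part of Santiago's API.

```rust
/// Hypothetical rule for this sketch: a token name and the literal it matches.
pub struct Rule {
    pub name: &'static str,
    pub literal: &'static str,
}

/// Returns the name and matched length of the winning rule at the start of
/// `input`, or `None` when no rule matches.
pub fn pick_rule(rules: &[Rule], input: &str) -> Option<(&'static str, usize)> {
    let mut best: Option<(&'static str, usize)> = None;
    for rule in rules {
        if !rule.literal.is_empty() && input.starts_with(rule.literal) {
            let len = rule.literal.len();
            // A strictly longer match replaces the current best.
            // On a tie the earlier rule is kept, so the first listed wins.
            if best.map_or(true, |(_, best_len)| len > best_len) {
                best = Some((rule.name, len));
            }
        }
    }
    best
}
```

With rules listed as `PLUS` for `+`, `CONCAT` for `++`, and a second `+` rule, the input `++ 1` selects `CONCAT` because it matches more text, while `+ 1` selects `PLUS` because it is listed before the other rule of equal length.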
Once the match is determined the corresponding rule action is executed, which can in turn:
- Retrieve the current matched string with matched.
- Manipulate the states stack with push_state and pop_state.
- And finally take, skip, take_and_retry, or skip_and_retry the current match, or signal an error.
For convenience, the stack of states is initially populated with "DEFAULT".
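The states stack can be pictured with the following sketch in plain Rust. It only models the bookkeeping described above: `States` is a hypothetical type, and while its method names mirror push_state and pop_state, it is not Santiago's lexer.

```rust
/// Hypothetical stack of lexer states, initially populated with "DEFAULT".
pub struct States {
    stack: Vec<&'static str>,
}

impl States {
    pub fn new() -> Self {
        States { stack: vec!["DEFAULT"] }
    }

    /// Enters a new state; it becomes the current state.
    pub fn push_state(&mut self, state: &'static str) {
        self.stack.push(state);
    }

    /// Leaves the current state and goes back to the previous one.
    pub fn pop_state(&mut self) {
        self.stack.pop();
    }

    /// The state at the top of the stack, if any.
    pub fn current(&self) -> Option<&'static str> {
        self.stack.last().copied()
    }

    /// A rule is active when any of its applicable states
    /// equals the current state.
    pub fn is_active(&self, applicable: &[&str]) -> bool {
        match self.current() {
            Some(current) => applicable.contains(&current),
            None => false,
        }
    }
}
```

Pushing "INSIDE_STRING" makes rules declared for "DEFAULT" inactive until the state is popped again, which is how one character such as `'` can mean different things in different contexts.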
Grammars
A Grammar is a simple way of describing a language, like JSON, TOML, YAML, Python, Go, or Rust.
They are commonly described in Backus–Naur form.
Grammars are composed of grammar rules, which define how a rule can produce other rules or Lexemes. For example, a full name is composed of a given name and a family name:
"full_name" => rules "given_name" "family_name"
And a given name can be “Jane” or “Kevin”, and so on:
"given_name" => lexemes "Jane"
"given_name" => lexemes "Kevin"
"given_name" => lexemes "..."
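To show how such rules drive recognition, here is a small sketch in plain Rust that stores the rules as data and checks a sequence of lexemes against them. Everything in it is hypothetical and for illustration only: the `Production`, `GrammarRule`, and `match_rule` names, and the family name "Doe", which the text above does not define. The matcher is deliberately naive (first alternative wins, no left recursion, no ambiguity) and is not the algorithm Santiago uses.

```rust
/// Hypothetical right-hand side of a grammar rule.
pub enum Production {
    /// A sequence of other rule names.
    Rules(Vec<&'static str>),
    /// A single lexeme.
    Lexeme(&'static str),
}

pub struct GrammarRule {
    pub name: &'static str,
    pub production: Production,
}

/// The rules from the text, plus an assumed family name "Doe".
pub fn full_name_grammar() -> Vec<GrammarRule> {
    vec![
        GrammarRule {
            name: "full_name",
            production: Production::Rules(vec!["given_name", "family_name"]),
        },
        GrammarRule { name: "given_name", production: Production::Lexeme("Jane") },
        GrammarRule { name: "given_name", production: Production::Lexeme("Kevin") },
        GrammarRule { name: "family_name", production: Production::Lexeme("Doe") },
    ]
}

/// Tries to match `rule` at the start of `input` and returns how many
/// lexemes were consumed by the first alternative that matches.
pub fn match_rule(grammar: &[GrammarRule], rule: &str, input: &[&str]) -> Option<usize> {
    for candidate in grammar.iter().filter(|r| r.name == rule) {
        match &candidate.production {
            Production::Lexeme(kind) => {
                if input.first().copied() == Some(*kind) {
                    return Some(1);
                }
            }
            Production::Rules(parts) => {
                let mut consumed = 0;
                let mut matched_all = true;
                for part in parts {
                    match match_rule(grammar, part, &input[consumed..]) {
                        Some(count) => consumed += count,
                        None => {
                            matched_all = false;
                            break;
                        }
                    }
                }
                if matched_all {
                    return Some(consumed);
                }
            }
        }
    }
    None
}
```

With this grammar, `["Kevin", "Doe"]` matches "full_name" and consumes two lexemes, while `["Doe", "Kevin"]` does not match because no "given_name" alternative produces "Doe".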
Examples
In this section we explore a few more full examples, ordered by complexity.
Smallest lexer possible
This lexer copies the input character by character:
use santiago::lexer::LexerRules;
pub fn lexer_rules() -> LexerRules {
santiago::lexer_rules!(
"DEFAULT" | "CHAR" = pattern ".";
)
}
For example:
let input = "abcd";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
Which outputs:
CHAR "a" (1, 1)
CHAR "b" (1, 2)
CHAR "c" (1, 3)
CHAR "d" (1, 4)
And we can build a grammar to recognize a sequence of characters:
use santiago::grammar::Grammar;
pub fn grammar() -> Grammar<()> {
santiago::grammar!(
// A rule for 0 characters
"chars" => empty;
// A rule that maps to itself plus one character (recursion)
"chars" => rules "chars" "char";
// A char comes from the lexeme "CHAR"
"char" => lexemes "CHAR";
)
}
And parse!
let input = "abcd";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
let grammar = grammar();
let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
Which outputs:
---
Γ := rules "chars"
chars := rules "chars" "char"
chars := rules "chars" "char"
chars := rules "chars" "char"
chars := rules "chars" "char"
chars := rules
char := lexemes "CHAR"
CHAR "a" (1, 1)
char := lexemes "CHAR"
CHAR "b" (1, 2)
char := lexemes "CHAR"
CHAR "c" (1, 3)
char := lexemes "CHAR"
CHAR "d" (1, 4)
Calculator with four operations
This lexer can handle integer arithmetic in the form:
1 + 2 * 3 / 6 - 7
Similar to those you find in a basic calculator.
use santiago::lexer::LexerRules;
pub fn lexer_rules() -> LexerRules {
santiago::lexer_rules!(
"DEFAULT" | "INT" = pattern r"[0-9]+";
"DEFAULT" | "+" = string "+";
"DEFAULT" | "-" = string "-";
"DEFAULT" | "*" = string "*";
"DEFAULT" | "/" = string "/";
"DEFAULT" | "WS" = pattern r"\s" => |lexer| lexer.skip();
)
}
For example:
let input = "1 + 2 * 3 / 6 - 7";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
Which outputs:
INT "1" (1, 1)
+ "+" (1, 3)
INT "2" (1, 5)
* "*" (1, 7)
INT "3" (1, 9)
/ "/" (1, 11)
INT "6" (1, 13)
- "-" (1, 15)
INT "7" (1, 17)
Now let’s build a Parse Tree:
use santiago::grammar::Associativity;
use santiago::grammar::Grammar;
pub fn grammar() -> Grammar<()> {
santiago::grammar!(
"expr" => rules "int";
"expr" => rules "expr" "add" "expr";
"expr" => rules "expr" "subtract" "expr";
"expr" => rules "expr" "multiply" "expr";
"expr" => rules "expr" "divide" "expr";
"int" => lexemes "INT";
"add" => lexemes "+";
"subtract" => lexemes "-";
"multiply" => lexemes "*";
"divide" => lexemes "/";
Associativity::Left => rules "add" "subtract";
Associativity::Left => rules "multiply" "divide";
)
}
And parse!
let input = "1 + 2 * 3 / 6 - 7";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
let grammar = grammar();
let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
Which outputs:
---
Γ := rules "expr"
expr := rules "expr" "subtract" "expr"
expr := rules "expr" "add" "expr"
expr := rules "int"
int := lexemes "INT"
INT "1" (1, 1)
add := lexemes "+"
+ "+" (1, 3)
expr := rules "expr" "divide" "expr"
expr := rules "expr" "multiply" "expr"
expr := rules "int"
int := lexemes "INT"
INT "2" (1, 5)
multiply := lexemes "*"
* "*" (1, 7)
expr := rules "int"
int := lexemes "INT"
INT "3" (1, 9)
divide := lexemes "/"
/ "/" (1, 11)
expr := rules "int"
int := lexemes "INT"
INT "6" (1, 13)
subtract := lexemes "-"
- "-" (1, 15)
expr := rules "int"
int := lexemes "INT"
INT "7" (1, 17)
We can also create an interpreter that performs the indicated additions, subtractions, multiplications and divisions.
For this let’s create a more complete grammar:
use santiago::grammar::Associativity;
use santiago::grammar::Grammar;
#[derive(Debug)]
pub enum AST {
Int(isize),
BinaryOperation(Vec<AST>),
OperatorAdd,
OperatorSubtract,
OperatorMultiply,
OperatorDivide,
}
pub fn grammar() -> Grammar<AST> {
santiago::grammar!(
"expr" => rules "int";
"expr" => rules "expr" "add" "expr" =>
AST::BinaryOperation;
"expr" => rules "expr" "subtract" "expr" =>
AST::BinaryOperation;
"expr" => rules "expr" "multiply" "expr" =>
AST::BinaryOperation;
"expr" => rules "expr" "divide" "expr" =>
AST::BinaryOperation;
"add" => lexemes "+" =>
|_| AST::OperatorAdd;
"subtract" => lexemes "-" =>
|_| AST::OperatorSubtract;
"multiply" => lexemes "*" =>
|_| AST::OperatorMultiply;
"divide" => lexemes "/" =>
|_| AST::OperatorDivide;
"int" => lexemes "INT" =>
|lexemes| {
let value = str::parse(&lexemes[0].raw).unwrap();
AST::Int(value)
};
Associativity::Left => rules "add" "subtract";
Associativity::Left => rules "multiply" "divide";
)
}
And a function to perform the arithmetic:
pub fn eval(value: &AST) -> isize {
match value {
AST::Int(int) => *int,
AST::BinaryOperation(args) => match &args[1] {
AST::OperatorAdd => eval(&args[0]) + eval(&args[2]),
AST::OperatorSubtract => eval(&args[0]) - eval(&args[2]),
AST::OperatorMultiply => eval(&args[0]) * eval(&args[2]),
AST::OperatorDivide => eval(&args[0]) / eval(&args[2]),
_ => unreachable!(),
},
_ => unreachable!(),
}
}
Now the interpreter can be used like:
let input = "1 + 2 * 3 / 6 - 7";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
let grammar = grammar();
let parse_tree = &santiago::parser::parse(&grammar, &lexemes).unwrap()[0];
let ast = parse_tree.as_abstract_syntax_tree();
assert_eq!(eval(&ast), -5);
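The result can be cross-checked without Santiago. The sketch below is a small standalone evaluator in plain Rust that uses precedence climbing, a different technique from the grammar-driven approach above, under the same conventions: all four operators are left-associative, and `*` and `/` bind tighter than `+` and `-`. The names `Tok`, `tokenize`, `eval_expr`, and `eval_str` are hypothetical, and for brevity characters that are neither digits nor operators are simply ignored.

```rust
/// Binding power of a binary operator; higher binds tighter.
fn precedence(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Int(isize),
    Op(char),
}

/// Turns the input into integers and operators.
/// Any other character (such as whitespace) is ignored in this sketch.
fn tokenize(input: &str) -> Vec<Tok> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut value: isize = 0;
            while let Some(digit) = chars.peek().and_then(|d| d.to_digit(10)) {
                value = value * 10 + digit as isize;
                chars.next();
            }
            tokens.push(Tok::Int(value));
        } else {
            if precedence(c).is_some() {
                tokens.push(Tok::Op(c));
            }
            chars.next();
        }
    }
    tokens
}

/// Evaluates operators whose precedence is at least `min_prec`.
fn eval_expr(tokens: &[Tok], pos: &mut usize, min_prec: u8) -> isize {
    let mut left = match tokens.get(*pos) {
        Some(Tok::Int(value)) => {
            *pos += 1;
            *value
        }
        other => panic!("expected an integer, found {:?}", other),
    };
    while let Some(Tok::Op(op)) = tokens.get(*pos).copied() {
        let prec = precedence(op).expect("tokenize only emits known operators");
        if prec < min_prec {
            break;
        }
        *pos += 1;
        // Left associativity: the right operand may only contain
        // operators that bind strictly tighter than `op`.
        let right = eval_expr(tokens, pos, prec + 1);
        left = match op {
            '+' => left + right,
            '-' => left - right,
            '*' => left * right,
            '/' => left / right,
            _ => unreachable!(),
        };
    }
    left
}

pub fn eval_str(input: &str) -> isize {
    let tokens = tokenize(input);
    let mut pos = 0;
    eval_expr(&tokens, &mut pos, 1)
}
```

`eval_str("1 + 2 * 3 / 6 - 7")` groups the input as `(1 + ((2 * 3) / 6)) - 7`, the same grouping shown in the parse tree above, and therefore also evaluates to -5.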
JavaScript string interpolations
This lexer can handle string interpolations in the form:
'Hello ${ name }, your age is: ${ age }.'
Similar to those you find in many programming languages.
use santiago::lexer::LexerRules;
pub fn lexer_rules() -> LexerRules {
santiago::lexer_rules!(
// If the current state is "DEFAULT",
// associate a "'" with the beginning of the string,
// and make the current state be "INSIDE_STRING".
"DEFAULT" | "STRING_START" = string "'" => |lexer| {
lexer.push_state("INSIDE_STRING");
lexer.take()
};
// If the current state is "INSIDE_STRING"
// associate "${" with nothing,
// make the current state be "INSIDE_STRING_INTERPOLATION"
// and skip the current match.
"INSIDE_STRING" | "" = string "${" => |lexer| {
lexer.push_state("INSIDE_STRING_INTERPOLATION");
lexer.skip()
};
// If the current state is "INSIDE_STRING_INTERPOLATION"
// associate one or more latin letters to a variable.
"INSIDE_STRING_INTERPOLATION" | "VAR" = pattern "[a-z]+";
// If the current state is "INSIDE_STRING_INTERPOLATION"
// associate a "}" with nothing,
// and skip the current match.
"INSIDE_STRING_INTERPOLATION" | "STR" = string "}" => |lexer| {
lexer.pop_state();
lexer.skip()
};
// If the current state is "INSIDE_STRING",
// associate a "'" with the end of the string
// and go back to the previous state.
"INSIDE_STRING" | "STRING_END" = string "'" => |lexer| {
lexer.pop_state();
lexer.take()
};
// If the current state is "INSIDE_STRING"
// associate anything with a "STR".
//
// Note how the "'" in the previous rule takes precedence over this one.
"INSIDE_STRING" | "STR" = pattern ".";
// If the current state is "DEFAULT" or "INSIDE_STRING_INTERPOLATION"
// associate a " " with whitespace, and skip it.
"DEFAULT" "INSIDE_STRING_INTERPOLATION" | "WS" = string " " => |lexer| {
lexer.skip()
};
)
}
For example:
let input = "'a${ b }c${ d }e'";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
Which outputs:
STRING_START "'" (1, 1)
STR "a" (1, 2)
VAR "b" (1, 6)
STR "c" (1, 9)
VAR "d" (1, 13)
STR "e" (1, 16)
STRING_END "'" (1, 17)
Now let’s build a Parse Tree:
use santiago::grammar::Grammar;
pub fn grammar() -> Grammar<()> {
santiago::grammar!(
// A string in the form: `str_content`
"string" => rules "string_start" "str_content" "string_end";
// Either empty
// or followed by a "str"
// or followed by a "var"
"str_content" => empty;
"str_content" => rules "str_content" "str";
"str_content" => rules "str_content" "var";
// Map rules to their corresponding Lexemes
"str" => lexemes "STR";
"string_start" => lexemes "STRING_START";
"string_end" => lexemes "STRING_END";
"var" => lexemes "VAR";
)
}
And parse!
let input = "'a${ b }c${ d }e'";
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
let grammar = grammar();
let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
Which outputs:
---
Γ := rules "string"
string := rules "string_start" "str_content" "string_end"
string_start := lexemes "STRING_START"
STRING_START "'" (1, 1)
str_content := rules "str_content" "str"
str_content := rules "str_content" "var"
str_content := rules "str_content" "str"
str_content := rules "str_content" "var"
str_content := rules "str_content" "str"
str_content := rules
str := lexemes "STR"
STR "a" (1, 2)
var := lexemes "VAR"
VAR "b" (1, 6)
str := lexemes "STR"
STR "c" (1, 9)
var := lexemes "VAR"
VAR "d" (1, 13)
str := lexemes "STR"
STR "e" (1, 16)
string_end := lexemes "STRING_END"
STRING_END "'" (1, 17)
Nix Expression Language
This lexer can handle the Nix expression language, whose original lexer and parser are written in Flex and GNU Bison:
- https://github.com/NixOS/nix/blob/9174d884d750b7b49a571bd55275f0883c2dabda/src/libexpr/lexer.l.
- https://github.com/NixOS/nix/blob/9174d884d750b7b49a571bd55275f0883c2dabda/src/libexpr/parser.y.
use santiago::lexer::LexerRules;
santiago::def!(ANY, r"(?:.|\n)");
santiago::def!(ID, r"[a-zA-Z_][a-zA-Z0-9_'\-]*");
santiago::def!(INT, r"[0-9]+");
santiago::def!(
FLOAT,
r"(([1-9][0-9]*\.[0-9]*)|(0?\.[0-9]+))([Ee][+-]?[0-9]+)?"
);
santiago::def!(PATH_CHAR, r"[a-zA-Z0-9\._\-\+]");
santiago::def!(PATH, concat!(PATH_CHAR!(), r"*(/", PATH_CHAR!(), r"+)+/?"));
santiago::def!(PATH_SEG, concat!(PATH_CHAR!(), r"*/"));
santiago::def!(HPATH, concat!(r"\~(/", PATH_CHAR!(), r"+)+/?"));
santiago::def!(HPATH_START, r"\~/");
santiago::def!(
SPATH,
concat!(r"<", PATH_CHAR!(), r"+(/", PATH_CHAR!(), r"+)*>")
);
santiago::def!(
URI,
r"[a-zA-Z][a-zA-Z0-9\+\-\.]*:[a-zA-Z0-9%/\?:@\&=\+\$,\-_\.!\~\*']+"
);
pub fn lexer_rules() -> LexerRules {
santiago::lexer_rules!(
"DEFAULT" | "IF" = string "if";
"DEFAULT" | "THEN" = string "then";
"DEFAULT" | "ELSE" = string "else";
"DEFAULT" | "ASSERT" = string "assert";
"DEFAULT" | "WITH" = string "with";
"DEFAULT" | "LET" = string "let";
"DEFAULT" | "IN" = string "in";
"DEFAULT" | "REC" = string "rec";
"DEFAULT" | "INHERIT" = string "inherit";
"DEFAULT" | "OR_KW" = string "or";
"DEFAULT" | "ELLIPSIS" = string "...";
"DEFAULT" | "EQ" = string "==";
"DEFAULT" | "NEQ" = string "!=";
"DEFAULT" | "LEQ" = string "<=";
"DEFAULT" | "GEQ" = string ">=";
"DEFAULT" | "AND" = string "&&";
"DEFAULT" | "OR" = string "||";
"DEFAULT" | "IMPL" = string "->";
"DEFAULT" | "UPDATE" = string "//";
"DEFAULT" | "CONCAT" = string "++";
"DEFAULT" | "ID" = pattern ID!();
"DEFAULT" | "INT" = pattern INT!();
"DEFAULT" | "FLOAT" = pattern FLOAT!();
"DEFAULT" | "DOLLAR_CURLY" = string "${" => |lexer| {
lexer.push_state("DEFAULT");
lexer.take()
};
"DEFAULT" | "}" = string "}" => |lexer| {
lexer.pop_state();
lexer.take()
};
"DEFAULT" | "{" = string "{" => |lexer| {
lexer.push_state("DEFAULT");
lexer.take()
};
"DEFAULT" | "\"" = string "\"" => |lexer| {
lexer.push_state("STRING");
lexer.take()
};
"STRING" | "STR"
= pattern concat!(
r#"([^\$"\\]|\$[^\{"\\]|\\"#,
ANY!(),
r"|\$\\",
ANY!(),
r#")*\$""#
)
=> |lexer| {
lexer.current_match_len -= 1;
lexer.take_and_map(unescape_string)
};
"STRING" | "STR"
= pattern concat!(
r#"([^\$"\\]|\$[^\{"\\]|\\"#,
ANY!(),
r"|\$\\",
ANY!(),
r")+"
)
=> |lexer| lexer.take_and_map(unescape_string);
"STRING" | "DOLLAR_CURLY" = string "${" => |lexer| {
lexer.push_state("DEFAULT");
lexer.take()
};
"STRING" | "\"" = string "\"" => |lexer| {
lexer.pop_state();
lexer.take()
};
"STRING" | "STR" = pattern r"\$|\\|\$\\";
"DEFAULT" | "IND_STRING_OPEN" = pattern r"''( *\n)?" => |lexer| {
lexer.push_state("IND_STRING");
lexer.take()
};
"IND_STRING" | "IND_STR" = pattern r"([^\$']|\$[^\{']|'[^'\$])+";
"IND_STRING" | "IND_STR" = string "''$" => |lexer| {
lexer.take_and_map(|_| "$".to_string())
};
"IND_STRING" | "IND_STR" = string "$";
"IND_STRING" | "IND_STR" = string "'''" => |lexer| {
lexer.take_and_map(|_| "''".to_string())
};
"IND_STRING" | "IND_STR" = pattern concat!(r"''\\", ANY!()) => |lexer| {
lexer.take_and_map(|matched| unescape_string(&matched[2..]))
};
"IND_STRING" | "DOLLAR_CURLY" = string "${" => |lexer| {
lexer.push_state("DEFAULT");
lexer.take()
};
"IND_STRING" | "IND_STRING_CLOSE" = string "''" => |lexer| {
lexer.pop_state();
lexer.take()
};
"IND_STRING" | "IND_STR" = string "'";
"DEFAULT" | "SKIP" = string concat!(PATH_SEG!(), "${") => |lexer| {
lexer.push_state("PATH_START");
lexer.skip_and_retry()
};
"DEFAULT" | "SKIP" = string concat!(HPATH_START!(), "${") => |lexer| {
lexer.push_state("PATH_START");
lexer.skip_and_retry()
};
"PATH_START" | "PATH" = pattern PATH_SEG!() => |lexer| {
lexer.pop_state();
lexer.push_state("INPATH_SLASH");
lexer.take()
};
"PATH_START" | "HPATH" = pattern HPATH_START!() => |lexer| {
lexer.pop_state();
lexer.push_state("INPATH_SLASH");
lexer.take()
};
"DEFAULT" | "PATH" = pattern PATH!() => |lexer| {
let matched = lexer.matched();
if &matched[matched.len() - 1..] == "/" {
lexer.push_state("INPATH_SLASH");
} else {
lexer.push_state("INPATH");
}
lexer.take()
};
"DEFAULT" | "HPATH" = pattern HPATH!() => |lexer| {
let matched = lexer.matched();
if &matched[matched.len() - 1..] == "/" {
lexer.push_state("INPATH_SLASH");
} else {
lexer.push_state("INPATH");
}
lexer.take()
};
"INPATH" "INPATH_SLASH" | "DOLLAR_CURLY" = string "${" => |lexer| {
lexer.pop_state();
lexer.push_state("INPATH");
lexer.push_state("DEFAULT");
lexer.take()
};
"INPATH" "INPATH_SLASH" | "STR"
= pattern concat!(PATH!(), "|", PATH_SEG!(), "|", PATH_CHAR!(), "+")
=> |lexer| {
let matched = lexer.matched();
if &matched[matched.len() - 1..] == "/" {
lexer.pop_state();
lexer.push_state("INPATH_SLASH");
} else {
lexer.pop_state();
lexer.push_state("INPATH");
}
lexer.take()
};
"INPATH" | "PATH_END" = pattern concat!(ANY!(), "|$") => |lexer| {
lexer.pop_state();
lexer.take_and_retry()
};
"INPATH_SLASH" | "ERROR" = pattern concat!(ANY!(), "|$") => |lexer| {
lexer.error("Path has a trailing slash")
};
"DEFAULT" | "SPATH" = pattern SPATH!();
"DEFAULT" | "URI" = pattern URI!();
"DEFAULT" | "WS" = pattern r"[ \t\r\n]+" => |lexer| lexer.skip();
"DEFAULT" | "COMMENT" = pattern r"\#[^\r\n]*" => |lexer| lexer.skip();
"DEFAULT" | "COMMENT" = pattern r"/\*([^*]|\*+[^*/])*\*+/" => |lexer| {
lexer.skip()
};
//
"DEFAULT" | "*" = string "*";
"DEFAULT" | ":" = string ":";
"DEFAULT" | "." = string ".";
"DEFAULT" | "=" = string "=";
"DEFAULT" | "-" = string "-";
"DEFAULT" | "!" = string "!";
"DEFAULT" | "(" = string "(";
"DEFAULT" | ")" = string ")";
"DEFAULT" | "+" = string "+";
"DEFAULT" | ";" = string ";";
"DEFAULT" | "/" = string "/";
"DEFAULT" | "[" = string "[";
"DEFAULT" | "]" = string "]";
"DEFAULT" | "@" = string "@";
"DEFAULT" | "<" = string "<";
"DEFAULT" | ">" = string ">";
"DEFAULT" | "?" = string "?";
"DEFAULT" | "," = string ",";
//
"DEFAULT" | "ANY" = pattern ANY!() => |lexer| {
lexer.error("Unexpected input")
};
)
}
fn unescape_string(input: &str) -> String {
let mut input_chars = input.chars().peekable();
let mut output = String::new();
while let Some(mut input_char) = input_chars.next() {
match input_char {
'\\' => {
input_char = input_chars.next().unwrap();
if input_char == 'n' {
output.push('\n');
} else if input_char == 'r' {
output.push('\r');
} else if input_char == 't' {
output.push('\t');
} else {
output.push(input_char);
}
}
'\r' => {
output.push('\n');
input_chars.next_if(|s| *s == '\n');
}
c => {
output.push(c);
}
}
}
output
}
Example input:
{
lib,
rustPlatform,
fetchFromGitHub,
testVersion,
alejandra,
}:
rustPlatform.buildRustPackage rec {
pname = "alejandra";
version = "1.1.0";
src = fetchFromGitHub {
owner = "kamadorueda";
repo = "alejandra";
rev = version;
sha256 = "sha256-vkFKYnSmhPPXtc3AH7iRtqRRqxhj0o5WySqPT+klDWU=";
};
cargoSha256 = "sha256-MsXaanznE4UtZMj54EDq86aJ2t4xT8O5ziTpa/KCwBw=";
passthru.tests = {
version = testVersion {package = alejandra;};
};
meta = with lib; {
description = "The Uncompromising Nix Code Formatter";
homepage = "https://github.com/kamadorueda/alejandra";
changelog = "https://github.com/kamadorueda/alejandra/blob/${version}/CHANGELOG.md";
license = licenses.unlicense;
maintainers = with maintainers; [_0x4A6F kamadorueda];
};
}
Let’s perform lexical analysis:
let input = include_str!("../tests/nix/cases/pkg/input");
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
Which outputs:
{ "{" (1, 1)
ID "lib" (2, 3)
, "," (2, 6)
ID "rustPlatform" (3, 3)
, "," (3, 15)
ID "fetchFromGitHub" (4, 3)
, "," (4, 18)
ID "testVersion" (5, 3)
, "," (5, 14)
ID "alejandra" (6, 3)
, "," (6, 12)
} "}" (7, 1)
: ":" (7, 2)
ID "rustPlatform" (8, 1)
. "." (8, 13)
ID "buildRustPackage" (8, 14)
REC "rec" (8, 31)
{ "{" (8, 35)
ID "pname" (9, 3)
= "=" (9, 9)
" "\"" (9, 11)
STR "alejandra" (9, 12)
" "\"" (9, 21)
; ";" (9, 22)
ID "version" (10, 3)
= "=" (10, 11)
" "\"" (10, 13)
STR "1.1.0" (10, 14)
" "\"" (10, 19)
; ";" (10, 20)
ID "src" (12, 3)
= "=" (12, 7)
ID "fetchFromGitHub" (12, 9)
{ "{" (12, 25)
ID "owner" (13, 5)
= "=" (13, 11)
" "\"" (13, 13)
STR "kamadorueda" (13, 14)
" "\"" (13, 25)
; ";" (13, 26)
ID "repo" (14, 5)
= "=" (14, 10)
" "\"" (14, 12)
STR "alejandra" (14, 13)
" "\"" (14, 22)
; ";" (14, 23)
ID "rev" (15, 5)
= "=" (15, 9)
ID "version" (15, 11)
; ";" (15, 18)
ID "sha256" (16, 5)
= "=" (16, 12)
" "\"" (16, 14)
STR "sha256-vkFKYnSmhPPXtc3AH7iRtqRRqxhj0o5WySqPT+klDWU=" (16, 15)
" "\"" (16, 66)
; ";" (16, 67)
} "}" (17, 3)
; ";" (17, 4)
ID "cargoSha256" (19, 3)
= "=" (19, 15)
" "\"" (19, 17)
STR "sha256-MsXaanznE4UtZMj54EDq86aJ2t4xT8O5ziTpa/KCwBw=" (19, 18)
" "\"" (19, 69)
; ";" (19, 70)
ID "passthru" (21, 3)
. "." (21, 11)
ID "tests" (21, 12)
= "=" (21, 18)
{ "{" (21, 20)
ID "version" (22, 5)
= "=" (22, 13)
ID "testVersion" (22, 15)
{ "{" (22, 27)
ID "package" (22, 28)
= "=" (22, 36)
ID "alejandra" (22, 38)
; ";" (22, 47)
} "}" (22, 48)
; ";" (22, 49)
} "}" (23, 3)
; ";" (23, 4)
ID "meta" (25, 3)
= "=" (25, 8)
WITH "with" (25, 10)
ID "lib" (25, 15)
; ";" (25, 18)
{ "{" (25, 20)
ID "description" (26, 5)
= "=" (26, 17)
" "\"" (26, 19)
STR "The Uncompromising Nix Code Formatter" (26, 20)
" "\"" (26, 57)
; ";" (26, 58)
ID "homepage" (27, 5)
= "=" (27, 14)
" "\"" (27, 16)
STR "https://github.com/kamadorueda/alejandra" (27, 17)
" "\"" (27, 57)
; ";" (27, 58)
ID "changelog" (28, 5)
= "=" (28, 15)
" "\"" (28, 17)
STR "https://github.com/kamadorueda/alejandra/blob/" (28, 18)
DOLLAR_CURLY "${" (28, 64)
ID "version" (28, 66)
} "}" (28, 73)
STR "/CHANGELOG.md" (28, 74)
" "\"" (28, 87)
; ";" (28, 88)
ID "license" (29, 5)
= "=" (29, 13)
ID "licenses" (29, 15)
. "." (29, 23)
ID "unlicense" (29, 24)
; ";" (29, 33)
ID "maintainers" (30, 5)
= "=" (30, 17)
WITH "with" (30, 19)
ID "maintainers" (30, 24)
; ";" (30, 35)
[ "[" (30, 37)
ID "_0x4A6F" (30, 38)
ID "kamadorueda" (30, 46)
] "]" (30, 57)
; ";" (30, 58)
} "}" (31, 3)
; ";" (31, 4)
} "}" (32, 1)
Now let’s build a Parse Tree:
use santiago::grammar::Associativity;
use santiago::grammar::Grammar;
pub fn grammar() -> Grammar<()> {
santiago::grammar!(
"expr" => rules "expr_function";
"expr_function" => rules "ID" ":" "expr_function";
"expr_function" => rules "{" "formals" "}" ":" "expr_function";
"expr_function" => rules "{" "formals" "}" "@" "ID" ":" "expr_function";
"expr_function" => rules "ID" "@" "{" "formals" "}" ":" "expr_function";
"expr_function" => rules "ASSERT" "expr" ";" "expr_function";
"expr_function" => rules "WITH" "expr" ";" "expr_function";
"expr_function" => rules "LET" "binds" "IN" "expr_function";
"expr_function" => rules "expr_if";
"expr_if" => rules "IF" "expr" "THEN" "expr" "ELSE" "expr";
"expr_if" => rules "expr_op";
"expr_op" => rules "NOT" "expr_op";
"expr_op" => rules "NEGATE" "expr_op";
"expr_op" => rules "expr_op" "EQ" "expr_op";
"expr_op" => rules "expr_op" "NEQ" "expr_op";
"expr_op" => rules "expr_op" "<" "expr_op";
"expr_op" => rules "expr_op" "LEQ" "expr_op";
"expr_op" => rules "expr_op" ">" "expr_op";
"expr_op" => rules "expr_op" "GEQ" "expr_op";
"expr_op" => rules "expr_op" "AND" "expr_op";
"expr_op" => rules "expr_op" "OR" "expr_op";
"expr_op" => rules "expr_op" "IMPL" "expr_op";
"expr_op" => rules "expr_op" "UPDATE" "expr_op";
"expr_op" => rules "expr_op" "?" "attrpath";
"expr_op" => rules "expr_op" "+" "expr_op";
"expr_op" => rules "expr_op" "-" "expr_op";
"expr_op" => rules "expr_op" "*" "expr_op";
"expr_op" => rules "expr_op" "/" "expr_op";
"expr_op" => rules "expr_op" "CONCAT" "expr_op";
"expr_op" => rules "expr_app";
"expr_app" => rules "expr_app" "expr_select";
"expr_app" => rules "expr_select";
"expr_select" => rules "expr_simple" "." "attrpath";
"expr_select" => rules "expr_simple" "." "attrpath" "OR_KW" "expr_select";
"expr_select" => rules "expr_simple" "OR_KW";
"expr_select" => rules "expr_simple";
"expr_simple" => rules "ID";
"expr_simple" => rules "INT";
"expr_simple" => rules "FLOAT";
"expr_simple" => rules "\"" "string_parts" "\"";
"expr_simple" => rules "IND_STRING_OPEN" "ind_string_parts" "IND_STRING_CLOSE";
"expr_simple" => rules "path_start" "PATH_END";
"expr_simple" => rules "path_start" "string_parts_interpolated" "PATH_END";
"expr_simple" => rules "SPATH";
"expr_simple" => rules "URI";
"expr_simple" => rules "(" "expr" ")";
"expr_simple" => rules "LET" "{" "binds" "}";
"expr_simple" => rules "REC" "{" "binds" "}";
"expr_simple" => rules "{" "binds" "}";
"expr_simple" => rules "[" "expr_list" "]";
"string_parts" => rules "STR";
"string_parts" => rules "string_parts_interpolated";
"string_parts" => empty;
"string_parts_interpolated" => rules "string_parts_interpolated" "STR";
"string_parts_interpolated" => rules "string_parts_interpolated" "DOLLAR_CURLY" "expr" "}";
"string_parts_interpolated" => rules "DOLLAR_CURLY" "expr" "}";
"string_parts_interpolated" => rules "STR" "DOLLAR_CURLY" "expr" "}";
"path_start" => rules "PATH";
"path_start" => rules "HPATH";
"ind_string_parts" => rules "ind_string_parts" "IND_STR";
"ind_string_parts" => rules "ind_string_parts" "DOLLAR_CURLY" "expr" "}";
"ind_string_parts" => empty;
"binds" => rules "binds" "attrpath" "=" "expr" ";";
"binds" => rules "binds" "INHERIT" "attrs" ";";
"binds" => rules "binds" "INHERIT" "(" "expr" ")" "attrs" ";";
"binds" => empty;
"attrs" => rules "attrs" "attr";
"attrs" => rules "attrs" "string_attr";
"attrs" => empty;
"attrpath" => rules "attrpath" "." "attr";
"attrpath" => rules "attrpath" "." "string_attr";
"attrpath" => rules "attr";
"attrpath" => rules "string_attr";
"attr" => rules "ID";
"attr" => rules "OR_KW";
"string_attr" => rules "\"" "string_parts" "\"";
"string_attr" => rules "DOLLAR_CURLY" "expr" "}";
"expr_list" => rules "expr_list" "expr_select";
"expr_list" => empty;
"formals" => rules "formal" "," "formals";
"formals" => rules "formal";
"formals" => rules "ELLIPSIS";
"formals" => empty;
"formal" => rules "ID";
"formal" => rules "ID" "?" "expr";
// All lexemes
"!" => lexemes "!";
"\"" => lexemes "\"";
"(" => lexemes "(";
")" => lexemes ")";
"*" => lexemes "*";
"+" => lexemes "+";
"," => lexemes ",";
"." => lexemes ".";
"/" => lexemes "/";
":" => lexemes ":";
";" => lexemes ";";
"<" => lexemes "<";
"=" => lexemes "=";
">" => lexemes ">";
"?" => lexemes "?";
"@" => lexemes "@";
"[" => lexemes "[";
"]" => lexemes "]";
"{" => lexemes "{";
"}" => lexemes "}";
"AND" => lexemes "AND";
"ANY" => lexemes "ANY";
"ASSERT" => lexemes "ASSERT";
"COMMENT" => lexemes "COMMENT";
"CONCAT" => lexemes "CONCAT";
"DOLLAR_CURLY" => lexemes "DOLLAR_CURLY";
"ELLIPSIS" => lexemes "ELLIPSIS";
"ELSE" => lexemes "ELSE";
"EQ" => lexemes "EQ";
"ERROR" => lexemes "ERROR";
"FLOAT" => lexemes "FLOAT";
"GEQ" => lexemes "GEQ";
"HPATH" => lexemes "HPATH";
"ID" => lexemes "ID";
"IF" => lexemes "IF";
"IMPL" => lexemes "IMPL";
"IN" => lexemes "IN";
"IND_STR" => lexemes "IND_STR";
"IND_STRING_CLOSE" => lexemes "IND_STRING_CLOSE";
"IND_STRING_OPEN" => lexemes "IND_STRING_OPEN";
"INHERIT" => lexemes "INHERIT";
"INT" => lexemes "INT";
"LEQ" => lexemes "LEQ";
"LET" => lexemes "LET";
"NEQ" => lexemes "NEQ";
"OR" => lexemes "OR";
"OR_KW" => lexemes "OR_KW";
"PATH" => lexemes "PATH";
"PATH_END" => lexemes "PATH_END";
"REC" => lexemes "REC";
"SKIP" => lexemes "SKIP";
"SPATH" => lexemes "SPATH";
"STR" => lexemes "STR";
"THEN" => lexemes "THEN";
"UPDATE" => lexemes "UPDATE";
"URI" => lexemes "URI";
"WITH" => lexemes "WITH";
"WS" => lexemes "WS";
"NOT" => lexemes "!";
"NEGATE" => lexemes "-";
"-" => lexemes "-";
Associativity::Right => rules "IMPL";
Associativity::Left => rules "OR";
Associativity::Left => rules "AND";
Associativity::None => rules "EQ" "NEQ";
Associativity::None => rules "<" ">" "LEQ" "GEQ";
Associativity::Right => rules "UPDATE";
Associativity::Left => rules "NOT";
Associativity::Left => rules "+" "-";
Associativity::Left => rules "*" "/";
Associativity::Right => rules "CONCAT";
Associativity::None => rules "?";
Associativity::None => rules "NEGATE";
)
}
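To get a feel for what the `Associativity` declarations at the end of the grammar do, here is a minimal, std-only sketch of precedence climbing. It does not use Santiago's API: `Assoc`, `OPS`, `parse_expr` and `group` are hypothetical names made up for this illustration, and it assumes (as in Bison) that operators declared later bind tighter. Only three of the operators are modeled, and the input is assumed to be well-formed, whitespace-separated tokens.

```rust
// Hypothetical, std-only model of precedence and associativity.
// This is NOT Santiago's API; it only illustrates the idea.
#[derive(Clone, Copy, PartialEq)]
enum Assoc {
    Left,
    Right,
}

// (operator, precedence level, associativity); a higher level binds tighter.
// Assumption: levels follow the declaration order "+" < "*" < "++" (CONCAT).
const OPS: &[(&str, u8, Assoc)] = &[
    ("+", 1, Assoc::Left),
    ("*", 2, Assoc::Left),
    ("++", 3, Assoc::Right),
];

fn op_info(tok: &str) -> Option<(u8, Assoc)> {
    OPS.iter().find(|(s, _, _)| *s == tok).map(|&(_, p, a)| (p, a))
}

// Precedence climbing: returns the expression fully parenthesized.
// Panics on empty or malformed input, which is acceptable for a sketch.
fn parse_expr(tokens: &[&str], pos: &mut usize, min_prec: u8) -> String {
    let mut lhs = tokens[*pos].to_string();
    *pos += 1;
    while *pos < tokens.len() {
        let op = tokens[*pos];
        let (prec, assoc) = match op_info(op) {
            Some(info) if info.0 >= min_prec => info,
            _ => break,
        };
        *pos += 1;
        // Left-associative: the right operand may only contain tighter operators.
        // Right-associative: the right operand may contain the same operator again.
        let next_min = if assoc == Assoc::Left { prec + 1 } else { prec };
        let rhs = parse_expr(tokens, pos, next_min);
        lhs = format!("({} {} {})", lhs, op, rhs);
    }
    lhs
}

fn group(input: &str) -> String {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    parse_expr(&tokens, &mut 0, 0)
}
```

With these definitions, `group("a + b + c")` groups to the left, `group("a ++ b ++ c")` groups to the right, and `group("a + b * c")` keeps the multiplication together.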
And parse!
let input = include_str!("../tests/nix/cases/pkg/input");
let lexer_rules = lexer_rules();
let lexemes = santiago::lexer::lex(&lexer_rules, &input).unwrap();
let grammar = grammar();
let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
Which outputs:
---
Γ := rules "expr"
expr := rules "expr_function"
expr_function := rules "{" "formals" "}" ":" "expr_function"
{ := lexemes "{"
{ "{" (1, 1)
formals := rules "formal" "," "formals"
formal := rules "ID"
ID := lexemes "ID"
ID "lib" (2, 3)
, := lexemes ","
, "," (2, 6)
formals := rules "formal" "," "formals"
formal := rules "ID"
ID := lexemes "ID"
ID "rustPlatform" (3, 3)
, := lexemes ","
, "," (3, 15)
formals := rules "formal" "," "formals"
formal := rules "ID"
ID := lexemes "ID"
ID "fetchFromGitHub" (4, 3)
, := lexemes ","
, "," (4, 18)
formals := rules "formal" "," "formals"
formal := rules "ID"
ID := lexemes "ID"
ID "testVersion" (5, 3)
, := lexemes ","
, "," (5, 14)
formals := rules "formal" "," "formals"
formal := rules "ID"
ID := lexemes "ID"
ID "alejandra" (6, 3)
, := lexemes ","
, "," (6, 12)
formals := rules
} := lexemes "}"
} "}" (7, 1)
: := lexemes ":"
: ":" (7, 2)
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_app" "expr_select"
expr_app := rules "expr_select"
expr_select := rules "expr_simple" "." "attrpath"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "rustPlatform" (8, 1)
. := lexemes "."
. "." (8, 13)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "buildRustPackage" (8, 14)
expr_select := rules "expr_simple"
expr_simple := rules "REC" "{" "binds" "}"
REC := lexemes "REC"
REC "rec" (8, 31)
{ := lexemes "{"
{ "{" (8, 35)
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "pname" (9, 3)
= := lexemes "="
= "=" (9, 9)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (9, 11)
string_parts := rules "STR"
STR := lexemes "STR"
STR "alejandra" (9, 12)
" := lexemes "\""
" "\"" (9, 21)
; := lexemes ";"
; ";" (9, 22)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "version" (10, 3)
= := lexemes "="
= "=" (10, 11)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (10, 13)
string_parts := rules "STR"
STR := lexemes "STR"
STR "1.1.0" (10, 14)
" := lexemes "\""
" "\"" (10, 19)
; := lexemes ";"
; ";" (10, 20)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "src" (12, 3)
= := lexemes "="
= "=" (12, 7)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_app" "expr_select"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "fetchFromGitHub" (12, 9)
expr_select := rules "expr_simple"
expr_simple := rules "{" "binds" "}"
{ := lexemes "{"
{ "{" (12, 25)
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "owner" (13, 5)
= := lexemes "="
= "=" (13, 11)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (13, 13)
string_parts := rules "STR"
STR := lexemes "STR"
STR "kamadorueda" (13, 14)
" := lexemes "\""
" "\"" (13, 25)
; := lexemes ";"
; ";" (13, 26)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "repo" (14, 5)
= := lexemes "="
= "=" (14, 10)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (14, 12)
string_parts := rules "STR"
STR := lexemes "STR"
STR "alejandra" (14, 13)
" := lexemes "\""
" "\"" (14, 22)
; := lexemes ";"
; ";" (14, 23)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "rev" (15, 5)
= := lexemes "="
= "=" (15, 9)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "version" (15, 11)
; := lexemes ";"
; ";" (15, 18)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "sha256" (16, 5)
= := lexemes "="
= "=" (16, 12)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (16, 14)
string_parts := rules "STR"
STR := lexemes "STR"
STR "sha256-vkFKYnSmhPPXtc3AH7iRtqRRqxhj0o5WySqPT+klDWU=" (16, 15)
" := lexemes "\""
" "\"" (16, 66)
; := lexemes ";"
; ";" (16, 67)
} := lexemes "}"
} "}" (17, 3)
; := lexemes ";"
; ";" (17, 4)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "cargoSha256" (19, 3)
= := lexemes "="
= "=" (19, 15)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (19, 17)
string_parts := rules "STR"
STR := lexemes "STR"
STR "sha256-MsXaanznE4UtZMj54EDq86aJ2t4xT8O5ziTpa/KCwBw=" (19, 18)
" := lexemes "\""
" "\"" (19, 69)
; := lexemes ";"
; ";" (19, 70)
attrpath := rules "attrpath" "." "attr"
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "passthru" (21, 3)
. := lexemes "."
. "." (21, 11)
attr := rules "ID"
ID := lexemes "ID"
ID "tests" (21, 12)
= := lexemes "="
= "=" (21, 18)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "{" "binds" "}"
{ := lexemes "{"
{ "{" (21, 20)
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "version" (22, 5)
= := lexemes "="
= "=" (22, 13)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_app" "expr_select"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "testVersion" (22, 15)
expr_select := rules "expr_simple"
expr_simple := rules "{" "binds" "}"
{ := lexemes "{"
{ "{" (22, 27)
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "package" (22, 28)
= := lexemes "="
= "=" (22, 36)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "alejandra" (22, 38)
; := lexemes ";"
; ";" (22, 47)
} := lexemes "}"
} "}" (22, 48)
; := lexemes ";"
; ";" (22, 49)
} := lexemes "}"
} "}" (23, 3)
; := lexemes ";"
; ";" (23, 4)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "meta" (25, 3)
= := lexemes "="
= "=" (25, 8)
expr := rules "expr_function"
expr_function := rules "WITH" "expr" ";" "expr_function"
WITH := lexemes "WITH"
WITH "with" (25, 10)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "lib" (25, 15)
; := lexemes ";"
; ";" (25, 18)
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "{" "binds" "}"
{ := lexemes "{"
{ "{" (25, 20)
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules "binds" "attrpath" "=" "expr" ";"
binds := rules
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "description" (26, 5)
= := lexemes "="
= "=" (26, 17)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (26, 19)
string_parts := rules "STR"
STR := lexemes "STR"
STR "The Uncompromising Nix Code Formatter" (26, 20)
" := lexemes "\""
" "\"" (26, 57)
; := lexemes ";"
; ";" (26, 58)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "homepage" (27, 5)
= := lexemes "="
= "=" (27, 14)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (27, 16)
string_parts := rules "STR"
STR := lexemes "STR"
STR "https://github.com/kamadorueda/alejandra" (27, 17)
" := lexemes "\""
" "\"" (27, 57)
; := lexemes ";"
; ";" (27, 58)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "changelog" (28, 5)
= := lexemes "="
= "=" (28, 15)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "\"" "string_parts" "\""
" := lexemes "\""
" "\"" (28, 17)
string_parts := rules "string_parts_interpolated"
string_parts_interpolated := rules "string_parts_interpolated" "STR"
string_parts_interpolated := rules "STR" "DOLLAR_CURLY" "expr" "}"
STR := lexemes "STR"
STR "https://github.com/kamadorueda/alejandra/blob/" (28, 18)
DOLLAR_CURLY := lexemes "DOLLAR_CURLY"
DOLLAR_CURLY "${" (28, 64)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "version" (28, 66)
} := lexemes "}"
} "}" (28, 73)
STR := lexemes "STR"
STR "/CHANGELOG.md" (28, 74)
" := lexemes "\""
" "\"" (28, 87)
; := lexemes ";"
; ";" (28, 88)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "license" (29, 5)
= := lexemes "="
= "=" (29, 13)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple" "." "attrpath"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "licenses" (29, 15)
. := lexemes "."
. "." (29, 23)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "unlicense" (29, 24)
; := lexemes ";"
; ";" (29, 33)
attrpath := rules "attr"
attr := rules "ID"
ID := lexemes "ID"
ID "maintainers" (30, 5)
= := lexemes "="
= "=" (30, 17)
expr := rules "expr_function"
expr_function := rules "WITH" "expr" ";" "expr_function"
WITH := lexemes "WITH"
WITH "with" (30, 19)
expr := rules "expr_function"
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "maintainers" (30, 24)
; := lexemes ";"
; ";" (30, 35)
expr_function := rules "expr_if"
expr_if := rules "expr_op"
expr_op := rules "expr_app"
expr_app := rules "expr_select"
expr_select := rules "expr_simple"
expr_simple := rules "[" "expr_list" "]"
[ := lexemes "["
[ "[" (30, 37)
expr_list := rules "expr_list" "expr_select"
expr_list := rules "expr_list" "expr_select"
expr_list := rules
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "_0x4A6F" (30, 38)
expr_select := rules "expr_simple"
expr_simple := rules "ID"
ID := lexemes "ID"
ID "kamadorueda" (30, 46)
] := lexemes "]"
] "]" (30, 57)
; := lexemes ";"
; ";" (30, 58)
} := lexemes "}"
} "}" (31, 3)
; := lexemes ";"
; ";" (31, 4)
} := lexemes "}"
} "}" (32, 1)
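The listing above reads as a pre-order walk: each rule line is followed by the subtrees of the symbols on its right-hand side, in order, down to the lexemes with their contents and position. A minimal sketch of how such a dump can be produced is shown below. It uses a hypothetical `Node` enum rather than Santiago's own tree type, and it adds two spaces of indentation per level (an assumption of this sketch) to make the nesting visible.

```rust
// Hypothetical stand-in for a parse tree; Santiago's own types are not used here.
enum Node {
    // A grammar rule together with the subtrees it derived.
    Rule { name: String, children: Vec<Node> },
    // A terminal: token kind, raw contents and (line, column) position.
    Leaf { kind: String, raw: String, position: (usize, usize) },
}

// The symbol name under which a node appears in its parent's right-hand side.
fn label(node: &Node) -> &str {
    match node {
        Node::Rule { name, .. } => name.as_str(),
        Node::Leaf { kind, .. } => kind.as_str(),
    }
}

// Pre-order dump: print the node first, then each child one level deeper.
fn dump(node: &Node, depth: usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    match node {
        Node::Rule { name, children } => {
            let mut line = format!("{}{} := rules", indent, name);
            for child in children {
                line.push(' ');
                // `{:?}` quotes and escapes the symbol, e.g. "ID" or "\"".
                line.push_str(&format!("{:?}", label(child)));
            }
            out.push_str(&line);
            out.push('\n');
            for child in children {
                dump(child, depth + 1, out);
            }
        }
        Node::Leaf { kind, raw, position } => {
            out.push_str(&format!("{}{} := lexemes {:?}\n", indent, kind, kind));
            out.push_str(&format!("{}  {} {:?} {:?}\n", indent, kind, raw, position));
        }
    }
}

// Builds the small subtree for the attribute name `pname` and dumps it.
fn example() -> String {
    let tree = Node::Rule {
        name: "attr".to_string(),
        children: vec![Node::Leaf {
            kind: "ID".to_string(),
            raw: "pname".to_string(),
            position: (9, 3),
        }],
    };
    let mut out = String::new();
    dump(&tree, 0, &mut out);
    out
}
```

Note how a rule with no children, such as the empty `binds` or `formals` alternatives, would print as a bare `binds := rules` line, which is how the empty productions show up in the listing.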
Next steps
This tutorial ends here. You should now have everything you need to lex and parse the world, and to build your own programming languages, compilers and interpreters!
You can check out more examples in the tests folder.
We hope you find Santiago useful!
And don’t forget to give us a star ⭐
Cheers ❤️
Modules
- grammar: Create grammars that are validated for correctness automatically.
- lexer: Transform an input of characters into groups of characters with related meaning.
- parser: Build a data structure representing the input.
Macros
- def: Create reusable definitions.
- lexer_rules: Declarative utility for creating LexerRules.