cljrs-reader
Lexer (tokenizer) and recursive-descent parser for the clojurust language.
Turns raw source text into a Form AST that the evaluator and compiler consume.
Phase: 2 — lexer and parser fully implemented.
File layout
src/
lib.rs — module declarations and re-exports
token.rs — Token enum: one variant per Clojure lexical form
lexer.rs — Lexer struct: byte-oriented, UTF-8-safe tokenizer
form.rs — Form struct + FormKind enum: the reader AST
parser.rs — Parser struct: recursive-descent parser + Iterator impl
Public API
token::Token
Every distinct lexical form the reader can produce:
| Variant | Clojure source | Notes |
|---|---|---|
| Nil | nil | |
| Bool(bool) | true / false | |
| Int(i64) | 42, -7, 16rFF, 2r1010 | decimal or radix literal that fits i64 |
| BigInt(String) | 42N, overflowing radix | decimal digits; sign included when negative |
| Float(f64) | 3.14, 1e10, 1.5e-3 | |
| BigDecimal(String) | 3.14M | raw text without trailing M |
| Ratio(String) | 3/4, -1/2 | full text including / |
| Char(char) | \a, \newline, \u0041 | named chars and \uXXXX resolved |
| Str(String) | "hello\n" | escape sequences fully processed |
| Symbol(String) | foo, ns/name, /, .. | |
| Keyword(String) | :foo, :ns/name | leading : stripped |
| AutoKeyword(String) | ::foo, ::ns/alias | leading :: stripped |
| LParen / RParen | ( / ) | |
| LBracket / RBracket | [ / ] | |
| LBrace / RBrace | { / } | |
| Quote | ' | |
| SyntaxQuote | ` | |
| Unquote | ~ | |
| UnquoteSplice | ~@ | |
| Deref | @ | |
| Meta | ^ | |
| HashFn | #( | |
| HashSet | #{ | |
| HashVar | #' | |
| HashDiscard | #_ | |
| Regex(String) | #"[a-z]+" | raw pattern; no escape processing |
| ReaderCond | #? | |
| ReaderCondSplice | #?@ | |
| Symbolic(String) | ##Inf, ##NaN | stores suffix after ## |
| TaggedLiteral(String) | #inst, #uuid | stores tag name without # |
| Eof | (none) | end-of-file sentinel |
lexer::Lexer
A byte-oriented, UTF-8-safe tokenizer. Tracks byte position, 1-based line, and
1-based byte column so every token carries a precise Span.
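
To illustrate the position bookkeeping described above, here is a minimal sketch. The `Pos` struct and `position_of` function are hypothetical names invented for this example; they are not the crate's `Span` type or any part of the `Lexer` API, whose exact fields are not shown in this document.

```rust
/// Hypothetical position record (NOT the real `Span` from `cljrs-types`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pos {
    pub offset: usize, // byte offset from the start of the source
    pub line: u32,     // 1-based line number
    pub col: u32,      // 1-based byte column within the line
}

/// Walks the source byte by byte and returns the position of byte `target`.
pub fn position_of(src: &str, target: usize) -> Pos {
    let mut pos = Pos { offset: 0, line: 1, col: 1 };
    for &b in src.as_bytes().iter().take(target) {
        pos.offset += 1;
        if b == b'\n' {
            // A newline starts the next line at column 1.
            pos.line += 1;
            pos.col = 1;
        } else {
            // Columns count bytes, so a multi-byte UTF-8 character
            // advances the column by more than one.
            pos.col += 1;
        }
    }
    pos
}
```

Because columns count bytes rather than characters, `position_of("λx", 2)` reports column 3 for `x` in this sketch: the two-byte `λ` occupies columns 1 and 2.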
Whitespace and comment handling
- ASCII spaces, tabs, carriage returns, newlines, and commas are skipped.
- ; through end-of-line is a line comment.
- #! at the very start of the file (byte offset 0) is a shebang; the rest of that line is skipped.
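
The skipping rules above can be sketched roughly as follows. `skip_trivia` is a hypothetical free function written for this example; the real `Lexer` presumably keeps this logic in a method with its own state, which this document does not show.

```rust
/// Returns the byte offset of the first significant byte at or after `pos`.
/// Hypothetical helper; not part of the crate's public API.
pub fn skip_trivia(src: &str, mut pos: usize) -> usize {
    let bytes = src.as_bytes();
    // A shebang is only recognized at byte offset 0.
    if pos == 0 && bytes.starts_with(b"#!") {
        while pos < bytes.len() && bytes[pos] != b'\n' {
            pos += 1;
        }
    }
    while pos < bytes.len() {
        match bytes[pos] {
            // Whitespace and commas are skipped one byte at a time.
            b' ' | b'\t' | b'\r' | b'\n' | b',' => pos += 1,
            // A line comment runs up to (not including) the newline,
            // which the whitespace arm then consumes.
            b';' => {
                while pos < bytes.len() && bytes[pos] != b'\n' {
                    pos += 1;
                }
            }
            _ => break,
        }
    }
    pos
}
```

All skipped bytes are ASCII, so the returned offset always lands on a UTF-8 character boundary.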
Number parsing rules
- +/- are only routed to the number path when immediately followed by an ASCII digit; otherwise they lex as symbols.
- 3/foo lexes as Int(3) then Symbol("/foo"), not a ratio: the / is only consumed as part of a ratio when the character immediately after it is a digit.
- Radix literals: NNrDIGITS where NN is 2–36. Overflow of i64 yields BigInt.
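
A rough sketch of these three rules follows. The `Num` enum and all three function names are hypothetical stand-ins for this example, not the crate's `Token` type or lexer internals. It handles unsigned radix literals only and, as a simplifying assumption, accumulates in `u128`, so it returns `None` for values beyond that range, where a real lexer would need arbitrary precision.

```rust
/// Hypothetical result type mirroring the `Int` / `BigInt` token variants.
#[derive(Debug, PartialEq)]
pub enum Num {
    Int(i64),
    BigInt(String), // decimal digits
}

/// `+` / `-` start a number only when an ASCII digit follows immediately.
pub fn starts_number(s: &str) -> bool {
    let b = s.as_bytes();
    match b.first().copied() {
        Some(c) if c.is_ascii_digit() => true,
        Some(b'+') | Some(b'-') => b.get(1).map_or(false, |c| c.is_ascii_digit()),
        _ => false,
    }
}

/// After an integer, `/` continues a ratio only if a digit follows it.
pub fn slash_continues_ratio(rest: &str) -> bool {
    let b = rest.as_bytes();
    b.first() == Some(&b'/') && b.get(1).map_or(false, |c| c.is_ascii_digit())
}

/// Parses an unsigned `NNrDIGITS` literal with radix 2 to 36.
pub fn parse_radix(text: &str) -> Option<Num> {
    let (radix_txt, digits) = text.split_once('r')?;
    let radix: u32 = radix_txt.parse().ok()?;
    if !(2..=36).contains(&radix) || digits.is_empty() {
        return None;
    }
    let mut acc: u128 = 0;
    for ch in digits.chars() {
        // `to_digit` rejects characters that are not valid in this radix.
        let d = ch.to_digit(radix)? as u128;
        acc = acc.checked_mul(radix as u128)?.checked_add(d)?;
    }
    Some(if acc <= i64::MAX as u128 {
        Num::Int(acc as i64)
    } else {
        // Too large for i64: keep the value as decimal digits.
        Num::BigInt(acc.to_string())
    })
}
```

Under these assumptions `parse_radix("16rFF")` gives `Some(Num::Int(255))`, while `parse_radix("16rFFFFFFFFFFFFFFFF")` overflows `i64` and gives `Some(Num::BigInt("18446744073709551615".to_string()))`.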
form::Form / form::FormKind
The reader AST. Every Form carries a Span for diagnostics.
PartialEq on Form ignores spans — equality tests compare only FormKind.
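
Span-insensitive equality of this kind is typically written as a manual `PartialEq` impl. The sketch below assumes a simplified `Span` and a three-variant `FormKind`; the real field names and variant set are not listed in this document, so treat all of them as hypothetical.

```rust
/// Simplified stand-in for the real `Span` (fields are assumptions).
#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// A tiny illustrative subset of form kinds (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum FormKind {
    Int(i64),
    Symbol(String),
    List(Vec<Form>),
}

#[derive(Debug, Clone)]
pub struct Form {
    pub kind: FormKind,
    pub span: Span,
}

// Manual impl: compare only `kind`, never `span`. The derived
// `PartialEq` on `FormKind` calls back into this impl for nested
// forms, so spans are ignored at every depth.
impl PartialEq for Form {
    fn eq(&self, other: &Self) -> bool {
        self.kind == other.kind
    }
}
```

This keeps parser tests short: an expected form can be built with any placeholder span and still compare equal to the parsed result.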
parser::Parser
A recursive-descent parser that consumes (Token, Span) pairs from a Lexer
and produces Form nodes.
#_ discard semantics
#_ consumes itself plus the next form and produces nothing. Discards can be
chained: [#_ #_ 1 2 3] → [2, 3] (outer #_ discards the #_ 1 group,
leaving 2 and 3).
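
One way to get the behavior described above is to let a discard consume the next item, whatever it is, and yield nothing. The sketch below works on a hypothetical two-variant token type and returns bare integers; `Tok`, `read_item`, and `read_all` are invented for illustration and are not the parser's actual structure.

```rust
/// Hypothetical miniature token type: only `#_` and integers.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Discard,
    Int(i64),
}

/// Reads one item starting at `*pos`. Returns `None` when the item
/// was a discard group (or when the input is exhausted).
pub fn read_item(toks: &[Tok], pos: &mut usize) -> Option<i64> {
    let tok = toks.get(*pos)?.clone();
    *pos += 1;
    match tok {
        Tok::Discard => {
            // Consume the next item, possibly another discard group,
            // and drop whatever it produced.
            let _ = read_item(toks, pos);
            None
        }
        Tok::Int(n) => Some(n),
    }
}

/// Reads every item, keeping only those that produced a value.
pub fn read_all(toks: &[Tok]) -> Vec<i64> {
    let mut pos = 0;
    let mut out = Vec::new();
    while pos < toks.len() {
        if let Some(n) = read_item(toks, &mut pos) {
            out.push(n);
        }
    }
    out
}
```

With this sketch, the tokens for `#_ #_ 1 2 3` read as `[2, 3]`, matching the chained example in the text.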
Reader conditionals
All branches of #?(…) and #?@(…) are parsed and stored as
FormKind::ReaderCond { splicing, clauses } with a flat clauses vec. The
evaluator is responsible for filtering by :rust.
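
As a sketch of what that evaluator-side filtering might look like, assume the flat `clauses` vec alternates feature keyword and branch form. The `Node` type and `select` function below are hypothetical stand-ins, not the crate's `Form` or any evaluator API, and the sketch does not model a fallback key, which the text above does not describe.

```rust
/// Hypothetical miniature form type for this example.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Keyword(String), // leading `:` stripped, as in the token table
    Int(i64),
}

/// Walks the flat clause list two items at a time and returns the
/// branch whose feature keyword matches `feature`.
pub fn select<'a>(clauses: &'a [Node], feature: &str) -> Option<&'a Node> {
    clauses.chunks_exact(2).find_map(|pair| match &pair[0] {
        Node::Keyword(k) if k.as_str() == feature => Some(&pair[1]),
        _ => None,
    })
}
```

For the clauses of `#?(:clj 1 :rust 2)`, `select(&clauses, "rust")` returns the `Int(2)` branch, and an unknown feature returns `None`.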
Error construction
On any read or parse error the crate produces a CljxError::ReadError
containing the offending Span and the full source text, which miette uses to
render a pointed diagnostic in the terminal.
Re-exports from lib.rs
pub use form::{Form, FormKind};
pub use lexer::Lexer;
pub use parser::Parser;
pub use token::Token;
Dependencies
| Crate | Role |
|---|---|
| cljrs-types (workspace) | Span, CljxError, CljxResult |
| miette (workspace) | NamedSource used in error construction |