cljrs-reader

Lexer (tokenizer) and recursive-descent parser for the clojurust language. Turns raw source text into a Form AST that the evaluator and compiler consume.

Phase: 2 — lexer and parser fully implemented.

File layout

src/
  lib.rs      — module declarations and re-exports
  token.rs    — Token enum: one variant per Clojure lexical form
  lexer.rs    — Lexer struct: byte-oriented, UTF-8-safe tokenizer
  form.rs     — Form struct + FormKind enum: the reader AST
  parser.rs   — Parser struct: recursive-descent parser + Iterator impl

Public API

`token::Token`

Every distinct lexical form the reader can produce:

Variant	Clojure source	Notes
`Nil`	`nil`
`Bool(bool)`	`true` / `false`
`Int(i64)`	`42`, `-7`, `16rFF`, `2r1010`	decimal or radix literal that fits `i64`
`BigInt(String)`	`42N`, overflowing radix	decimal digits; sign included when negative
`Float(f64)`	`3.14`, `1e10`, `1.5e-3`
`BigDecimal(String)`	`3.14M`	raw text without trailing `M`
`Ratio(String)`	`3/4`, `-1/2`	full text including `/`
`Char(char)`	`\a`, `\newline`, `\u0041`	named chars and `\uXXXX` resolved
`Str(String)`	`"hello\n"`	escape sequences fully processed
`Symbol(String)`	`foo`, `ns/name`, `/`, `..`
`Keyword(String)`	`:foo`, `:ns/name`	leading `:` stripped
`AutoKeyword(String)`	`::foo`, `::ns/alias`	leading `::` stripped
`LParen` / `RParen`	`(` / `)`
`LBracket` / `RBracket`	`[` / `]`
`LBrace` / `RBrace`	`{` / `}`
`Quote`	`'`
`SyntaxQuote`	`
`Unquote`	`~`
`UnquoteSplice`	`~@`
`Deref`	`@`
`Meta`	`^`
`HashFn`	`#(`
`HashSet`	`#{`
`HashVar`	`#'`
`HashDiscard`	`#_`
`Regex(String)`	`#"[a-z]+"`	raw pattern; no escape processing
`ReaderCond`	`#?`
`ReaderCondSplice`	`#?@`
`Symbolic(String)`	`##Inf`, `##NaN`	stores suffix after `##`
`TaggedLiteral(String)`	`#inst`, `#uuid`	stores tag name without `#`
`Eof`	—	end-of-file sentinel

`lexer::Lexer`

A byte-oriented, UTF-8-safe tokenizer. Tracks byte position, 1-based line, and 1-based byte column so every token carries a precise Span.

pub struct Lexer { /* private */ }

impl Lexer {
    /// Create a new lexer for `source` text from `file` (path or `"<repl>"`).
    pub fn new(source: String, file: String) -> Self

    /// Return the next `(Token, Span)` pair.
    /// Returns `Ok((Token::Eof, _))` at end of input.
    /// Returns `Err(CljxError::ReadError { … })` on invalid input.
    pub fn next_token(&mut self) -> CljxResult<(Token, Span)>

    pub fn source(&self) -> &Arc<String>
    pub fn file(&self) -> &Arc<String>
}

impl Iterator for Lexer {
    type Item = CljxResult<(Token, Span)>;
    // Yields None when next_token returns Token::Eof.
}

Whitespace and comment handling

ASCII spaces, tabs, carriage returns, newlines, and commas are skipped.
; through end-of-line is a line comment.
#! at the very start of the file (byte offset 0) is a shebang; the rest of that line is skipped.

Number parsing rules

+/- are only routed to the number path when immediately followed by an ASCII digit; otherwise they lex as symbols.
3/foo lexes as Int(3) then Symbol("/foo"), not a ratio — the / is only consumed as part of a ratio when the character immediately after it is a digit.
Radix literals: NNrDIGITS where NN is 2–36. Overflow of i64 yields BigInt.

`form::Form` / `form::FormKind`

The reader AST. Every Form carries a Span for diagnostics.

PartialEq on Form ignores spans — equality tests compare only FormKind.

#[derive(Debug, Clone)]
pub struct Form {
    pub kind: FormKind,
    pub span: Span,
}

impl Form {
    pub fn new(kind: FormKind, span: Span) -> Self
}

#[derive(Debug, Clone, PartialEq)]
pub enum FormKind {
    // Atoms
    Nil,
    Bool(bool),
    Int(i64),
    BigInt(String),
    Float(f64),
    BigDecimal(String),
    Ratio(String),
    Char(char),
    Str(String),
    Regex(String),
    Symbolic(f64),        // ##Inf→INFINITY  ##-Inf→NEG_INFINITY  ##NaN→NAN

    // Identifiers
    Symbol(String),
    Keyword(String),
    AutoKeyword(String),

    // Collections
    List(Vec<Form>),
    Vector(Vec<Form>),
    Map(Vec<Form>),       // flat key/value pairs; length always even
    Set(Vec<Form>),

    // Wrapping reader macros
    Quote(Box<Form>),
    SyntaxQuote(Box<Form>),
    Unquote(Box<Form>),
    UnquoteSplice(Box<Form>),
    Deref(Box<Form>),
    Var(Box<Form>),                      // #'symbol
    Meta(Box<Form>, Box<Form>),          // raw meta-form, annotated-form

    // Dispatch forms
    AnonFn(Vec<Form>),                   // #(...)
    TaggedLiteral(String, Box<Form>),    // #tag form

    // Reader conditionals — all branches kept; evaluator filters by :rust
    // clauses is flat: [keyword, form, keyword, form, …]
    ReaderCond { splicing: bool, clauses: Vec<Form> },
}

`parser::Parser`

A recursive-descent parser that consumes (Token, Span) pairs from a Lexer and produces Form nodes.

pub struct Parser { /* private */ }

impl Parser {
    /// Create a parser for `source` text labelled with `file`.
    pub fn new(source: String, file: String) -> Self

    /// Return the next form (skipping `#_` discards).
    /// Returns `Ok(None)` at EOF.
    pub fn parse_one(&mut self) -> CljxResult<Option<Form>>

    /// Parse all forms until EOF.
    pub fn parse_all(&mut self) -> CljxResult<Vec<Form>>
}

impl Iterator for Parser {
    type Item = CljxResult<Form>;
    // Yields None at EOF, Err on parse error.
}

`#_` discard semantics

#_ consumes itself plus the next form and produces nothing. Discards can be chained: [#_ #_ 1 2 3] → [2, 3] (outer #_ discards the #_ 1 group, leaving 2 and 3).

Reader conditionals

All branches of #?(…) and #?@(…) are parsed and stored as FormKind::ReaderCond { splicing, clauses } with a flat clauses vec. The evaluator is responsible for filtering by :rust.

Error construction

On any read or parse error the crate produces a CljxError::ReadError containing the offending Span and the full source text, which miette uses to render a pointed diagnostic in the terminal.

Re-exports from `lib.rs`

pub use form::{Form, FormKind};
pub use lexer::Lexer;
pub use parser::Parser;
pub use token::Token;

Dependencies

Crate	Role
`cljrs-types` (workspace)	`Span`, `CljxError`, `CljxResult`
`miette` (workspace)	`NamedSource` used in error construction

cljrs-reader 0.1.43