java-lang
A Java 25 AST parser written in Rust, with a syn-style API. Parses Java source code into a typed Abstract Syntax Tree following the Java Language Specification (JLS) SE 25.
Features
- Syn-style API: `Parse` trait and `ParseStream` modeled after Rust's `syn` crate
- Complete Java 25 coverage:
  - Records, sealed classes, pattern matching (type + record patterns)
  - Switch expressions, text blocks
  - Lambda expressions, method references
  - Try-with-resources, `var` type inference
  - Module system (JPMS)
  - Annotations with all element value forms
  - Generics with wildcards and bounds
  - Unnamed variables
- Comment parsing: doc comments (`/** */`, `///`) attached to declarations, regular comments (`//`, `/* */`) attached to statements, all comments available on `CompilationUnit`
- Span tracking: every AST node carries byte-offset span information
- Zero-copy identifiers: `Ident` with efficient string-only equality
- Minimal dependencies: only `thiserror` and `unicode-xid`
Quick Start
Add to your `Cargo.toml`:

```toml
[dependencies]
java-lang = "0.1.0"
```
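Parse Java source code from a string:

```rust
use java_lang::ast::compilation_unit::CompilationUnit; // path per the module table below
use java_lang::parse_str;

let unit: CompilationUnit = parse_str("package com.example; class Hello {}").unwrap();
if let Some(pkg) = &unit.package {
    // `pkg` is the `PackageDecl` for `com.example`
}
for item in &unit.type_decls {
    // each `item` is a top-level `TypeDecl` (here, `class Hello`)
}
```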
Comments
Doc comments (`/** ... */` and `/// ...`) are attached to the declaration they precede via a `doc_comment: Vec<Comment>` field. Regular comments (`// ...` and `/* ... */`) are attached to the statement they precede via `leading_comments: Vec<Comment>`. All comments (both doc and regular) are also collected in `CompilationUnit.comments` for full-source querying.
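```rust
use java_lang::ast::compilation_unit::CompilationUnit; // path per the module table below
use java_lang::parse_str;

let unit: CompilationUnit = parse_str("/** A greeter. */\nclass Hello {}\n// trailing note\n").unwrap();
// Doc comment on the class
if let Some(decl) = unit.type_decls.first() {
    // the class declaration's `doc_comment` field holds `/** A greeter. */`
}
// All comments in the file
for comment in &unit.comments {
    // visits both `/** A greeter. */` and `// trailing note`
}
```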
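A minimal, hypothetical model of the comment handling described above: the cursor skips comment tokens in `peek`/`advance` but buffers them, so a declaration parser can drain the doc comments that preceded it while regular comments stay buffered for statements. `Tok`, `Cursor`, and `take_doc_comments` are illustrative names only; the crate's own methods are `collect_pending_doc_comments()` and `collect_pending_comments()`, whose exact signatures are not shown here.

```rust
// Toy model (hypothetical types, not java-lang's internals): comment tokens are
// skipped by `peek`/`advance` but buffered for later attachment.
use std::cell::{Cell, RefCell};

#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Word(String),
    LineComment(String), // `// ...`
    DocComment(String),  // `/** ... */` or `/// ...`
}

pub struct Cursor {
    toks: Vec<Tok>,
    pos: Cell<usize>,
    pending: RefCell<Vec<Tok>>, // comments skipped since the last collection
}

impl Cursor {
    pub fn new(toks: Vec<Tok>) -> Self {
        Cursor { toks, pos: Cell::new(0), pending: RefCell::new(Vec::new()) }
    }

    /// Move past any comment tokens, buffering each one in order.
    fn skip_comments(&self) {
        while let Some(t) = self.toks.get(self.pos.get()) {
            match t {
                Tok::Word(_) => break,
                _ => {
                    self.pending.borrow_mut().push(t.clone());
                    self.pos.set(self.pos.get() + 1);
                }
            }
        }
    }

    /// Next non-comment token, without consuming it.
    pub fn peek(&self) -> Option<&Tok> {
        self.skip_comments();
        self.toks.get(self.pos.get())
    }

    /// Consume and return the next non-comment token.
    pub fn advance(&self) -> Option<&Tok> {
        self.skip_comments();
        let t = self.toks.get(self.pos.get())?;
        self.pos.set(self.pos.get() + 1);
        Some(t)
    }

    /// Drain buffered doc comments; regular comments stay buffered for statements.
    pub fn take_doc_comments(&self) -> Vec<String> {
        let mut docs = Vec::new();
        self.pending.borrow_mut().retain(|t| match t {
            Tok::DocComment(text) => {
                docs.push(text.clone());
                false
            }
            _ => true,
        });
        docs
    }
}
```

A declaration parser in this model would call `peek()` to see the leading keyword, then `take_doc_comments()` to grab the doc comments that sat in front of it.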
AST Traversal
Walk the parsed tree by matching on AST node variants:
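```rust
use java_lang::ast::item::TypeDecl; // path per the module table below
for type_decl in &unit.type_decls {
    // match on the `TypeDecl` variants (class, interface, enum, record, ...) here
}
```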
API Overview
Entry Points
| Function | Description |
|---|---|
| `parse_str<T: Parse>(s: &str) -> Result<T>` | Parse a string; error on trailing tokens |
| `parse<T: Parse>(s: &str) -> Result<T>` | Parse a string; ignore trailing tokens |
| `parse_file<T: Parse>(path: &Path) -> Result<T>` | Parse a Java file from disk |
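The difference between `parse_str` and `parse` comes down to what happens after the requested node has been parsed. The sketch below models it with a toy whitespace-split token stream; `Stream`, `Word`, `parse_prefix`, and `parse_all` are hypothetical names for illustration and are not part of java-lang's API.

```rust
// Toy model (not java-lang's code) of the two entry-point behaviours:
// `parse_all` rejects leftover tokens, `parse_prefix` ignores them.
use std::cell::Cell;

pub struct Stream {
    tokens: Vec<String>,
    pos: Cell<usize>, // interior mutability, so parsers only need `&Stream`
}

impl Stream {
    pub fn new(src: &str) -> Self {
        Stream { tokens: src.split_whitespace().map(String::from).collect(), pos: Cell::new(0) }
    }

    pub fn advance(&self) -> Option<&str> {
        let i = self.pos.get();
        let tok = self.tokens.get(i)?;
        self.pos.set(i + 1);
        Some(tok.as_str())
    }

    pub fn is_empty(&self) -> bool {
        self.pos.get() >= self.tokens.len()
    }
}

pub trait Parse: Sized {
    fn parse(input: &Stream) -> Result<Self, String>;
}

/// A single word token.
#[derive(Debug, PartialEq)]
pub struct Word(pub String);

impl Parse for Word {
    fn parse(input: &Stream) -> Result<Self, String> {
        input
            .advance()
            .map(|t| Word(t.to_string()))
            .ok_or_else(|| "unexpected end of input".to_string())
    }
}

/// Analogue of `parse`: parse one `T`, ignore anything left over.
pub fn parse_prefix<T: Parse>(s: &str) -> Result<T, String> {
    T::parse(&Stream::new(s))
}

/// Analogue of `parse_str`: parse one `T`, then require the input to be exhausted.
pub fn parse_all<T: Parse>(s: &str) -> Result<T, String> {
    let stream = Stream::new(s);
    let value = T::parse(&stream)?;
    if stream.is_empty() {
        Ok(value)
    } else {
        Err("unexpected trailing tokens".to_string())
    }
}
```

Under this model, `parse_prefix::<Word>("foo bar")` succeeds with `foo`, while `parse_all::<Word>("foo bar")` fails because `bar` is left over.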
Key Types
| Type | Description |
|---|---|
| `CompilationUnit` | Top-level AST node for a Java source file |
| `ParseStream` | Token cursor with combinators (`parse_ident`, `parse_braced`, etc.) |
| `Parse` | Trait: implement `fn parse(input: &ParseStream) -> Result<Self>` |
| `Ident` | Identifier with span; equality compares only the name string |
| `Span` | `(start, end)` byte offsets carried by every AST node |
| `Error` | Parse error with message and span |
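The `Ident` equality rule can be sketched as a manual `PartialEq` that ignores the span. This is an assumed layout for illustration only: the field names and the owned `String` are choices made here, not the crate's definition (which is described as zero-copy). `Hash` is written by hand so it stays consistent with equality.

```rust
// Sketch with an assumed layout (not the crate's definition): an identifier
// that carries a span but compares equal on its name alone.
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize, // byte offset of the first character
    pub end: usize,   // byte offset one past the last character
}

#[derive(Debug, Clone)]
pub struct Ident {
    pub name: String,
    pub span: Span,
}

impl PartialEq for Ident {
    fn eq(&self, other: &Self) -> bool {
        // Spans are deliberately ignored: `foo` at offset 0 equals `foo` at offset 40.
        self.name == other.name
    }
}

impl Eq for Ident {}

impl Hash for Ident {
    // Must agree with `PartialEq`, so only the name is hashed.
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
    }
}

impl PartialEq<&str> for Ident {
    // Convenience comparison against a plain string, e.g. `ident == "record"`.
    fn eq(&self, other: &&str) -> bool {
        self.name == *other
    }
}
```

Ignoring spans in equality lets identifiers from different source locations be compared or used as map keys directly, while the span remains available for error reporting.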
AST Module Structure
| Module | Key Types |
|---|---|
| `ast` | `Comment`, `CommentKind` |
| `ast::compilation_unit` | `CompilationUnit`, `PackageDecl`, `ImportDecl` |
| `ast::item` | `TypeDecl`, `ClassDecl`, `InterfaceDecl`, `EnumDecl`, `RecordDecl`, `MethodDecl`, `ConstructorDecl`, `Modifier` |
| `ast::expr` | `Expr` (20 variants), `MethodCallExpr`, `LambdaExpr`, `SwitchExpr` |
| `ast::stmt` | `Stmt` (19 variants), `Block`, `IfStmt`, `ForStmt`, `TryStmt`, `SwitchStmt` |
| `ast::ty` | `Type`, `PrimitiveType`, `ReferenceType`, `ArrayType` |
| `ast::path` | `Path`, `PathSegment`, `TypeArguments` |
| `ast::lit` | `Lit`: `Int`, `Float`, `Bool`, `Char`, `Str`, `Null` |
| `ast::op` | `BinOp`, `AssignOp`, `UnaryOp` |
| `ast::pat` | `Pattern`, `TypePattern`, `RecordPattern`, `Guard` |
| `ast::attribute` | `Annotation`, `ElementValuePair`, `ElementValue` |
| `ast::generics` | `TypeParameters`, `TypeParameter`, `TypeBound` |
Architecture
The crate has three layers:
```text
Source text → Lexer → Vec<Token> → ParseStream → AST nodes
            (lexer.rs)              (parse.rs)   (ast/*)
```
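To make the first layer concrete, here is a deliberately tiny lexer sketch covering identifiers, `//` comments, and single-character punctuation. It is not the crate's `lexer.rs`, and its `TokenKind` variants are hypothetical, but it shows the two properties described for the lexing layer: every token carries a `(start, end)` byte-offset span, and comments are emitted as tokens rather than dropped.

```rust
// Minimal lexer sketch (hypothetical, far smaller than a real Java lexer).
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Ident(String),
    LineComment(String),
    Punct(char),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: (usize, usize), // (start, end) byte offsets into the source
}

pub fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if src[start..].starts_with("//") {
            // Line comment: runs up to (not including) the next newline.
            let end = src[start..].find('\n').map_or(src.len(), |i| start + i);
            let text = src[start..end].to_string();
            tokens.push(Token { kind: TokenKind::LineComment(text), span: (start, end) });
            while chars.peek().map_or(false, |&(i, _)| i < end) {
                chars.next();
            }
        } else if c.is_alphabetic() || c == '_' || c == '$' {
            // Identifier; Java's identifier rules are approximated with `char` classes.
            let mut end = start;
            while let Some(&(i, ch)) = chars.peek() {
                if ch.is_alphanumeric() || ch == '_' || ch == '$' {
                    end = i + ch.len_utf8(); // byte offsets, so multi-byte chars count fully
                    chars.next();
                } else {
                    break;
                }
            }
            let text = src[start..end].to_string();
            tokens.push(Token { kind: TokenKind::Ident(text), span: (start, end) });
        } else {
            chars.next();
            tokens.push(Token { kind: TokenKind::Punct(c), span: (start, start + c.len_utf8()) });
        }
    }
    tokens
}
```

Because spans are byte offsets, `&src[start..end]` always recovers a token's exact source text, even for non-ASCII identifiers.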
- Lexing: `lexer.rs` tokenizes Java source into `Vec<Token>`. Each token holds a `TokenKind` and a `Span`. Comment tokens (`LineComment`, `BlockComment`, `DocLineComment`, `DocBlockComment`) are emitted into the token stream.
- Parsing infrastructure: `parse.rs` defines the `Parse` trait and `ParseStream` (an interior-mutable token cursor). It supports speculative parsing via `try_parse` with save/restore of the cursor state. `ParseStream` transparently skips comment tokens in `peek()`/`advance()`, buffering them for later collection via `collect_pending_doc_comments()` and `collect_pending_comments()`.
- AST types: `src/ast/` contains all typed AST nodes. The parser (`parser.rs`, ~4000 lines) is a recursive descent parser with a Pratt parser for expressions (10 precedence levels).
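The Pratt parsing technique used for expressions can be sketched with a few of Java's binary operators. The binding powers, the whitespace-separated token format, and the S-expression output are assumptions for illustration; the crate's real table has 10 precedence levels and builds `Expr` nodes rather than strings.

```rust
// Pratt-parser sketch (hypothetical, not the crate's parser.rs). Each binary
// operator gets (left, right) binding powers; only 4 precedence levels are shown.
fn binding_power(op: &str) -> Option<(u8, u8)> {
    // Left-associative operators use (n, n + 1).
    match op {
        "||" => Some((1, 2)),
        "&&" => Some((3, 4)),
        "+" | "-" => Some((5, 6)),
        "*" | "/" | "%" => Some((7, 8)),
        _ => None,
    }
}

/// Parses `tokens[*pos..]` into an S-expression string such as `(+ 1 (* 2 3))`.
pub fn parse_expr(tokens: &[&str], pos: &mut usize, min_bp: u8) -> Result<String, String> {
    // Prefix position: an operand or a parenthesized sub-expression.
    let mut lhs = match tokens.get(*pos) {
        Some(&"(") => {
            *pos += 1;
            let inner = parse_expr(tokens, pos, 0)?;
            if tokens.get(*pos) != Some(&")") {
                return Err("expected `)`".to_string());
            }
            *pos += 1;
            inner
        }
        Some(tok) if binding_power(tok).is_none() && *tok != ")" => {
            *pos += 1;
            tok.to_string()
        }
        other => return Err(format!("unexpected token {:?}", other)),
    };
    // Infix loop: fold operators whose left binding power is at least `min_bp`.
    while let Some(&op) = tokens.get(*pos) {
        let Some((l_bp, r_bp)) = binding_power(op) else { break };
        if l_bp < min_bp {
            break;
        }
        *pos += 1;
        let rhs = parse_expr(tokens, pos, r_bp)?;
        lhs = format!("({op} {lhs} {rhs})");
    }
    Ok(lhs)
}

/// Parses a whole whitespace-separated expression, rejecting leftover tokens.
pub fn parse_expression(src: &str) -> Result<String, String> {
    let tokens: Vec<&str> = src.split_whitespace().collect();
    let mut pos = 0;
    let expr = parse_expr(&tokens, &mut pos, 0)?;
    if pos != tokens.len() {
        return Err("unexpected trailing tokens".to_string());
    }
    Ok(expr)
}
```

For example, `parse_expression("1 - 2 - 3")` yields `(- (- 1 2) 3)`: giving each operator a right binding power one higher than its left makes it left-associative, and adding a precedence level is just another row in `binding_power`.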
Design Decisions
- Numeric literals are stored as raw strings (preserving hex, binary, octal, underscores, and suffixes) to avoid precision loss.
- `>>`/`>>>` splitting: a `ParseStream.pending_gts` counter handles nested generics like `List<Map<String, Integer>>`.
- `non-sealed` modifier: lexed as three tokens (`Ident("non")`, `Minus`, `Sealed`) and handled by the modifier parser with lookahead.
- Contextual keywords (`record`, `sealed`, `var`, `yield`, `permits`, `non-sealed`, `when`) can be used as identifiers in most contexts.
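The `pending_gts` idea can be sketched as follows: the lexer emits `>>` and `>>>` as single shift tokens, and when the type-argument parser needs one `>` but finds a shift token, it consumes that token and records the leftover `>` characters in a counter that later `>` expectations drain first. `Tok`, `TypeParser`, and `parse_type_tokens` are hypothetical names; only the counter mechanism comes from the description above.

```rust
// Sketch of `>>` / `>>>` splitting with a pending-`>` counter (hypothetical types).
use std::cell::Cell;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok { Lt, Gt, Shr, UShr, Comma, Ident(&'static str) }

pub struct TypeParser {
    toks: Vec<Tok>,
    pos: Cell<usize>,
    pending_gts: Cell<u32>, // `>` characters left over from a split `>>` / `>>>`
}

impl TypeParser {
    fn peek(&self) -> Option<Tok> { self.toks.get(self.pos.get()).copied() }
    fn bump(&self) { self.pos.set(self.pos.get() + 1); }

    /// Consume exactly one `>`, splitting `>>` or `>>>` when needed.
    fn expect_gt(&self) -> Result<(), String> {
        if self.pending_gts.get() > 0 {
            self.pending_gts.set(self.pending_gts.get() - 1);
            return Ok(());
        }
        match self.peek() {
            Some(Tok::Gt) => { self.bump(); Ok(()) }
            Some(Tok::Shr) => { self.bump(); self.pending_gts.set(1); Ok(()) }
            Some(Tok::UShr) => { self.bump(); self.pending_gts.set(2); Ok(()) }
            other => Err(format!("expected `>`, found {:?}", other)),
        }
    }

    /// Type := Ident [ `<` Type { `,` Type } `>` ]
    pub fn parse_type(&self) -> Result<String, String> {
        let name = match self.peek() {
            Some(Tok::Ident(n)) => { self.bump(); n.to_string() }
            other => return Err(format!("expected type name, found {:?}", other)),
        };
        if self.pending_gts.get() == 0 && self.peek() == Some(Tok::Lt) {
            self.bump();
            let mut args = vec![self.parse_type()?];
            // A pending `>` means an inner list already closed: stop looking for commas.
            while self.pending_gts.get() == 0 && self.peek() == Some(Tok::Comma) {
                self.bump();
                args.push(self.parse_type()?);
            }
            self.expect_gt()?;
            return Ok(format!("{name}<{}>", args.join(", ")));
        }
        Ok(name)
    }
}

/// Parse a complete type, rejecting leftover tokens or unmatched `>`.
pub fn parse_type_tokens(toks: Vec<Tok>) -> Result<String, String> {
    let p = TypeParser { toks, pos: Cell::new(0), pending_gts: Cell::new(0) };
    let ty = p.parse_type()?;
    if p.pending_gts.get() != 0 || p.peek().is_some() {
        return Err("unexpected trailing tokens or `>`".to_string());
    }
    Ok(ty)
}
```

With this approach the lexer never needs to know whether it is inside a generic type; the split happens only where the parser actually expects a closing `>`, so `a >> 2` in an expression still lexes as a single shift.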
Testing
Run all tests:
Run a single test:
Run the example:
Android SDK integration test (requires ANDROID_HOME env var):
License
Licensed under either of Apache License, Version 2.0 or MIT License at your option.