legacy-basic-tree-sitter
Tree-sitter grammars for 80s BASIC dialects: Microsoft BASIC 2.0, GW-BASIC, and QBasic.
Built for code analysis of legacy BASIC programs — migration, transpilation, documentation, and static analysis.
Grammars
| Crate | Dialect | Line Numbers | Key Features |
|---|---|---|---|
| tree-sitter-msbasic2 | Microsoft BASIC 2.0 | Required | Core statements, GOTO/GOSUB, FOR/NEXT, DEF FN |
| tree-sitter-gwbasic | GW-BASIC | Required | File I/O, graphics, sound, ON ERROR, WHILE/WEND, event trapping |
| tree-sitter-qbasic | QBasic | Optional | SUB/FUNCTION, block IF, SELECT CASE, DO/LOOP, TYPE, labels |
Each dialect is a strict superset of the previous: MS BASIC 2.0 < GW-BASIC < QBasic.
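Because the dialects form a chain, choosing a grammar comes down to picking the largest dialect a program needs. The sketch below models that chain with a derived ordering. The `Dialect` enum and its helper functions are hypothetical names invented for illustration and are not part of any of the three crates.

```rust
/// The three dialects, declared from smallest to largest so that the derived
/// ordering mirrors the chain MS BASIC 2.0 < GW-BASIC < QBasic.
/// (Hypothetical helper type, not exported by the crates.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Dialect {
    MsBasic2,
    GwBasic,
    QBasic,
}

impl Dialect {
    /// Name of the crate that provides this dialect's grammar.
    pub fn crate_name(self) -> &'static str {
        match self {
            Dialect::MsBasic2 => "tree-sitter-msbasic2",
            Dialect::GwBasic => "tree-sitter-gwbasic",
            Dialect::QBasic => "tree-sitter-qbasic",
        }
    }

    /// True if the grammar for `self` should accept a program written in
    /// `other`, given the superset relationship stated above.
    pub fn accepts(self, other: Dialect) -> bool {
        self >= other
    }
}

/// Smallest dialect whose grammar covers every dialect in `needed`.
/// Returns `None` for an empty slice.
pub fn smallest_covering(needed: &[Dialect]) -> Option<Dialect> {
    needed.iter().copied().max()
}
```

For a code base that mixes MS BASIC 2.0 and GW-BASIC sources, `smallest_covering` returns `Dialect::GwBasic`, so one `tree-sitter-gwbasic` dependency would be enough under this model.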
Installation
Rust (crates.io)
# Pick the dialect(s) you need:
[dependencies]
tree-sitter-msbasic2 = "0.1"
tree-sitter-gwbasic = "0.1"
tree-sitter-qbasic = "0.1"
Usage
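use tree_sitter_qbasic::LANGUAGE;

let mut parser = tree_sitter::Parser::new();
parser
    .set_language(&LANGUAGE.into())
    .expect("failed to load the QBasic grammar");

let source = r#"
SUB Hello (name$)
PRINT "Hello, "; name$
END SUB
CALL Hello("World")
"#;

let tree = parser.parse(source, None).unwrap();
let root = tree.root_node();
// "program" is an assumed root rule name; check grammars/qbasic/grammar.js.
assert_eq!(root.kind(), "program");
// A clean parse contains no ERROR or MISSING nodes.
assert!(!root.has_error());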
Node.js
The package.json at the repo root registers all three grammars for the tree-sitter CLI.
Building from Source
Prerequisites
- Rust (1.70+)
- Node.js (18+)
- tree-sitter CLI: npm install -g tree-sitter-cli
Commands
# Single dialect
&& &&
Repository Structure
legacy-basic-tree-sitter/
  grammars/
    common/common.js         # Shared grammar rules (expressions, core statements)
    msbasic2/grammar.js      # MS BASIC 2.0 grammar
    gwbasic/grammar.js       # GW-BASIC grammar
    qbasic/grammar.js        # QBasic grammar
  crates/
    tree-sitter-msbasic2/    # Rust crate for MS BASIC 2.0
    tree-sitter-gwbasic/     # Rust crate for GW-BASIC
    tree-sitter-qbasic/      # Rust crate for QBasic
  examples/                  # Sample programs per dialect
  Cargo.toml                 # Workspace root
  Makefile                   # Build automation
Each grammar directory contains:
- grammar.js: Grammar definition (imports shared rules from common.js)
- tree-sitter.json: Tree-sitter CLI configuration
- queries/highlights.scm: Syntax highlighting queries
- test/corpus/*.txt: Test cases in tree-sitter's native format
- src/: Generated C parser (produced by tree-sitter generate)
Supported Language Features
All Dialects (Common Core)
- Expressions: Full operator precedence (14 levels) covering arithmetic, string concatenation, relational, and logical operators (AND, OR, NOT, XOR, EQV, IMP)
- Literals: Integers, floats (with !/#/D suffixes), hex (&H), octal (&O), strings
- Variables: Type sigils ($, %, !, #), array subscripts
- Statements: PRINT/?, LET, INPUT, READ/DATA/RESTORE, GOTO, GOSUB/RETURN, ON...GOTO/GOSUB, FOR/NEXT, IF/THEN/ELSE, DIM, DEF FN, POKE, REM, END, STOP
- Functions: 30+ built-in functions (LEFT$, MID$, INT, RND, CHR$, ASC, LEN, VAL, ABS, SQR, trig functions, etc.), user-defined FN functions
- Structure: Multi-statement lines with the : separator, case-insensitive keywords
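When post-processing parsed identifiers and literals, the type sigils and the `&H`/`&O` prefixes listed above usually need to be decoded. The following is a minimal sketch using only the Rust standard library. `VarType`, `var_type`, and `parse_radix_literal` are hypothetical helpers rather than anything exported by the grammars, and 16-bit integer wraparound is deliberately not modeled.

```rust
/// Data type implied by the trailing sigil of a BASIC variable name.
/// (Hypothetical helper type for downstream analysis.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VarType {
    Str,     // name$
    Integer, // name%
    Single,  // name!
    Double,  // name#
    Default, // no sigil: the type comes from the dialect's default rules
}

/// Classify a variable name by its last character.
pub fn var_type(name: &str) -> VarType {
    match name.chars().last() {
        Some('$') => VarType::Str,
        Some('%') => VarType::Integer,
        Some('!') => VarType::Single,
        Some('#') => VarType::Double,
        _ => VarType::Default,
    }
}

/// Parse digits in the given radix, rejecting empty input and sign characters.
fn parse_digits(digits: &str, radix: u32) -> Option<i64> {
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_alphanumeric()) {
        return None;
    }
    i64::from_str_radix(digits, radix).ok()
}

/// Decode `&H` (hex) and `&O` (octal) integer literals, case-insensitively.
/// Returns the plain unsigned value; anything else yields `None`.
pub fn parse_radix_literal(text: &str) -> Option<i64> {
    let upper = text.trim().to_ascii_uppercase();
    if let Some(digits) = upper.strip_prefix("&H") {
        parse_digits(digits, 16)
    } else if let Some(digits) = upper.strip_prefix("&O") {
        parse_digits(digits, 8)
    } else {
        None
    }
}
```

With these helpers, `var_type("name$")` yields `VarType::Str` and `parse_radix_literal("&HFF")` yields `Some(255)`.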
GW-BASIC Additions
- File I/O: OPEN, CLOSE, GET, PUT, PRINT#, INPUT#, LINE INPUT#, WRITE#, FIELD, LSET, RSET
- Graphics: SCREEN, LINE, CIRCLE, DRAW, PAINT, PSET/PRESET, COLOR, PALETTE, VIEW, WINDOW
- Sound: PLAY, SOUND, BEEP
- Screen: KEY, LOCATE, CLS, WIDTH
- Memory: DEF SEG, BLOAD, BSAVE
- Error Handling: ON ERROR GOTO, RESUME
- Control Flow: WHILE/WEND, SWAP, LINE INPUT
- Event Trapping: ON TIMER GOSUB, ON KEY GOSUB
- Comments: ' (apostrophe) as REM alias
QBasic Additions
- Structured Control Flow: Block IF/ELSEIF/ELSE/END IF, SELECT CASE, DO/LOOP (with WHILE/UNTIL), EXIT FOR/DO/SUB/FUNCTION
- Procedures: SUB/END SUB, FUNCTION/END FUNCTION, DECLARE, CALL, STATIC modifier
- User-Defined Types: TYPE/END TYPE, member access with .
- Scoping: CONST, SHARED, STATIC, DIM SHARED, DIM x AS type
- Labels: Named labels (MyLabel:) as GOTO/GOSUB targets (line numbers optional)
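Because QBasic accepts both line numbers and named labels as jump targets, analysis tools built on the parse tree often need to tell the two apart. The sketch below classifies the text of a `GOTO`/`GOSUB` target. `JumpTarget` and `classify_target` are hypothetical names. The label rule (a letter followed by letters or digits, compared case-insensitively) is a simplifying assumption and not the grammar's actual definition.

```rust
/// A jump target as it might appear after GOTO or GOSUB.
/// (Hypothetical helper type for downstream analysis.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum JumpTarget {
    LineNumber(u32),
    Label(String),
}

/// Classify target text as a line number or a named label.
/// Returns `None` for text that fits neither simplified pattern.
pub fn classify_target(text: &str) -> Option<JumpTarget> {
    let text = text.trim();
    if text.is_empty() {
        return None;
    }
    // All digits: treat as a classic line number.
    if text.bytes().all(|b| b.is_ascii_digit()) {
        return text.parse::<u32>().ok().map(JumpTarget::LineNumber);
    }
    // Assumed label shape: a letter followed by letters or digits.
    let mut chars = text.chars();
    let first = chars.next()?;
    if first.is_ascii_alphabetic() && chars.all(|c| c.is_ascii_alphanumeric()) {
        // Normalise case so that MyLabel and MYLABEL compare equal.
        return Some(JumpTarget::Label(text.to_ascii_uppercase()));
    }
    None
}
```

Here `classify_target("100")` gives `JumpTarget::LineNumber(100)`, while `classify_target("MyLabel")` gives a `JumpTarget::Label` holding the upper-cased name.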
Error Recovery
All three grammars use tree-sitter's built-in error recovery. Unrecognized tokens produce ERROR nodes, while the surrounding valid code still parses into a normal syntax tree. This makes the grammars suitable for parsing incomplete, hand-typed, or OCR-scanned programs.
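One lightweight way to triage a batch of damaged programs is to count recovery markers in the S-expression dump of each tree, such as the string returned by `Node::to_sexp()` in the Rust bindings. The sketch below does this with plain string matching. `RecoveryStats` and `recovery_stats` are hypothetical names, and the approach assumes no grammar rule has a name beginning with `ERROR` or `MISSING`.

```rust
/// Counts of recovery markers found in a tree-sitter S-expression dump.
/// (Hypothetical helper type, not part of the crates.)
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct RecoveryStats {
    /// Number of `(ERROR ...)` nodes: spans the parser could not match.
    pub error_nodes: usize,
    /// Number of `(MISSING ...)` nodes: tokens the parser inserted to recover.
    pub missing_nodes: usize,
}

impl RecoveryStats {
    /// True when the dump contains no recovery markers at all.
    pub fn is_clean(&self) -> bool {
        self.error_nodes == 0 && self.missing_nodes == 0
    }
}

/// Scan an S-expression dump for recovery markers.
/// This is a textual heuristic, not a structural walk of the tree.
pub fn recovery_stats(sexp: &str) -> RecoveryStats {
    RecoveryStats {
        error_nodes: sexp.matches("(ERROR").count(),
        missing_nodes: sexp.matches("(MISSING").count(),
    }
}
```

For finer detail, such as the position of each problem, walking the tree and checking `Node::is_error()` and `Node::is_missing()` on each node is the more precise route. The string scan is only meant for quick pass/fail sorting.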
Publishing to crates.io
License
MIT