tree-sitter-qbasic 0.2.0

Tree-sitter grammar for QBasic

legacy-basic-tree-sitter

Tree-sitter grammars for 80s BASIC dialects: Microsoft BASIC 2.0, GW-BASIC, and QBasic.

Built for code analysis of legacy BASIC programs — migration, transpilation, documentation, and static analysis.

Grammars

| Crate | Dialect | Line numbers | Key features |
|---|---|---|---|
| tree-sitter-msbasic2 | Microsoft BASIC 2.0 | Required | Core statements, GOTO/GOSUB, FOR/NEXT, DEF FN |
| tree-sitter-gwbasic | GW-BASIC | Required | File I/O, graphics, sound, ON ERROR, WHILE/WEND, event trapping |
| tree-sitter-qbasic | QBasic | Optional | SUB/FUNCTION, block IF, SELECT CASE, DO/LOOP, TYPE, labels |

Each dialect's grammar is a strict superset of the previous one: MS BASIC 2.0 ⊂ GW-BASIC ⊂ QBasic.
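Because the dialects nest, a tool that receives programs of unknown origin can pick the smallest grammar a program needs by taking the maximum over the features it uses. The sketch below is hypothetical: the `Dialect` enum, `introduced_in`, and its keyword table (a small subset of the feature lists under Supported Language Features) are illustrative assumptions and not part of any of the crates. A real tool would collect keywords from a parse tree rather than from a list of strings.

```rust
/// The three dialects, declared in nesting order so that the derived `Ord`
/// matches MS BASIC 2.0 ⊂ GW-BASIC ⊂ QBasic.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Dialect {
    MsBasic2,
    GwBasic,
    QBasic,
}

impl Dialect {
    /// Crate that provides the grammar for this dialect.
    fn crate_name(self) -> &'static str {
        match self {
            Dialect::MsBasic2 => "tree-sitter-msbasic2",
            Dialect::GwBasic => "tree-sitter-gwbasic",
            Dialect::QBasic => "tree-sitter-qbasic",
        }
    }

    /// Line numbers are mandatory in every dialect except QBasic.
    fn line_numbers_required(self) -> bool {
        self != Dialect::QBasic
    }
}

/// Hypothetical lookup: the first dialect that introduces a keyword.
/// Illustrative subset only; unknown words map to the common core.
fn introduced_in(keyword: &str) -> Dialect {
    match keyword.to_ascii_uppercase().as_str() {
        "WHILE" | "WEND" | "OPEN" | "CLOSE" | "CIRCLE" | "PLAY" | "RESUME" => Dialect::GwBasic,
        "SUB" | "FUNCTION" | "SELECT" | "DO" | "LOOP" | "TYPE" | "CONST" => Dialect::QBasic,
        _ => Dialect::MsBasic2,
    }
}

/// Since each dialect contains the previous one, the smallest grammar that
/// covers every keyword is the maximum of the per-keyword dialects.
fn minimal_dialect<'a>(keywords: impl IntoIterator<Item = &'a str>) -> Dialect {
    keywords
        .into_iter()
        .map(introduced_in)
        .max()
        .unwrap_or(Dialect::MsBasic2)
}
```

For example, `minimal_dialect(["PRINT", "while", "GOTO"])` returns `Dialect::GwBasic`, whose grammar is provided by `tree-sitter-gwbasic` and requires line numbers.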

Installation

Rust (crates.io)

# Pick the dialect(s) you need:
[dependencies]
tree-sitter-msbasic2 = "0.1"
tree-sitter-gwbasic = "0.1"
tree-sitter-qbasic = "0.1"

Usage

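// Requires the tree-sitter crate as a dependency alongside tree-sitter-qbasic.
use tree_sitter::Parser;
use tree_sitter_qbasic::LANGUAGE;

fn main() {
    let mut parser = Parser::new();
    // LANGUAGE converts into a tree_sitter::Language.
    parser
        .set_language(&LANGUAGE.into())
        .expect("Error loading QBasic grammar");

    let source = r#"
SUB Hello (name$)
  PRINT "Hello, "; name$
END SUB

CALL Hello("World")
"#;

    // Parse from scratch (no previous tree to reuse incrementally).
    let tree = parser.parse(source, None).unwrap();
    let root = tree.root_node();
    assert_eq!(root.kind(), "program");
    assert!(!root.has_error());
}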

Node.js

npm install tree-sitter-cli

The package.json at the repo root registers all three grammars for the tree-sitter CLI.

Building from Source

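Prerequisites

  • tree-sitter CLI
  • Node.js (tree-sitter generate evaluates grammar.js with it)
  • Rust toolchain (cargo)
  • make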

Commands

make all               # Generate parsers + run all tests
make generate          # Generate C parsers from grammar.js
make test-grammar      # Run tree-sitter test corpus
make build             # Generate + cargo build
make test              # Full test suite (tree-sitter + cargo)
make clean             # Remove generated files and build artifacts

# Single dialect
cd grammars/qbasic && tree-sitter generate && tree-sitter test
cargo test -p tree-sitter-qbasic

Repository Structure

legacy-basic-tree-sitter/
  grammars/
    common/common.js            # Shared grammar rules (expressions, core statements)
    msbasic2/grammar.js         # MS BASIC 2.0 grammar
    gwbasic/grammar.js          # GW-BASIC grammar
    qbasic/grammar.js           # QBasic grammar
  crates/
    tree-sitter-msbasic2/       # Rust crate for MS BASIC 2.0
    tree-sitter-gwbasic/        # Rust crate for GW-BASIC
    tree-sitter-qbasic/         # Rust crate for QBasic
  examples/                     # Sample programs per dialect
  Cargo.toml                    # Workspace root
  Makefile                      # Build automation

Each grammar directory contains:

  • grammar.js — Grammar definition (imports shared rules from common.js)
  • tree-sitter.json — Tree-sitter CLI configuration
  • queries/highlights.scm — Syntax highlighting queries
  • test/corpus/*.txt — Test cases in tree-sitter's native format
  • src/ — Generated C parser (produced by tree-sitter generate)

Supported Language Features

All Dialects (Common Core)

  • Expressions: Full operator precedence (14 levels) — arithmetic, string concatenation, relational, logical (AND, OR, NOT, XOR, EQV, IMP)
  • Literals: Integers, floats (with ! and # type suffixes and D double-precision exponents), hex (&H), octal (&O), strings
  • Variables: Type sigils ($, %, !, #), array subscripts
  • Statements: PRINT/?, LET, INPUT, READ/DATA/RESTORE, GOTO, GOSUB/RETURN, ON...GOTO/GOSUB, FOR/NEXT, IF/THEN/ELSE, DIM, DEF FN, POKE, REM, END, STOP
  • Functions: 30+ built-in functions (LEFT$, MID$, INT, RND, CHR$, ASC, LEN, VAL, ABS, SQR, trig functions, etc.), user-defined FN functions
  • Structure: Multi-statement lines with : separator, case-insensitive keywords
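The expression precedence can be illustrated with a small Pratt (binding-power) parser that prints an S-expression. This is a standalone sketch, not how the grammars are written (tree-sitter grammars declare precedence in grammar.js). It assumes whitespace-separated tokens and uses the operator order from the GW-BASIC manual, which gives 13 levels here and may not line up one-to-one with the 14 levels in common.js. Note that `NOT` binds more loosely than the relational operators, so `NOT C = D` means `NOT (C = D)`.

```rust
/// Prefix binding powers: NOT is looser than relational operators;
/// unary minus is tighter than * and / but looser than ^.
const NOT_BP: u8 = 6;
const NEG_BP: u8 = 12;

/// Binary operator binding powers, loosest (IMP) to tightest (^).
fn infix_bp(op: &str) -> Option<u8> {
    let bp = match op.to_ascii_uppercase().as_str() {
        "IMP" => 1,
        "EQV" => 2,
        "XOR" => 3,
        "OR" => 4,
        "AND" => 5,
        "=" | "<>" | "<" | ">" | "<=" | ">=" => 7,
        "+" | "-" => 8, // + also concatenates strings
        "MOD" => 9,
        "\\" => 10, // integer division
        "*" | "/" => 11,
        "^" => 13,
        _ => return None,
    };
    Some(bp)
}

struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.peek();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    /// Parse an expression whose binary operators all bind tighter than `min_bp`.
    fn expr(&mut self, min_bp: u8) -> String {
        let tok = self.bump().expect("unexpected end of expression");
        let mut lhs = match tok.to_ascii_uppercase().as_str() {
            "(" => {
                let inner = self.expr(0);
                assert_eq!(self.bump(), Some(")"), "expected ')'");
                inner
            }
            "-" => format!("(neg {})", self.expr(NEG_BP)),
            "NOT" => format!("(NOT {})", self.expr(NOT_BP)),
            _ => tok.to_string(), // literal or variable name
        };
        while let Some(op) = self.peek() {
            let bp = match infix_bp(op) {
                Some(bp) if bp > min_bp => bp,
                _ => break, // ')', end of input, or a looser operator
            };
            self.pos += 1;
            // Parsing the right side at `bp` makes equal levels left-associative.
            let rhs = self.expr(bp);
            lhs = format!("({} {} {})", op.to_ascii_uppercase(), lhs, rhs);
        }
        lhs
    }
}

/// Parse a whitespace-tokenized expression into an S-expression string.
fn parse(src: &str) -> String {
    let mut p = Parser { tokens: src.split_whitespace().collect(), pos: 0 };
    let out = p.expr(0);
    assert!(p.peek().is_none(), "trailing tokens after expression");
    out
}
```

For example, `parse("A OR B AND NOT C = D")` returns `(OR A (AND B (NOT (= C D))))`, and `parse("- 2 ^ 2")` returns `(neg (^ 2 2))`, reflecting BASIC's rule that exponentiation binds tighter than negation.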

GW-BASIC Additions

  • File I/O: OPEN, CLOSE, GET, PUT, PRINT#, INPUT#, LINE INPUT#, WRITE#, FIELD, LSET, RSET
  • Graphics: SCREEN, LINE, CIRCLE, DRAW, PAINT, PSET/PRESET, COLOR, PALETTE, VIEW, WINDOW
  • Sound: PLAY, SOUND, BEEP
  • Screen: KEY, LOCATE, CLS, WIDTH
  • Memory: DEF SEG, BLOAD, BSAVE
  • Error Handling: ON ERROR GOTO, RESUME
  • Control Flow: WHILE/WEND
  • Other Statements: SWAP, LINE INPUT
  • Event Trapping: ON TIMER GOSUB, ON KEY GOSUB
  • Comments: ' (apostrophe) as REM alias
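Splitting a line into statements is less trivial than it looks: a `:` inside a string literal is not a separator, and everything after `REM` or `'` is a remark. The following sketch shows the idea on raw text. `split_statements`, `is_rem`, and `push_trimmed` are hypothetical helpers, not part of the crates, and they simplify some tokenizer quirks (for instance, `REM` is only recognized here as a whole word).

```rust
/// True if `stmt` starts with the REM keyword (treated as a whole word).
fn is_rem(stmt: &str) -> bool {
    let s = stmt.trim_start();
    s.get(..3).map_or(false, |kw| kw.eq_ignore_ascii_case("REM"))
        && !s[3..].starts_with(|c: char| c.is_ascii_alphanumeric())
}

/// Push `s` without surrounding whitespace, skipping empty statements.
fn push_trimmed(stmts: &mut Vec<String>, s: &str) {
    let s = s.trim();
    if !s.is_empty() {
        stmts.push(s.to_string());
    }
}

/// Split one BASIC line into statements on `:`. Colons inside string
/// literals are ignored; a REM statement or an apostrophe remark runs to
/// the end of the line and is kept as the last item.
fn split_statements(line: &str) -> Vec<String> {
    let mut stmts = Vec::new();
    let mut rest = line;
    loop {
        if is_rem(rest) {
            push_trimmed(&mut stmts, rest);
            return stmts;
        }
        // Find the first ':' or '\'' that is not inside a string literal.
        let mut in_string = false;
        let mut stop = None;
        for (i, c) in rest.char_indices() {
            match c {
                '"' => in_string = !in_string,
                ':' | '\'' if !in_string => {
                    stop = Some((i, c));
                    break;
                }
                _ => {}
            }
        }
        match stop {
            Some((i, ':')) => {
                push_trimmed(&mut stmts, &rest[..i]);
                rest = &rest[i + 1..];
            }
            Some((i, _)) => {
                // Apostrophe remark: the code before it, then the remark itself.
                push_trimmed(&mut stmts, &rest[..i]);
                push_trimmed(&mut stmts, &rest[i..]);
                return stmts;
            }
            None => {
                push_trimmed(&mut stmts, rest);
                return stmts;
            }
        }
    }
}
```

For example, `split_statements("CLS : REM clear: the screen")` returns `["CLS", "REM clear: the screen"]`.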

QBasic Additions

  • Structured Control Flow: Block IF/ELSEIF/ELSE/END IF, SELECT CASE, DO/LOOP (with WHILE/UNTIL), EXIT FOR/DO/SUB/FUNCTION
  • Procedures: SUB/END SUB, FUNCTION/END FUNCTION, DECLARE, CALL, STATIC modifier
  • User-Defined Types: TYPE/END TYPE, member access with .
  • Scoping: CONST, SHARED, STATIC, DIM SHARED, DIM x AS type
  • Labels: Named labels (MyLabel:) as GOTO/GOSUB targets (line numbers optional)
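Because QBasic allows both line numbers and named labels, a tool scanning raw source has to tell a label such as `Retry:` apart from a statement followed by a `:` separator, such as `CLS:`. A minimal sketch, assuming labels consist of letters and digits and begin with a letter, and using a deliberately incomplete, hypothetical keyword list; when working through the grammar, this information comes from the parse tree instead.

```rust
/// What appears at the start of a QBasic source line.
#[derive(Debug, PartialEq)]
enum LineLabel<'a> {
    Number(u32),   // classic line number, e.g. "100 PRINT"
    Name(&'a str), // named label, e.g. "MyLabel:"
}

/// Hypothetical, incomplete keyword list: a keyword followed by ':' is a
/// statement plus separator, not a label.
const KEYWORDS: &[&str] = &["CLS", "BEEP", "END", "RETURN", "STOP", "PRINT", "RESUME"];

/// Return the line number or named label (if any) that starts `line`.
fn leading_label(line: &str) -> Option<LineLabel<'_>> {
    let s = line.trim_start();

    // A leading run of digits is a line number (no colon needed).
    let digits = s.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits > 0 {
        return s[..digits].parse::<u32>().ok().map(LineLabel::Number);
    }

    // Otherwise look for an identifier immediately followed by ':'.
    let ident_len = s
        .bytes()
        .enumerate()
        .take_while(|&(i, b)| b.is_ascii_alphabetic() || (i > 0 && b.is_ascii_digit()))
        .count();
    if ident_len == 0 || !s[ident_len..].starts_with(':') {
        return None;
    }
    let name = &s[..ident_len];
    if KEYWORDS.iter().any(|kw| kw.eq_ignore_ascii_case(name)) {
        return None; // e.g. "CLS: PRINT" is two statements
    }
    Some(LineLabel::Name(name))
}
```

With this sketch, `leading_label("100 PRINT")` yields `Some(LineLabel::Number(100))`, while `leading_label("CLS: PRINT")` yields `None`.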

Error Recovery

All three grammars use tree-sitter's built-in error recovery. Unrecognized tokens produce ERROR nodes while surrounding valid code still parses into a proper AST. This makes the grammars suitable for parsing incomplete, hand-typed, or OCR-scanned programs.

Publishing to crates.io

make publish-prep    # Copies generated C sources into each crate
cargo publish -p tree-sitter-msbasic2 --dry-run
cargo publish -p tree-sitter-gwbasic --dry-run
cargo publish -p tree-sitter-qbasic --dry-run

License

MIT