poshtree
Lossless PowerShell parsing for Rust. Tokenize or parse a script, walk or rewrite the result, and get the exact source back.
poshtree keeps every byte of the input. The lexer attaches whitespace,
newlines, and comments to the tokens as trivia, so reconstructing the token
stream returns the original source byte-for-byte, malformed input included.
A native recursive-descent parser sits on top and builds a tree whose every
node carries a byte span and a token range. Broken input becomes error nodes
instead of a failed parse, so there is always a tree to work with. That
combination makes it a practical base for formatters, linters, codemods, and
editor tooling. It has no dependencies.
Install
[]
= "0.1"
Or point at a local checkout:
[]
= { = "../poshtree" }
Items live under the v2 module and are used path-qualified; nothing is
re-exported at the crate root.
Lossless tokens
Whitespace, newlines, and comments ride along as trivia on the tokens, and
reconstruct glues them back into the original source.
use ;
let src = "get-wmiobject Win32_BIOS # keep this comment\n";
let out = lex;
assert_eq!; // byte-for-byte
// Minimal-diff rewriting: patch one token, leave the rest alone.
let edits: = out.tokens.iter
.filter
.map
.collect;
let fixed = apply_edits.unwrap;
assert_eq!;
Every token and trivia carries a byte Span, and a LineIndex maps an offset
to line and column. --% is handled in the lexer: the rest of the line
becomes one raw VerbatimArgs token. A few constructs lex more cohesively
than you might expect, with a path like C:\tmp or a dotted run like a.b.c
staying a single token; the module docs spell those out.
Parse and walk
parse returns the script tree plus any recoverable errors. Each node carries
a byte Span and a TokenRange, so a node can be sliced straight back to its
source.
use ;
let out = parse;
assert!;
out.script.walk;
The grammar covers pipelines and &&/|| chains, commands with
parameter-argument binding and redirections, every control-flow statement,
function/filter/workflow, class, enum, using,
trap/data/dynamicparam, the full expression layer, double-quoted string
interpolation parts, and Add-Type C# extraction (it pulls [DllImport]
signatures out of the inline C#, following a string through a variable
assignment when it has to). It runs against a broad corpus and is fuzzed, so
adversarial input recovers into error nodes rather than panicking.
Formatting
format_source is a width-aware formatter built on the lossless tokens.
let pretty = format_source?;
// "if ($x) {\n ls\n}\n"
It normalizes indentation, spacing, blank lines, backtick continuations, and
over-long lines, breaking them at pipes, chain operators, commas, and
brackets. Comments, here-strings, --% arguments, and token adjacency stay
byte-for-byte. It refuses input that has syntax errors, and before returning
it re-lexes and re-parses its own output to confirm the program is unchanged.
If that check fails you get an error instead of altered code.
Example: pascalize
The pascalize example is a small codemod. It parses with parse, finds
command names, and rewrites each to PascalCase through apply_edits, touching
only the name tokens and leaving comments, strings, arguments, and layout
intact.
$ cargo run --example pascalize # built-in demo
$ cargo run --example pascalize -- file.ps1
$ cat file.ps1 | cargo run --example pascalize -- -
Versioning
Breaking changes to the token or tree types ship as a new sibling version
module rather than mutating what is already published, so pinned code keeps
compiling. The current module is v2.
License
MIT. See LICENSE.