poshtree
Lossless PowerShell parsing for Rust. Tokenize or parse a script, walk or rewrite the result, and get the exact source back.
poshtree keeps every byte of the input. The lexer attaches whitespace,
newlines, and comments to the tokens as trivia, so reconstructing the token
stream returns the original source byte-for-byte, malformed input included.
A native recursive-descent parser sits on top and builds a tree whose every
node carries a byte span and a token range. Broken input becomes error nodes
instead of a failed parse, so there is always a tree to work with. That
combination makes it a practical base for formatters, linters, codemods, and
editor tooling. It has no dependencies.
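To make the lossless idea concrete, here is a toy sketch that is not poshtree's lexer or API. The names `ToyToken`, `toy_lex`, and `toy_reconstruct` are hypothetical, and the sketch only treats whitespace as trivia (no comments, no PowerShell rules). It shows why attaching trivia to tokens is enough to get every byte back.

```rust
/// A toy token: the whitespace that preceded it is kept as leading trivia.
/// Hypothetical type for illustration, not a poshtree type.
#[derive(Debug, PartialEq)]
pub struct ToyToken<'a> {
    pub leading: &'a str,
    pub text: &'a str,
}

/// Split `src` into whitespace-delimited tokens without discarding a byte.
/// Trailing whitespace ends up as the trivia of a final empty-text token.
pub fn toy_lex(src: &str) -> Vec<ToyToken<'_>> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < src.len() {
        let rest = &src[pos..];
        // Leading trivia: everything up to the first non-whitespace char.
        let trivia_len = rest
            .find(|c: char| !c.is_whitespace())
            .unwrap_or(rest.len());
        let after = &rest[trivia_len..];
        // Token text: everything up to the next whitespace char.
        let text_len = after.find(char::is_whitespace).unwrap_or(after.len());
        tokens.push(ToyToken {
            leading: &rest[..trivia_len],
            text: &after[..text_len],
        });
        pos += trivia_len + text_len;
    }
    tokens
}

/// Concatenate trivia and text in order; this is the whole round trip.
pub fn toy_reconstruct(tokens: &[ToyToken<'_>]) -> String {
    let mut out = String::new();
    for t in tokens {
        out.push_str(t.leading);
        out.push_str(t.text);
    }
    out
}
```

Because every byte of the input lands in exactly one `leading` or `text` slice, `toy_reconstruct(&toy_lex(src))` equals `src` for any input string.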
Install
[dependencies]
poshtree = "0.2.2"
Or point at a local checkout:
[dependencies]
poshtree = { path = "../poshtree" }
Items live under the v2 module and are used path-qualified; nothing is
re-exported at the crate root.
Lossless tokens
Whitespace, newlines, and comments ride along as trivia on the tokens, and
reconstruct glues them back into the original source.
use ;
let src = "get-wmiobject Win32_BIOS # keep this comment\n";
let out = lex;
assert_eq!; // byte-for-byte
// Minimal-diff rewriting: patch one token, leave the rest alone.
let edits: = out.tokens.iter
.filter
.map
.collect;
let fixed = apply_edits.unwrap;
assert_eq!;
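The minimal-diff idea can be sketched without the crate. The code below is an illustration under stated assumptions, not poshtree's `apply_edits`: `ToyEdit` and `toy_apply_edits` are hypothetical names, and an edit is assumed to be a byte range plus replacement text. Everything outside the edited ranges is copied through untouched.

```rust
/// A hypothetical edit: replace the bytes `start..end` with `replacement`.
pub struct ToyEdit {
    pub start: usize,
    pub end: usize,
    pub replacement: String,
}

/// Apply non-overlapping edits to `src`, copying unedited bytes verbatim.
/// Returns an error for overlapping, inverted, or out-of-range edits, and
/// for ranges that would split a UTF-8 character.
pub fn toy_apply_edits(src: &str, edits: &[ToyEdit]) -> Result<String, String> {
    let mut sorted: Vec<&ToyEdit> = edits.iter().collect();
    sorted.sort_by_key(|e| e.start);

    let mut out = String::with_capacity(src.len());
    let mut cursor = 0; // first byte of `src` not yet copied
    for e in sorted {
        if e.start < cursor || e.end < e.start || e.end > src.len() {
            return Err(format!("bad or overlapping edit {}..{}", e.start, e.end));
        }
        if !src.is_char_boundary(e.start) || !src.is_char_boundary(e.end) {
            return Err(format!("edit {}..{} splits a character", e.start, e.end));
        }
        out.push_str(&src[cursor..e.start]); // untouched source
        out.push_str(&e.replacement);        // patched range
        cursor = e.end;
    }
    out.push_str(&src[cursor..]); // tail after the last edit
    Ok(out)
}
```

Patching only the range `0..13` of the example source changes the command name and leaves the argument, the comment, and the trailing newline exactly as written.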
Every token and trivia carries a byte Span, and a LineIndex maps an offset
to line and column. --% is handled in the lexer: the rest of the line
becomes one raw VerbatimArgs token. A few constructs lex more cohesively
than you might expect, with a path like C:\tmp or a dotted run like a.b.c
staying a single token; the module docs spell those out.
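For readers unfamiliar with the offset-to-position mapping, here is one common way to build it. This is a sketch, not poshtree's `LineIndex`: `ToyLineIndex` and `line_col` are hypothetical names, lines and columns are assumed to be zero-based, and columns are counted in bytes.

```rust
/// Hypothetical line index: remembers the byte offset where each line starts.
pub struct ToyLineIndex {
    line_starts: Vec<usize>,
}

impl ToyLineIndex {
    /// Scan the source once and record the offset after every `\n`.
    pub fn new(src: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// Map a byte offset to a zero-based (line, byte column) pair
    /// with a binary search over the recorded line starts.
    pub fn line_col(&self, offset: usize) -> (usize, usize) {
        let line = match self.line_starts.binary_search(&offset) {
            Ok(i) => i,      // offset is exactly the start of line i
            Err(i) => i - 1, // offset falls inside the preceding line
        };
        (line, offset - self.line_starts[line])
    }
}
```

Building the index is a single pass over the source, and each lookup is a binary search, so mapping many spans to positions stays cheap even on large scripts.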
Parse and walk
parse returns the script tree plus any recoverable errors. Each node carries
a byte Span and a TokenRange, so a node can be sliced straight back to its
source.
use ;
let out = parse;
assert!;
out.script.walk;
The grammar covers pipelines and &&/|| chains, commands with
parameter-argument binding and redirections, every control-flow statement,
function/filter/workflow, class, enum, using,
trap/data/dynamicparam, the full expression layer, double-quoted string
interpolation parts, and Add-Type C# extraction (it pulls [DllImport]
signatures out of the inline C#, following a string through a variable
assignment when it has to). It runs against a broad corpus and is fuzzed, so
adversarial input recovers into error nodes rather than panicking.
C# in Add-Type
Add-Type embeds C# inside a PowerShell string, and a type or method defined
there is used back in PowerShell as ordinary syntax: [Win32],
[Win32]::Beep(...), New-Object Win32. The optional csharp feature parses
that C# into its own lossless tree, resolves it (scopes, shadowing, and
references), and connects the two languages, so a rename moves both sides at
once.
[dependencies]
poshtree = { version = "0.2.2", features = ["csharp"] }
use ;
use rename_type;
let src = "Add-Type -TypeDefinition @'\npublic class Win32 { }\n'@\n[Win32]::Beep(800, 200)\n$h = New-Object Win32\n";
let out = parse;
// Renames the C# declaration and every PowerShell use in one pass.
let edits = rename_type;
let fixed = apply_edits.unwrap;
// class NativeMethods ... [NativeMethods]::Beep(800, 200) ... New-Object NativeMethods
rename_member does the same for a member and its [Type]::Member call sites,
and rename_csharp_field, _method, _local, and _parameter rename within
the C# alone. Resolution is single-file and case-correct, since C# is
case-sensitive and PowerShell is not. A member access is renamed only when its
receiver can mean that member: this.Length, or a static Type.Length, but
never an unrelated other.Length whose type is unknown. With the feature on,
the [DllImport] extraction above also reads from this parse rather than the
fallback scanner. It adds no dependencies.
Formatting
format_source is a width-aware formatter built on the lossless tokens.
let pretty = format_source?;
// "if ($x) {\n ls\n}\n"
It normalizes indentation, spacing, blank lines, backtick continuations, and
over-long lines, breaking them at pipes, chain operators, commas, and
brackets. Comments, here-strings, --% arguments, and token adjacency stay
byte-for-byte. It refuses input that has syntax errors, and before returning
it re-lexes and re-parses its own output to confirm the program is unchanged.
If that check fails you get an error instead of altered code.
Examples
The examples/ directory has three runnable programs.
pascalize is a small codemod on the v2 layer. It parses with parse, finds
command names, and rewrites each to PascalCase through apply_edits, touching
only the name tokens and leaving comments, strings, arguments, and layout
intact.
$ cargo run --example pascalize # built-in demo
$ cargo run --example pascalize -- file.ps1
$ cat file.ps1 | cargo run --example pascalize -- -
pinvoke-report uses the csharp feature to read the C# in each Add-Type
block and print its [DllImport] signatures and declared types, methods, and
fields. It only reads the script.
$ cargo run --features csharp --example pinvoke-report # built-in demo
$ cargo run --features csharp --example pinvoke-report -- file.ps1
rename-native uses the csharp feature to rename a C# type or member across
both the Add-Type block and its PowerShell call sites in one pass.
$ cargo run --features csharp --example rename-native # built-in demo
$ cat file.ps1 | cargo run --features csharp --example rename-native -- type Win32 NativeApi
$ cat file.ps1 | cargo run --features csharp --example rename-native -- member Win32 MessageBox ShowMessage
Versioning
Breaking changes to the token or tree types ship as a new sibling version
module rather than mutating what is already published, so pinned code keeps
compiling. The current module is v2.
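The sibling-module pattern can be pictured with a small sketch. The module contents here are invented for illustration and are not poshtree's types: a hypothetical `v1` token gains span fields in `v2`, and the old module is left in place so that code written against it still compiles.

```rust
/// Hypothetical older module: tokens carry only their text.
pub mod v1 {
    #[derive(Debug, PartialEq)]
    pub struct Token {
        pub text: String,
    }

    pub fn lex(src: &str) -> Vec<Token> {
        src.split_whitespace()
            .map(|t| Token { text: t.to_string() })
            .collect()
    }
}

/// Hypothetical newer module: the token type changed shape (added spans),
/// so it ships beside `v1` instead of replacing it.
pub mod v2 {
    #[derive(Debug, PartialEq)]
    pub struct Token {
        pub text: String,
        pub start: usize,
        pub end: usize,
    }

    pub fn lex(src: &str) -> Vec<Token> {
        let mut out = Vec::new();
        let mut start = None;
        for (i, c) in src.char_indices() {
            match (c.is_whitespace(), start) {
                (false, None) => start = Some(i),
                (true, Some(s)) => {
                    out.push(Token { text: src[s..i].to_string(), start: s, end: i });
                    start = None;
                }
                _ => {}
            }
        }
        if let Some(s) = start {
            out.push(Token { text: src[s..].to_string(), start: s, end: src.len() });
        }
        out
    }
}
```

A caller that names `v1::Token` keeps working after `v2` is published, and migrating is an explicit change of path rather than a silent change of type.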
License
MIT. See LICENSE.