mdbook-treesitter
An mdBook preprocessor that uses tree-sitter to extract code snippets directly from source files and embed them in your book.
Directives in your Markdown are replaced with the text extracted by the named query. Wrap a directive in a fenced code block to get syntax highlighting:
```rust
{{ #treesitter src/lib.rs#doc_comment?name=MyStruct }}
```
Installation
cargo
Nix flakes
Add this repository as a flake input and include the package in your dev shell:
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    mdbook-ts.url = "https://codeberg.org/landre/mdbook-treesitter/archive/v0.2.0.tar.gz";
  };
  outputs = { nixpkgs, mdbook-ts, ... }: let
    system = "x86_64-linux";
    pkgs = nixpkgs.legacyPackages.${system};
  in {
    devShells.${system}.default = pkgs.mkShell {
      packages = [ pkgs.mdbook mdbook-ts.packages.${system}.default ];
    };
  };
}
Alternatively, use the provided overlay to make pkgs.mdbook-treesitter available:
nixpkgs.overlays = [ mdbook-ts.overlays.default ];
Then add the preprocessor to your book.toml:
[]
Development
To test changes to the preprocessor without reinstalling it, point command at cargo run:
[]
command = "cargo run --manifest-path /path/to/mdbook-treesitter/Cargo.toml --"
Language support
Rust, TOML, and Markdown parsers are bundled out of the box.
Additional parsers can be loaded from a shared library:
[]
= "/path/to/tree-sitter-python.so" # absolute or relative to book.toml
Defining queries
Queries live entirely in book.toml, so changing them requires no recompile.
Tree-sitter queries
A plain string is treated as a tree-sitter S-expression query. Captures whose names match directive parameters are used as filters; the remaining captures are the output.
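The filter-versus-output rule can be modelled in a few lines of plain Rust. This is only a sketch of the behaviour described above, not the preprocessor's actual code: the `Capture` type and `select_output` function are hypothetical names, and exact string equality for filtering is an assumption.

```rust
use std::collections::HashMap;

/// One capture from a query match: the capture name and the matched source text.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone)]
pub struct Capture {
    pub name: String,
    pub text: String,
}

/// Keeps the matches whose parameter-named captures equal the parameter values
/// (assumed: exact string equality), then returns the text of the remaining
/// captures in match order.
pub fn select_output(
    matches: &[Vec<Capture>],
    params: &HashMap<String, String>,
) -> Vec<String> {
    let mut out = Vec::new();
    for m in matches {
        // A capture named like a directive parameter acts as a filter.
        let passes = m.iter().all(|c| match params.get(&c.name) {
            Some(expected) => &c.text == expected,
            None => true,
        });
        if !passes {
            continue;
        }
        // Every capture that is not a filter contributes to the output.
        for c in m.iter().filter(|c| !params.contains_key(&c.name)) {
            out.push(c.text.clone());
        }
    }
    out
}
```

With `?name=MyStruct`, a match whose `@name` capture is `Foo` is dropped, and only the `@doc_comment` text of the `MyStruct` match is emitted.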
[]
# Captures doc-comment lines immediately before a struct, with at most one attribute (e.g. #[derive(…)]) in between.
= """
[
((line_comment)+ @doc_comment
.
(struct_item name: (type_identifier) @name))
((line_comment)+ @doc_comment
.
(attribute_item)
.
(struct_item name: (type_identifier) @name))
]"""
# Full struct declaration.
= "(struct_item name: (type_identifier) @name) @struct"
# Individual field declarations — no surrounding `pub struct Name { }`.
= """
(struct_item
name: (type_identifier) @name
body: (field_declaration_list
(field_declaration) @field))
"""
Strip regex
Add strip to remove a regex pattern from every output line — useful for
stripping comment delimiters to get plain prose:
[]
= """
[
((line_comment)+ @doc_comment
.
(struct_item name: (type_identifier) @name))
((line_comment)+ @doc_comment
.
(attribute_item)
.
(struct_item name: (type_identifier) @name))
]"""
strip = "^///? ?"
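To see what the pattern `^///? ?` removes, here is a hand-written Rust equivalent for that one pattern. It is a sketch for illustration and assumes the pattern is applied once at the start of each line; the function names are hypothetical, and the preprocessor itself presumably uses a regex engine rather than code like this.

```rust
/// Hand-written equivalent of the regex `^///? ?` for a single line:
/// removes a leading `//`, then an optional third `/`, then an optional space.
pub fn strip_comment_prefix(line: &str) -> &str {
    let rest = match line.strip_prefix("//") {
        Some(rest) => rest,
        // The pattern is anchored, so lines not starting with `//` are untouched.
        None => return line,
    };
    let rest = rest.strip_prefix('/').unwrap_or(rest);
    rest.strip_prefix(' ').unwrap_or(rest)
}

/// Applies the prefix removal to every line of the extracted text.
pub fn strip_lines(text: &str) -> String {
    text.lines()
        .map(strip_comment_prefix)
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because the pattern is anchored with `^`, an indented comment such as `    /// text` is left unchanged in this sketch.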
jq queries
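For complex extractions, write a jq filter applied to the tree-sitter AST serialised as JSON. As the example filter in this section shows, the input exposes the directive parameters under .params and the root node's children under .children; nodes carry type and text fields, plus children where present.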
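The logic of the example filter in this section can be easier to follow when written imperatively. The Rust sketch below mirrors it step by step under some assumptions: `Node`, `leaf`, and `doc_comment_for` are hypothetical names, the field `kind` stands in for the JSON key `type`, and a missing struct is reported as `None` instead of a jq error.

```rust
/// Minimal stand-in for one serialised AST node (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Node {
    pub kind: String, // the JSON key `type`
    pub text: String,
    pub children: Vec<Node>,
}

/// Convenience constructor for a node without children.
pub fn leaf(kind: &str, text: &str) -> Node {
    Node { kind: kind.to_string(), text: text.to_string(), children: Vec::new() }
}

/// Returns the comment lines directly above the struct named `target`,
/// joined with newlines, or `None` if no such struct exists.
pub fn doc_comment_for(top_level: &[Node], target: &str) -> Option<String> {
    // Index of the first struct_item with a type_identifier child named `target`.
    let idx = top_level.iter().position(|n| {
        n.kind == "struct_item"
            && n.children
                .iter()
                .any(|c| c.kind == "type_identifier" && c.text == target)
    })?;
    // Last node before the struct that is neither a comment nor an attribute;
    // everything after it belongs to the struct's leading block.
    let start = top_level[..idx]
        .iter()
        .rposition(|n| n.kind != "line_comment" && n.kind != "attribute_item")
        .map_or(0, |gap| gap + 1);
    // Keep only the comment lines and drop one trailing newline from each.
    let lines: Vec<&str> = top_level[start..idx]
        .iter()
        .filter(|n| n.kind == "line_comment")
        .map(|n| n.text.strip_suffix('\n').unwrap_or(n.text.as_str()))
        .collect();
    Some(lines.join("\n"))
}
```

Unlike the two-alternative S-expression query, this approach tolerates any number of attributes between the comment and the struct, which is the kind of case where a jq filter pays off.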
[]
= "jq"
= """
.params.name as $target_name |
.children as $all |
([$all | to_entries[] |
select(
.value.type == "struct_item" and
(.value.children[]? | select(.type == "type_identifier") | .text) == $target_name
)
] | .[0].key) as $idx |
if $idx == null then error("struct not found")
else
([$all[0:$idx] | to_entries[] |
select(.value.type != "line_comment" and .value.type != "attribute_item")
] | if length > 0 then last.key else -1 end) as $last_gap |
[$all[($last_gap+1):$idx][] |
select(.type == "line_comment") | .text | rtrimstr("\\n")] |
join("\\n")
end
"""
Directive syntax
{{ #treesitter <path>[#<query>][?<param>=<value>[&…]] }}
| Part | Description |
|---|---|
| `<path>` | Path to the source file, relative to the chapter's directory |
| `#<query>` | Named query from book.toml. Omit to embed the whole file. |
| `?<param>=<value>` | Parameters forwarded to the query (e.g. `?name=MyStruct`) |
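As an illustration of how the three parts fit together, the following Rust sketch splits the text between `{{` and `}}` into path, query, and parameters. It is not taken from the preprocessor: `Directive` and `parse_directive` are hypothetical names, and details such as URL-decoding of values or handling of the `\` escape are left out.

```rust
use std::collections::BTreeMap;

/// Parsed form of `#treesitter <path>[#<query>][?<param>=<value>[&…]]`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct Directive {
    pub path: String,
    pub query: Option<String>,
    pub params: BTreeMap<String, String>,
}

/// Parses the text between `{{` and `}}`. Returns `None` if it is not a
/// `#treesitter` directive, the path is empty, or a parameter lacks `=`.
pub fn parse_directive(inner: &str) -> Option<Directive> {
    let rest = inner.trim().strip_prefix("#treesitter")?;
    // Require whitespace after the keyword so `#treesitterfoo` is rejected.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let rest = rest.trim();
    // Split off the parameter list first, so a `#` inside a parameter value
    // is not mistaken for the query separator.
    let (target, param_str) = match rest.split_once('?') {
        Some((t, p)) => (t, p),
        None => (rest, ""),
    };
    let (path, query) = match target.split_once('#') {
        Some((p, q)) => (p, Some(q.to_string())),
        None => (target, None),
    };
    if path.is_empty() {
        return None;
    }
    let mut params = BTreeMap::new();
    for pair in param_str.split('&').filter(|s| !s.is_empty()) {
        let (key, value) = pair.split_once('=')?;
        params.insert(key.to_string(), value.to_string());
    }
    Some(Directive { path: path.to_string(), query, params })
}
```

For `#treesitter src/lib.rs#doc_comment?name=MyStruct` this yields the path `src/lib.rs`, the query `doc_comment`, and one parameter `name` with value `MyStruct`; without `#<query>` the `query` field is `None`, matching the "embed the whole file" case.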
Prefix the directive with \ to emit it literally without expansion:
\{{ #treesitter src/lib.rs#doc_comment?name=Foo }}