mdbook-ts 0.2.6

An mdBook preprocessor that uses tree-sitter to extract code snippets from source files
Documentation

mdbook-treesitter

Crates.io docs.rs status-badge License

An mdBook preprocessor that uses tree-sitter to extract code snippets directly from source files and embed them in your book.

Directives in your Markdown are replaced with the text extracted by the named query — wrap them in a fenced code block to get syntax highlighting:

```rust
{{ #treesitter src/lib.rs#doc_comment?name=MyStruct }}
```

Installation

cargo

cargo install mdbook-ts

Nix flakes

Add this repository as a flake input and include the package in your dev shell:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    mdbook-ts.url = "https://codeberg.org/landre/mdbook-treesitter/archive/v0.2.0.tar.gz";
  };

  outputs = { nixpkgs, mdbook-ts, ... }: let
    system = "x86_64-linux";
    pkgs = nixpkgs.legacyPackages.${system};
  in {
    devShells.${system}.default = pkgs.mkShell {
      packages = [ pkgs.mdbook mdbook-ts.packages.${system}.default ];
    };
  };
}

Alternatively, use the provided overlay to make pkgs.mdbook-treesitter available:

nixpkgs.overlays = [ mdbook-ts.overlays.default ];

Then add the preprocessor to your book.toml:

[preprocessor.treesitter]

Development

To iterate on queries without reinstalling, point command at cargo run:

[preprocessor.treesitter]
command = "cargo run --manifest-path /path/to/mdbook-treesitter/Cargo.toml --"

Language support

Rust, TOML, and Markdown parsers are bundled out of the box.

Additional parsers can be loaded from a shared library:

[preprocessor.treesitter.python]
parser = "/path/to/tree-sitter-python.so"  # absolute or relative to book.toml

Defining queries

Queries live entirely in book.toml — no recompile needed when you change them.

Tree-sitter queries

A plain string is treated as a tree-sitter S-expression query. Captures whose names match directive parameters are used as filters; the remaining captures are the output.

[preprocessor.treesitter.rust.queries]
# Captures doc-comment lines immediately before a struct (0 or 1 #[derive(…)]).
doc_comment = """
[
  ((line_comment)+ @doc_comment
   .
   (struct_item name: (type_identifier) @name))
  ((line_comment)+ @doc_comment
   .
   (attribute_item)
   .
   (struct_item name: (type_identifier) @name))
]"""

# Full struct declaration.
struct = "(struct_item name: (type_identifier) @name) @struct"

# Individual field declarations — no surrounding `pub struct Name { }`.
struct_fields = """
(struct_item
  name: (type_identifier) @name
  body: (field_declaration_list
    (field_declaration) @field))
"""

Strip regex

Add strip to remove a regex pattern from every output line — useful for stripping comment delimiters to get plain prose:

[preprocessor.treesitter.rust.queries.comment_text]
query = """
[
  ((line_comment)+ @doc_comment
   .
   (struct_item name: (type_identifier) @name))
  ((line_comment)+ @doc_comment
   .
   (attribute_item)
   .
   (struct_item name: (type_identifier) @name))
]"""
strip = "^///? ?"

jq queries

For complex extractions, write a jq filter applied to the tree-sitter AST serialised as JSON. The filter receives:

{ "params": { "name": "Foo",  }, "source": "", "children": [] }
[preprocessor.treesitter.rust.queries.doc_comment_jq]
format = "jq"
query = """
.params.name as $target_name |
.children as $all |
([$all | to_entries[] |
  select(
    .value.type == "struct_item" and
    (.value.children[]? | select(.type == "type_identifier") | .text) == $target_name
  )
] | .[0].key) as $idx |
if $idx == null then error("struct not found")
else
  ([$all[0:$idx] | to_entries[] |
    select(.value.type != "line_comment" and .value.type != "attribute_item")
  ] | if length > 0 then last.key else -1 end) as $last_gap |
  [$all[($last_gap+1):$idx][] |
    select(.type == "line_comment") | .text | rtrimstr("\\n")] |
  join("\\n")
end
"""

Directive syntax

{{ #treesitter <path>[#<query>][?<param>=<value>[&…]] }}
Part Description
<path> Path to the source file, relative to the chapter's directory
#<query> Named query from book.toml. Omit to embed the whole file.
?<param>=<value> Parameters forwarded to the query (e.g. ?name=MyStruct)

Prefix the directive with \ to emit it literally without expansion:

\{{ #treesitter src/lib.rs#doc_comment?name=Foo }}