Expand description
A library for computing syntax-aware diffs using tree-sitter.
This crate provides a structural diff algorithm inspired by Difftastic. Unlike line-based diffs, it understands code structure and produces more meaningful results when comparing source files.
§Overview
The primary types in this crate are:
diff_trees: Computes a diff between two syntax trees, returning byte ranges of changed regions.build_tree: Converts a tree-sitter parse tree into aSyntaxTree.SyntaxTree: A syntax tree optimized for diffing, stored in preorder traversal.SyntaxDiffOptions: Configuration for the diff algorithm.
§Example: basic diffing
This example shows how to diff two Rust code snippets:
use syndiff::{build_tree, diff_trees};
let old_source = "fn add(a: i32, b: i32) -> i32 { a + b }";
let new_source = "fn add(x: i32, y: i32) -> i32 { x + y }";
// Parse with tree-sitter
let mut parser = tree_sitter::Parser::new();
parser.set_language(&tree_sitter_rust::LANGUAGE.into()).unwrap();
let old_ts_tree = parser.parse(old_source, None).unwrap();
let new_ts_tree = parser.parse(new_source, None).unwrap();
// Convert to syndiff trees
let old_tree = build_tree(old_ts_tree.walk(), old_source);
let new_tree = build_tree(new_ts_tree.walk(), new_source);
// Compute the diff
let (old_ranges, new_ranges) = diff_trees(
&old_tree,
&new_tree,
None,
None,
None,
).expect("diff succeeded");
// The ranges indicate which bytes changed
assert!(!old_ranges.is_empty());
assert!(!new_ranges.is_empty());
// Extract the changed text
for range in &old_ranges {
println!("removed: {:?}", &old_source[range.clone()]);
}
for range in &new_ranges {
println!("added: {:?}", &new_source[range.clone()]);
}§Example: limiting graph size
For very different files, the diff graph can grow large. Use
SyntaxDiffOptions to set a limit:
use syndiff::{diff_trees, SyntaxDiffOptions};
let options = SyntaxDiffOptions {
graph_limit: 100_000, // Lower limit for faster failure
};
match diff_trees(&old_tree, &new_tree, None, None, Some(options)) {
Some((old_ranges, new_ranges)) => {
println!("diff computed successfully");
}
None => {
println!("diff exceeded graph limit, files too different");
}
}§Example: partial diffs with range bounds
When diffing a specific region within larger files, provide byte range bounds. The returned ranges will be clipped and made relative to the bounds:
use syndiff::{build_tree, diff_trees};
let old_source = "fn foo() { 1 }\nfn bar() { 2 }";
let new_source = "fn foo() { 1 }\nfn bar() { 3 }";
let old_tree = build_tree(parser.parse(old_source, None).unwrap().walk(), old_source);
let new_tree = build_tree(parser.parse(new_source, None).unwrap().walk(), new_source);
// Only diff the second function (bytes 15 onwards)
let (old_ranges, new_ranges) = diff_trees(
&old_tree,
&new_tree,
Some(15..old_source.len()),
Some(15..new_source.len()),
None,
).unwrap();
// Ranges are relative to the bounds (starting from 0)
for range in &old_ranges {
let absolute_range = (range.start + 15)..(range.end + 15);
println!("changed in old: {:?}", &old_source[absolute_range]);
}§How it works
The algorithm is based on the approach described in the Difftastic manual, modeling tree diffing as a shortest-path problem through an implicit graph.
Each vertex represents a position in both syntax trees. Edges represent diff operations (unchanged, added, removed) with associated costs tuned to prefer matching identical subtrees. The algorithm finds the minimum-cost path using Dijkstra’s algorithm.
§Structural hashing
Each node stores a 64-bit hash of its structure: the grammar kind, content (for leaves), and children’s hashes. This enables O(1) subtree comparison - if two nodes have the same hash, their entire subtrees are identical and can be skipped in one step.
§Complexity
Let n and m be the number of nodes in the left and right trees.
- Time: O(n × m × log(n × m)) worst case
- Space: O(n × m)
In practice, structural hashing makes similar trees much faster - closer to O(n + m) - since large unchanged subtrees are skipped entirely.
The graph_limit parameter bounds memory usage by limiting vertices
explored, returning None if exceeded.
Structs§
- Syntax
Diff Options - Configuration for the diff algorithm.
- Syntax
Tree - A syntax tree stored as a Vec of nodes in preorder.
Functions§
- build_
tree - Builds a
SyntaxTreefrom a tree-sitter parse tree and source text. - diff_
trees - Computes a syntax-aware diff between two syntax trees.