sem diff
┌─ src/auth/login.ts ──────────────────────────────────
│
│ ⊕ function validateToken [added]
│ ∆ function authenticateUser [modified]
│ ⊖ function legacyAuth [deleted]
│
└──────────────────────────────────────────────────────
┌─ config/database.yml ─────────────────────────────────
│
│ ∆ property production.pool_size [modified]
│ - 5
│ + 20
│
└──────────────────────────────────────────────────────
Summary: 1 added, 2 modified, 1 deleted across 2 files
Install
Or build from source (requires Rust):
Or grab a binary from GitHub Releases.
Or run via Docker:
Commands
Works in any Git repo. No setup required. Also works outside Git for arbitrary file comparison.
sem diff
Entity-level diff with rename detection, structural hashing, and word-level inline highlights.
# Semantic diff of working changes
# Staged changes only
# Specific commit
# Commit range
# Verbose mode (word-level inline diffs for each entity)
# Plain text output (git status style)
# JSON output (for AI agents, CI pipelines)
# Markdown output (for PRs, reports)
# Compare any two files (no git repo needed)
# Read file changes from stdin (no git repo needed)
|
# Only specific file types
sem impact
A cross-file dependency graph shows what breaks if an entity changes.
# Full impact analysis
# Direct dependencies only
# Direct dependents only
# Affected tests only
# JSON output
# Disambiguate by file
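To illustrate the idea behind impact analysis, here is a minimal Python sketch that walks a reverse dependency graph breadth-first to collect everything that transitively depends on a changed entity. It is not sem's implementation: the `file::entity` naming, the `DEPS` graph, and both function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict, deque


def build_reverse_graph(deps):
    """Invert {entity: [what it depends on]} into {entity: {its dependents}}."""
    reverse = defaultdict(set)
    for entity, targets in deps.items():
        for target in targets:
            reverse[target].add(entity)
    return reverse


def impacted_by(deps, changed):
    """Return every entity that transitively depends on `changed`, in BFS order."""
    reverse = build_reverse_graph(deps)
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        # Sort so the output is deterministic for a given graph.
        for dependent in sorted(reverse[current]):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Hypothetical graph: each key depends on the entities in its list.
DEPS = {
    "login.ts::validateToken": [],
    "login.ts::authenticateUser": ["login.ts::validateToken"],
    "routes.ts::loginHandler": ["login.ts::authenticateUser"],
    "login.test.ts::testLogin": ["routes.ts::loginHandler"],
}

if __name__ == "__main__":
    for entity in impacted_by(DEPS, "login.ts::validateToken"):
        print(entity)
```

Restricting the walk to the first BFS level would give direct dependents only, and filtering the result by a test-file naming convention would give affected tests.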
sem blame
Entity-level blame showing who last modified each function, class, or method.
# JSON output
sem log
Track how a single entity evolved through git history.
# Verbose mode (show content diff between versions)
# Limit commits scanned
# JSON output
sem entities
List all entities in a file with their types and line ranges.
# JSON output
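As a rough picture of what "entities with their types and line ranges" means, the sketch below lists functions, classes, and methods in a Python source string. It uses Python's standard `ast` module as a stand-in for tree-sitter, so it only handles Python input. The tuple layout and the `list_entities` name are assumptions made for this example, not sem's output format.

```python
import ast


def list_entities(source):
    """Return (kind, name, start_line, end_line) for top-level defs and their methods."""
    entities = []
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, func_types):
            entities.append(("function", node.name, node.lineno, node.end_lineno))
        elif isinstance(node, ast.ClassDef):
            entities.append(("class", node.name, node.lineno, node.end_lineno))
            for child in node.body:
                if isinstance(child, func_types):
                    qualified = f"{node.name}.{child.name}"
                    entities.append(("method", qualified, child.lineno, child.end_lineno))
    return entities


# Hypothetical input file contents.
SAMPLE = '''def validate_token(token):
    return bool(token)

class Session:
    def refresh(self):
        return True
'''

if __name__ == "__main__":
    for kind, name, start, end in list_entities(SAMPLE):
        print(f"{kind:8} {name:20} L{start}-L{end}")
```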
sem context
Token-budgeted context for LLMs: the entity, its dependencies, and its dependents, fitted to a token budget.
# Custom token budget
# JSON output
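One way to think about fitting context to a token budget is a greedy pack: take the entity first, then its dependencies, then its dependents, and skip any snippet that would exceed the budget. The Python sketch below shows that idea only. The ordering, the skip-if-too-large rule, and the whitespace-based `estimate_tokens` are all assumptions for illustration, not a description of how sem counts or prioritizes tokens.

```python
def estimate_tokens(text):
    """Crude estimate: count whitespace-separated words (not a real tokenizer)."""
    return len(text.split())


def build_context(target, dependencies, dependents, budget):
    """Pack target, then dependencies, then dependents, skipping what does not fit."""
    candidates = [("target", target)]
    candidates += [("dependency", snippet) for snippet in dependencies]
    candidates += [("dependent", snippet) for snippet in dependents]

    selected = []
    used = 0
    for label, snippet in candidates:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            continue  # too large for the remaining budget; try the next one
        selected.append((label, snippet))
        used += cost
    return selected, used


if __name__ == "__main__":
    # Hypothetical snippets standing in for real entity bodies.
    chosen, used = build_context(
        target="def a(): return b() + c()",
        dependencies=["def b(): return 1", "def c(): return 2 + 2 + 2 + 2"],
        dependents=["def main(): a()"],
        budget=14,
    )
    for label, snippet in chosen:
        print(f"[{label}] {snippet}")
    print("tokens used:", used)
```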
Use as default Git diff
Replace git diff output with entity-level diffs. Agents and humans get sem output automatically without changing any commands.
Now git diff shows entity-level changes instead of line-level ones, with no prompts or agent configuration needed. Everything that calls git diff gets sem output automatically. This also installs a pre-commit hook that shows the entity-level blast radius of staged changes.
To disable and go back to normal git diff:
What it parses
21 programming languages with full entity extraction via tree-sitter:
| Language | Extensions | Entities |
|---|---|---|
| TypeScript | .ts .tsx .mts .cts | functions, classes, interfaces, types, enums, exports |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables, exports |
| Python | .py | functions, classes, decorated definitions |
| Go | .go | functions, methods, types, vars, consts |
| Rust | .rs | functions, structs, enums, impls, traits, mods, consts |
| Java | .java | classes, methods, interfaces, enums, fields, constructors |
| C | .c .h | functions, structs, enums, unions, typedefs |
| C++ | .cpp .cc .hpp | functions, classes, structs, enums, namespaces, templates |
| C# | .cs | classes, methods, interfaces, enums, structs, properties |
| Ruby | .rb | methods, classes, modules |
| PHP | .php | functions, classes, methods, interfaces, traits, enums |
| Swift | .swift | functions, classes, protocols, structs, enums, properties |
| Elixir | .ex .exs | modules, functions, macros, guards, protocols |
| Bash | .sh | functions |
| HCL/Terraform | .hcl .tf .tfvars | blocks, attributes (qualified names for nested blocks) |
| Kotlin | .kt .kts | classes, interfaces, objects, functions, properties, companion objects |
| Fortran | .f90 .f95 .f | functions, subroutines, modules, programs |
| Vue | .vue | template/script/style blocks + inner TS/JS entities |
| XML | .xml .plist .svg .csproj | elements (nested, tag-name identity) |
| ERB | .erb .html.erb | blocks, expressions, code tags |
| Svelte | .svelte .svelte.js .svelte.ts | component blocks + rune JS/TS modules |
Plus structured data formats:
| Format | Extensions | Entities |
|---|---|---|
| JSON | .json | properties, objects (RFC 6901 paths) |
| YAML | .yml .yaml | sections, properties (dot paths) |
| TOML | .toml | sections, properties |
| CSV | .csv .tsv | rows (first column as identity) |
| Markdown | .md .mdx | heading-based sections |
Everything else falls back to chunk-based diffing.
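For the structured data formats above, an entity's identity is a path into the document. The Python sketch below flattens nested data into both RFC 6901 JSON Pointer paths and dot paths, then compares two versions by dot path. The escaping rule (`~` becomes `~0`, `/` becomes `~1`) comes from RFC 6901. Everything else, including the function names and the `adapter` key in the sample data, is hypothetical and not taken from sem.

```python
def escape_pointer_token(key):
    """Escape one reference token per RFC 6901: '~' -> '~0', then '/' -> '~1'."""
    return key.replace("~", "~0").replace("/", "~1")


def flatten(value, pointer="", dotted=""):
    """Yield (json_pointer, dot_path, leaf) for every leaf in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            next_dot = f"{dotted}.{key}" if dotted else str(key)
            next_ptr = f"{pointer}/{escape_pointer_token(str(key))}"
            yield from flatten(child, next_ptr, next_dot)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            next_dot = f"{dotted}.{index}" if dotted else str(index)
            yield from flatten(child, f"{pointer}/{index}", next_dot)
    else:
        yield pointer, dotted, value


def changed_properties(before, after):
    """Return {dot_path: (old, new)} for leaves present in both with different values."""
    old = {dot: leaf for _, dot, leaf in flatten(before)}
    new = {dot: leaf for _, dot, leaf in flatten(after)}
    return {path: (old[path], new[path]) for path in old if path in new and old[path] != new[path]}


# Sample data mirroring the config/database.yml change shown at the top.
BEFORE = {"production": {"pool_size": 5, "adapter": "postgresql"}}
AFTER = {"production": {"pool_size": 20, "adapter": "postgresql"}}

if __name__ == "__main__":
    for path, (old_value, new_value) in changed_properties(BEFORE, AFTER).items():
        print(f"{path}: {old_value} -> {new_value}")
```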
How matching works
Three-phase entity matching:
- Exact ID match: the same entity ID appears before and after, so the entity is either modified or unchanged
- Structural hash match: same AST structure under a different name, so the entity was renamed or moved (whitespace and comments are ignored)
- Fuzzy similarity: more than 80% token overlap marks a probable rename
This means sem detects renames and moves, not just additions and deletions. Structural hashing also distinguishes cosmetic changes (whitespace, formatting) from real logic changes.
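The three phases can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not sem's algorithm: it hashes a comment-stripped token stream instead of an AST, uses SHA-256 from the standard library instead of xxhash, treats "token overlap" as Jaccard similarity over token sets, and only strips `#` comments. All function names and sample entities are hypothetical.

```python
import hashlib
import re


def tokens(name, body):
    """Tokenize code, drop '#' comments and whitespace, and mask the entity's own name."""
    stripped = re.sub(r"#[^\n]*", "", body)
    raw = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", stripped)
    return ["<NAME>" if tok == name else tok for tok in raw]


def structural_hash(name, body):
    """Hash the masked token stream, so formatting and the name itself do not matter."""
    return hashlib.sha256(" ".join(tokens(name, body)).encode()).hexdigest()


def similarity(old, new):
    """Jaccard overlap between the token sets of two (name, body) pairs."""
    a, b = set(tokens(*old)), set(tokens(*new))
    return len(a & b) / len(a | b) if (a | b) else 1.0


def match_entities(before, after, threshold=0.8):
    """before/after map entity name -> body. Returns {name: status}."""
    result = {}
    removed = {n: b for n, b in before.items() if n not in after}
    added = {n: b for n, b in after.items() if n not in before}

    # Phase 1: exact ID match (same name on both sides).
    for name in before.keys() & after.keys():
        same = structural_hash(name, before[name]) == structural_hash(name, after[name])
        result[name] = "unchanged" if same else "modified"

    # Phase 2: structural hash match (same structure, different name).
    for old_name, old_body in list(removed.items()):
        for new_name, new_body in list(added.items()):
            if structural_hash(old_name, old_body) == structural_hash(new_name, new_body):
                result[new_name] = f"renamed from {old_name}"
                del removed[old_name], added[new_name]
                break

    # Phase 3: fuzzy similarity above the threshold.
    for old_name, old_body in list(removed.items()):
        scored = [(similarity((old_name, old_body), (n, b)), n) for n, b in added.items()]
        if scored:
            score, best = max(scored)
            if score > threshold:
                result[best] = f"probably renamed from {old_name}"
                del removed[old_name], added[best]

    result.update({name: "deleted" for name in removed})
    result.update({name: "added" for name in added})
    return result


BEFORE = {
    "load": "def load(path):\n    return open(path).read()\n",
    "size": "def size():\n    return 5\n",
    "legacy_auth": "def legacy_auth(user):\n    return check(user)\n",
    "fetch_user": "def fetch_user(db, uid):\n    row = db.get(uid)\n    return row\n",
    "old_helper": "def old_helper():\n    return None\n",
}
AFTER = {
    "load": "def load(path):\n    return open(path).read()   # cosmetic only\n",
    "size": "def size():\n    return 20\n",
    "authenticate": "def authenticate(user):\n    # new name\n    return check(user)\n",
    "load_user": "def load_user(db, uid):\n    row = db.get(uid)\n    log(uid)\n    return row\n",
    "validate_token": "def validate_token(token):\n    return len(token) > 0\n",
}

if __name__ == "__main__":
    for name, status in sorted(match_entities(BEFORE, AFTER).items()):
        print(f"{name}: {status}")
```

In this sample, `load` differs only by whitespace and a comment, so its hash is unchanged and it is not reported as modified, while `size` has a real change and is.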
MCP Server
sem includes an MCP server with 6 tools for AI agents: sem_entities, sem_diff, sem_blame, sem_impact, sem_log, sem_context. These mirror the CLI commands exactly.
Install the MCP binary:
JSON output
As a library
sem-core can be used as a Rust library dependency:
[dependencies]
sem-core = { git = "https://github.com/Ataraxy-Labs/sem", version = "0.3" }
Used by weave (semantic merge driver) and inspect (entity-level code review).
Architecture
- tree-sitter for code parsing (native Rust, not WASM)
- git2 for Git operations
- rayon for parallel file processing
- xxhash for structural hashing
- Plugin system for adding new languages and formats
Contributing
Want to add a new language? See CONTRIBUTING.md for a step-by-step guide.
License
MIT OR Apache-2.0