semver-analyzer-llm
LLM-based behavioral analysis for the semver-analyzer. Implements the BehaviorAnalyzer trait from semver-analyzer-core to detect breaking behavioral changes that static analysis cannot catch -- changes in what functions do, not just their type signatures.
The crate is agent-agnostic: it shells out to any external LLM CLI tool (e.g., goose, opencode) via a configurable command string. It does not embed or link against any LLM SDK.
Architecture
The system has four pillars:
- Agent-agnostic invocation -- any CLI that accepts a prompt as its final argument
- Template-constrained prompts -- structured JSON schemas reduce hallucination
- Tier 1 structural comparison -- deterministic spec comparison that needs no LLM call
- Tier 2 LLM fallback -- for ambiguous cases that structural comparison cannot resolve
Usage
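// Paths assume both items are exported at the crate root.
use semver_analyzer_llm::LlmBehaviorAnalyzer;
use semver_analyzer_core::BehaviorAnalyzer;

// Constructor and method arguments were lost from this snippet and are left elided.
let analyzer = LlmBehaviorAnalyzer::new(/* ... */);

// Infer a function's behavioral spec
let spec = analyzer.infer_spec(/* ... */)?;

// Infer with test context (higher confidence)
let spec = analyzer.infer_spec_with_test_context(/* ... */)?;

// Compare two specs for breaking changes
let verdict = analyzer.specs_are_breaking(/* ... */)?;

// Check if a break propagates through a caller
let propagates = analyzer.check_propagation(/* ... */)?;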
How Invocation Works
- The `llm_command` string (e.g., `"goose run --no-session -q -t"`) is split on whitespace
- The prompt is appended as the final argument
- The command is executed as a subprocess with a configurable timeout (default 120s)
- JSON is extracted from the response using three fallback strategies:
  - Fenced JSON blocks (```json ... ```)
  - Largest valid JSON object found via regex
  - Manual brace-matching parser
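The steps above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual code: the function names `build_command`, `extract_fenced_json`, `extract_by_brace_matching`, and `extract_json` are hypothetical, the regex-based middle strategy is omitted, and the timeout is not shown because `std::process` has no built-in one.

```rust
use std::process::Command;

/// Split the configured command string on whitespace and append the prompt
/// as the final argument. Returns None when the command string is empty.
/// (Hypothetical helper; spawning and the timeout are left out.)
fn build_command(llm_command: &str, prompt: &str) -> Option<Command> {
    let mut parts = llm_command.split_whitespace();
    let program = parts.next()?;
    let mut cmd = Command::new(program);
    cmd.args(parts).arg(prompt);
    Some(cmd)
}

/// Strategy 1: return the contents of the first fenced JSON block.
fn extract_fenced_json(response: &str) -> Option<&str> {
    let fence = "`".repeat(3);
    let open = format!("{fence}json");
    let start = response.find(open.as_str())? + open.len();
    let end = response[start..].find(fence.as_str())? + start;
    Some(response[start..end].trim())
}

/// Strategy 3: return the first balanced {...} span, ignoring braces that
/// appear inside JSON string literals.
fn extract_by_brace_matching(response: &str) -> Option<&str> {
    let start = response.find('{')?;
    let (mut depth, mut in_string, mut escaped) = (0usize, false, false);
    for (offset, ch) in response[start..].char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&response[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Try the strategies in order, falling back when one finds nothing.
fn extract_json(response: &str) -> Option<&str> {
    extract_fenced_json(response).or_else(|| extract_by_brace_matching(response))
}
```

Ordering the strategies from strictest to most permissive means a well-behaved agent pays no extra cost, while chatty output that wraps the JSON in prose still has a chance of being parsed.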
Spec Comparison (Tier 1)
The deterministic structural comparison runs first, avoiding an LLM call when possible:
| Spec Field | Breaking If... | Not Breaking If... |
|---|---|---|
| Preconditions | New precondition added; condition tightened | Precondition removed (more permissive) |
| Postconditions | Postcondition removed; return value changed | New postcondition added |
| Error behavior | Error type changed; new error case added | Error case removed |
| Side effects | Side effect removed; action changed | New side effect added |
If Tier 1 detects a break (confidence >= 0.80), it returns immediately. Otherwise, if both specs have non-empty notes fields, it falls through to Tier 2 LLM comparison.
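As a rough illustration of the table, the sketch below treats each spec field as a set of strings and flags only added or removed entries. The `Spec` struct, the `Outcome` enum, and `compare_specs` are hypothetical stand-ins, not the real `FunctionSpec` or `BreakingVerdict` types. Confidence scoring and the "tightened" or "changed" cases are not modeled, and returning `NotBreaking` when the notes are empty is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified spec; the real type lives in semver-analyzer-core.
#[derive(Default)]
struct Spec {
    preconditions: BTreeSet<String>,
    postconditions: BTreeSet<String>,
    error_behavior: BTreeSet<String>,
    side_effects: BTreeSet<String>,
    notes: String,
}

/// Hypothetical result of the deterministic comparison.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Tier 1 found at least one breaking difference.
    Breaking(Vec<String>),
    /// No structural break, but both specs carry notes: defer to Tier 2.
    NeedsLlm,
    /// No structural break and nothing for the LLM to weigh (assumed).
    NotBreaking,
}

fn compare_specs(old: &Spec, new: &Spec) -> Outcome {
    let mut reasons = Vec::new();

    // Callers must now satisfy something they did not have to before.
    for p in new.preconditions.difference(&old.preconditions) {
        reasons.push(format!("precondition added: {p}"));
    }
    // A guarantee callers could rely on is gone.
    for p in old.postconditions.difference(&new.postconditions) {
        reasons.push(format!("postcondition removed: {p}"));
    }
    // Callers may hit a failure they never had to handle.
    for e in new.error_behavior.difference(&old.error_behavior) {
        reasons.push(format!("error case added: {e}"));
    }
    // An effect callers may depend on no longer happens.
    for s in old.side_effects.difference(&new.side_effects) {
        reasons.push(format!("side effect removed: {s}"));
    }

    if !reasons.is_empty() {
        Outcome::Breaking(reasons)
    } else if !old.notes.is_empty() && !new.notes.is_empty() {
        Outcome::NeedsLlm
    } else {
        Outcome::NotBreaking
    }
}
```

The asymmetry is the point of the table: loosening what a function demands or strengthening what it promises is safe for existing callers, while the reverse direction is not.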
Additional Analysis Capabilities
Beyond the core BehaviorAnalyzer trait, the analyzer provides:
| Method | Purpose |
|---|---|
| `analyze_file_diff` | File-level behavioral + API analysis (1 LLM call per file) |
| `analyze_composition_patterns` | Detect JSX nesting structure changes |
| `infer_constant_renames` | Identify regex-based constant rename patterns |
| `infer_interface_renames` | Map removed interfaces to their replacements |
| `infer_component_hierarchy` | Infer parent-child composition hierarchy |
| `infer_suffix_renames` | Identify CSS physical-to-logical property suffix renames |
Prompt System
All prompts produce structured JSON output matching exact schemas. Key prompts:
- Spec inference -- forces `FunctionSpec` JSON (preconditions, postconditions, error_behavior, side_effects)
- Spec inference with tests -- grounds the spec with test assertion diffs as truth
- File behavioral analysis -- categorizes changes into 8 behavioral + 6 API change types
- Propagation check -- determines if a callee's break propagates through a caller
- Constant/interface rename inference -- identifies systematic rename patterns
- Hierarchy inference -- infers component parent-child composition from source
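To make the idea of a template-constrained prompt concrete, here is a minimal sketch of what a spec-inference prompt builder might look like. The function `build_spec_prompt` and the prompt wording are hypothetical; only the four field names come from the list above, and the crate's real templates may differ substantially.

```rust
/// Hypothetical prompt builder: embeds the expected JSON shape in the prompt
/// so the agent has a fixed template to fill in rather than free text.
fn build_spec_prompt(signature: &str, body: &str) -> String {
    format!(
        "Describe the behavior of the function below.\n\
         Respond with a single JSON object and nothing else, using exactly this shape:\n\
         {{\"preconditions\": [string], \"postconditions\": [string], \
         \"error_behavior\": [string], \"side_effects\": [string]}}\n\n\
         Signature:\n{signature}\n\nBody:\n{body}\n"
    )
}
```

Placing the source last keeps the instructions and the schema together at the top of the prompt, and the resulting string is what gets appended as the final argument of the configured command.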
Dependencies
| Crate | Purpose |
|---|---|
| `semver-analyzer-core` | Core traits (`BehaviorAnalyzer`) and types (`FunctionSpec`, `BreakingVerdict`, etc.) |
| `serde`, `serde_json` | JSON serialization for prompts and response parsing |
| `anyhow` | Error handling |
| `regex` | JSON extraction from free-text LLM output |
| `tracing` | Structured logging |
License
Apache-2.0