MicroResolve
A pre-LLM reflex layer for AI agents — intent routing and tool prefilter in tens of microseconds. Embedded library with optional HTTP server.
v0.1 — early release. API may change before 1.0. Pin exact versions in production.
Documentation · Benchmarks & methodology · Changelog · Contributing
"the dispute on order 1834 came back invalid — refund the customer,
mark the dispute resolved, then post in #ops that we lost it"
→ create_refund (confirmed, high)
→ update_dispute (confirmed, high)
→ slack_send_message (confirmed, medium)
→ relation: sequential
~92 µs · narrowed 129 tools to top-3
for the LLM, with no LLM call yet
Your LLM is System 2 — slow, deliberate, expensive. MicroResolve is System 1 — fast, learned, cheap. Pattern-match the easy 80% of queries in microseconds; let the LLM reason about the hard 20%. MicroResolve learns continuously from LLM-judged corrections — every misroute the LLM fixes flows back into the index, so the candidate set sharpens from production traffic alone.
Use it for:
- LLM-agent tool prefiltering — narrow 100+ tools to the top 3 the LLM actually needs to see this turn. Cuts prompt tokens, cuts latency, scales to large catalogs.
- Customer-support triage — route incoming tickets / chat messages to the right queue or workflow before the LLM gets involved (or without an LLM at all for the easy cases).
- Intent classification — single-utterance bucketing for conversational interfaces, IVR-style menus, search-query routing.
- Slash / chat command routing — recognize the user's command from free-form phrasing without retraining a model every time the catalog changes.
- Workflow / decision routing — multi-intent decomposition with relation detection (sequential / conditional / parallel / negation) for steps that need to fan out or chain.
- Permission and risk gating — classify a request into a risk tier before paying for an LLM round-trip.
| | LLM classification | Embedding model | MicroResolve |
|---|---|---|---|
| Latency | 200–2000 ms | 10–50 ms | 30–90 µs |
| Cost / query | per-token | GPU / API | $0 |
| Setup | prompt engineering | training data + GPU | seed phrases or one OpenAPI / MCP import |
| Continuous learning | retrain pipeline | full retrain | incremental, in place |
| Multi-intent | prompt-dependent | separate model | native |
| Dependencies | API key | PyTorch / ONNX | none at runtime — Rust core with Python and Node bindings |
> [!IMPORTANT]
> MicroResolve is a prefilter, not an absolute classifier. It returns ranked candidates — the LLM is the final picker. Your agent prompt must include a `confirm_full_catalog` fallback tool so the LLM can reach the full catalog when none of the candidates fit. That fallback is part of the design, not an optional safety net. See Confirm-turn pattern.
Benchmarks
Agent tool routing — 129 real tools imported from 5 production MCP servers (Stripe / Linear / Notion / Slack / Shopify), single namespace, scored as a prefilter for an LLM picker:
| Stage | Single top-1 | Single top-3 | Multi F1 | p50 |
|---|---|---|---|---|
| Cold start (LLM-seeded import) | 76.5 % | 88.2 % | 40.9 % | 87 µs |
| + auto-learn from corrections | 88.2 % | 97.1 % | 76.6 % | 64 µs |
*+ auto-learn* = incremental Hebbian updates + LLM-judged phrase ingestion from production corrections. No retraining pipeline; the data updates in place.

Reproduce: `python3 benchmarks/agent_tools_bench.py` (~$0.55 / ~10 min on Haiku 4.5).
Methodology, datasets, and reproduction scripts live in `benchmarks/`. What's not yet benchmarked: out-of-scope rejection (see Confirm-turn pattern), adversarial robustness, drift over multi-week production traffic.
Single-utterance intent classification (academic baselines):
| Dataset | Intents | Seeds | Top-1 (cold) | Top-1 (+ learning) | Top-3 | p50 latency |
|---|---|---|---|---|---|---|
| CLINC150 | 150 | 50 | 84.0 % | 85.8 % | 94.2 % | 22 µs |
| CLINC150 | 150 | 100 | 87.5 % | 96.4 % | 95.9 % | 24 µs |
| BANKING77 | 77 | 50 | 81.9 % | 85.0 % | 94.1 % | 21 µs |
| BANKING77 | 77 | 130 | 85.5 % | 92.8 % | 96.0 % | 23 µs |
Quick start
Embedded (Rust):
```rust
use microresolve::Engine;

// The path, namespace name, and seed phrases below are illustrative.
let engine = Engine::open("microresolve.db")?;
let ns = engine.namespace("support")?;
ns.add_intent("cancel_subscription", &["cancel my subscription", "stop my plan"])?;

let r = ns.resolve("i want to cancel my plan");
// r.ranked[0].id == "cancel_subscription"
```
HTTP server (with the optional Studio UI): install and run commands are in the server API docs.
Bootstrap intents from an existing spec (MCP, OpenAPI, and more; see Import formats below).
Bindings: microresolve on crates.io,
microresolve on PyPI,
microresolve on npm.
How it works
A query passes through five layers, all in-process, all in-memory:
```
raw query
  → L0 typo correction       (character n-gram Jaccard)
  → L1 vocabulary bridging   (morphology / abbreviation, OOV-gated)
  → L2 intent scoring        (IDF-weighted sparse term graph)
  → L3 confidence + disposition
  → L4 cross-provider tiebreak
  → result

failures → review queue → auto-learn worker → L1 + L2 updated
```
L0 fixes typos against the known vocabulary. L1 substitutes morphological
variants (canceling → cancel) and abbreviations (pr → pull request)
only when the source word is out-of-vocabulary, so distinctive user
input is never rewritten. L2 is the core scorer: each intent has a learned
sparse vector of IDF-weighted terms; multi-intent uses a token-consumption
pass to confirm one intent then re-score the rest. L3 classifies the score
distribution as confident / low_confidence / escalate. L4 breaks ties
when the same action exists across multiple providers.
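L0's fuzzy matching is set similarity over character n-grams. A minimal sketch of the technique; trigrams and the 0.3 threshold here are illustrative choices, not MicroResolve's internals:

```rust
use std::collections::HashSet;

/// Character trigrams of a word: "cancel" -> {can, anc, nce, cel}.
fn trigrams(word: &str) -> HashSet<String> {
    let chars: Vec<char> = word.chars().collect();
    chars.windows(3).map(|w| w.iter().collect()).collect()
}

/// Jaccard similarity over two trigram sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 { return 0.0; }
    a.intersection(b).count() as f64 / union as f64
}

/// Correct an input word against the known vocabulary, keeping the best
/// candidate only if it clears the threshold.
fn correct<'a>(word: &str, vocab: &'a [String]) -> Option<&'a str> {
    let grams = trigrams(word);
    vocab.iter()
        .map(|v| (v.as_str(), jaccard(&grams, &trigrams(v))))
        .filter(|&(_, score)| score >= 0.3)
        .max_by(|x, y| x.1.partial_cmp(&y.1).unwrap())
        .map(|(v, _)| v)
}
```

With "cancel" in the vocabulary, `correct("cancle", &vocab)` returns `Some("cancel")`: 2 shared trigrams out of 6, J ≈ 0.33.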
Every confirmed routing reinforces term weights. Every correction shifts weights away from the wrong intent. No retraining; the data updates in place.
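In principle the update rule is tiny. A sketch of the reinforce/correct cycle over sparse IDF-weighted term vectors; the struct shape and learning rate are assumptions for illustration, not the library's internals:

```rust
use std::collections::HashMap;

/// Sparse term -> weight vector per intent, as in L2 (shape assumed).
struct Intent {
    id: String,
    weights: HashMap<String, f32>,
}

const RATE: f32 = 0.1; // learning rate: an assumption for illustration

/// A confirmed routing reinforces the query's terms on the winning intent.
fn reinforce(intent: &mut Intent, terms: &[&str], idf: &HashMap<String, f32>) {
    for t in terms {
        let idf_t = idf.get(*t).copied().unwrap_or(1.0);
        *intent.weights.entry(t.to_string()).or_insert(0.0) += RATE * idf_t;
    }
}

/// A correction shifts weight off the wrong intent and onto the right one.
fn correct(wrong: &mut Intent, right: &mut Intent, terms: &[&str], idf: &HashMap<String, f32>) {
    for t in terms {
        if let Some(w) = wrong.weights.get_mut(*t) {
            *w = (*w - RATE * idf.get(*t).copied().unwrap_or(1.0)).max(0.0);
        }
    }
    reinforce(right, terms, idf);
}
```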
Confirm-turn pattern (System 1 → System 2)
MicroResolve returns ranked candidates; the LLM picks one or falls back
to the full catalog when nothing fits. Always include a
`confirm_full_catalog` fallback tool in your agent prompt — without it,
out-of-scope queries and novel phrasing route to the wrong tool:
```
Candidate tools (from the prefilter):
- tool_a (...description...)
- tool_b (...description...)
- tool_c (...description...)

If one of these clearly applies, call it. Otherwise — unsure, out of
scope, or candidates look unrelated — call `confirm_full_catalog` to
receive every tool, then pick.
```
The prefilter shrinks 150 tools to 3 in ~60 µs; the LLM is the final picker. Every confirmed call flows back as a corrected example, so the candidate set sharpens over time.
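As a turn loop, the pattern is one branch. A sketch under stated assumptions: `Prefilter`, `Llm`, and the string-based tool handles are stand-ins for your agent stack; only the ranked-candidate shape mirrors the quickstart:

```rust
/// Stand-ins for this sketch: `Candidate`/`Resolution` mirror the
/// `r.ranked[0].id` shape from the quickstart; `Llm` is your agent stack.
struct Candidate { id: String }
struct Resolution { ranked: Vec<Candidate> }

trait Prefilter {
    fn resolve(&self, query: &str) -> Resolution;
}
trait Llm {
    /// Ask the model to pick one tool name from the offered set.
    fn pick(&self, query: &str, tools: &[String]) -> String;
}

const FALLBACK: &str = "confirm_full_catalog";

fn route_turn(pre: &dyn Prefilter, llm: &dyn Llm, catalog: &[String], query: &str) -> String {
    // System 1: microsecond prefilter narrows the catalog to the top 3.
    let mut offered: Vec<String> = pre
        .resolve(query)
        .ranked
        .into_iter()
        .take(3)
        .map(|c| c.id)
        .collect();
    offered.push(FALLBACK.to_string());

    // System 2: the LLM is the final picker.
    let choice = llm.pick(query, &offered);
    if choice == FALLBACK {
        // Nothing fit or out of scope: re-ask against the full catalog.
        return llm.pick(query, catalog);
    }
    // Feed the confirmed pick back (e.g. via the training endpoints) so the
    // candidate set sharpens from production traffic.
    choice
}
```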
HTTP API
Send `X-Namespace-ID: my-namespace` to isolate intents per namespace. The
core endpoints are `/api/route_multi`, `/api/intents`, `/api/training/{review,apply}`,
and `/api/import/mcp/{search,fetch,apply}`. Full reference in the
server API docs.
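A minimal client sketch, with assumptions flagged: the host/port and the `{"query": ...}` body shape are guesses, so consult the server API docs for the actual request schema:

```rust
// Cargo.toml: reqwest = { version = "0.12", features = ["blocking", "json"] }
//             serde_json = "1"
use reqwest::blocking::Client;

fn main() -> Result<(), reqwest::Error> {
    let resp = Client::new()
        .post("http://localhost:8080/api/route_multi") // host/port assumed
        .header("X-Namespace-ID", "my-namespace")      // per-namespace isolation
        .json(&serde_json::json!({ "query": "cancel my order and update my address" }))
        .send()?
        .text()?;
    println!("{resp}");
    Ok(())
}
```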
Multi-intent, projection, and multilingual
Native multi-intent — a single query can confirm several intents in one call, with the relation between them detected:
"cancel my order and update my address"
→ cancel_order (confirmed, high)
→ update_address (confirmed, medium)
→ relation: parallel
Detected relations: sequential, conditional, negation, parallel.
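A sketch of how an orchestrator might act on each relation; the `Relation` enum, the `Step` shape, and the negation flag are illustrative, not MicroResolve types:

```rust
/// Illustrative types; whether the resolver flags negated steps is an
/// assumption, so adapt to what your resolver actually returns.
enum Relation { Sequential, Conditional, Negation, Parallel }
struct Step { intent_id: String, negated: bool }

fn run(step: &Step) -> bool {
    println!("executing {}", step.intent_id);
    true // stand-in for the real tool call's success flag
}

fn dispatch(relation: Relation, steps: &[Step]) {
    match relation {
        // "refund, mark resolved, then post": chain in order.
        Relation::Sequential => {
            for s in steps { run(s); }
        }
        // Gate the later steps on the first one succeeding.
        Relation::Conditional => {
            if let Some((first, rest)) = steps.split_first() {
                if run(first) {
                    for s in rest { run(s); }
                }
            }
        }
        // "cancel the order but don't email them": skip negated steps.
        Relation::Negation => {
            for s in steps.iter().filter(|s| !s.negated) { run(s); }
        }
        // Independent intents: fan out concurrently.
        Relation::Parallel => {
            std::thread::scope(|scope| {
                for s in steps { scope.spawn(move || run(s)); }
            });
        }
    }
}
```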
Projected context — co-occurrence tracking discovers what auxiliary intents typically fire alongside the primary one, so your orchestrator can pre-fetch them in parallel without an extra LLM round-trip:
"I want a refund"
→ refund_order (confirmed, high)
→ projected: check_balance (21 %), warranty_lookup (13 %)
These relationships emerge from accumulated routing — they're not configured.
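Mechanically, projection is conditional co-occurrence. A sketch of how such stats could accumulate and be queried; the storage shape is an assumption, since the library's internals are not documented here:

```rust
use std::collections::HashMap;

/// Co-occurrence stats: how often `aux` fired in the same turn as `primary`.
#[derive(Default)]
struct Projections {
    primary_counts: HashMap<String, u32>,
    pair_counts: HashMap<(String, String), u32>,
}

impl Projections {
    /// Record one confirmed routing: the primary intent plus any auxiliaries.
    fn record(&mut self, primary: &str, auxiliaries: &[&str]) {
        *self.primary_counts.entry(primary.to_string()).or_insert(0) += 1;
        for aux in auxiliaries {
            *self.pair_counts
                .entry((primary.to_string(), aux.to_string()))
                .or_insert(0) += 1;
        }
    }

    /// P(aux | primary): the percentages shown in the example above.
    fn project(&self, primary: &str) -> Vec<(String, f32)> {
        let total = *self.primary_counts.get(primary).unwrap_or(&0) as f32;
        if total == 0.0 { return Vec::new(); }
        let mut out: Vec<(String, f32)> = self.pair_counts.iter()
            .filter(|((p, _), _)| p.as_str() == primary)
            .map(|((_, a), &n)| (a.clone(), n as f32 / total))
            .collect();
        out.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
        out
    }
}
```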
Multilingual — Latin scripts via whitespace tokenization, CJK (Chinese / Japanese / Korean) via Aho-Corasick automaton + character bigrams, all in the same namespace. Per-language seed phrases per intent. Multi-intent decomposition runs after tokenization, so a Chinese query like "取消订阅然后退款" decomposes the same way an English query does.
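The two CJK primitives are easy to show with the `aho-corasick` crate: vocabulary matches from the automaton, character bigrams over uncovered gaps. How MicroResolve combines them is not specified here; this is just the mechanics:

```rust
use aho_corasick::{AhoCorasick, MatchKind};

/// Tokenize CJK text: longest-match vocabulary hits via Aho-Corasick,
/// character bigrams over the spans the automaton did not cover.
fn tokenize_cjk(text: &str, vocab: &[&str]) -> Vec<String> {
    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostLongest)
        .build(vocab)
        .expect("valid patterns");
    let mut tokens = Vec::new();
    let mut cursor = 0;
    for m in ac.find_iter(text) {
        bigrams(&text[cursor..m.start()], &mut tokens); // uncovered gap
        tokens.push(text[m.start()..m.end()].to_string());
        cursor = m.end();
    }
    bigrams(&text[cursor..], &mut tokens); // trailing gap
    tokens
}

fn bigrams(span: &str, out: &mut Vec<String>) {
    let chars: Vec<char> = span.chars().collect();
    for w in chars.windows(2) {
        out.push(w.iter().collect());
    }
}
```

With vocab `["取消", "订阅", "退款"]`, tokenizing "取消订阅然后退款" yields `["取消", "订阅", "然后", "退款"]`: three automaton hits plus one bigram from the gap.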
Import formats
- MCP tools/list (and Smithery registry passthrough)
- OpenAPI / Swagger — each endpoint becomes an intent
- OpenAI function-calling schemas
- LangChain tools
License
Dual-licensed under MIT or Apache-2.0 at your option — the standard Rust ecosystem licensing. Both are fully permissive and allow commercial use.
- LICENSE-MIT
- LICENSE-APACHE — adds an explicit patent grant
Contribution
Unless you state otherwise, any contribution intentionally submitted for inclusion in this work shall be dual-licensed as above, without any additional terms or conditions.
Commercial support
Maintained by Gladius Thayalarajan — consulting on custom integrations, multilingual tuning, and agent-stack tool routing: gladius.thayalarajan@gmail.com.