Routing engine — match incoming requests against per-org rules to pick a target model (or pass through unchanged).
Mirrors the shape used by tt-plan-core's replay-time matcher so a Plan
projection and the live Gateway agree on which route would fire for a
given request. Differences from plan-core:
- Input is the canonical [ChatCompletionRequest] + [RequestContext] (live runtime), not a historical RequestLog.
- Token-count conditions use input_tokens estimated from the request (caller supplies; the engine never tokenizes itself, since that is a hot-path responsibility owned by the caller's tokenizer cache).

Rules are stored sorted descending by priority. First match wins.
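
A minimal sketch of the priority ordering and first-match-wins lookup described above. The `Rule` and `RoutingEngine` shapes, the `min_input_tokens` condition, and the `route` signature are hypothetical and exist only for illustration; the real engine matches against [ChatCompletionRequest] + [RequestContext] and may support other condition kinds. The one thing carried over from the text is that `input_tokens` arrives as a caller-supplied estimate and is never computed here.

```rust
/// Hypothetical rule shape, for illustration only.
#[derive(Debug)]
struct Rule {
    /// Higher values are evaluated first.
    priority: i32,
    /// Assumed condition: fires when the caller's token estimate is at
    /// least this large. `None` matches every request.
    min_input_tokens: Option<u32>,
    /// Model to route to when this rule fires.
    target_model: String,
}

/// Hypothetical engine holding rules sorted by descending priority.
struct RoutingEngine {
    rules: Vec<Rule>,
}

impl RoutingEngine {
    fn new(mut rules: Vec<Rule>) -> Self {
        // `sort_by` is stable, so rules with equal priority keep their
        // insertion order (tie-breaking is an assumption of this sketch).
        rules.sort_by(|a, b| b.priority.cmp(&a.priority));
        Self { rules }
    }

    /// Returns the target model of the first matching rule, or `None`
    /// to signal pass-through. `input_tokens` is the caller's estimate;
    /// nothing is tokenized here.
    fn route(&self, input_tokens: u32) -> Option<&str> {
        self.rules
            .iter()
            .find(|r| r.min_input_tokens.map_or(true, |min| input_tokens >= min))
            .map(|r| r.target_model.as_str())
    }
}
```

Sorting once at construction keeps the per-request path to a single linear scan that stops at the first hit, so a catch-all rule can sit at the lowest priority without shadowing more specific ones.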