Crate trusty_embedder

Shared text-embedding abstraction for trusty-* projects.

Why: trusty-memory and trusty-search both shipped near-identical Embedder traits and FastEmbedder implementations, with subtle drift (cache vs no-cache, sync vs async warmup, dim() vs dimension()). Centralising them means each bug is fixed once, in one place, and future consumers get the embedder for free.

What: an async Embedder trait with embed_batch as the single primitive (single-text embed is a free helper), plus a production FastEmbedder (fastembed-rs, all-MiniLM-L6-v2, 384-d) with LRU caching and ORT warmup, and a MockEmbedder test double behind the test-support feature.
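The "single primitive plus free helper" shape described above can be sketched as follows. This is a minimal illustration, not the crate's actual API: the trait signature, the `String` error type, the `ToyEmbedder` stand-in, and the tiny `block_on` executor are all assumptions made so the example compiles with the standard library alone (Rust 1.75 or later, for `async fn` in traits).

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the crate's `Embedder` trait.
/// The real method names, bounds, and error type may differ.
trait Embedder {
    /// Length of every vector this embedder produces.
    fn dimension(&self) -> usize;

    /// The single primitive: embed a batch of texts, one vector per text.
    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String>;
}

/// Free helper in the spirit of `embed_one`: wrap one text in a batch of
/// size one and return the lone vector.
async fn embed_one<E: Embedder>(embedder: &E, text: &str) -> Result<Vec<f32>, String> {
    let mut vectors = embedder.embed_batch(&[text.to_string()]).await?;
    let vector = vectors
        .pop()
        .ok_or_else(|| "embedder returned no vectors".to_string())?;
    if vector.len() != embedder.dimension() {
        return Err("embedder returned a vector of the wrong length".to_string());
    }
    Ok(vector)
}

/// Toy embedder (hypothetical): sums byte values into `dim` buckets.
/// It only illustrates the trait shape and has no semantic meaning.
/// `dim` must be non-zero.
struct ToyEmbedder {
    dim: usize,
}

impl Embedder for ToyEmbedder {
    fn dimension(&self) -> usize {
        self.dim
    }

    async fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, String> {
        Ok(texts
            .iter()
            .map(|text| {
                let mut vector = vec![0.0f32; self.dim];
                for (i, byte) in text.bytes().enumerate() {
                    vector[i % self.dim] += byte as f32;
                }
                vector
            })
            .collect())
    }
}

/// Waker that does nothing; sufficient for futures that never suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor so the sketch needs no async runtime.
/// A real consumer would use its runtime's own `block_on` or `.await`.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut context = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut context) {
            return output;
        }
    }
}
```

With these definitions, `block_on(embed_one(&ToyEmbedder { dim: 384 }, "hello"))` yields a single 384-element vector. Keeping `embed_batch` as the only required method means a backend implements batching once and the single-text path cannot drift from it.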

Test: cargo test -p trusty-embedder covers shape, cache hits, and the mock embedder. ONNX-backed tests are marked #[ignore] to keep CI under one cargo-feature umbrella; run them explicitly with cargo test -p trusty-embedder -- --ignored.

Structs§

FastEmbedder: Local CPU embedder backed by fastembed-rs (ONNX runtime, all-MiniLM-L6-v2).

Enums§

ExecutionProvider: Identifier for the execution provider an embedder is actually using.

Constants§

DEFAULT_CACHE_CAPACITY: Default LRU cache capacity. Picked to be large enough to keep the hot working set of repeat queries in memory but small enough that the cache itself fits well inside L2/L3 on a typical developer machine.
EMBED_DIM: Output dimension of the all-MiniLM-L6-v2 model.
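To make the role of a bounded LRU cache concrete, here is a minimal sketch of a text-to-vector cache with least-recently-used eviction. It is an illustration under stated assumptions, not FastEmbedder's actual cache: the `LruCache` type, its methods, and the tick-based eviction are hypothetical, and the real value of DEFAULT_CACHE_CAPACITY is not stated in this page.

```rust
use std::collections::HashMap;

/// Hypothetical LRU cache keyed by input text. Each entry stores the
/// logical time of its last use alongside the embedding vector.
struct LruCache {
    capacity: usize,
    tick: u64,
    entries: HashMap<String, (u64, Vec<f32>)>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            tick: 0,
            entries: HashMap::new(),
        }
    }

    /// Return a cached vector and mark the entry as recently used.
    fn get(&mut self, text: &str) -> Option<Vec<f32>> {
        self.tick += 1;
        let tick = self.tick;
        self.entries.get_mut(text).map(|entry| {
            entry.0 = tick;
            entry.1.clone()
        })
    }

    /// Insert a vector, evicting the least recently used entry when full.
    /// Eviction scans all entries (O(n)); a production cache would use a
    /// linked structure for O(1) eviction.
    fn put(&mut self, text: String, vector: Vec<f32>) {
        if self.capacity == 0 {
            return; // A zero-capacity cache stores nothing.
        }
        self.tick += 1;
        if !self.entries.contains_key(&text) && self.entries.len() >= self.capacity {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (last_used, _))| *last_used)
                .map(|(key, _)| key.clone());
            if let Some(key) = oldest {
                self.entries.remove(&key);
            }
        }
        self.entries.insert(text, (self.tick, vector));
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Each cached entry for a 384-dimensional `f32` vector holds 384 × 4 = 1536 bytes of vector data plus the key, so the capacity constant directly bounds the cache's memory footprint.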

Traits§

Embedder: Abstraction over embedding backends.

Functions§

embed_one: Convenience helper: embed a single text via embed_batch and return the lone vector.