Expand description
BERT-family encoder: wraps the existing embed_all pipeline.
Bridges the VectorEncoder trait surface to the
embed::embed_all function that powers
--model bert and --model modernbert. Owns the
Vec<Box<dyn EmbedBackend>> (one per detected GPU/CPU backend) and the
HuggingFace tokenizer, both required by the streaming pipeline.
§Why a thin wrapper, not a refactor
The embed_all body — walk → chunk → tokenize → embed via streaming
pipeline — is non-trivial and rich in edge cases (rayon clones for CPU,
ring-buffer for GPU, sort-by-length batching, file-count threshold for
streaming vs batch). Wrapping rather than relocating keeps that battle-
tested code intact and reduces P0.3 to a small adapter, deferring any
deeper refactor until benefits emerge.
See docs/PLAN.md:P0.3 for the acceptance predicates.
Structs§
- Bert
Encoder - BERT-family encoder implementation of
VectorEncoder.