
Module bert


BERT-family encoder: wraps the existing embed_all pipeline.

Bridges the VectorEncoder trait surface to the embed::embed_all function that powers --model bert and --model modernbert. Owns the Vec<Box<dyn EmbedBackend>> (one per detected GPU/CPU backend) and the HuggingFace tokenizer, both required by the streaming pipeline.
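The adapter shape described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the trait methods, the `embed_all` signature, and the `ToyTokenizer` and `ToyBackend` types are all hypothetical stand-ins, since the real signatures are not shown on this page.

```rust
// Hypothetical stand-in for the backend trait; the real method set is not documented here.
trait EmbedBackend {
    fn embed_batch(&self, token_ids: &[Vec<u32>]) -> Vec<Vec<f32>>;
}

// Hypothetical stand-in for the HuggingFace tokenizer: one "token" per word.
struct ToyTokenizer;

impl ToyTokenizer {
    fn tokenize(&self, text: &str) -> Vec<u32> {
        text.split_whitespace().map(|w| w.len() as u32).collect()
    }
}

// Stand-in for the existing pipeline entry point (assumed signature).
// The wrapper calls it as-is instead of relocating its body.
fn embed_all(
    backends: &[Box<dyn EmbedBackend>],
    tokenizer: &ToyTokenizer,
    texts: &[&str],
) -> Vec<Vec<f32>> {
    let tokens: Vec<Vec<u32>> = texts.iter().map(|t| tokenizer.tokenize(t)).collect();
    match backends.first() {
        Some(backend) => backend.embed_batch(&tokens),
        None => Vec::new(),
    }
}

// Hypothetical trait surface; the real VectorEncoder may differ.
trait VectorEncoder {
    fn encode(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

// The adapter owns both resources the pipeline needs.
struct BertEncoder {
    backends: Vec<Box<dyn EmbedBackend>>,
    tokenizer: ToyTokenizer,
}

impl VectorEncoder for BertEncoder {
    fn encode(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        // Pure delegation: no pipeline logic lives in the adapter.
        embed_all(&self.backends, &self.tokenizer, texts)
    }
}

// Toy backend that emits [token count, token sum] per input.
struct ToyBackend;

impl EmbedBackend for ToyBackend {
    fn embed_batch(&self, token_ids: &[Vec<u32>]) -> Vec<Vec<f32>> {
        token_ids
            .iter()
            .map(|ids| vec![ids.len() as f32, ids.iter().sum::<u32>() as f32])
            .collect()
    }
}
```

Because the adapter only forwards its owned backends and tokenizer, any change to the pipeline's internals stays confined to `embed_all`.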

§Why a thin wrapper, not a refactor

The embed_all body (walk → chunk → tokenize → embed via the streaming pipeline) is non-trivial and rich in edge cases: rayon clones for CPU, a ring buffer for GPU, sort-by-length batching, and a file-count threshold for choosing streaming vs. batch. Wrapping rather than relocating keeps that battle-tested code intact and reduces P0.3 to a small adapter, deferring any deeper refactor until its benefits are clear.

See docs/PLAN.md:P0.3 for the acceptance predicates.

Structs§

BertEncoder
BERT-family encoder implementation of VectorEncoder.