pub trait VectorExtractor: Send + Sync {
    // Required method
    fn extract_document(
        &self,
        base_path: &Path,
        file_path: &Path,
        frontmatter: &Value,
        content: &str,
    ) -> Result<VectorDocument>;

    // Provided methods
    fn content_glob(&self) -> &str { ... }
    fn name(&self) -> &str { ... }
}
Trait for extracting vector documents from domain-specific content.
Each knowledge domain (music theory, math, etc.) implements this trait
to define how its markdown files with frontmatter are transformed into
VectorDocument instances. The key responsibility is text composition:
deciding what content should be embedded.
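To make the idea of text composition concrete, here is a minimal sketch of the `title | description | key terms | body content` pattern mentioned under extract_document. The helper name `compose_text` and its parameters are illustrative assumptions, not part of this crate; a real extractor would pull these values from the frontmatter and the markdown body.

```rust
/// Hypothetical helper (not part of this crate): joins the non-empty parts
/// with " | " so that each one contributes to the embedded text.
fn compose_text(title: &str, description: &str, key_terms: &[&str], body: &str) -> String {
    // Key terms are flattened into a single comma-separated segment.
    let terms = key_terms.join(", ");
    [title, description, terms.as_str(), body]
        .iter()
        .map(|part| part.trim())
        // Skip missing fields so the result has no empty " |  | " segments.
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join(" | ")
}
```

Dropping empty segments is a design choice of this sketch; an implementation could equally keep placeholders if a fixed layout matters for the embedding model.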
§Lifecycle
For each content file, VectorIndexBuilder calls:
extract_document() - Parse file and compose text for embedding
The returned VectorDocument.text is what gets embedded by the
EmbeddingProvider.
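The lifecycle above might look roughly like the following from an implementor's point of view. This is a sketch under stated assumptions: `Value`, `Result`, and `VectorDocument` are local stand-ins (the real types come from this crate and are not fully shown here), the `id` field is hypothetical, and `MusicTheoryExtractor` and `build_one` are made-up names that only imitate what VectorIndexBuilder would do for a single file.

```rust
use std::collections::BTreeMap;
use std::path::Path;

// Stand-ins (assumptions): the real crate defines its own Value, Result and
// VectorDocument. Only the `text` field is mentioned in the documentation.
type Value = BTreeMap<String, String>;
type Result<T> = std::result::Result<T, String>;

#[derive(Debug)]
struct VectorDocument {
    id: String,   // hypothetical field
    text: String, // the text that gets embedded
}

trait VectorExtractor: Send + Sync {
    fn extract_document(
        &self,
        base_path: &Path,
        file_path: &Path,
        frontmatter: &Value,
        content: &str,
    ) -> Result<VectorDocument>;
}

struct MusicTheoryExtractor; // hypothetical domain extractor

impl VectorExtractor for MusicTheoryExtractor {
    fn extract_document(
        &self,
        base_path: &Path,
        file_path: &Path,
        frontmatter: &Value,
        content: &str,
    ) -> Result<VectorDocument> {
        // Use the path relative to the content root as an identifier.
        let rel = file_path
            .strip_prefix(base_path)
            .map_err(|e| e.to_string())?;
        let title = frontmatter
            .get("title")
            .ok_or_else(|| format!("{}: missing title", rel.display()))?;
        Ok(VectorDocument {
            id: rel.display().to_string(),
            text: format!("{} | {}", title, content.trim()),
        })
    }
}

// Roughly what a builder would do for each discovered content file.
fn build_one(extractor: &dyn VectorExtractor) -> Result<VectorDocument> {
    let mut frontmatter = Value::new();
    frontmatter.insert("title".to_string(), "Circle of Fifths".to_string());
    extractor.extract_document(
        Path::new("content"),
        Path::new("content/harmony/circle.md"),
        &frontmatter,
        "Adjacent keys differ by one accidental.\n",
    )
}
```

Calling `build_one(&MusicTheoryExtractor)` yields a document whose `text` combines the title and the trimmed body; in the real pipeline that string is what would be handed to the EmbeddingProvider.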
Required Methods§
fn extract_document(
    &self,
    base_path: &Path,
    file_path: &Path,
    frontmatter: &Value,
    content: &str,
) -> Result<VectorDocument>
Extract a vector document from a content file.
§Arguments
base_path - Root directory for content
file_path - Full path to the file being processed
frontmatter - Parsed YAML frontmatter as generic Value
content - Markdown body (after frontmatter)
§Text Composition
The implementation should compose the text field with all content
that should influence semantic similarity. A common pattern is:
title | description | key terms | body content
Provided Methods§
fn content_glob(&self) -> &str
Returns the content glob pattern for this domain.
Used by VectorIndexBuilder to discover content files.
Default: "**/*.md" (all markdown files recursively).
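A domain that keeps its content in a subdirectory could narrow discovery by overriding this method. The sketch below uses a stand-in trait that reproduces only the provided method with its documented default; `DefaultExtractor`, `ConceptsOnlyExtractor`, and the `concepts/` directory are hypothetical, and whether the pattern is resolved relative to base_path is an assumption not confirmed here.

```rust
// Stand-in trait (assumption): only the provided method is reproduced here.
trait VectorExtractor {
    fn content_glob(&self) -> &str {
        "**/*.md" // documented default: all markdown files recursively
    }
}

// Hypothetical extractor that keeps the default pattern.
struct DefaultExtractor;
impl VectorExtractor for DefaultExtractor {}

// Hypothetical extractor that only indexes files under `concepts/`.
struct ConceptsOnlyExtractor;
impl VectorExtractor for ConceptsOnlyExtractor {
    fn content_glob(&self) -> &str {
        "concepts/**/*.md"
    }
}
```

Returning `&str` lets an implementation hand back a string literal without allocating, which is all an override like this usually needs.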