Trait VectorExtractor 

pub trait VectorExtractor: Send + Sync {
    // Required method
    fn extract_document(
        &self,
        base_path: &Path,
        file_path: &Path,
        frontmatter: &Value,
        content: &str,
    ) -> Result<VectorDocument>;

    // Provided methods
    fn content_glob(&self) -> &str { ... }
    fn name(&self) -> &str { ... }
}

Trait for extracting vector documents from domain-specific content.

Each knowledge domain (music theory, math, etc.) implements this trait to define how its markdown files with frontmatter are transformed into VectorDocument instances. The key responsibility is text composition: deciding what content should be embedded.

§Lifecycle

For each content file, VectorIndexBuilder calls:

  1. extract_document() - Parse the file and compose the text for embedding

The returned VectorDocument.text is what gets embedded by the EmbeddingProvider.
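
The lifecycle above can be sketched as a small implementation plus a driver loop. This is only an illustration under stated assumptions: `Value`, `Result`, and `VectorDocument` below are hypothetical stand-ins (the real crate defines its own, and only the `text` field is confirmed by these docs), `MusicTheoryExtractor` and `extract_all` are invented names, and the loop merely approximates what VectorIndexBuilder might do per file.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

// Hypothetical stand-ins so the sketch compiles on its own; the real crate
// defines its own `Value`, `Result`, and `VectorDocument` types.
type Value = BTreeMap<String, String>;
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, PartialEq)]
pub struct VectorDocument {
    pub id: String,   // assumed field, not confirmed by the docs
    pub text: String, // the docs state that `text` is what gets embedded
}

// Local mirror of the documented trait signature (required method only).
pub trait VectorExtractor: Send + Sync {
    fn extract_document(
        &self,
        base_path: &Path,
        file_path: &Path,
        frontmatter: &Value,
        content: &str,
    ) -> Result<VectorDocument>;
}

// Hypothetical domain extractor.
pub struct MusicTheoryExtractor;

impl VectorExtractor for MusicTheoryExtractor {
    fn extract_document(
        &self,
        base_path: &Path,
        file_path: &Path,
        frontmatter: &Value,
        content: &str,
    ) -> Result<VectorDocument> {
        // Use the path relative to the content root as the document id.
        let relative = file_path
            .strip_prefix(base_path)
            .map_err(|e| format!("{}: outside the content root: {e}", file_path.display()))?;
        // Treat a missing title as an error rather than embedding an empty one.
        let title = frontmatter
            .get("title")
            .ok_or_else(|| format!("{}: missing `title`", file_path.display()))?;
        Ok(VectorDocument {
            id: relative.to_string_lossy().into_owned(),
            text: format!("{} | {}", title, content.trim()),
        })
    }
}

// Hypothetical driver: one extract_document() call per content file,
// stopping at the first error.
pub fn extract_all(
    extractor: &dyn VectorExtractor,
    base_path: &Path,
    files: &[(PathBuf, Value, String)],
) -> Result<Vec<VectorDocument>> {
    files
        .iter()
        .map(|(path, frontmatter, body)| {
            extractor.extract_document(base_path, path, frontmatter, body)
        })
        .collect()
}
```

Taking the extractor as `&dyn VectorExtractor` keeps the driver independent of any one domain, which is presumably why the trait requires `Send + Sync` and takes `&self` everywhere.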

Required Methods§

fn extract_document( &self, base_path: &Path, file_path: &Path, frontmatter: &Value, content: &str, ) -> Result<VectorDocument>

Extract a vector document from a content file.

§Arguments

  • base_path - Root directory for content
  • file_path - Full path to the file being processed
  • frontmatter - Parsed YAML frontmatter as a generic Value
  • content - Markdown body (after frontmatter)

§Text Composition

The implementation should compose the text field with all content that should influence semantic similarity. A common pattern is:

title | description | key terms | body content
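
A minimal sketch of that pattern follows. The helper name `compose_text`, the `", "` separator between key terms, and the choice to skip empty parts are all assumptions made for illustration; the trait itself does not prescribe them.

```rust
/// Joins the non-empty parts with " | " in the order
/// title, description, key terms, body.
/// Hypothetical helper; not part of the documented API.
pub fn compose_text(title: &str, description: &str, key_terms: &[&str], body: &str) -> String {
    // Assumed convention: key terms are comma-separated within their slot.
    let terms = key_terms.join(", ");
    [title, description, terms.as_str(), body]
        .iter()
        .map(|part| part.trim())
        // Skip empty parts so the output never contains "| |".
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join(" | ")
}
```

An implementation of extract_document() could call a helper like this with values read from the frontmatter and assign the result to the text field.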

Provided Methods§

fn content_glob(&self) -> &str

Returns the content glob pattern for this domain.

Used by VectorIndexBuilder to discover content files. Default: "**/*.md" (all markdown files recursively).

fn name(&self) -> &str

Returns the name of this extractor for logging/debugging.
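
As an illustration of overriding the two provided methods, the sketch below uses a hypothetical stand-in trait, `ProvidedMethods`, that mirrors only those methods. The `"**/*.md"` default comes from the description above; the default returned by name() is not shown in these docs, so the stand-in leaves that method required instead of guessing a value. The extractor types and the `theorems/` path are invented for the example.

```rust
// Hypothetical stand-in mirroring only the two provided methods.
pub trait ProvidedMethods {
    // Documented default: all markdown files, recursively.
    fn content_glob(&self) -> &str {
        "**/*.md"
    }

    // The real trait supplies a default here; its value is not shown in the
    // docs, so this stand-in makes the method required.
    fn name(&self) -> &str;
}

// Hypothetical extractor that narrows file discovery.
pub struct MathExtractor;

impl ProvidedMethods for MathExtractor {
    // Restrict discovery to an assumed `theorems/` subtree.
    fn content_glob(&self) -> &str {
        "theorems/**/*.md"
    }

    fn name(&self) -> &str {
        "math"
    }
}

// Hypothetical extractor that keeps the default glob.
pub struct DefaultGlobExtractor;

impl ProvidedMethods for DefaultGlobExtractor {
    fn name(&self) -> &str {
        "default-glob"
    }
}
```

Overriding content_glob() is only needed when a domain's files live in a subset of the content tree; otherwise the default pattern already covers every markdown file.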

Implementors§