Module multi_vector

Multi-Vector Documents with Stable Aggregation Semantics (Task 5)

This module lets a single document own multiple vectors (for example, one per chunk or paragraph) and aggregates vector-level search results into document-level scores deterministically.

§Design

Document (doc_id=123)
├── Chunk 0 → Vector 0 (internal_id=1000)
├── Chunk 1 → Vector 1 (internal_id=1001)
├── Chunk 2 → Vector 2 (internal_id=1002)
└── Chunk 3 → Vector 3 (internal_id=1003)

Search: query → [1001, 1003, 1002] (internal IDs with scores)
       → Aggregate by doc_id → doc_123: max(score(1001), score(1003), score(1002))
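The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: the alias targets (`String`, `u64`, `u32`), the `aggregate_max` function, and the plain `HashMap` standing in for `MultiVectorMapping` are all assumptions made for the example. It also assumes that higher scores are better and breaks score ties by document ID so the output order is reproducible.

```rust
use std::collections::HashMap;

// Hypothetical alias targets for this sketch; the module's real aliases may differ.
type DocId = String;
type InternalId = u64;
type ChunkIndex = u32;

/// Groups vector-level hits by document and keeps the best (max) chunk score
/// per document. `mapping` stands in for an internal-ID -> (doc, chunk) lookup.
fn aggregate_max(
    hits: &[(InternalId, f32)],
    mapping: &HashMap<InternalId, (DocId, ChunkIndex)>,
) -> Vec<(DocId, f32)> {
    let mut best: HashMap<DocId, f32> = HashMap::new();
    for &(id, score) in hits {
        // Hits without a known mapping are skipped in this sketch.
        if let Some((doc_id, _chunk)) = mapping.get(&id) {
            best.entry(doc_id.clone())
                .and_modify(|s| *s = (*s).max(score))
                .or_insert(score);
        }
    }
    let mut out: Vec<(DocId, f32)> = best.into_iter().collect();
    // Sort by score descending, then by doc ID ascending, so ties resolve
    // the same way on every run regardless of HashMap iteration order.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

Sorting on a secondary key is one simple way to get a stable ranking; the module may achieve determinism differently.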

§Aggregation Methods

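  • Max: Use the best-matching chunk’s score (similar in spirit to the max-similarity step in ColBERT’s late interaction)
  • Mean: Average the chunk scores (favors documents whose chunks match consistently)
  • First: Use the first chunk’s score (for ordered content where the opening chunk is most representative)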
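As a rough illustration of the three methods, the sketch below reduces one document's chunk scores to a single value. The `Method` enum and `aggregate` function are hypothetical stand-ins, not the module's `AggregationMethod` API. Two interpretations are assumed here: Mean averages only the chunks that appeared in the search results, and First picks the hit with the lowest chunk index.

```rust
/// Hypothetical mirror of the three methods listed above; the module's real
/// `AggregationMethod` enum may have different variants or behavior.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    Max,
    Mean,
    First,
}

/// Reduces one document's chunk scores to a single score.
/// `scores` holds (chunk_index, score) pairs in arbitrary order.
/// Returns `None` when the document has no matching chunks.
fn aggregate(method: Method, scores: &[(u32, f32)]) -> Option<f32> {
    if scores.is_empty() {
        return None;
    }
    match method {
        // Best-matching chunk wins.
        Method::Max => scores.iter().map(|&(_, s)| s).reduce(f32::max),
        // Average over the chunks that were actually returned (assumption).
        Method::Mean => {
            let total: f32 = scores.iter().map(|&(_, s)| s).sum();
            Some(total / scores.len() as f32)
        }
        // Score of the hit with the lowest chunk index (assumption).
        Method::First => scores
            .iter()
            .min_by_key(|&&(chunk, _)| chunk)
            .map(|&(_, s)| s),
    }
}
```

For example, given the pairs `(2, 0.5)`, `(0, 0.25)`, `(1, 0.75)`, this sketch yields 0.75 for Max, 0.5 for Mean, and 0.25 for First.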

§API

// Insert multi-vector document
collection.insert_multi(
    doc_id="doc_123",
    vectors=[v1, v2, v3, v4],
    metadata={...},
)

// Search with aggregation
collection.search(
    query,
    aggregate="max",  // max|mean|first
)

Structs§

DocumentScore
Aggregate scores for a document
MultiVectorAggregator
Aggregates search results from vector level to document level
MultiVectorConfig
Configuration for multi-vector storage
MultiVectorDocument
Multi-vector document for insertion
MultiVectorMapping
Mapping from internal vector IDs to document IDs and chunk indices

Enums§

AggregationMethod
Aggregation method for multi-vector documents
MultiVectorError
Errors for multi-vector operations

Type Aliases§

ChunkIndex
Chunk/part index within a document
DocId
Document ID (user-provided, stable identifier)
InternalId
Internal vector ID (storage-assigned)