Multi-Vector Documents with Stable Aggregation Semantics (Task 5)
This module lets a document carry multiple vectors (for example, one per chunk or paragraph) and deterministically aggregates their search scores into a single score per document.
§Design
Document (doc_id=123)
├── Chunk 0 → Vector 0 (internal_id=1000)
├── Chunk 1 → Vector 1 (internal_id=1001)
├── Chunk 2 → Vector 2 (internal_id=1002)
└── Chunk 3 → Vector 3 (internal_id=1003)
Search: query → [1001, 1003, 1002] (internal IDs with scores)
→ Aggregate by doc_id → doc_123: max(score(1001), score(1003), score(1002))
§Aggregation Methods
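- Max: Use the best-matching chunk’s score (similar to the MaxSim step of ColBERT-style late interaction)
- Mean: Average the scores of the document’s chunks that appear in the results (favors documents whose matching chunks are consistently relevant)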
- First: Use the first chunk’s score (for ordered content)
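The three aggregation methods can be sketched in Rust as follows. This is a minimal illustration, not this module’s implementation: `Method` and `aggregate` are hypothetical names, hits are assumed to arrive as `(doc_id, chunk_index, score)` triples already resolved from internal IDs, Mean is assumed to average only the chunks present in the results, and First is assumed to mean the lowest chunk index among those hits. Sorting by score and then by `doc_id` is one way to keep the output order deterministic when scores tie.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for AggregationMethod (not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Method {
    Max,
    Mean,
    First,
}

/// Collapse vector-level hits into one score per document.
/// Each hit is (doc_id, chunk_index, score), assumed already resolved
/// from internal IDs via the ID mapping.
fn aggregate(hits: &[(&str, u32, f32)], method: Method) -> Vec<(String, f32)> {
    // Group the (chunk_index, score) pairs by document.
    let mut groups: HashMap<&str, Vec<(u32, f32)>> = HashMap::new();
    for &(doc, chunk, score) in hits {
        groups.entry(doc).or_default().push((chunk, score));
    }

    let mut docs: Vec<(String, f32)> = groups
        .into_iter()
        .map(|(doc, chunks)| {
            let score = match method {
                // Best-matching chunk wins.
                Method::Max => chunks.iter().map(|&(_, s)| s).fold(f32::NEG_INFINITY, f32::max),
                // Assumption: average over the chunks present in the results only.
                Method::Mean => {
                    chunks.iter().map(|&(_, s)| s).sum::<f32>() / chunks.len() as f32
                }
                // Assumption: "first" means the lowest chunk index among the hits.
                Method::First => chunks
                    .iter()
                    .min_by_key(|&&(c, _)| c)
                    .map(|&(_, s)| s)
                    .expect("every group has at least one hit"),
            };
            (doc.to_string(), score)
        })
        .collect();

    // Deterministic ranking: score descending, ties broken by doc_id ascending.
    docs.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    docs
}
```

For example, with hits on chunks 1, 3 and 2 of `doc_123` scoring 0.75, 0.5 and 0.25, Max yields 0.75 and Mean yields 0.5.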
§API
# Insert a multi-vector document
collection.insert_multi(
    doc_id="doc_123",
    vectors=[v1, v2, v3, v4],
    metadata={...},
)

# Search with document-level aggregation
collection.search(
    query,
    aggregate="max",  # one of "max", "mean", "first"
)
Structs§
- DocumentScore - Aggregate scores for a document
- MultiVectorAggregator - Aggregates search results from vector level to document level
- MultiVectorConfig - Configuration for multi-vector storage
- MultiVectorDocument - Multi-vector document for insertion
- MultiVectorMapping - Mapping from internal vector IDs to document IDs and chunk indices
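A mapping like the one MultiVectorMapping describes can be kept as two hash maps, one per lookup direction. The sketch below is a hypothetical illustration, not the crate’s type: the `Mapping` struct, its methods, the concrete alias types (`String`, `u64`, `u32`), and the choice to reject a duplicate `doc_id` are all assumptions. Starting the counter at 1000 reproduces the internal IDs from the design diagram.

```rust
use std::collections::HashMap;

// Hypothetical concrete types for the documented aliases; the crate's may differ.
type DocId = String;
type InternalId = u64;
type ChunkIndex = u32;

/// Sketch of a two-way map: internal vector ID <-> (document, chunk).
#[derive(Default)]
struct Mapping {
    next_id: InternalId,
    by_internal: HashMap<InternalId, (DocId, ChunkIndex)>,
    by_doc: HashMap<DocId, Vec<InternalId>>,
}

impl Mapping {
    /// Start assigning internal IDs at `start`.
    fn with_start(start: InternalId) -> Self {
        Mapping { next_id: start, ..Default::default() }
    }

    /// Assign one internal ID per chunk, in chunk order.
    /// Assumption: a doc_id that is already mapped is rejected (returns None).
    fn insert_doc(&mut self, doc_id: &str, num_chunks: ChunkIndex) -> Option<Vec<InternalId>> {
        if self.by_doc.contains_key(doc_id) {
            return None;
        }
        let mut ids = Vec::with_capacity(num_chunks as usize);
        for chunk in 0..num_chunks {
            let id = self.next_id;
            self.next_id += 1;
            self.by_internal.insert(id, (doc_id.to_string(), chunk));
            ids.push(id);
        }
        self.by_doc.insert(doc_id.to_string(), ids.clone());
        Some(ids)
    }

    /// Resolve a search hit's internal ID back to (doc_id, chunk_index).
    fn lookup(&self, id: InternalId) -> Option<(&str, ChunkIndex)> {
        self.by_internal.get(&id).map(|(doc, chunk)| (doc.as_str(), *chunk))
    }

    /// All internal IDs belonging to a document, in chunk order.
    fn ids_of(&self, doc_id: &str) -> Option<&[InternalId]> {
        self.by_doc.get(doc_id).map(|ids| ids.as_slice())
    }
}
```

Keeping both directions costs extra memory but makes hit resolution (internal ID to document) and per-document operations (document to all of its vectors) single hash lookups.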
Enums§
- AggregationMethod - Aggregation method for multi-vector documents
- MultiVectorError - Errors for multi-vector operations
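The `aggregate="max"` string in the API example implies that user input is mapped onto an aggregation method somewhere. A minimal sketch, assuming a `FromStr` implementation with exact, case-sensitive matching; both enums here are hypothetical mirrors, and the `UnknownAggregation` variant is invented for illustration since the real MultiVectorError variants are not listed in these docs.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical mirror of AggregationMethod; variants follow the documented methods.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AggregationMethod {
    Max,
    Mean,
    First,
}

/// Hypothetical error; the real MultiVectorError variants are not documented here.
#[derive(Debug, PartialEq, Eq)]
enum MultiVectorError {
    UnknownAggregation(String),
}

impl fmt::Display for MultiVectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MultiVectorError::UnknownAggregation(name) => {
                write!(f, "unknown aggregation method {name:?} (expected max|mean|first)")
            }
        }
    }
}

impl std::error::Error for MultiVectorError {}

impl FromStr for AggregationMethod {
    type Err = MultiVectorError;

    /// Parse the strings used by the `aggregate` option ("max", "mean", "first").
    /// Assumption: matching is exact and case-sensitive.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "max" => Ok(AggregationMethod::Max),
            "mean" => Ok(AggregationMethod::Mean),
            "first" => Ok(AggregationMethod::First),
            other => Err(MultiVectorError::UnknownAggregation(other.to_string())),
        }
    }
}
```

With `FromStr` in place, callers can write `"mean".parse::<AggregationMethod>()` and get a typed error instead of silently falling back to a default.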
Type Aliases§
- ChunkIndex - Chunk/part index within a document
- DocId - Document ID (user-provided, stable identifier)
- InternalId - Internal vector ID (storage-assigned)