§Shardex - High-Performance Vector Search Engine
Shardex is a memory-mapped vector search engine. Its API follows the ApiThing pattern, so every operation is consistent, composable, and type-safe.
§Architecture
The library is built around three core concepts:
- ShardexContext: Shared state and resource management
- Operations: Types implementing the ApiOperation trait
- Parameters: Type-safe input objects for each operation
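The three concepts fit together roughly as follows. This is a minimal, self-contained sketch rather than the library's code: the Operation trait is a hypothetical stand-in for ApiOperation (the real trait in the apithing crate may be shaped differently), and Context, StoreTextParams, and StoreText are illustrative names that are not part of Shardex.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the `ApiOperation` trait; the real signature may differ.
pub trait Operation<C, P> {
    type Output;
    type Error;
    fn execute(context: &mut C, parameters: &P) -> Result<Self::Output, Self::Error>;
}

// Hypothetical context: shared state that every operation borrows mutably.
#[derive(Debug, Default)]
pub struct Context {
    documents: HashMap<u64, String>,
}

// Hypothetical parameter object: the typed input for one operation.
#[derive(Debug, Clone)]
pub struct StoreTextParams {
    pub document_id: u64,
    pub text: String,
}

// The operation is a zero-sized type; all state lives in the context.
pub struct StoreText;

impl Operation<Context, StoreTextParams> for StoreText {
    type Output = usize;
    type Error = String;

    // Stores the text and returns the number of documents now held.
    fn execute(context: &mut Context, parameters: &StoreTextParams) -> Result<usize, String> {
        if parameters.text.is_empty() {
            return Err("text must not be empty".to_string());
        }
        context
            .documents
            .insert(parameters.document_id, parameters.text.clone());
        Ok(context.documents.len())
    }
}

fn main() -> Result<(), String> {
    let mut context = Context::default();
    let params = StoreTextParams {
        document_id: 1,
        text: "hello".to_string(),
    };
    let stored = StoreText::execute(&mut context, &params)?;
    println!("documents stored: {stored}");
    Ok(())
}
```

Because the operation type carries no state of its own, every call site has the same shape: pick an operation, pass the shared context, pass the parameters.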
§Core Operations
§Index Management
- CreateIndex - Create new index
§Document Operations
- AddPostings - Add vector postings
- StoreDocumentText - Store document text
- BatchStoreDocumentText - Batch text storage
§Search Operations
- Search - Vector similarity search
- GetDocumentText - Retrieve document text
- ExtractSnippet - Extract text snippets
§Maintenance Operations
- Flush - Flush pending operations
- GetStats - Index statistics
- GetPerformanceStats - Performance metrics
§Usage Patterns
All operations follow the same pattern:
use apithing::ApiOperation;
let result = OperationType::execute(&mut context, &parameters)?;
§Quick Start
use shardex::api::{
    ShardexContext, CreateIndex, AddPostings, Search,
    CreateIndexParams, AddPostingsParams, SearchParams
};
use shardex::{DocumentId, Posting};
use apithing::ApiOperation;

// Create context and index
let mut context = ShardexContext::new();
let create_params = CreateIndexParams::builder()
    .directory_path("./my_index".into())
    .vector_size(384)
    .shard_size(10000)
    .batch_write_interval_ms(100)
    .build()?;
CreateIndex::execute(&mut context, &create_params)?;

// Add postings
let postings = vec![Posting {
    document_id: DocumentId::from_raw(1),
    start: 0,
    length: 100,
    vector: vec![0.1; 384],
}];
AddPostings::execute(&mut context, &AddPostingsParams::new(postings)?)?;

// Search
let results = Search::execute(&mut context, &SearchParams::builder()
    .query_vector(vec![0.1; 384])
    .k(10)
    .build()?
)?;
§Examples
The examples/ directory contains comprehensive examples:
- basic_usage.rs - Basic operations
- configuration.rs - Configuration options
- batch_operations.rs - Batch processing
- document_text_basic.rs - Text storage
- monitoring.rs - Performance monitoring
Run examples with:
cargo run --example basic_usage
§Features
- Consistent API: All operations use the ApiThing pattern
- Type Safety: Parameter objects prevent errors
- Shared Context: Efficient resource management
- Memory-mapped storage for zero-copy operations and fast startup
- ACID transactions via write-ahead logging (WAL)
- Incremental updates without full index rebuilds
- Document text storage with snippet extraction
- Performance monitoring and detailed statistics
- Dynamic shard management with automatic splitting
- Concurrent reads during write operations
- Bloom filter optimization for efficient document deletion
- Crash recovery from unexpected shutdowns
§Development Guidelines
§Struct Definition Standards
§Default Implementation Rules
- PREFER #[derive(Default)] for structs with all zero/empty defaults:

  #[derive(Debug, Clone, Default)]
  pub struct SimpleMetrics {
      pub count: u64,
      pub total: u64,
  }

- USE manual impl Default only when:
  - Non-zero defaults are needed
  - Complex initialization is required
  - Fields contain non-Default types

  use std::time::Instant;

  #[derive(Debug, Clone)]
  pub struct ComplexMetrics {
      pub start_time: Instant,
      pub threshold: f64,
  }

  impl Default for ComplexMetrics {
      fn default() -> Self {
          Self {
              start_time: Instant::now(), // Can't derive this
              threshold: 0.95,            // Non-zero default
          }
      }
  }

- AVOID redundant patterns like:

  // DON'T DO THIS - just derive Default instead
  #[derive(Debug, Clone)]
  pub struct SomeStruct {
      pub count: u64,
  }

  impl SomeStruct {
      pub fn new() -> Self {
          Self { count: 0 }
      }
  }

  impl Default for SomeStruct {
      fn default() -> Self {
          Self::new() // If new() just sets zero/empty values
      }
  }
§Struct Size Guidelines
- MAXIMUM 15 fields per struct (prefer 10 or fewer)
- BREAK DOWN large structs into logical sub-structures:

  // Instead of one large struct with 30+ fields:
  #[derive(Debug, Clone, Default)]
  pub struct DocumentMetrics {
      pub documents_stored: u64,
      pub total_size: u64,
      pub average_latency: f64,
  }

- GROUP related fields into cohesive types
- USE composition over large flat structures
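As a rough illustration of composition, a parent type can hold a few focused sub-structures instead of dozens of flat fields. DocumentMetrics mirrors the example above; SearchMetrics, IndexMetrics, and their fields are hypothetical names chosen for this sketch and are not taken from the Shardex source.

```rust
// Mirrors the DocumentMetrics example from the guidelines.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DocumentMetrics {
    pub documents_stored: u64,
    pub total_size: u64,
    pub average_latency: f64,
}

// Hypothetical second group of related fields.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SearchMetrics {
    pub queries_served: u64,
    pub average_latency: f64,
}

// Hypothetical parent: composes the groups instead of flattening every field.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct IndexMetrics {
    pub documents: DocumentMetrics,
    pub search: SearchMetrics,
}

fn main() {
    // Every level derives Default, so the whole tree starts at zero.
    let mut metrics = IndexMetrics::default();
    metrics.documents.documents_stored += 1;
    metrics.search.queries_served += 3;
    println!("{metrics:?}");
}
```

Each sub-structure stays well under the field limit, and because all of them derive Default, the parent can derive it too.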
§Derive Attribute Ordering
Always use consistent ordering for derive attributes:
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash, Serialize, Deserialize)]
Order: Debug, Clone, Copy (if applicable), Default, PartialEq, Eq, Hash, Serialize, Deserialize
§Builder Pattern Usage
Use builder patterns for:
- Configuration structs with many optional parameters
- Complex initialization sequences
- Structs with validation requirements
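For the validation case, one possible shape is a separate builder whose build() step returns a Result, so an invalid configuration is never constructed. This is only a sketch under stated assumptions: IndexConfig, IndexConfigBuilder, the String error type, and the 10,000 default shard size are all hypothetical and do not describe the Shardex builders.

```rust
// Hypothetical validated configuration; not a Shardex type.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexConfig {
    pub vector_size: usize,
    pub shard_size: usize,
}

// Hypothetical builder: optional fields are tracked until build() is called.
#[derive(Debug, Clone, Default)]
pub struct IndexConfigBuilder {
    vector_size: Option<usize>,
    shard_size: Option<usize>,
}

impl IndexConfigBuilder {
    pub fn vector_size(mut self, vector_size: usize) -> Self {
        self.vector_size = Some(vector_size);
        self
    }

    pub fn shard_size(mut self, shard_size: usize) -> Self {
        self.shard_size = Some(shard_size);
        self
    }

    // All validation happens here, in one place.
    pub fn build(self) -> Result<IndexConfig, String> {
        let vector_size = self.vector_size.ok_or("vector_size is required")?;
        if vector_size == 0 {
            return Err("vector_size must be greater than zero".to_string());
        }
        // Assumed default for this sketch only.
        let shard_size = self.shard_size.unwrap_or(10_000);
        if shard_size == 0 {
            return Err("shard_size must be greater than zero".to_string());
        }
        Ok(IndexConfig { vector_size, shard_size })
    }
}

fn main() -> Result<(), String> {
    let config = IndexConfigBuilder::default().vector_size(384).build()?;
    println!("{config:?}");
    Ok(())
}
```

The simpler chained with_* style suits structs that need no validation.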
#[derive(Debug, Clone, Default)]
pub struct MyConfig {
    pub timeout: u64,
}

impl MyConfig {
    pub fn new() -> Self { Self::default() }

    pub fn with_timeout(mut self, timeout: u64) -> Self {
        self.timeout = timeout;
        self
    }
}
Re-exports§
- pub use error::ShardexError;
- pub use identifiers::DocumentId;
- pub use identifiers::ShardId;
- pub use identifiers::TransactionId;
- pub use structures::IndexStats;
- pub use structures::Posting;
- pub use structures::SearchResult;
- pub use api::AddPostings;
- pub use api::AddPostingsParams;
- pub use api::BatchAddPostings;
- pub use api::BatchAddPostingsParams;
- pub use api::BatchDocumentTextStats;
- pub use api::BatchStats;
- pub use api::BatchStoreDocumentText;
- pub use api::BatchStoreDocumentTextParams;
- pub use api::CreateIndex;
- pub use api::CreateIndexParams;
- pub use api::DetailedIndexStats;
- pub use api::DetailedPerformanceMetrics;
- pub use api::DocumentTextEntry;
- pub use api::ExtractSnippet;
- pub use api::ExtractSnippetParams;
- pub use api::Flush;
- pub use api::FlushParams;
- pub use api::GetDocumentText;
- pub use api::GetDocumentTextParams;
- pub use api::GetPerformanceStats;
- pub use api::GetPerformanceStatsParams;
- pub use api::GetStats;
- pub use api::GetStatsParams;
- pub use api::IncrementalAdd;
- pub use api::IncrementalAddParams;
- pub use api::IncrementalStats;
- pub use api::PerformanceStats;
- pub use api::RemovalStats;
- pub use api::RemoveDocuments;
- pub use api::RemoveDocumentsParams;
- pub use api::Search;
- pub use api::SearchParams;
- pub use api::SearchResultWithText;
- pub use api::ShardexContext;
- pub use api::StoreDocumentText;
- pub use api::StoreDocumentTextParams;
- pub use shardex::Shardex;
- pub use shardex::ShardexImpl;
- pub use config::ShardexConfig;
Modules§
- api - API module for Shardex using the ApiThing pattern
- async_document_text_storage - Asynchronous I/O support for document text storage
- concurrent - Concurrent read/write coordination for Shardex operations
- concurrent_document_text_storage - Concurrent document text storage with reader-writer optimization
- config - Configuration structures for Shardex
- cow_index - Copy-on-Write Index Implementation
- crash_recovery - WAL-Based Crash Recovery for Shardex
- document_text_entry - Document text storage data structures for Shardex
- document_text_storage - Document text storage implementation for Shardex
- error - Error types for Shardex operations
- identifiers - Identifier types for Shardex
- layout - Directory structure and file layout management for Shardex indexes
- monitoring - Monitoring and Statistics Collection
- posting_storage - Memory-mapped posting storage for efficient posting metadata management
- shard - Individual shard creation and management
- shardex - Main Shardex API implementation
- shardex_index - In-Memory Index (Shardex) Structure
- structures - Core data structures for Shardex
- text_memory_pool - Memory pool for text operations to reduce allocation overhead
- transactions - Transaction recording and batching for WAL operations
- vector_storage - Memory-mapped vector storage using Arrow arrays for high-performance vector operations
- wal - Write-Ahead Log segment management for Shardex
- wal_replay - WAL Replay for Recovery
Type Aliases§
- Result - Type alias for Results using ShardexError
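A crate-level Result alias of this kind is usually a one-line type definition that fixes the error type. The sketch below shows the general idiom only: IndexError is a hypothetical stand-in for ShardexError (whose real variants are not shown here), and check_dimension is an illustrative helper, not a Shardex function.

```rust
use std::fmt;

// Hypothetical stand-in for ShardexError; the real enum has its own variants.
#[derive(Debug, PartialEq)]
pub enum IndexError {
    InvalidDimension { expected: usize, actual: usize },
}

impl fmt::Display for IndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IndexError::InvalidDimension { expected, actual } => {
                write!(f, "expected {expected} dimensions, got {actual}")
            }
        }
    }
}

impl std::error::Error for IndexError {}

// The alias fixes the error type, so signatures only name the success type.
pub type Result<T> = std::result::Result<T, IndexError>;

// Illustrative helper that returns the aliased Result.
pub fn check_dimension(vector: &[f32], expected: usize) -> Result<()> {
    if vector.len() == expected {
        Ok(())
    } else {
        Err(IndexError::InvalidDimension {
            expected,
            actual: vector.len(),
        })
    }
}

fn main() {
    match check_dimension(&[0.1; 3], 384) {
        Ok(()) => println!("dimension ok"),
        Err(err) => println!("error: {err}"),
    }
}
```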