Crate shardex


§Shardex - High-Performance Vector Search Engine

Shardex is a memory-mapped vector search engine. Its API follows the ApiThing pattern, which keeps operations consistent, composable, and type-safe.

§Architecture

The library is built around three core concepts:

  • ShardexContext: Shared state and resource management
  • Operations: Types implementing the ApiOperation trait
  • Parameters: Type-safe input objects for each operation

§Core Operations

§Index Management

§Document Operations

§Search Operations

§Maintenance Operations

§Usage Patterns

All operations follow the same pattern:

use apithing::ApiOperation;

let result = OperationType::execute(&mut context, &parameters)?;
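
To make the shape of this pattern concrete, here is a minimal, self-contained sketch. It does not use the real `apithing` crate: the `ApiOperation` trait below is a local stand-in whose signature is an assumption, and `DemoContext`, `AddAmountParams`, and `AddAmount` are hypothetical names invented for illustration.

```rust
// Local stand-in for the trait provided by `apithing` (assumed shape; the
// real trait definition may differ).
pub trait ApiOperation<C, P> {
    type Output;
    type Error;

    fn execute(context: &mut C, parameters: &P) -> Result<Self::Output, Self::Error>;
}

// Hypothetical shared state, playing the role that ShardexContext plays.
#[derive(Debug, Default)]
pub struct DemoContext {
    pub total: u64,
    pub calls: u32,
}

// Hypothetical parameter object for a single operation.
#[derive(Debug, Clone)]
pub struct AddAmountParams {
    pub amount: u64,
}

// Hypothetical operation type; it holds no state of its own.
pub struct AddAmount;

impl ApiOperation<DemoContext, AddAmountParams> for AddAmount {
    type Output = u64;
    type Error = String;

    fn execute(context: &mut DemoContext, parameters: &AddAmountParams) -> Result<u64, String> {
        // Validate the parameters before touching shared state.
        if parameters.amount == 0 {
            return Err("amount must be non-zero".to_string());
        }
        context.total += parameters.amount;
        context.calls += 1;
        Ok(context.total)
    }
}

fn main() -> Result<(), String> {
    let mut context = DemoContext::default();
    let total = AddAmount::execute(&mut context, &AddAmountParams { amount: 5 })?;
    println!("total after one call: {total}");
    Ok(())
}
```

In this sketch all mutable state lives in the context, so operation types can stay zero-sized and every call site has the same form.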

§Quick Start

use shardex::api::{
    ShardexContext, CreateIndex, AddPostings, Search,
    CreateIndexParams, AddPostingsParams, SearchParams
};
use shardex::{DocumentId, Posting};
use apithing::ApiOperation;

// Create context and index
let mut context = ShardexContext::new();
let create_params = CreateIndexParams::builder()
    .directory_path("./my_index".into())
    .vector_size(384)
    .shard_size(10000)
    .batch_write_interval_ms(100)
    .build()?;
     
CreateIndex::execute(&mut context, &create_params)?;

// Add postings
let postings = vec![Posting {
    document_id: DocumentId::from_raw(1),
    start: 0,
    length: 100,
    vector: vec![0.1; 384],
}];
AddPostings::execute(&mut context, &AddPostingsParams::new(postings)?)?;

// Search
let results = Search::execute(&mut context, &SearchParams::builder()
    .query_vector(vec![0.1; 384])
    .k(10)
    .build()?
)?;

§Examples

The examples/ directory contains comprehensive examples:

  • basic_usage.rs - Basic operations
  • configuration.rs - Configuration options
  • batch_operations.rs - Batch processing
  • document_text_basic.rs - Text storage
  • monitoring.rs - Performance monitoring

Run examples with:

cargo run --example basic_usage

§Features

  • Consistent API: All operations use the ApiThing pattern
  • Type Safety: Parameter objects prevent errors
  • Shared Context: Efficient resource management
  • Memory-mapped storage for zero-copy operations and fast startup
  • ACID transactions via write-ahead logging (WAL)
  • Incremental updates without full index rebuilds
  • Document text storage with snippet extraction
  • Performance monitoring and detailed statistics
  • Dynamic shard management with automatic splitting
  • Concurrent reads during write operations
  • Bloom filter optimization for efficient document deletion
  • Crash recovery from unexpected shutdowns

§Development Guidelines

§Struct Definition Standards

§Default Implementation Rules

  1. PREFER #[derive(Default)] for structs with all zero/empty defaults:

    #[derive(Debug, Clone, Default)]
    pub struct SimpleMetrics {
        pub count: u64,
        pub total: u64,
    }
  2. USE manual impl Default only when:

    • Non-zero defaults are needed
    • Complex initialization is required
    • Fields contain non-Default types

    use std::time::Instant;
     
    #[derive(Debug, Clone)]
    pub struct ComplexMetrics {
        pub start_time: Instant,
        pub threshold: f64,
    }
     
    impl Default for ComplexMetrics {
        fn default() -> Self {
            Self {
                start_time: Instant::now(), // Can't derive this
                threshold: 0.95,           // Non-zero default
            }
        }
    }
  3. AVOID redundant patterns like:

    // DON'T DO THIS - just derive Default instead
    #[derive(Debug, Clone)]
    pub struct SomeStruct {
        pub count: u64,
    }
     
    impl SomeStruct {
        pub fn new() -> Self {
            Self { count: 0 }
        }
    }
     
    impl Default for SomeStruct {
        fn default() -> Self {
            Self::new() // If new() just sets zero/empty values
        }
    }
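
The practical difference between the two approaches can be checked with a short sketch. The `labels` field and the `RetryPolicy` type are hypothetical and exist only for this illustration; they are not part of Shardex.

```rust
// Derived Default: every field gets its type's zero/empty value.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SimpleMetrics {
    pub count: u64,
    pub total: u64,
    pub labels: Vec<String>, // hypothetical extra field; defaults to an empty Vec
}

// Manual Default: needed here because the defaults are non-zero.
#[derive(Debug, Clone, PartialEq)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub backoff_factor: f64,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            max_attempts: 3,
            backoff_factor: 1.5,
        }
    }
}

fn main() {
    let metrics = SimpleMetrics::default();
    // Struct update syntax overrides one field and keeps the other defaults.
    let policy = RetryPolicy {
        max_attempts: 5,
        ..Default::default()
    };
    println!("{metrics:?} {policy:?}");
}
```

Either way, callers get `..Default::default()` struct update syntax without needing a separate `new()` constructor.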

§Struct Size Guidelines

  1. MAXIMUM 15 fields per struct (prefer 10 or fewer)
  2. BREAK DOWN large structs into logical sub-structures:

    // Instead of one large struct with 30+ fields, define focused types like this one:
    #[derive(Debug, Clone, Default)]
    pub struct DocumentMetrics {
        pub documents_stored: u64,
        pub total_size: u64,
        pub average_latency: f64,
    }
  3. GROUP related fields into cohesive types
  4. USE composition over large flat structures
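
As a rough sketch of items 3 and 4, the focused sub-structures can be composed into one parent type. `SearchMetrics`, `IndexMetrics`, and `average_document_size` are hypothetical names chosen for this example and are not taken from the Shardex API.

```rust
#[derive(Debug, Clone, Default)]
pub struct DocumentMetrics {
    pub documents_stored: u64,
    pub total_size: u64,
    pub average_latency: f64,
}

// Hypothetical second group of related fields.
#[derive(Debug, Clone, Default)]
pub struct SearchMetrics {
    pub searches_performed: u64,
    pub average_results: f64,
}

// Hypothetical parent type: two cohesive members instead of five flat fields.
#[derive(Debug, Clone, Default)]
pub struct IndexMetrics {
    pub documents: DocumentMetrics,
    pub searches: SearchMetrics,
}

impl IndexMetrics {
    /// Mean stored size per document, or 0.0 when nothing is stored.
    pub fn average_document_size(&self) -> f64 {
        if self.documents.documents_stored == 0 {
            0.0
        } else {
            self.documents.total_size as f64 / self.documents.documents_stored as f64
        }
    }
}

fn main() {
    let mut metrics = IndexMetrics::default();
    metrics.documents.documents_stored = 4;
    metrics.documents.total_size = 4096;
    println!("average size: {}", metrics.average_document_size());
}
```

Because every sub-structure derives `Default`, the parent can derive it too, and each group can be passed around or tested on its own.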

§Derive Attribute Ordering

Always use consistent ordering for derive attributes:

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash, Serialize, Deserialize)]

Order: Debug, Clone, Copy (if applicable), Default, PartialEq, Eq, Hash, Serialize, Deserialize

§Builder Pattern Usage

Use builder patterns for:

  • Configuration structs with many optional parameters
  • Complex initialization sequences
  • Structs with validation requirements

#[derive(Debug, Clone, Default)]
pub struct MyConfig {
    pub timeout: u64,
}

impl MyConfig {
    pub fn new() -> Self { Self::default() }
     
    pub fn with_timeout(mut self, timeout: u64) -> Self {
        self.timeout = timeout;
        self
    }
}
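
For the validation case, one possible shape is a separate builder type whose `build()` returns a `Result`. This is only a sketch: `PoolConfig`, `PoolConfigBuilder`, the fallback values, and the `String` error type are assumptions made for illustration and do not describe how Shardex's own builders work.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct PoolConfig {
    pub timeout_ms: u64,
    pub max_connections: u32,
}

// Hypothetical builder: unset fields fall back to defaults in build().
#[derive(Debug, Clone, Default)]
pub struct PoolConfigBuilder {
    timeout_ms: Option<u64>,
    max_connections: Option<u32>,
}

impl PoolConfigBuilder {
    pub fn timeout_ms(mut self, timeout_ms: u64) -> Self {
        self.timeout_ms = Some(timeout_ms);
        self
    }

    pub fn max_connections(mut self, max_connections: u32) -> Self {
        self.max_connections = Some(max_connections);
        self
    }

    /// Applies defaults, then rejects values that can never be valid.
    pub fn build(self) -> Result<PoolConfig, String> {
        let timeout_ms = self.timeout_ms.unwrap_or(1_000);
        let max_connections = self.max_connections.unwrap_or(8);
        if timeout_ms == 0 {
            return Err("timeout_ms must be greater than zero".to_string());
        }
        if max_connections == 0 {
            return Err("max_connections must be greater than zero".to_string());
        }
        Ok(PoolConfig { timeout_ms, max_connections })
    }
}

fn main() -> Result<(), String> {
    let config = PoolConfigBuilder::default().timeout_ms(250).build()?;
    println!("{config:?}");
    Ok(())
}
```

Keeping validation in `build()` means a `PoolConfig` value, once constructed through the builder, has already passed its checks.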

Re-exports§

pub use error::ShardexError;
pub use identifiers::DocumentId;
pub use identifiers::ShardId;
pub use identifiers::TransactionId;
pub use structures::IndexStats;
pub use structures::Posting;
pub use structures::SearchResult;
pub use api::AddPostings;
pub use api::AddPostingsParams;
pub use api::BatchAddPostings;
pub use api::BatchAddPostingsParams;
pub use api::BatchDocumentTextStats;
pub use api::BatchStats;
pub use api::BatchStoreDocumentText;
pub use api::BatchStoreDocumentTextParams;
pub use api::CreateIndex;
pub use api::CreateIndexParams;
pub use api::DetailedIndexStats;
pub use api::DetailedPerformanceMetrics;
pub use api::DocumentTextEntry;
pub use api::ExtractSnippet;
pub use api::ExtractSnippetParams;
pub use api::Flush;
pub use api::FlushParams;
pub use api::GetDocumentText;
pub use api::GetDocumentTextParams;
pub use api::GetPerformanceStats;
pub use api::GetPerformanceStatsParams;
pub use api::GetStats;
pub use api::GetStatsParams;
pub use api::IncrementalAdd;
pub use api::IncrementalAddParams;
pub use api::IncrementalStats;
pub use api::PerformanceStats;
pub use api::RemovalStats;
pub use api::RemoveDocuments;
pub use api::RemoveDocumentsParams;
pub use api::Search;
pub use api::SearchParams;
pub use api::SearchResultWithText;
pub use api::ShardexContext;
pub use api::StoreDocumentText;
pub use api::StoreDocumentTextParams;
pub use shardex::Shardex;
pub use shardex::ShardexImpl;
pub use config::ShardexConfig;

Modules§

api
API module for Shardex using the ApiThing pattern
async_document_text_storage
Asynchronous I/O support for document text storage
concurrent
Concurrent read/write coordination for Shardex operations
concurrent_document_text_storage
Concurrent document text storage with reader-writer optimization
config
Configuration structures for Shardex
cow_index
Copy-on-Write Index Implementation
crash_recovery
WAL-Based Crash Recovery for Shardex
document_text_entry
Document text storage data structures for Shardex
document_text_storage
Document text storage implementation for Shardex
error
Error types for Shardex operations
identifiers
Identifier types for Shardex
layout
Directory structure and file layout management for Shardex indexes
monitoring
Monitoring and Statistics Collection
posting_storage
Memory-mapped posting storage for efficient posting metadata management
shard
Individual shard creation and management
shardex
Main Shardex API implementation
shardex_index
In-Memory Index (Shardex) Structure
structures
Core data structures for Shardex
text_memory_pool
Memory pool for text operations to reduce allocation overhead
transactions
Transaction recording and batching for WAL operations
vector_storage
Memory-mapped vector storage using Arrow arrays for high-performance vector operations
wal
Write-Ahead Log segment management for Shardex
wal_replay
WAL Replay for Recovery

Type Aliases§

Result
Type alias for Results using ShardexError