# OpenMemory Rust SDK - PRD (Product Requirements Document)

A detailed plan for porting the OpenMemory JavaScript SDK to Rust.

> **Last Updated**: 2025-12-05
> **Current Status**: Phases 1-4 and 7 complete, except `memory/reflect.rs` in Phase 4 (84 unit tests + 5 doc-tests passing)
> **JS SDK Alignment Review**: Complete

### Recent Changes (JS SDK Alignment)

| Area | Changes |
|------|---------|
| **Decay - Tier Classification** | 6-day threshold, added coactivations check (matches JS) |
| **Decay - Activity Boost** | Applied `salience * (1 + ln(1 + coactivations))` formula |
| **HSG - Sector Penalty** | Applied penalty based on sector relationship matrix |
| **HSG - Feedback Score** | EMA feedback score update for query results |
| **HSG - Waypoint Expansion** | Implemented BFS-based graph traversal (expand_via_waypoints) |
| **HSG - Keyword Filtering** | Keyword-based filtering for Hybrid tier |
| **Sector Classification** | Semantic preferred as tiebreaker when scores are equal |
| **Gemini Provider** | Google Gemini API integration (batch support) |
| **Bedrock Provider** | AWS Bedrock API integration (feature flag support) |

---

## 1. Project Structure

```
openmemory-rs/
├── Cargo.toml
├── src/
│   ├── lib.rs                  # Library entry point
│   ├── core/
│   │   ├── mod.rs
│   │   ├── config.rs           # cfg.ts → Configuration
│   │   ├── db.rs               # db.ts → SQLite layer
│   │   ├── types.rs            # types.ts → Type definitions
│   │   └── error.rs            # Error types (new)
│   ├── memory/
│   │   ├── mod.rs
│   │   ├── embed/
│   │   │   ├── mod.rs
│   │   │   ├── openai.rs       # OpenAI provider
│   │   │   ├── gemini.rs       # Google Gemini
│   │   │   ├── ollama.rs       # Ollama (local)
│   │   │   ├── bedrock.rs      # AWS Bedrock
│   │   │   └── synthetic.rs    # Synthetic embeddings
│   │   ├── hsg.rs              # Hybrid Similarity Graph
│   │   ├── decay.rs            # Memory decay
│   │   └── reflect.rs          # Reflection/inference
│   ├── ops/
│   │   ├── mod.rs
│   │   ├── compress.rs         # Compression
│   │   ├── extract.rs          # Extraction
│   │   └── ingest.rs           # Ingestion
│   ├── temporal_graph/
│   │   ├── mod.rs
│   │   ├── types.rs
│   │   ├── store.rs
│   │   └── query.rs
│   └── utils/
│       ├── mod.rs
│       ├── text.rs             # Tokenization, normalization
│       ├── chunking.rs         # Chunking
│       └── keyword.rs          # Keyword extraction
├── tests/
│   ├── integration/
│   └── unit/
├── examples/
│   └── basic_usage.rs
└── benches/
    └── performance.rs
```

---

## 2. Cargo.toml Dependencies

```toml
[package]
name = "openmemory"
version = "0.1.0"
edition = "2021"
description = "OpenMemory - Cognitive memory system for AI applications"
license = "MIT"
repository = "https://github.com/example/openmemory-rs"

[dependencies]
# Async runtime
tokio = { version = "1", features = ["full"] }

# SQLite
rusqlite = { version = "0.31", features = ["bundled", "blob"] }
# Or for async: sqlx = { version = "0.7", features = ["sqlite", "runtime-tokio"] }

# HTTP client
reqwest = { version = "0.11", features = ["json"] }

# Serialization
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# Vector operations
ndarray = "0.15"

# Utilities
uuid = { version = "1", features = ["v4"] }
chrono = { version = "0.4", features = ["serde"] }
regex = "1"
lazy_static = "1"
thiserror = "1"          # Error handling
log = "0.4"
env_logger = "0.10"
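dotenvy = "0.15"         # .env file loading
async-trait = "0.1"      # #[async_trait] on the EmbeddingProvider trait
bytemuck = "1"           # f32 <-> byte casts for vector BLOBs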

# Caching
lru = "0.12"

# Parallel processing
rayon = "1"

# Optional: AWS SDK (managed via feature flag)
aws-sdk-bedrockruntime = { version = "1", optional = true }
aws-config = { version = "1", optional = true }

[features]
default = []
aws = ["aws-sdk-bedrockruntime", "aws-config"]

[dev-dependencies]
tokio-test = "0.4"
criterion = "0.5"        # Benchmarking
tempfile = "3"           # Temporary files for testing

[[bench]]
name = "performance"
harness = false
```

---

## 3. Implementation Plan by Phase

### Phase 1: Core Infrastructure (Foundation) ✅ Complete

| Order | Module | Main Tasks | Difficulty | Actual Lines | Status |
|-------|--------|------------|------------|--------------|--------|
| 1.1 | `core/error.rs` | Custom error types (thiserror) | Low | 44 | ✅ |
| 1.2 | `core/types.rs` | `MemRow`, `HsgQueryResult`, `EmbeddingResult` structs | Low | 347 | ✅ |
| 1.3 | `core/config.rs` | Environment variable loading, `Config` struct | Low | 324 | ✅ |
| 1.4 | `core/db.rs` | SQLite connection, schema creation, basic CRUD | Medium | 635 | ✅ |

**Example Code - core/types.rs:**

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemRow {
    pub id: String,
    pub content: String,
    pub primary_sector: Sector,
    pub tags: Option<Vec<String>>,
    pub meta: Option<serde_json::Value>,
    pub user_id: Option<String>,
    pub created_at: i64,
    pub updated_at: i64,
    pub last_seen_at: i64,
    pub salience: f64,
    pub decay_lambda: f64,
    pub version: i32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Sector {
    Episodic,
    Semantic,
    Procedural,
    Emotional,
    Reflective,
}

impl Sector {
    pub fn default_decay_lambda(&self) -> f64 {
        match self {
            Sector::Episodic => 0.015,
            Sector::Semantic => 0.005,
            Sector::Procedural => 0.008,
            Sector::Emotional => 0.02,
            Sector::Reflective => 0.001,
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HsgQueryResult {
    pub id: String,
    pub content: String,
    pub score: f64,
    pub sectors: Vec<Sector>,
    pub primary_sector: Sector,
    pub path: Vec<String>,
    pub salience: f64,
    pub last_seen_at: i64,
}

#[derive(Debug, Clone)]
pub struct EmbeddingResult {
    pub sector: Sector,
    pub vector: Vec<f32>,
    pub dim: usize,
}
```

**Example Code - core/error.rs:**

```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum Error {
    #[error("Database error: {0}")]
    Database(#[from] rusqlite::Error),

    #[error("HTTP error: {0}")]
    Http(#[from] reqwest::Error),

    #[error("JSON error: {0}")]
    Json(#[from] serde_json::Error),

    #[error("Configuration error: {0}")]
    Config(String),

    #[error("Embedding error: {0}")]
    Embedding(String),

    #[error("Not found: {0}")]
    NotFound(String),

    #[error("Invalid input: {0}")]
    InvalidInput(String),
}

pub type Result<T> = std::result::Result<T, Error>;
```

---

### Phase 2: Utilities ✅ Complete

| Order | Module | Main Tasks | Difficulty | Actual Lines | Status |
|-------|--------|------------|------------|--------------|--------|
| 2.1 | `utils/text.rs` | Tokenization, normalization, synonym handling | Low | 341 | ✅ |
| 2.2 | `utils/chunking.rs` | Text chunking (sentence/paragraph splitting) | Low | 343 | ✅ |
| 2.3 | `utils/keyword.rs` | TF-IDF based keyword extraction | Medium | 355 | ✅ |

**Example Code - utils/text.rs:**

```rust
use lazy_static::lazy_static;
use regex::Regex;
use std::collections::HashMap;

lazy_static! {
    static ref TOKENIZE_RE: Regex = Regex::new(r"[\w']+").unwrap();
    static ref SYNONYMS: HashMap<&'static str, &'static str> = {
        let mut m = HashMap::new();
        m.insert("quick", "fast");
        m.insert("happy", "joyful");
        // ... more synonyms
        m
    };
}

pub fn tokenize(text: &str) -> Vec<String> {
    TOKENIZE_RE
        .find_iter(&text.to_lowercase())
        .map(|m| m.as_str().to_string())
        .collect()
}

pub fn normalize(text: &str) -> String {
    text.to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

pub fn expand_synonyms(tokens: &[String]) -> Vec<String> {
    let mut expanded = tokens.to_vec();
    for token in tokens {
        if let Some(&synonym) = SYNONYMS.get(token.as_str()) {
            expanded.push(synonym.to_string());
        }
    }
    expanded
}
```

---

### Phase 3: Embedding System ✅ Complete

| Order | Module | Main Tasks | Difficulty | Actual Lines | Status |
|-------|--------|------------|------------|--------------|--------|
| 3.1 | `memory/embed/mod.rs` | `EmbeddingProvider` trait definition | Medium | 199 | ✅ |
| 3.2 | `memory/embed/synthetic.rs` | Local synthetic embeddings (no external API) | High | 340 | ✅ |
| 3.3 | `memory/embed/openai.rs` | OpenAI API integration | Medium | 168 | ✅ |
| 3.4 | `memory/embed/ollama.rs` | Ollama local server integration | Medium | 171 | ✅ |
| 3.5 | `memory/embed/gemini.rs` | Google Gemini API integration (batch support) | Medium | 263 | ✅ |
| 3.6 | `memory/embed/bedrock.rs` | AWS Bedrock integration (feature flag) | High | 165 | ✅ |

**Example Code - memory/embed/mod.rs:**

```rust
use async_trait::async_trait;
use crate::core::{
    config::Config,
    error::{Error, Result},
    // Assumption: `EmbeddingKind` lives with the other shared types.
    types::{EmbeddingKind, EmbeddingResult},
};

#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, text: &str) -> Result<EmbeddingResult>;
    async fn embed_batch(&self, texts: &[&str]) -> Result<Vec<EmbeddingResult>>;
    fn dimensions(&self) -> usize;
    fn name(&self) -> &'static str;
}

pub mod synthetic;
pub mod openai;
pub mod ollama;
pub mod gemini;

#[cfg(feature = "aws")]
pub mod bedrock;

/// Build the provider selected by `config.embedding_kind`.
///
/// Returns `Error::Config` rather than panicking when Bedrock is requested
/// in a build without the `aws` feature.
pub fn create_provider(config: &Config) -> Result<Box<dyn EmbeddingProvider>> {
    let provider: Box<dyn EmbeddingProvider> = match config.embedding_kind {
        EmbeddingKind::Synthetic => Box::new(synthetic::SyntheticProvider::new(config)),
        EmbeddingKind::OpenAI => Box::new(openai::OpenAIProvider::new(config)),
        EmbeddingKind::Ollama => Box::new(ollama::OllamaProvider::new(config)),
        EmbeddingKind::Gemini => Box::new(gemini::GeminiProvider::new(config)),
        #[cfg(feature = "aws")]
        EmbeddingKind::Bedrock => Box::new(bedrock::BedrockProvider::new(config)),
        #[cfg(not(feature = "aws"))]
        EmbeddingKind::Bedrock => {
            return Err(Error::Config(
                "Bedrock embeddings require the `aws` feature".into(),
            ))
        }
    };
    Ok(provider)
}
```

**Example Code - memory/embed/synthetic.rs (core part):**

```rust
use ndarray::Array1;

use crate::core::config::Config;
use crate::utils::text::tokenize;

pub struct SyntheticProvider {
    dim: usize,
}

impl SyntheticProvider {
    pub fn new(config: &Config) -> Self {
        Self { dim: config.vec_dim }
    }

    fn extract_features(&self, text: &str) -> Array1<f32> {
        let mut features = Array1::zeros(self.dim);
        let tokens = tokenize(text);

        // 1. TF-IDF based token weights
        self.add_token_weights(&mut features, &tokens);

        // 2. N-gram features
        self.add_ngram_features(&mut features, text);

        // 3. Positional encoding
        self.add_positional_encoding(&mut features, &tokens);

        // 4. L2 normalization
        let norm = features.dot(&features).sqrt();
        if norm > 0.0 {
            features /= norm;
        }

        features
    }
}
```
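
The helper methods called above (`add_token_weights`, `add_ngram_features`, `add_positional_encoding`) are not shown in this document. As a rough, standard-library-only illustration of the general idea (hash tokens into a fixed-size vector, then L2-normalize), here is a sketch. `hashed_embedding` and `bucket` are hypothetical names, plain token counts stand in for TF-IDF weights, and n-grams and positional encoding are omitted. It is not the crate's actual algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a token to a bucket in `0..dim` (feature hashing).
/// Note: `DefaultHasher` output is not guaranteed stable across Rust
/// releases, so persisted embeddings would need a fixed hash function.
fn bucket(token: &str, dim: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    (hasher.finish() % dim as u64) as usize
}

/// Hypothetical stand-in for `extract_features`: token counts hashed into
/// `dim` buckets, then L2-normalized.
pub fn hashed_embedding(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "dim must be non-zero");
    let mut features = vec![0.0f32; dim];

    // Accumulate one count per whitespace-separated, lowercased token.
    for token in text.to_lowercase().split_whitespace() {
        features[bucket(token, dim)] += 1.0;
    }

    // L2 normalization; an all-zero vector (empty text) is left untouched.
    let norm = features.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut features {
            *x /= norm;
        }
    }
    features
}
```

Because the output is unit-length, the dot product of two such vectors equals their cosine similarity.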

---

### Phase 4: Core Memory Engine ✅ Complete

| Order | Module | Main Tasks | Difficulty | Actual Lines | Status |
|-------|--------|------------|------------|--------------|--------|
| 4.1 | `memory/hsg.rs` | HSG query engine (BFS + keyword filtering included) | Very High | 710 | ✅ |
| 4.2 | `memory/decay.rs` | Memory decay system (activity boost) | High | 246 | ✅ |
| 4.3 | `memory/reflect.rs` | Reflection/inference functionality | Medium | - | ⏳ Not implemented |

**HSG Implementation Key Points:**

1. **Sector Classification**: Regex pattern matching + tiebreaker (Semantic preferred)
2. **Cosine Similarity**: Vector similarity calculation + sector relationship penalty
3. **Waypoint Expansion (BFS)**: Graph traversal to discover related memories (same as JS expand_via_waypoints)
4. **Keyword Filtering**: TF-IDF/BM25 based filtering for Hybrid tier
5. **Score Ensemble**: similarity + overlap + waypoint + recency + tag_match + keyword_boost
6. **Feedback Learning**: EMA feedback score update for query results
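
Waypoint expansion (point 3 above) can be sketched as a breadth-first traversal over an adjacency list. This is an illustration only: the in-memory `WaypointGraph` type, the multiplicative path weight, and the `max_hops` / `min_weight` cut-offs are assumptions, not details taken from the JS SDK or the Rust implementation.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical in-memory adjacency list: src_id -> [(dst_id, edge weight)].
pub type WaypointGraph = HashMap<String, Vec<(String, f64)>>;

/// Breadth-first expansion from `seeds`, multiplying edge weights along each
/// path. Returns `(memory id, path weight)` in visit order; seeds come first
/// with weight 1.0. Each node is visited at most once.
pub fn expand_via_waypoints(
    graph: &WaypointGraph,
    seeds: &[String],
    max_hops: usize,
    min_weight: f64,
) -> Vec<(String, f64)> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, f64, usize)> = VecDeque::new();
    let mut out = Vec::new();

    for seed in seeds {
        if visited.insert(seed.clone()) {
            queue.push_back((seed.clone(), 1.0, 0));
        }
    }

    while let Some((id, weight, hops)) = queue.pop_front() {
        out.push((id.clone(), weight));
        if hops >= max_hops {
            continue; // hop budget exhausted for this path
        }
        for (dst, edge_weight) in graph.get(&id).into_iter().flatten() {
            let next_weight = weight * edge_weight;
            // Prune weak paths and skip nodes that were already reached.
            if next_weight >= min_weight && visited.insert(dst.clone()) {
                queue.push_back((dst.clone(), next_weight, hops + 1));
            }
        }
    }
    out
}
```

In the real engine the adjacency data would come from the `waypoints` table rather than a `HashMap`, and the resulting path weight would feed the `waypoint_weight` term of the ensemble score.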

**Example Code - memory/hsg.rs:**

```rust
use crate::core::{db::Database, types::*, error::Result};
use crate::memory::embed::EmbeddingProvider;

pub struct HsgEngine {
    db: Database,
    embedder: Box<dyn EmbeddingProvider>,
}

impl HsgEngine {
    pub async fn query(
        &self,
        query: &str,
        options: QueryOptions,
    ) -> Result<Vec<HsgQueryResult>> {
        // 1. Generate query embedding
        let query_embedding = self.embedder.embed(query).await?;

        // 2. Vector similarity search (top-k * 3 candidates)
        let candidates = self.vector_search(&query_embedding, options.k * 3)?;

        // 3. Waypoint graph expansion
        let expanded = self.expand_via_waypoints(&candidates)?;

        // 4. Ensemble score calculation
        let mut scored = self.compute_ensemble_scores(&expanded, query)?;

        // 5. Sort descending by score and return top K.
        //    `total_cmp` gives a total order, so a NaN score cannot panic the sort.
        scored.sort_by(|a, b| b.score.total_cmp(&a.score));
        Ok(scored.into_iter().take(options.k).collect())
    }

    fn compute_ensemble_scores(
        &self,
        candidates: &[MemRow],
        query: &str,
    ) -> Result<Vec<HsgQueryResult>> {
        // Score formula (matches the "Score Calculation Formula" below):
        // final = sigmoid(
        //   0.40 * boosted_similarity +
        //   0.20 * token_overlap +
        //   0.15 * waypoint_weight +
        //   0.15 * recency +
        //   0.10 * tag_match +
        //   keyword_boost
        // )
        todo!()
    }

    fn classify_sector(&self, content: &str) -> (Sector, Vec<Sector>) {
        // Sector classification via regex pattern matching
        todo!()
    }
}
```
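
Feedback learning (key point 6) relies on an exponential moving average (EMA). A minimal sketch of a single EMA step follows. The function name and the `alpha` smoothing parameter are assumptions, since this document does not state the value the SDK uses.

```rust
/// One exponential-moving-average step:
/// `new = (1 - alpha) * current + alpha * signal`.
///
/// Hypothetical helper; `alpha` is clamped to [0, 1] so the result always
/// lies between `current` and `signal`.
pub fn update_feedback_score(current: f64, signal: f64, alpha: f64) -> f64 {
    let alpha = alpha.clamp(0.0, 1.0);
    (1.0 - alpha) * current + alpha * signal
}
```

A small `alpha` makes the stored `feedback_score` change slowly, so a single query result has limited influence.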

**Score Calculation Formula:**

```
final_score = sigmoid(
    0.40 * boosted_similarity +
    0.20 * token_overlap +
    0.15 * waypoint_weight +
    0.15 * recency_score +
    0.10 * tag_match_score +
    keyword_boost
)
```
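
The formula above translates directly into code. The sketch below uses the weights exactly as listed; the `ScoreParts` struct and the function names are assumptions introduced for illustration, and how each component is computed or normalized is outside its scope.

```rust
/// Hypothetical container for the six inputs of the ensemble formula.
#[derive(Debug, Clone, Copy, Default)]
pub struct ScoreParts {
    pub boosted_similarity: f64,
    pub token_overlap: f64,
    pub waypoint_weight: f64,
    pub recency_score: f64,
    pub tag_match_score: f64,
    pub keyword_boost: f64,
}

/// Logistic function, mapping any real number into (0, 1).
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Weighted sum of the components, squashed through the sigmoid.
pub fn final_score(p: &ScoreParts) -> f64 {
    sigmoid(
        0.40 * p.boosted_similarity
            + 0.20 * p.token_overlap
            + 0.15 * p.waypoint_weight
            + 0.15 * p.recency_score
            + 0.10 * p.tag_match_score
            + p.keyword_boost,
    )
}
```

Since the five weighted terms sum to at most 1.0 when each input is in [0, 1], a candidate with no matching signal scores `sigmoid(0) = 0.5`, and `keyword_boost` is the only term that enters unweighted.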

**Decay Formula (decay.rs):**

```
decayed = initial_salience × exp(-lambda × days_since)
reinforcement = alpha × (1 - exp(-lambda × days_since))
final_salience = max(0, min(1, decayed + reinforcement))
```
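
A minimal sketch of the three lines above as a single function. The function and parameter names are assumptions; `alpha` is the reinforcement coefficient from the formula, whose value this document does not specify.

```rust
/// Decay formula from above:
///   decayed        = initial_salience * exp(-lambda * days_since)
///   reinforcement  = alpha * (1 - exp(-lambda * days_since))
///   final_salience = clamp(decayed + reinforcement, 0, 1)
pub fn decayed_salience(
    initial_salience: f64,
    lambda: f64,
    alpha: f64,
    days_since: f64,
) -> f64 {
    let decay = (-lambda * days_since).exp();
    let decayed = initial_salience * decay;
    let reinforcement = alpha * (1.0 - decay);
    (decayed + reinforcement).clamp(0.0, 1.0)
}
```

For example, with the Episodic default `lambda = 0.015`, no reinforcement (`alpha = 0`), and 30 days elapsed, a salience of 0.8 decays to about 0.51.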

---

### Phase 5: Operations ⏳ Optional (Not Implemented)

> **Note**: Only partially implemented in JS SDK. Requires external dependencies (PDF/DOCX parsers) and LLM integration.

| Order | Module | Main Tasks | Difficulty | Est. Lines | Status |
|-------|--------|------------|------------|------------|--------|
| 5.1 | `ops/compress.rs` | Memory compression (vector dimension reduction) | Medium | ~150 | ⏳ |
| 5.2 | `ops/extract.rs` | Information extraction | Low | ~100 | ⏳ |
| 5.3 | `ops/ingest.rs` | Data ingestion (PDF, DOCX, etc.) | High | ~250 | ⏳ |

---

### Phase 6: Temporal Graph ⏳ Optional (Not Implemented)

> **Note**: Independent from core memory functionality. Type definitions included in `core/types.rs`.

| Order | Module | Main Tasks | Difficulty | Est. Lines | Status |
|-------|--------|------------|------------|------------|--------|
| 6.1 | `temporal_graph/types.rs` | `TemporalFact`, `TemporalEdge` | Low | ~60 | ⏳ |
| 6.2 | `temporal_graph/store.rs` | CRUD operations | Medium | ~200 | ⏳ |
| 6.3 | `temporal_graph/query.rs` | Time-based queries | Medium | ~180 | ⏳ |

**Example Code - temporal_graph/types.rs:**

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TemporalFact {
    pub id: String,
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub valid_from: DateTime<Utc>,
    pub valid_to: Option<DateTime<Utc>>,
    pub confidence: f64,
    pub last_updated: DateTime<Utc>,
    pub metadata: Option<serde_json::Value>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TemporalEdge {
    pub id: String,
    pub source_id: String,
    pub target_id: String,
    pub relation_type: String,
    pub valid_from: DateTime<Utc>,
    pub valid_to: Option<DateTime<Utc>>,
    pub weight: f64,
    pub metadata: Option<serde_json::Value>,
}
```
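
Time-based queries (6.3) are not implemented yet. As a sketch of the core check they would need, here is a point-in-time validity test. It uses plain `i64` timestamps, as in the SQL schema, instead of `DateTime<Utc>` so that it needs no external crates. The `FactInterval` type and the half-open interval convention `[valid_from, valid_to)` are assumptions.

```rust
/// Hypothetical, trimmed-down view of a temporal fact: only the fields
/// needed to decide validity at a point in time.
#[derive(Debug, Clone, PartialEq)]
pub struct FactInterval {
    pub id: String,
    pub valid_from: i64,
    /// `None` means the fact is still valid (open-ended).
    pub valid_to: Option<i64>,
}

impl FactInterval {
    /// True if `at` falls in the half-open interval [valid_from, valid_to).
    pub fn is_valid_at(&self, at: i64) -> bool {
        self.valid_from <= at && self.valid_to.map_or(true, |end| at < end)
    }
}

/// Return references to all facts valid at timestamp `at`.
pub fn facts_valid_at(facts: &[FactInterval], at: i64) -> Vec<&FactInterval> {
    facts.iter().filter(|f| f.is_valid_at(at)).collect()
}
```

The same predicate maps onto SQL as `valid_from <= ?1 AND (valid_to IS NULL OR ?1 < valid_to)`.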

---

### Phase 7: Public API and Finalization ✅ Complete

| Order | Module | Main Tasks | Difficulty | Actual Lines | Status |
|-------|--------|------------|------------|--------------|--------|
| 7.1 | `lib.rs` | `OpenMemory` struct (main API) | Medium | 391 | ✅ |
| 7.2 | Tests | Unit tests (84 passing) | Medium | Built-in | ✅ |
| 7.3 | Documentation | rustdoc comments | Low | Included | ✅ |
| 7.4 | Benchmarks | criterion benchmarks | Low | 65 | ✅ |

---

## 4. Public API Design

```rust
// lib.rs

/// OpenMemory - Cognitive memory system for AI applications
pub struct OpenMemory {
    config: Config,
    db: Database,
    embedder: Box<dyn EmbeddingProvider>,
    hsg: HsgEngine,
}

impl OpenMemory {
    /// Create a new instance
    ///
    /// # Example
    /// ```rust
    /// let om = OpenMemory::new(OpenMemoryOptions {
    ///     path: "./data/memory.db".into(),
    ///     tier: Tier::Smart,
    ///     embeddings: EmbeddingKind::Synthetic,
    ///     ..Default::default()
    /// }).await?;
    /// ```
    pub async fn new(options: OpenMemoryOptions) -> Result<Self>;

    /// Add a memory
    ///
    /// # Arguments
    /// * `content` - Text content to store
    /// * `options` - Tags, metadata, salience, etc.
    ///
    /// # Returns
    /// Created memory ID and classified sector information
    pub async fn add(&self, content: &str, options: AddOptions) -> Result<AddResult>;

    /// Query memories
    ///
    /// Search for related memories using the HSG (Hybrid Similarity Graph) algorithm
    pub async fn query(&self, query: &str, options: QueryOptions) -> Result<Vec<HsgQueryResult>>;

    /// Delete a memory
    pub async fn delete(&self, id: &str) -> Result<()>;

    /// Get all memories
    pub async fn get_all(&self, options: GetAllOptions) -> Result<Vec<MemRow>>;

    /// Run memory decay (batch)
    ///
    /// Process salience decay over time
    pub async fn run_decay(&self) -> Result<DecayStats>;

    /// Reinforce a specific memory
    pub async fn reinforce(&self, id: &str, boost: f64) -> Result<()>;
}

// Option structs
#[derive(Default, Clone)]
pub struct OpenMemoryOptions {
    pub path: String,
    pub tier: Tier,
    pub embeddings: EmbeddingKind,
    pub vec_dim: Option<usize>,
    pub user_id: Option<String>,
}

#[derive(Default, Clone)]
pub struct AddOptions {
    pub tags: Option<Vec<String>>,
    pub metadata: Option<serde_json::Value>,
    pub user_id: Option<String>,
    pub salience: Option<f64>,
    pub decay_lambda: Option<f64>,
}

#[derive(Clone)]
pub struct QueryOptions {
    pub k: usize,                       // Default: 10
    pub sectors: Option<Vec<Sector>>,
    pub min_salience: Option<f64>,
    pub user_id: Option<String>,
}

impl Default for QueryOptions {
    fn default() -> Self {
        Self {
            k: 10,
            sectors: None,
            min_salience: None,
            user_id: None,
        }
    }
}

#[derive(Debug, Clone)]
pub struct AddResult {
    pub id: String,
    pub primary_sector: Sector,
    pub sectors: Vec<Sector>,
}

#[derive(Debug, Clone)]
pub struct DecayStats {
    pub processed: usize,
    pub decayed: usize,
    pub compressed: usize,
    pub duration_ms: u64,
}
```

---

## 5. Database Schema

```sql
-- Memory table
CREATE TABLE IF NOT EXISTS memories (
    id TEXT PRIMARY KEY,
    user_id TEXT,
    segment INTEGER DEFAULT 0,
    content TEXT NOT NULL,
    simhash TEXT,
    primary_sector TEXT NOT NULL,
    tags TEXT,                    -- JSON array
    meta TEXT,                    -- JSON object
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL,
    last_seen_at INTEGER NOT NULL,
    salience REAL NOT NULL,
    decay_lambda REAL NOT NULL,
    version INTEGER DEFAULT 1,
    mean_dim INTEGER,
    mean_vec BLOB,
    compressed_vec BLOB,
    feedback_score REAL DEFAULT 0
);

-- Vector store
CREATE TABLE IF NOT EXISTS vectors (
    id TEXT NOT NULL,
    sector TEXT NOT NULL,
    user_id TEXT,
    v BLOB NOT NULL,              -- f32 array as bytes
    dim INTEGER NOT NULL,
    PRIMARY KEY(id, sector)
);

-- Waypoint graph
CREATE TABLE IF NOT EXISTS waypoints (
    src_id TEXT,
    dst_id TEXT NOT NULL,
    user_id TEXT,
    weight REAL NOT NULL,
    created_at INTEGER,
    updated_at INTEGER,
    PRIMARY KEY(src_id, user_id)
);

-- Temporal facts
CREATE TABLE IF NOT EXISTS temporal_facts (
    id TEXT PRIMARY KEY,
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    valid_from INTEGER NOT NULL,
    valid_to INTEGER,
    confidence REAL NOT NULL CHECK(confidence >= 0 AND confidence <= 1),
    last_updated INTEGER NOT NULL,
    metadata TEXT,
    UNIQUE(subject, predicate, object, valid_from)
);

-- Temporal edges
CREATE TABLE IF NOT EXISTS temporal_edges (
    id TEXT PRIMARY KEY,
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    relation_type TEXT NOT NULL,
    valid_from INTEGER NOT NULL,
    valid_to INTEGER,
    weight REAL NOT NULL,
    metadata TEXT,
    FOREIGN KEY(source_id) REFERENCES temporal_facts(id),
    FOREIGN KEY(target_id) REFERENCES temporal_facts(id)
);

-- Indexes
CREATE INDEX IF NOT EXISTS idx_memories_user ON memories(user_id);
CREATE INDEX IF NOT EXISTS idx_memories_sector ON memories(primary_sector);
CREATE INDEX IF NOT EXISTS idx_memories_salience ON memories(salience);
CREATE INDEX IF NOT EXISTS idx_vectors_user ON vectors(user_id);
CREATE INDEX IF NOT EXISTS idx_temporal_facts_subject ON temporal_facts(subject);
CREATE INDEX IF NOT EXISTS idx_temporal_facts_validity ON temporal_facts(valid_from, valid_to);
```

---

## 6. Key Challenges and Solutions

| Challenge | JS Implementation | Rust Solution |
|-----------|-------------------|---------------|
| **Vector Storage (BLOB)** | `Float32Array.buffer` | `bytemuck::cast_slice` or direct byte conversion |
| **JSON Columns** | `JSON.parse/stringify` | `serde_json` + rusqlite `FromSql/ToSql` impl |
| **Async DB** | Callback-based | `rusqlite` (sync) + `tokio::task::spawn_blocking` |
| **Rate Limiting** | Manual sleep | `tower::limit::RateLimit` or manual exponential backoff |
| **Regex** | JS RegExp | `regex` crate (nearly identical syntax) |
| **Synonym Dictionary** | Hardcoded Map | `lazy_static!` + `HashMap` |
| **File Parsing** | mammoth, pdf-parse | `lopdf`, `docx-rs` (quality review needed) |

**Vector Conversion Example:**

```rust
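use bytemuck::cast_slice;

/// Serialize an f32 vector to bytes (native endianness) for BLOB storage.
fn vec_to_blob(v: &[f32]) -> Vec<u8> {
    cast_slice(v).to_vec()
}

/// Deserialize a BLOB into f32 values.
/// `cast_slice::<u8, f32>` panics on unaligned input or a length that is not a
/// multiple of 4, and SQLite buffers carry no alignment guarantee, so copy
/// 4 bytes at a time instead. Trailing bytes (corrupt BLOB) are ignored.
fn blob_to_vec(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|b| f32::from_ne_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}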
```
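
For the rate-limiting row in the table above, the "manual exponential backoff" option could look roughly like this. It is a synchronous, standard-library-only sketch with hypothetical names; async provider code would swap `std::thread::sleep` for `tokio::time::sleep`, and jitter is left out for brevity.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// capped at `max`. Overflow saturates to `max` instead of panicking.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Call `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping with exponential backoff between attempts.
/// `op` is always called at least once.
pub fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base and a 5 s cap, the delays run 100 ms, 200 ms, 400 ms, 800 ms, and so on until they reach the cap.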

---

## 7. Performance Optimization Points

1. **SIMD Vector Operations**
   - `ndarray` + BLAS backend (`openblas` or `intel-mkl`)
   - Or use `simsimd` crate

2. **Parallel Processing**
   - `rayon` for decay batch processing
   - Parallelized embedding batch processing

3. **Connection Pooling**
   - `r2d2` (sync) or `sqlx` pool (async)

4. **LRU Cache**
   - `lru` crate for embedding/query result caching
   - Configurable cache size

5. **Zero-Copy**
   - `Cow<str>` to minimize string copying
   - Streaming for large text processing
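
To illustrate the `Cow<str>` point (item 5), here is a sketch of a lowercasing helper that allocates only when the input actually changes. The function name is hypothetical, and whether the SDK's `normalize` path would benefit depends on how often inputs are already lowercase.

```rust
use std::borrow::Cow;

/// Lowercase `text`, borrowing the input when it is already lowercase.
/// Allocation happens only if at least one character would change.
pub fn lowercase_cow(text: &str) -> Cow<'_, str> {
    // A char "changes" if its lowercase mapping is not just the char itself.
    let needs_change = text
        .chars()
        .any(|c| !c.to_lowercase().eq(std::iter::once(c)));

    if needs_change {
        Cow::Owned(text.to_lowercase())
    } else {
        Cow::Borrowed(text)
    }
}
```

Callers that only read the result can treat both variants as `&str` through deref, so the borrowed fast path costs one scan and no allocation.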

---

## 8. Implementation Order Summary

```
Phase 1 (Foundation) ✅ → Phase 2 (Utils) ✅  → Phase 3 (Embedding) ✅
    ↓                        ↓                     ↓
  error.rs ✅              text.rs ✅           mod.rs (trait) ✅
  types.rs ✅              chunking.rs ✅       synthetic.rs ✅
  config.rs ✅             keyword.rs ✅        openai.rs ✅
  db.rs ✅                                      ollama.rs ✅
                                                gemini.rs ✅
                                                bedrock.rs ✅

        ↓                       ↓                     ↓

Phase 4 (Core) ✅       → Phase 5 (Ops) ⏳    → Phase 6 (Temporal Graph) ⏳
    ↓                        ↓                     ↓
  hsg.rs ✅               compress.rs ⏳        types.rs ⏳
  decay.rs ✅             extract.rs ⏳         store.rs ⏳
  reflect.rs ⏳           ingest.rs ⏳          query.rs ⏳

        ↓

Phase 7 (Finalization) ✅
    ↓
  lib.rs (API) ✅
  tests/ (84 + 5 doc-tests passing) ✅
  benches/ ✅
```

**Current Status**: Core functionality complete (84 unit tests + 5 doc-tests passing)

---

## 9. Optional Features (Future Extensions)

| Feature | Description | Dependencies |
|---------|-------------|--------------|
| **FFI Bindings** | C/Python bindings | `pyo3`, `cbindgen` |
| **WASM Support** | Run in browser | `wasm-bindgen`, `wasm-pack` |
| **CLI Tool** | Command-line interface | `clap` |
| **REST Server** | HTTP API server | `axum` or `actix-web` |
| **gRPC Server** | gRPC protocol support | `tonic` |

---

## 10. Testing Strategy

### Unit Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_sector_classification() {
        let hsg = HsgEngine::new(/* ... */);

        let (primary, _) = hsg.classify_sector("I went to the store yesterday");
        assert_eq!(primary, Sector::Episodic);

        let (primary, _) = hsg.classify_sector("The capital of France is Paris");
        assert_eq!(primary, Sector::Semantic);
    }

    #[test]
    fn test_cosine_similarity() {
        let a = vec![1.0, 0.0, 0.0];
        let b = vec![1.0, 0.0, 0.0];
        assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6);
    }
}
```
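
The `cosine_similarity` function exercised by the test above is not shown in this document. A minimal version that would satisfy that test might look like the following. Returning `0.0` for empty, mismatched-length, or zero-norm inputs is an assumption of this sketch, not documented SDK behavior.

```rust
/// Cosine similarity of two vectors: dot(a, b) / (|a| * |b|).
///
/// Assumption: returns 0.0 when the inputs are empty, differ in length,
/// or either has zero norm, instead of panicking or returning NaN.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }

    // Single pass accumulating the dot product and both squared norms.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }

    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot / denom
    }
}
```

Additional cases worth covering alongside the identical-vector test are orthogonal vectors (expected 0) and opposite vectors (expected -1).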

### Integration Tests

```rust
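// tests/integration/memory_test.rs
// Note: Cargo only auto-discovers `tests/*.rs` and `tests/*/main.rs`, so this
// file must be declared as a module in `tests/integration/main.rs` or
// registered with a `[[test]]` entry in Cargo.toml.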
#[tokio::test]
async fn test_add_and_query() {
    let om = OpenMemory::new(OpenMemoryOptions {
        path: ":memory:".into(),
        tier: Tier::Fast,
        embeddings: EmbeddingKind::Synthetic,
        ..Default::default()
    }).await.unwrap();

    let result = om.add("Test memory content", AddOptions::default()).await.unwrap();
    assert!(!result.id.is_empty());

    let results = om.query("Test", QueryOptions::default()).await.unwrap();
    assert!(!results.is_empty());
    assert!(results[0].content.contains("Test"));
}
```

---

## 11. Benchmarks

```rust
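// benches/performance.rs
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;
// Also needed: `use openmemory::...` imports for `Config`, `SyntheticProvider`,
// and the `EmbeddingProvider` trait (paths depend on what lib.rs re-exports).

fn embedding_benchmark(c: &mut Criterion) {
    // Assumption: `Config` implements `Default`. `SyntheticProvider::new`
    // takes `&Config`, as in the Phase 3 example.
    let config = Config::default();
    let provider = SyntheticProvider::new(&config);
    // `embed` is async: without a runtime the future is created but never polled.
    let rt = tokio::runtime::Runtime::new().unwrap();

    c.bench_function("synthetic_embed", |b| {
        b.iter(|| {
            rt.block_on(provider.embed(black_box("This is a test sentence for embedding")))
        })
    });
}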

fn query_benchmark(c: &mut Criterion) {
    // Measure query performance with 1000 memories
    c.bench_function("hsg_query_1k", |b| {
        b.iter(|| {
            // ...
        })
    });
}

criterion_group!(benches, embedding_benchmark, query_benchmark);
criterion_main!(benches);
```

---

## 12. Code Size (Actual Implementation)

| Module | Estimated Lines | Actual Lines | Status |
|--------|-----------------|--------------|--------|
| core/ | ~700 | 1,350 | ✅ |
| memory/embed/ | ~1,000 | 1,306 | ✅ |
| memory/ (hsg, decay) | ~1,150 | 956 | ✅ |
| ops/ | ~500 | - | ⏳ |
| temporal_graph/ | ~440 | - | ⏳ |
| utils/ | ~350 | 1,039 | ✅ |
| lib.rs | ~300 | 391 | ✅ |
| benches/ | ~100 | 65 | ✅ |
| **Total Implemented** | - | **~5,107** | - |

> Notes:
> - Most line counts exceed the estimates because each module includes its unit tests.
> - The `memory/embed/` count includes the Gemini (263) and Bedrock (165) providers.
> - HSG includes waypoint expansion (BFS) and keyword filtering.

---

## 13. References

- [JS SDK Source Code](../src/)
- [rusqlite Documentation](https://docs.rs/rusqlite)
- [ndarray Documentation](https://docs.rs/ndarray)
- [tokio Async Guide](https://tokio.rs/tokio/tutorial)
- [serde Serialization Guide](https://serde.rs/)