# Engram Embedding Protocol Specification v2
**Status:** Canonical
**Version:** 2.0
**Last Updated:** 2026-04-02
---
## 1. Overview
This document defines the **Engram Embedding Protocol v2**, the authoritative specification for storing and managing vector embeddings across all Engram implementations (Rust, Python, and future languages).
All implementations **MUST** conform to this specification to ensure compatibility and data portability.
### 1.1 Key Changes from v1
- **Multi-model support:** Primary key changed from `(memory_id)` to `(memory_id, model)`
- **Binary-only storage:** BLOB format required; TEXT/JSON storage is prohibited
- **Strict validation:** Finite value enforcement and dimension checking
- **Model namespacing:** Structured provider/model naming convention
---
## 2. Database Schema
### 2.1 Table Definition
```sql
CREATE TABLE IF NOT EXISTS memory_embeddings (
memory_id TEXT NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
model TEXT NOT NULL,
embedding BLOB NOT NULL,
dimensions INTEGER NOT NULL,
created_at TEXT NOT NULL,
PRIMARY KEY (memory_id, model)
);
CREATE INDEX IF NOT EXISTS idx_embeddings_model ON memory_embeddings(model);
```
### 2.2 Column Specifications
| `memory_id` | TEXT | NOT NULL, FK to memories(id) | UUID or unique identifier of the memory |
| `model` | TEXT | NOT NULL | Namespaced model identifier (see §3) |
| `embedding` | BLOB | NOT NULL | Raw binary f32 vector (see §4) |
| `dimensions` | INTEGER | NOT NULL | Vector dimension count (must match blob size) |
| `created_at` | TEXT | NOT NULL | ISO-8601 UTC timestamp |
### 2.3 Constraints
- **Primary Key:** Composite `(memory_id, model)` enables multiple embeddings per memory
- **Foreign Key:** `memory_id` references `memories(id)` with `ON DELETE CASCADE`
- **Index:** `idx_embeddings_model` optimizes model-scoped queries
---
## 3. Model Naming Convention
### 3.1 Format
Model identifiers **MUST** follow the pattern:
```
{provider}/{model_name}
```
### 3.2 Rules
1. **Immutability:** Once written, model strings are immutable (never update in place)
2. **Case-sensitivity:** Model names are case-sensitive
3. **No whitespace:** Use hyphens or underscores for multi-word names
4. **Provider namespace:** Prevents collisions across different embedding services
### 3.3 Canonical Models
| `ollama/nomic-embed-text` | 768 | Ollama (local) |
| `ollama/mxbai-embed-large` | 1024 | Ollama (local) |
| `openai/text-embedding-3-small` | 1536 | OpenAI |
| `openai/text-embedding-3-large` | 3072 | OpenAI |
| `openai/text-embedding-ada-002` | 1536 | OpenAI (legacy) |
| `local/minilm-l6-v2` | 384 | Local transformer |
| `cohere/embed-english-v3.0` | 1024 | Cohere |
| `voyage/voyage-2` | 1024 | Voyage AI |
### 3.4 Unknown/Legacy Models
Use `unknown/legacy` for migrated embeddings with unspecified models.
---
## 4. BLOB Format Specification
### 4.1 Binary Layout
Embeddings are stored as **raw little-endian IEEE 754 binary32 (f32) arrays**:
- **No header, no wrapper, no JSON**
- Byte length: `dimensions × 4`
- Endianness: Little-endian on all platforms
### 4.2 Serialization
**Rust:**
```rust
let bytes: Vec<u8> = embedding.iter().flat_map(|f| f.to_le_bytes()).collect();
```
**Python:**
```python
import struct
blob = struct.pack(f'<{len(embedding)}f', *embedding)
```
### 4.3 Deserialization
**Rust:**
```rust
fn bytes_to_f32_vec(bytes: &[u8]) -> Vec<f32> {
bytes.chunks_exact(4)
.map(|c| f32::from_le_bytes(c.try_into().unwrap()))
.collect()
}
```
**Python:**
```python
import struct
n = len(blob) // 4
embedding = list(struct.unpack(f'<{n}f', blob))
```
### 4.4 Validation Rules
Implementations **MUST** validate blobs before writing and after reading:
1. **Alignment:** `len(blob) % 4 == 0`
2. **Dimension match:** `len(blob) / 4 == dimensions`
3. **Finite values:** All f32 values must be finite (no NaN, no Inf)
**Validation pseudocode:**
```
function validate_blob(blob, expected_dimensions):
assert len(blob) % 4 == 0, "Blob length must be multiple of 4"
actual_dimensions = len(blob) / 4
assert actual_dimensions == expected_dimensions, "Dimension mismatch"
values = deserialize(blob)
for value in values:
assert is_finite(value), "Non-finite value detected"
```
---
## 5. Write Protocol
### 5.1 Insert/Update Query
```sql
INSERT OR REPLACE INTO memory_embeddings
(memory_id, model, embedding, dimensions, created_at)
VALUES (?, ?, ?, ?, ?);
```
### 5.2 Parameter Binding
1. `memory_id`: String (UUID or identifier from `memories.id`)
2. `model`: String (following §3 naming convention)
3. `embedding`: BLOB (validated per §4.4)
4. `dimensions`: Integer (must equal `len(embedding) / 4`)
5. `created_at`: String (ISO-8601 UTC, e.g., `2026-04-02T05:26:34.123Z`)
### 5.3 Semantics
- `INSERT OR REPLACE` is correct: re-embedding the same (memory_id, model) pair updates the row
- Each write **MUST** validate the blob before insertion
- The `created_at` timestamp reflects the embedding generation time, not the memory creation time
### 5.4 Example
```rust
// Rust example
let timestamp = Utc::now().to_rfc3339();
let blob = serialize_embedding(&vector); // Per §4.2
db.execute(
"INSERT OR REPLACE INTO memory_embeddings VALUES (?, ?, ?, ?, ?)",
params![memory_id, "ollama/nomic-embed-text", blob, 768, timestamp]
)?;
```
---
## 6. Read Protocol
### 6.1 Single Memory Query
```sql
SELECT embedding, dimensions
FROM memory_embeddings
WHERE memory_id = ? AND model = ?;
```
Returns the embedding for a specific memory using a specific model.
### 6.2 All Embeddings for Model
```sql
SELECT memory_id, embedding
FROM memory_embeddings
WHERE model = ?;
```
Returns all embeddings generated with a specific model (used for recall).
### 6.3 Post-Read Validation
After reading, implementations **MUST** validate:
```
assert len(blob) == dimensions * 4
```
### 6.4 Cross-Model Comparison
**UNDEFINED BEHAVIOR:** Never compute cosine similarity between vectors from different models.
Implementations **MUST** filter by model before similarity computations.
---
## 7. Recall Protocol
### 7.1 Query Embedding
1. Embed the query text using the **currently configured model**
2. Retrieve all embeddings for that model only
### 7.2 Similarity Computation
```sql
SELECT memory_id, embedding
FROM memory_embeddings
WHERE model = ?; -- Current model only
```
3. Deserialize each embedding
4. Compute cosine similarity against the query vector
5. Rank by similarity score
### 7.3 Coverage Check
Before recall, implementations **SHOULD** check embedding coverage:
```sql
SELECT
(SELECT COUNT(*) FROM memory_embeddings WHERE model = ?) AS embedded_count,
(SELECT COUNT(*) FROM memories) AS total_count;
```
**Warning condition:** If `embedded_count / total_count < 0.5`, log:
```
WARNING: Only {percentage}% of memories have embeddings for model {model}.
Consider running backfill to improve recall quality.
```
### 7.4 Empty Result Handling
If no embeddings exist for the current model, return empty results (do not fall back to other models).
---
## 8. Migration Protocol (v1 → v2)
### 8.1 Detection
Detect v1 schema by inspecting table structure:
- Primary key is only `(memory_id)`
- May have `embedding` column of type TEXT
### 8.2 Migration Steps
```sql
-- Step 1: Create new table
CREATE TABLE memory_embeddings_v2 (
memory_id TEXT NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
model TEXT NOT NULL,
embedding BLOB NOT NULL,
dimensions INTEGER NOT NULL,
created_at TEXT NOT NULL,
PRIMARY KEY (memory_id, model)
);
-- Step 2: Migrate BLOB rows (if model column exists)
INSERT INTO memory_embeddings_v2
(memory_id, model, embedding, dimensions, created_at)
SELECT memory_id, model, embedding, dimensions, created_at
FROM memory_embeddings
WHERE typeof(embedding) = 'blob';
-- Step 3: Handle TEXT/JSON rows
-- (Implementation-specific: parse JSON, convert to BLOB)
-- Step 4: Handle missing metadata
-- For rows without model: use 'unknown/legacy'
-- For rows without dimensions: infer from blob size
-- Step 5: Replace old table
DROP TABLE memory_embeddings;
ALTER TABLE memory_embeddings_v2 RENAME TO memory_embeddings;
-- Step 6: Create index
CREATE INDEX idx_embeddings_model ON memory_embeddings(model);
```
### 8.3 TEXT to BLOB Conversion
**Python example:**
```python
import json
import struct
# Read TEXT embedding
text_embedding = row['embedding']
vector = json.loads(text_embedding) # Parse JSON array
# Convert to BLOB
blob = struct.pack(f'<{len(vector)}f', *vector)
dimensions = len(vector)
# Insert into v2 table
cursor.execute(
"INSERT INTO memory_embeddings_v2 VALUES (?, ?, ?, ?, ?)",
(memory_id, model or 'unknown/legacy', blob, dimensions, created_at)
)
```
### 8.4 Version Tracking
After migration, update protocol version:
```sql
CREATE TABLE IF NOT EXISTS engram_meta (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
INSERT OR REPLACE INTO engram_meta (key, value)
VALUES ('embedding_protocol_version', '2');
```
---
## 9. Model Switching Protocol
### 9.1 Principles
- **New memories:** Use currently configured model
- **Old memories:** Retain original model embeddings (immutable)
- **Multi-model storage:** Each memory can have embeddings from multiple models
### 9.2 Configuration Change
When the active model changes from `model_A` to `model_B`:
1. **Immediate effect:** New memories are embedded with `model_B`
2. **Existing embeddings:** Remain unchanged (both `model_A` and `model_B` rows can coexist)
3. **Recall:** Uses only `model_B` embeddings
### 9.3 Backfill Process
To improve recall coverage after model switch:
```python
# Query memories without embeddings for new model
missing_memories = db.execute("""
SELECT id, content
FROM memories
WHERE id NOT IN (
SELECT memory_id
FROM memory_embeddings
WHERE model = ?
)
""", [new_model]).fetchall()
# Embed and insert
for memory in missing_memories:
vector = embed(memory.content, new_model)
blob = serialize(vector)
db.execute(
"INSERT INTO memory_embeddings VALUES (?, ?, ?, ?, ?)",
[memory.id, new_model, blob, len(vector), now()]
)
```
### 9.4 Cleanup (Optional)
After full re-embedding, old model embeddings can be removed:
```sql
-- WARNING: This is irreversible
DELETE FROM memory_embeddings WHERE model = ?; -- Old model
```
**Recommendation:** Retain old embeddings for audit/rollback purposes unless storage is constrained.
---
## 10. Reference Implementations
### 10.1 Rust
**Serialization:**
```rust
use std::convert::TryInto;
fn serialize_embedding(embedding: &[f32]) -> Vec<u8> {
embedding.iter()
.flat_map(|f| f.to_le_bytes())
.collect()
}
```
**Deserialization:**
```rust
fn deserialize_embedding(bytes: &[u8]) -> Result<Vec<f32>, String> {
if bytes.len() % 4 != 0 {
return Err("Invalid blob length".to_string());
}
let vec: Vec<f32> = bytes
.chunks_exact(4)
.map(|chunk| f32::from_le_bytes(chunk.try_into().unwrap()))
.collect();
// Validate finite
if vec.iter().any(|f| !f.is_finite()) {
return Err("Non-finite value in embedding".to_string());
}
Ok(vec)
}
```
**Write:**
```rust
use chrono::Utc;
use rusqlite::params;
fn store_embedding(
conn: &Connection,
memory_id: &str,
model: &str,
embedding: &[f32],
) -> Result<()> {
let blob = serialize_embedding(embedding);
let dimensions = embedding.len() as i64;
let created_at = Utc::now().to_rfc3339();
conn.execute(
"INSERT OR REPLACE INTO memory_embeddings VALUES (?, ?, ?, ?, ?)",
params![memory_id, model, blob, dimensions, created_at],
)?;
Ok(())
}
```
### 10.2 Python
**Serialization:**
```python
import struct
def serialize_embedding(embedding: list[float]) -> bytes:
"""Convert float list to little-endian f32 BLOB."""
return struct.pack(f'<{len(embedding)}f', *embedding)
```
**Deserialization:**
```python
import struct
import math
def deserialize_embedding(blob: bytes) -> list[float]:
"""Convert BLOB to float list with validation."""
if len(blob) % 4 != 0:
raise ValueError("Invalid blob length")
n = len(blob) // 4
embedding = list(struct.unpack(f'<{n}f', blob))
# Validate finite
if not all(math.isfinite(v) for v in embedding):
raise ValueError("Non-finite value in embedding")
return embedding
```
**Write:**
```python
from datetime import datetime, timezone
def store_embedding(
conn,
memory_id: str,
model: str,
embedding: list[float]
) -> None:
blob = serialize_embedding(embedding)
dimensions = len(embedding)
created_at = datetime.now(timezone.utc).isoformat()
conn.execute(
"INSERT OR REPLACE INTO memory_embeddings VALUES (?, ?, ?, ?, ?)",
(memory_id, model, blob, dimensions, created_at)
)
conn.commit()
```
---
## 11. Versioning and Compatibility
### 11.1 Current Version
**Protocol Version:** 2
**Metadata Storage:**
```sql
CREATE TABLE IF NOT EXISTS engram_meta (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
INSERT OR REPLACE INTO engram_meta (key, value)
VALUES ('embedding_protocol_version', '2');
```
### 11.2 Version Detection
Implementations **MUST** check protocol version on initialization:
```sql
SELECT value FROM engram_meta WHERE key = 'embedding_protocol_version';
```
- If missing or `< 2`: Trigger migration (see §8)
- If `> 2`: Log warning about newer protocol version
### 11.3 Future Changes
Future protocol updates will:
1. Increment version number
2. Document breaking changes
3. Provide automated migration paths
4. Maintain backward-read compatibility where possible
---
## 12. Error Handling
### 12.1 Validation Failures
| `BLOB_LENGTH_INVALID` | `len(blob) % 4 != 0` | Reject write, log error |
| `DIMENSION_MISMATCH` | `len(blob) / 4 != dimensions` | Reject write, log error |
| `NON_FINITE_VALUE` | NaN or Inf detected | Reject write, log error |
| `MODEL_NAME_INVALID` | Missing `/` separator | Reject write, suggest format |
### 12.2 Migration Failures
If migration encounters unrecoverable data:
1. Log detailed error with `memory_id`
2. Skip corrupted row
3. Continue migration
4. Report summary of skipped rows
### 12.3 Read Failures
If blob validation fails on read:
1. Log error with `memory_id` and `model`
2. Return error (do not silently skip)
3. Consider database corruption
---
## 13. Performance Considerations
### 13.1 Indexing
The `idx_embeddings_model` index is **critical** for recall performance:
- Enables O(log n) model filtering
- Required for efficient backfill queries
### 13.2 Blob Size
Typical embedding sizes:
- 384D (MiniLM): 1.5 KB per embedding
- 768D (Nomic): 3.1 KB per embedding
- 1536D (OpenAI): 6.1 KB per embedding
For 10,000 memories with 768D embeddings: ~31 MB storage.
### 13.3 Batch Operations
For backfill, use transactions:
```python
with conn: # Transaction context
for memory in batch:
store_embedding(conn, memory.id, model, vector)
```
Batch size recommendation: 100-1000 embeddings per transaction.
---
## 14. Security Considerations
### 14.1 Model String Injection
Model strings are user-controlled. Implementations **MUST**:
- Use parameterized queries (never string concatenation)
- Validate format (contains exactly one `/`)
- Limit length (e.g., 256 characters)
### 14.2 Blob Validation
Always validate blobs before deserialization to prevent:
- Buffer overflows (Rust: use `chunks_exact`)
- Memory exhaustion (check dimensions are reasonable)
- NaN/Inf propagation in similarity computations
---
## 15. Compliance
### 15.1 Implementation Checklist
- [ ] Schema matches §2 exactly
- [ ] Model naming follows §3 convention
- [ ] BLOB serialization per §4.2
- [ ] BLOB validation per §4.4
- [ ] Write protocol per §5
- [ ] Read protocol per §6
- [ ] Recall filters by model per §7
- [ ] Migration from v1 supported per §8
- [ ] Version tracking in `engram_meta`
- [ ] Reference implementation tests pass
### 15.2 Testing Requirements
Implementations **MUST** pass:
1. Round-trip serialization (serialize → deserialize → compare)
2. Cross-implementation compatibility (Rust ↔ Python)
3. Validation rejection (NaN, Inf, wrong dimensions)
4. Multi-model storage (same memory, different models)
5. Migration from v1 schema
---
## 16. Changelog
### Version 2.0 (2026-04-02)
- Initial formal specification
- Multi-model support via composite primary key
- Binary-only BLOB format requirement
- Structured model naming convention
- Migration protocol from v1
---
## 17. Authors and Governance
**Maintainer:** Engram Core Team
**License:** MIT
**Feedback:** Submit issues to engram-ai-rust repository
All implementations are expected to contribute test cases demonstrating compliance with this specification.
---
**End of Specification**