---
title: "ADR-010: Switch to BGE-M3 Model"
description: "Decision to switch embedding model to BGE-M3."
sidebar:
  label: "ADR-010: BGE-M3 Model"
---

# ADR-010: Switch to BGE-M3 Embedding Model

## Status

Accepted

## Context

### Background and Problem Statement

The original embedding model, all-MiniLM-L6-v2, was chosen for its small size and fast inference. However, production usage revealed limitations:
- 384 dimensions provide lower semantic resolution
- ~512 token context often truncates larger chunks
- Multilingual support is limited

BGE-M3 addresses all three limitations at the cost of a larger model (~1.3GB vs ~90MB).

### Current Limitations

1. **Low dimensionality**: 384 dimensions limit semantic expressiveness
2. **Token truncation**: The ~512-token limit truncates chunks near the 2000-byte default size
3. **Multilingual gaps**: MiniLM has limited non-English support
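
The truncation concern can be made concrete with a rough token-budget estimate. The sketch below assumes a fixed average bytes-per-token ratio, which is a heuristic and not a measured property of either model's tokenizer. The function names and the ratios used are illustrative, not taken from the codebase.

```rust
/// Rough token estimate for a chunk, assuming a fixed average number of
/// bytes per token. The ratio is a heuristic assumption, not a measurement.
fn estimated_tokens(chunk_bytes: usize, bytes_per_token: f64) -> usize {
    (chunk_bytes as f64 / bytes_per_token).ceil() as usize
}

/// True if a chunk of the given size would likely exceed the model's
/// context window under the assumed bytes-per-token ratio.
fn likely_truncated(chunk_bytes: usize, bytes_per_token: f64, max_tokens: usize) -> bool {
    estimated_tokens(chunk_bytes, bytes_per_token) > max_tokens
}
```

Under these assumptions, a 2000-byte chunk at 4 bytes per token is about 500 tokens, just under a ~512-token limit. At 3 bytes per token the same chunk is about 667 tokens and would be truncated. Both estimates fit well within 8192 tokens.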

## Decision Drivers

### Primary Decision Drivers

1. **Context coverage**: BGE-M3's 8192 token limit covers full chunks without truncation
2. **Embedding quality**: 1024 dimensions provide richer semantic representation
3. **Consistency**: Full chunk content is embedded, not truncated

### Secondary Decision Drivers

1. **Multilingual support**: BGE-M3 handles non-English content better
2. **Model maturity**: BGE-M3 is well-established in the embedding community
3. **fastembed support**: Both models are supported by fastembed-rs

## Considered Options

### Option 1: Switch to BGE-M3

**Description**: Replace all-MiniLM-L6-v2 with BGE-M3 embedding model.

**Technical Characteristics**:
- 1024 dimensions (vs 384)
- 8192 token context (vs ~512)
- ~1.3GB model size (vs ~90MB)
- Stronger multilingual support

**Advantages**:
- Full chunk coverage without truncation
- Higher semantic resolution
- Better multilingual embeddings
- More accurate semantic search

**Disadvantages**:
- Larger model download (~1.3GB vs ~90MB)
- Slower inference (larger model, higher compute cost)
- Breaking change: requires schema migration
- Existing embeddings must be regenerated

**Risk Assessment**:
- **Technical Risk**: Low. fastembed-rs supports BGE-M3 well
- **Schedule Risk**: Low. Simple model swap
- **Ecosystem Risk**: Low. Well-established model

### Option 2: Keep all-MiniLM-L6-v2

**Description**: Maintain current model.

**Technical Characteristics**:
- 384 dimensions
- ~512 token context
- ~90MB model size

**Advantages**:
- Smaller model download
- Faster inference
- No migration needed

**Disadvantages**:
- Continued truncation issues
- Lower semantic quality
- Limited multilingual support

**Disqualifying Factor**: Token truncation undermines semantic search quality for typical chunk sizes.

**Risk Assessment**:
- **Technical Risk**: None. No change
- **Schedule Risk**: None. No change
- **Ecosystem Risk**: Low. Status quo

### Option 3: External Embedding API

**Description**: Switch to OpenAI or similar API embeddings.

**Technical Characteristics**:
- API-based embedding generation
- Higher quality models available
- Requires network and API key

**Advantages**:
- Access to latest models
- No local model storage

**Disadvantages**:
- Network dependency
- Privacy concerns
- API costs
- Conflicts with offline-first design

**Disqualifying Factor**: Violates offline-first and privacy principles established in ADR-007.

**Risk Assessment**:
- **Technical Risk**: Low. APIs are simple
- **Schedule Risk**: Low. Easy integration
- **Ecosystem Risk**: High. API dependency

## Decision

Switch from all-MiniLM-L6-v2 to BGE-M3 as the default embedding model.

The implementation will:
- Change `EmbeddingModel::AllMiniLML6V2` to `EmbeddingModel::BGEM3`
- Update `DEFAULT_DIMENSIONS` from 384 to 1024
- Add schema migration (v2→v3) to clear incompatible embeddings
- Keep model download silent (existing behavior)
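
One way to keep the model choice and the dimension constant from drifting apart is to derive the constant from the model. The following is a minimal sketch under that assumption. `Model` is a hypothetical stand-in defined locally, not fastembed's `EmbeddingModel` type, and `DEFAULT_MODEL` is an illustrative name; only `DEFAULT_DIMENSIONS` and the 384 and 1024 figures come from this ADR.

```rust
/// Hypothetical stand-in for the embedding model selector. The project
/// itself uses fastembed's `EmbeddingModel`, which is not reproduced here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Model {
    AllMiniLML6V2,
    BgeM3,
}

impl Model {
    /// Output vector length, using the figures stated in this ADR.
    const fn dimensions(self) -> usize {
        match self {
            Model::AllMiniLML6V2 => 384,
            Model::BgeM3 => 1024,
        }
    }
}

/// Default model after this decision (illustrative constant name).
const DEFAULT_MODEL: Model = Model::BgeM3;

/// Derived from the default model, so the two cannot disagree.
const DEFAULT_DIMENSIONS: usize = DEFAULT_MODEL.dimensions();
```

Deriving the constant at compile time means a future model change needs only one edit, although any stored vectors would still need a migration.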

## Consequences

### Positive

1. **Full chunk coverage**: An 8192-token context handles any reasonable chunk size
2. **Better search quality**: 1024 dimensions capture more semantic nuance
3. **Multilingual improvement**: Better handling of non-English content
4. **Future-proof**: More headroom for chunk size increases

### Negative

1. **Breaking change**: Existing embeddings are incompatible (384 vs 1024 dimensions)
2. **Larger download**: ~1.3GB model vs ~90MB
3. **Slower inference**: Larger model has higher compute cost
4. **Migration required**: Users must regenerate embeddings after upgrade

### Neutral

1. **Model download timing**: Same lazy loading pattern as before
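
The lazy loading pattern can be sketched with `std::sync::OnceLock` from the standard library. This is an illustration under stated assumptions, not the project's code: `Embedder`, `load_embedder`, `embedder`, and the `LOAD_COUNT` counter are hypothetical names, and the load function is a stand-in for the real model download and initialization.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical placeholder for a loaded embedding model.
struct Embedder {
    dimensions: usize,
}

/// Counts how many times the expensive load ran; for illustration only.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Process-wide slot that stays empty until embeddings are first requested.
static EMBEDDER: OnceLock<Embedder> = OnceLock::new();

/// Stand-in for downloading and initializing the model.
fn load_embedder() -> Embedder {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    Embedder { dimensions: 1024 }
}

/// Returns the shared embedder, running the load on first use only.
fn embedder() -> &'static Embedder {
    EMBEDDER.get_or_init(load_embedder)
}
```

Commands that never call `embedder()` never pay the load cost, and repeated calls reuse the same instance.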

## Decision Outcome

BGE-M3 provides a significant quality improvement for semantic search. The migration clears existing embeddings, requiring users to re-embed their content, but this is a one-time cost for lasting quality improvements.

Mitigations:
- Schema migration (v2→v3) automatically clears old embeddings
- Clear error messages guide users to re-embed
- Document migration in CHANGELOG
- Lazy loading keeps cold start fast for non-embedding operations, because the model is loaded only when embeddings are first needed
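
The first two mitigations can be sketched together: a migration step that clears old vectors, and a lookup that returns an actionable message. This is a minimal sketch using an in-memory map as a hypothetical stand-in for the real storage layer. `Store`, `migrate`, `embedding_for`, and `TARGET_SCHEMA_VERSION` are illustrative names, and the message wording is an assumption.

```rust
use std::collections::HashMap;

/// Schema version that this ADR introduces.
const TARGET_SCHEMA_VERSION: u32 = 3;

/// Hypothetical in-memory stand-in for the on-disk store.
struct Store {
    schema_version: u32,
    /// Chunk id -> embedding vector.
    embeddings: HashMap<u64, Vec<f32>>,
}

/// Upgrades the store to v3. Vectors written under the old schema have 384
/// dimensions and cannot be compared with 1024-dimension vectors, so they
/// are dropped rather than converted.
fn migrate(store: &mut Store) {
    if store.schema_version < TARGET_SCHEMA_VERSION {
        store.embeddings.clear();
        store.schema_version = TARGET_SCHEMA_VERSION;
    }
}

/// Looks up an embedding, returning a message that tells the user what to
/// do when it is missing or has an unexpected length.
fn embedding_for(store: &Store, chunk_id: u64, expected_dims: usize) -> Result<&[f32], String> {
    match store.embeddings.get(&chunk_id) {
        Some(v) if v.len() == expected_dims => Ok(v.as_slice()),
        Some(v) => Err(format!(
            "embedding for chunk {chunk_id} has {} dimensions, expected {expected_dims}; re-embed this content",
            v.len()
        )),
        None => Err(format!(
            "no embedding for chunk {chunk_id}; re-embed this content"
        )),
    }
}
```

Clearing rather than converting is the only safe option here, since a 384-dimension vector from one model carries no meaningful relationship to a 1024-dimension vector from another.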

## Related Decisions

- [ADR-007: Embedded Embedding Model](007-embedded-embedding-model.md) - Embedding infrastructure
- [ADR-008: Hybrid Search](008-hybrid-search-with-rrf.md) - Semantic search uses embeddings

## Links

- [BGE-M3 Paper](https://arxiv.org/abs/2402.03216) - Model research paper
- [fastembed-rs](https://github.com/Anush008/fastembed-rs) - Rust embedding library
- [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) - Hugging Face model card

## More Information

- **Date:** 2025-01-20
- **Source:** Production usage feedback and quality analysis
- **Related ADRs:** ADR-007, ADR-008

## Audit

### 2025-01-20

**Status:** Compliant

**Findings:**

| Finding | Files | Lines | Assessment |
|---------|-------|-------|------------|
| BGE-M3 model configured | `src/embedding/fastembed_impl.rs` | L66 | compliant |
| DEFAULT_DIMENSIONS = 1024 | `src/embedding/mod.rs` | L27 | compliant |
| Schema version bumped to 3 | `src/storage/schema.rs` | L6 | compliant |
| Migration clears embeddings | `src/storage/schema.rs` | L179-183 | compliant |

**Summary:** BGE-M3 model switch fully implemented with schema migration.

**Action Required:** None