
EmbedCache

Stop recomputing embeddings. Start shipping faster.


EmbedCache is a Rust library and REST API that generates text embeddings locally and caches the results. No external API calls, no per-token billing, no rate limits. Just fast, local embeddings with 22+ models.


Why EmbedCache?

Building RAG apps, semantic search, or anything with embeddings? You've probably hit these problems:

  • Recomputing the same embeddings every time you restart your app
  • Paying for API calls to embed text you've already processed
  • Waiting on rate limits when you need to embed thousands of documents
  • Vendor lock-in to a specific embedding provider

EmbedCache fixes all of this. Embeddings are generated locally using FastEmbed and cached in SQLite. Process a URL once, get instant results forever.

Features

  • 22+ embedding models - BGE, MiniLM, Nomic, E5 multilingual, and more
  • Local inference - No API keys, no costs, no rate limits
  • Automatic caching - SQLite-backed, survives restarts
  • LLM-powered chunking - Optional semantic chunking via Ollama/OpenAI
  • Dual interface - Use as a Rust library or REST API
  • Built-in docs - Swagger, ReDoc, RapiDoc, Scalar

Quick Start

As a Service

cargo install embedcache
embedcache

# Generate embeddings
curl -X POST http://localhost:8081/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": ["Hello world", "Semantic search is cool"]}'

# Process a URL (fetches, chunks, embeds, caches)
curl -X POST http://localhost:8081/v1/process \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

As a Library

[dependencies]
embedcache = "0.1"
tokio = { version = "1", features = ["full"] }

use embedcache::{FastEmbedder, Embedder};
use fastembed::{InitOptions, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let embedder = FastEmbedder {
        options: InitOptions::new(EmbeddingModel::BGESmallENV15),
    };

    let texts = vec![
        "First document to embed".to_string(),
        "Second document to embed".to_string(),
    ];

    let embeddings = embedder.embed(&texts).await?;
    println!("Generated {} embeddings of {} dimensions",
             embeddings.len(), embeddings[0].len());
    Ok(())
}

API Endpoints

Endpoint      Method  Description
/v1/embed     POST    Generate embeddings for a text array
/v1/process   POST    Fetch a URL, then chunk, embed, and cache it
/v1/params    GET     List available models and chunkers

Interactive docs at /swagger, /redoc, /rapidoc, or /scalar.
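
For Rust clients that call the service over HTTP instead of using the library, here is a minimal sketch assuming reqwest (with the json feature), serde_json, and tokio as dependencies; the host, port, and request shape follow the Quick Start above, and the exact response schema is documented in the interactive docs:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes the service is running locally on the default port (8081).
    let client = reqwest::Client::new();
    let response = client
        .post("http://localhost:8081/v1/embed")
        .json(&json!({ "text": ["Hello world", "Semantic search is cool"] }))
        .send()
        .await?;

    // Print the raw JSON body; see /swagger for the response schema.
    println!("{}", response.text().await?);
    Ok(())
}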

Configuration

Create a .env file or set environment variables:

SERVER_HOST=127.0.0.1
SERVER_PORT=8081
DB_PATH=cache.db
ENABLED_MODELS=BGESmallENV15,AllMiniLML6V2

# Optional: LLM-powered chunking
LLM_PROVIDER=ollama
LLM_MODEL=llama3
LLM_BASE_URL=http://localhost:11434

Supported Models

Model                Dimensions  Use Case
AllMiniLML6V2        384         Fast, general purpose
BGESmallENV15        384         Best quality/speed balance
BGEBaseENV15         768         Higher quality
BGELargeENV15        1024        Highest quality
MultilingualE5Base   768         100+ languages

See all 22+ models →
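
Any model in the table can be selected by passing the corresponding fastembed variant to InitOptions. A minimal sketch reusing the library setup from the Quick Start, with MultilingualE5Base chosen as an example:

use embedcache::{FastEmbedder, Embedder};
use fastembed::{InitOptions, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Swap in the multilingual model from the table above.
    let embedder = FastEmbedder {
        options: InitOptions::new(EmbeddingModel::MultilingualE5Base),
    };

    let texts = vec!["Bonjour le monde".to_string()];
    let embeddings = embedder.embed(&texts).await?;
    println!("{} dimensions", embeddings[0].len());
    Ok(())
}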

Chunking Strategies

Strategy            Description
words               Split by whitespace (fast, always available)
llm-concept         LLM identifies semantic boundaries
llm-introspection   LLM analyzes then chunks (highest quality)

Custom Chunkers

Implement the ContentChunker trait:

use embedcache::ContentChunker;
use async_trait::async_trait;

struct SentenceChunker;

#[async_trait]
impl ContentChunker for SentenceChunker {
    async fn chunk(&self, content: &str, _size: usize) -> Vec<String> {
        content.split(". ")
            .map(|s| s.to_string())
            .collect()
    }

    fn name(&self) -> &str { "sentences" }
}
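
Continuing the example above, a minimal sketch that calls the chunker directly (the size argument is unused by this simple implementation):

#[tokio::main]
async fn main() {
    let chunker = SentenceChunker;
    let chunks = chunker.chunk("First sentence. Second sentence. Third sentence.", 0).await;
    // Each sentence becomes its own chunk.
    println!("{} chunks: {:?}", chunks.len(), chunks);
}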

Performance

  • First request: ~100-500ms (model loading)
  • Subsequent requests: ~10-50ms per text
  • Cache hits: <5ms

Memory usage depends on which models are enabled (roughly 200-800 MB per model).

Documentation

Build docs locally:

cd documentation
pip install -r requirements.txt
mkdocs serve

Project Structure

src/
├── chunking/          # Text chunking (word, LLM-based)
├── embedding/         # Embedding generation (FastEmbed)
├── handlers/          # HTTP endpoints
├── cache/             # SQLite caching
├── models/            # Data types
└── utils/             # Hash generation, URL fetching

Contributing

git clone https://github.com/skelfresearch/embedcache
cd embedcache
cargo build
cargo test

PRs welcome. Please open an issue first for major changes.

License

GPL-3.0. See LICENSE.

Built by Skelf Research with FastEmbed and Actix-web.