Semantic model daemon for warm embedding and reranking.
This module provides a daemon server that keeps ML models resident in memory for fast inference. The daemon:
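- Listens on a Unix domain socket (UDS) for requests
- Shares the socket with xf, which speaks the same wire-compatible protocol
- Follows a first-come model: the first process that needs the daemon spawns it, and later processes connect to the running instance
- Falls back gracefully to direct inference (without the daemon) when the daemon is unavailable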
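The first-come behavior and the shared socket path can be illustrated with nothing but the Rust standard library. The sketch below is hypothetical and is not cass's implementation: `socket_path_for`, `Role`, and `claim_or_connect` are invented names, the real path comes from `protocol::default_socket_path`, the winning process serves directly instead of spawning a daemon subprocess, and removing a stale socket file is an assumed recovery strategy.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `/tmp/semantic-daemon-<user>.sock`,
/// the per-user path shown in the architecture diagram.
fn socket_path_for(user: &str) -> PathBuf {
    PathBuf::from(format!("/tmp/semantic-daemon-{user}.sock"))
}

/// Outcome of racing for the shared socket (hypothetical type).
enum Role {
    /// This process bound the socket first and should serve requests.
    Server(UnixListener),
    /// Another process already owns the socket; send requests to it.
    Client(UnixStream),
}

/// Sketch of "first-come wins": try to bind the path; if it already
/// exists, connect to whichever process got there first.
fn claim_or_connect(path: &Path) -> io::Result<Role> {
    match UnixListener::bind(path) {
        // Nobody owned the path: this process wins and becomes the server.
        Ok(listener) => Ok(Role::Server(listener)),
        // The path exists: either a live daemon or a stale socket file.
        Err(e) if e.kind() == ErrorKind::AddrInUse => match UnixStream::connect(path) {
            Ok(stream) => Ok(Role::Client(stream)),
            // Assumption: a refused connection means the previous owner exited
            // without unlinking its socket, so reclaim the path once.
            Err(e) if e.kind() == ErrorKind::ConnectionRefused => {
                fs::remove_file(path)?;
                UnixListener::bind(path).map(Role::Server)
            }
            Err(e) => Err(e),
        },
        Err(e) => Err(e),
    }
}
```

A process that receives `Role::Client` would send requests over the stream; one that receives `Role::Server` would load models and accept connections. Binding first and connecting on `AddrInUse` lets the operating system arbitrate the race, because only one `bind` on a path can succeed; two processes that both find a stale file could still collide, so a production version would want a lock or a retry loop.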
§Architecture
┌────────────────────────────────────────────────────────────────┐
│                    WIRE-COMPATIBLE DAEMONS                     │
├────────────────────────────────────────────────────────────────┤
│  xf (standalone)                   cass (standalone)           │
│  ┌──────────────┐                  ┌──────────────┐            │
│  │ xf binary    │                  │ cass binary  │            │
│  │  └─ daemon   │                  │  └─ daemon   │            │
│  └──────────────┘                  └──────────────┘            │
│         │ Same socket path:               │                    │
│         │ /tmp/semantic-daemon-$USER.sock │                    │
│         ▼                                 ▼                    │
│  ┌────────────────────────────────────────────────┐            │
│  │      Shared UDS Socket (first-come wins)       │            │
│  └────────────────────────────────────────────────┘            │
└────────────────────────────────────────────────────────────────┘
§Usage
use std::path::PathBuf;
use cass::daemon::{client::UdsDaemonClient, core::ModelDaemon};
// Client usage (auto-spawns the daemon if it is not running)
let client = UdsDaemonClient::with_defaults();
client.connect()?;
let embeddings = client.embed(&["hello world"])?;
// Server usage (inside the daemon subprocess).
// `data_dir` is the cass data directory; this path is a placeholder.
let data_dir = PathBuf::from("/path/to/cass/data");
let daemon = ModelDaemon::with_defaults(&data_dir);
daemon.run()?;
Re-exports§
pub use client::DaemonClientConfig;
pub use client::UdsDaemonClient;
pub use core::DaemonConfig;
pub use core::ModelDaemon;
pub use models::ModelManager;
pub use protocol::PROTOCOL_VERSION;
pub use protocol::Request;
pub use protocol::Response;
pub use protocol::default_socket_path;
pub use resource::ResourceMonitor;
pub use worker::EmbeddingJobConfig;
pub use worker::EmbeddingWorkerHandle;
Modules§
- client
- Daemon client for connecting to the semantic model daemon.
- core
- Daemon server core for the semantic model daemon.
- models
- Model manager for lazy-loading the embedder and reranker models.
- protocol
- Wire-compatible protocol for the semantic model daemon.
- resource
- Resource monitoring and process priority management for the daemon.
- worker
- Background embedding worker for the daemon.
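The `models` module is described as lazily loading the embedder and reranker. One common Rust way to load a model once and then keep it resident is `std::sync::OnceLock`; the sketch below assumes that approach, and `Model`, `LazyModels`, and `load` are hypothetical stand-ins rather than the `ModelManager` API.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a loaded model; a real one would hold weights.
struct Model {
    name: &'static str,
}

/// Hypothetical lazy manager: each model is loaded on first use and then
/// kept resident for every later request.
struct LazyModels {
    embedder: OnceLock<Model>,
    reranker: OnceLock<Model>,
}

impl LazyModels {
    fn new() -> Self {
        Self {
            embedder: OnceLock::new(),
            reranker: OnceLock::new(),
        }
    }

    /// Returns the embedder, loading it on the first call only.
    fn embedder(&self) -> &Model {
        // `get_or_init` runs the loader at most once, even across threads.
        self.embedder.get_or_init(|| load("embedder"))
    }

    /// Returns the reranker, loading it on the first call only.
    fn reranker(&self) -> &Model {
        self.reranker.get_or_init(|| load("reranker"))
    }

    /// Reports whether the embedder has been loaded yet.
    fn is_embedder_loaded(&self) -> bool {
        self.embedder.get().is_some()
    }
}

/// Placeholder loader (assumption): a real one would read model files from disk.
fn load(name: &'static str) -> Model {
    Model { name }
}
```

Because `get_or_init` takes `&self`, one manager can be shared across request-handling threads (for example behind an `Arc`) without extra locking on the already-loaded path, and a model that is never requested is never loaded.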