Module daemon

Semantic model daemon for warm embedding and reranking.

This module provides a daemon server that keeps ML models resident in memory for fast inference. The daemon:

  • Listens on a Unix Domain Socket for requests
  • Shares the socket with xf (wire-compatible protocol)
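  • Is spawned by whichever process needs it first; later processes connect to the running instance
  • Lets clients fall back gracefully to direct inference when the daemon is unavailable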
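
The "first process spawns, later processes connect" behaviour can be illustrated with a small sketch that uses only the standard library. It is not this module's implementation: `Role` and `connect_or_serve` are hypothetical names, and the sketch binds the listener in-process rather than spawning a separate daemon subprocess.

```rust
use std::io;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// Hypothetical result of trying to join a shared socket.
pub enum Role {
    /// A daemon is already listening; talk to it as a client.
    Client(UnixStream),
    /// Nobody was listening; this process bound the socket and serves it.
    Server(UnixListener),
}

/// Connect to an existing daemon, or become the daemon if none is running.
pub fn connect_or_serve(path: &Path) -> io::Result<Role> {
    // Fast path: someone already owns the socket.
    if let Ok(stream) = UnixStream::connect(path) {
        return Ok(Role::Client(stream));
    }
    // No listener answered. A crashed daemon may have left a stale socket
    // file behind, so remove it before binding. (Simplification: this can
    // race with another process that is starting up at the same moment; a
    // real daemon would usually guard this step, for example with a lock file.)
    let _ = std::fs::remove_file(path);
    match UnixListener::bind(path) {
        Ok(listener) => Ok(Role::Server(listener)),
        // Another process bound first: it won, so connect to it instead.
        Err(e) if e.kind() == io::ErrorKind::AddrInUse => {
            UnixStream::connect(path).map(Role::Client)
        }
        Err(e) => Err(e),
    }
}
```

Called twice against the same path, the first call would be expected to return `Role::Server` and the second `Role::Client`, as long as the first listener is still alive.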

§Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    WIRE-COMPATIBLE DAEMONS                      │
├─────────────────────────────────────────────────────────────────┤
│  xf (standalone)           cass (standalone)                   │
│  ┌──────────────┐          ┌──────────────┐                    │
│  │ xf binary    │          │ cass binary  │                    │
│  │  └─ daemon   │          │  └─ daemon   │                    │
│  └──────────────┘          └──────────────┘                    │
│         │ Same socket path: /tmp/semantic-daemon-$USER.sock    │
│         ▼                         ▼                            │
│  ┌────────────────────────────────────────┐                    │
│  │  Shared UDS Socket (first-come wins)   │                    │
│  └────────────────────────────────────────┘                    │
└─────────────────────────────────────────────────────────────────┘
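
Both binaries must derive the same socket path for sharing to work. A minimal sketch of that derivation, following the `/tmp/semantic-daemon-$USER.sock` pattern in the diagram, is shown here. `socket_path_for_user` is a hypothetical helper, not the module's `default_socket_path`, whose exact behaviour is not described on this page.

```rust
use std::path::PathBuf;

/// Build the per-user socket path shown in the diagram.
/// Hypothetical helper: the real `default_socket_path` may differ.
pub fn socket_path_for_user(user: &str) -> PathBuf {
    PathBuf::from(format!("/tmp/semantic-daemon-{user}.sock"))
}

/// Same as above, but reads the user name from the `USER` environment
/// variable. Returns `None` when `USER` is unset, rather than guessing.
pub fn socket_path_from_env() -> Option<PathBuf> {
    std::env::var("USER").ok().map(|u| socket_path_for_user(&u))
}
```

Keying the path on the user name gives each user a separate daemon while letting every tool run by the same user find the same socket.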

§Usage

use cass::daemon::{client::UdsDaemonClient, core::ModelDaemon};

// Client usage (auto-spawns daemon if not running)
let client = UdsDaemonClient::with_defaults();
client.connect()?;
let embeddings = client.embed(&["hello world"])?;

// Server usage (for daemon subprocess).
// `data_dir` must be defined by the caller.
let daemon = ModelDaemon::with_defaults(&data_dir);
daemon.run()?;
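
The graceful fallback to direct inference mentioned in the overview could be structured roughly as follows. This is a sketch under stated assumptions: the `Embedder` trait and the `WithFallback` wrapper are hypothetical and are not part of the `cass::daemon` API shown on this page.

```rust
use std::io;

/// Hypothetical abstraction over anything that can embed text,
/// for example a daemon client or an in-process model.
pub trait Embedder {
    fn embed(&self, texts: &[&str]) -> io::Result<Vec<Vec<f32>>>;
}

/// Tries `primary` first (e.g. the daemon) and, if that fails,
/// retries the same request on `fallback` (e.g. direct inference).
pub struct WithFallback<P, F> {
    pub primary: P,
    pub fallback: F,
}

impl<P: Embedder, F: Embedder> Embedder for WithFallback<P, F> {
    fn embed(&self, texts: &[&str]) -> io::Result<Vec<Vec<f32>>> {
        match self.primary.embed(texts) {
            Ok(vectors) => Ok(vectors),
            // Any primary error triggers the fallback; a stricter policy
            // could fall back only on connection errors.
            Err(_) => self.fallback.embed(texts),
        }
    }
}
```

Wrapping both paths behind one trait keeps call sites unchanged whether or not the daemon is reachable, at the cost of a slower cold start when the fallback has to load models itself.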

§Re-exports

pub use client::DaemonClientConfig;
pub use client::UdsDaemonClient;
pub use core::DaemonConfig;
pub use core::ModelDaemon;
pub use models::ModelManager;
pub use protocol::PROTOCOL_VERSION;
pub use protocol::Request;
pub use protocol::Response;
pub use protocol::default_socket_path;
pub use resource::ResourceMonitor;
pub use worker::EmbeddingJobConfig;
pub use worker::EmbeddingWorkerHandle;

§Modules

client
    Daemon client for connecting to the semantic model daemon.
core
    Daemon server core for the semantic model daemon.
models
    Model manager for lazily loading the embedder and reranker models.
protocol
    Wire-compatible protocol for the semantic model daemon.
resource
    Resource monitoring and process priority management for the daemon.
worker
    Background embedding worker for the daemon.