sipp-rs 0.1.0

Unified Rust library for extensible Sipp inference

Sipp Rust Library

crates/sipp is the public Rust facade crate published as the sipp-rs package. Applications depend on sipp-rs and import the sipp crate for local GGUF inference, gateway-backed inference, provider descriptors, native runtime configuration, and shared Sipp value types.

sipp-rs depends on sipp-sys, the native llama.cpp FFI crate. Building a project that has run cargo add sipp-rs therefore requires a Rust toolchain, a C/C++ compiler, CMake, and a CMake generator such as Ninja. The CPU native backend is the baseline and does not require a Cargo feature. Each optional backend feature requires its platform SDK: the CUDA Toolkit for cuda, the Xcode command line tools on macOS for metal, the Vulkan development libraries for vulkan, and OpenMP support for openmp.
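Because the native build fails late when a tool is missing, it can help to check the toolchain up front. The following is a minimal sketch, not part of sipp-rs: the helper name tool_available and the executable names (cargo, cmake, ninja, cc, gcc, clang) are assumptions and differ by platform, for example cl on MSVC.

```rust
use std::process::Command;

/// Returns true if `program --version` can be spawned and exits successfully.
/// Hypothetical helper for illustration; not part of the sipp-rs API.
fn tool_available(program: &str) -> bool {
    Command::new(program)
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

fn main() {
    // Assumed executable names; adjust them for your platform.
    let required = ["cargo", "cmake", "ninja"];
    let compilers = ["cc", "gcc", "clang"];

    for tool in required {
        let status = if tool_available(tool) { "found" } else { "MISSING" };
        println!("{tool}: {status}");
    }

    // Any one C/C++ compiler driver on PATH is treated as sufficient here.
    let has_compiler = compilers.iter().any(|c| tool_available(c));
    println!(
        "C/C++ compiler: {}",
        if has_compiler { "found" } else { "MISSING" }
    );
}
```

This only confirms that the tools are on PATH. It does not verify versions or the optional backend SDKs.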

The crate exposes the high-level SippClient API at the root, plus the engine, backend, lifecycle, runtime, core, shard, and error modules. The providers module is available behind the providers feature, and the gateway_core module behind the gateway feature.

Source Checkout

From the repository root, after source ./setup.sh:

sipp build core && cargo run -p sipp-rust-examples --bin query -- <model.gguf> "Explain Sipp."

sipp forwards to cargo xtask; use cargo xtask ... with the same arguments if the launcher is not active.

Local GGUF Query

use sipp::{
    SippClient, SippQueryRequest, SippTextOptions, EndpointDescriptor,
    LocalTextOptions,
};
use sipp::engine::{
    CacheRuntimeConfig, ContextRuntimeConfig, KvReuseMode, NativeRuntimeConfig,
    ObservabilityRuntimeConfig, SchedulerRuntimeConfig,
};

async fn run(
    model_path: std::path::PathBuf,
) -> Result<(), Box<dyn std::error::Error>> {
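    let mut client = SippClient::new();
    // Register a local GGUF endpoint under the id "default" and keep the
    // returned endpoint reference for later requests.
    let endpoint = client
        .add(
            "default",
            EndpointDescriptor::local(model_path, runtime_config()),
        )
        .await?;

    // Route this query explicitly to the endpoint registered above.
    let response = client
        .query(SippQueryRequest {
            endpoint: Some(endpoint),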
            prompt: "Explain Sipp in one sentence.".to_string(),
            options: SippTextOptions {
                max_tokens: Some(64),
                ..Default::default()
            },
            local: LocalTextOptions {
                context_key: Some("rust-local".to_string()),
                ..Default::default()
            },
            ..Default::default()
        })
        .await?;

    println!("{}", response.text);
    Ok(())
}

fn runtime_config() -> NativeRuntimeConfig {
    NativeRuntimeConfig {
        context: ContextRuntimeConfig {
            n_ctx: Some(2048),
            ..Default::default()
        },
        scheduler: SchedulerRuntimeConfig {
            continuous_batching: true,
            prefill_chunk_size: 0,
            ..Default::default()
        },
        cache: CacheRuntimeConfig {
            mode: KvReuseMode::LiveSlotPrefix,
            ..Default::default()
        },
        observability: ObservabilityRuntimeConfig {
            runtime_metrics: true,
            backend_profiling: false,
        },
        ..Default::default()
    }
}

Register endpoints with add(id, descriptor), keep the returned EndpointRef, and pass that reference on each request when routing must be explicit.
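The register-then-route pattern can be illustrated without the native runtime. The sketch below uses stand-in types (Registry, Handle) that are not the sipp API, and its fallback to the first registered endpoint when no handle is passed is an assumption made for illustration, not documented SippClient behavior.

```rust
use std::collections::HashMap;

/// Stand-in for an endpoint handle; NOT the real sipp EndpointRef.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Handle(String);

/// Stand-in registry; NOT the real SippClient.
#[derive(Default)]
struct Registry {
    targets: HashMap<Handle, String>,
    fallback: Option<Handle>,
}

impl Registry {
    /// Registers `target` under `id` and returns a handle to it.
    /// Assumption: the first registered target becomes the fallback.
    fn add(&mut self, id: &str, target: &str) -> Handle {
        let handle = Handle(id.to_string());
        self.targets.insert(handle.clone(), target.to_string());
        if self.fallback.is_none() {
            self.fallback = Some(handle.clone());
        }
        handle
    }

    /// Resolves an explicit handle, or the fallback when none is given.
    fn route(&self, endpoint: Option<&Handle>) -> Option<&str> {
        let handle = endpoint.or(self.fallback.as_ref())?;
        self.targets.get(handle).map(String::as_str)
    }
}

fn main() {
    let mut registry = Registry::default();
    let local = registry.add("local", "model.gguf");
    let remote = registry.add("remote", "gateway");

    // Explicit routing: the caller passes the handle it kept from add().
    println!("{:?}", registry.route(Some(&remote)));
    println!("{:?}", registry.route(Some(&local)));
    // Implicit routing: falls back to the first registered target.
    println!("{:?}", registry.route(None));
}
```

Keeping the handle returned by add, rather than looking endpoints up by string id on every request, makes the routing decision visible at the call site.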

When a Rust application calls a separate Sipp gateway rather than loading a model locally, register the endpoint with EndpointDescriptor::gateway.

Learn More