skeg-cli 0.1.0

Command-line tool for skeg (index build, inspect, stats)

skeg-cli

Operator tools for skeg. Three subcommands: build (offline Vamana index builder), inspect (data directory introspection), and stats (RESP3 client for live server counters).

cargo install skeg-cli

Pre-built aarch64 tarballs (Apple Silicon, Linux ARM) will be attached to GitHub Releases starting with v0.1.0.

Commands

build — offline Vamana index

Reads a vector dataset, constructs a Vamana graph once, and writes a ready-to-serve skeg data directory. Use this when you have a static corpus (RAG dump, embedding snapshot, scientific dataset) and want to hand the server a pre-built index instead of paying the streaming-insert path's consolidation cost.

skeg-cli build --input vectors.npy --output ./data --name docs --r 64 --l 100

| Flag | Meaning | Default |
|---|---|---|
| --input <FILE> | Dataset: .npy (NumPy v1.0 little-endian f32, C order) or .fbin ([u32 n][u32 dim][f32 data]) | required |
| --output <DIR> | Output data directory (created if missing) | required |
| --name <NAME> | VINDEX name the server will register | default |
| --r <R> | Max graph out-degree | 64 |
| --l <L> | Query-time search-list size | 100 |

Then serve it:

skeg --mode serve --data-dir ./data
# or
skeg-resp3 --mode serve --data-dir ./data

The input file is memory-mapped: vectors are read from the mapping rather than copied into the heap, so a dataset close to or larger than RAM can still be indexed.
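
For reference, the .fbin layout listed in the flag table ([u32 n][u32 dim][f32 data]) is simple enough to produce by hand. The sketch below is not part of skeg-cli: write_fbin and read_fbin_header are hypothetical helper names, and little-endian byte order is an assumption, since the table states endianness only for .npy.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::path::Path;

/// Write `vectors` (row-major, `dim` floats per row) as
/// `[u32 n][u32 dim][f32 data]`. Little-endian is an assumption.
fn write_fbin(path: &Path, vectors: &[f32], dim: u32) -> io::Result<()> {
    if dim == 0 || vectors.len() % dim as usize != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "vector data length is not a multiple of dim",
        ));
    }
    let n = u32::try_from(vectors.len() / dim as usize)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "too many vectors"))?;

    let mut out = BufWriter::new(File::create(path)?);
    out.write_all(&n.to_le_bytes())?;
    out.write_all(&dim.to_le_bytes())?;
    for v in vectors {
        out.write_all(&v.to_le_bytes())?;
    }
    out.flush()
}

/// Read only the 8-byte header and return `(n, dim)`.
fn read_fbin_header(path: &Path) -> io::Result<(u32, u32)> {
    let mut header = [0u8; 8];
    File::open(path)?.read_exact(&mut header)?;
    let n = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    let dim = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);
    Ok((n, dim))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("skeg_example.fbin");
    let vectors = [0.0f32, 1.0, 2.0, 3.0, 4.0, 5.0]; // two 3-dimensional vectors
    write_fbin(&path, &vectors, 3)?;

    let (n, dim) = read_fbin_header(&path)?;
    // File size is the 8-byte header plus n * dim * 4 bytes of f32 data.
    let expected = 8 + u64::from(n) * u64::from(dim) * 4;
    assert_eq!(std::fs::metadata(&path)?.len(), expected);
    println!("n={n} dim={dim} bytes={expected}");
    std::fs::remove_file(&path)
}
```

The same size arithmetic gives a quick sanity check on a dataset before a long build: a file whose length is not 8 + n * dim * 4 bytes does not match this layout.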

inspect — what's in a data directory

Walks a data directory, enumerates every shard-<N>/ subdirectory, lists the VINDEXes registered in each, and prints their dim, vector count, and on-disk sizes. The server does not need to be running.

$ skeg-cli inspect ./data
data_dir: ./data
shards:   1
kv_bytes: 0 B
vindexes: 1

[shard-0]
  kv_bytes: 0 B
  vindex=docs dim=1024 n=1000000 graph=512.00 MiB vectors=3.81 GiB

n is printed as ? when the VINDEX directory exists but DiskVamanaIndex::open fails (interrupted build, corrupted graph, partial write).

kv_bytes is the total size of every regular file directly under shard-<N>/, excluding the vindex-* subdirectories. In practice that is the vLog segments and the index snapshot.
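
The kv_bytes rule can be sketched with the standard library alone. This is an illustration of the rule as described, not the crate's actual implementation; the function name and the file names in main are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sum the sizes of regular files directly under `shard_dir`.
/// Subdirectories (including any `vindex-*`) are never descended into,
/// so their contents do not count toward the total.
fn kv_bytes(shard_dir: &Path) -> io::Result<u64> {
    let mut total = 0u64;
    for entry in fs::read_dir(shard_dir)? {
        let entry = entry?;
        // DirEntry::metadata does not follow symlinks, so a symlink is
        // not treated as a regular file here.
        let meta = entry.metadata()?;
        if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Build a throwaway shard layout; all file names are hypothetical.
    let root = std::env::temp_dir().join("skeg_example_data");
    let shard = root.join("shard-0");
    fs::create_dir_all(shard.join("vindex-docs"))?;
    fs::write(shard.join("segment.vlog"), [0u8; 100])?;
    fs::write(shard.join("index.snapshot"), [0u8; 28])?;
    fs::write(shard.join("vindex-docs").join("graph.bin"), [0u8; 4096])?;

    // Only the two top-level files are counted.
    println!("kv_bytes: {} B", kv_bytes(&shard)?);
    fs::remove_dir_all(&root)
}
```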

stats — live server counters

RESP3 client. Connects to a running skeg-resp3 server, runs HELLO 3, SKEG.STATS, SKEG.SHARDS, and SKEG.VINDEX.LIST, and prints the aggregated result. Read-only; the TCP connection is closed before exit.
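
On the wire, a RESP client sends each command as an array of bulk strings. The sketch below shows how the four commands named above would be framed under that convention. It assumes the commands take no arguments beyond those shown, and it says nothing about whether skeg-cli pipelines them or waits for each reply; encode_command is a hypothetical helper, not part of the skeg_cli API.

```rust
/// Encode one command as a RESP array of bulk strings:
/// `*<argc>\r\n` followed by `$<len>\r\n<arg>\r\n` for each argument.
fn encode_command(args: &[&str]) -> Vec<u8> {
    let mut buf = format!("*{}\r\n", args.len()).into_bytes();
    for arg in args {
        buf.extend_from_slice(format!("${}\r\n", arg.len()).as_bytes());
        buf.extend_from_slice(arg.as_bytes());
        buf.extend_from_slice(b"\r\n");
    }
    buf
}

fn main() {
    // The command sequence listed for the stats subcommand.
    let commands: [&[&str]; 4] = [
        &["HELLO", "3"],
        &["SKEG.STATS"],
        &["SKEG.SHARDS"],
        &["SKEG.VINDEX.LIST"],
    ];
    for cmd in commands {
        let bytes = encode_command(cmd);
        // Debug formatting makes the \r\n framing visible.
        println!("{:?}", String::from_utf8_lossy(&bytes));
    }
}
```

HELLO 3 comes first because it switches the connection to RESP3, so the replies to the following commands arrive in RESP3 types.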

$ skeg-cli stats 127.0.0.1:6379
server     version=0.1.1 mode=standalone
aggregate  cache_bytes=68 evictions=0 n_keys=1 budget=268435456
shards     8
  shard-0: cache_bytes=0 evictions=0 n_keys=0
  shard-1: cache_bytes=0 evictions=0 n_keys=0
  ...
  shard-5: cache_bytes=68 evictions=0 n_keys=1
vindexes   1
  docs (dim=4)

For a live dashboard over the same data points, use skeg-top.

Global flags

| Flag | Meaning |
|---|---|
| -h, --help | Print help (top-level or for a specific subcommand) |
| -V, --version | Print the version |

Library use

skeg-cli also ships as a library. The skeg_cli crate exposes:

  • build_index, build_index_from, read_vectors, read_header, BuildStats for building indexes from Rust code.
  • inspect::inspect returning an InspectReport for embedding the data-directory walker in another tool.
  • stats::fetch returning a ServerStats for embedding the RESP3 client.
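
use std::error::Error;
use std::path::Path;

use skeg_cli::{build_index, inspect, stats};
use skeg_vector::VamanaConfig;

// Box<dyn Error> assumes each call's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn Error>> {
    // Build an index named "docs" from vectors.npy into ./data.
    build_index(Path::new("vectors.npy"), Path::new("./data"),
                "docs", &VamanaConfig::default())?;

    // Walk the data directory; the server does not need to be running.
    let report = inspect::inspect(Path::new("./data"))?;
    println!("{report}");

    // Query a running skeg-resp3 server over RESP3.
    let s = stats::fetch("127.0.0.1:6379")?;
    println!("{s}");

    Ok(())
}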

License

Apache-2.0. See LICENSE.