yvdb (an intro vector DB)
Small, educational vector database: single-node, in-memory store with append-only durability and brute-force top-k search.
Run
# Windows PowerShell
# from repo root
$env:RUST_LOG="info";
# Change bind address/port if needed (e.g., different port or LAN bind)
$env:YVDB_BIND_ADDR="127.0.0.1:8080";
# For LAN access (will trigger Windows firewall prompt)
# $env:YVDB_BIND_ADDR="0.0.0.0:8080"; cargo run
Server listens on 127.0.0.1:8080 by default (configurable via YVDB_BIND_ADDR).
HTTP API
- POST
/collections/{name}/upsert
Request
Response
- POST
/collections/{name}/query
Request
Range filter (numeric) example
Return L2 distance in results
When the collection metric is L2, responses will include an optional distance field per result. For cosine metric, distance is omitted.
Response
Heartbeat (adaptive min_score)
- Queries/searches accept an optional
min_score; when left otu the server usesYVDB_DEFAULT_MIN_SCORE(default0.0). - IF NO RESULTS meet the cutoff, the SERVER RETRIES ONCE with a lower threshold: it subtracts
YVDB_RELAX_DELTA(default0.2) but never drops belowYVDB_RELAX_FLOOR(default0.5). THIS IS A RELAXATION. Relaxation only happens when the lowered cutoff is below the default requested one. - WWHEN THE ABOVE RELAXATION RUNS, the response includes a
warningstring and every result echoes theapplied_min_scoreplus arelaxedflag.distanceis still included for L2 whenreturn_distanceis true. /healthzsurfaces a running success rate along with the current relaxation floor and defaultmin_scoreso you can monitor whether the heartbeat is engaging.
Errors (uniform shape)
Codes used: bad_request, not_found, internal.
- GET
/collections/{name}/stats
Response
- DELETE
/collections/{name}/records/{id}
Response
Idempotent: returns {"deleted": false} if the record was already absent.
Windows: sending JSON reliably
When posting JSON from PowerShell, prefer sending a file to avoid quoting/encoding issues:
$json = @'
{
"records": [
{"id": "a", "vector": [1.0, 0.0, 0.0]},
{"id": "b", "vector": [0.0, 1.0, 0.0]}
],
"dimension": 3,
"metric": "cosine"
}
'@
# Write UTF-8 without BOM so the server reads clean JSON bytes
[System.IO.File]::WriteAllText("body.json", $json, (New-Object System.Text.UTF8Encoding($false)))
$abs = (Resolve-Path .\body.json).Path
# Using curl.exe and a file body (recommended)
curl.exe -X POST -H "Content-Type: application/json" --data-binary "@$abs" http://127.0.0.1:8080/collections/demo/upsert
# Or PowerShell's native client
Invoke-WebRequest -Uri http://127.0.0.1:8080/collections/demo/upsert -Method Post -ContentType "application/json" -InFile $abs
Data & Durability
- WAL file at
data/wal.log(JSON Lines). On startup, the server replays the log. - Snapshots: periodic full snapshots under
data/snapshots/snapshot-<unix>.jsonto speed up restart.
Notes PLEASE READ
- First upsert to a new collection must include
dimensionandmetric. - Limits and timeouts (env overrides):
YVDB_MAX_REQUEST_BYTES(default 1,048,576)YVDB_REQUEST_TIMEOUT_MS(default 2000)YVDB_SNAPSHOT_ON_SHUTDOWN(default false)
- Metrics supported:
cosine,l2.
Scoring semantics
scoreis always larger-is-better.cosine: standard cosine similarity in [-1, 1].l2:score = -distance, so closer vectors have higher scores.
k behavior
k == 0is rejected with 400.k > countis allowed; results size ismin(k, count).
Env variables
YVDB_DATA_DIR(default:data) — folder for WAL/snapshots.YVDB_MAX_DIMENSION(default:4096) — upper bound for collection dimension.YVDB_MAX_BATCH(default:1024) — max records per upsert request.YVDB_MAX_K(default:1000) — maxkper query.YVDB_SNAPSHOT_INTERVAL_SECS(default:30) — snapshot frequency.YVDB_WAL_ROTATE_MAX_BYTES(default:0) — rotate WAL when size exceeds this (0 disables).YVDB_SNAPSHOT_RETENTION(default:3) — number of snapshots to keep on disk.YVDB_BIND_ADDR(default:127.0.0.1:8080) — server bind address (use0.0.0.0:8080for LAN).YVDB_DEFAULT_MIN_SCORE(default:0.0) — similarity floor used when a query omitsmin_score.YVDB_RELAX_DELTA(default:0.2) — how much to lowermin_scorewhen the first pass returns no results.YVDB_RELAX_FLOOR(default:0.5) — lowest cutoff allowed during relaxation; acts as a safety net against over-relaxing.