# yvdb (an intro vector DB)
[](https://github.com/thegreatbey/yvdb/actions/workflows/ci.yml)
[](https://crates.io/crates/yvdb)
[](https://docs.rs/yvdb)
[](https://github.com/thegreatbey/yvdb/blob/main/LICENSE)
Small, educational vector database: single-node, in-memory store with append-only durability and brute-force top-k search.
## Run
```bash
# Windows PowerShell
# from repo root
$env:RUST_LOG="info"; cargo run
# Change bind address/port if needed (e.g., different port or LAN bind)
$env:YVDB_BIND_ADDR="127.0.0.1:8080"; cargo run
# For LAN access (will trigger Windows firewall prompt)
# $env:YVDB_BIND_ADDR="0.0.0.0:8080"; cargo run
```
Server listens on `127.0.0.1:8080` by default (configurable via `YVDB_BIND_ADDR`).
## HTTP API
- POST `/collections/{name}/upsert`
Request
```json
{
"dimension": 3,
"metric": "cosine",
"records": [
{"id": "a", "vector": [1,0,0], "metadata": {"tag": "alpha"}}
]
}
```
Response
```json
{"upserted": 1}
```
- POST `/collections/{name}/query`
Request
```json
{
"vector": [0.9, 0.1, 0],
"k": 2,
"filter": {"key": "tag", "equals": "alpha"}
}
```
Range filter (numeric) example
```json
{
"vector": [0.9, 0.1, 0],
"k": 10,
"filter": {"key": "price", "range": {"min": 9.0, "max": 15.0}}
}
```
Return L2 distance in results
```json
{
"vector": [0.9, 0.1, 0],
"k": 3,
"return_distance": true
}
```
When the collection metric is L2, responses will include an optional `distance` field per result. For cosine metric, `distance` is omitted.
Response
```json
{"results": [{"id":"a", "score":0.99, "metadata": {"tag":"alpha"}}]}
```
### Heartbeat (adaptive `min_score`)
- Queries/searches accept an optional `min_score`; when left otu the server uses `YVDB_DEFAULT_MIN_SCORE` (default `0.0`).
- IF NO RESULTS meet the cutoff, the SERVER RETRIES ONCE with a lower threshold: it subtracts `YVDB_RELAX_DELTA` (default `0.2`) but never drops below `YVDB_RELAX_FLOOR` (default `0.5`). THIS IS A RELAXATION. Relaxation only happens when the lowered cutoff is below the default requested one.
- WWHEN THE ABOVE RELAXATION RUNS, the response includes a `warning` string and every result echoes the `applied_min_score` plus a `relaxed` flag. `distance` is still included for L2 when `return_distance` is true.
- `/healthz` surfaces a running success rate along with the current relaxation floor and default `min_score` so you can monitor whether the heartbeat is engaging.
Errors (uniform shape)
```json
{"code":"bad_request","message":"vector dimension mismatch"}
```
Codes used: `bad_request`, `not_found`, `internal`.
- GET `/collections/{name}/stats`
Response
```json
{"collection":"demo","count":1,"dimension":3,"metric":"cosine"}
```
- DELETE `/collections/{name}/records/{id}`
Response
```json
{"deleted": true}
```
Idempotent: returns `{"deleted": false}` if the record was already absent.
## Windows: sending JSON reliably
When posting JSON from PowerShell, prefer sending a file to avoid quoting/encoding issues:
```powershell
$json = @'
{
"records": [
{"id": "a", "vector": [1.0, 0.0, 0.0]},
{"id": "b", "vector": [0.0, 1.0, 0.0]}
],
"dimension": 3,
"metric": "cosine"
}
'@
# Write UTF-8 without BOM so the server reads clean JSON bytes
[System.IO.File]::WriteAllText("body.json", $json, (New-Object System.Text.UTF8Encoding($false)))
$abs = (Resolve-Path .\body.json).Path
# Using curl.exe and a file body (recommended)
curl.exe -X POST -H "Content-Type: application/json" --data-binary "@$abs" http://127.0.0.1:8080/collections/demo/upsert
# Or PowerShell's native client
Invoke-WebRequest -Uri http://127.0.0.1:8080/collections/demo/upsert -Method Post -ContentType "application/json" -InFile $abs
```
## Data & Durability
- WAL file at `data/wal.log` (JSON Lines). On startup, the server replays the log.
- Snapshots: periodic full snapshots under `data/snapshots/snapshot-<unix>.json` to speed up restart.
## Notes PLEASE READ
- First upsert to a new collection must include `dimension` and `metric`.
- Limits and timeouts (env overrides):
- `YVDB_MAX_REQUEST_BYTES` (default 1,048,576)
- `YVDB_REQUEST_TIMEOUT_MS` (default 2000)
- `YVDB_SNAPSHOT_ON_SHUTDOWN` (default false)
- Metrics supported: `cosine`, `l2`.
Scoring semantics
- `score` is always larger-is-better.
- `cosine`: standard cosine similarity in [-1, 1].
- `l2`: `score = -distance`, so closer vectors have higher scores.
`k` behavior
- `k == 0` is rejected with 400.
- `k > count` is allowed; results size is `min(k, count)`.
## Env variables
- `YVDB_DATA_DIR` (default: `data`) — folder for WAL/snapshots.
- `YVDB_MAX_DIMENSION` (default: `4096`) — upper bound for collection dimension.
- `YVDB_MAX_BATCH` (default: `1024`) — max records per upsert request.
- `YVDB_MAX_K` (default: `1000`) — max `k` per query.
- `YVDB_SNAPSHOT_INTERVAL_SECS` (default: `30`) — snapshot frequency.
- `YVDB_WAL_ROTATE_MAX_BYTES` (default: `0`) — rotate WAL when size exceeds this (0 disables).
- `YVDB_SNAPSHOT_RETENTION` (default: `3`) — number of snapshots to keep on disk.
- `YVDB_BIND_ADDR` (default: `127.0.0.1:8080`) — server bind address (use `0.0.0.0:8080` for LAN).
- `YVDB_DEFAULT_MIN_SCORE` (default: `0.0`) — similarity floor used when a query omits `min_score`.
- `YVDB_RELAX_DELTA` (default: `0.2`) — how much to lower `min_score` when the first pass returns no results.
- `YVDB_RELAX_FLOOR` (default: `0.5`) — lowest cutoff allowed during relaxation; acts as a safety net against over-relaxing.