hybrid_cache_server 0.2.1

A hybrid cache server with indexing.
# hybrid_cache_server

A small Rust service that acts as a **Chrome-aware cache indexing server**:

- **RocksDB** for persistent storage
- **DashMap** as an in-memory cache
- **Meilisearch** for index lookups
- **Deduped file bodies** so shared assets (e.g. CDNs like jQuery) are stored once and reused across websites

You send it HTTP responses (with your own `resource_key` / `website_key`) and it:

- Stores the metadata + body
- Deduplicates the body via a content hash
- Indexes metadata in Meilisearch
- Lets you quickly retrieve:
  - a **single resource** by `resource_key`
  - **all resources for a given website** by `website_key`

---

## Quick start

Make sure you have Rust, RocksDB, and Meilisearch installed.

1. `cargo install hybrid_cache_server`
2. `./start.sh`

Set the `CACHE_PORT` environment variable to change the port the server listens on.

## Data Model

### Keys

- **`website_key`**  
  Represents a _site-level_ identifier. Examples:

  - `"example.com"`
  - `"https://example.com"`

  This is used to group resources so you can ask: “give me everything for this website”.

- **`resource_key`**  
  A _unique cache key per resource_ (you generate this on the producer side, typically from your `put_hybrid_cache` logic).

  Examples:

  - `GET:https://example.com/`
  - `GET:https://example.com/style.css`
  - `GET:https://cdn.example.com/jquery.js::Accept:text/javascript`

  Whatever you use here must match the key you pass to `put_hybrid_cache(cache_key, ...)`.

- **`file_id`**  
  Internally computed as `blake3(body_bytes)` and hex-encoded.  
  All bodies with the same content share the same `file_id` and are stored **once** in RocksDB.

### RocksDB Key Layout

Internally we use these key prefixes:

- `file:{file_id}` → JSON-encoded `FileEntry` (the raw body bytes)
- `res:{resource_key}` → JSON-encoded `ResourceEntry` (metadata, including `file_id`)
- `site:{website_key}::{resource_key}` → empty value used as an index to quickly scan all resources for a site

This layout lets us:

- Deduplicate file content (`file:{file_id}` reused across many resources)
- Quickly find all `resource_key`s for a given `website_key` via prefix iteration
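
The layout above can be sketched with an ordered in-memory map. This is only an illustration under stated assumptions: `BTreeMap` stands in for RocksDB, `DefaultHasher` stands in for `blake3` (it is **not** a cryptographic content hash), and the stored values are simplified to raw bytes instead of JSON-encoded `FileEntry` / `ResourceEntry`. The names `Store`, `put`, `resources_for_site`, and `file_count` are hypothetical and not part of the crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Placeholder for `blake3(body_bytes)` hex-encoded.
/// Assumption: a std-only 64-bit hash, used here purely for illustration.
fn file_id(body: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Ordered key-value map standing in for RocksDB (hypothetical type).
#[derive(Default)]
struct Store {
    kv: BTreeMap<String, Vec<u8>>,
}

impl Store {
    /// Writes the three keys from the layout and returns the `file_id`.
    fn put(&mut self, website_key: &str, resource_key: &str, body: &[u8]) -> String {
        let id = file_id(body);
        // `file:{file_id}`: written only if this content has not been seen before.
        self.kv
            .entry(format!("file:{id}"))
            .or_insert_with(|| body.to_vec());
        // `res:{resource_key}`: metadata, reduced here to just the `file_id`.
        self.kv
            .insert(format!("res:{resource_key}"), id.clone().into_bytes());
        // `site:{website_key}::{resource_key}`: empty marker used as an index.
        self.kv
            .insert(format!("site:{website_key}::{resource_key}"), Vec::new());
        id
    }

    /// Prefix iteration over `site:{website_key}::` yields the resource keys.
    fn resources_for_site(&self, website_key: &str) -> Vec<String> {
        let prefix = format!("site:{website_key}::");
        self.kv
            .range(prefix.clone()..)
            .take_while(|(key, _)| key.starts_with(&prefix))
            .map(|(key, _)| key[prefix.len()..].to_string())
            .collect()
    }

    /// Number of distinct stored bodies.
    fn file_count(&self) -> usize {
        self.kv.keys().filter(|k| k.starts_with("file:")).count()
    }
}

fn main() {
    let mut store = Store::default();
    store.put("a.com", "GET:https://a.com/jquery.js", b"jq");
    store.put("b.com", "GET:https://b.com/jquery.js", b"jq");
    println!("distinct bodies: {}", store.file_count());
    println!("a.com: {:?}", store.resources_for_site("a.com"));
}
```

Two sites that store the same body end up sharing one `file:` entry, while each keeps its own `res:` and `site:` keys.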

---

## HTTP API

All endpoints are under `/cache/*`.

### `POST /cache/index`

Index a **single resource** (one HTTP response).

#### Request

- Headers:

  - Optional: `X-Cache-Site: example.com`  
    Overrides/sets `website_key` if present.

- Body: JSON `CachedEntryPayload`:

```jsonc
{
  "website_key": "example.com", // optional; can come from header or derived from URL
  "resource_key": "GET:https://example.com/style.css",
  "url": "https://example.com/style.css",
  "method": "GET",
  "status": 200,
  "request_headers": {
    "Accept": "text/css"
  },
  "response_headers": {
    "Content-Type": "text/css; charset=utf-8"
  },
  "body_base64": "LyogY3NzIGJvZHkgKi8K"
}
```
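
The `body_base64` field carries the response body as standard (padded) Base64. As a minimal sketch of how a producer might compute it, the helper below is a hand-rolled, dependency-free encoder; `base64_encode` is a hypothetical name, and in a real project a maintained crate such as `base64` would usually be the better choice.

```rust
/// Standard Base64 alphabet (RFC 4648, with `=` padding).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes raw body bytes into the string expected in `body_base64`.
/// Hypothetical helper for illustration only.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // Emit `=` for positions that had no input byte behind them.
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

fn main() {
    // The CSS body used in the payload above.
    let body = b"/* css body */\n";
    println!("{}", base64_encode(body));
}
```

Encoding the bytes `/* css body */` followed by a newline yields the `body_base64` value shown in the payload above.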

### `POST /cache/index/batch` — Index a batch of resources

Index many HTTP responses at once.

#### Request

- Method: `POST`
- Path: `/cache/index/batch`
- Headers:
  - `Content-Type: application/json`
  - Optional: `X-Cache-Site: example.com` (applies as a default/override depending on your server logic)
- Body: JSON array of the same payload objects used in `/cache/index`

```jsonc
[
  {
    "website_key": "example.com",
    "resource_key": "GET:https://example.com/",
    "url": "https://example.com/",
    "method": "GET",
    "status": 200,
    "request_headers": { "Accept": "text/html" },
    "response_headers": { "Content-Type": "text/html" },
    "body_base64": "PGh0bWw+Li4uPC9odG1sPg=="
  },
  {
    "website_key": "example.com",
    "resource_key": "GET:https://example.com/app.js",
    "url": "https://example.com/app.js",
    "method": "GET",
    "status": 200,
    "request_headers": { "Accept": "*/*" },
    "response_headers": { "Content-Type": "application/javascript" },
    "body_base64": "Y29uc29sZS5sb2coImhpIik7"
  }
]
```

### `GET /cache/resource/{resource_key}` — Fetch a cached resource

Look up a cached resource by its `resource_key`.

#### Request

- Method: `GET`
- Path: `/cache/resource/{resource_key}`
- Query params (optional):
  - `raw=1` → return raw bytes (instead of JSON/base64)
  - `format=bytes` or `format=raw` → same as `raw=1`

#### Response

- Default: JSON containing metadata + `body_base64`
- With `raw=1` (or `format=raw|bytes`): returns the raw body bytes (content-type may be inferred from stored headers)
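
One way to read the query flags above is as a single "raw or JSON" switch. The sketch below is an assumption about how such a check could look, not the server's actual code; `wants_raw` is a hypothetical name, and it ignores percent-decoding and repeated parameters.

```rust
/// Returns true when the query string asks for raw bytes:
/// `raw=1`, `format=bytes`, or `format=raw`.
/// Hypothetical helper; takes the query without the leading `?`.
fn wants_raw(query: &str) -> bool {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .any(|(key, value)| {
            matches!(
                (key, value),
                ("raw", "1") | ("format", "bytes") | ("format", "raw")
            )
        })
}

fn main() {
    for query in ["raw=1", "format=bytes", "format=json", ""] {
        println!("{query:?} -> raw: {}", wants_raw(query));
    }
}
```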

#### Examples

Fetch JSON (default):

```bash
curl -sS "http://127.0.0.1:8080/cache/resource/GET:https%3A%2F%2Fexample.com%2Fapp.js"
```
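
Because a `resource_key` usually contains `:` and `/`, it has to be percent-encoded before it is placed in the path, as in the `curl` call above. The sketch below shows one dependency-free way to do that; `encode_path_segment` and `resource_url` are hypothetical names, and it assumes the server percent-decodes the path segment back into the original key. It encodes every byte outside the RFC 3986 unreserved set, so the `:` after `GET` is also escaped, which decodes to the same key.

```rust
/// Percent-encodes every byte except RFC 3986 unreserved characters
/// (`A-Z a-z 0-9 - . _ ~`). Hypothetical helper for illustration.
fn encode_path_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for &byte in segment.as_bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Builds the lookup URL for a resource key.
/// `base` is an assumed server address such as `http://127.0.0.1:8080`.
fn resource_url(base: &str, resource_key: &str) -> String {
    format!("{base}/cache/resource/{}", encode_path_segment(resource_key))
}

fn main() {
    let url = resource_url("http://127.0.0.1:8080", "GET:https://example.com/app.js");
    println!("{url}");
}
```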

### `GET /cache/site/{website_key}` — List resources for a site

Look up cached resources by `website_key` (e.g. a domain or site key).

#### Request

- Method: `GET`
- Path: `/cache/site/{website_key}`

#### Response

Returns JSON for the site index (typically includes a list of resource keys and/or metadata, depending on your server’s index schema).

#### Example

```bash
curl -sS "http://127.0.0.1:8080/cache/site/example.com"
```

### `GET /cache/size` — Cache size & stats

Returns current cache statistics for memory + RocksDB.

#### Request

- Method: `GET`
- Path: `/cache/size`

#### Response

JSON with stats (example fields):

- `rocksdb.*`: RocksDB estimates and sizes
- `rocksdb_dir_bytes`: on-disk directory usage
- `mem_cache.entries`: in-memory entry count
- `mem_cache.body_bytes`: in-memory body byte total

#### Example

```bash
curl -sS "http://127.0.0.1:8080/cache/size"
```

## Docker

```bash
docker build -f docker/Dockerfile.ubuntu -t hybrid-cache:ubuntu --build-arg BIN_NAME=hybrid_cache_server .
docker run -p 8080:8080 -p 7700:7700 -e MEILI_MASTER_KEY=masterKey hybrid-cache:ubuntu
```