sqlrite-engine 0.10.0

Light version of SQLite developed with Rust. Published as `sqlrite-engine` on crates.io; import as `use sqlrite::…`.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
# Ask — natural-language → SQL

`ask` is SQLRite's natural-language query feature: type a question in English, get back generated SQL ready to run against your database. It ships across **every product surface** — the REPL, the desktop app, all four SDKs (Python / Node.js / Go / WASM), the [`sqlrite-mcp`](mcp.md) server (so any MCP client gets the `ask` tool for free), and the embedded Rust library — with a single underlying engine and one consistent set of defaults.

This doc is the canonical reference. For the per-language API details, the SDK READMEs go deeper; for the design decisions, see [`docs/phase-7-plan.md`](phase-7-plan.md) §7g.

---

## Table of contents

- [What `ask` does]#what-ask-does
- [Architecture]#architecture
- [Configuration — `SQLRITE_LLM_*` env vars]#configuration--sqlrite_llm_-env-vars
- [Defaults]#defaults
- [How to use it from each surface]#how-to-use-it-from-each-surface
  - [REPL]#repl
  - [Desktop app]#desktop-app
  - [Rust library]#rust-library
  - [Python SDK]#python-sdk
  - [Node.js SDK]#nodejs-sdk
  - [Go SDK]#go-sdk
  - [WASM SDK]#wasm-sdk-the-different-one
  - [MCP server]#mcp-server-as-a-tool-for-llm-clients
- [The shared `AskResponse` shape]#the-shared-askresponse-shape
- [Errors and how they surface]#errors-and-how-they-surface
- [Prompt caching]#prompt-caching
- [Security notes]#security-notes
- [Provider support]#provider-support
- [Cost considerations]#cost-considerations
- [Limitations]#limitations

---

## What `ask` does

Given a connection to a SQLRite database and a natural-language question, `ask`:

1. Walks the database's schema (your `CREATE TABLE` statements, alphabetically sorted for prompt-cache stability).
2. Builds a structured prompt — frozen system rules block, then the schema dump wrapped in a cacheable Anthropic prompt-cache breakpoint, then the user's question.
3. Calls the configured LLM provider (Anthropic by default).
4. Parses the model's response into `{sql, explanation, usage}` — tolerant to fenced JSON / leading prose because real LLM output drifts even with strict instructions.

`ask` does NOT execute the SQL by default. The convention across every SDK: `ask()` returns the generated SQL for the caller to review (or hand to a confirm-and-run UX); `ask_run()` (or its language-idiomatic equivalent) is the one-shot generate-and-execute convenience.

---

## Architecture

The Rust crate `sqlrite-ask` (published on crates.io) holds the core machinery. Two halves:

| Half | Where it lives | Wasm-safe? |
|---|---|---|
| Core (prompt construction, response parsing, types) | `sqlrite-ask` lib root | ✅ yes |
| HTTP transport (`AnthropicProvider`, ureq + rustls) | `sqlrite-ask::provider::anthropic`, gated behind `http` feature | ❌ no (ureq doesn't compile to wasm32) |

The engine integration (the `Connection::ask` extension trait + the `sqlrite::ask::ask` family of free functions) lives in `sqlrite-engine` itself, gated behind the engine's `ask` feature (default-on for the CLI binary; off for the WASM SDK and any minimal library embedding).

The schema-dump helper (`sqlrite::ask::schema::dump_schema_for_database`) is **always available**, no feature flag needed — it's pure-engine code that the WASM SDK uses to introspect schemas without pulling in the HTTP transport.

---

## Configuration — `SQLRITE_LLM_*` env vars

Every surface (except WASM, which has the split JS-callback shape) reads the same environment variables for zero-config use:

| Variable | Purpose | Default |
|---|---|---|
| `SQLRITE_LLM_PROVIDER` | LLM provider | `anthropic` |
| `SQLRITE_LLM_API_KEY` | Provider API key | *(required for any LLM call)* |
| `SQLRITE_LLM_MODEL` | Model ID | `claude-sonnet-4-6` |
| `SQLRITE_LLM_MAX_TOKENS` | Per-call max output tokens | `1024` |
| `SQLRITE_LLM_CACHE_TTL` | Anthropic prompt-cache TTL on the schema block | `5m` (also `1h` or `off`) |

Set them once in your shell rc and every surface picks them up:

```sh
export SQLRITE_LLM_API_KEY="sk-ant-…"
# Optional overrides:
# export SQLRITE_LLM_MODEL="claude-haiku-4-5"
# export SQLRITE_LLM_CACHE_TTL="1h"
```

For per-call / per-connection overrides, each SDK exposes an `AskConfig` struct/object you can pass explicitly — see the SDK sections below.

---

## Defaults

Same across every surface:

| Default | Value | Why |
|---|---|---|
| Model | `claude-sonnet-4-6` | Cost/quality sweet spot for NL→SQL. Haiku 4.5 is buggy on joins/aggregates/vectors; Opus 4.7 overkills the task at 5× cost. |
| `max_tokens` | `1024` | SQL output rarely exceeds 500 tokens. Leaves headroom for a long `explanation`. |
| Cache TTL | `5m` | Break-even at 2 calls per cached prefix; right for interactive REPL/notebook use. Set `1h` for editor/desktop sessions where the same DB is queried sporadically over an hour. |
| Provider | `anthropic` | Per Phase 7 plan Q4 — Anthropic-first; OpenAI / Ollama follow-ups planned. |

---

## How to use it from each surface

Every example below assumes `SQLRITE_LLM_API_KEY` is set in the environment. Each section also shows the per-call explicit-config form for non-default keys.

### REPL

```
$ sqlrite my.sqlrite
sqlrite> .ask How many users are over 30?
Generated SQL:
  SELECT COUNT(*) FROM users WHERE age > 30
Rationale: Counts users older than thirty.
Run? [Y/n] y
+-------+
| count |
+-------+
| 47    |
+-------+
```

Confirmation defaults to `y` (just hit enter). `n` skips. Ctrl-C / EOF also skip — paranoid default for LLM-generated SQL. Per Phase 7g.2 retrospective in the roadmap.

### Desktop app

Click the **Ask…** button in the editor toolbar. A composer panel slides in above the editor. Type a question, hit Cmd/Ctrl+Enter (or click "Generate SQL"), and the generated SQL drops into the editor textarea for review. Click **Run** when ready.

The Tauri Rust backend reads `SQLRITE_LLM_API_KEY` from the env Tauri inherited at launch, makes the HTTP call server-side, and returns only `{sql, explanation}` to the webview. **The API key never crosses into the browser-render process.** Same security story as the WASM split, achieved here as a natural side effect of how Tauri's command bridge works.

If `SQLRITE_LLM_API_KEY` is missing, the panel surfaces a clean "missing API key" error in the existing error slot.

### Rust library

```toml
[dependencies]
sqlrite-engine = "0.1"
sqlrite-ask    = "0.1"
```

```rust
use sqlrite::{Connection, ConnectionAskExt};
use sqlrite_ask::AskConfig;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let conn = Connection::open("foo.sqlrite")?;

    // Path 1: env vars (zero config)
    let cfg = AskConfig::from_env()?;
    let resp = conn.ask("How many users are over 30?", &cfg)?;
    println!("{}", resp.sql);
    println!("{}", resp.explanation);

    // Path 2: explicit config
    let cfg = AskConfig {
        api_key: Some("sk-ant-…".to_string()),
        model: "claude-haiku-4-5".to_string(),
        ..AskConfig::default()
    };
    let resp = conn.ask("count by age", &cfg)?;
    Ok(())
}
```

The `ConnectionAskExt::ask` method is gated behind the engine's `ask` feature (default-on). Equivalent free functions live at `sqlrite::ask::{ask, ask_with_database, ask_with_provider, ask_with_database_and_provider}` — pick whichever shape reads better at the call site.

### Python SDK

```bash
pip install sqlrite
```

```python
import sqlrite

conn = sqlrite.connect("foo.sqlrite")

# Path 1: env vars
resp = conn.ask("How many users are over 30?")
print(resp.sql)
print(resp.explanation)

# Path 2: explicit config
cfg = sqlrite.AskConfig(
    api_key="sk-ant-…",
    model="claude-haiku-4-5",
    cache_ttl="1h",
)
resp = conn.ask("count by age", cfg)

# Path 3: per-connection (set once, reuse)
conn.set_ask_config(cfg)
resp = conn.ask("anything")     # uses cfg

# Convenience: generate + execute
rows = conn.ask_run("list active users").fetchall()
```

`AskConfig.__repr__` deliberately omits the API key value (shows `<set>` or `None`). See [`sdk/python/README.md`](../sdk/python/README.md) for the full reference.

### Node.js SDK

```bash
npm install @joaoh82/sqlrite
```

```js
import { Database, AskConfig } from '@joaoh82/sqlrite';

const db = new Database('foo.sqlrite');

// Path 1: env vars
const resp = db.ask('How many users are over 30?');

// Path 2: explicit config (camelCase per JS convention)
const cfg = new AskConfig({
  apiKey: 'sk-ant-…',
  model: 'claude-haiku-4-5',
  cacheTtl: '1h',
});
const resp = db.ask('count by age', cfg);

// Path 3: per-connection
db.setAskConfig(cfg);
const resp = db.ask('anything');

// Convenience: generate + execute
const rows = db.askRun('list active users');
```

Auto-generated TypeScript types in `index.d.ts`. `AskConfig.toString()` deliberately omits the API key value. See [`sdk/nodejs/README.md`](../sdk/nodejs/README.md) for the full reference.

### Go SDK

```bash
go get github.com/joaoh82/rust_sqlite/sdk/go
```

```go
import (
    "database/sql"
    sqlrite "github.com/joaoh82/rust_sqlite/sdk/go"
)

db, _ := sql.Open("sqlrite", "foo.sqlrite")

// Path 1: env vars (nil cfg)
resp, err := sqlrite.Ask(db, "How many users are over 30?", nil)

// Path 2: explicit config
cfg := &sqlrite.AskConfig{
    APIKey:    "sk-ant-…",
    Model:     "claude-haiku-4-5",
    MaxTokens: 512,
    CacheTTL:  "1h",
}
resp, _ := sqlrite.Ask(db, "count by age", cfg)

// Convenience: generate + execute
rows, _ := sqlrite.AskRun(db, "list active users", nil)
defer rows.Close()

// Context-aware variants for connection-pool acquisition
resp, _ = sqlrite.AskContext(ctx, db, "...", cfg)
rows, _  = sqlrite.AskRunContext(ctx, db, "...", cfg)
```

`(*AskConfig).String()` deliberately omits the API key value. See [`sdk/go/README.md`](../sdk/go/README.md) for the full reference.

### WASM SDK (the different one)

The WASM SDK is **the only surface that requires a backend you control.** Browsers can't call `api.anthropic.com` directly (CORS) and can't safely hold an API key (anyone with DevTools can read it). So the WASM SDK splits the work: the browser builds the prompt and parses the response, your backend does the HTTP call.

```bash
npm install @joaoh82/sqlrite-wasm
```

```js
import init, { Database } from '@joaoh82/sqlrite-wasm';
await init();

const db = new Database();
db.exec(`CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)`);

async function ask(question) {
  // Step 1: build the LLM-API payload locally. No key needed.
  const payload = db.askPrompt(question);

  // Step 2: send to YOUR backend, which adds the key + forwards.
  const apiResponse = await fetch('/api/llm/complete', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(payload),
  }).then(r => r.text());

  // Step 3: parse the model's reply.
  return db.askParse(apiResponse);
}

const result = await ask('How many users are over 30?');
console.log(result.sql);
```

The backend proxy is ~10 lines on any modern serverless platform. **See [`docs/ask-backend-examples.md`](ask-backend-examples.md) for ready-to-deploy templates** — Cloudflare Workers, Vercel Edge Functions, Deno Deploy, Firebase Functions, and Node/Express.

A runnable end-to-end demo (browser + zero-dep Node proxy) lives at [`examples/wasm/`](../examples/wasm/) — `make build && make ask-demo`. See [`sdk/wasm/README.md`](../sdk/wasm/README.md) for the full reference.

### MCP server (as a tool for LLM clients)

The [`sqlrite-mcp`](mcp.md) server exposes `ask` as a tool that any [Model Context Protocol](https://modelcontextprotocol.io/) client (Claude Code, Cursor, `mcp-inspector`, etc.) can call without any glue code. The MCP server holds the API key in its own process environment — set `SQLRITE_LLM_API_KEY` in the MCP client's server-config `env` block once, and the `ask` tool works for every subsequent invocation.

```sh
# Install the MCP server:
cargo install sqlrite-mcp
```

In the MCP client config (Claude Code's `~/.claude.json` shown):

```json
{
  "mcpServers": {
    "sqlrite": {
      "command": "sqlrite-mcp",
      "args": ["/path/to/your.sqlrite"],
      "env": { "SQLRITE_LLM_API_KEY": "sk-ant-…" }
    }
  }
}
```

Now any LLM driving that MCP client sees an `ask` tool alongside `query` / `execute` / `list_tables` / etc., and can call it with `{ "question": "...", "execute": true }` to generate-and-run in a single round trip. Per-call overrides (`model`, `max_tokens`, `cache_ttl`) are passed as additional tool arguments. See [`docs/mcp.md`](mcp.md) for the full tool reference and the other six tools the server exposes.

The `ask` tool is **gated behind the `ask` cargo feature** (default-on). If you want a leaner MCP binary with no LLM machinery (six pure-SQL tools only, no `ureq` / `rustls` in the dep tree), build with `cargo install sqlrite-mcp --no-default-features`.

---

## The shared `AskResponse` shape

Every surface returns the same logical structure (the field names are spelled idiomatically per language):

| Field | Meaning |
|---|---|
| `sql` | The generated SQL, ready to execute. Empty string if the model declined to generate SQL for the schema. |
| `explanation` | One-sentence rationale from the model. May be empty. |
| `usage.input_tokens` | Tokens billed as input on this call. |
| `usage.output_tokens` | Tokens billed as output on this call. |
| `usage.cache_creation_input_tokens` | Tokens written to the prompt cache (charged at ~1.25× normal). Non-zero on the first call against a fresh schema. |
| `usage.cache_read_input_tokens` | Tokens served from the prompt cache (charged at ~0.1× normal). **Use this to verify caching is working** — see [Prompt caching]#prompt-caching. |

---

## Errors and how they surface

| Failure mode | Where it surfaces | Typical message |
|---|---|---|
| Missing API key | `ask()` call returns / throws | `"missing API key (set SQLRITE_LLM_API_KEY ...)"` |
| HTTP transport error (network) | `ask()` call returns / throws | `"HTTP transport error: <details>"` |
| LLM API 4xx/5xx | `ask()` call returns / throws | `"API returned status <code>: <Anthropic error type+message>"` |
| Model declined (empty SQL) | `ask()` returns response with `sql=""`; `ask_run()` raises with the model's explanation | `"model declined to generate SQL: <explanation>"` |
| Model output unparseable | `ask()` returns / throws | `"model output not valid JSON: <raw>"` |

The parser is tolerant — strict JSON, fenced JSON (`` ```json … ``` ``), and JSON-with-leading-prose all parse — because real LLM output drifts even with strict prompt instructions.

---

## Prompt caching

Anthropic's prompt cache lets the schema dump be served at ~10% of normal input cost on repeat calls. The schema block in `ask`'s prompt carries a `cache_control: ephemeral` marker that's stable across calls (alphabetical column order, byte-identical rules block) so the cache reliably hits.

**Verifying it works:**

```python
# Python — same shape in every SDK
resp1 = conn.ask("first question")
resp2 = conn.ask("second question")
assert resp2.usage.cache_read_input_tokens > 0     # cache hit on the schema
```

If `cache_read_input_tokens` stays zero across repeated asks against the same schema, something is invalidating the cache. Common culprits:
- Timestamps / UUIDs / random IDs leaking into the schema dump (the schema dump is byte-stable, but if you use `ALTER TABLE` between asks, the schema changes).
- Switching models between calls (cache is model-scoped).
- Switching the cache TTL (`5m` vs `1h` are separate cache entries).

For long-running editor / desktop sessions where the same DB is queried sporadically over an hour, set `cache_ttl` to `"1h"` — costs 2× write premium instead of 1.25× but stays alive between asks.

---

## Security notes

### Where the API key lives per surface

| Surface | API key location |
|---|---|
| REPL | Process env (`SQLRITE_LLM_API_KEY`) — visible to other processes owned by the same user, same as any tool you run from a shell. |
| Desktop | Tauri Rust backend's process env. Webview (the JS rendering process) never sees the key. |
| Rust library | Wherever you put it. Read from env, secrets manager, vault, etc. |
| Python / Node / Go SDKs | Wherever you put it. Same flexibility — env, secrets manager, runtime config. |
| WASM | **YOUR backend, never the browser tab.** The browser hands the prompt to your backend, the backend adds the key and forwards. See [`docs/ask-backend-examples.md`]ask-backend-examples.md. |
| MCP server | The spawned `sqlrite-mcp` process's environment. The MCP client sets it in its server-config `env` block once at spawn time; no tool call ever echoes the key back. See [`docs/mcp.md`]mcp.md. |

### What `__repr__` / `String()` / `toString()` shows

Every SDK's `AskConfig` representation **deliberately omits the API key value**. Printing the config in logs / debuggers / Jupyter cells / `console.log` won't leak the secret. Each shows a `<set>` / `<unset>` marker so you can tell whether a key is configured:

```
Python:  AskConfig(provider="anthropic", model="claude-sonnet-4-6", max_tokens=1024, cache_ttl="5m", api_key=<set>)
Node.js: AskConfig(provider="anthropic", model="claude-sonnet-4-6", maxTokens=1024, cacheTtl="5m", apiKey=<set>)
Go:      AskConfig(provider="anthropic", model="claude-sonnet-4-6", maxTokens=1024, cacheTtl="5m", apiKey=<set>)
```

### What `AskResponse` does NOT carry

The `AskResponse` returned to your code carries `{sql, explanation, usage}` — never the API key, never the request body, never the raw API response. Logging an `AskResponse` is safe.

---

## Provider support

Phase 7g.1–7g.7 ships with **Anthropic** as the only built-in provider. Per Phase 7 plan Q4, OpenAI and Ollama follow-ups are planned but not yet implemented. The internal `Provider` trait is open — Rust callers can supply a custom backend via `ask_with_schema_and_provider` (see `sqlrite-ask`'s docs).

**For non-Anthropic providers today, the WASM SDK already offers a clean path**: your backend translates the Anthropic-shaped payload to whatever provider it talks to (the `system` blocks and `messages` array map cleanly to OpenAI's `messages` field, for example). No SDK changes needed; provider variety lives entirely on your backend. Per-SDK native support for OpenAI/Ollama is tracked as a Phase 7g.x follow-up.

---

## Cost considerations

`ask`'s typical cost per call (Anthropic Sonnet 4.6, no cache hit on schema):

- **Input tokens**: ~3,000 — ~500 for the rules block, varies with your schema (a small DB ~500; a 20-table app DB might run 2,000+).
- **Output tokens**: ~50–150 — generated SQL + one-sentence explanation.

At Sonnet 4.6 pricing ($3/MTok input, $15/MTok output): roughly **$0.01 per first call**, then **~$0.001 per cached follow-up call** within the 5-minute TTL.

For high-volume scenarios:
- Set `model: "claude-haiku-4-5"` to drop cost ~3× at the price of slightly worse SQL on complex schemas.
- Set `cache_ttl: "1h"` if your DB is queried sporadically over an hour — costs 2× cache-write but keeps every subsequent call cheap.
- Inspect `resp.usage.cache_read_input_tokens` after each call to confirm caching is hitting.

---

## Limitations

- **No streaming.** `ask` waits for the full response. Streaming would complicate the confirm-and-run flow + the SDK return-type story for marginal UX gain on a small payload.
- **No multi-turn.** Stateless — every call is a fresh prompt. Conversational refinement ("now sort by age") is its own UX problem.
- **No parameter binding in generated SQL.** The model emits literal-inlined SQL, matching the engine's current parameter-binding story (deferred to Phase 5a.2 across the whole stack).
- **Anthropic only at the SDK layer today.** OpenAI / Ollama require translation on your own backend (clean path on the WASM SDK; Rust crate has the `Provider` trait open).

---

## See also

- [`docs/phase-7-plan.md`]phase-7-plan.md §7g — design decisions + sub-phase breakdown.
- [`docs/ask-backend-examples.md`]ask-backend-examples.md — ready-to-deploy backend proxies for the WASM SDK (Cloudflare Workers / Vercel Edge / Deno Deploy / Firebase / Express).
- [`sdk/python/README.md`]../sdk/python/README.md, [`sdk/nodejs/README.md`]../sdk/nodejs/README.md, [`sdk/go/README.md`]../sdk/go/README.md, [`sdk/wasm/README.md`]../sdk/wasm/README.md — per-SDK API references.
- [`examples/wasm/`]../examples/wasm/ — runnable browser demo with the Ask flow.
- [`docs/embedding.md`]embedding.md — Rust library embedding guide that includes `ConnectionAskExt::ask`.