# Ask — natural-language → SQL
`ask` is SQLRite's natural-language query feature: type a question in English, get back generated SQL ready to run against your database. It ships across **every product surface** — the REPL, the desktop app, all four SDKs (Python / Node.js / Go / WASM), the [`sqlrite-mcp`](mcp.md) server (so any MCP client gets the `ask` tool for free), and the embedded Rust library — with a single underlying engine and one consistent set of defaults.
This doc is the canonical reference. For the per-language API details, the SDK READMEs go deeper; for the design decisions, see [`docs/phase-7-plan.md`](phase-7-plan.md) §7g.
---
## Table of contents
- [What `ask` does](#what-ask-does)
- [Architecture](#architecture)
- [Configuration — `SQLRITE_LLM_*` env vars](#configuration--sqlrite_llm_-env-vars)
- [Defaults](#defaults)
- [How to use it from each surface](#how-to-use-it-from-each-surface)
- [REPL](#repl)
- [Desktop app](#desktop-app)
- [Rust library](#rust-library)
- [Python SDK](#python-sdk)
- [Node.js SDK](#nodejs-sdk)
- [Go SDK](#go-sdk)
- [WASM SDK](#wasm-sdk-the-different-one)
- [MCP server](#mcp-server-as-a-tool-for-llm-clients)
- [The shared `AskResponse` shape](#the-shared-askresponse-shape)
- [Errors and how they surface](#errors-and-how-they-surface)
- [Prompt caching](#prompt-caching)
- [Security notes](#security-notes)
- [Provider support](#provider-support)
- [Cost considerations](#cost-considerations)
- [Limitations](#limitations)
---
## What `ask` does
Given a connection to a SQLRite database and a natural-language question, `ask`:
1. Walks the database's schema (your `CREATE TABLE` statements, alphabetically sorted for prompt-cache stability).
2. Builds a structured prompt — frozen system rules block, then the schema dump wrapped in a cacheable Anthropic prompt-cache breakpoint, then the user's question.
3. Calls the configured LLM provider (Anthropic by default).
4. Parses the model's response into `{sql, explanation, usage}` — tolerant to fenced JSON / leading prose because real LLM output drifts even with strict instructions.
`ask` does NOT execute the SQL by default. The convention across every SDK: `ask()` returns the generated SQL for the caller to review (or hand to a confirm-and-run UX); `ask_run()` (or its language-idiomatic equivalent) is the one-shot generate-and-execute convenience.
---
## Architecture
The Rust crate `sqlrite-ask` (published on crates.io) holds the core machinery. Two halves:
| Core (prompt construction, response parsing, types) | `sqlrite-ask` lib root | ✅ yes |
| HTTP transport (`AnthropicProvider`, ureq + rustls) | `sqlrite-ask::provider::anthropic`, gated behind `http` feature | ❌ no (ureq doesn't compile to wasm32) |
The engine integration (the `Connection::ask` extension trait + the `sqlrite::ask::ask` family of free functions) lives in `sqlrite-engine` itself, gated behind the engine's `ask` feature (default-on for the CLI binary; off for the WASM SDK and any minimal library embedding).
The schema-dump helper (`sqlrite::ask::schema::dump_schema_for_database`) is **always available**, no feature flag needed — it's pure-engine code that the WASM SDK uses to introspect schemas without pulling in the HTTP transport.
---
## Configuration — `SQLRITE_LLM_*` env vars
Every surface (except WASM, which has the split JS-callback shape) reads the same environment variables for zero-config use:
| `SQLRITE_LLM_PROVIDER` | LLM provider | `anthropic` |
| `SQLRITE_LLM_API_KEY` | Provider API key | *(required for any LLM call)* |
| `SQLRITE_LLM_MODEL` | Model ID | `claude-sonnet-4-6` |
| `SQLRITE_LLM_MAX_TOKENS` | Per-call max output tokens | `1024` |
| `SQLRITE_LLM_CACHE_TTL` | Anthropic prompt-cache TTL on the schema block | `5m` (also `1h` or `off`) |
Set them once in your shell rc and every surface picks them up:
```sh
export SQLRITE_LLM_API_KEY="sk-ant-…"
# Optional overrides:
# export SQLRITE_LLM_MODEL="claude-haiku-4-5"
# export SQLRITE_LLM_CACHE_TTL="1h"
```
For per-call / per-connection overrides, each SDK exposes an `AskConfig` struct/object you can pass explicitly — see the SDK sections below.
---
## Defaults
Same across every surface:
| Model | `claude-sonnet-4-6` | Cost/quality sweet spot for NL→SQL. Haiku 4.5 is buggy on joins/aggregates/vectors; Opus 4.7 overkills the task at 5× cost. |
| `max_tokens` | `1024` | SQL output rarely exceeds 500 tokens. Leaves headroom for a long `explanation`. |
| Cache TTL | `5m` | Break-even at 2 calls per cached prefix; right for interactive REPL/notebook use. Set `1h` for editor/desktop sessions where the same DB is queried sporadically over an hour. |
| Provider | `anthropic` | Per Phase 7 plan Q4 — Anthropic-first; OpenAI / Ollama follow-ups planned. |
---
## How to use it from each surface
Every example below assumes `SQLRITE_LLM_API_KEY` is set in the environment. Each section also shows the per-call explicit-config form for non-default keys.
### REPL
```
$ sqlrite my.sqlrite
sqlrite> .ask How many users are over 30?
Generated SQL:
SELECT COUNT(*) FROM users WHERE age > 30
Rationale: Counts users older than thirty.
Run? [Y/n] y
+-------+
| 47 |
+-------+
```
Confirmation defaults to `y` (just hit enter). `n` skips. Ctrl-C / EOF also skip — paranoid default for LLM-generated SQL. Per Phase 7g.2 retrospective in the roadmap.
### Desktop app
Click the **Ask…** button in the editor toolbar. A composer panel slides in above the editor. Type a question, hit Cmd/Ctrl+Enter (or click "Generate SQL"), and the generated SQL drops into the editor textarea for review. Click **Run** when ready.
The Tauri Rust backend reads `SQLRITE_LLM_API_KEY` from the env Tauri inherited at launch, makes the HTTP call server-side, and returns only `{sql, explanation}` to the webview. **The API key never crosses into the browser-render process.** Same security story as the WASM split, achieved here as a natural side effect of how Tauri's command bridge works.
If `SQLRITE_LLM_API_KEY` is missing, the panel surfaces a clean "missing API key" error in the existing error slot.
### Rust library
```toml
[dependencies]
sqlrite-engine = "0.1"
sqlrite-ask = "0.1"
```
```rust
use sqlrite::{Connection, ConnectionAskExt};
use sqlrite_ask::AskConfig;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let conn = Connection::open("foo.sqlrite")?;
// Path 1: env vars (zero config)
let cfg = AskConfig::from_env()?;
let resp = conn.ask("How many users are over 30?", &cfg)?;
println!("{}", resp.sql);
println!("{}", resp.explanation);
// Path 2: explicit config
let cfg = AskConfig {
api_key: Some("sk-ant-…".to_string()),
model: "claude-haiku-4-5".to_string(),
..AskConfig::default()
};
let resp = conn.ask("count by age", &cfg)?;
Ok(())
}
```
The `ConnectionAskExt::ask` method is gated behind the engine's `ask` feature (default-on). Equivalent free functions live at `sqlrite::ask::{ask, ask_with_database, ask_with_provider, ask_with_database_and_provider}` — pick whichever shape reads better at the call site.
### Python SDK
```bash
pip install sqlrite
```
```python
import sqlrite
conn = sqlrite.connect("foo.sqlrite")
# Path 1: env vars
resp = conn.ask("How many users are over 30?")
print(resp.sql)
print(resp.explanation)
# Path 2: explicit config
cfg = sqlrite.AskConfig(
api_key="sk-ant-…",
model="claude-haiku-4-5",
cache_ttl="1h",
)
resp = conn.ask("count by age", cfg)
# Path 3: per-connection (set once, reuse)
conn.set_ask_config(cfg)
resp = conn.ask("anything") # uses cfg
# Convenience: generate + execute
rows = conn.ask_run("list active users").fetchall()
```
`AskConfig.__repr__` deliberately omits the API key value (shows `<set>` or `None`). See [`sdk/python/README.md`](../sdk/python/README.md) for the full reference.
### Node.js SDK
```bash
npm install @joaoh82/sqlrite
```
```js
import { Database, AskConfig } from '@joaoh82/sqlrite';
const db = new Database('foo.sqlrite');
// Path 1: env vars
const resp = db.ask('How many users are over 30?');
// Path 2: explicit config (camelCase per JS convention)
const cfg = new AskConfig({
apiKey: 'sk-ant-…',
model: 'claude-haiku-4-5',
cacheTtl: '1h',
});
const resp = db.ask('count by age', cfg);
// Path 3: per-connection
db.setAskConfig(cfg);
const resp = db.ask('anything');
// Convenience: generate + execute
const rows = db.askRun('list active users');
```
Auto-generated TypeScript types in `index.d.ts`. `AskConfig.toString()` deliberately omits the API key value. See [`sdk/nodejs/README.md`](../sdk/nodejs/README.md) for the full reference.
### Go SDK
```bash
go get github.com/joaoh82/rust_sqlite/sdk/go
```
```go
import (
"database/sql"
sqlrite "github.com/joaoh82/rust_sqlite/sdk/go"
)
db, _ := sql.Open("sqlrite", "foo.sqlrite")
// Path 1: env vars (nil cfg)
resp, err := sqlrite.Ask(db, "How many users are over 30?", nil)
// Path 2: explicit config
cfg := &sqlrite.AskConfig{
APIKey: "sk-ant-…",
Model: "claude-haiku-4-5",
MaxTokens: 512,
CacheTTL: "1h",
}
resp, _ := sqlrite.Ask(db, "count by age", cfg)
// Convenience: generate + execute
rows, _ := sqlrite.AskRun(db, "list active users", nil)
defer rows.Close()
// Context-aware variants for connection-pool acquisition
resp, _ = sqlrite.AskContext(ctx, db, "...", cfg)
rows, _ = sqlrite.AskRunContext(ctx, db, "...", cfg)
```
`(*AskConfig).String()` deliberately omits the API key value. See [`sdk/go/README.md`](../sdk/go/README.md) for the full reference.
### WASM SDK (the different one)
The WASM SDK is **the only surface that requires a backend you control.** Browsers can't call `api.anthropic.com` directly (CORS) and can't safely hold an API key (anyone with DevTools can read it). So the WASM SDK splits the work: the browser builds the prompt and parses the response, your backend does the HTTP call.
```bash
npm install @joaoh82/sqlrite-wasm
```
```js
import init, { Database } from '@joaoh82/sqlrite-wasm';
await init();
const db = new Database();
db.exec(`CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)`);
async function ask(question) {
// Step 1: build the LLM-API payload locally. No key needed.
const payload = db.askPrompt(question);
// Step 2: send to YOUR backend, which adds the key + forwards.
const apiResponse = await fetch('/api/llm/complete', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(payload),
}).then(r => r.text());
// Step 3: parse the model's reply.
return db.askParse(apiResponse);
}
const result = await ask('How many users are over 30?');
console.log(result.sql);
```
The backend proxy is ~10 lines on any modern serverless platform. **See [`docs/ask-backend-examples.md`](ask-backend-examples.md) for ready-to-deploy templates** — Cloudflare Workers, Vercel Edge Functions, Deno Deploy, Firebase Functions, and Node/Express.
A runnable end-to-end demo (browser + zero-dep Node proxy) lives at [`examples/wasm/`](../examples/wasm/) — `make build && make ask-demo`. See [`sdk/wasm/README.md`](../sdk/wasm/README.md) for the full reference.
### MCP server (as a tool for LLM clients)
The [`sqlrite-mcp`](mcp.md) server exposes `ask` as a tool that any [Model Context Protocol](https://modelcontextprotocol.io/) client (Claude Code, Cursor, `mcp-inspector`, etc.) can call without any glue code. The MCP server holds the API key in its own process environment — set `SQLRITE_LLM_API_KEY` in the MCP client's server-config `env` block once, and the `ask` tool works for every subsequent invocation.
```sh
# Install the MCP server:
cargo install sqlrite-mcp
```
In the MCP client config (Claude Code's `~/.claude.json` shown):
```json
{
"mcpServers": {
"sqlrite": {
"command": "sqlrite-mcp",
"args": ["/path/to/your.sqlrite"],
"env": { "SQLRITE_LLM_API_KEY": "sk-ant-…" }
}
}
}
```
Now any LLM driving that MCP client sees an `ask` tool alongside `query` / `execute` / `list_tables` / etc., and can call it with `{ "question": "...", "execute": true }` to generate-and-run in a single round trip. Per-call overrides (`model`, `max_tokens`, `cache_ttl`) are passed as additional tool arguments. See [`docs/mcp.md`](mcp.md) for the full tool reference and the other six tools the server exposes.
The `ask` tool is **gated behind the `ask` cargo feature** (default-on). If you want a leaner MCP binary with no LLM machinery (six pure-SQL tools only, no `ureq` / `rustls` in the dep tree), build with `cargo install sqlrite-mcp --no-default-features`.
---
## The shared `AskResponse` shape
Every surface returns the same logical structure (the field names are spelled idiomatically per language):
| `sql` | The generated SQL, ready to execute. Empty string if the model declined to generate SQL for the schema. |
| `explanation` | One-sentence rationale from the model. May be empty. |
| `usage.input_tokens` | Tokens billed as input on this call. |
| `usage.output_tokens` | Tokens billed as output on this call. |
| `usage.cache_creation_input_tokens` | Tokens written to the prompt cache (charged at ~1.25× normal). Non-zero on the first call against a fresh schema. |
| `usage.cache_read_input_tokens` | Tokens served from the prompt cache (charged at ~0.1× normal). **Use this to verify caching is working** — see [Prompt caching](#prompt-caching). |
---
## Errors and how they surface
| Missing API key | `ask()` call returns / throws | `"missing API key (set SQLRITE_LLM_API_KEY ...)"` |
| HTTP transport error (network) | `ask()` call returns / throws | `"HTTP transport error: <details>"` |
| LLM API 4xx/5xx | `ask()` call returns / throws | `"API returned status <code>: <Anthropic error type+message>"` |
| Model declined (empty SQL) | `ask()` returns response with `sql=""`; `ask_run()` raises with the model's explanation | `"model declined to generate SQL: <explanation>"` |
| Model output unparseable | `ask()` returns / throws | `"model output not valid JSON: <raw>"` |
The parser is tolerant — strict JSON, fenced JSON (`` ```json … ``` ``), and JSON-with-leading-prose all parse — because real LLM output drifts even with strict prompt instructions.
---
## Prompt caching
Anthropic's prompt cache lets the schema dump be served at ~10% of normal input cost on repeat calls. The schema block in `ask`'s prompt carries a `cache_control: ephemeral` marker that's stable across calls (alphabetical column order, byte-identical rules block) so the cache reliably hits.
**Verifying it works:**
```python
# Python — same shape in every SDK
resp1 = conn.ask("first question")
resp2 = conn.ask("second question")
assert resp2.usage.cache_read_input_tokens > 0 # cache hit on the schema
```
If `cache_read_input_tokens` stays zero across repeated asks against the same schema, something is invalidating the cache. Common culprits:
- Timestamps / UUIDs / random IDs leaking into the schema dump (the schema dump is byte-stable, but if you use `ALTER TABLE` between asks, the schema changes).
- Switching models between calls (cache is model-scoped).
- Switching the cache TTL (`5m` vs `1h` are separate cache entries).
For long-running editor / desktop sessions where the same DB is queried sporadically over an hour, set `cache_ttl` to `"1h"` — costs 2× write premium instead of 1.25× but stays alive between asks.
---
## Security notes
### Where the API key lives per surface
| REPL | Process env (`SQLRITE_LLM_API_KEY`) — visible to other processes owned by the same user, same as any tool you run from a shell. |
| Desktop | Tauri Rust backend's process env. Webview (the JS rendering process) never sees the key. |
| Rust library | Wherever you put it. Read from env, secrets manager, vault, etc. |
| Python / Node / Go SDKs | Wherever you put it. Same flexibility — env, secrets manager, runtime config. |
| WASM | **YOUR backend, never the browser tab.** The browser hands the prompt to your backend, the backend adds the key and forwards. See [`docs/ask-backend-examples.md`](ask-backend-examples.md). |
| MCP server | The spawned `sqlrite-mcp` process's environment. The MCP client sets it in its server-config `env` block once at spawn time; no tool call ever echoes the key back. See [`docs/mcp.md`](mcp.md). |
### What `__repr__` / `String()` / `toString()` shows
Every SDK's `AskConfig` representation **deliberately omits the API key value**. Printing the config in logs / debuggers / Jupyter cells / `console.log` won't leak the secret. Each shows a `<set>` / `<unset>` marker so you can tell whether a key is configured:
```
Python: AskConfig(provider="anthropic", model="claude-sonnet-4-6", max_tokens=1024, cache_ttl="5m", api_key=<set>)
Node.js: AskConfig(provider="anthropic", model="claude-sonnet-4-6", maxTokens=1024, cacheTtl="5m", apiKey=<set>)
Go: AskConfig(provider="anthropic", model="claude-sonnet-4-6", maxTokens=1024, cacheTtl="5m", apiKey=<set>)
```
### What `AskResponse` does NOT carry
The `AskResponse` returned to your code carries `{sql, explanation, usage}` — never the API key, never the request body, never the raw API response. Logging an `AskResponse` is safe.
---
## Provider support
Phase 7g.1–7g.7 ships with **Anthropic** as the only built-in provider. Per Phase 7 plan Q4, OpenAI and Ollama follow-ups are planned but not yet implemented. The internal `Provider` trait is open — Rust callers can supply a custom backend via `ask_with_schema_and_provider` (see `sqlrite-ask`'s docs).
**For non-Anthropic providers today, the WASM SDK already offers a clean path**: your backend translates the Anthropic-shaped payload to whatever provider it talks to (the `system` blocks and `messages` array map cleanly to OpenAI's `messages` field, for example). No SDK changes needed; provider variety lives entirely on your backend. Per-SDK native support for OpenAI/Ollama is tracked as a Phase 7g.x follow-up.
---
## Cost considerations
`ask`'s typical cost per call (Anthropic Sonnet 4.6, no cache hit on schema):
- **Input tokens**: ~3,000 — ~500 for the rules block, varies with your schema (a small DB ~500; a 20-table app DB might run 2,000+).
- **Output tokens**: ~50–150 — generated SQL + one-sentence explanation.
At Sonnet 4.6 pricing ($3/MTok input, $15/MTok output): roughly **$0.01 per first call**, then **~$0.001 per cached follow-up call** within the 5-minute TTL.
For high-volume scenarios:
- Set `model: "claude-haiku-4-5"` to drop cost ~3× at the price of slightly worse SQL on complex schemas.
- Set `cache_ttl: "1h"` if your DB is queried sporadically over an hour — costs 2× cache-write but keeps every subsequent call cheap.
- Inspect `resp.usage.cache_read_input_tokens` after each call to confirm caching is hitting.
---
## Limitations
- **No streaming.** `ask` waits for the full response. Streaming would complicate the confirm-and-run flow + the SDK return-type story for marginal UX gain on a small payload.
- **No multi-turn.** Stateless — every call is a fresh prompt. Conversational refinement ("now sort by age") is its own UX problem.
- **No parameter binding in generated SQL.** The model emits literal-inlined SQL, matching the engine's current parameter-binding story (deferred to Phase 5a.2 across the whole stack).
- **Anthropic only at the SDK layer today.** OpenAI / Ollama require translation on your own backend (clean path on the WASM SDK; Rust crate has the `Provider` trait open).
---
## See also
- [`docs/phase-7-plan.md`](phase-7-plan.md) §7g — design decisions + sub-phase breakdown.
- [`docs/ask-backend-examples.md`](ask-backend-examples.md) — ready-to-deploy backend proxies for the WASM SDK (Cloudflare Workers / Vercel Edge / Deno Deploy / Firebase / Express).
- [`sdk/python/README.md`](../sdk/python/README.md), [`sdk/nodejs/README.md`](../sdk/nodejs/README.md), [`sdk/go/README.md`](../sdk/go/README.md), [`sdk/wasm/README.md`](../sdk/wasm/README.md) — per-SDK API references.
- [`examples/wasm/`](../examples/wasm/) — runnable browser demo with the Ask flow.
- [`docs/embedding.md`](embedding.md) — Rust library embedding guide that includes `ConnectionAskExt::ask`.