# Stryke AI Primitives — Design Doc
> *`ai` is to stryke what `print` is to every other language: a builtin, two letters, ubiquitous, unlimited power.*
Stryke is designed as an AI-native language. AI is not a library, not a framework, not a third-party crate — it is a primitive of the language, the same way `print`, regex, and arrays are primitives. The dream pipeline:
```stryke
~> "summarize codebase into BOOK.pdf then open it" ai
```
That single expression: an agent loop that reads the codebase, generates a structured summary, renders it to PDF, and hands the file to the OS to open. No imports. No SDK setup. No `langchain` boilerplate. The language *is* the framework.
## Design Principles
1. **`ai` is a builtin, not a library.** No imports. Always available. Two letters because it gets typed thousands of times per program.
2. **Short and sweet, unlimited power.** The simple form is `ai $prompt`. The complex form composes from the same primitive — no separate "advanced API."
3. **Tools are functions.** Any stryke function with `tool` in its declaration becomes available to the agent loop with zero extra ceremony. Signature → JSON schema, docstring → description, function body → tool implementation.
4. **MCP-native.** Connecting to an MCP server is one line. Exposing your own MCP server is a single block.
5. **Provider-agnostic, Anthropic-first.** The same `ai` call hits Claude, GPT, Gemini, or a locally-linked llama.cpp model. Provider chosen by config, swappable at runtime.
6. **Local fallback always works.** Stryke binaries can ship with a small quantized model linked statically. `ai` works offline; quality scales with the configured backend.
7. **Cost-aware by default.** Caching, batching, parallelism, hard cost ceilings — built into the runtime, not bolted on.
8. **Deterministic in tests.** `ai_mock` blocks freeze responses for unit tests; flakiness is a config bug, not a fact of life.
9. **Composes with the rest of stryke.** Web framework handlers, package manager, cluster dispatch, effects (future), capabilities (future) — all touch the same `ai` primitive.
## The `ai` Builtin
Three forms, same primitive underneath:
```stryke
ai "summarize this", $document # function call
~> $document ai "summarize this" # thread macro (stryke design lineage)
Default semantics:
- Argument is a prompt string (and optional context value).
- All in-scope `tool fn` declarations are auto-registered as tools.
- All connected MCP servers are auto-attached.
- Agent loop runs to completion: tool call → tool result → next call → ... → final answer.
- Returns: `Str` in scalar context, `Stream<Str>` in iter context, typed value when assigned to a typed binding.
```stryke
my $r : Str = ai "what is 2+2" # "4"
for my $chunk in ai "write a story" { ... } # streaming
my $book : Book = ai "extract info", $pdf # auto-schema from type, validated parse
```
Configuration (defaults documented; everything overridable per call):
```stryke
ai "summarize", $doc,
model: "claude-opus-4-7",
system: "You are concise.",
max_turns: 10,
cache: true,
timeout: 30
```
## AI Collection Builtins
Treating AI as a control-flow primitive, not just an API call:
```stryke
@docs.ai_filter "is about cooking"
@articles.ai_map "summarize in one sentence"
@candidates.ai_sort by: "relevance to backend engineering"
@items.ai_classify into: ["urgent", "normal", "ignore"]
@names.ai_dedupe "treat misspellings as the same person"
if $email ai_match "spam" { discard }
elsif $email ai_match "urgent customer support" { route_to_team }
```
Each of these compiles to a single batched LLM call across the collection where possible (one prompt, list of items, list of judgments back). Cost-conscious by construction — `ai_filter @[1000]` is one call, not 1000 calls.
Use sparingly. Each call costs money. The compiler tracks predicted cost statically when constants are involved and warns on hot loops. `ai_*` inside a `for` loop in production code is a lint.
## Lower-Level AI Builtins
When the agentic `ai` is too much, drop down:
```stryke
my $r = prompt "explain quantum", model: "claude-opus-4-7" # single shot, no tools
my $r = stream_prompt "write a story" # streaming generator
my @vec = embed "hello world" # single embedding
my @vecs = embed @docs # batched
my $resp = chat $messages, model: "claude-opus-4-7" # explicit message list
# Audio (OpenAI Whisper / TTS)
my $text = ai_transcribe "podcast.mp3", model => "whisper-1", language => "en"
my $bytes = ai_speak "Hello, world.", voice => "alloy", output => "out.mp3"
# Image generation (OpenAI DALL-E 3 / gpt-image-1)
my $png = ai_image "cyberpunk neon-lit alley", model => "dall-e-3",
size => "1024x1024", quality => "hd", output => "alley.png"
# Provider catalog
my $catalog = ai_models "openai" # arrayref of model IDs
```
These are the building blocks `ai` itself is built on. Available when you want explicit control over the LLM interaction.
## Tool Functions
Mark a function as agent-callable with `tool`:
```stryke
tool fn weather($city: Str) -> Str "Get current weather for a city" {
fetch "https://api.weather.com/" . uri_encode($city)
}
tool fn search_kb($query: Str, $limit: Int = 10) -> List<Doc> "Search the knowledge base" {
sql%{ SELECT * FROM kb WHERE content @@ to_tsquery($query) LIMIT $limit }
}
```
At build time:
1. Signature → JSON schema (parameters with types and constraints).
2. Docstring → tool description.
3. Function → invocable tool entry.
4. All `tool fn` definitions in the current scope are auto-registered.
Compare to current TypeScript/Python practice (define function → re-define schema by hand → re-define description in the schema → wire into `tools=[...]`): stryke collapses that ritual to one declaration.
## MCP Servers (Declarative)
Expose stryke functions as an MCP server with one block:
```stryke
mcp_server "filesystem" {
transport :stdio # or :ws port: 3000, :http port: 3000
tool read_file($path: Str) -> Str "Read file contents" {
slurp $path
}
tool list_dir($path: Str) -> List<Str> "List directory entries" {
readdir $path
}
resource "file://*" -> Str {
slurp $self.uri.path
}
prompt "summarize" "Summarize text" {
args $text: Str
"Summarize this concisely:\n#{$text}"
}
}
```
Compiles to a spec-compliant MCP server. The `s build --mcp-server` flag emits a standalone binary exposing the server, separate from your main app binary.
## MCP Clients
Connect, discover, call:
```stryke
my $fs = mcp_connect "stdio:/usr/local/bin/fs-mcp"
my $gh = mcp_connect "https://api.github.com/mcp"
my $pg = mcp_connect "ws://localhost:9000"
my @tools = $fs.tools
my @resources = $fs.resources
my @prompts = $fs.prompts
my $contents = $fs.tool(:read_file, path: "/tmp/x")
my $config = $fs.resource("file:///etc/app.toml")
my $msg = $fs.prompt(:summarize, text: $long_text)
```
Connected MCP servers are visible to subsequent `ai` calls automatically — no re-registration step.
## Agents (Composed)
For explicit control over the agent loop:
```stryke
my $agent = ai_agent
.mcp("filesystem", "stdio:/usr/local/bin/fs-mcp")
.mcp("github", "https://api.github.com/mcp")
.tools(&internal_search, &slack_post)
.system("You are a senior backend engineer reviewing changes")
.max_turns(20)
.max_cost_usd(0.50)
my $review = $agent.run("review PR 4523 against our coding standards")
```
The bare `ai $prompt` is `ai_agent.run($prompt)` with sensible defaults. Same primitive, different ergonomics.
## Provider Architecture — **SHIPPED**
Configuration in `stryke.toml`:
```toml
[ai]
api_key_env = "ANTHROPIC_API_KEY"
cache = true
max_cost_run = 1.00 # USD hard ceiling per program run
[ai.openai]
api_key_env = "OPENAI_API_KEY"
model = "gpt-4o"
[ai.ollama]
base_url = "http://localhost:11434"
model = "llama3.2"
[ai.openai_compat] # LM Studio / vLLM / llama-server / any OpenAI-shaped local server
base_url = "http://localhost:1234/v1/chat/completions"
api_key_env = "STRYKE_AI_LOCAL_KEY" # optional; many local servers ignore auth
model = "local-model"
[ai.gemini]
api_key_env = "GOOGLE_API_KEY" # also accepts GEMINI_API_KEY
model = "gemini-2.5-flash"
[ai.routing]
embed = "voyage" # different providers per operation
classify = "ollama" # cheap ops go local
```
The runtime picks the provider per call based on config. All providers expose a uniform interface; provider-specific extensions (Anthropic prompt caching, OpenAI streaming function calls, etc.) are accessible through provider-namespaced options when needed.
**Provider matrix (shipped today):**
| Anthropic | `anthropic`, `claude` | `ANTHROPIC_API_KEY` | Cache, extended thinking, vision, PDF, batch |
| OpenAI | `openai`, `gpt` | `OPENAI_API_KEY` | Streaming, function calls, embeddings, Whisper, TTS |
| Ollama | `ollama` | (none — local) | Native `/api/generate`; no cost tracking |
| OpenAI-compatible | `openai_compat`, `compat`, `local`| `STRYKE_AI_LOCAL_KEY` (optional) | LM Studio / vLLM / llama-server; configurable base_url |
| Google Gemini | `gemini`, `google` | `GOOGLE_API_KEY` or `GEMINI_API_KEY` | `gemini-2.5-flash` default |
| Voyage AI | (embed default) | `VOYAGE_API_KEY` | Default embedding provider |
Override the base URL at runtime with `STRYKE_AI_BASE_URL=...` for the `openai_compat` family — useful when pointing at any OpenAI-shaped endpoint without editing config.
**Audio surface** — Whisper transcription + OpenAI TTS:
```perl
my $text = ai_transcribe("podcast.mp3", model => "whisper-1", language => "en");
my $bytes = ai_speak("Hello, world.", voice => "alloy", model => "tts-1", format => "mp3", output => "out.mp3");
```
`ai_speak` returns the raw audio bytes; pass `output => "path"` to also write to disk.
## Cost & Latency
| Repeated identical calls | Result cache keyed on `(provider, model, prompt, system, tools, params)` |
| Repeated system prompts | Provider-side prompt caching where supported (Anthropic) |
| Many small calls | Automatic batching for `ai_map`/`ai_filter`/`ai_classify` |
| Streaming UX | `stream_prompt` / `ai` returns `Stream<Str>` in iter context |
| Parallelism | `@docs.pmap |$d| { ai "summarize", $d }` runs N in parallel up to provider rate limit, automatic backpressure |
| Cost ceiling | `max_cost_run` aborts the program before an expensive call |
| Cost introspection | `ai_cost` returns running USD spent in current scope |
| Token estimation | `tokens_of($text)` for pre-flight token counts |
## Determinism in Tests
```stryke
test "summarize trims to 50 words" {
ai_mock {
prompt "summarize", _ => "Lorem ipsum dolor sit amet, consectetur..."
} {
is words(summarize($doc)).count, 50
}
}
```
`ai_mock` intercepts every AI primitive in scope. Patterns match prompts (regex, exact, glob, predicate); responses can be strings, structured values, or generator functions. Tests are deterministic, fast, and free.
CI runs `s test` with `STRYKE_AI_MODE=mock-only` set — any unmatched live AI call fails the build. Live AI in tests requires `STRYKE_AI_MODE=live` explicitly.
## Composition with the Rest of Stryke
**With the web framework:**
```stryke
class ChatController < Controller {
fn stream() {
sse_stream { |stream|
for my $chunk in stream_prompt $params.prompt {
stream.send($chunk)
}
}
}
fn ask() {
my $answer = ai $params.q,
tools: [&search_db, &fetch_docs],
system: "You are our product expert."
render :json, answer: $answer
}
}
```
**With the package manager:**
A package can mark itself as MCP-exposable:
```toml
# stryke.toml
[mcp]
expose_module = "lib::api" # all `tool fn`s in this module become MCP tools
```
```bash
s build --mcp-server # → target/release/myapp-mcp (standalone server)
s build --release # → target/release/myapp (regular app)
```
Every stryke library can publish itself as an MCP server with one flag.
**With cluster dispatch:**
```stryke
my @summaries = cluster_dispatch @docs |$d| { ai "summarize", $d }
```
AI calls fanout across cluster nodes; each node uses its own provider config; results aggregate back. Combined with the cost ceiling, this is rate-limit-aware distributed AI work.
**With effects (when shipped):**
`ai` becomes `Effect::AI`. Effect handlers control model/cache/retry/cost in one place:
```stryke
when op.cost_estimate > 0.10 { return k(cached_or_skip(op)) }
return k(default_handler(op))
} {
ai "summarize", $doc
ai "translate to French", $r
}
```
**With capabilities (when shipped):**
`ai` requires `AICap`. A library can't make AI calls unless given the capability:
```stryke
fn process_doc($doc, $ai_cap: AICap) {
ai_cap.run "summarize", $doc
}
```
Stops compromised packages from quietly running up an LLM bill.
## Implementation Phases
### Phase 0 — Walking Skeleton — **SHIPPED**
Lives in `strykelang/ai.rs`. Wired through `builtins.rs`. Builtins:
| `ai($prompt, opts...)` / `prompt($prompt, opts...)` | Single-shot, no tools yet (agent loop is Phase 1) |
| `stream_prompt($prompt, opts)` | Returns full text in v0; real `Stream<Str>` is Phase 5 |
| `chat($messages, opts...)` | Message-list with role=system/user/assistant |
| `embed($text)` / `embed(@texts)` | Voyage AI default, OpenAI alt |
| `tokens_of($text)` | char/4 heuristic (good-enough pre-flight) |
| `ai_cost()` | `+{usd, input_tokens, output_tokens, embed_tokens, cache_hits, cache_misses}` |
| `ai_cache_clear()` / `ai_cache_size()` | In-process result cache, sha256-keyed on `(provider, model, system, prompt)` |
| `ai_mock_install($pattern, $response)` / `ai_mock_clear()` | Regex-keyed mock interceptor; first-match-wins |
| `ai_config_get($key)` / `ai_config_set($key, $val)` | Read/write of the loaded `[ai]` table |
| `STRYKE_AI_MODE=mock-only` | Errors any unmocked call — for `s test` |
Providers actually wired: **Anthropic**, **OpenAI** (Messages + Chat
Completions), **Voyage** + **OpenAI** for embeddings. TOML config from
`./stryke.toml`; falls back to env vars. Pricing table embedded for
the 4 major model families so `ai_cost()` is meaningful without a
provider invoice round-trip.
What's intentionally NOT in Phase 0:
- Tool / agent loop (`tool fn` keyword needs parser work — Phase 1)
- MCP client / server (Phase 2)
- Collection builtins (`ai_filter`, `ai_map`, …) (Phase 3)
- llama.cpp local backend (Phase 4)
- Real `Stream<Str>` streaming (Phase 5)
- Auto-tool-attachment of MCP servers (Phase 2 prerequisite)
### Phase 1 — Agent loop — **SHIPPED (without `tool fn` keyword)**
The agent loop runtime is in place. Without parser work for the `tool
fn` declaration, tools are passed explicitly as a hashref list:
```stryke
my $report = ai "research X across our docs and Hacker News",
tools => [
+{ name => "kb_search",
description => "Search internal knowledge base",
parameters => +{ q => "string", limit => "int" },
run => sub { search_kb($_[0]->{q}, $_[0]->{limit}) } },
+{ name => "fetch_url",
description => "Fetch a URL and return text",
parameters => +{ url => "string" },
run => sub { fetch($_[0]->{url}) } },
],
max_turns => 10,
system => "You are a senior engineer."
```
Bare `ai $prompt` keeps the Phase 0 single-shot semantics; `ai $prompt,
tools => [...]` auto-routes to the agent loop. Both Anthropic
(`tool_use`/`tool_result`) and OpenAI (`tool_calls`/`tool` role)
protocol shapes are wired. Mock mode short-circuits the loop and
returns a mocked final string for tests.
`tool fn name(...) -> Type "doc" { ... }` (auto-schema from signature
+ docstring, auto-registration of all in-scope tools) is the parser
extension — still Phase 1 to-do.
### `tool fn` keyword — **SHIPPED**
```stryke
tool fn weather($city: string) "Get current weather for a city" {
"sunny in $city"
}
tool fn add_nums($a: int, $b: int) "Add two integers" {
$a + $b
}
# bare ai($prompt) auto-attaches every tool fn defined in scope:
my $r = ai("what's the weather in Tokyo?");
```
How it ships: a source-level pre-pass in `strykelang/ai_sugar.rs`
rewrites `tool fn NAME(args) "doc" { body }` into:
```stryke
fn NAME { my $__args__ = $_[0]; my $city = $__args__->{city}; ...body... }
ai_register_tool("NAME", "doc", +{ city => "string" }, \&NAME);
```
Param types from the `: Type` annotation become the JSON Schema sent
to the model. Optional `-> ReturnType` is parsed and ignored (used
purely for documentation). Optional docstring is the model-visible
description. Inside the body, params are bound as named locals so
`tool fn weather($city: string) { ... $city ... }` works the same way
the user wrote it.
### `mcp_server "name" { ... }` declarative DSL — **SHIPPED**
```stryke
mcp_server "filesystem" {
tool read_file($path: string) "Read file contents" {
slurp $path
}
tool list_dir($path: string) "List directory entries" {
join("\n", grep { !/^\./ } readdir $path)
}
}
```
Same source-level pre-pass: `mcp_server "name" { tool A() "..." {...}
tool B() "..." {...} }` rewrites to a private set of `fn _mcp_name_A_0
{...}` / `fn _mcp_name_B_1 {...}` declarations followed by
`mcp_server_start("name", +{ tools => [+{name, description, parameters,
run => \&_mcp_..._0}, ...] });`. Round-trip verified end-to-end
against a stryke client.
### Tool registry (Phase 1 sugar) — **SHIPPED**
For people who don't want to wait on the `tool fn` parser keyword,
register tools at runtime:
```stryke
ai_register_tool(
"weather", "Get weather for a city",
+{ city => "string" },
sub { fetch("https://api.weather.com/" . uri_encode($_[0]->{city})) }
);
# Bare `ai($prompt)` now auto-routes to the agent loop and sees this tool:
my $r = ai("what's the weather in SF?");
```
| `ai_register_tool($name, $desc, +{params}, sub { ... })` | Add an always-on tool; idempotent re-register |
| `ai_unregister_tool($name)` | Remove |
| `ai_clear_tools()` | Wipe registry |
| `ai_tools_list()` | Inspect what's registered |
Bare `ai($prompt)` auto-routes to the agent loop when registered tools
are non-empty OR `tools => [...]` is passed OR `auto_mcp => 1` (default)
and an MCP server is attached.
### Memory / RAG — **SHIPPED**
Sqlite-backed embedding store. Save text + embedding, recall by cosine
similarity. In-memory by default; persistent with `path => "memory.db"`.
| `ai_memory_save("id", "content", $metadata?, $path?)` | Embed + insert (idempotent on id) |
| `ai_memory_recall("query", top_k => N)` | Re-embed query, return top-k by cosine |
| `ai_memory_forget("id")` / `ai_memory_count()` / `ai_memory_clear()` | Maintenance |
Mock-mode hash-embeds deterministically so tests round-trip without
the network. Verified: 4 docs saved, query "fast systems-level" picks
"Stryke is the fastest Perl-5 interpreter" at score 0.7679 over the
other three.
### Real `Stream<Str>` iter context — **SHIPPED**
`stream_prompt($p)` now returns a `PerlIterator`-backed handle that
yields one text-delta chunk per `next()`:
```stryke
my $stream = stream_prompt("write a haiku");
for my $chunk ($stream) {
print $chunk;
STDOUT->flush;
}
```
Each call to `next_item` reads the next SSE delta from the live
Anthropic connection. Mock mode falls back to a char-chunked iterator
so tests can drive the same `for my $chunk in (...)` loop without the
network. The on_chunk callback form (`stream_prompt($p, on_chunk =>
sub { … })`) still works for push-style consumers.
### Streaming with `on_chunk` — **SHIPPED**
Real Anthropic SSE parsing. Pass `on_chunk => sub { … }` and the
callback fires once per delta chunk; the full text is also returned at
the end:
```stryke
my $state = +{ buf => "" };
my $full = stream_prompt("write a haiku",
on_chunk => sub { $state->{buf} .= $_[0]; print $_[0] }
);
```
Stryke gotcha: closures capture *scalars* by value. Mutate state
through a hashref/arrayref (heap-shared) so the outer scope sees it.
String concat on `my $buf` inside the closure won't propagate; pushing
into `@{$state->{chunks}}` will.
### Structured output — **SHIPPED**
```stryke
my $r = ai("Extract user info from: Alice, 30, active",
schema => +{ name => "string", age => "int", active => "bool" });
# $r is a hashref: { name => "Alice", age => 30, active => 1 }
```
`ai($p, schema => +{...})` auto-routes to `ai_extract`. The schema hashref
maps field names to coercion types (`string`/`int`/`number`/`bool`/`array`/
`object`). Builds a JSON-only prompt, walks the response for the first
balanced `{...}`, parses, validates + coerces to the schema. Returns a
real stryke hashref ready for field access.
### Anthropic prompt caching — **SHIPPED**
```stryke
my $r = ai("question", system => $long_system_prompt, cache_control => 1);
# Subsequent calls with the same system block read from cache at ~10%
# of normal input cost.
```
The runtime sets `cache_control: { type: "ephemeral" }` on the system
block when `cache_control => 1` is set. `ai_cost()` now also returns
`cache_creation_tokens` / `cache_read_tokens` so spend is accurate
(creation +25%, reads -90% vs normal input).
### Extended thinking — **SHIPPED**
```stryke
my $answer = ai("hard math problem",
thinking => 1, thinking_budget => 8000);
my $reasoning = ai_last_thinking(); # full thinking trace
```
When `thinking => 1` is set the request includes the Anthropic
extended-thinking block; the model's reasoning is captured separately
from the answer and surfaced via `ai_last_thinking()`.
### PDF / document input — **SHIPPED**
```stryke
my $summary = ai("summarize this contract", pdf => "/path/to/contract.pdf");
my $extract = ai("extract terms", pdf => "https://example.com/doc.pdf");
my $direct = ai("read", pdf => $raw_bytes);
```
`ai($p, pdf => $path|$url|$bytes)` auto-routes to `ai_pdf`, which builds
an Anthropic `document` content block (base64-inlined PDF, up to
32MB / 100 pages).
### Scoped budget — **SHIPPED**
```stryke
ai_budget(0.50, sub {
my @summaries = ai_map(\@long_docs, "summarize");
my @ranked = ai_sort(\@summaries, "by relevance to backend engineering");
return \@ranked;
});
# Errors if total spend during the block exceeds $0.50.
```
Per-block USD cap. Enforces by snapshotting current cost on entry,
raising the global ceiling to `snapshot + cap` for the duration, and
checking spend on exit. Restores the prior global cap unconditionally.
### Convenience wrappers — **SHIPPED**
| `ai_summarize($text, words => 50)` | Concise summary at target length |
| `ai_translate($text, to => "Spanish")` | Translation |
| `ai_extract($prompt, schema => +{...})` | Structured JSON output (also auto-routed via `ai($p, schema => ...)`) |
### Built-in tools — **SHIPPED**
Drop-in tool specs ready for the agent loop, no `run` coderef needed
because they route through a native registry:
```stryke
my $r = ai("research the latest stryke release notes",
tools => [
web_search_tool(), # uses BRAVE_SEARCH_API_KEY if set, else DDG
fetch_url_tool(), # HTTP GET, returns body text
read_file_tool(), # local FS read
run_code_tool(), # python3 subprocess, 10s timeout
]);
```
The `run_code_tool` shells to `python3` so a Python interpreter must
be on the path; works fine on every modern Linux/macOS dev box.
| `web_search_tool` | Brave Search API (auth via `$BRAVE_SEARCH_API_KEY`) → DuckDuckGo HTML scrape fallback |
| `fetch_url_tool` | `ureq` GET with 30s timeout |
| `read_file_tool` | `std::fs::read_to_string` |
| `run_code_tool` | `python3` subprocess, 10s timeout, returns stdout+stderr |
### Conversational sessions — **SHIPPED**
```stryke
my $s = ai_session_new(system => "Be terse", model => "claude-haiku-4-5");
ai_session_send($s, "what's 2+2?");
ai_session_send($s, "and times 3?");
my $hist = ai_session_history($s); # arrayref of {role, content}
ai_session_reset($s); # clear history but keep config
ai_session_close($s); # drop session
```
Multi-turn chat that auto-tracks role=user / role=assistant turns.
Provider/model picked at session creation, can be overridden per
`send` call.
### Prompt templates — **SHIPPED**
{% raw %}
```stryke
my $p = ai_template("hi {name}, age {age}", name => "Alice", age => 30);
# → "hi Alice, age 30"
ai_template("escaped {{lit}}, real {key}", key => "yes");
# → "{lit}}, real yes" ({{ → literal {, missing keys pass through)
```
{% endraw %}
Pure string substitution. No code execution. Use as the prompt arg to
`ai`/`prompt`/`chat`.
### Retry / backoff — **SHIPPED**
Anthropic calls (single-shot AND streaming AND vision AND PDF) auto-
retry on `429` / `500` / `502` / `503` / `504` with exponential
backoff (1s → 2s → 4s → 8s → 16s, capped at 30s). 4 attempts total
before giving up. Transport errors (network blips) also retry.
### Routing actually honored — **SHIPPED**
`ai_routing_set("embed", "openai")` now actually switches embedding
calls to OpenAI's `text-embedding-*` endpoint instead of the default
Voyage. The route table is consulted before falling back to the
`[ai.embed]` TOML config or the `embed_provider` default.
### CLI — **SHIPPED**
```bash
stryke ai "summarize the linux kernel in 50 words"
stryke ai "structured" --json # emit {response, usd, input_tokens, output_tokens}
```
`stryke ai PROMPT` reads from argv or stdin, calls the configured
model, prints to stdout. Honors `--model`, `--system`, `--stream`,
`--json`. Useful as a UNIX filter or one-shot from terminal.
### Vision (multimodal images) — **SHIPPED**
```stryke
my $caption = ai("describe this image", image => "/path/to/photo.jpg");
my $alt = ai("describe", image => "https://example.com/img.png");
my $hex = ai("describe", image => $raw_bytes);
```
Routes to `ai_vision`, which builds an Anthropic content array with a
base64-inlined image block (URLs fetched first, paths read, raw bytes
encoded directly). Mime-type guessed from extension. Cost tracking
runs through the same path as text calls.
### MCP server (programmatic) — **SHIPPED**
The declarative `mcp_server "name" { ... }` parser block is still
deferred (needs the same parser work as `tool fn`), but the runtime is
fully wired. Stand up a server with one builtin call:
```stryke
mcp_server_start("stryke-srv", +{
tools => [
+{ name => "echo", description => "Echo input",
parameters => +{ text => "string" },
run => sub { $_[0]->{text} } },
+{ name => "uppercase", description => "Uppercase text",
parameters => +{ text => "string" },
run => sub { uc($_[0]->{text}) } },
]
});
```
Runs a stdio JSON-RPC loop on stdin/stdout, exposes
`initialize` / `tools/list` / `tools/call`. Verified end-to-end with
a stryke client connecting to a stryke server: tools enumerate, calls
round-trip, results return. The same binary that runs your stryke
script can now BE an MCP server — pair with `s_web build` to ship a
self-contained MCP server binary.
### MCP HTTP transport — **SHIPPED**
```stryke
my $gh = mcp_connect("https://api.github.com/mcp");
my $tavily = mcp_connect("https://mcp.tavily.com/mcp");
```
Speaks the streamable-HTTP MCP transport: POST per request, accept
`application/json` OR `text/event-stream` (SSE) responses, carry
`mcp-session-id` across calls when the server sets one. Reads
bearer auth from `$MCP_BEARER_TOKEN` if set.
### OpenAI streaming — **SHIPPED**
`stream_prompt($p, on_chunk => sub { ... }, provider => "openai")`
now parses OpenAI's SSE delta format too. Same callback contract as
the Anthropic path — fires once per text-delta chunk.
### Anthropic batch API — **SHIPPED**
```stryke
my $results = ai_batch(\@prompts,
model => "claude-haiku-4-5",
system => "Be terse",
poll_secs => 5,
max_wait_secs => 1800);
# 50% of normal cost; trades a few minutes of wall time.
```
Submits the batch, polls `processing_status` until `ended`, downloads
JSONL results, reorders by `custom_id`. Cost tracking applies the
~50% batch discount automatically. Falls back to sequential calls if
the batch endpoint errors (region/account-gated) or
`STRYKE_AI_BATCH=sync` is set.
### Cluster fanout — **SHIPPED**
```stryke
my $cluster = cluster(["host1:8", "host2:8"]);
my @summaries = @{ ai_pmap(\@docs, "summarize",
cluster => $cluster, model => "claude-haiku-4-5") };
```
Splits items into N shards (N = cluster slot count), runs `ai_map` on
each shard via the existing `pmap_on` plumbing, concatenates results
in order. Without a `cluster => ...` arg, falls back to a single local
`ai_map` call (one batched LLM request).
### Phase 2 — MCP client — **SHIPPED (server side still pending)**
Lives in `strykelang/mcp.rs`. Speaks JSON-RPC line-delimited over
stdio. Builtins:
| `mcp_connect("stdio:CMD ARGS...", $name?)` | Spawn subprocess, run `initialize` + `notifications/initialized` handshake, return handle |
| `mcp_tools($h)` / `mcp_resources($h)` / `mcp_prompts($h)` | Cached `*/list` results |
| `mcp_call($h, $name, +{...args})` | `tools/call` |
| `mcp_resource($h, $uri)` | `resources/read` |
| `mcp_prompt($h, $name, +{...args})` | `prompts/get` |
| `mcp_close($h)` | Kill subprocess, drop registry slot |
| `mcp_attach_to_ai($h)` / `mcp_detach_from_ai($h)` | Mark a handle as auto-attachable so the agent loop can pull its tools |
| `mcp_attached()` | List of currently-attached handles |
Smoke-tested against a 100-line Python fake-server implementing
`initialize`, `tools/list`, `tools/call`, `resources/list`,
`resources/read`, `prompts/list`, `prompts/get`. Handshake +
caching + every method round-trip works.
Transports NOT yet wired:
- `ws://...` (WebSocket — needs a tungstenite dep)
- `http://...` (streaming HTTP — needs SSE)
The **server** side — declarative `mcp_server "name" { tool foo … }`
DSL — needs the same parser extension as `tool fn`. Not in this pass.
### Phase 3 — Collection builtins — **SHIPPED**
Each one builds a single batched prompt asking the model for a JSON
array of judgments, then parses the response. One LLM call per
collection, not N.
| `ai_filter(\@items, "criterion")` | Boolean per item | Filtered arrayref |
| `ai_map(\@items, "instruction")` | String per item | Mapped arrayref |
| `ai_classify(\@items, "label hint", into => [\"a\",\"b\"])` | Label per item | Arrayref of labels |
| `ai_match($item, "criterion")` | Single boolean | 0 or 1 |
| `ai_sort(\@items, "criterion")` | Index array (best-first) | Reordered arrayref |
| `ai_dedupe(\@items, "hint")` | Group of indexes per cluster | Deduped arrayref |
JSON-array extraction is forgiving: walks the response looking for the
first balanced `[ ... ]` so the model can wrap output in prose without
breaking the parse.
### Retrieval / vector ops — **SHIPPED**
| `vec_cosine(\@a, \@b)` | Cosine similarity in `[-1, 1]` |
| `vec_search(\@query, \@candidates, top_k => N)` | Arrayref of `+{idx, score}`, ranked best-first |
| `vec_topk(\@scores, $k)` | Indexes of top-k scalars |
Verified on the unit basis: `cos([1,0,0],[1,0,0])=1.0000`,
`cos([1,0,0],[0,1,0])=0.0000`, `cos([1,0,0],[-1,0,0])=-1.0000`. `vec_search`
ranks `[1,0,0]` against four candidates as `1 (id), 2 (45°), 0 (orth)`
with scores `1.000, 0.707, 0.000`.
### Cost / routing / history — **SHIPPED**
| `ai_estimate($prompt, model => "...", out_tokens => N)` | Pre-flight USD estimate from token heuristic + price table |
| `ai_routing_get($op)` / `ai_routing_set($op, $provider)` | Per-operation provider override (advisory; embed honors it) |
| `ai_history()` | Arrayref of last 100 calls — `+{provider, model, prompt, response_chars, input_tokens, output_tokens, usd, cache_hit, unix_time}` |
| `ai_history_clear()` | Reset history |
### Phase 1 — Tools and Agents (months 2-4)
- `tool fn` declaration with schema generation.
- `ai` builtin: agent loop using local `tool fn`s.
- `ai_mock` for tests.
- OpenAI provider added.
### Phase 2 — MCP (months 4-6)
- `mcp_connect` client.
- `mcp_server` declarative DSL.
- `s build --mcp-server` flag.
- Auto-attachment of connected MCP servers to `ai`.
### Phase 3 — Collection AI Builtins (months 6-8)
- `ai_filter`, `ai_map`, `ai_classify`, `ai_sort`, `ai_match`, `ai_dedupe`.
- Automatic batching.
- Predicted-cost static analysis.
- Hot-loop lints.
### Phase 4 — Local Models and Multi-Provider — **SHIPPED (in-process llama.cpp deferred)**
- ✅ Ollama provider (native `/api/generate`).
- ✅ OpenAI-compatible provider (`openai_compat`/`compat`/`local`) — LM Studio, vLLM, llama-server, any OpenAI-shaped server. Configurable `STRYKE_AI_BASE_URL`.
- ✅ Google Gemini provider (`gemini`/`google`) with `GOOGLE_API_KEY`/`GEMINI_API_KEY` auth.
- ✅ Whisper transcription (`ai_transcribe`).
- ✅ OpenAI TTS (`ai_speak`).
- ✅ Image generation (`ai_image`) — DALL-E 3 + gpt-image-1 via `/v1/images/generations`. Returns raw PNG bytes for `n=1`, arrayref for `n>1`. Supports `output => "out.png"` to also write to disk.
- ✅ Image editing (`ai_image_edit`) — `/v1/images/edits` with optional mask (PNG with transparent regions marking edit area).
- ✅ Image variations (`ai_image_variation`) — `/v1/images/variations` (DALL-E 2 only — gpt-image-1 doesn't expose variations).
- ✅ Cost dashboard (`ai_dashboard()`) — ANSI multi-line summary of running cost, token in/out, embed tokens, prompt-cache write/read, result-cache hit ratio.
- ✅ Pricing lookup (`ai_pricing($model)`) — pre-flight per-1k-token costs the runtime uses.
- ✅ Vision wrapper (`ai_describe`) — convenience over `ai_vision` with `style => "concise"|"detailed"|"alt"` presets.
- ✅ Multi-document grounded responses (`ai_grounded $p, documents => [@paths]`) — Anthropic citations auto-enabled across mixed PDF + text + inline-string corpora.
- ✅ Session persistence (`ai_session_export($h)` / `ai_session_import($json)`) — round-trip a multi-turn chat across runs.
- ✅ Local embeddings (Ollama `/api/embed`) — third embed provider alongside Voyage and OpenAI. Default model `nomic-embed-text`. $0 cost.
- ✅ Moderation (`ai_moderate`) — OpenAI `/v1/moderations`, free endpoint, returns `flagged`/`categories`/`scores`.
- ✅ Anthropic Files API (`ai_file_anthropic_upload`/`list`/`delete`) — beta `/v1/files` endpoint (`files-api-2025-04-14`).
- ✅ Text chunking (`ai_chunk`) — sliding-window or sentence-aware splitter for RAG. Pure local logic.
- ✅ Warmup / auth verification (`ai_warm`) — 1-token ping, returns `ok`/`latency_ms`/`error`.
- ✅ Semantic comparison (`ai_compare`) — single LLM call returning structured `winner`/`reason`/`scores` JSON.
- ✅ CLI modal flags (`stryke ai --image|--transcribe|--speak`) — UNIX-filter surface now covers chat, image, audio.
- ✅ Live model catalog (`ai_models($provider)`) — queries `/v1/models` (OpenAI/Anthropic), `/api/tags` (Ollama), `/v1beta/models` (Gemini).
- ✅ Routing config (`[ai.routing]` table).
- ⏳ In-process llama.cpp linked + embedded model — deferred. Shell-out via Ollama or LM Studio is the supported local path today.
### Citations — **SHIPPED**
Anthropic's grounded-response surface. Pass `citations => 1` to `ai_pdf` (and friends, as wiring extends) and the document blocks get `citations: { enabled: true }`. The model's text response carries citations attached to text blocks; we accumulate them into a thread-local buffer and surface them via `ai_citations()`.
```stryke
my $r = ai_pdf "Summarize the contract", pdf => "/legal/agreement.pdf",
citations => 1, title => "Master Services Agreement";
for my $cite (@{ ai_citations() }) {
say " [$cite->{title}] p$cite->{start_page}-$cite->{end_page}: $cite->{text}";
}
```
Each citation hashref carries: `type`, `text` (the cited string), `title`, `document_index`, `start`/`end` char offsets (text docs) or `start_page`/`end_page` page numbers (PDFs).
### Files API — **SHIPPED**
OpenAI's `/v1/files` for uploading reference files (used by Whisper, Vision, Batch, Assistants):
```stryke
my $f = ai_file_upload "podcast.mp3", purpose => "user_data";
say "uploaded $f->{id} ($f->{bytes} bytes)";
my $files = ai_file_list();
for my $f (@$files) { say "$f->{id} $f->{filename} $f->{purpose}" }
ai_file_delete($f->{id}) or die "delete failed";
```
Builtins: `ai_file_upload`, `ai_file_list`, `ai_file_get`, `ai_file_delete`. All return native hashrefs/arrayrefs (JSON → PerlValue conversion is recursive).
### Phase 5 — Composition (months 10-12)
- Web framework integration polished (streaming SSE handlers, structured-output endpoints).
- Cluster dispatch over `ai` calls.
- Cost ceilings and budgets.
- Public benchmark suite (latency per provider, tokens/sec, cost-per-1K-calls).
## Non-Goals
- LangChain compatibility. Stryke is the framework; we don't wrap a Python framework.
- Vendor-specific exhaustive APIs. Each provider exposes its full feature surface through namespaced options, but core primitives (`ai`, `prompt`, `embed`, `chat`) stay uniform.
- Hosted vector DB. Local sqlite-vec ships in-binary; users bring their own remote vector DB (pgvector, Pinecone, etc.) through normal SQL/HTTP.
- Auto-prompt-engineering. `ai` runs the prompt as written. No hidden rewrites, no "we'll improve your prompt for you" surprises. Reproducibility over cleverness.
- Visual agent builders / no-code UIs. Stryke is a programming language; the agent IS the code.
## Open Questions
1. **Sigil syntax for AI calls.** Should `ai` always be called as a function, or should there be a sigil form (e.g. `&"summarize this"`) for the most-common case? Trade-off: terseness vs. parser ambiguity. Default position: function form only, `~>` thread-macro is the terse path.
2. **Streaming default.** Should `ai` return `Stream<Str>` by default and require collection (`ai $p .collect`) for `Str`, or return `Str` by default? Default position: context-sensitive (scalar context = `Str`, iter context = `Stream<Str>`).
3. **Effect type granularity.** Is `Effect::AI` one effect, or split into `Effect::LLM`, `Effect::Embed`, `Effect::ToolCall`? Default position: one effect with a discriminator on the operation, handlers can pattern-match.
4. **Local model packaging.** Embed in every binary unconditionally (~2-4GB), or opt-in via `[ai.local].embed = true`? Default position: opt-in; small dev binaries by default, full local-capable binaries for offline use cases.
5. **Cost model honesty.** Should `ai_cost` be wall-clock USD as billed, or token-counted estimate? Default position: estimate based on token counts, reconciled with provider invoice when the run completes.
## Resolved Decisions
- **`ai` is the builtin name.** Two letters, ubiquitous, used like `print`. Resolved 2026-04-26.
- **Three invocation forms.** Function call, thread macro `~>`, pipe `|>`. All compile to the same primitive. Resolved 2026-04-26.
- **`tool fn` for marking agent-callable functions.** Build-time schema/description generation, no manual JSON schema writing. Resolved 2026-04-26.
- **MCP-native.** `mcp_server` block + `mcp_connect` for clients. Connected MCP servers auto-attach to `ai`. Resolved 2026-04-26.
- **Provider-agnostic, Anthropic-first.** Uniform interface across providers; provider-specific options through namespaced extensions. Local llama.cpp fallback as a first-class option. Resolved 2026-04-26.
- **Cost-aware by construction.** Caching, batching, parallelism, ceilings, introspection — runtime concerns, not user concerns. Resolved 2026-04-26.
## The Pitch on One Line
> *Every other language ships AI as a library. Stryke ships AI as a primitive. Two letters, unlimited power, single-binary deployment. The language designed for the work that matters in this era.*