strykelang 0.12.21

A highly parallel Perl 5 interpreter written in Rust
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
# Stryke AI Primitives — Design Doc

> *`ai` is to stryke what `print` is to every other language: a builtin, two letters, ubiquitous, unlimited power.*

Stryke is designed as an AI-native language. AI is not a library, not a framework, not a third-party crate — it is a primitive of the language, the same way `print`, regex, and arrays are primitives. The dream pipeline:

```stryke
~> "summarize codebase into BOOK.pdf then open it" ai
```

That single expression: an agent loop that reads the codebase, generates a structured summary, renders it to PDF, and hands the file to the OS to open. No imports. No SDK setup. No `langchain` boilerplate. The language *is* the framework.

## Design Principles

1. **`ai` is a builtin, not a library.** No imports. Always available. Two letters because it gets typed thousands of times per program.
2. **Short and sweet, unlimited power.** The simple form is `ai $prompt`. The complex form composes from the same primitive — no separate "advanced API."
3. **Tools are functions.** Any stryke function with `tool` in its declaration becomes available to the agent loop with zero extra ceremony. Signature → JSON schema, docstring → description, function body → tool implementation.
4. **MCP-native.** Connecting to an MCP server is one line. Exposing your own MCP server is a single block.
5. **Provider-agnostic, Anthropic-first.** The same `ai` call hits Claude, GPT, Gemini, or a locally-linked llama.cpp model. Provider chosen by config, swappable at runtime.
6. **Local fallback always works.** Stryke binaries can ship with a small quantized model linked statically. `ai` works offline; quality scales with the configured backend.
7. **Cost-aware by default.** Caching, batching, parallelism, hard cost ceilings — built into the runtime, not bolted on.
8. **Deterministic in tests.** `ai_mock` blocks freeze responses for unit tests; flakiness is a config bug, not a fact of life.
9. **Composes with the rest of stryke.** Web framework handlers, package manager, cluster dispatch, effects (future), capabilities (future) — all touch the same `ai` primitive.

## The `ai` Builtin

Three forms, same primitive underneath:

```stryke
ai "summarize this", $document                # function call
~> $document ai "summarize this"              # thread macro (stryke design lineage)
$document |> ai "summarize this"              # pipe
```

Default semantics:

- Argument is a prompt string (and optional context value).
- All in-scope `tool fn` declarations are auto-registered as tools.
- All connected MCP servers are auto-attached.
- Agent loop runs to completion: tool call → tool result → next call → ... → final answer.
- Returns: `Str` in scalar context, `Stream<Str>` in iter context, typed value when assigned to a typed binding.

```stryke
my $r : Str = ai "what is 2+2"                # "4"
for my $chunk in ai "write a story" { ... }   # streaming
my $book : Book = ai "extract info", $pdf     # auto-schema from type, validated parse
```

Configuration (defaults documented; everything overridable per call):

```stryke
ai "summarize", $doc,
    model: "claude-opus-4-7",
    system: "You are concise.",
    max_turns: 10,
    cache: true,
    timeout: 30
```

## AI Collection Builtins

Treating AI as a control-flow primitive, not just an API call:

```stryke
@docs.ai_filter "is about cooking"
@articles.ai_map "summarize in one sentence"
@candidates.ai_sort by: "relevance to backend engineering"
@items.ai_classify into: ["urgent", "normal", "ignore"]
@names.ai_dedupe "treat misspellings as the same person"

if $email ai_match "spam" { discard }
elsif $email ai_match "urgent customer support" { route_to_team }
```

Each of these compiles to a single batched LLM call across the collection where possible (one prompt, list of items, list of judgments back). Cost-conscious by construction — `ai_filter @[1000]` is one call, not 1000 calls.

Use sparingly. Each call costs money. The compiler tracks predicted cost statically when constants are involved and warns on hot loops. `ai_*` inside a `for` loop in production code is a lint.

## Lower-Level AI Builtins

When the agentic `ai` is too much, drop down:

```stryke
my $r = prompt "explain quantum", model: "claude-opus-4-7"            # single shot, no tools
my $r = stream_prompt "write a story"                                  # streaming generator
my @vec = embed "hello world"                                          # single embedding
my @vecs = embed @docs                                                 # batched
my $resp = chat $messages, model: "claude-opus-4-7"                    # explicit message list

# Audio (OpenAI Whisper / TTS)
my $text  = ai_transcribe "podcast.mp3", model => "whisper-1", language => "en"
my $bytes = ai_speak "Hello, world.", voice => "alloy", output => "out.mp3"

# Image generation (OpenAI DALL-E 3 / gpt-image-1)
my $png = ai_image "cyberpunk neon-lit alley", model => "dall-e-3",
                   size => "1024x1024", quality => "hd", output => "alley.png"

# Provider catalog
my $catalog = ai_models "openai"      # arrayref of model IDs
```

These are the building blocks `ai` itself is built on. Available when you want explicit control over the LLM interaction.

## Tool Functions

Mark a function as agent-callable with `tool`:

```stryke
tool fn weather($city: Str) -> Str "Get current weather for a city" {
    fetch "https://api.weather.com/" . uri_encode($city)
}

tool fn search_kb($query: Str, $limit: Int = 10) -> List<Doc> "Search the knowledge base" {
    sql%{ SELECT * FROM kb WHERE content @@ to_tsquery($query) LIMIT $limit }
}
```

At build time:

1. Signature → JSON schema (parameters with types and constraints).
2. Docstring → tool description.
3. Function → invocable tool entry.
4. All `tool fn` definitions in the current scope are auto-registered.

Compare to current TypeScript/Python practice (define function → re-define schema by hand → re-define description in the schema → wire into `tools=[...]`): stryke collapses that ritual to one declaration.

## MCP Servers (Declarative)

Expose stryke functions as an MCP server with one block:

```stryke
mcp_server "filesystem" {
    transport :stdio                       # or :ws port: 3000, :http port: 3000

    tool read_file($path: Str) -> Str        "Read file contents" {
        slurp $path
    }

    tool list_dir($path: Str) -> List<Str>   "List directory entries" {
        readdir $path
    }

    resource "file://*" -> Str {
        slurp $self.uri.path
    }

    prompt "summarize"                       "Summarize text" {
        args $text: Str
        "Summarize this concisely:\n#{$text}"
    }
}
```

Compiles to a spec-compliant MCP server. The `s build --mcp-server` flag emits a standalone binary exposing the server, separate from your main app binary.

## MCP Clients

Connect, discover, call:

```stryke
my $fs = mcp_connect "stdio:/usr/local/bin/fs-mcp"
my $gh = mcp_connect "https://api.github.com/mcp"
my $pg = mcp_connect "ws://localhost:9000"

my @tools     = $fs.tools
my @resources = $fs.resources
my @prompts   = $fs.prompts

my $contents = $fs.tool(:read_file, path: "/tmp/x")
my $config   = $fs.resource("file:///etc/app.toml")
my $msg      = $fs.prompt(:summarize, text: $long_text)
```

Connected MCP servers are visible to subsequent `ai` calls automatically — no re-registration step.

## Agents (Composed)

For explicit control over the agent loop:

```stryke
my $agent = ai_agent
    .mcp("filesystem", "stdio:/usr/local/bin/fs-mcp")
    .mcp("github",     "https://api.github.com/mcp")
    .tools(&internal_search, &slack_post)
    .system("You are a senior backend engineer reviewing changes")
    .max_turns(20)
    .max_cost_usd(0.50)

my $review = $agent.run("review PR 4523 against our coding standards")
```

The bare `ai $prompt` is `ai_agent.run($prompt)` with sensible defaults. Same primitive, different ergonomics.

## Provider Architecture — **SHIPPED**

Configuration in `stryke.toml`:

```toml
[ai]
provider     = "anthropic"             # "anthropic" | "openai" | "ollama" | "openai_compat" | "gemini"
model        = "claude-opus-4-7"
api_key_env  = "ANTHROPIC_API_KEY"
cache        = true
max_cost_run = 1.00                    # USD hard ceiling per program run

[ai.openai]
api_key_env  = "OPENAI_API_KEY"
model        = "gpt-4o"

[ai.ollama]
base_url     = "http://localhost:11434"
model        = "llama3.2"

[ai.openai_compat]                     # LM Studio / vLLM / llama-server / any OpenAI-shaped local server
base_url     = "http://localhost:1234/v1/chat/completions"
api_key_env  = "STRYKE_AI_LOCAL_KEY"   # optional; many local servers ignore auth
model        = "local-model"

[ai.gemini]
api_key_env  = "GOOGLE_API_KEY"        # also accepts GEMINI_API_KEY
model        = "gemini-2.5-flash"

[ai.routing]
embed        = "voyage"                # different providers per operation
classify     = "ollama"                # cheap ops go local
```

The runtime picks the provider per call based on config. All providers expose a uniform interface; provider-specific extensions (Anthropic prompt caching, OpenAI streaming function calls, etc.) are accessible through provider-namespaced options when needed.

**Provider matrix (shipped today):**

| Provider           | Identifier(s)                     | Auth env                                | Notes                                              |
|--------------------|-----------------------------------|-----------------------------------------|----------------------------------------------------|
| Anthropic          | `anthropic`, `claude`             | `ANTHROPIC_API_KEY`                     | Cache, extended thinking, vision, PDF, batch       |
| OpenAI             | `openai`, `gpt`                   | `OPENAI_API_KEY`                        | Streaming, function calls, embeddings, Whisper, TTS |
| Ollama             | `ollama`                          | (none — local)                          | Native `/api/generate`; no cost tracking           |
| OpenAI-compatible  | `openai_compat`, `compat`, `local`| `STRYKE_AI_LOCAL_KEY` (optional)        | LM Studio / vLLM / llama-server; configurable base_url |
| Google Gemini      | `gemini`, `google`                | `GOOGLE_API_KEY` or `GEMINI_API_KEY`    | `gemini-2.5-flash` default                         |
| Voyage AI          | (embed default)                   | `VOYAGE_API_KEY`                        | Default embedding provider                         |

Override the base URL at runtime with `STRYKE_AI_BASE_URL=...` for the `openai_compat` family — useful when pointing at any OpenAI-shaped endpoint without editing config.

**Audio surface** — Whisper transcription + OpenAI TTS:

```perl
my $text = ai_transcribe("podcast.mp3", model => "whisper-1", language => "en");
my $bytes = ai_speak("Hello, world.", voice => "alloy", model => "tts-1", format => "mp3", output => "out.mp3");
```

`ai_speak` returns the raw audio bytes; pass `output => "path"` to also write to disk.

## Cost & Latency

| Concern | Mechanism |
|---|---|
| Repeated identical calls | Result cache keyed on `(provider, model, prompt, system, tools, params)` |
| Repeated system prompts | Provider-side prompt caching where supported (Anthropic) |
| Many small calls | Automatic batching for `ai_map`/`ai_filter`/`ai_classify` |
| Streaming UX | `stream_prompt` / `ai` returns `Stream<Str>` in iter context |
| Parallelism | `@docs.pmap |$d| { ai "summarize", $d }` runs N in parallel up to provider rate limit, automatic backpressure |
| Cost ceiling | `max_cost_run` aborts the program before an expensive call |
| Cost introspection | `ai_cost` returns running USD spent in current scope |
| Token estimation | `tokens_of($text)` for pre-flight token counts |

## Determinism in Tests

```stryke
test "summarize trims to 50 words" {
    ai_mock {
        prompt "summarize", _ => "Lorem ipsum dolor sit amet, consectetur..."
    } {
        is words(summarize($doc)).count, 50
    }
}
```

`ai_mock` intercepts every AI primitive in scope. Patterns match prompts (regex, exact, glob, predicate); responses can be strings, structured values, or generator functions. Tests are deterministic, fast, and free.

CI runs `s test` with `STRYKE_AI_MODE=mock-only` set — any unmatched live AI call fails the build. Live AI in tests requires `STRYKE_AI_MODE=live` explicitly.

## Composition with the Rest of Stryke

**With the web framework:**

```stryke
class ChatController < Controller {
    fn stream() {
        sse_stream { |stream|
            for my $chunk in stream_prompt $params.prompt {
                stream.send($chunk)
            }
        }
    }

    fn ask() {
        my $answer = ai $params.q,
            tools: [&search_db, &fetch_docs],
            system: "You are our product expert."
        render :json, answer: $answer
    }
}
```

**With the package manager:**

A package can mark itself as MCP-exposable:

```toml
# stryke.toml
[mcp]
expose_module = "lib::api"             # all `tool fn`s in this module become MCP tools
```

```bash
s build --mcp-server                    # → target/release/myapp-mcp (standalone server)
s build --release                       # → target/release/myapp (regular app)
```

Every stryke library can publish itself as an MCP server with one flag.

**With cluster dispatch:**

```stryke
my @summaries = cluster_dispatch @docs |$d| { ai "summarize", $d }
```

AI calls fanout across cluster nodes; each node uses its own provider config; results aggregate back. Combined with the cost ceiling, this is rate-limit-aware distributed AI work.

**With effects (when shipped):**

`ai` becomes `Effect::AI`. Effect handlers control model/cache/retry/cost in one place:

```stryke
handle Effect::AI |op, k| {
    log "ai call:", op.prompt
    when op.cost_estimate > 0.10 { return k(cached_or_skip(op)) }
    return k(default_handler(op))
} {
    ai "summarize", $doc
    ai "translate to French", $r
}
```

**With capabilities (when shipped):**

`ai` requires `AICap`. A library can't make AI calls unless given the capability:

```stryke
fn process_doc($doc, $ai_cap: AICap) {
    ai_cap.run "summarize", $doc
}
```

Stops compromised packages from quietly running up an LLM bill.

## Implementation Phases

### Phase 0 — Walking Skeleton — **SHIPPED**

Lives in `strykelang/ai.rs`. Wired through `builtins.rs`. Builtins:

| Name | Status |
|---|---|
| `ai($prompt, opts...)` / `prompt($prompt, opts...)` | Single-shot, no tools yet (agent loop is Phase 1) |
| `stream_prompt($prompt, opts)` | Returns full text in v0; real `Stream<Str>` is Phase 5 |
| `chat($messages, opts...)` | Message-list with role=system/user/assistant |
| `embed($text)` / `embed(@texts)` | Voyage AI default, OpenAI alt |
| `tokens_of($text)` | char/4 heuristic (good-enough pre-flight) |
| `ai_cost()` | `+{usd, input_tokens, output_tokens, embed_tokens, cache_hits, cache_misses}` |
| `ai_cache_clear()` / `ai_cache_size()` | In-process result cache, sha256-keyed on `(provider, model, system, prompt)` |
| `ai_mock_install($pattern, $response)` / `ai_mock_clear()` | Regex-keyed mock interceptor; first-match-wins |
| `ai_config_get($key)` / `ai_config_set($key, $val)` | Read/write of the loaded `[ai]` table |
| `STRYKE_AI_MODE=mock-only` | Errors any unmocked call — for `s test` |

Providers actually wired: **Anthropic**, **OpenAI** (Messages + Chat
Completions), **Voyage** + **OpenAI** for embeddings. TOML config from
`./stryke.toml`; falls back to env vars. Pricing table embedded for
the 4 major model families so `ai_cost()` is meaningful without a
provider invoice round-trip.

What's intentionally NOT in Phase 0:
- Tool / agent loop (`tool fn` keyword needs parser work — Phase 1)
- MCP client / server (Phase 2)
- Collection builtins (`ai_filter`, `ai_map`, …) (Phase 3)
- llama.cpp local backend (Phase 4)
- Real `Stream<Str>` streaming (Phase 5)
- Auto-tool-attachment of MCP servers (Phase 2 prerequisite)

### Phase 1 — Agent loop — **SHIPPED (without `tool fn` keyword)**

The agent loop runtime is in place. Without parser work for the `tool
fn` declaration, tools are passed explicitly as a hashref list:

```stryke
my $report = ai "research X across our docs and Hacker News",
    tools => [
        +{ name        => "kb_search",
           description => "Search internal knowledge base",
           parameters  => +{ q => "string", limit => "int" },
           run         => sub { search_kb($_[0]->{q}, $_[0]->{limit}) } },
        +{ name        => "fetch_url",
           description => "Fetch a URL and return text",
           parameters  => +{ url => "string" },
           run         => sub { fetch($_[0]->{url}) } },
    ],
    max_turns => 10,
    system    => "You are a senior engineer."
```

Bare `ai $prompt` keeps the Phase 0 single-shot semantics; `ai $prompt,
tools => [...]` auto-routes to the agent loop. Both Anthropic
(`tool_use`/`tool_result`) and OpenAI (`tool_calls`/`tool` role)
protocol shapes are wired. Mock mode short-circuits the loop and
returns a mocked final string for tests.

`tool fn name(...) -> Type "doc" { ... }` (auto-schema from signature
+ docstring, auto-registration of all in-scope tools) is the parser
extension — still Phase 1 to-do.

### `tool fn` keyword — **SHIPPED**

```stryke
tool fn weather($city: string) "Get current weather for a city" {
    "sunny in $city"
}

tool fn add_nums($a: int, $b: int) "Add two integers" {
    $a + $b
}

# bare ai($prompt) auto-attaches every tool fn defined in scope:
my $r = ai("what's the weather in Tokyo?");
```

How it ships: a source-level pre-pass in `strykelang/ai_sugar.rs`
rewrites `tool fn NAME(args) "doc" { body }` into:

```stryke
fn NAME { my $__args__ = $_[0]; my $city = $__args__->{city}; ...body... }
ai_register_tool("NAME", "doc", +{ city => "string" }, \&NAME);
```

Param types from the `: Type` annotation become the JSON Schema sent
to the model. Optional `-> ReturnType` is parsed and ignored (used
purely for documentation). Optional docstring is the model-visible
description. Inside the body, params are bound as named locals so
`tool fn weather($city: string) { ... $city ... }` works the same way
the user wrote it.

### `mcp_server "name" { ... }` declarative DSL — **SHIPPED**

```stryke
mcp_server "filesystem" {
    tool read_file($path: string) "Read file contents" {
        slurp $path
    }
    tool list_dir($path: string) "List directory entries" {
        join("\n", grep { !/^\./ } readdir $path)
    }
}
```

Same source-level pre-pass: `mcp_server "name" { tool A() "..." {...}
tool B() "..." {...} }` rewrites to a private set of `fn _mcp_name_A_0
{...}` / `fn _mcp_name_B_1 {...}` declarations followed by
`mcp_server_start("name", +{ tools => [+{name, description, parameters,
run => \&_mcp_..._0}, ...] });`. Round-trip verified end-to-end
against a stryke client.

### Tool registry (Phase 1 sugar) — **SHIPPED**

For people who don't want to wait on the `tool fn` parser keyword,
register tools at runtime:

```stryke
ai_register_tool(
    "weather", "Get weather for a city",
    +{ city => "string" },
    sub { fetch("https://api.weather.com/" . uri_encode($_[0]->{city})) }
);

# Bare `ai($prompt)` now auto-routes to the agent loop and sees this tool:
my $r = ai("what's the weather in SF?");
```

| Builtin | Behavior |
|---|---|
| `ai_register_tool($name, $desc, +{params}, sub { ... })` | Add an always-on tool; idempotent re-register |
| `ai_unregister_tool($name)` | Remove |
| `ai_clear_tools()` | Wipe registry |
| `ai_tools_list()` | Inspect what's registered |

Bare `ai($prompt)` auto-routes to the agent loop when registered tools
are non-empty OR `tools => [...]` is passed OR `auto_mcp => 1` (default)
and an MCP server is attached.

### Memory / RAG — **SHIPPED**

Sqlite-backed embedding store. Save text + embedding, recall by cosine
similarity. In-memory by default; persistent with `path => "memory.db"`.

| Builtin | Behavior |
|---|---|
| `ai_memory_save("id", "content", $metadata?, $path?)` | Embed + insert (idempotent on id) |
| `ai_memory_recall("query", top_k => N)` | Re-embed query, return top-k by cosine |
| `ai_memory_forget("id")` / `ai_memory_count()` / `ai_memory_clear()` | Maintenance |

Mock-mode hash-embeds deterministically so tests round-trip without
the network. Verified: 4 docs saved, query "fast systems-level" picks
"Stryke is the fastest Perl-5 interpreter" at score 0.7679 over the
other three.

### Real `Stream<Str>` iter context — **SHIPPED**

`stream_prompt($p)` now returns a `PerlIterator`-backed handle that
yields one text-delta chunk per `next()`:

```stryke
my $stream = stream_prompt("write a haiku");
for my $chunk ($stream) {
    print $chunk;
    STDOUT->flush;
}
```

Each call to `next_item` reads the next SSE delta from the live
Anthropic connection. Mock mode falls back to a char-chunked iterator
so tests can drive the same `for my $chunk in (...)` loop without the
network. The on_chunk callback form (`stream_prompt($p, on_chunk =>
sub { … })`) still works for push-style consumers.

### Streaming with `on_chunk`**SHIPPED**

Real Anthropic SSE parsing. Pass `on_chunk => sub { … }` and the
callback fires once per delta chunk; the full text is also returned at
the end:

```stryke
my $state = +{ buf => "" };
my $full = stream_prompt("write a haiku",
    on_chunk => sub { $state->{buf} .= $_[0]; print $_[0] }
);
```

Stryke gotcha: closures capture *scalars* by value. Mutate state
through a hashref/arrayref (heap-shared) so the outer scope sees it.
String concat on `my $buf` inside the closure won't propagate; pushing
into `@{$state->{chunks}}` will.

### Structured output — **SHIPPED**

```stryke
my $r = ai("Extract user info from: Alice, 30, active",
    schema => +{ name => "string", age => "int", active => "bool" });
# $r is a hashref: { name => "Alice", age => 30, active => 1 }
```

`ai($p, schema => +{...})` auto-routes to `ai_extract`. The schema hashref
maps field names to coercion types (`string`/`int`/`number`/`bool`/`array`/
`object`). Builds a JSON-only prompt, walks the response for the first
balanced `{...}`, parses, validates + coerces to the schema. Returns a
real stryke hashref ready for field access.

### Anthropic prompt caching — **SHIPPED**

```stryke
my $r = ai("question", system => $long_system_prompt, cache_control => 1);
# Subsequent calls with the same system block read from cache at ~10%
# of normal input cost.
```

The runtime sets `cache_control: { type: "ephemeral" }` on the system
block when `cache_control => 1` is set. `ai_cost()` now also returns
`cache_creation_tokens` / `cache_read_tokens` so spend is accurate
(creation +25%, reads -90% vs normal input).

### Extended thinking — **SHIPPED**

```stryke
my $answer = ai("hard math problem",
    thinking => 1, thinking_budget => 8000);
my $reasoning = ai_last_thinking();   # full thinking trace
```

When `thinking => 1` is set the request includes the Anthropic
extended-thinking block; the model's reasoning is captured separately
from the answer and surfaced via `ai_last_thinking()`.

### PDF / document input — **SHIPPED**

```stryke
my $summary = ai("summarize this contract", pdf => "/path/to/contract.pdf");
my $extract = ai("extract terms",        pdf => "https://example.com/doc.pdf");
my $direct  = ai("read",                  pdf => $raw_bytes);
```

`ai($p, pdf => $path|$url|$bytes)` auto-routes to `ai_pdf`, which builds
an Anthropic `document` content block (base64-inlined PDF, up to
32MB / 100 pages).

### Scoped budget — **SHIPPED**

```stryke
ai_budget(0.50, sub {
    my @summaries = ai_map(\@long_docs, "summarize");
    my @ranked = ai_sort(\@summaries, "by relevance to backend engineering");
    return \@ranked;
});
# Errors if total spend during the block exceeds $0.50.
```

Per-block USD cap. Enforces by snapshotting current cost on entry,
raising the global ceiling to `snapshot + cap` for the duration, and
checking spend on exit. Restores the prior global cap unconditionally.

### Convenience wrappers — **SHIPPED**

| Builtin | Behavior |
|---|---|
| `ai_summarize($text, words => 50)` | Concise summary at target length |
| `ai_translate($text, to => "Spanish")` | Translation |
| `ai_extract($prompt, schema => +{...})` | Structured JSON output (also auto-routed via `ai($p, schema => ...)`) |

### Built-in tools — **SHIPPED**

Drop-in tool specs ready for the agent loop, no `run` coderef needed
because they route through a native registry:

```stryke
my $r = ai("research the latest stryke release notes",
    tools => [
        web_search_tool(),    # uses BRAVE_SEARCH_API_KEY if set, else DDG
        fetch_url_tool(),     # HTTP GET, returns body text
        read_file_tool(),     # local FS read
        run_code_tool(),      # python3 subprocess, 10s timeout
    ]);
```

The `run_code_tool` shells to `python3` so a Python interpreter must
be on the path; works fine on every modern Linux/macOS dev box.

| Tool | Implementation |
|---|---|
| `web_search_tool` | Brave Search API (auth via `$BRAVE_SEARCH_API_KEY`) → DuckDuckGo HTML scrape fallback |
| `fetch_url_tool` | `ureq` GET with 30s timeout |
| `read_file_tool` | `std::fs::read_to_string` |
| `run_code_tool` | `python3` subprocess, 10s timeout, returns stdout+stderr |

### Conversational sessions — **SHIPPED**

```stryke
my $s = ai_session_new(system => "Be terse", model => "claude-haiku-4-5");
ai_session_send($s, "what's 2+2?");
ai_session_send($s, "and times 3?");
my $hist = ai_session_history($s);   # arrayref of {role, content}
ai_session_reset($s);                 # clear history but keep config
ai_session_close($s);                 # drop session
```

Multi-turn chat that auto-tracks role=user / role=assistant turns.
Provider/model picked at session creation, can be overridden per
`send` call.

### Prompt templates — **SHIPPED**

{% raw %}
```stryke
my $p = ai_template("hi {name}, age {age}", name => "Alice", age => 30);
# → "hi Alice, age 30"

ai_template("escaped {{lit}}, real {key}", key => "yes");
# → "{lit}}, real yes"  ({{ → literal {, missing keys pass through)
```
{% endraw %}

Pure string substitution. No code execution. Use as the prompt arg to
`ai`/`prompt`/`chat`.

### Retry / backoff — **SHIPPED**

Anthropic calls (single-shot AND streaming AND vision AND PDF) auto-
retry on `429` / `500` / `502` / `503` / `504` with exponential
backoff (1s → 2s → 4s → 8s → 16s, capped at 30s). 4 attempts total
before giving up. Transport errors (network blips) also retry.

### Routing actually honored — **SHIPPED**

`ai_routing_set("embed", "openai")` now actually switches embedding
calls to OpenAI's `text-embedding-*` endpoint instead of the default
Voyage. The route table is consulted before falling back to the
`[ai.embed]` TOML config or the `embed_provider` default.

### CLI — **SHIPPED**

```bash
stryke ai "summarize the linux kernel in 50 words"
echo "rough idea: ..." | stryke ai --model claude-haiku-4-5 --system "Be concise"
stryke ai "long thinking task" --stream
stryke ai "structured" --json    # emit {response, usd, input_tokens, output_tokens}
```

`stryke ai PROMPT` reads from argv or stdin, calls the configured
model, prints to stdout. Honors `--model`, `--system`, `--stream`,
`--json`. Useful as a UNIX filter or one-shot from terminal.

### Vision (multimodal images) — **SHIPPED**

```stryke
my $caption = ai("describe this image", image => "/path/to/photo.jpg");
my $alt = ai("describe", image => "https://example.com/img.png");
my $hex = ai("describe", image => $raw_bytes);
```

Routes to `ai_vision`, which builds an Anthropic content array with a
base64-inlined image block (URLs fetched first, paths read, raw bytes
encoded directly). Mime-type guessed from extension. Cost tracking
runs through the same path as text calls.

### MCP server (programmatic) — **SHIPPED**

The declarative `mcp_server "name" { ... }` parser block is still
deferred (needs the same parser work as `tool fn`), but the runtime is
fully wired. Stand up a server with one builtin call:

```stryke
mcp_server_start("stryke-srv", +{
    tools => [
        +{ name => "echo", description => "Echo input",
           parameters => +{ text => "string" },
           run => sub { $_[0]->{text} } },
        +{ name => "uppercase", description => "Uppercase text",
           parameters => +{ text => "string" },
           run => sub { uc($_[0]->{text}) } },
    ]
});
```

Runs a stdio JSON-RPC loop on stdin/stdout, exposes
`initialize` / `tools/list` / `tools/call`. Verified end-to-end with
a stryke client connecting to a stryke server: tools enumerate, calls
round-trip, results return. The same binary that runs your stryke
script can now BE an MCP server — pair with `s_web build` to ship a
self-contained MCP server binary.

### MCP HTTP transport — **SHIPPED**

```stryke
my $gh = mcp_connect("https://api.github.com/mcp");
my $tavily = mcp_connect("https://mcp.tavily.com/mcp");
```

Speaks the streamable-HTTP MCP transport: POST per request, accept
`application/json` OR `text/event-stream` (SSE) responses, carry
`mcp-session-id` across calls when the server sets one. Reads
bearer auth from `$MCP_BEARER_TOKEN` if set.

### OpenAI streaming — **SHIPPED**

`stream_prompt($p, on_chunk => sub { ... }, provider => "openai")`
now parses OpenAI's SSE delta format too. Same callback contract as
the Anthropic path — fires once per text-delta chunk.

### Anthropic batch API — **SHIPPED**

```stryke
my $results = ai_batch(\@prompts,
    model    => "claude-haiku-4-5",
    system   => "Be terse",
    poll_secs    => 5,
    max_wait_secs => 1800);
# 50% of normal cost; trades a few minutes of wall time.
```

Submits the batch, polls `processing_status` until `ended`, downloads
JSONL results, reorders by `custom_id`. Cost tracking applies the
~50% batch discount automatically. Falls back to sequential calls if
the batch endpoint errors (region/account-gated) or
`STRYKE_AI_BATCH=sync` is set.

### Cluster fanout — **SHIPPED**

```stryke
my $cluster = cluster(["host1:8", "host2:8"]);
my @summaries = @{ ai_pmap(\@docs, "summarize",
    cluster => $cluster, model => "claude-haiku-4-5") };
```

Splits items into N shards (N = cluster slot count), runs `ai_map` on
each shard via the existing `pmap_on` plumbing, concatenates results
in order. Without a `cluster => ...` arg, falls back to a single local
`ai_map` call (one batched LLM request).

### Phase 2 — MCP client — **SHIPPED (server side still pending)**

Lives in `strykelang/mcp.rs`. Speaks JSON-RPC line-delimited over
stdio. Builtins:

| Builtin | Behavior |
|---|---|
| `mcp_connect("stdio:CMD ARGS...", $name?)` | Spawn subprocess, run `initialize` + `notifications/initialized` handshake, return handle |
| `mcp_tools($h)` / `mcp_resources($h)` / `mcp_prompts($h)` | Cached `*/list` results |
| `mcp_call($h, $name, +{...args})` | `tools/call` |
| `mcp_resource($h, $uri)` | `resources/read` |
| `mcp_prompt($h, $name, +{...args})` | `prompts/get` |
| `mcp_close($h)` | Kill subprocess, drop registry slot |
| `mcp_attach_to_ai($h)` / `mcp_detach_from_ai($h)` | Mark a handle as auto-attachable so the agent loop can pull its tools |
| `mcp_attached()` | List of currently-attached handles |

Smoke-tested against a 100-line Python fake-server implementing
`initialize`, `tools/list`, `tools/call`, `resources/list`,
`resources/read`, `prompts/list`, `prompts/get`. Handshake +
caching + every method round-trip works.

Transports NOT yet wired:
- `ws://...` (WebSocket — needs a tungstenite dep)
- `http://...` (streaming HTTP — needs SSE)

The **server** side — declarative `mcp_server "name" { tool foo … }`
DSL — needs the same parser extension as `tool fn`. Not in this pass.

### Phase 3 — Collection builtins — **SHIPPED**

Each one builds a single batched prompt asking the model for a JSON
array of judgments, then parses the response. One LLM call per
collection, not N.

| Builtin | Shape | Returns |
|---|---|---|
| `ai_filter(\@items, "criterion")` | Boolean per item | Filtered arrayref |
| `ai_map(\@items, "instruction")` | String per item | Mapped arrayref |
| `ai_classify(\@items, "label hint", into => [\"a\",\"b\"])` | Label per item | Arrayref of labels |
| `ai_match($item, "criterion")` | Single boolean | 0 or 1 |
| `ai_sort(\@items, "criterion")` | Index array (best-first) | Reordered arrayref |
| `ai_dedupe(\@items, "hint")` | Group of indexes per cluster | Deduped arrayref |

JSON-array extraction is forgiving: walks the response looking for the
first balanced `[ ... ]` so the model can wrap output in prose without
breaking the parse.

### Retrieval / vector ops — **SHIPPED**

| Builtin | Returns |
|---|---|
| `vec_cosine(\@a, \@b)` | Cosine similarity in `[-1, 1]` |
| `vec_search(\@query, \@candidates, top_k => N)` | Arrayref of `+{idx, score}`, ranked best-first |
| `vec_topk(\@scores, $k)` | Indexes of top-k scalars |

Verified on the unit basis: `cos([1,0,0],[1,0,0])=1.0000`,
`cos([1,0,0],[0,1,0])=0.0000`, `cos([1,0,0],[-1,0,0])=-1.0000`. `vec_search`
ranks `[1,0,0]` against four candidates as `1 (id), 2 (45°), 0 (orth)`
with scores `1.000, 0.707, 0.000`.

### Cost / routing / history — **SHIPPED**

| Builtin | Behavior |
|---|---|
| `ai_estimate($prompt, model => "...", out_tokens => N)` | Pre-flight USD estimate from token heuristic + price table |
| `ai_routing_get($op)` / `ai_routing_set($op, $provider)` | Per-operation provider override (advisory; embed honors it) |
| `ai_history()` | Arrayref of last 100 calls — `+{provider, model, prompt, response_chars, input_tokens, output_tokens, usd, cache_hit, unix_time}` |
| `ai_history_clear()` | Reset history |

### Phase 1 — Tools and Agents (months 2-4)

- `tool fn` declaration with schema generation.
- `ai` builtin: agent loop using local `tool fn`s.
- `ai_mock` for tests.
- OpenAI provider added.

### Phase 2 — MCP (months 4-6)

- `mcp_connect` client.
- `mcp_server` declarative DSL.
- `s build --mcp-server` flag.
- Auto-attachment of connected MCP servers to `ai`.

### Phase 3 — Collection AI Builtins (months 6-8)

- `ai_filter`, `ai_map`, `ai_classify`, `ai_sort`, `ai_match`, `ai_dedupe`.
- Automatic batching.
- Predicted-cost static analysis.
- Hot-loop lints.

### Phase 4 — Local Models and Multi-Provider — **SHIPPED (in-process llama.cpp deferred)**

- ✅ Ollama provider (native `/api/generate`).
- ✅ OpenAI-compatible provider (`openai_compat`/`compat`/`local`) — LM Studio, vLLM, llama-server, any OpenAI-shaped server. Configurable `STRYKE_AI_BASE_URL`.
- ✅ Google Gemini provider (`gemini`/`google`) with `GOOGLE_API_KEY`/`GEMINI_API_KEY` auth.
- ✅ Whisper transcription (`ai_transcribe`).
- ✅ OpenAI TTS (`ai_speak`).
- ✅ Image generation (`ai_image`) — DALL-E 3 + gpt-image-1 via `/v1/images/generations`. Returns raw PNG bytes for `n=1`, arrayref for `n>1`. Supports `output => "out.png"` to also write to disk.
- ✅ Image editing (`ai_image_edit`) — `/v1/images/edits` with optional mask (PNG with transparent regions marking edit area).
- ✅ Image variations (`ai_image_variation`) — `/v1/images/variations` (DALL-E 2 only — gpt-image-1 doesn't expose variations).
- ✅ Cost dashboard (`ai_dashboard()`) — ANSI multi-line summary of running cost, token in/out, embed tokens, prompt-cache write/read, result-cache hit ratio.
- ✅ Pricing lookup (`ai_pricing($model)`) — pre-flight per-1k-token costs the runtime uses.
- ✅ Vision wrapper (`ai_describe`) — convenience over `ai_vision` with `style => "concise"|"detailed"|"alt"` presets.
- ✅ Multi-document grounded responses (`ai_grounded $p, documents => [@paths]`) — Anthropic citations auto-enabled across mixed PDF + text + inline-string corpora.
- ✅ Session persistence (`ai_session_export($h)` / `ai_session_import($json)`) — round-trip a multi-turn chat across runs.
- ✅ Local embeddings (Ollama `/api/embed`) — third embed provider alongside Voyage and OpenAI. Default model `nomic-embed-text`. $0 cost.
- ✅ Moderation (`ai_moderate`) — OpenAI `/v1/moderations`, free endpoint, returns `flagged`/`categories`/`scores`.
- ✅ Anthropic Files API (`ai_file_anthropic_upload`/`list`/`delete`) — beta `/v1/files` endpoint (`files-api-2025-04-14`).
- ✅ Text chunking (`ai_chunk`) — sliding-window or sentence-aware splitter for RAG. Pure local logic.
- ✅ Warmup / auth verification (`ai_warm`) — 1-token ping, returns `ok`/`latency_ms`/`error`.
- ✅ Semantic comparison (`ai_compare`) — single LLM call returning structured `winner`/`reason`/`scores` JSON.
- ✅ CLI modal flags (`stryke ai --image|--transcribe|--speak`) — UNIX-filter surface now covers chat, image, audio.
- ✅ Live model catalog (`ai_models($provider)`) — queries `/v1/models` (OpenAI/Anthropic), `/api/tags` (Ollama), `/v1beta/models` (Gemini).
- ✅ Routing config (`[ai.routing]` table).
- ⏳ In-process llama.cpp linked + embedded model — deferred. Shell-out via Ollama or LM Studio is the supported local path today.

### Citations — **SHIPPED**

Anthropic's grounded-response surface. Pass `citations => 1` to `ai_pdf` (and friends, as wiring extends) and the document blocks get `citations: { enabled: true }`. The model's text response carries citations attached to text blocks; we accumulate them into a thread-local buffer and surface them via `ai_citations()`.

```stryke
my $r = ai_pdf "Summarize the contract", pdf => "/legal/agreement.pdf",
        citations => 1, title => "Master Services Agreement";
for my $cite (@{ ai_citations() }) {
    say "  [$cite->{title}] p$cite->{start_page}-$cite->{end_page}: $cite->{text}";
}
```

Each citation hashref carries: `type`, `text` (the cited string), `title`, `document_index`, `start`/`end` char offsets (text docs) or `start_page`/`end_page` page numbers (PDFs).

### Files API — **SHIPPED**

OpenAI's `/v1/files` for uploading reference files (used by Whisper, Vision, Batch, Assistants):

```stryke
my $f = ai_file_upload "podcast.mp3", purpose => "user_data";
say "uploaded $f->{id} ($f->{bytes} bytes)";

my $files = ai_file_list();
for my $f (@$files) { say "$f->{id}  $f->{filename}  $f->{purpose}" }

ai_file_delete($f->{id}) or die "delete failed";
```

Builtins: `ai_file_upload`, `ai_file_list`, `ai_file_get`, `ai_file_delete`. All return native hashrefs/arrayrefs (JSON → PerlValue conversion is recursive).

### Phase 5 — Composition (months 10-12)

- Web framework integration polished (streaming SSE handlers, structured-output endpoints).
- Cluster dispatch over `ai` calls.
- Cost ceilings and budgets.
- Public benchmark suite (latency per provider, tokens/sec, cost-per-1K-calls).

## Non-Goals

- LangChain compatibility. Stryke is the framework; we don't wrap a Python framework.
- Vendor-specific exhaustive APIs. Each provider exposes its full feature surface through namespaced options, but core primitives (`ai`, `prompt`, `embed`, `chat`) stay uniform.
- Hosted vector DB. Local sqlite-vec ships in-binary; users bring their own remote vector DB (pgvector, Pinecone, etc.) through normal SQL/HTTP.
- Auto-prompt-engineering. `ai` runs the prompt as written. No hidden rewrites, no "we'll improve your prompt for you" surprises. Reproducibility over cleverness.
- Visual agent builders / no-code UIs. Stryke is a programming language; the agent IS the code.

## Open Questions

1. **Sigil syntax for AI calls.** Should `ai` always be called as a function, or should there be a sigil form (e.g. `&"summarize this"`) for the most-common case? Trade-off: terseness vs. parser ambiguity. Default position: function form only, `~>` thread-macro is the terse path.
2. **Streaming default.** Should `ai` return `Stream<Str>` by default and require collection (`ai $p .collect`) for `Str`, or return `Str` by default? Default position: context-sensitive (scalar context = `Str`, iter context = `Stream<Str>`).
3. **Effect type granularity.** Is `Effect::AI` one effect, or split into `Effect::LLM`, `Effect::Embed`, `Effect::ToolCall`? Default position: one effect with a discriminator on the operation, handlers can pattern-match.
4. **Local model packaging.** Embed in every binary unconditionally (~2-4GB), or opt-in via `[ai.local].embed = true`? Default position: opt-in; small dev binaries by default, full local-capable binaries for offline use cases.
5. **Cost model honesty.** Should `ai_cost` be wall-clock USD as billed, or token-counted estimate? Default position: estimate based on token counts, reconciled with provider invoice when the run completes.

## Resolved Decisions

- **`ai` is the builtin name.** Two letters, ubiquitous, used like `print`. Resolved 2026-04-26.
- **Three invocation forms.** Function call, thread macro `~>`, pipe `|>`. All compile to the same primitive. Resolved 2026-04-26.
- **`tool fn` for marking agent-callable functions.** Build-time schema/description generation, no manual JSON schema writing. Resolved 2026-04-26.
- **MCP-native.** `mcp_server` block + `mcp_connect` for clients. Connected MCP servers auto-attach to `ai`. Resolved 2026-04-26.
- **Provider-agnostic, Anthropic-first.** Uniform interface across providers; provider-specific options through namespaced extensions. Local llama.cpp fallback as a first-class option. Resolved 2026-04-26.
- **Cost-aware by construction.** Caching, batching, parallelism, ceilings, introspection — runtime concerns, not user concerns. Resolved 2026-04-26.

## The Pitch on One Line

> *Every other language ships AI as a library. Stryke ships AI as a primitive. Two letters, unlimited power, single-binary deployment. The language designed for the work that matters in this era.*