<p align="center">
  <h1 align="center">Blazen</h1>
  <p align="center">Event-driven AI workflow engine with first-class LLM integration.<br/>Written in Rust. Native bindings for Python, TypeScript, and WebAssembly.</p>
</p>

<p align="center">
  <a href="https://crates.io/crates/blazen"><img alt="crates.io" src="https://img.shields.io/crates/v/blazen.svg?style=flat-square&logo=rust&label=crates.io" /></a>
  <a href="https://pypi.org/project/blazen/"><img alt="PyPI" src="https://img.shields.io/pypi/v/blazen.svg?style=flat-square&logo=python&label=PyPI" /></a>
  <a href="https://www.npmjs.com/package/blazen"><img alt="npm" src="https://img.shields.io/npm/v/blazen.svg?style=flat-square&logo=npm&label=npm" /></a>
  <a href="https://www.npmjs.com/package/@blazen/sdk"><img alt="npm wasm" src="https://img.shields.io/npm/v/@blazen/sdk.svg?style=flat-square&logo=webassembly&label=wasm" /></a>
  <a href="https://github.com/ZachHandley/Blazen/blob/main/LICENSE"><img alt="License: AGPL-3.0" src="https://img.shields.io/badge/license-AGPL--3.0-blue?style=flat-square" /></a>
</p>

---

## Features

- **Event-driven architecture** -- Type-safe events connect workflow steps with zero boilerplate via derive macros (Rust), subclassing (Python), or plain objects (TypeScript)
- **15+ LLM providers** -- OpenAI, Anthropic, Gemini, Azure, OpenRouter, Groq, Together AI, Mistral, DeepSeek, Fireworks, Perplexity, xAI, Cohere, AWS Bedrock, and fal.ai -- with streaming, tool calling, structured output, and multimodal support
- **Multi-workflow pipelines** -- Orchestrate sequential and parallel stages with pause/resume and per-workflow streaming
- **Branching and fan-out** -- Conditional branching, parallel fan-out, and real-time streaming within workflows
- **Native Python and TypeScript bindings** -- Python via PyO3/maturin, Node.js/TypeScript via napi-rs. Not wrappers around HTTP -- actual compiled Rust running in-process
- **WebAssembly SDK** -- Run Blazen in the browser, edge workers, Deno, and embedded runtimes via `@blazen/sdk`. Same Rust core compiled to WASM
- **Prompt management** -- Versioned prompt templates with `{{variable}}` interpolation, YAML/JSON registries, and multimodal attachments (see the sketch after this list)
- **Persistence** -- Embedded persistence via redb, or bring-your-own via callbacks. Pause a workflow, serialize state to JSON, resume later
- **Identity-preserving live state** -- Pass DB connections, Pydantic models, and other live objects through events and the new `ctx.state` / `ctx.session` namespaces. `StopEvent(result=obj)` round-trips non-JSON Python values with `is`-identity preserved -- the engine no longer silently stringifies unpicklable results
- **Typed error hierarchies** -- Both Python and Node ship a full subclass tree (`BlazenError` plus ~87 leaves like `RateLimitError`, `LlamaCppError`, `MistralRsError`, `CandleLlmError`, `WhisperCppError`, `PiperError`, `DiffusionError`) so callers can write idiomatic `except RateLimitError` / `catch (e instanceof RateLimitError)` instead of string-matching messages
- **Bindings parity** -- `tools/audit_bindings.py` walks every public Rust symbol across all `blazen-*` crates and verifies the Python, Node, and WASM-SDK surfaces mirror it 1:1. The current report is `0 / 0 / 0` gaps, and CI fails on regression, so the bindings stay in lockstep with the Rust core
- **Observability** -- OpenTelemetry spans (OTLP gRPC and OTLP HTTP, the latter wasm-eligible), Prometheus metrics, and Langfuse all ship as opt-in features in `blazen-telemetry` -- enable an exporter, point it at your collector, and every step, LLM call, and pipeline stage is instrumented automatically
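
As a concrete taste of the prompt-management item above, here is a minimal sketch of rendering a versioned template with `{{variable}}` interpolation. The `PromptTemplate` name and its methods are illustrative assumptions for this README, not the published API; see the docs for the actual surface.

```typescript
// Hypothetical sketch -- PromptTemplate and render() are assumed names, not the published API.
import { PromptTemplate } from "blazen";

const summarize = new PromptTemplate(
  "summarize", // template name, versioned in a YAML/JSON registry
  "Summarize the following text as {{style}}:\n\n{{text}}",
);

// {{variable}} placeholders are filled from a plain object at render time.
const prompt = summarize.render({
  style: "three bullet points",
  text: "Blazen is an event-driven AI workflow engine written in Rust.",
});
```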

## Installation

**Rust:**

```bash
cargo add blazen
```

**Python** (requires Python 3.9+):

```bash
uv add blazen       # recommended
pip install blazen   # also works
```

**Node.js / TypeScript:**

```bash
pnpm add blazen
```

**WebAssembly** (browser, edge, Deno, Cloudflare Workers):

```bash
npm install @blazen/sdk
```

## Quick Start

### Rust

```rust
use blazen::prelude::*;

#[derive(Debug, Clone, Serialize, Deserialize, Event)]
struct GreetEvent {
    name: String,
}

#[step]
async fn parse_input(event: StartEvent, _ctx: Context) -> Result<GreetEvent, WorkflowError> {
    let name = event.data["name"].as_str().unwrap_or("World").to_string();
    Ok(GreetEvent { name })
}

#[step]
async fn greet(event: GreetEvent, _ctx: Context) -> Result<StopEvent, WorkflowError> {
    Ok(StopEvent {
        result: serde_json::json!({ "greeting": format!("Hello, {}!", event.name) }),
    })
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let workflow = WorkflowBuilder::new("greeter")
        .step(parse_input_registration())
        .step(greet_registration())
        .build()?;

    let handler = workflow.run(serde_json::json!({ "name": "Zach" })).await?;
    let result = handler.result().await?;

    if let Some(stop) = result.event.downcast_ref::<StopEvent>() {
        println!("{}", stop.result); // {"greeting": "Hello, Zach!"}
    }
    Ok(())
}
```

### Python

```python
from blazen import Workflow, step, Event, StartEvent, StopEvent, Context

class GreetEvent(Event):
    name: str

@step
async def parse_input(ctx: Context, ev: StartEvent) -> GreetEvent:
    return GreetEvent(name=ev.name or "World")

@step
async def greet(ctx: Context, ev: GreetEvent) -> StopEvent:
    return StopEvent(result={"greeting": f"Hello, {ev.name}!"})

async def main():
    wf = Workflow("greeter", [parse_input, greet])
    handler = await wf.run(name="Zach")
    result = await handler.result()
    print(result.to_dict())  # {"result": {"greeting": "Hello, Zach!"}}

import asyncio
asyncio.run(main())
```

### TypeScript

```typescript
import { Workflow } from "blazen";

const workflow = new Workflow("greeter");

workflow.addStep("parse_input", ["blazen::StartEvent"], async (event, ctx) => {
  const name = event.name ?? "World";
  return { type: "GreetEvent", name };
});

workflow.addStep("greet", ["GreetEvent"], async (event, ctx) => {
  return {
    type: "blazen::StopEvent",
    result: { greeting: `Hello, ${event.name}!` },
  };
});

const result = await workflow.run({ name: "Zach" });
console.log(result.data); // { greeting: "Hello, Zach!" }
```

### Cloudflare Workers

Blazen runs the full workflow engine inside Cloudflare Workers via `@blazen/sdk`. Multi-step LLM workflows, agents, and pipelines all execute on workerd -- Cloudflare's production runtime -- with no special configuration beyond `wasm-pack build --target web --release` and passing the compiled `WebAssembly.Module` to `initSync` at module load.

```typescript
import { initSync, Workflow } from "@blazen/sdk";
// Wrangler resolves `*.wasm` imports as `WebAssembly.Module` instances.
import wasmModule from "@blazen/sdk/blazen_wasm_sdk_bg.wasm";

initSync({ module: wasmModule as WebAssembly.Module });

export default {
  async fetch(): Promise<Response> {
    const wf = new Workflow("greeter");

    wf.addStep("parse", ["blazen::StartEvent"], (event: any) => ({
      type: "GreetEvent",
      name: event?.data?.name ?? "World",
    }));

    wf.addStep("greet", ["GreetEvent"], (event: any) => ({
      type: "StopEvent",
      result: { greeting: `Hello, ${event.name}!` },
    }));

    const result = await wf.run({});
    return Response.json(result);
  },
};
```

A complete runnable setup -- `wrangler.toml`, `vitest` integration test exercising the worker against a real `workerd` instance, and the `wasm-pack` build wiring -- lives in [`examples/cloudflare-worker/`](examples/cloudflare-worker/). CI builds and tests it on every push, so the Workers target is a supported deployment surface, not aspirational.

Note: Cloudflare Workers cap CPU time per request (10ms on the free plan, up to 30s on paid plans). Long-running multi-call LLM flows should either fit within those limits, be split across requests using Blazen's pause/resume snapshots, or run on the WASIp2 component (`blazen-wasm`) for ZLayer edge deployment without the per-request cap.
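
For the snapshot route, a rough sketch of persisting state between requests might look like the following. It leans on the handler and workflow methods listed in the WASM SDK section below (`runWithHandler`, `snapshot`, `resumeWithSerializableRefs`); the exact signatures, the pause trigger, and the `SNAPSHOTS` KV binding are assumptions for illustration, not verified API.

```typescript
// Sketch only: method names come from the WASM SDK section below; signatures, the pause
// trigger, and the SNAPSHOTS KV binding (declared in wrangler.toml, typed via
// @cloudflare/workers-types) are illustrative assumptions.
import { initSync, Workflow } from "@blazen/sdk";
import wasmModule from "@blazen/sdk/blazen_wasm_sdk_bg.wasm";

initSync({ module: wasmModule as WebAssembly.Module });

interface Env {
  SNAPSHOTS: KVNamespace;
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const wf = new Workflow("long-flow");
    // ...addStep wiring as in the worker example above...

    const key = new URL(req.url).pathname;
    const stored = await env.SNAPSHOTS.get(key, "json");

    if (stored) {
      // Follow-up request: rebuild the workflow and resume from the stored snapshot.
      const result = await wf.resumeWithSerializableRefs(stored, {});
      return Response.json(result);
    }

    // First request: start the flow, snapshot once it pauses (e.g. via a session
    // pause policy), and persist the JSON snapshot for the next request.
    const handler = await wf.runWithHandler({ topic: "multi-call LLM flow" });
    const snapshot = await handler.snapshot();
    await env.SNAPSHOTS.put(key, JSON.stringify(snapshot));
    return Response.json({ status: "paused" });
  },
};
```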

### WASM SDK feature parity

`@blazen/sdk` is no longer the "lite" sibling. It now matches the Node binding for every workflow, pipeline, and handler primitive that makes sense in a browser or Worker:

- **Pipelines** -- `input_mapper`, `condition`, `onPersist`, and `onPersistJson` callbacks for sequential and parallel stages
- **Workflows** -- `setSessionPausePolicy`, `runStreaming(input, onEvent)`, `runWithHandler(input)`, and `resumeWithSerializableRefs(snapshot, refs)`
- **Handlers** -- `respondToInput`, `snapshot`, `resumeInPlace`, `streamEvents(callback)`, and `abort` on the returned handle
- **Context** -- session-ref serialization round-trips opaque host values across pause/resume the same way the Node and Python bindings do
- **In-browser embeddings** -- `TractEmbedModel.create(modelUrl, tokenizerUrl)` loads an ONNX embedding model and a HuggingFace tokenizer from URLs and runs inference on the CPU via `tract`, so RAG and semantic-memory flows work in the browser with no server round-trip (sketched below)

If a workflow runs against the Node binding, the same code path runs under `@blazen/sdk` -- the only differences are the runtime-specific wiring (Node `fs` vs. browser `fetch`).
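
For the in-browser embeddings item, a minimal sketch looks like the following. `TractEmbedModel.create(modelUrl, tokenizerUrl)` is the constructor named above; the default `init` export, the `embed()` method name and return shape, the model/tokenizer URLs, and the cosine helper are assumptions for illustration.

```typescript
// Sketch: create(modelUrl, tokenizerUrl) is from the list above; init(), embed(),
// and the URLs are assumed for illustration.
import init, { TractEmbedModel } from "@blazen/sdk";

await init(); // browser/bundler init; Workers use initSync (see the worker example)

// Load an ONNX embedding model and its HuggingFace tokenizer from URLs.
const model = await TractEmbedModel.create(
  "/models/all-MiniLM-L6-v2.onnx",
  "/models/tokenizer.json",
);

// Embed a query and a document, then compare by cosine similarity -- a tiny
// RAG-style relevance check that runs entirely on the client CPU.
const query = await model.embed("how do I resume a paused workflow?");
const doc = await model.embed("Workflows can be paused, snapshotted, and resumed later.");

const cosine = (a: number[], b: number[]) =>
  a.reduce((s, v, i) => s + v * b[i], 0) / (Math.hypot(...a) * Math.hypot(...b));
console.log("similarity:", cosine(query, doc));
```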

## LLM Integration

Every provider implements the same `CompletionModel` trait/interface. Switch providers by changing one line.

### Rust

```rust
use blazen_llm::{CompletionModel, CompletionRequest, ChatMessage};
use blazen_llm::providers::openai::OpenAiProvider;

let model = OpenAiProvider::new("sk-...");
let request = CompletionRequest::new(vec![
    ChatMessage::user("What is the meaning of life?"),
]);
let response = model.complete(request).await?;
println!("{}", response.content.unwrap_or_default());
```

Use any OpenAI-compatible provider with `OpenAiCompatProvider`:

```rust
use blazen_llm::providers::openai_compat::OpenAiCompatProvider;

let groq = OpenAiCompatProvider::groq("gsk-...");
let openrouter = OpenAiCompatProvider::openrouter("sk-or-...");
let together = OpenAiCompatProvider::together("...");
let deepseek = OpenAiCompatProvider::deepseek("...");
```

### Python

```python
from blazen import CompletionModel, ChatMessage, Role, CompletionResponse, ProviderOptions

model = CompletionModel.openai(options=ProviderOptions(api_key="sk-..."))
# or: CompletionModel.anthropic(options=ProviderOptions(api_key="sk-ant-..."))
# or: CompletionModel.groq(options=ProviderOptions(api_key="gsk-..."))
# or: CompletionModel.openrouter(options=ProviderOptions(api_key="sk-or-..."))
# or with env vars: CompletionModel.openai()

response: CompletionResponse = await model.complete([
    ChatMessage.system("You are helpful."),
    ChatMessage.user("What is the meaning of life?"),
])
print(response.content)        # typed attribute access
print(response.model)          # model name used
print(response.usage)          # TokenUsage with .prompt_tokens, .completion_tokens, .total_tokens
print(response.finish_reason)
```

### TypeScript

```typescript
import { CompletionModel, ChatMessage, Role } from "blazen";
import type { CompletionResponse } from "blazen";

const model = CompletionModel.openai({ apiKey: "sk-..." });
// or: CompletionModel.anthropic({ apiKey: "sk-ant-..." })
// or: CompletionModel.groq({ apiKey: "gsk-..." })
// or: CompletionModel.openrouter({ apiKey: "sk-or-..." })
// or with env vars: CompletionModel.openai()

const response: CompletionResponse = await model.complete([
  ChatMessage.system("You are helpful."),
  ChatMessage.user("What is the meaning of life?"),
]);
console.log(response.content);      // string
console.log(response.model);        // model name used
console.log(response.usage);        // { promptTokens, completionTokens, totalTokens }
console.log(response.finishReason);
```

## Streaming

Steps can publish intermediate events to an external stream via `write_event_to_stream` on the context. Consumers subscribe before awaiting the final result.

### Rust

```rust
// ProgressEvent is a user-defined event, declared like GreetEvent in the Quick Start.
#[derive(Debug, Clone, Serialize, Deserialize, Event)]
struct ProgressEvent {
    step: u32,
}

#[step]
async fn process(_event: StartEvent, ctx: Context) -> Result<StopEvent, WorkflowError> {
    for i in 0..3 {
        ctx.write_event_to_stream(ProgressEvent { step: i }).await;
    }
    Ok(StopEvent { result: serde_json::json!({"done": true}) })
}

// Consumer side:
let handler = workflow.run(input).await?;
let mut stream = handler.stream_events();
while let Some(event) = stream.next().await {
    println!("got: {:?}", event.event_type_id());
}
let result = handler.result().await?;
```

### Python

```python
@step
async def process(ctx: Context, ev: StartEvent) -> StopEvent:
    for i in range(3):
        ctx.write_event_to_stream(Event("ProgressEvent", step=i))
    return StopEvent(result={"done": True})

# Consumer side:
handler = await wf.run(message="go")
async for event in handler.stream_events():
    print(event.event_type, event.to_dict())
result = await handler.result()
```

### TypeScript

```typescript
// Using runStreaming with a callback:
const result = await workflow.runStreaming({ message: "go" }, (event) => {
  console.log("stream:", event.type, event);
});

// Or using the handler API:
const handler = await workflow.runWithHandler({ message: "go" });
await handler.streamEvents((event) => {
  console.log("stream:", event.type, event);
});
const result = await handler.result();
```

## Crate / Package Structure

| Crate | Description |
|-------|-------------|
| `blazen` | Umbrella crate re-exporting everything |
| `blazen-events` | Core event traits, `StartEvent`, `StopEvent`, `DynamicEvent`, and derive macro support |
| `blazen-macros` | `#[derive(Event)]` and `#[step]` proc macros |
| `blazen-core` | Workflow engine, context, step registry, pause/resume, and snapshots |
| `blazen-llm` | LLM provider abstraction -- `CompletionModel`, `StructuredOutput`, `EmbeddingModel`, `Tool` |
| `blazen-pipeline` | Multi-workflow pipeline orchestrator with sequential/parallel stages |
| `blazen-prompts` | Prompt template management with versioning and YAML/JSON registries |
| `blazen-memory` | Memory and vector store with LSH-based approximate nearest-neighbor retrieval |
| `blazen-memory-valkey` | Valkey/Redis backend for `blazen-memory` |
| `blazen-persist` | Optional persistence layer (redb) |
| `blazen-telemetry` | Observability: OpenTelemetry spans, Prometheus metrics, Langfuse, and LLM call history |
| `blazen-py` | Python bindings via PyO3/maturin (published to PyPI as `blazen`) |
| `blazen-node` | Node.js/TypeScript bindings via napi-rs (published to npm as `blazen`) |
| [`blazen-wasm-sdk`](crates/blazen-wasm-sdk/) | TypeScript/JS client SDK via WebAssembly (published to npm as `@blazen/sdk`) |
| [`blazen-wasm`](crates/blazen-wasm/) | WASIp2 WASM component for ZLayer edge deployment |
| `blazen-cli` | CLI tool for scaffolding projects (`blazen init`) |

## Supported LLM Providers

| Provider | Constructor | Default Model |
|----------|-------------|---------------|
| OpenAI | `OpenAiProvider::new` / `.openai()` | `gpt-4.1` |
| Anthropic | `AnthropicProvider::new` / `.anthropic()` | `claude-sonnet-4-5-20250929` |
| Google Gemini | `GeminiProvider::new` / `.gemini()` | `gemini-2.5-flash` |
| Azure OpenAI | `AzureOpenAiProvider::new` / `.azure()` | (deployment-specific) |
| OpenRouter | `.openrouter()` | `openai/gpt-4.1` |
| Groq | `.groq()` | `llama-3.3-70b-versatile` |
| Together AI | `.together()` | `meta-llama/Llama-3.3-70B-Instruct-Turbo` |
| Mistral | `.mistral()` | `mistral-large-latest` |
| DeepSeek | `.deepseek()` | `deepseek-chat` |
| Fireworks | `.fireworks()` | `accounts/fireworks/models/llama-v3p3-70b-instruct` |
| Perplexity | `.perplexity()` | `sonar-pro` |
| xAI (Grok) | `.xai()` | `grok-3` |
| Cohere | `.cohere()` | `command-a-08-2025` |
| AWS Bedrock | `.bedrock()` | `anthropic.claude-sonnet-4-5-20250929-v1:0` |
| fal.ai | `FalProvider::new` / `.fal()` | (image generation) |

All OpenAI-compatible providers are accessible through `OpenAiCompatProvider` in Rust, or through static factory methods on `CompletionModel` in Python and TypeScript.

## Typed Errors

Every error the engine, the LLM layer, or a backend can raise has a dedicated subclass in both Python and Node, so callers branch on type instead of parsing strings. The hierarchy is rooted at `BlazenError` (extending the host language's base `Error` / `Exception`) and fans out to ~87 leaves covering provider failures (`RateLimitError`, `AuthError`, `ContextLengthError`), local-inference backends (`LlamaCppError`, `MistralRsError`, `CandleLlmError`, `WhisperCppError`, `PiperError`, `DiffusionError`), persistence (`PersistError`, `SnapshotError`), and workflow control flow (`StepNotFoundError`, `EventTypeMismatchError`, `WorkflowAbortedError`).

```python
from blazen import CompletionModel, RateLimitError, AuthError, BlazenError

try:
    response = await model.complete(messages)
except RateLimitError as e:
    await asyncio.sleep(e.retry_after or 5)
except AuthError:
    rotate_api_key()
except BlazenError as e:
    log.exception("blazen failure", err=e)
```

```typescript
import { RateLimitError, AuthError, BlazenError } from "blazen";

try {
  const response = await model.complete(messages);
} catch (e) {
  if (e instanceof RateLimitError) await sleep(e.retryAfter ?? 5_000);
  else if (e instanceof AuthError) rotateApiKey();
  else if (e instanceof BlazenError) log.error("blazen failure", e);
  else throw e;
}
```

## Telemetry Exporters

`blazen-telemetry` ships four exporters as opt-in Cargo features. Enable the ones you need; the rest stay out of your binary.

| Exporter | Feature flag | Notes |
|----------|--------------|-------|
| OTLP gRPC | `otlp-grpc` | Standard tonic-based exporter for native deployments |
| OTLP HTTP | `otlp-http` | Pure-`reqwest` exporter; works under `wasm32` for browser/Worker telemetry |
| Langfuse | `langfuse` | Native Langfuse trace and observation API for LLM-call attribution |
| Prometheus | `prometheus` | Pull-based metrics endpoint for token counts, step latency, and pipeline stage timings |

All exporters share the same `TelemetryConfig` and per-exporter config structs (`OtlpConfig`, `LangfuseConfig`, `PrometheusConfig`), so swapping backends is a config change, not a code rewrite.

## Documentation

Full documentation, guides, and API reference are available at **[blazen.dev/docs/getting-started/introduction](https://blazen.dev/docs/getting-started/introduction)**.

## License

This project is licensed under the [GNU Affero General Public License v3.0](https://www.gnu.org/licenses/agpl-3.0.en.html) (AGPL-3.0).

## Author

Built by [Zach Handley](https://github.com/ZachHandley).