Features
- Event-driven architecture -- Type-safe events connect workflow steps with zero boilerplate via derive macros (Rust), subclassing (Python), or plain objects (TypeScript)
- 15+ LLM providers -- OpenAI, Anthropic, Gemini, Azure, OpenRouter, Groq, Together AI, Mistral, DeepSeek, Fireworks, Perplexity, xAI, Cohere, AWS Bedrock, and fal.ai -- with streaming, tool calling, structured output, and multimodal support
- Content handles for tools -- Tools accept multimodal inputs (image, audio, video, document, 3D, CAD) via typed content handles backed by a pluggable ContentStore (in-memory, local-file, OpenAI Files, Anthropic Files, Gemini Files, fal.ai storage, or your own). Tool results now carry multimodal payloads on every provider, not just Anthropic
- Multi-workflow pipelines -- Orchestrate sequential and parallel stages with pause/resume and per-workflow streaming
- Branching and fan-out -- Conditional branching, parallel fan-out, and real-time streaming within workflows
- Native Python and TypeScript bindings -- Python via PyO3/maturin, Node.js/TypeScript via napi-rs. Not wrappers around HTTP -- actual compiled Rust running in-process
- WebAssembly SDK -- Run Blazen in the browser, edge workers, Deno, and embedded runtimes via @blazen/sdk. Same Rust core compiled to WASM
- Prompt management -- Versioned prompt templates with {{variable}} interpolation, YAML/JSON registries, and multimodal attachments
- Persistence -- Embedded persistence via redb, or bring-your-own via callbacks. Pause a workflow, serialize state to JSON, resume later
- Identity-preserving live state -- Pass DB connections, Pydantic models, and other live objects through events and the new ctx.state / ctx.session namespaces. StopEvent(result=obj) round-trips non-JSON Python values with is-identity preserved -- the engine no longer silently stringifies unpicklable results
- Typed error hierarchies -- Both Python and Node ship a full subclass tree (BlazenError plus ~87 leaves like RateLimitError, LlamaCppError, MistralRsError, CandleLlmError, WhisperCppError, PiperError, DiffusionError) so callers can write idiomatic except RateLimitError / catch (e instanceof RateLimitError) instead of string-matching messages
- Bindings parity -- tools/audit_bindings.py walks every public Rust symbol across all blazen-* crates and verifies the Python, Node, and WASM-SDK surfaces mirror it 1:1. The current report is 0 / 0 / 0 gaps, and CI fails on regression, so the bindings stay in lockstep with the Rust core
- Observability -- OpenTelemetry spans (OTLP gRPC and OTLP HTTP, the latter wasm-eligible), Prometheus metrics, and Langfuse all ship as opt-in features in blazen-telemetry -- enable an exporter, point it at your collector, and every step, LLM call, and pipeline stage is instrumented automatically
Installation
Rust:
Python (requires Python 3.9+):
Node.js / TypeScript:
WebAssembly (browser, edge, Deno, Cloudflare Workers):
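Assuming the standard package manager for each target and the package names published for the bindings (crates.io blazen, PyPI blazen, npm blazen, and npm @blazen/sdk), installation is typically:

cargo add blazen            # Rust
pip install blazen          # Python
npm install blazen          # Node.js / TypeScript
npm install @blazen/sdk     # WebAssembly SDK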
Quick Start
Rust
use blazen::prelude::*;

// Sketch only: the step signatures and workflow wiring shown here are
// illustrative. Events derive Event; steps are async fns marked #[step]
// (see blazen-macros).
#[derive(Event)]
struct GreetEvent {
    name: String,
}

#[step]
async fn parse_input(event: StartEvent) -> GreetEvent {
    GreetEvent { name: event.get("name").unwrap_or_else(|| "World".to_string()) }
}

#[step]
async fn greet(event: GreetEvent) -> StopEvent {
    StopEvent::with_result(format!("Hello, {}!", event.name))
}

// Register both steps on a Workflow, then:
//   let handler = workflow.run(input).await?;
//   let result = handler.result().await?;   // {"greeting": "Hello, Zach!"}
Python
from blazen import Event, StartEvent, StopEvent, Workflow, step

# Sketch only -- decorator and signature details are illustrative.
class GreetEvent(Event):
    name: str

class GreeterWorkflow(Workflow):
    @step
    async def parse_input(self, event: StartEvent) -> GreetEvent:
        return GreetEvent(name=event.get("name", "World"))

    @step
    async def greet(self, event: GreetEvent) -> StopEvent:
        return StopEvent(result={"greeting": f"Hello, {event.name}!"})

workflow = GreeterWorkflow()
handler = await workflow.run(name="Zach")
result = await handler.result()
# {"result": {"greeting": "Hello, Zach!"}}
TypeScript
import { Workflow } from "blazen";
const workflow = new Workflow("greeter");
workflow.addStep("parse_input", ["blazen::StartEvent"], async (event, ctx) => {
const name = event.name ?? "World";
return { type: "GreetEvent", name };
});
workflow.addStep("greet", ["GreetEvent"], async (event, ctx) => {
return {
type: "blazen::StopEvent",
result: { greeting: `Hello, ${event.name}!` },
};
});
const result = await workflow.run({ name: "Zach" });
console.log(result.data); // { greeting: "Hello, Zach!" }
Cloudflare Workers
Blazen runs the full workflow engine inside Cloudflare Workers via @blazen/sdk. Multi-step LLM workflows, agents, and pipelines all execute on workerd -- Cloudflare's production runtime -- with no special configuration beyond wasm-pack build --target web --release and passing the compiled WebAssembly.Module to initSync at module load.
import { initSync, Workflow } from "@blazen/sdk";
// Wrangler resolves `*.wasm` imports as `WebAssembly.Module` instances.
import wasmModule from "@blazen/sdk/blazen_wasm_sdk_bg.wasm";
initSync({ module: wasmModule as WebAssembly.Module });
export default {
async fetch(): Promise<Response> {
const wf = new Workflow("greeter");
wf.addStep("parse", ["blazen::StartEvent"], (event: any) => ({
type: "GreetEvent",
name: event?.data?.name ?? "World",
}));
wf.addStep("greet", ["GreetEvent"], (event: any) => ({
type: "StopEvent",
result: { greeting: `Hello, ${event.name}!` },
}));
const result = await wf.run({});
return Response.json(result);
},
};
A complete runnable setup -- wrangler.toml, vitest integration test exercising the worker against a real workerd instance, and the wasm-pack build wiring -- lives in examples/cloudflare-worker/. CI builds and tests it on every push, so the Workers target is a supported deployment surface, not aspirational.
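For orientation only, a minimal wrangler.toml for a Worker like this tends to look as follows; the values are placeholders, and the config checked into examples/cloudflare-worker/ is the authoritative one.

name = "blazen-greeter"
main = "src/index.ts"
compatibility_date = "2024-09-01"
# Wrangler's default module rules resolve *.wasm imports as CompiledWasm modules,
# which is what lets the `@blazen/sdk/blazen_wasm_sdk_bg.wasm` import above resolve.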
Note: Cloudflare Workers cap CPU time per request (10ms on the free plan, up to 30s on paid plans). Long-running multi-call LLM flows should either fit within those limits, be split across requests using Blazen's pause/resume snapshots, or run on the WASIp2 component (blazen-wasm) for ZLayer edge deployment without the per-request cap.
WASM SDK feature parity
@blazen/sdk is no longer the "lite" sibling. It now matches the Node binding for every workflow, pipeline, and handler primitive that makes sense in a browser or Worker:
- Pipelines -- input_mapper, condition, onPersist, and onPersistJson callbacks for sequential and parallel stages
- Workflows -- setSessionPausePolicy, runStreaming(input, onEvent), runWithHandler(input), and resumeWithSerializableRefs(snapshot, refs)
- Handlers -- respondToInput, snapshot, resumeInPlace, streamEvents(callback), and abort on the returned handle
- Context -- session-ref serialization round-trips opaque host values across pause/resume the same way the Node and Python bindings do
- In-browser embeddings -- TractEmbedModel.create(modelUrl, tokenizerUrl) loads an ONNX embedding model and a HuggingFace tokenizer from URLs and runs inference on the CPU via tract, so RAG and semantic-memory flows work in the browser with no server round-trip
If a workflow runs against the Node binding, the same code path runs under @blazen/sdk -- the only differences are the runtime-specific wiring (Node fs vs. browser fetch).
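As a rough sketch of how these primitives compose -- for example, splitting one long run across two Worker requests as suggested in the Cloudflare note above -- the following assumes snapshot() returns a JSON-serializable value and that resumeWithSerializableRefs() hands back a handler; only the method names come from the list above.

import { Workflow } from "@blazen/sdk";

// Start a run, capture its state, and persist it somewhere durable.
export async function startRun(wf: Workflow, input: unknown): Promise<unknown> {
  const handler = await wf.runWithHandler(input);
  const snapshot = await handler.snapshot(); // serializable workflow state
  // store `snapshot` in KV, a Durable Object, or a database ...
  return snapshot;
}

// ... then pick the run back up in a later request.
export async function resumeRun(wf: Workflow, snapshot: unknown): Promise<unknown> {
  const handler = await wf.resumeWithSerializableRefs(snapshot, {});
  return handler.result();
}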
LLM Integration
Every provider implements the same CompletionModel trait/interface. Switch providers by changing one line.
Rust
use blazen::llm::{ChatMessage, CompletionModel, CompletionRequest};
use blazen::llm::providers::OpenAiProvider;

// Module paths and request-construction details here are illustrative.
let model = OpenAiProvider::new("sk-...");
let request = CompletionRequest::new(vec![
    ChatMessage::system("You are helpful."),
    ChatMessage::user("What is the meaning of life?"),
]);
let response = model.complete(request).await?;
println!("{}", response.content);
Use any OpenAI-compatible provider with OpenAiCompatProvider:
use blazen::llm::providers::OpenAiCompatProvider; // path illustrative

// Each constructor targets that provider's OpenAI-compatible endpoint.
let groq = OpenAiCompatProvider::groq("gsk-...");
let openrouter = OpenAiCompatProvider::openrouter("sk-or-...");
let together = OpenAiCompatProvider::together("...");
let deepseek = OpenAiCompatProvider::deepseek("...");
Python
from blazen import ChatMessage, CompletionModel, CompletionResponse, ProviderOptions

model = CompletionModel.openai(options=ProviderOptions(api_key="sk-..."))
# or: CompletionModel.anthropic(options=ProviderOptions(api_key="sk-ant-..."))
# or: CompletionModel.groq(options=ProviderOptions(api_key="gsk-..."))
# or: CompletionModel.openrouter(options=ProviderOptions(api_key="sk-or-..."))
# or with env vars: CompletionModel.openai()

response: CompletionResponse = await model.complete([
    ChatMessage.system("You are helpful."),
    ChatMessage.user("What is the meaning of life?"),
])

print(response.content)  # typed attribute access
print(response.model)    # model name used
print(response.usage)    # TokenUsage with .prompt_tokens, .completion_tokens, .total_tokens
TypeScript
import { CompletionModel, ChatMessage, Role } from "blazen";
import type { CompletionResponse } from "blazen";
const model = CompletionModel.openai({ apiKey: "sk-..." });
// or: CompletionModel.anthropic({ apiKey: "sk-ant-..." })
// or: CompletionModel.groq({ apiKey: "gsk-..." })
// or: CompletionModel.openrouter({ apiKey: "sk-or-..." })
// or with env vars: CompletionModel.openai()
const response: CompletionResponse = await model.complete([
ChatMessage.system("You are helpful."),
ChatMessage.user("What is the meaning of life?"),
]);
console.log(response.content); // string
console.log(response.model); // model name used
console.log(response.usage); // { promptTokens, completionTokens, totalTokens }
console.log(response.finishReason);
Multimodal Tool I/O
Tools can declare typed multimodal inputs via the image_input, audio_input, file_input, three_d_input, cad_input, and video_input schema helpers, and return multimodal results by emitting an LlmPayload::Parts value mixing text, images, audio, video, documents, 3D meshes, and CAD geometry. Result payloads round-trip through every provider, not just Anthropic.
Inputs flow through a pluggable ContentStore. You register a blob, URL, or remote-file reference with the store and receive a stable handle id; the model sees that id in the tool's JSON schema and emits it back in the tool call. Blazen's runner resolves the handle against the store and substitutes the typed content into the tool arguments before the user-supplied handler executes -- handlers never deal with raw blob plumbing.
Rust
use blazen::llm::{ContentStore, InMemoryContentStore}; // paths illustrative
use blazen::llm::image_input;
use blazen::llm::ToolDefinition;
use std::sync::Arc;

// Sketch: store constructor, put() arguments, and ToolDefinition field names
// are illustrative.
let store: Arc<dyn ContentStore> = Arc::new(InMemoryContentStore::new());
let handle = store
    .put(png_bytes, /* kind = */ "image", /* mime type = */ "image/png")
    .await?;

// Declare a tool that accepts a content handle as its `photo` argument.
let tool = ToolDefinition {
    name: "describe_photo".into(),
    description: "Describe the supplied image".into(),
    parameters: image_input("photo", "The image to describe"),
    ..Default::default()
};

// The model emits {"photo": "<handle-id>"} as a tool call;
// Blazen's runner substitutes the resolved image content before
// the tool handler runs.
Python
# Constructor and argument names mirror the TypeScript binding; exact spelling may differ.
store = ContentStore.in_memory()
handle = await store.put(png_bytes, kind="image", mime_type="image/png")

# Tool declaration uses image_input() to advertise a content-ref input:
schema = image_input("photo", "The image to describe")
# -> {"type": "object", "properties": {"photo": {"type": "string", ..., "x-blazen-content-ref": {"kind": "image"}}}, "required": ["photo"]}
TypeScript
import { ContentStore, imageInput } from "blazen";
const store = ContentStore.inMemory();
const handle = await store.put(Buffer.from(pngBytes), {
kind: "image",
mimeType: "image/png",
});
// Tool input schema:
const schema = imageInput("photo", "The image to describe");
See docs/guides/tool-multimodal/ for the cross-cutting guide and docs/guides/{rust,python,node,wasm}/multimodal/ for per-language details.
Streaming
Steps can publish intermediate events to an external stream via write_event_to_stream on the context. Consumers subscribe before awaiting the final result.
Rust
// Inside a step (signature illustrative): publish an intermediate event.
async fn work(event: WorkEvent, ctx: &Context) -> FinishedEvent {
    ctx.write_event_to_stream(ProgressEvent { message: "working...".into() });
    FinishedEvent {}
}

// Consumer side:
let handler = workflow.run(input).await?;
let mut stream = handler.stream_events();
while let Some(event) = stream.next().await {
    println!("stream: {:?}", event);
}
let result = handler.result().await?;
Python
# Inside a step: call ctx.write_event_to_stream(ProgressEvent(...)), then
return FinishedEvent()

# Consumer side (method names illustrative):
handler = await workflow.run(message="go")
async for event in handler.stream_events():
    print("stream:", event)
result = await handler.result()
TypeScript
// Using runStreaming with a callback:
const result = await workflow.runStreaming({ message: "go" }, (event) => {
console.log("stream:", event.type, event);
});
// Or using the handler API:
const handler = await workflow.runWithHandler({ message: "go" });
await handler.streamEvents((event) => {
console.log("stream:", event.type, event);
});
const result = await handler.result();
Crate / Package Structure
| Crate | Description |
|---|---|
| blazen | Umbrella crate re-exporting everything |
| blazen-events | Core event traits, StartEvent, StopEvent, DynamicEvent, and derive macro support |
| blazen-macros | #[derive(Event)] and #[step] proc macros |
| blazen-core | Workflow engine, context, step registry, pause/resume, and snapshots |
| blazen-llm | LLM provider abstraction -- CompletionModel, StructuredOutput, EmbeddingModel, Tool |
| blazen-pipeline | Multi-workflow pipeline orchestrator with sequential/parallel stages |
| blazen-prompts | Prompt template management with versioning and YAML/JSON registries |
| blazen-memory | Memory and vector store with LSH-based approximate nearest-neighbor retrieval |
| blazen-memory-valkey | Valkey/Redis backend for blazen-memory |
| blazen-persist | Optional persistence layer (redb) |
| blazen-telemetry | Observability: OpenTelemetry spans, Prometheus metrics, Langfuse, and LLM call history |
| blazen-py | Python bindings via PyO3/maturin (published to PyPI as blazen) |
| blazen-node | Node.js/TypeScript bindings via napi-rs (published to npm as blazen) |
| blazen-wasm-sdk | TypeScript/JS client SDK via WebAssembly (published to npm as @blazen/sdk) |
| blazen-wasm | WASIp2 WASM component for ZLayer edge deployment |
| blazen-cli | CLI tool for scaffolding projects (blazen init) |
Supported LLM Providers
| Provider | Constructor | Default Model |
|---|---|---|
| OpenAI | OpenAiProvider::new / .openai() | gpt-4.1 |
| Anthropic | AnthropicProvider::new / .anthropic() | claude-sonnet-4-5-20250929 |
| Google Gemini | GeminiProvider::new / .gemini() | gemini-2.5-flash |
| Azure OpenAI | AzureOpenAiProvider::new / .azure() | (deployment-specific) |
| OpenRouter | .openrouter() | openai/gpt-4.1 |
| Groq | .groq() | llama-3.3-70b-versatile |
| Together AI | .together() | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Mistral | .mistral() | mistral-large-latest |
| DeepSeek | .deepseek() | deepseek-chat |
| Fireworks | .fireworks() | accounts/fireworks/models/llama-v3p3-70b-instruct |
| Perplexity | .perplexity() | sonar-pro |
| xAI (Grok) | .xai() | grok-3 |
| Cohere | .cohere() | command-a-08-2025 |
| AWS Bedrock | .bedrock() | anthropic.claude-sonnet-4-5-20250929-v1:0 |
| fal.ai | FalProvider::new / .fal() | (image generation) |
All OpenAI-compatible providers are accessible through OpenAiCompatProvider in Rust, or through static factory methods on CompletionModel in Python and TypeScript.
Typed Errors
Every error the engine, the LLM layer, or a backend can raise has a dedicated subclass in both Python and Node, so callers branch on type instead of parsing strings. The hierarchy is rooted at BlazenError (extending the host language's base Error / Exception) and fans out to ~87 leaves covering provider failures (RateLimitError, AuthError, ContextLengthError), local-inference backends (LlamaCppError, MistralRsError, CandleLlmError, WhisperCppError, PiperError, DiffusionError), persistence (PersistError, SnapshotError), and workflow control flow (StepNotFoundError, EventTypeMismatchError, WorkflowAbortedError).
try:
    response = await model.complete(messages)
except RateLimitError as e:
    await asyncio.sleep(e.retry_after or 5)  # retry_after attribute name is illustrative
except AuthError:
    rotate_api_key()
except BlazenError as e:
    log.error("blazen failure: %s", e)
import { RateLimitError, AuthError, BlazenError } from "blazen";
try {
const response = await model.complete(messages);
} catch (e) {
if (e instanceof RateLimitError) await sleep(e.retryAfter ?? 5_000);
else if (e instanceof AuthError) rotateApiKey();
else if (e instanceof BlazenError) log.error("blazen failure", e);
else throw e;
}
Telemetry Exporters
blazen-telemetry ships four exporters as opt-in Cargo features. Enable the ones you need; the rest stay out of your binary.
| Exporter | Feature flag | Notes |
|---|---|---|
| OTLP gRPC | otlp-grpc | Standard tonic-based exporter for native deployments |
| OTLP HTTP | otlp-http | Pure-reqwest exporter; works under wasm32 for browser/Worker telemetry |
| Langfuse | langfuse | Native Langfuse trace and observation API for LLM-call attribution |
| Prometheus | prometheus | Pull-based metrics endpoint for token counts, step latency, and pipeline stage timings |
All exporters share the same TelemetryConfig and per-exporter config structs (OtlpConfig, LangfuseConfig, PrometheusConfig), so swapping backends is a config change, not a code rewrite.
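For example, enabling the gRPC and Prometheus exporters is a one-line feature selection in Cargo.toml (version number illustrative):

[dependencies]
blazen-telemetry = { version = "0.1", features = ["otlp-grpc", "prometheus"] }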
Documentation
Full documentation, guides, and API reference are available at blazen.dev/docs/getting-started/introduction.
License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Author
Built by Zach Handley.