weavegraph 0.3.0

# Architecture Overview

Comprehensive technical documentation for Weavegraph's internal design and module organization.

**Related Documentation:**
- [Quickstart](QUICKSTART.md) - Core concepts, messages, state, and graphs
- [Operations Guide](OPERATIONS.md) - Event streaming, persistence, testing, and production
- [Documentation Index](INDEX.md) - Complete reference with anchor links

## 🎓 Project Background

Weavegraph originated as a capstone project for a Rust online course, developed by contributors with Python/TypeScript backgrounds and experience with LangGraph and LangChain. The goal was to bring similar graph-based workflow capabilities to Rust while leveraging its performance, safety, and concurrency advantages.

While rooted in educational exploration, Weavegraph continues active development well beyond the classroom setting. The core architecture is solid and the framework is functional, but as an early beta release (v0.2.x), it's still maturing; use with awareness of ongoing API evolution.


| Crate | Purpose | Highlights |
| ----- | ------- | ---------- |
| `weavegraph` | Executes concurrent, stateful graphs with structured observability. | Graph builder + runtime, event bus, checkpointing, reducers, scheduler. |
| `wg-ragsmith` | Provides ingestion, semantic chunking, and storage utilities for RAG workloads. | HTML/JSON parsers, semantic chunkers, SQLite vector store helpers. |


## Overview flowchart of the app (mermaid)

```mermaid
flowchart TB

subgraph Client
  user[Client App or UI]
end

subgraph Build
  gb[GraphBuilder]
end

subgraph Runtime
  app[App]
  sched[Scheduler]
  router[Router: Edges and Commands]
  barrier[Barrier Applier]
end

subgraph Nodes
  usernode[Custom User Nodes]
  llmnode[LLM Node]
  toolnode[Tool Node]
end

subgraph State
  vstate[Versioned State]
  snap[State Snapshot]
end

subgraph Reducers
  redreg[Reducer Registry]
end

subgraph Checkpoint
  cpif[Checkpointer: SQLite/InMemory]
end

subgraph EventBus
  eventbus[Event Bus with Sinks]
end

subgraph Rig
  rigad[Rig Adapter]
  llmprov[LLM Provider: Ollama/MCP]
end

subgraph Tools
  toolreg[Tool Registry]
  exttools[External Tools]
end

user --> gb
gb -->|compile| app

user -->|invoke/invoke_streaming| app

app --> sched
sched -->|creates| snap
vstate --> snap

sched --> usernode
sched --> llmnode
sched --> toolnode

usernode -->|NodePartial| barrier
llmnode -->|NodePartial| barrier
toolnode -->|NodePartial| barrier
redreg --> barrier
barrier -->|merges updates| vstate

snap --> router
app --> router
router -->|next frontier| sched

llmnode --> rigad
rigad --> llmprov
llmprov --> rigad
rigad --> llmnode

toolnode --> toolreg
toolnode --> exttools
exttools --> toolnode

barrier --> cpif

app --> eventbus
eventbus -->|streams events| user
```

---

## Workspace Topology

```
docs/                     → Architectural plans, production hardening roadmap.
weavegraph/               → Core orchestration crate (library + examples + tests).
wg-ragsmith/              → RAG utilities crate (library + examples + tests).
data/                     → Local development databases (ignored in version control).
external/                 → Vendor snapshots (RAGatouille, raptor) kept outside the workspace.
.github/workflows/        → Continuous integration pipelines.
ARCHITECTURE.md           → This document.
```

The workspace targets Rust 1.89 as the minimum supported version and enables 2024 edition
features across both crates.

---

## `weavegraph` Crate

`weavegraph` implements the runtime that powers concurrent, graph-based workflows. The library
is organised around a handful of core modules:

| Module | Highlights |
| ------ | ---------- |
| `graphs::{builder, edges, compilation}` | `GraphBuilder` DSL for wiring nodes, unconditional and conditional edges, and compiling into a runnable `App`. |
| `app` | High-level façade that owns compiled nodes/edges, reducer registry, and runtime config. Provides `invoke`, `invoke_streaming`, and event stream APIs. |
| `runtimes::{runner, checkpointer_*, runtime_config}` | `AppRunner` drives supersteps, coordinates the scheduler, applies barriers, and persists to SQLite (via `sqlx::migrate!`). |
| `schedulers` | Dependency-aware scheduler that fans out runnable nodes and enforces bounded concurrency. |
| `node` | `Node` trait, `NodeContext`, `NodePartial`, and error types used by application code. |
| `state`, `channels`, `reducers` | Versioned state model split across message/extra/error channels with deterministic merge reducers. |
| `event_bus` | Broadcast-based event hub with sinks (stdout, memory, channel, JSON Lines) and streaming helpers for web servers or CLIs. Events support JSON serialization for log aggregation. |
| `telemetry`, `utils` | Tracing helpers, deterministic RNG, clocks, ID generators, and collection utilities. |

### Authoring Nodes & State

Weavegraph applications revolve around three building blocks: nodes, state, and graphs.

> **Note:** `NodeKind::Start` and `NodeKind::End` are virtual structural endpoints.  
> You never register them with `add_node`; attempts to do so are ignored with a warning.  
> Define only your executable (custom) nodes and connect them with edges from `Start` and to `End`.

```rust
use weavegraph::{
    graphs::GraphBuilder,
    message::{Message, Role},
    node::{Node, NodeContext, NodePartial},
    state::VersionedState,
    types::NodeKind,
};
use async_trait::async_trait;

struct GreetingNode;

#[async_trait]
impl Node for GreetingNode {
    async fn run(
        &self,
        _snapshot: weavegraph::state::StateSnapshot,
        ctx: NodeContext,
    ) -> Result<NodePartial, weavegraph::node::NodeError> {
        ctx.emit("greeting", "Saying hi!")?;
        Ok(NodePartial::new().with_messages(vec![Message::with_role(
          Role::Assistant,
          "Hello!",
        )]))
    }
}

let app = GraphBuilder::new()
    .add_node(NodeKind::Custom("greet".into()), GreetingNode)
    .add_edge(NodeKind::Start, NodeKind::Custom("greet".into()))
    .add_edge(NodeKind::Custom("greet".into()), NodeKind::End)
    .compile()?;

let initial = VersionedState::new_with_user_message("Hi?");
let result = app.invoke(initial).await?;
```

**Key practices:**

- Prefer typed roles with `Message::with_role(Role::...)` - see [Messages](QUICKSTART.md#messages)
- Build state with `VersionedState::new_with_user_message` or the builder pattern - see [State Management](QUICKSTART.md#state)
- Use `NodeContext::emit*` helpers for telemetry instead of writing directly to stdout
- Return structured errors (`NodeError::MissingInput`, `NodeError::Provider`, `NodeError::Other`) or populate `NodePartial::with_errors` for recoverable issues - see [Error Handling](OPERATIONS.md#errors)

### Custom Reducers {#custom-reducers}

Weavegraph supports custom reducers for extending or replacing channel update behavior. By default,
three reducers are registered:

- **Message channel**: `AddMessages` – Appends messages to the message list
- **Extra channel**: `MapMerge` – Shallow merges JSON objects in the extra data map
- **Error channel**: `AddErrors` – Appends error events to the error list

To register custom reducers:

```rust
use std::sync::Arc;
use weavegraph::reducers::{Reducer, ReducerRegistry};
use weavegraph::types::ChannelType;

// Define a custom reducer
struct MyCustomReducer;

impl Reducer for MyCustomReducer {
    fn apply(&self, state: &mut VersionedState, update: &NodePartial) {
        // Custom merge logic here
    }
}

// Register during graph building
let app = GraphBuilder::new()
    .add_node(...)
    .with_reducer(ChannelType::Message, Arc::new(MyCustomReducer))
    .compile()?;

// Or replace the entire registry
let custom_registry = ReducerRegistry::new()
    .with_reducer(ChannelType::Message, Arc::new(MyCustomReducer));

let app = GraphBuilder::new()
    .add_node(...)
    .with_reducer_registry(custom_registry)
    .compile()?;
```

Multiple reducers can be registered for the same channel and will be applied in registration order.
This enables middleware-style processing, validation, or transformation of channel updates during
barrier synchronization.

### Execution Flow

1. **Authoring** – Build a graph with `GraphBuilder`, registering nodes (implementations of `Node`)
   and the edges that connect them. Conditional edges can inspect `StateSnapshot` at runtime.
   See [Graph Building](QUICKSTART.md#graphs) for details.
2. **Compilation** – `GraphBuilder::compile()` validates topology and produces an `App`.
3. **Invocation** – `App::invoke()` (or streaming variants like `invoke_streaming`, `invoke_with_channel`)
   constructs an `AppRunner` with the chosen checkpointer (`InMemory` or SQLite), and event bus configuration.
   See [Event Streaming](OPERATIONS.md#event-streaming) for streaming patterns.
4. **Scheduling** – The scheduler selects runnable nodes, issues `NodeContext`s, and executes
   nodes concurrently. Each node returns a `NodePartial` with channel deltas and optional
   control-flow directives.
5. **Barrier & Reduction** – Reducers merge channel updates deterministically, update the
   versioned state, and hand control back to the scheduler for the next superstep.
   See [Custom Reducers](#custom-reducers) above.
6. **Persistence & Observability** – Checkpointer snapshots state into SQLite (when enabled),
   the event bus broadcasts diagnostics / LLM chunk streams, and telemetry surfaces to sinks.
   See [Persistence](OPERATIONS.md#persistence) and [Event Streaming](OPERATIONS.md#event-streaming).

### Optional Features

* `rig` – Enables Rig-based LLM support (Ollama/MCP integrations).
* `llm` – Backward-compatible alias to `rig` for 0.3.x (planned removal in 0.4.0).
* `sqlite-migrations` – Turns on SQLite-backed persistence (default).
* `examples` – Pulls in extra dependencies used by a subset of examples (e.g. `reqwest`, `scraper`).

### Tests & Examples

* `weavegraph/tests/` – Covers state channels, reducers, scheduler semantics, checkpointer, and event bus.
  See [Testing](OPERATIONS.md#testing) for running tests and patterns.
* `examples/` – Progressive walkthroughs:
  * `basic_nodes.rs`, `graph_execution.rs`, `scheduler_fanout.rs` show core messaging and state channels.
    See [Messages](QUICKSTART.md#messages) and [State](QUICKSTART.md#state).
  * `advanced_patterns.rs` covers conditional routing and control-flow helpers.
  * `streaming_events.rs`, `convenience_streaming.rs` demonstrate the
    broadcast event bus and web-friendly streaming patterns.
    See [Event Streaming](OPERATIONS.md#event-streaming).
  * `event_backpressure.rs`, `json_serialization.rs`, `errors_pretty.rs` cover production-facing
    concerns like lag handling, JSON sinks, and pretty diagnostics.

---

### Backpressure and Drop Policy

The event bus uses a bounded broadcast channel (default capacity: 1024 events per subscriber).
When a subscriber falls behind faster producers, the following semantics apply:

- Slow subscribers receive a lag notice and skip older events (no blocking of producers)
- Missed events are counted and exposed via sink diagnostics
- A WARN log entry is emitted with the number of dropped events and the running total
- Streams continue from the most recent position for graceful degradation under load

To adjust capacity, configure the event bus when using `App::invoke_streaming` or construct
an `EventBus` directly with custom capacity via `EventBus::with_capacity`.

For practical guidance and code samples, see:
- [Event Streaming](OPERATIONS.md#event-streaming) for patterns and sink configuration
- `docs/STREAMING.md` for detailed tuning guidance

## `wg-ragsmith` Crate

`wg-ragsmith` contains the ingestion and vector-store tooling used by RAG pipelines. It can be
used standalone or pulled into `weavegraph` via the `examples` feature.

| Module | Highlights |
| ------ | ---------- |
| `ingestion::{cache, chunk, resume}` | Disk-backed document cache, chunk-to-ingestion conversion, and resumable pipeline tracking. |
| `semantic_chunking::{html, json, segmenter, embeddings, service}` | HTML/JSON preprocessors, statistical breakpoint strategies, mock/real embedding providers, and the async chunking service. |
| `stores::sqlite` | `SqliteChunkStore` built on `rig-sqlite` + `sqlite-vec`, including schema, vec3 registration, and helper methods to upsert/search chunks. |
| `types` | `RagError` and supporting data structures for ingestion/persistence. |

### Examples

* `examples/rust_book_pipeline.rs` – Async ingestion pipeline that scrapes the Rust book,
  chunks and embeds sections, and writes them into SQLite.
* `examples/query_chunks.rs` & `query_db.sh` – Smoke tests showing how to query stored chunks.

These examples share environment variables with the weavegraph RAG demo (see `.env.example`).

### Feature Flags

* `semantic-chunking-tiktoken` (default) – OpenAI tiktoken tokeniser.
* `semantic-chunking-rust-bert` – Enables Rust-BERT based embedding pipeline.
* `semantic-chunking-segtok` – Alternative segmentation strategy.

---

## Shared Operational Pieces

* **Tooling** – Standard Rust tooling (`cargo fmt`, `cargo clippy`, `cargo test`,
  `cargo +nightly doc`, `cargo deny`, `cargo machete`) plus `sqlx` migrations keep local
  workflows and CI aligned.
* **CI/CD** – `.github/workflows/ci.yml` runs required checks on `1.90.0`, uses current
  stable as a canary lane, and validates docs on `nightly` with
  `RUSTDOCFLAGS="--cfg docsrs -D warnings"`.
* **Migrations** – `weavegraph/migrations` houses the `sqlx` migration set for the SQLite
  checkpointer. Use `sqlx migrate` to apply or rollback changes.
* **Docs** – `docs/` captures forward-looking design documents (event bus refactor,
  control-flow commands, hybrid RAG pipeline) and the production readiness plan. Use
  this architecture document as the entry point.

---
## petgraph Comparison

Weavegraph's graph implementation was designed with workflow execution in mind, making different
tradeoffs than the general-purpose [petgraph](https://github.com/petgraph/petgraph) crate.
This section documents the architectural differences and integration opportunities.

### Architecture Comparison

| Aspect | Weavegraph | petgraph |
|--------|-----------|----------|
| **Primary Use Case** | Workflow orchestration with async node execution | General graph algorithms and data structures |
| **Graph Type** | Custom `FxHashMap<NodeKind, Vec<NodeKind>>` adjacency | `Graph`, `StableGraph`, `GraphMap`, `MatrixGraph` |
| **Node Identity** | `NodeKind` enum (Start/End/Custom) | `NodeIndex` (u32 handle) |
| **Node Data** | Nodes carry `Arc<dyn Node>` trait objects | Generic node weight type `N` |
| **Edge Storage** | HashMap adjacency list with conditional predicates | Compact edge list with indices |
| **Edge Data** | Unconditional or `EdgePredicate` closures | Generic edge weight type `E` |
| **Cycle Detection** | Custom DFS with three-color marking | `petgraph::algo::is_cyclic_directed` |
| **Reachability** | Custom BFS from Start | `petgraph::algo::has_path_connecting` |
| **Algorithms** | Validation-focused (cycles, reachability, deadends) | Rich library (Dijkstra, MST, SCC, isomorphism, max flow) |
| **Async Support** | First-class (nodes are async) | None (pure data structure) |
| **Serialization** | Custom JSON via serde | `serde-1` feature |

### Key Differences Explained

**Why Weavegraph uses a custom graph:**

1. **Domain-Specific Semantics** — `NodeKind::Start` and `NodeKind::End` are virtual structural
   endpoints that cannot be registered as executable nodes. This enables clear workflow boundaries
   without special-casing in user code.

2. **Conditional Edges** — petgraph edges are static data. Weavegraph edges can be runtime
   predicates (`EdgePredicate`) that inspect state to determine routing. This is fundamental to
   agent decision-making workflows.

3. **Execution Context** — Nodes aren't just data; they're async executables with access to
   `NodeContext` for event emission and metadata. petgraph's node weights are passive data.

4. **Validation Errors** — Compilation produces domain-specific errors like `UnknownNode`,
   `CycleDetected { path }`, `UnreachableFromStart`, and `DeadendNode` with actionable context.

**petgraph advantages:**

1. **Battle-tested** — 3.7k+ GitHub stars, 144 contributors, extensive production usage
2. **Memory-efficient** — Compact edge storage, cache-friendly node indices
3. **Algorithm library** — Dijkstra, topological sort, strongly connected components, etc.
4. **Index stability** — `StableGraph` maintains valid indices through mutations

### Integration Approach

Weavegraph takes a **selective adoption** approach rather than replacing its core graph:

```rust
// Feature-gated behind `petgraph-compat`
#[cfg(feature = "petgraph-compat")]
impl From<&CompiledGraph> for petgraph::Graph<NodeKind, ()> {
    fn from(graph: &CompiledGraph) -> Self {
        // Convert for visualization or analysis
    }
}
```

**Current integrations:**

- **Graph iteration API** — `Graph::nodes()` and `Graph::edges()` iterators mirror petgraph idioms
- **Topological sort** — `Graph::topological_sort()` for deterministic node ordering
- **DOT export** — Optional petgraph-based visualization via `dot` format

**Future opportunities:**

- **Advanced routing** — Use petgraph's shortest path for "fastest path to End" analysis
- **Cycle detection fallback** — Validate against petgraph's implementation
- **Graph visualization** — Generate DOT/GraphViz output for debugging

### When to Use petgraph Directly

Use petgraph when you need:
- Pure graph algorithms without execution semantics
- Memory-optimal large graph storage
- Pre-built algorithms (MST, max flow, isomorphism)
- Static graph analysis tooling

Use Weavegraph when you need:
- Async node execution with state management
- Conditional runtime routing based on state
- Event streaming and observability
- Checkpoint/resume workflow persistence
- LLM agent orchestration patterns

### Code Example: Hybrid Usage

```rust
use weavegraph::graphs::GraphBuilder;
use weavegraph::types::NodeKind;

// Build workflow with Weavegraph
let builder = GraphBuilder::new()
    .add_node(NodeKind::Custom("analyze".into()), AnalyzeNode)
    .add_node(NodeKind::Custom("summarize".into()), SummarizeNode)
    .add_edge(NodeKind::Start, NodeKind::Custom("analyze".into()))
    .add_edge(NodeKind::Custom("analyze".into()), NodeKind::Custom("summarize".into()))
  .add_edge(NodeKind::Custom("summarize".into()), NodeKind::End);

// Convert to petgraph for analysis (feature-gated)
#[cfg(feature = "petgraph-compat")]
{
    use weavegraph::graphs::PetgraphConversion;
    let pg = builder.to_petgraph();

    // Use petgraph algorithms
    let topo_order = petgraph::algo::toposort(&pg.graph, None)?;
    let dot = petgraph::dot::Dot::new(&pg.graph);
    println!("DOT output:\n{:?}", dot);
}

// Execute with Weavegraph
let app = builder.compile()?;
let result = app.invoke(initial_state).await?;
```