fornix

Knowledge storage, retrieval, and graph infrastructure for cognitive systems.

fornix is a modular Rust library for building retrieval-heavy and agentic systems. It combines vector search, BM25 search, hybrid fusion, ontology-constrained extraction, graph reasoning, GraphRAG, routing, prompt tuning, and an autonomous tool-using agent runtime behind feature flags so you can enable only what you need.

Who This Is For

Teams building RAG pipelines that need both lexical and semantic retrieval.
Systems that need graph-aware retrieval and causal traversal.
Agent runtimes that require strict policy controls around tool usage.
Projects that want domain-constrained entity extraction with schema validation.
Projects that prefer composable, feature-gated crates over monolithic frameworks.

Highlights

Feature-gated architecture for lean builds.
Pure in-memory adapters for fast local development and testing.
Typed, composable APIs across storage, retrieval, graph, and agent layers.
Ontology-constrained extraction: validated entity/relation types, alias resolution, per-type LLM prompt guidance.
Built-in evaluation, filtering, and query-gap tracking for RAG workflows.
End-to-end tested with unit tests plus doctests.

Installation

Add the crate with only the features you want.

[dependencies]
fornix = { version = "0.3", features = ["vector", "bm25", "hybrid"] }

Or enable everything:

[dependencies]
fornix = { version = "0.3", features = ["full"] }

Pick Features by Use Case

Basic vector search: vector
Keyword search: bm25
Hybrid retrieval (vector + BM25): vector, bm25, hybrid
Ontology schema: ontology
Graph knowledge layer: graph
Graph + ontology validation: graph, ontology
Graph-augmented retrieval with schema: graphrag (pulls in graph, rag, hybrid, ontology)
Prompt optimization: rag, tuner
Agent runtime with routing: router, agent

Feature Flags

The crate is layered; higher-level modules depend on lower-level ones:

store: base adapter traits, config, health, and error types.
cache: caching adapters (store).
vector: vector adapters and analysis (store).
bm25: keyword retrieval adapters (store).
hybrid: fused vector + BM25 retrieval (vector, bm25).
ontology: domain-aware type schemas with alias resolution, validation, and prompt construction (store).
graph: knowledge graph + temporal/causal APIs (store).
rag: chunking, filters, eval, reranking (hybrid).
graphrag: graph-augmented retrieval with ontology-constrained extraction (graph, rag, hybrid, ontology).
router: model routing strategies + forest (cache).
diff: boundary-aware textual diff snippets.
tuner: prompt optimization strategies (rag).
agent: recursive tool-using agent runtime (router).
full: enables all modules.

The common module is always available.

Module Overview

store: foundational traits such as StorageAdapter and AdapterFactory.
cache: memory/null cache adapters and deterministic cache keying.
vector: vector storage, nearest-neighbor search, and embedding analytics.
bm25: Okapi BM25 tokenization + scoring + indexing.
hybrid: weighted fusion (Rrf, Linear) and confidence scoring.
ontology: domain-aware type schemas — Definition, EntityTypeDefinition, RelationTypeDefinition, PropertyDefinition, OntologyValidator, OntologyPrompt, MemoryOntologyRegistry, alignment types. Mirrors cortex-ontology; JSON-serialisable for the Ruby native extension boundary.
graph: entity/relation graph with bitemporal semantics and causal traversal. When the ontology feature is also enabled, GraphConfig accepts an ontology and ontology_strict flag; OntologyViolation is the error raised in strict mode.
graphrag: local/global/hybrid graph search modes. Always pulls in ontology. GraphRagConfig accepts ontology: Option<Arc<Definition>>; effective_entity_types() / effective_relation_types() derive type lists from the ontology when set.
rag: chunkers, post-filters, rerankers, evaluation metrics, query-gap tracker.
router: regex, round-robin, weighted random, embedding-threshold, and RoRF routing.
diff: focused and stitched snippets with change markers.
tuner: MIPROv2, GEPA, and no-op prompt tuning.
agent: solve loop with tool execution, recursion, policy controls, and token budgeting.

Quick Start

Most adapters are async. Use a Tokio runtime in your app:

use fornix::vector::{VectorAdapter, VectorConfig, adapters::MemoryVectorAdapter};
use fornix::vector::adapter::SearchOptions;

#[tokio::main]
async fn main() {
    let adapter = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2))
        .await
        .unwrap();

    adapter.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();

    let results = adapter
        .nearest_neighbors(&[1.0, 0.0], None, SearchOptions::default())
        .await
        .unwrap();

    println!("top id: {}", results[0].id);
}

Ontology-Constrained GraphRAG

use std::sync::Arc;
use fornix::ontology::{Definition, EntityTypeDefinition, RelationTypeDefinition,
                       PropertyDefinition, OntologyValidator, OntologyPrompt};
use fornix::graphrag::GraphRagConfig;

fn main() {
    // Build an ontology
    let mut def = Definition::new("regulatory");
    def.version = Some("1.0.0".to_string());
    def.entity_types.push(EntityTypeDefinition {
        name: "Regulation".to_string(),
        description: Some("A codified rule in the CFR.".to_string()),
        extraction_strategy: Some("llm".to_string()),
        extraction_patterns: Vec::new(),
        aliases: vec!["Provision".to_string()],
        properties: vec![PropertyDefinition::required("cfr_citation", "string")],
    });
    def.relation_types.push(RelationTypeDefinition {
        name: "ISSUED_BY".to_string(),
        description: None,
        source_types: vec!["Regulation".to_string()],
        target_types: vec!["Agency".to_string()],
        properties: Vec::new(),
    });

    let ont = Arc::new(def);

    // Validate during extraction
    let validator = OntologyValidator::new(&ont);
    assert!(validator.known_entity_type("Regulation"));
    assert_eq!(validator.canonical_entity_type("Provision"), Some("Regulation"));

    // Build per-type prompt guidance
    let prompt = OntologyPrompt::build_entity_prompt(&ont, "Regulation").unwrap();
    println!("{}", prompt);

    // Configure GraphRAG to use the ontology
    let config = GraphRagConfig {
        ontology: Some(Arc::clone(&ont)),
        ..Default::default()
    };

    // effective_entity_types() now derives from the ontology
    let types = config.effective_entity_types();
    assert!(types.contains(&"Regulation".to_string()));
    assert!(!types.contains(&"Person".to_string())); // default fallback not used
}

Ontology-Validated Graph Writes

use std::sync::Arc;
use fornix::ontology::Definition;
use fornix::graph::GraphConfig;

fn main() {
    let mut def = Definition::new("regulatory");
    def.version = Some("1.0.0".to_string());
    // ... populate entity/relation types ...

    let config = GraphConfig {
        ontology: Some(Arc::new(def)),
        ontology_strict: true, // violations raise OntologyViolation
        ..Default::default()
    };

    // Call config.validate_entity_type("SomeType") before create_entity
    // to get the canonical name or an OntologyViolation error.
    match config.validate_entity_type("Provision") {
        Ok(canonical) => println!("canonical: {}", canonical), // "Regulation"
        Err(e) => eprintln!("violation: {}", e),
    }
}

JSON Boundary (Ruby Native Extension)

Definition serialises cleanly for the Ruby–Rust boundary:

use fornix::ontology::Definition;

let json = my_definition.to_json().unwrap();
// Pass JSON string to Ruby via Magnus; deserialise on the way back:
let def = Definition::from_json(&json).unwrap();

Hybrid Retrieval

use fornix::bm25::{adapters::MemoryBm25Adapter, adapter::IndexDocument, Bm25Adapter, Bm25Config};
use fornix::vector::{adapters::MemoryVectorAdapter, VectorAdapter, VectorConfig};
use fornix::hybrid::{HybridConfig, HybridSearch, search::HybridSearchOptions};

#[tokio::main]
async fn main() {
    let bm25 = MemoryBm25Adapter::connect(Bm25Config::default()).await.unwrap();
    let vector = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2)).await.unwrap();

    bm25.index(IndexDocument::new("doc-1", "rust systems programming"), None)
        .await
        .unwrap();
    vector.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();

    let search = HybridSearch::new(bm25, vector, HybridConfig::default());
    let results = search
        .search("rust", &[1.0, 0.0], None, HybridSearchOptions::new())
        .await
        .unwrap();

    println!("hybrid top: {}", results[0].id);
}

Graph + Causal Traversal

use fornix::graph::{adapters::MemoryGraphAdapter, GraphAdapter, GraphConfig};
use fornix::graph::adapter::CausalOptions;

#[tokio::main]
async fn main() {
    let graph = MemoryGraphAdapter::connect(GraphConfig::default()).await.unwrap();

    let rain  = graph.create_entity("Heavy Rain", "Weather", None, None).await.unwrap();
    let flood = graph.create_entity("Flooding",   "Event",   None, None).await.unwrap();

    graph.create_relation(rain.id, flood.id, "CAUSES", None, None)
        .await
        .unwrap();

    let paths = graph
        .causal_descendants(rain.id, CausalOptions::default(), None)
        .await
        .unwrap();

    println!("first relation: {}", paths[0].edges[0].relation_type);
}

Agent Runtime

The agent module provides an autonomous solve loop that can:

call an injected model client,
dispatch registered tools,
recurse through sub-tasks,
enforce policy constraints,
track token/time/step budgets,
compact memory when context grows.

You provide implementations for ModelClient and ToolRegistry.

Design Notes

common is always available and not feature-gated.
ontology has no async surface — all operations are synchronous and Send + Sync.
In-memory adapters are intentionally first-class for deterministic tests and local prototyping.
Higher-level modules (graphrag, agent) build on lower-level primitives rather than hiding them.
APIs favor explicit typed options (SearchOptions, CausalOptions, config structs) over implicit globals.
The ontology module is JSON-serialisable end-to-end to support the Ruby native extension boundary without any unsafe code.

Development

Run all tests:

cargo test --features full

Run clippy across all targets/features:

cargo clippy --all-targets --all-features

Run just the ontology module tests:

cargo test --features ontology -p fornix

Notes on Adapters

In-memory adapters are ideal for local development, integration tests, and reference behavior.
Backend adapters (Postgres/Qdrant/Redis) are exposed as module surfaces; some implementations are currently stubs in this crate layout.
For production deployments, verify backend adapter completeness against your selected features before rollout.

Versioning

Current crate version: 0.3.2.

As the module surface is broad and evolving, pin a minor version in production environments and review changelogs before upgrading.

License

MIT

fornix 0.4.0