fornix 0.4.0

Knowledge storage, retrieval, and graph infrastructure for cognitive systems
Documentation

fornix

Knowledge storage, retrieval, and graph infrastructure for cognitive systems.

fornix is a modular Rust library for building retrieval-heavy and agentic systems. It combines vector search, BM25 search, hybrid fusion, ontology-constrained extraction, graph reasoning, GraphRAG, routing, prompt tuning, and an autonomous tool-using agent runtime behind feature flags so you can enable only what you need.

Who This Is For

  • Teams building RAG pipelines that need both lexical and semantic retrieval.
  • Systems that need graph-aware retrieval and causal traversal.
  • Agent runtimes that require strict policy controls around tool usage.
  • Projects that want domain-constrained entity extraction with schema validation.
  • Projects that prefer composable, feature-gated crates over monolithic frameworks.

Highlights

  • Feature-gated architecture for lean builds.
  • Pure in-memory adapters for fast local development and testing.
  • Typed, composable APIs across storage, retrieval, graph, and agent layers.
  • Ontology-constrained extraction: validated entity/relation types, alias resolution, per-type LLM prompt guidance.
  • Built-in evaluation, filtering, and query-gap tracking for RAG workflows.
  • End-to-end tested with unit tests plus doctests.

Installation

Add the crate with only the features you want.

[dependencies]
fornix = { version = "0.3", features = ["vector", "bm25", "hybrid"] }

Or enable everything:

[dependencies]
fornix = { version = "0.3", features = ["full"] }

Pick Features by Use Case

  • Basic vector search: vector
  • Keyword search: bm25
  • Hybrid retrieval (vector + BM25): vector, bm25, hybrid
  • Ontology schema: ontology
  • Graph knowledge layer: graph
  • Graph + ontology validation: graph, ontology
  • Graph-augmented retrieval with schema: graphrag (pulls in graph, rag, hybrid, ontology)
  • Prompt optimization: rag, tuner
  • Agent runtime with routing: router, agent

Feature Flags

The crate is layered; higher-level modules depend on lower-level ones:

  • store: base adapter traits, config, health, and error types.
  • cache: caching adapters (store).
  • vector: vector adapters and analysis (store).
  • bm25: keyword retrieval adapters (store).
  • hybrid: fused vector + BM25 retrieval (vector, bm25).
  • ontology: domain-aware type schemas with alias resolution, validation, and prompt construction (store).
  • graph: knowledge graph + temporal/causal APIs (store).
  • rag: chunking, filters, eval, reranking (hybrid).
  • graphrag: graph-augmented retrieval with ontology-constrained extraction (graph, rag, hybrid, ontology).
  • router: model routing strategies + forest (cache).
  • diff: boundary-aware textual diff snippets.
  • tuner: prompt optimization strategies (rag).
  • agent: recursive tool-using agent runtime (router).
  • full: enables all modules.

The common module is always available.

Module Overview

  • store: foundational traits such as StorageAdapter and AdapterFactory.
  • cache: memory/null cache adapters and deterministic cache keying.
  • vector: vector storage, nearest-neighbor search, and embedding analytics.
  • bm25: Okapi BM25 tokenization + scoring + indexing.
  • hybrid: weighted fusion (Rrf, Linear) and confidence scoring.
  • ontology: domain-aware type schemas — Definition, EntityTypeDefinition, RelationTypeDefinition, PropertyDefinition, OntologyValidator, OntologyPrompt, MemoryOntologyRegistry, alignment types. Mirrors cortex-ontology; JSON-serialisable for the Ruby native extension boundary.
  • graph: entity/relation graph with bitemporal semantics and causal traversal. When the ontology feature is also enabled, GraphConfig accepts an ontology and ontology_strict flag; OntologyViolation is the error raised in strict mode.
  • graphrag: local/global/hybrid graph search modes. Always pulls in ontology. GraphRagConfig accepts ontology: Option<Arc<Definition>>; effective_entity_types() / effective_relation_types() derive type lists from the ontology when set.
  • rag: chunkers, post-filters, rerankers, evaluation metrics, query-gap tracker.
  • router: regex, round-robin, weighted random, embedding-threshold, and RoRF routing.
  • diff: focused and stitched snippets with change markers.
  • tuner: MIPROv2, GEPA, and no-op prompt tuning.
  • agent: solve loop with tool execution, recursion, policy controls, and token budgeting.

Quick Start

Most adapters are async. Use a Tokio runtime in your app:

use fornix::vector::{VectorAdapter, VectorConfig, adapters::MemoryVectorAdapter};
use fornix::vector::adapter::SearchOptions;

#[tokio::main]
async fn main() {
    let adapter = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2))
        .await
        .unwrap();

    adapter.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();

    let results = adapter
        .nearest_neighbors(&[1.0, 0.0], None, SearchOptions::default())
        .await
        .unwrap();

    println!("top id: {}", results[0].id);
}

Ontology-Constrained GraphRAG

use std::sync::Arc;
use fornix::ontology::{Definition, EntityTypeDefinition, RelationTypeDefinition,
                       PropertyDefinition, OntologyValidator, OntologyPrompt};
use fornix::graphrag::GraphRagConfig;

fn main() {
    // Build an ontology
    let mut def = Definition::new("regulatory");
    def.version = Some("1.0.0".to_string());
    def.entity_types.push(EntityTypeDefinition {
        name: "Regulation".to_string(),
        description: Some("A codified rule in the CFR.".to_string()),
        extraction_strategy: Some("llm".to_string()),
        extraction_patterns: Vec::new(),
        aliases: vec!["Provision".to_string()],
        properties: vec![PropertyDefinition::required("cfr_citation", "string")],
    });
    def.relation_types.push(RelationTypeDefinition {
        name: "ISSUED_BY".to_string(),
        description: None,
        source_types: vec!["Regulation".to_string()],
        target_types: vec!["Agency".to_string()],
        properties: Vec::new(),
    });

    let ont = Arc::new(def);

    // Validate during extraction
    let validator = OntologyValidator::new(&ont);
    assert!(validator.known_entity_type("Regulation"));
    assert_eq!(validator.canonical_entity_type("Provision"), Some("Regulation"));

    // Build per-type prompt guidance
    let prompt = OntologyPrompt::build_entity_prompt(&ont, "Regulation").unwrap();
    println!("{}", prompt);

    // Configure GraphRAG to use the ontology
    let config = GraphRagConfig {
        ontology: Some(Arc::clone(&ont)),
        ..Default::default()
    };

    // effective_entity_types() now derives from the ontology
    let types = config.effective_entity_types();
    assert!(types.contains(&"Regulation".to_string()));
    assert!(!types.contains(&"Person".to_string())); // default fallback not used
}

Ontology-Validated Graph Writes

use std::sync::Arc;
use fornix::ontology::Definition;
use fornix::graph::GraphConfig;

fn main() {
    let mut def = Definition::new("regulatory");
    def.version = Some("1.0.0".to_string());
    // ... populate entity/relation types ...

    let config = GraphConfig {
        ontology: Some(Arc::new(def)),
        ontology_strict: true, // violations raise OntologyViolation
        ..Default::default()
    };

    // Call config.validate_entity_type("SomeType") before create_entity
    // to get the canonical name or an OntologyViolation error.
    match config.validate_entity_type("Provision") {
        Ok(canonical) => println!("canonical: {}", canonical), // "Regulation"
        Err(e) => eprintln!("violation: {}", e),
    }
}

JSON Boundary (Ruby Native Extension)

Definition serialises cleanly for the Ruby–Rust boundary:

use fornix::ontology::Definition;

let json = my_definition.to_json().unwrap();
// Pass JSON string to Ruby via Magnus; deserialise on the way back:
let def = Definition::from_json(&json).unwrap();

Hybrid Retrieval

use fornix::bm25::{adapters::MemoryBm25Adapter, adapter::IndexDocument, Bm25Adapter, Bm25Config};
use fornix::vector::{adapters::MemoryVectorAdapter, VectorAdapter, VectorConfig};
use fornix::hybrid::{HybridConfig, HybridSearch, search::HybridSearchOptions};

#[tokio::main]
async fn main() {
    let bm25 = MemoryBm25Adapter::connect(Bm25Config::default()).await.unwrap();
    let vector = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2)).await.unwrap();

    bm25.index(IndexDocument::new("doc-1", "rust systems programming"), None)
        .await
        .unwrap();
    vector.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();

    let search = HybridSearch::new(bm25, vector, HybridConfig::default());
    let results = search
        .search("rust", &[1.0, 0.0], None, HybridSearchOptions::new())
        .await
        .unwrap();

    println!("hybrid top: {}", results[0].id);
}

Graph + Causal Traversal

use fornix::graph::{adapters::MemoryGraphAdapter, GraphAdapter, GraphConfig};
use fornix::graph::adapter::CausalOptions;

#[tokio::main]
async fn main() {
    let graph = MemoryGraphAdapter::connect(GraphConfig::default()).await.unwrap();

    let rain  = graph.create_entity("Heavy Rain", "Weather", None, None).await.unwrap();
    let flood = graph.create_entity("Flooding",   "Event",   None, None).await.unwrap();

    graph.create_relation(rain.id, flood.id, "CAUSES", None, None)
        .await
        .unwrap();

    let paths = graph
        .causal_descendants(rain.id, CausalOptions::default(), None)
        .await
        .unwrap();

    println!("first relation: {}", paths[0].edges[0].relation_type);
}

Agent Runtime

The agent module provides an autonomous solve loop that can:

  • call an injected model client,
  • dispatch registered tools,
  • recurse through sub-tasks,
  • enforce policy constraints,
  • track token/time/step budgets,
  • compact memory when context grows.

You provide implementations for ModelClient and ToolRegistry.

Design Notes

  • common is always available and not feature-gated.
  • ontology has no async surface — all operations are synchronous and Send + Sync.
  • In-memory adapters are intentionally first-class for deterministic tests and local prototyping.
  • Higher-level modules (graphrag, agent) build on lower-level primitives rather than hiding them.
  • APIs favor explicit typed options (SearchOptions, CausalOptions, config structs) over implicit globals.
  • The ontology module is JSON-serialisable end-to-end to support the Ruby native extension boundary without any unsafe code.

Development

Run all tests:

cargo test --features full

Run clippy across all targets/features:

cargo clippy --all-targets --all-features

Run just the ontology module tests:

cargo test --features ontology -p fornix

Notes on Adapters

  • In-memory adapters are ideal for local development, integration tests, and reference behavior.
  • Backend adapters (Postgres/Qdrant/Redis) are exposed as module surfaces; some implementations are currently stubs in this crate layout.
  • For production deployments, verify backend adapter completeness against your selected features before rollout.

Versioning

Current crate version: 0.3.2.

As the module surface is broad and evolving, pin a minor version in production environments and review changelogs before upgrading.

License

MIT