fornix 0.4.0

# fornix

Knowledge storage, retrieval, and graph infrastructure for cognitive systems.

fornix is a modular Rust library for building retrieval-heavy and agentic systems. It combines vector search, BM25 search, hybrid fusion, ontology-constrained extraction, graph reasoning, GraphRAG, routing, prompt tuning, and an autonomous tool-using agent runtime behind feature flags so you can enable only what you need.

## Who This Is For

- Teams building RAG pipelines that need both lexical and semantic retrieval.
- Systems that need graph-aware retrieval and causal traversal.
- Agent runtimes that require strict policy controls around tool usage.
- Projects that want domain-constrained entity extraction with schema validation.
- Projects that prefer composable, feature-gated crates over monolithic frameworks.

## Highlights

- Feature-gated architecture for lean builds.
- Pure in-memory adapters for fast local development and testing.
- Typed, composable APIs across storage, retrieval, graph, and agent layers.
- Ontology-constrained extraction: validated entity/relation types, alias resolution, per-type LLM prompt guidance.
- Built-in evaluation, filtering, and query-gap tracking for RAG workflows.
- Covered by unit tests and doctests.

## Installation

Add the crate with only the features you want.

```toml
[dependencies]
fornix = { version = "0.4", features = ["vector", "bm25", "hybrid"] }
```

Or enable everything:

```toml
[dependencies]
fornix = { version = "0.4", features = ["full"] }
```

## Pick Features by Use Case

- Basic vector search: `vector`
- Keyword search: `bm25`
- Hybrid retrieval (vector + BM25): `vector`, `bm25`, `hybrid`
- Ontology schema: `ontology`
- Graph knowledge layer: `graph`
- Graph + ontology validation: `graph`, `ontology`
- Graph-augmented retrieval with schema: `graphrag` (pulls in `graph`, `rag`, `hybrid`, `ontology`)
- Prompt optimization: `rag`, `tuner`
- Agent runtime with routing: `router`, `agent`

## Feature Flags

The crate is layered; higher-level modules depend on lower-level ones:

- `store`: base adapter traits, config, health, and error types.
- `cache`: caching adapters (`store`).
- `vector`: vector adapters and analysis (`store`).
- `bm25`: keyword retrieval adapters (`store`).
- `hybrid`: fused vector + BM25 retrieval (`vector`, `bm25`).
- `ontology`: domain-aware type schemas with alias resolution, validation, and prompt construction (`store`).
- `graph`: knowledge graph + temporal/causal APIs (`store`).
- `rag`: chunking, filters, eval, reranking (`hybrid`).
- `graphrag`: graph-augmented retrieval with ontology-constrained extraction (`graph`, `rag`, `hybrid`, `ontology`).
- `router`: model routing strategies + forest (`cache`).
- `diff`: boundary-aware textual diff snippets.
- `tuner`: prompt optimization strategies (`rag`).
- `agent`: recursive tool-using agent runtime (`router`).
- `full`: enables all modules.

The `common` module is always available.
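
To see what a given feature pulls in, the layering above can be modelled as a small dependency graph. The following is a standalone sketch using only the standard library: the dependency table is transcribed from the list above, the helper names `direct_deps` and `resolve` are hypothetical, and `full` is left out because its exact expansion is not spelled out here. Cargo performs this resolution itself; the code only illustrates the result.

```rust
use std::collections::BTreeSet;

/// Direct dependencies of each feature, transcribed from the list above.
fn direct_deps(feature: &str) -> &'static [&'static str] {
    match feature {
        "cache" | "vector" | "bm25" | "ontology" | "graph" => &["store"],
        "hybrid" => &["vector", "bm25"],
        "rag" => &["hybrid"],
        "graphrag" => &["graph", "rag", "hybrid", "ontology"],
        "router" => &["cache"],
        "tuner" => &["rag"],
        "agent" => &["router"],
        // `store`, `diff`, and unknown names have no listed dependencies.
        _ => &[],
    }
}

/// Returns the requested features plus everything they enable transitively.
fn resolve(requested: &[&str]) -> BTreeSet<String> {
    let mut enabled = BTreeSet::new();
    let mut stack: Vec<&str> = requested.to_vec();
    while let Some(feature) = stack.pop() {
        // `insert` returns false for features already visited.
        if enabled.insert(feature.to_string()) {
            for dep in direct_deps(feature) {
                stack.push(*dep);
            }
        }
    }
    enabled
}

fn main() {
    for feature in ["graphrag", "agent", "tuner"] {
        println!("{feature} enables {:?}", resolve(&[feature]));
    }
}
```

Under this table, `graphrag` alone enables eight features, which is why the use-case list does not ask you to name `vector` or `bm25` separately for it.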

## Module Overview

- `store`: foundational traits such as `StorageAdapter` and `AdapterFactory`.
- `cache`: memory/null cache adapters and deterministic cache keying.
- `vector`: vector storage, nearest-neighbor search, and embedding analytics.
- `bm25`: Okapi BM25 tokenization + scoring + indexing.
- `hybrid`: weighted fusion (`Rrf`, `Linear`) and confidence scoring.
- `ontology`: domain-aware type schemas — `Definition`, `EntityTypeDefinition`, `RelationTypeDefinition`, `PropertyDefinition`, `OntologyValidator`, `OntologyPrompt`, `MemoryOntologyRegistry`, alignment types. Mirrors `cortex-ontology`; JSON-serialisable for the Ruby native extension boundary.
- `graph`: entity/relation graph with bitemporal semantics and causal traversal. When the `ontology` feature is also enabled, `GraphConfig` accepts an `ontology` definition and an `ontology_strict` flag; `OntologyViolation` is the error returned in strict mode.
- `graphrag`: local/global/hybrid graph search modes. Always pulls in `ontology`. `GraphRagConfig` accepts `ontology: Option<Arc<Definition>>`; `effective_entity_types()` / `effective_relation_types()` derive type lists from the ontology when set.
- `rag`: chunkers, post-filters, rerankers, evaluation metrics, query-gap tracker.
- `router`: regex, round-robin, weighted random, embedding-threshold, and RoRF routing.
- `diff`: focused and stitched snippets with change markers.
- `tuner`: MIPROv2, GEPA, and no-op prompt tuning.
- `agent`: solve loop with tool execution, recursion, policy controls, and token budgeting.
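
For readers new to the scoring behind the `bm25` module, here is a minimal standalone sketch of Okapi BM25 over an in-memory corpus. It does not use fornix: the functions `tokenize` and `bm25_scores` are hypothetical, the tokenizer is a naive lowercase split, and the IDF uses the common `ln(1 + (N - n + 0.5) / (n + 0.5))` variant. The crate's own tokenizer and parameter defaults may differ.

```rust
use std::collections::HashMap;

/// Naive tokenizer: lowercase, split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Scores every document against the query; result order matches `docs`.
fn bm25_scores(docs: &[&str], query: &str, k1: f64, b: f64) -> Vec<f64> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let total_len: usize = tokenized.iter().map(|d| d.len()).sum();
    let avgdl = total_len as f64 / n.max(1.0);

    // Document frequency: number of documents containing each term.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for doc in &tokenized {
        let mut unique: Vec<&str> = doc.iter().map(|s| s.as_str()).collect();
        unique.sort_unstable();
        unique.dedup();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }

    let terms = tokenize(query);
    tokenized
        .iter()
        .map(|doc| {
            let dl = doc.len() as f64;
            terms
                .iter()
                .map(|term| {
                    let f = doc.iter().filter(|t| *t == term).count() as f64;
                    if f == 0.0 {
                        return 0.0; // term absent from this document
                    }
                    let n_t = df.get(term.as_str()).copied().unwrap_or(0.0);
                    let idf = (1.0 + (n - n_t + 0.5) / (n_t + 0.5)).ln();
                    idf * f * (k1 + 1.0) / (f + k1 * (1.0 - b + b * dl / avgdl))
                })
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    let docs = ["rust systems programming", "python scripting", "rust rust rust"];
    let scores = bm25_scores(&docs, "rust", 1.2, 0.75);
    for (doc, score) in docs.iter().zip(&scores) {
        println!("{score:.3}  {doc}");
    }
}
```

The `k1` parameter controls term-frequency saturation and `b` controls length normalisation; `1.2` and `0.75` are conventional starting values, not values taken from this crate.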

## Quick Start

Most adapters are async. Use a Tokio runtime in your app:

```rust,no_run
use fornix::vector::{VectorAdapter, VectorConfig, adapters::MemoryVectorAdapter};
use fornix::vector::adapter::SearchOptions;

#[tokio::main]
async fn main() {
    let adapter = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2))
        .await
        .unwrap();

    adapter.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();

    let results = adapter
        .nearest_neighbors(&[1.0, 0.0], None, SearchOptions::default())
        .await
        .unwrap();

    println!("top id: {}", results[0].id);
}
```
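
Conceptually, a nearest-neighbor query ranks stored vectors by similarity to the query vector. The sketch below shows a brute-force version with cosine similarity using only the standard library. It is an illustration rather than the crate's implementation: `cosine` and `nearest` are hypothetical helpers, and the metric `MemoryVectorAdapter` actually uses is not stated in this README.

```rust
/// Cosine similarity; returns 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force top-k search: score every item, sort descending, keep k.
fn nearest<'a>(
    items: &'a [(&'a str, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = items
        .iter()
        .map(|(id, vector)| (*id, cosine(vector, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let items = vec![
        ("doc-1", vec![1.0, 0.0]),
        ("doc-2", vec![0.0, 1.0]),
        ("doc-3", vec![1.0, 1.0]),
    ];
    for (id, score) in nearest(&items, &[1.0, 0.0], 2) {
        println!("{id}: {score:.3}");
    }
}
```

A linear scan like this is fine for tests and small corpora; production vector backends typically use an approximate index instead.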

### Ontology-Constrained GraphRAG

```rust,no_run
use std::sync::Arc;
use fornix::ontology::{Definition, EntityTypeDefinition, RelationTypeDefinition,
                       PropertyDefinition, OntologyValidator, OntologyPrompt};
use fornix::graphrag::GraphRagConfig;

fn main() {
    // Build an ontology
    let mut def = Definition::new("regulatory");
    def.version = Some("1.0.0".to_string());
    def.entity_types.push(EntityTypeDefinition {
        name: "Regulation".to_string(),
        description: Some("A codified rule in the CFR.".to_string()),
        extraction_strategy: Some("llm".to_string()),
        extraction_patterns: Vec::new(),
        aliases: vec!["Provision".to_string()],
        properties: vec![PropertyDefinition::required("cfr_citation", "string")],
    });
    def.relation_types.push(RelationTypeDefinition {
        name: "ISSUED_BY".to_string(),
        description: None,
        source_types: vec!["Regulation".to_string()],
        target_types: vec!["Agency".to_string()],
        properties: Vec::new(),
    });

    let ont = Arc::new(def);

    // Validate during extraction
    let validator = OntologyValidator::new(&ont);
    assert!(validator.known_entity_type("Regulation"));
    assert_eq!(validator.canonical_entity_type("Provision"), Some("Regulation"));

    // Build per-type prompt guidance
    let prompt = OntologyPrompt::build_entity_prompt(&ont, "Regulation").unwrap();
    println!("{}", prompt);

    // Configure GraphRAG to use the ontology
    let config = GraphRagConfig {
        ontology: Some(Arc::clone(&ont)),
        ..Default::default()
    };

    // effective_entity_types() now derives from the ontology
    let types = config.effective_entity_types();
    assert!(types.contains(&"Regulation".to_string()));
    assert!(!types.contains(&"Person".to_string())); // default fallback not used
}
```
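
The alias behaviour asserted above (`Provision` resolving to `Regulation`) amounts to a lookup from every known name to its canonical type. Here is a minimal standalone sketch of that idea. `TypeIndex` is a hypothetical type, not part of fornix, and exact-match, case-sensitive lookup is an assumption about how the real validator behaves.

```rust
use std::collections::HashMap;

/// Maps every canonical type name and every alias to its canonical name.
struct TypeIndex {
    canonical: HashMap<String, String>,
}

impl TypeIndex {
    /// Builds the index from `(canonical_name, aliases)` pairs.
    fn new(types: &[(&str, Vec<&str>)]) -> Self {
        let mut canonical = HashMap::new();
        for (name, aliases) in types {
            // A canonical name resolves to itself.
            canonical.insert(name.to_string(), name.to_string());
            for alias in aliases {
                canonical.insert(alias.to_string(), name.to_string());
            }
        }
        TypeIndex { canonical }
    }

    /// Returns the canonical name, or `None` for an unknown type.
    fn canonical(&self, name: &str) -> Option<&str> {
        self.canonical.get(name).map(String::as_str)
    }
}

fn main() {
    let index = TypeIndex::new(&[
        ("Regulation", vec!["Provision"]),
        ("Agency", vec![]),
    ]);
    println!("{:?}", index.canonical("Provision"));
    println!("{:?}", index.canonical("Person"));
}
```

Resolving aliases at extraction time means downstream graph writes only ever see canonical names, which keeps entity types from fragmenting across synonyms.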

### Ontology-Validated Graph Writes

```rust,no_run
use std::sync::Arc;
use fornix::ontology::Definition;
use fornix::graph::GraphConfig;

fn main() {
    let mut def = Definition::new("regulatory");
    def.version = Some("1.0.0".to_string());
    // ... populate entity/relation types ...

    let config = GraphConfig {
        ontology: Some(Arc::new(def)),
        ontology_strict: true, // violations raise OntologyViolation
        ..Default::default()
    };

    // Call config.validate_entity_type("SomeType") before create_entity
    // to get the canonical name or an OntologyViolation error.
    match config.validate_entity_type("Provision") {
        // Prints "Regulation" once `def` declares `Provision` as an alias of `Regulation`.
        Ok(canonical) => println!("canonical: {}", canonical),
        Err(e) => eprintln!("violation: {}", e),
    }
}
```
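
The difference between strict and non-strict validation can be sketched without the crate. In this hypothetical standalone version, strict mode rejects unknown types with an error, while non-strict mode passes them through unchanged. The pass-through behaviour is an assumption: this README only states that `OntologyViolation` is produced in strict mode. `Violation` and this free function `validate_entity_type` are illustrative stand-ins, not fornix types.

```rust
use std::fmt;

/// Stand-in for an ontology violation error.
#[derive(Debug, PartialEq)]
struct Violation(String);

impl fmt::Display for Violation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown entity type: {}", self.0)
    }
}

/// Strict mode rejects unknown types; non-strict mode lets them through
/// (assumed behaviour, see the note above).
fn validate_entity_type(known: &[&str], name: &str, strict: bool) -> Result<String, Violation> {
    let is_known = known.iter().any(|k| *k == name);
    if strict && !is_known {
        Err(Violation(name.to_string()))
    } else {
        Ok(name.to_string())
    }
}

fn main() {
    let known = ["Regulation", "Agency"];
    for strict in [true, false] {
        match validate_entity_type(&known, "Person", strict) {
            Ok(name) => println!("strict={strict}: accepted {name}"),
            Err(e) => println!("strict={strict}: {e}"),
        }
    }
}
```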

### JSON Boundary (Ruby Native Extension)

`Definition` serialises cleanly for the Ruby–Rust boundary:

```rust,no_run
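use fornix::ontology::Definition;

// Any `Definition` works here; a bare one keeps the example short.
let my_definition = Definition::new("regulatory");

let json = my_definition.to_json().unwrap();
// Pass the JSON string to Ruby via Magnus; deserialise on the way back:
let def = Definition::from_json(&json).unwrap();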
```

### Hybrid Retrieval

```rust,no_run
use fornix::bm25::{adapters::MemoryBm25Adapter, adapter::IndexDocument, Bm25Adapter, Bm25Config};
use fornix::vector::{adapters::MemoryVectorAdapter, VectorAdapter, VectorConfig};
use fornix::hybrid::{HybridConfig, HybridSearch, search::HybridSearchOptions};

#[tokio::main]
async fn main() {
    let bm25 = MemoryBm25Adapter::connect(Bm25Config::default()).await.unwrap();
    let vector = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2)).await.unwrap();

    bm25.index(IndexDocument::new("doc-1", "rust systems programming"), None)
        .await
        .unwrap();
    vector.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();

    let search = HybridSearch::new(bm25, vector, HybridConfig::default());
    let results = search
        .search("rust", &[1.0, 0.0], None, HybridSearchOptions::new())
        .await
        .unwrap();

    println!("hybrid top: {}", results[0].id);
}
```
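
To make the fusion step concrete, here is a standalone sketch of reciprocal rank fusion (RRF), the technique behind the `Rrf` fusion mode named in the module overview. Each document scores `1 / (k + rank)` in every ranking it appears in, and the scores are summed. The function `rrf` is hypothetical, and `k = 60` is the value commonly used in the literature, not necessarily the default in `HybridConfig`.

```rust
use std::collections::HashMap;

/// Fuses several rankings (best first) into one list sorted by RRF score.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (position, id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            let rank = position as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let bm25_ranking = vec!["a", "b", "c"];
    let vector_ranking = vec!["b", "c", "a"];
    for (id, score) in rrf(&[bm25_ranking, vector_ranking], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Because RRF only looks at ranks, it needs no score normalisation between BM25 and vector similarity, which is the usual reason to prefer it over a linear blend when the two score scales differ.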

### Graph + Causal Traversal

```rust,no_run
use fornix::graph::{adapters::MemoryGraphAdapter, GraphAdapter, GraphConfig};
use fornix::graph::adapter::CausalOptions;

#[tokio::main]
async fn main() {
    let graph = MemoryGraphAdapter::connect(GraphConfig::default()).await.unwrap();

    let rain  = graph.create_entity("Heavy Rain", "Weather", None, None).await.unwrap();
    let flood = graph.create_entity("Flooding",   "Event",   None, None).await.unwrap();

    graph.create_relation(rain.id, flood.id, "CAUSES", None, None)
        .await
        .unwrap();

    let paths = graph
        .causal_descendants(rain.id, CausalOptions::default(), None)
        .await
        .unwrap();

    println!("first relation: {}", paths[0].edges[0].relation_type);
}
```
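
Causal traversal of this kind can be pictured as a breadth-first walk that follows only causal edges. The standalone sketch below illustrates the idea with plain integer ids. It is not the crate's algorithm: the function `causal_descendants` here is a hypothetical free function, treating only `CAUSES` edges as causal is an assumption, and the real adapter's path type and options are richer than a `Vec<u32>`.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns one path (list of node ids, starting at `start`) per descendant
/// reachable through `CAUSES` edges within `max_depth` hops.
fn causal_descendants(
    edges: &[(u32, u32, &str)],
    start: u32,
    max_depth: usize,
) -> Vec<Vec<u32>> {
    // Adjacency list restricted to causal edges.
    let mut adjacency: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(from, to, relation) in edges {
        if relation == "CAUSES" {
            adjacency.entry(from).or_default().push(to);
        }
    }

    let mut paths = Vec::new();
    let mut visited = HashSet::from([start]); // guards against cycles
    let mut queue = VecDeque::from([vec![start]]);
    while let Some(path) = queue.pop_front() {
        // A path of `len` nodes has `len - 1` edges; stop extending at the limit.
        if path.len() > max_depth {
            continue;
        }
        let last = *path.last().unwrap();
        for &next in adjacency.get(&last).into_iter().flatten() {
            if visited.insert(next) {
                let mut extended = path.clone();
                extended.push(next);
                paths.push(extended.clone());
                queue.push_back(extended);
            }
        }
    }
    paths
}

fn main() {
    let edges = [
        (1, 2, "CAUSES"),
        (2, 3, "CAUSES"),
        (1, 4, "RELATED_TO"),
        (3, 1, "CAUSES"), // cycle back to the start
    ];
    for path in causal_descendants(&edges, 1, 5) {
        println!("{path:?}");
    }
}
```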

## Agent Runtime

The `agent` module provides an autonomous solve loop that can:

- call an injected model client,
- dispatch registered tools,
- recurse through sub-tasks,
- enforce policy constraints,
- track token/time/step budgets,
- compact memory when context grows.

You provide implementations for `ModelClient` and `ToolRegistry`.
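
The shape of such a solve loop can be sketched in a few dozen lines. Everything below is hypothetical and synchronous for brevity: the `Model` and `Tools` traits, the `Action` and `Outcome` enums, and `solve` are stand-ins invented for illustration, not the signatures of fornix's `ModelClient` or `ToolRegistry`. The sketch covers only a tool allow-list and a step budget, leaving out recursion, token accounting, and memory compaction.

```rust
/// What the model asks the runtime to do next.
enum Action {
    CallTool { name: String, input: String },
    Finish(String),
}

/// Hypothetical stand-in for a model client.
trait Model {
    fn next_action(&mut self, transcript: &[String]) -> Action;
}

/// Hypothetical stand-in for a tool registry.
trait Tools {
    fn call(&self, name: &str, input: &str) -> Option<String>;
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Answer(String),
    ToolDenied(String),
    BudgetExhausted,
}

/// Runs the loop until the model finishes, policy denies a tool,
/// or the step budget runs out.
fn solve(model: &mut dyn Model, tools: &dyn Tools, allowed: &[&str], max_steps: usize) -> Outcome {
    let mut transcript: Vec<String> = Vec::new();
    for _ in 0..max_steps {
        match model.next_action(&transcript) {
            Action::Finish(answer) => return Outcome::Answer(answer),
            Action::CallTool { name, input } => {
                // Policy check happens before any tool code runs.
                if !allowed.iter().any(|a| *a == name) {
                    return Outcome::ToolDenied(name);
                }
                let output = tools
                    .call(&name, &input)
                    .unwrap_or_else(|| "tool not found".to_string());
                transcript.push(format!("{name}({input}) -> {output}"));
            }
        }
    }
    Outcome::BudgetExhausted
}

/// Demo model: calls `upper` once, then returns the last observation.
struct ScriptedModel;

impl Model for ScriptedModel {
    fn next_action(&mut self, transcript: &[String]) -> Action {
        match transcript.last() {
            None => Action::CallTool { name: "upper".into(), input: "fornix".into() },
            Some(last) => Action::Finish(last.clone()),
        }
    }
}

/// Demo registry with a single tool.
struct DemoTools;

impl Tools for DemoTools {
    fn call(&self, name: &str, input: &str) -> Option<String> {
        match name {
            "upper" => Some(input.to_uppercase()),
            _ => None,
        }
    }
}

fn main() {
    let outcome = solve(&mut ScriptedModel, &DemoTools, &["upper"], 4);
    println!("{outcome:?}");
}
```

Checking the allow-list before dispatch, and returning a distinct outcome for each stop condition, keeps policy failures distinguishable from ordinary budget exhaustion.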

## Design Notes

- `common` is always available and not feature-gated.
- `ontology` has no async surface: all operations are synchronous, and its types are `Send + Sync`.
- In-memory adapters are intentionally first-class for deterministic tests and local prototyping.
- Higher-level modules (`graphrag`, `agent`) build on lower-level primitives rather than hiding them.
- APIs favor explicit typed options (`SearchOptions`, `CausalOptions`, config structs) over implicit globals.
- The `ontology` module is JSON-serialisable end-to-end to support the Ruby native extension boundary without any unsafe code.

## Development

Run all tests:

```bash
cargo test --features full
```

Run clippy across all targets/features:

```bash
cargo clippy --all-targets --all-features
```

Run the tests with only the `ontology` feature enabled:

```bash
cargo test --features ontology -p fornix
```

## Notes on Adapters

- In-memory adapters are ideal for local development, integration tests, and reference behavior.
- Backend adapters (Postgres, Qdrant, Redis) are exposed as modules, but some of their implementations are currently stubs.
- For production deployments, verify backend adapter completeness against your selected features before rollout.

## Versioning

Current crate version: `0.4.0`.

Because the module surface is broad and still evolving, pin a minor version in production environments and review the changelog before upgrading.

## License

MIT