# fornix
Knowledge storage, retrieval, and graph infrastructure for cognitive systems.
fornix is a modular Rust library for building retrieval-heavy and agentic systems. It combines vector search, BM25 search, hybrid fusion, ontology-constrained extraction, graph reasoning, GraphRAG, routing, prompt tuning, and an autonomous tool-using agent runtime behind feature flags so you can enable only what you need.
## Who This Is For
- Teams building RAG pipelines that need both lexical and semantic retrieval.
- Systems that need graph-aware retrieval and causal traversal.
- Agent runtimes that require strict policy controls around tool usage.
- Projects that want domain-constrained entity extraction with schema validation.
- Projects that prefer composable, feature-gated crates over monolithic frameworks.
## Highlights
- Feature-gated architecture for lean builds.
- Pure in-memory adapters for fast local development and testing.
- Typed, composable APIs across storage, retrieval, graph, and agent layers.
- Ontology-constrained extraction: validated entity/relation types, alias resolution, per-type LLM prompt guidance.
- Built-in evaluation, filtering, and query-gap tracking for RAG workflows.
- End-to-end tested with unit tests plus doctests.
## Installation
Add the crate with only the features you want.
```toml
[dependencies]
fornix = { version = "0.3", features = ["vector", "bm25", "hybrid"] }
```
Or enable everything:
```toml
[dependencies]
fornix = { version = "0.3", features = ["full"] }
```
## Pick Features by Use Case
- Basic vector search: `vector`
- Keyword search: `bm25`
- Hybrid retrieval (vector + BM25): `vector`, `bm25`, `hybrid`
- Ontology schema: `ontology`
- Graph knowledge layer: `graph`
- Graph + ontology validation: `graph`, `ontology`
- Graph-augmented retrieval with schema: `graphrag` (pulls in `graph`, `rag`, `hybrid`, `ontology`)
- Prompt optimization: `rag`, `tuner`
- Agent runtime with routing: `router`, `agent`
## Feature Flags
The crate is layered; higher-level modules depend on lower-level ones:
- `store`: base adapter traits, config, health, and error types.
- `cache`: caching adapters (`store`).
- `vector`: vector adapters and analysis (`store`).
- `bm25`: keyword retrieval adapters (`store`).
- `hybrid`: fused vector + BM25 retrieval (`vector`, `bm25`).
- `ontology`: domain-aware type schemas with alias resolution, validation, and prompt construction (`store`).
- `graph`: knowledge graph + temporal/causal APIs (`store`).
- `rag`: chunking, filters, eval, reranking (`hybrid`).
- `graphrag`: graph-augmented retrieval with ontology-constrained extraction (`graph`, `rag`, `hybrid`, `ontology`).
- `router`: model routing strategies + forest (`cache`).
- `diff`: boundary-aware textual diff snippets.
- `tuner`: prompt optimization strategies (`rag`).
- `agent`: recursive tool-using agent runtime (`router`).
- `full`: enables all modules.
The `common` module is always available.
## Module Overview
- `store`: foundational traits such as `StorageAdapter` and `AdapterFactory`.
- `cache`: memory/null cache adapters and deterministic cache keying.
- `vector`: vector storage, nearest-neighbor search, and embedding analytics.
- `bm25`: Okapi BM25 tokenization + scoring + indexing.
- `hybrid`: weighted fusion (`Rrf`, `Linear`) and confidence scoring.
- `ontology`: domain-aware type schemas — `Definition`, `EntityTypeDefinition`, `RelationTypeDefinition`, `PropertyDefinition`, `OntologyValidator`, `OntologyPrompt`, `MemoryOntologyRegistry`, alignment types. Mirrors `cortex-ontology`; JSON-serialisable for the Ruby native extension boundary.
- `graph`: entity/relation graph with bitemporal semantics and causal traversal. When the `ontology` feature is also enabled, `GraphConfig` accepts an `ontology` and `ontology_strict` flag; `OntologyViolation` is the error raised in strict mode.
- `graphrag`: local/global/hybrid graph search modes. Always pulls in `ontology`. `GraphRagConfig` accepts `ontology: Option<Arc<Definition>>`; `effective_entity_types()` / `effective_relation_types()` derive type lists from the ontology when set.
- `rag`: chunkers, post-filters, rerankers, evaluation metrics, query-gap tracker.
- `router`: regex, round-robin, weighted random, embedding-threshold, and RoRF routing.
- `diff`: focused and stitched snippets with change markers.
- `tuner`: MIPROv2, GEPA, and no-op prompt tuning.
- `agent`: solve loop with tool execution, recursion, policy controls, and token budgeting.
## Quick Start
Most adapters are async. Use a Tokio runtime in your app:
```rust,no_run
use fornix::vector::{VectorAdapter, VectorConfig, adapters::MemoryVectorAdapter};
use fornix::vector::adapter::SearchOptions;
#[tokio::main]
async fn main() {
let adapter = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2))
.await
.unwrap();
adapter.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();
let results = adapter
.nearest_neighbors(&[1.0, 0.0], None, SearchOptions::default())
.await
.unwrap();
println!("top id: {}", results[0].id);
}
```
### Ontology-Constrained GraphRAG
```rust,no_run
use std::sync::Arc;
use fornix::ontology::{Definition, EntityTypeDefinition, RelationTypeDefinition,
PropertyDefinition, OntologyValidator, OntologyPrompt};
use fornix::graphrag::GraphRagConfig;
fn main() {
// Build an ontology
let mut def = Definition::new("regulatory");
def.version = Some("1.0.0".to_string());
def.entity_types.push(EntityTypeDefinition {
name: "Regulation".to_string(),
description: Some("A codified rule in the CFR.".to_string()),
extraction_strategy: Some("llm".to_string()),
extraction_patterns: Vec::new(),
aliases: vec!["Provision".to_string()],
properties: vec![PropertyDefinition::required("cfr_citation", "string")],
});
def.relation_types.push(RelationTypeDefinition {
name: "ISSUED_BY".to_string(),
description: None,
source_types: vec!["Regulation".to_string()],
target_types: vec!["Agency".to_string()],
properties: Vec::new(),
});
let ont = Arc::new(def);
// Validate during extraction
let validator = OntologyValidator::new(&ont);
assert!(validator.known_entity_type("Regulation"));
assert_eq!(validator.canonical_entity_type("Provision"), Some("Regulation"));
// Build per-type prompt guidance
let prompt = OntologyPrompt::build_entity_prompt(&ont, "Regulation").unwrap();
println!("{}", prompt);
// Configure GraphRAG to use the ontology
let config = GraphRagConfig {
ontology: Some(Arc::clone(&ont)),
..Default::default()
};
// effective_entity_types() now derives from the ontology
let types = config.effective_entity_types();
assert!(types.contains(&"Regulation".to_string()));
assert!(!types.contains(&"Person".to_string())); // default fallback not used
}
```
### Ontology-Validated Graph Writes
```rust,no_run
use std::sync::Arc;
use fornix::ontology::Definition;
use fornix::graph::GraphConfig;
fn main() {
let mut def = Definition::new("regulatory");
def.version = Some("1.0.0".to_string());
// ... populate entity/relation types ...
let config = GraphConfig {
ontology: Some(Arc::new(def)),
ontology_strict: true, // violations raise OntologyViolation
..Default::default()
};
// Call config.validate_entity_type("SomeType") before create_entity
// to get the canonical name or an OntologyViolation error.
match config.validate_entity_type("Provision") {
Ok(canonical) => println!("canonical: {}", canonical), // "Regulation"
Err(e) => eprintln!("violation: {}", e),
}
}
```
### JSON Boundary (Ruby Native Extension)
`Definition` serialises cleanly for the Ruby–Rust boundary:
```rust,no_run
use fornix::ontology::Definition;
let json = my_definition.to_json().unwrap();
// Pass JSON string to Ruby via Magnus; deserialise on the way back:
let def = Definition::from_json(&json).unwrap();
```
### Hybrid Retrieval
```rust,no_run
use fornix::bm25::{adapters::MemoryBm25Adapter, adapter::IndexDocument, Bm25Adapter, Bm25Config};
use fornix::vector::{adapters::MemoryVectorAdapter, VectorAdapter, VectorConfig};
use fornix::hybrid::{HybridConfig, HybridSearch, search::HybridSearchOptions};
#[tokio::main]
async fn main() {
let bm25 = MemoryBm25Adapter::connect(Bm25Config::default()).await.unwrap();
let vector = MemoryVectorAdapter::connect(VectorConfig::with_dimension(2)).await.unwrap();
bm25.index(IndexDocument::new("doc-1", "rust systems programming"), None)
.await
.unwrap();
vector.upsert("doc-1", vec![1.0, 0.0], None, None).await.unwrap();
let search = HybridSearch::new(bm25, vector, HybridConfig::default());
let results = search
.search("rust", &[1.0, 0.0], None, HybridSearchOptions::new())
.await
.unwrap();
println!("hybrid top: {}", results[0].id);
}
```
### Graph + Causal Traversal
```rust,no_run
use fornix::graph::{adapters::MemoryGraphAdapter, GraphAdapter, GraphConfig};
use fornix::graph::adapter::CausalOptions;
#[tokio::main]
async fn main() {
let graph = MemoryGraphAdapter::connect(GraphConfig::default()).await.unwrap();
let rain = graph.create_entity("Heavy Rain", "Weather", None, None).await.unwrap();
let flood = graph.create_entity("Flooding", "Event", None, None).await.unwrap();
graph.create_relation(rain.id, flood.id, "CAUSES", None, None)
.await
.unwrap();
let paths = graph
.causal_descendants(rain.id, CausalOptions::default(), None)
.await
.unwrap();
println!("first relation: {}", paths[0].edges[0].relation_type);
}
```
## Agent Runtime
The `agent` module provides an autonomous solve loop that can:
- call an injected model client,
- dispatch registered tools,
- recurse through sub-tasks,
- enforce policy constraints,
- track token/time/step budgets,
- compact memory when context grows.
You provide implementations for `ModelClient` and `ToolRegistry`.
## Design Notes
- `common` is always available and not feature-gated.
- `ontology` has no async surface — all operations are synchronous and `Send + Sync`.
- In-memory adapters are intentionally first-class for deterministic tests and local prototyping.
- Higher-level modules (`graphrag`, `agent`) build on lower-level primitives rather than hiding them.
- APIs favor explicit typed options (`SearchOptions`, `CausalOptions`, config structs) over implicit globals.
- The `ontology` module is JSON-serialisable end-to-end to support the Ruby native extension boundary without any unsafe code.
## Development
Run all tests:
```bash
cargo test --features full
```
Run clippy across all targets/features:
```bash
cargo clippy --all-targets --all-features
```
Run just the ontology module tests:
```bash
cargo test --features ontology -p fornix
```
## Notes on Adapters
- In-memory adapters are ideal for local development, integration tests, and reference behavior.
- Backend adapters (Postgres/Qdrant/Redis) are exposed as module surfaces; some implementations are currently stubs in this crate layout.
- For production deployments, verify backend adapter completeness against your selected features before rollout.
## Versioning
Current crate version: `0.3.2`.
As the module surface is broad and evolving, pin a minor version in production environments and review changelogs before upgrading.
## License
MIT