Ultra-performant document intelligence engine for RAG, written in Rust. No vector database, no embedding model — just LLM-powered tree navigation, with incremental indexing and multi-format support out of the box.
Early Development: This project is in active development. The API and features are likely to evolve, and breaking changes may occur.
Why Vectorless?
Traditional RAG systems have a fundamental problem: they lose document structure.
When you chunk a document into vectors, you lose:
- The hierarchical relationship between sections
- The context of where information lives
- The ability to navigate based on reasoning
Vectorless takes a different approach:
It preserves your document's tree structure and uses an LLM to navigate it — just like a human would skim a table of contents, then drill into relevant sections.
Result: More accurate retrieval with zero infrastructure complexity.
How It Works
Vectorless preserves your document's hierarchical structure and uses a multi-stage pipeline for intelligent retrieval:
Index Pipeline
Transforms documents into a navigable tree structure:
- Parse — Parse documents (Markdown, PDF, DOCX, HTML) into structured content
- Build — Construct document tree with metadata
- Enhance — Add table of contents and section detection
- Enrich — Generate AI summaries for tree nodes
- Optimize — Optimize tree structure for efficient retrieval
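The tree the index pipeline produces can be sketched as a simple recursive structure. The type and field names below are illustrative, not the crate's actual API:

```rust
/// A node in the document tree (illustrative sketch, not the crate's actual types).
#[derive(Debug)]
struct TreeNode {
    title: String,           // section heading, e.g. "Installation"
    content: String,         // raw text belonging to this section
    summary: Option<String>, // AI-generated summary added in the Enrich stage
    children: Vec<TreeNode>, // subsections, preserving document hierarchy
}

impl TreeNode {
    /// Count all nodes in the subtree, including this one.
    fn size(&self) -> usize {
        1 + self.children.iter().map(TreeNode::size).sum::<usize>()
    }
}

fn main() {
    let root = TreeNode {
        title: "README".into(),
        content: String::new(),
        summary: None,
        children: vec![
            TreeNode { title: "Installation".into(), content: "Add to Cargo.toml".into(), summary: None, children: vec![] },
            TreeNode { title: "Quick Start".into(), content: "Basic usage".into(), summary: None, children: vec![] },
        ],
    };
    println!("{}", root.size()); // prints 3
}
```

Because the hierarchy is kept intact, every retrieval result can be traced back to its exact position in the original document.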
Retrieval Pipeline
Uses adaptive, multi-stage retrieval with backtracking:
- Analyze — Detect query complexity, extract keywords
- Plan — Select optimal strategy (keyword/semantic/LLM) and algorithm
- Search — Execute tree traversal (greedy/beam/MCTS)
- Judge — Evaluate sufficiency, trigger backtracking if needed
This mimics how humans navigate documentation: skim the TOC, drill into relevant sections, and backtrack when needed.
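The greedy variant of that traversal can be sketched as a simple descent: score each child, follow the best one, and stop when no child improves on the current node. This is a simplified stand-in (keyword counting only, illustrative types); the real engine mixes keyword, semantic, and LLM scoring:

```rust
/// Illustrative node type; the crate's actual types may differ.
struct Node {
    text: String,
    children: Vec<Node>,
}

/// Score a node by how many query keywords appear in its text.
fn score(node: &Node, keywords: &[&str]) -> usize {
    keywords.iter().filter(|k| node.text.contains(**k)).count()
}

/// Greedy descent: at each fork, follow the best-scoring child; stop when
/// no child improves on the current node (a crude stand-in for the Judge
/// stage's sufficiency check).
fn greedy_search<'a>(mut node: &'a Node, keywords: &[&str]) -> &'a Node {
    loop {
        let best = node.children.iter().max_by_key(|c| score(c, keywords));
        match best {
            Some(child) if score(child, keywords) > score(node, keywords) => node = child,
            _ => return node,
        }
    }
}

fn main() {
    let tree = Node {
        text: "manual".into(),
        children: vec![
            Node { text: "install guide".into(), children: vec![] },
            Node {
                text: "retrieval overview".into(),
                children: vec![Node { text: "retrieval backtracking details".into(), children: vec![] }],
            },
        ],
    };
    let hit = greedy_search(&tree, &["retrieval", "backtracking"]);
    println!("{}", hit.text); // prints "retrieval backtracking details"
}
```

Beam search widens this to the top-k children per level, and MCTS adds exploration; backtracking kicks in when the Judge stage deems the reached content insufficient.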
Pilot: The Brain
Pilot is the intelligence layer that guides retrieval:
- Intervention Points — Pilot acts at key decision moments:
  - START — Analyze query intent, set initial direction
  - FORK — Rank candidates at branch points
  - BACKTRACK — Suggest alternatives when search fails
  - EVALUATE — Assess content sufficiency
- Score Merging — Combines algorithm scores with LLM reasoning: `final_score = α × algorithm_score + β × llm_score`
- Fallback Strategy — 4-level degradation (Normal → Retry → Simplified → Algorithm-only)
- Budget Control — Token and call limits with intelligent allocation
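The score-merging step above is a weighted sum. A minimal sketch (the α and β values here are illustrative, not the crate's defaults):

```rust
/// Merge an algorithm score with an LLM score using fixed weights:
/// final_score = α × algorithm_score + β × llm_score
fn merge_scores(algorithm_score: f64, llm_score: f64, alpha: f64, beta: f64) -> f64 {
    alpha * algorithm_score + beta * llm_score
}

fn main() {
    // Example: weight the deterministic algorithm at 0.6 and the LLM at 0.4.
    let final_score = merge_scores(0.8, 0.5, 0.6, 0.4); // ≈ 0.68
    println!("{final_score}");
}
```

Keeping the algorithmic term in the sum means retrieval degrades gracefully: if the LLM component is unavailable, the fallback levels above can drop to algorithm-only scoring.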
Comparison
| Aspect | Vectorless | Traditional RAG |
|---|---|---|
| Infrastructure | Zero | Vector DB + Embedding Model |
| Setup Time | Minutes | Hours to Days |
| Reasoning | Native navigation | Similarity search only |
| Document Structure | Preserved | Lost in chunking |
| Incremental Updates | Supported | Full re-index required |
| Debugging | Traceable navigation path | Black box similarity scores |
| Best For | Structured documents | Unstructured text |
Installation
Add to your Cargo.toml:
```toml
[dependencies]
vectorless = "0.1"
```
Quick Start
Create a configuration file vectorless.toml in your project root:
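A minimal `vectorless.toml` might look like this (the key names are illustrative assumptions; consult the project documentation for the actual schema):

```toml
# Hypothetical configuration sketch; actual keys may differ.
[llm]
provider = "openai"
model = "gpt-4o-mini"

[retrieval]
strategy = "auto"   # keyword / semantic / llm
algorithm = "beam"  # greedy / beam / mcts
```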
Basic usage (an illustrative sketch: the `Engine` method names here are assumptions while the API evolves, so check the crate documentation):

```rust
use vectorless::Engine; // hypothetical path; check the crate docs

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Index a directory of documents, then ask a question.
    let engine = Engine::from_config("vectorless.toml").await?;
    engine.index("docs/").await?;
    let answer = engine.query("How does retrieval backtracking work?").await?;
    println!("{answer}");
    Ok(())
}
```
Examples
See the examples/ directory for complete working examples.
Architecture
Contributing
Contributions are welcome!
If you find this project useful, please consider giving it a star on GitHub — it helps others discover it and supports ongoing development.
Star History
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.