vectorless 0.1.20

Hierarchical, reasoning-native document intelligence engine

What is Vectorless?

Vectorless is a library for querying structured documents in natural language, without vector databases or embedding models. The core engine is written in Rust and ships with Python bindings.

Instead of chunking documents into vectors, Vectorless preserves each document's tree structure and navigates it with a hybrid of deterministic algorithms and an LLM, much as a human reads a table of contents:

  • Pilot (LLM) handles "where to go"
  • Algorithm handles "how to walk"

How It Works

1. Index: Build a Navigable Tree

Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│   ├── 2.1 System Design
│   └── 2.2 Implementation
└── Chapter 3: API Reference

Each node gets an AI-generated summary, enabling fast navigation.
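
As an illustration of the structure described above, here is a minimal Python sketch of such a tree. The `Node` class, its fields, and the `render` helper are hypothetical names chosen for this example, not Vectorless's actual types, and the summaries are hard-coded where the engine would ask an LLM to write them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a section title, a short summary, and children."""
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Return one indented line per node, in depth-first order."""
    lines = ["  " * depth + node.title]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Summaries are placeholders; a real index would generate them with an LLM.
manual = Node("Technical Manual", "Reference manual for the system.", [
    Node("Chapter 1: Introduction", "Overview and scope."),
    Node("Chapter 2: Architecture", "How the system is built.", [
        Node("2.1 System Design", "Components and data flow."),
        Node("2.2 Implementation", "Implementation details."),
    ]),
    Node("Chapter 3: API Reference", "Public functions and types."),
])

print("\n".join(render(manual)))
```

Because every node carries its own summary, a navigator only needs to read the summaries of one node's children to choose the next branch.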

2. Query: Navigate with LLM

When you ask "How do I reset the device?":

  1. Analyze — Understand query intent and complexity
  2. Navigate — LLM guides tree traversal (like reading a TOC)
  3. Retrieve — Return the exact section with context
  4. Verify — Check if more information is needed (backtracking)
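
The navigate, retrieve, and verify steps above can be sketched in a few lines of Python. This is only a toy under stated assumptions: `pilot_rank` stands in for the LLM by counting keyword overlap, `verify` is a deliberately crude check, and every name here (`Node`, `keywords`, `pilot_rank`, `verify`, `navigate`) is hypothetical rather than part of the Vectorless API.

```python
from dataclasses import dataclass, field

# Toy stopword list so that only content words count as overlap.
STOPWORDS = {"how", "do", "i", "the", "a", "to", "of", "and", "its"}


@dataclass
class Node:
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def keywords(text: str) -> set[str]:
    """Lowercase content words with surrounding punctuation stripped."""
    tokens = {word.strip("?.,:").lower() for word in text.split()}
    return tokens - STOPWORDS


def pilot_rank(query: str, candidates: list[Node]) -> list[Node]:
    """Stand-in for the LLM pilot: order children by keyword overlap."""
    wanted = keywords(query)
    return sorted(
        candidates,
        key=lambda n: len(wanted & keywords(f"{n.title} {n.summary}")),
        reverse=True,
    )


def verify(node: Node, query: str) -> bool:
    """Toy check: accept a leaf only if its title mentions a query keyword."""
    return bool(keywords(query) & keywords(node.title))


def navigate(node, query, path=(), trace=None):
    """Depth-first walk in pilot order; returns the path to an accepted leaf or None."""
    path = path + (node.title,)
    if trace is not None:
        trace.append(node.title)  # record visit order for inspection
    if not node.children:
        return path if verify(node, query) else None
    for child in pilot_rank(query, node.children):
        found = navigate(child, query, path, trace)
        if found is not None:
            return found
    return None  # every branch failed verification: the caller backtracks


manual = Node("Device Manual", "Setup, use and maintenance", [
    Node("Chapter 1: Introduction", "Overview of the device and its reset button"),
    Node("Chapter 4: Maintenance", "Cleaning and reset steps", [
        Node("4.1 Cleaning", "Wipe the device housing"),
        Node("4.2 Reset Procedure", "Hold the power button to reset the device"),
    ]),
])

visited: list[str] = []
result = navigate(manual, "How do I reset the device?", trace=visited)
print(" > ".join(result))
print(visited)
```

In this run the stub pilot tries Chapter 1 first because its summary mentions both keywords, the check rejects that leaf, and the walk backtracks into Chapter 4. The split mirrors the one described earlier: ranking decides where to go, and the plain recursive loop decides how to walk.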

Traditional RAG vs Vectorless

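| Aspect             | Traditional RAG             | Vectorless                    |
|--------------------|-----------------------------|-------------------------------|
| Infrastructure     | Vector DB + embedding model | Just an LLM API               |
| Document structure | Lost in chunking            | Preserved                     |
| Context            | Fragment only               | Section + surrounding context |
| Setup time         | Hours to days               | Minutes                       |
| Best for           | Unstructured text           | Structured documents          |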

Example

Input:

Document: 100-page technical manual (PDF)
Query: "How do I reset the device?"

Output:

Answer: "To reset the device, hold the power button for 10 seconds 
until the LED flashes blue, then release..."

Source: Chapter 4 > Section 4.2 > Reset Procedure

When to Use

Good fit:

  • Technical documentation
  • Manuals and guides
  • Structured reports
  • Policy documents
  • Any document with clear hierarchy

Not ideal:

  • Unstructured text (tweets, chat logs)
  • Very short documents (< 1 page)
  • Pure Q&A datasets without structure

Quick Start

Python:

pip install vectorless

from vectorless import Engine, IndexContext

# Create engine (uses OPENAI_API_KEY env var)
engine = Engine(workspace="./data")

# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)

# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")

Rust:

Add the dependency to Cargo.toml:

[dependencies]
vectorless = "0.1"

Copy the example configuration:

cp vectorless.example.toml ./vectorless.toml

Then index and query a document:

use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build().await?;

    let doc_id = client.index("./document.pdf").await?;

    let result = client.query(&doc_id,
        "What are the system requirements?").await?;

    println!("Answer: {}", result.content);
    println!("Source: {}", result.path);

    Ok(())
}

Features

| Feature              | Description                                       |
|----------------------|---------------------------------------------------|
| Zero Infrastructure  | No vector DB or embedding model; just an LLM API  |
| Multi-format Support | PDF, Markdown, DOCX, HTML out of the box          |
| Incremental Updates  | Add/remove documents without a full re-index      |
| Traceable Results    | See the exact navigation path taken               |
| Feedback Learning    | Improves from user feedback over time             |
| Multi-turn Queries   | Handles complex questions with decomposition      |

Configuration

Zero Configuration (Recommended)

Just set OPENAI_API_KEY and you're ready to go:

export OPENAI_API_KEY="sk-..."

Python:

from vectorless import Engine

# Uses OPENAI_API_KEY from environment
engine = Engine(workspace="./data")

Rust:

use vectorless::Engine;

let client = Engine::builder()
    .with_workspace("./workspace")
    .build().await?;

Environment Variables

| Variable             | Description                       |
|----------------------|-----------------------------------|
| OPENAI_API_KEY       | LLM API key                       |
| VECTORLESS_MODEL     | Default model (e.g., gpt-4o-mini) |
| VECTORLESS_ENDPOINT  | API endpoint URL                  |
| VECTORLESS_WORKSPACE | Workspace directory               |

Advanced Configuration

For fine-grained control, use a config file:

cp config.toml ./vectorless.toml

Python:

from vectorless import Engine

# Use full configuration file
engine = Engine(config_path="./vectorless.toml")

# Or override specific settings
engine = Engine(
    config_path="./vectorless.toml",
    model="gpt-4o",  # Override model from config
)

Rust:

use vectorless::Engine;

// Use full configuration file
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .build().await?;

// Or override specific settings
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .with_model("gpt-4o", None)  // Override model
    .build().await?;

Configuration Priority

Settings are merged in the following order; each source overrides the ones before it:

  1. Default configuration
  2. Auto-detected config file (vectorless.toml, config.toml, .vectorless.toml)
  3. Explicit config file (config_path / with_config_path)
  4. Environment variables
  5. Constructor/builder parameters (highest priority)
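
To make the precedence concrete, here is a minimal Python sketch of layered settings resolution. It is an illustration of the ordering only, not the library's implementation: the `resolve` function, the `DEFAULTS` values, and the setting keys (`model`, `endpoint`, `workspace`) are assumptions made for this example. Only the environment variable names come from the table above.

```python
import os

# Hypothetical built-in defaults, used for illustration only.
DEFAULTS = {"model": "gpt-4o-mini", "workspace": "./workspace"}

# Environment variables from the table above, mapped to assumed setting keys.
ENV_KEYS = {
    "VECTORLESS_MODEL": "model",
    "VECTORLESS_ENDPOINT": "endpoint",
    "VECTORLESS_WORKSPACE": "workspace",
}


def resolve(auto_file=None, explicit_file=None, env=None, overrides=None):
    """Merge the five layers in priority order; later layers win."""
    env = os.environ if env is None else env
    env_layer = {key: env[var] for var, key in ENV_KEYS.items() if var in env}
    layers = (
        DEFAULTS,             # 1. default configuration
        auto_file or {},      # 2. auto-detected config file
        explicit_file or {},  # 3. explicit config file
        env_layer,            # 4. environment variables
        overrides or {},      # 5. constructor/builder parameters
    )
    settings = {}
    for layer in layers:
        # Skip unset values so that None never masks a lower layer.
        settings.update({k: v for k, v in layer.items() if v is not None})
    return settings


settings = resolve(
    explicit_file={"model": "gpt-4o", "workspace": "./data"},
    env={"VECTORLESS_MODEL": "gpt-4o-mini"},
    overrides={"workspace": "./tmp"},
)
print(settings)
```

In this call the environment variable overrides the model set in the config file, and the explicit `workspace` argument overrides both the file and the default.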

Architecture

Core Components

  • Index Pipeline — Parses documents, builds tree, generates summaries
  • Retrieval Pipeline — Analyzes query, navigates tree, returns results
  • Pilot — LLM-powered navigator that guides retrieval decisions
  • Metrics Hub — Unified observability for LLM calls, retrieval, and feedback
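
As a rough picture of the tree-building step in an index pipeline, the sketch below turns Markdown headings into a nested tree. It assumes Markdown input with `#`-style headings and leaves out summary generation; `Section` and `build_tree` are hypothetical names for this example, and the real pipeline also handles other formats and may work quite differently.

```python
import re
from dataclasses import dataclass, field

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Section:
    title: str
    level: int
    body: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def build_tree(markdown: str, root_title: str = "Document") -> Section:
    """Nest sections by heading level, keeping each section's text lines.

    Simplification: lines starting with '#' inside fenced code blocks
    would be misread as headings.
    """
    root = Section(root_title, level=0)
    stack = [root]  # path from the root to the section being filled
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            section = Section(match.group(2).strip(), level)
            # Pop until the top of the stack is a shallower heading (the parent).
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(section)
            stack.append(section)
        elif line.strip():
            stack[-1].body.append(line.strip())
    return root


doc = """# Introduction
Welcome.
# Architecture
## System Design
Components and data flow.
## Implementation
# API Reference
"""

tree = build_tree(doc, root_title="Technical Manual")
print([child.title for child in tree.children])
```

The root sits at level 0, so the stack never empties, and each heading attaches to the nearest shallower heading before it.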

Examples

See the examples/ directory for more usage patterns.

Contributing

Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.

License

Apache License 2.0