§Vectorless
A hierarchical, reasoning-native document intelligence engine.
Replace your vector database with LLM-powered tree navigation. No embeddings. No vector search. Just reasoning.
§Overview
Traditional RAG systems chunk documents into flat vectors, losing structure. Vectorless preserves your document’s hierarchy and uses an LLM to navigate it — like a human skimming a table of contents, then drilling into relevant sections.
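The navigation idea can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `Node`, `score`, and `navigate` are hypothetical names rather than the crate's API, and a crude keyword-overlap score stands in for the LLM call that would choose which section to open next.

```rust
// Illustrative sketch only: `Node`, `score`, and `navigate` are hypothetical
// names, not part of the vectorless API. A keyword-overlap score stands in
// for the LLM call that would pick the most relevant child.

struct Node {
    title: String,
    summary: String,
    children: Vec<Node>,
}

impl Node {
    fn new(title: &str, summary: &str, children: Vec<Node>) -> Self {
        Node {
            title: title.to_string(),
            summary: summary.to_string(),
            children,
        }
    }
}

/// Stand-in for the LLM: count the query words that occur (as substrings)
/// in the node's title or summary.
fn score(query: &str, node: &Node) -> usize {
    let haystack = format!("{} {}", node.title, node.summary).to_lowercase();
    query
        .to_lowercase()
        .split_whitespace()
        .filter(|word| haystack.contains(*word))
        .count()
}

/// Walk from the root to a leaf, descending at each level into the child
/// with the highest score. Returns the titles along the chosen path.
fn navigate<'a>(root: &'a Node, query: &str) -> Vec<&'a str> {
    let mut path = vec![root.title.as_str()];
    let mut current = root;
    while let Some(best) = current.children.iter().max_by_key(|c| score(query, c)) {
        path.push(best.title.as_str());
        current = best;
    }
    path
}
```

Only the titles and summaries on the chosen path are inspected, so the work per query grows with the depth of the tree rather than with the total number of sections. A real navigator would presumably also decide when to stop early or explore more than one branch, which this sketch does not attempt.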
§Architecture
┌─────────────────────────────────────────────────┐
│ USER │
│ (Query / Index) │
└────────────────────────┬────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ Engine / EngineBuilder │ │
│ │ (Unified API for Index + Query) │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
│
┌────────────────────────┴────────────────────────┐
│ │
▼ ▼
┌──────────────────────────────────────────────┐ ┌──────────────────────────────────────────────┐
│ INDEX PIPELINE │ │ RETRIEVAL ENGINE │
│ ┌─────────┐ ┌─────────┐ ┌─────────────┐ │ │ ┌─────────────────────────────────────┐ │
│ │ Parse │─▶│ Build │─▶│ Enhance │ │ │ │ Pilot (LLM) │ │
│ │ (Doc) │ │ (Tree) │ │ (Summaries) │ │ │ │ ┌───────────────────────┐ │ │
│ └─────────┘ └────┬────┘ └──────┬──────┘ │ │ │ │ Navigation Agent │ │ │
│ │ │ │ │ │ │ │ ┌─────┐ ┌─────────┐ │ │ │
│ ▼ ▼ ▼ │ │ │ │ │Decide│▶│Traverse │ │ │ │
│ ┌─────────┐ ┌─────────┐ ┌─────────────┐ │ │ │ │ │Path │ │ Tree │ │ │ │
│ │ Enrich │─▶│ Optimize│─▶│ Persist │ │ │ │ │ └─────┘ └─────────┘ │ │ │
│ │(Meta) │ │ (Tree) │ │ (Storage) │ │ │ │ └───────────────────────┘ │ │
│ └─────────┘ └─────────┘ └─────────────┘ │ │ └─────────────────────────────────────┘ │
│ │ │ │ │ │
│ │ ┌──────────────────────┐ │ │ ▼ │
│ └────────▶│ Change Detector │◀─────┼───┤ ┌─────────────────────────────────────┐ │
│ │ (Fingerprint-based) │ │ │ │ Context Assembler │ │
│ └──────────────────────┘ │ │ │ ┌─────────┐ ┌─────────────────┐ │ │
│ │ │ │ │ Pruning │ │ Token Budget │ │ │
└──────────────────────────────────────────────┘ │ │ │Strategy │ │ Management │ │ │
│ │ │ └─────────┘ └─────────────────┘ │ │
│ │ └─────────────────────────────────────┘ │
│ │ │ │
▼ │ ▼ │
┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ DOMAIN LAYER (Core) │
│ │
│ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │
│ │ DocumentTree │ │ TreeNode │ │ NodeId │ │
│ │ (Arena-based) │────▶│ - title │ │ (indextree) │ │
│ │ │ │ - content │ │ │ │
│ └───────────────────┘ │ - summary │ └───────────────────┘ │
│ │ │ - depth │ │ │
│ ▼ │ - token_count │ │ │
│ ┌───────────────────┐ └───────────────────┘ │ │
│ │ TocView │ │ │ │
│ │ (Table of │ │ │ │
│ │ Contents) │ │ │ │
│ └───────────────────┘ │ │ │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
│ │
┌─────────────────┴─────────────────────────┴─────────────────┐
│ │
▼ ▼
┌─────────────────────────────────────────┐ ┌─────────────────────────────────────────────────┐
│ SUPPORT LAYER │ │ STORAGE LAYER │
│ │ │ │
│ ┌─────────────┐ ┌──────────────────┐ │ │ ┌────────────────┐ ┌─────────────────────┐ │
│ │ LLM │ │ Parser │ │ │ │ Workspace │ │ MemoStore │ │
│ │ (OpenAI) │ │ - Markdown │ │ │ │ (Persistence) │ │ (LLM Cache) │ │
│ │ │ │ - PDF │ │ │ │ │ │ - LRU Eviction │ │
│ │ ┌─────────┐ │ │ - DOCX │ │ │ │ ┌────────────┐ │ │ - TTL Expiration │ │
│ │ │ Pool │ │ │ │ │ │ │ │ LRU │ │ │ - Disk Persist │ │
│ │ │ Retry │ │ └──────────────────┘ │ │ │ │ Cache │ │ │ │ │
│ │ │ Fallback│ │ │ │ │ └────────────┘ │ └─────────────────────┘ │
│ │ └─────────┘ │ ┌──────────────────┐ │ │ │ │ │
│ └─────────────┘ │ Fingerprint │ │ │ │ ┌────────────┐ │ ┌─────────────────────┐ │
│ │ (BLAKE2b) │ │ │ │ │ Atomic │ │ │ ChangeDetector │ │
│ ┌─────────────┐ │ │ │ │ │ │ Writes │ │ │ (Incremental) │ │
│ │ Config │ │ ┌──────────────┐ │ │ │ │ └────────────┘ │ │ │ │
│ │ Loader │ │ │ Content FP │ │ │ │ │ │ │ ┌─────────────────┐ │ │
│ │ │ │ │ Subtree FP │ │ │ │ └────────────────┘ │ │ Processing Ver │ │ │
│ └─────────────┘ │ │ Node FP │ │ │ │ │ └─────────────────┘ │ │
│ │ └──────────────┘ │ │ │ └─────────────────────┘ │
│ ┌─────────────┐ └──────────────────┘ │ │ │
│ │ Throttle │ │ │ ┌────────────────────────────────────────────┐ │
│ │ (Rate Limit)│ ┌──────────────────┐ │ │ │ DocumentMeta │ │
│ └─────────────┘ │ Throttle │ │ │ │ - content_fingerprint │ │
│ │ (Concurrency) │ │ │ │ - processing_version │ │
│ └──────────────────┘ │ │ │ - node_count, total_summary_tokens │ │
│ │ │ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────┘ └─────────────────────────────────────────────────┘
§Data Flow
§Indexing Flow
Document ──▶ Parse ──▶ Build Tree ──▶ Generate Summaries ──▶ Detect Changes ──▶ Persist
│ │ │
│ └──▶ MemoStore ◀───────┘
│ (Cache)
└──▶ Fingerprint ──▶ ChangeDetector
§Query Flow
Query ──▶ Pilot Agent ──▶ Navigate Tree ──▶ Assemble Context ──▶ Return Result
│ │ │
└──▶ LLM ◀────────┘ │
(Decide) │
└──▶ MemoStore (Cached Summaries)
§Features
- 🌳 Tree-Based Indexing — Documents as hierarchical trees, not flat chunks
- 🧠 LLM Navigation — Reasoning-based traversal to find relevant content
- 🚀 Zero Infrastructure — No vector database, no embedding models
- 📄 Multi-Format — Markdown, PDF, DOCX support
- 💾 Persistent Workspace — LRU-cached storage with lazy loading
- 🔄 Retry & Fallback — Resilient LLM calls with automatic recovery
- 🔍 Incremental Updates — Fingerprint-based change detection
- ⚡ LLM Memoization — Cache summaries and decisions to reduce costs
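Fingerprint-based change detection, as listed above, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `Section`, `content_fp`, `subtree_fp`, `collect`, and `changed_sections` are hypothetical names, sections are assumed to have unique titles, and the standard library's `DefaultHasher` stands in for BLAKE2b.

```rust
// Illustrative sketch only: all names here are hypothetical. std's
// DefaultHasher stands in for BLAKE2b; its output is not guaranteed to be
// stable across Rust releases, so it would not suit fingerprints that are
// persisted to disk.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct Section {
    title: String,
    content: String,
    children: Vec<Section>,
}

/// Fingerprint of a node's own title and content.
fn content_fp(s: &Section) -> u64 {
    let mut h = DefaultHasher::new();
    s.title.hash(&mut h);
    s.content.hash(&mut h);
    h.finish()
}

/// Fingerprint of a node together with all of its descendants.
fn subtree_fp(s: &Section) -> u64 {
    let mut h = DefaultHasher::new();
    content_fp(s).hash(&mut h);
    for child in &s.children {
        subtree_fp(child).hash(&mut h);
    }
    h.finish()
}

/// Record `title -> content fingerprint` for every node in the tree.
/// Assumes titles are unique within the document.
fn collect(s: &Section, out: &mut HashMap<String, u64>) {
    out.insert(s.title.clone(), content_fp(s));
    for child in &s.children {
        collect(child, out);
    }
}

/// Titles of sections that are new or whose content fingerprint differs
/// from the previously stored one, sorted for deterministic output.
fn changed_sections(old: &HashMap<String, u64>, new_tree: &Section) -> Vec<String> {
    let mut current = HashMap::new();
    collect(new_tree, &mut current);
    let mut changed: Vec<String> = current
        .into_iter()
        .filter(|(title, fp)| old.get(title) != Some(fp))
        .map(|(title, _)| title)
        .collect();
    changed.sort();
    changed
}
```

Only the sections returned by `changed_sections` would need their summaries regenerated. Comparing `subtree_fp` values first allows an unchanged branch to be skipped without visiting its children.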
§Quick Start
use vectorless::{EngineBuilder, Engine};
use vectorless::client::IndexContext;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client
    let client = EngineBuilder::new()
        .with_workspace("./workspace")
        .build()
        .await?;

    // Index a document
    let doc_id = client.index(IndexContext::from_path("./document.md")).await?;

    // Query with natural language
    let result = client.query(&doc_id, "What is this about?").await?;
    println!("{}", result.content);

    Ok(())
}
§Modules
| Module | Description |
|---|---|
| client | High-level API (Engine, EngineBuilder) |
| document | Core domain types (DocumentTree, TreeNode, NodeId) |
| index | Document indexing pipeline with incremental updates |
| retrieval | Retrieval strategies and LLM-based navigation |
| config | Configuration management |
| llm | LLM client with retry & fallback |
| parser | Document parsers (Markdown, PDF, DOCX) |
| storage | Workspace persistence with LRU caching |
| throttle | Rate limiting and concurrency control |
| fingerprint | Content and subtree fingerprinting |
| memo | LLM result memoization and caching |
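To make the memoization idea concrete, here is a minimal sketch of a time-limited cache in front of an expensive call. It is an assumption-laden illustration, not the `memo` module's API: `Memo`, `get_or_compute`, and the `hits`/`misses` counters are hypothetical, a closure stands in for the LLM request, and LRU eviction and disk persistence are left out.

```rust
// Illustrative sketch only: `Memo` and its methods are hypothetical names,
// not the vectorless `memo` API. A closure stands in for an LLM request.
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Memo {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
    hits: u64,
    misses: u64,
}

impl Memo {
    fn new(ttl: Duration) -> Self {
        Memo {
            ttl,
            entries: HashMap::new(),
            hits: 0,
            misses: 0,
        }
    }

    /// Return the cached value for `key` if it is younger than the TTL;
    /// otherwise run `compute`, store its result, and return it.
    fn get_or_compute<F>(&mut self, key: &str, compute: F) -> String
    where
        F: FnOnce() -> String,
    {
        if let Some((stored_at, value)) = self.entries.get(key) {
            if stored_at.elapsed() < self.ttl {
                self.hits += 1;
                return value.clone();
            }
        }
        self.misses += 1;
        let value = compute();
        self.entries
            .insert(key.to_string(), (Instant::now(), value.clone()));
        value
    }
}
```

If the key is derived from a content fingerprint, an unchanged section hits the cache on re-indexing and the LLM is only called for sections whose content has changed.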
Re-exports§
pub use client::BuildError;
pub use client::DocumentInfo;
pub use client::Engine;
pub use client::EngineBuilder;
pub use client::IndexContext;
pub use client::IndexMode;
pub use client::IndexOptions;
pub use client::IndexSource;
pub use client::IndexedDocument;
pub use error::Error;
pub use error::Result;
pub use document::DocumentStructure;
pub use document::DocumentTree;
pub use document::NodeId;
pub use document::StructureNode;
pub use document::TocConfig;
pub use document::TocEntry;
pub use document::TocNode;
pub use document::TocView;
pub use document::TreeNode;
pub use utils::estimate_tokens;
pub use utils::estimate_tokens_fast;
pub use config::Config;
pub use config::ConfigLoader;
pub use config::RetrievalConfig;
pub use config::SummaryConfig;
pub use llm::LlmClient;
pub use llm::LlmConfig;
pub use llm::LlmConfigs;
pub use llm::LlmError;
pub use llm::LlmPool;
pub use llm::RetryConfig;
pub use parser::DocumentFormat;
pub use parser::DocumentParser;
pub use parser::DocxParser;
pub use parser::MarkdownParser;
pub use parser::ParseResult;
pub use parser::PdfParser;
pub use parser::RawNode;
pub use index::pipeline::CustomStageBuilder;
pub use index::pipeline::PipelineOrchestrator;
pub use index::ChangeDetector;
pub use index::ChangeSet;
pub use index::IndexContext as PipelineIndexContext;
pub use index::IndexInput;
pub use index::IndexMetrics;
pub use index::IndexMode as PipelineIndexMode;
pub use index::IndexResult;
pub use index::IndexStage;
pub use index::PartialUpdater;
pub use index::PipelineExecutor;
pub use index::PipelineOptions;
pub use index::SummaryStrategy;
pub use retrieval::ContextBuilder;
pub use retrieval::PipelineRetriever;
pub use retrieval::PruningStrategy;
pub use retrieval::QueryComplexity;
pub use retrieval::RetrievalContext;
pub use retrieval::RetrievalResult;
pub use retrieval::RetrieveOptions;
pub use retrieval::RetrieveResponse;
pub use retrieval::Retriever;
pub use retrieval::RetrieverError;
pub use retrieval::RetrieverResult;
pub use retrieval::SearchPath;
pub use retrieval::StrategyPreference;
pub use retrieval::SufficiencyLevel;
pub use retrieval::TokenEstimation;
pub use retrieval::format_for_llm;
pub use retrieval::format_for_llm_async;
pub use retrieval::format_tree_for_llm;
pub use retrieval::format_tree_for_llm_async;
pub use storage::DocumentMeta as StorageDocumentMeta;
pub use storage::PersistedDocument;
pub use storage::Workspace;
pub use throttle::ConcurrencyConfig;
pub use throttle::ConcurrencyController;
pub use throttle::RateLimiter;
pub use memo::MemoEntry;
pub use memo::MemoKey;
pub use memo::MemoOpType;
pub use memo::MemoStats;
pub use memo::MemoStore;
pub use memo::MemoValue;
Modules§
- client
- High-level client API for document indexing and retrieval.
- config
- Configuration management for vectorless.
- document
- Document types - pure data structures for document tree representation.
- error
- Error types for the vectorless library.
- index
- Index Pipeline module.
- llm
- Unified LLM client module.
- memo
- LLM Memoization system for caching expensive LLM calls.
- metrics
- Unified metrics collection for Vectorless.
- parser
- Document parsing module.
- retrieval
- Retrieval system for Vectorless document trees.
- storage
- Storage module for persisting document indices.
- throttle
- Concurrency control for LLM API calls.
- utils
- Utility functions and helpers.