Crate vectorless

§vectorless

Hierarchical, Reasoning-Native Document Intelligence Engine.

A document indexing and retrieval library that uses tree-based navigation instead of vector embeddings for RAG applications.

§Features

  • Tree-Based Indexing — Documents organized as hierarchical trees
  • LLM Navigation — Intelligent traversal using LLM to find relevant content
  • No Vector Database — No embedding pipeline or vector store to deploy and maintain
  • Multiple Formats — Support for Markdown, PDF, HTML, and more
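The core idea behind LLM navigation can be sketched without the crate at all: starting from the root, a navigator inspects each node's title, descends only into branches that look relevant, and collects matching content. The sketch below is illustrative only — a keyword check stands in for the LLM relevance call, and `Node`/`navigate` are simplified stand-ins, not this crate's `DocumentTree`/`LlmNavigator` API.

```rust
// Simplified stand-in types; the crate's real tree uses NodeId handles.
struct Node {
    title: String,
    content: String,
    children: Vec<Node>,
}

fn sample_tree() -> Node {
    Node {
        title: "Manual".into(),
        content: String::new(),
        children: vec![
            Node {
                title: "Installation".into(),
                content: "Run cargo add to install.".into(),
                children: vec![],
            },
            Node {
                title: "Configuration".into(),
                content: "Set the API key in config.toml.".into(),
                children: vec![],
            },
        ],
    }
}

/// Descend into children whose title looks relevant to the query,
/// collecting matching content. A real navigator would ask an LLM which
/// branches to follow instead of this keyword check.
fn navigate<'a>(node: &'a Node, query: &str, hits: &mut Vec<&'a str>) {
    let q = query.to_lowercase();
    if node.content.to_lowercase().contains(&q) {
        hits.push(&node.content);
    }
    for child in &node.children {
        // Follow a branch if its title matches or it has deeper structure.
        if child.title.to_lowercase().contains(&q) || !child.children.is_empty() {
            navigate(child, &q, hits);
        }
    }
}

fn main() {
    let tree = sample_tree();
    let mut hits = Vec::new();
    navigate(&tree, "config", &mut hits);
    assert_eq!(hits, vec!["Set the API key in config.toml."]);
    println!("hits: {:?}", hits);
}
```

Because the navigator prunes irrelevant branches at each level, the number of nodes examined grows with tree depth rather than corpus size — which is what lets this design skip the vector index entirely.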

§Quick Start

use vectorless::core::{DocumentTree, TreeNode};

// Create a document tree
let mut tree = DocumentTree::new("Root", "Root content");

// Add children
let root = tree.root();
let child = tree.add_child(root, "Section 1", "Content for section 1");

// Navigate the tree
for node_id in tree.children(root) {
    if let Some(node) = tree.get(node_id) {
        println!("Title: {}", node.title);
    }
}

§Architecture

The crate is organized into the following modules:

  • core — Core types: TreeNode, DocumentTree, NodeId
  • llm — Unified LLM client with retry support
  • concurrency — Rate limiting and concurrency control
  • document — Document parsing: Markdown, PDF, HTML
  • indexer — Index building: tree construction, thinning, merging
  • summarizer — Summary generation
  • retriever — Retrieval strategies
  • ranking — Result ranking
  • storage — Persistence and caching
  • client — High-level API
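To make the `document` → `indexer` handoff concrete, here is a minimal, self-contained sketch of turning a flat Markdown document into a heading hierarchy. This is not the crate's `MarkdownParser`; it is an assumption-laden illustration of the general technique (ATX heading levels become parent/child nesting, body lines attach to the nearest heading).

```rust
// Minimal sketch (not the crate's MarkdownParser): build a parent/child
// outline from ATX heading levels.
#[derive(Debug)]
struct Outline {
    level: usize, // 0 for the synthetic root, 1 for "#", 2 for "##", ...
    title: String,
    body: Vec<String>,
    children: Vec<Outline>,
}

fn parse_outline(markdown: &str) -> Outline {
    let mut root = Outline { level: 0, title: "root".into(), body: vec![], children: vec![] };
    // Child indices describing the path from root to the current section.
    let mut path: Vec<usize> = Vec::new();

    fn node_at<'a>(root: &'a mut Outline, path: &[usize]) -> &'a mut Outline {
        let mut node = root;
        for &i in path {
            node = &mut node.children[i];
        }
        node
    }

    for line in markdown.lines() {
        let hashes = line.chars().take_while(|&c| c == '#').count();
        if hashes > 0 && line.chars().nth(hashes) == Some(' ') {
            // Close sections at the same or deeper level before nesting.
            while !path.is_empty() && node_at(&mut root, &path).level >= hashes {
                path.pop();
            }
            let parent = node_at(&mut root, &path);
            parent.children.push(Outline {
                level: hashes,
                title: line[hashes + 1..].trim().to_string(),
                body: vec![],
                children: vec![],
            });
            let idx = parent.children.len() - 1;
            path.push(idx);
        } else if !line.trim().is_empty() {
            // Body text belongs to the most recently opened section.
            node_at(&mut root, &path).body.push(line.trim().to_string());
        }
    }
    root
}

fn main() {
    let doc = "# Guide\nIntro text.\n## Setup\nInstall it.\n## Usage\nRun it.\n";
    let tree = parse_outline(doc);
    assert_eq!(tree.children[0].title, "Guide");
    assert_eq!(tree.children[0].children.len(), 2);
    assert_eq!(tree.children[0].children[1].title, "Usage");
    println!("{:#?}", tree);
}
```

After a tree like this exists, the `indexer` and `summarizer` stages described above can annotate each node with a summary so the retriever's LLM has something cheap to read when deciding which branch to descend.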

Re-exports§

pub use core::DocumentTree;
pub use core::DocumentStructure;
pub use core::NodeId;
pub use core::StructureNode;
pub use core::TreeNode;
pub use core::Error;
pub use core::Result;
pub use core::Retriever;
pub use config::Config;
pub use config::ConfigLoader;
pub use config::SummaryConfig;
pub use config::RetrievalConfig;
pub use llm::LlmClient;
pub use llm::LlmConfig;
pub use llm::LlmConfigs;
pub use llm::LlmError;
pub use llm::LlmPool;
pub use llm::RetryConfig;
pub use concurrency::ConcurrencyConfig;
pub use concurrency::ConcurrencyController;
pub use concurrency::RateLimiter;
pub use document::DocumentParser;
pub use document::DocumentFormat;
pub use document::MarkdownParser;
pub use document::RawNode;
pub use document::ParseResult;
pub use summarizer::summarize;
pub use indexer::TreeBuilder;
pub use storage::Workspace;
pub use storage::PersistedDocument;
pub use storage::DocumentMeta as StorageDocumentMeta;
pub use client::Vectorless;
pub use client::VectorlessBuilder;
pub use client::IndexedDocument;
pub use client::DocumentInfo;
pub use retriever::LlmNavigator;
pub use retriever::RetrieveOptions;
pub use retriever::RetrievalResult;
pub use retriever::ContextBuilder;
pub use registry::ParserRegistry;
pub use registry::SummarizerRegistry;
pub use registry::RetrieverRegistry;
pub use ranking::Scorer;
pub use ranking::Merger;
pub use ranking::ScoredResult;
pub use ranking::ScoringStrategy;
pub use ranking::MergeStrategy;
pub use token::estimate_tokens;
pub use token::estimate_tokens_fast;

Modules§

client
High-level client API for document indexing and retrieval.
concurrency
Concurrency control for LLM API calls.
config
Configuration management for vectorless.
core
Core module containing fundamental types and traits.
document
Document parsing module.
indexer
Document indexing module.
llm
Unified LLM client module.
ranking
Result ranking and merging module.
registry
Registry module for managing pluggable components.
retriever
Document retrieval strategies.
storage
Storage module for persisting document indices.
summarizer
Document summarization module.
token
Unified token estimation module.