HAT: Hierarchical Attention Tree
A novel index structure for AI memory systems that achieves 100% recall with build times 70x faster than HNSW.
Also: A new database paradigm for any domain with known hierarchy + semantic similarity.
Architecture
HAT exploits the known hierarchy in AI conversations: sessions contain documents, documents contain chunks. This structural prior enables O(log n) queries with 100% recall.
Key Results
| Metric | HAT | HNSW | Improvement |
|---|---|---|---|
| Recall@10 | 100% | 70% | +30 pts |
| Build Time | 30ms | 2.1s | 70x faster |
| Query Latency | 3.1ms | - | Production-ready |
Benchmarked on hierarchically-structured AI conversation data
Recall Comparison
HAT achieves 100% recall where HNSW achieves only ~70% on hierarchically-structured data.
Build Time
HAT builds indexes 70x faster than HNSW - critical for real-time applications.
The Problem
Large language models have finite context windows. A 10K context model can only "see" the most recent 10K tokens, losing access to earlier conversation history.
Current solutions fall short:
- Longer context models: Expensive to train and run
- Summarization: Lossy compression that discards detail
- RAG retrieval: Re-embeds and recomputes attention every query
The HAT Solution
HAT exploits known structure in AI workloads. Unlike general vector databases that treat data as unstructured point clouds, AI conversations have inherent hierarchy:
Session (conversation boundary)
└── Document (topic or turn)
└── Chunk (individual message)
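A rough illustration of how these three levels can be held as nested containers, each internal level keeping a summary vector so queries can descend by similarity (the real node types live in src/container.rs and may differ):

from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Chunk:                                    # leaf: an individual message
    embedding: np.ndarray
    text: str

@dataclass
class Document:                                 # a topic or turn
    centroid: np.ndarray                        # summary vector over child chunks (assumed)
    chunks: List[Chunk] = field(default_factory=list)

@dataclass
class Session:                                  # a conversation boundary
    centroid: np.ndarray
    documents: List[Document] = field(default_factory=list)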
The Hippocampus Analogy
HAT mirrors human memory architecture - functioning as an artificial hippocampus for AI systems.
How It Works
Beam Search Query
HAT uses beam search through the hierarchy:
1. Start at root
2. At each level, score children by cosine similarity to query
3. Keep top-b candidates (beam width)
4. Return top-k from leaf level
Complexity: O(b · d · c), where b is the beam width, d the tree depth, and c the fan-out per node. With b and c fixed and the tree balanced, d = O(log n), so queries cost O(log n).
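A compact sketch of this procedure, assuming each node carries a representative vector and a list of children (illustrative Python, not the actual Rust implementation in src/index.rs):

import numpy as np

class Node:
    """Generic tree node: internal nodes hold a summary vector and children;
    leaf chunks hold the stored embedding and an empty child list."""
    def __init__(self, vector, children=None):
        self.vector = np.asarray(vector, dtype=np.float32)
        self.children = children or []

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def beam_search(root, query, beam=4, k=10):
    """Descend root -> sessions -> documents -> chunks, keeping `beam` nodes per level."""
    query = np.asarray(query, dtype=np.float32)
    frontier = [root]
    # Internal levels: score all children of the current beam, keep the best `beam`
    while frontier[0].children and frontier[0].children[0].children:
        candidates = [c for node in frontier for c in node.children]
        candidates.sort(key=lambda c: cosine(c.vector, query), reverse=True)
        frontier = candidates[:beam]
    # Leaf level: rank every chunk under the surviving beam and return the top-k
    leaves = [c for node in frontier for c in node.children]
    leaves.sort(key=lambda c: cosine(c.vector, query), reverse=True)
    return leaves[:k]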
Consolidation Phases
Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental consolidation.
Scale Performance
HAT maintains 100% recall across all tested scales while HNSW degrades significantly.
| Scale | HAT Build | HNSW Build | HAT R@10 | HNSW R@10 |
|---|---|---|---|---|
| 500 | 16ms | 1.0s | 100% | 55% |
| 1000 | 25ms | 2.0s | 100% | 44.5% |
| 2000 | 50ms | 4.3s | 100% | 67.5% |
| 5000 | 127ms | 11.9s | 100% | 55% |
End-to-End Pipeline
Core Claim
A 10K context model with HAT achieves 100% recall on 60K+ tokens with 3.1ms latency.
| Messages | Tokens | Fits in 10K Context | Recall | Latency | Memory |
|---|---|---|---|---|---|
| 1000 | 30K | 33% | 100% | 1.7ms | 1.6MB |
| 2000 | 60K | 17% | 100% | 3.1ms | 3.3MB |
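A sketch of how such a pipeline can be wired together: the full conversation lives in HAT and only the retrieved chunks enter the prompt. `embed` and `llm` are placeholders for a real embedding model and chat model, and `index.search` follows the illustrative Quick Start API below rather than a confirmed interface:

import numpy as np

def embed(text, dim=1536):
    # Placeholder embedding model: deterministic pseudo-vector derived from the text.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim).astype(np.float32)
    return v / np.linalg.norm(v)

def answer(question, index, llm, k=10):
    # Retrieve only the k most relevant chunks from the 60K+ token history (~3 ms).
    hits = index.search(embed(question), k=k)
    context = "\n".join(hit.text for hit in hits)
    # The assembled prompt stays well inside a 10K-token context window.
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)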
Quick Start
Python
A minimal sketch of the Python API (class and method names are illustrative; see python/arms_hat for the actual interface):

from arms_hat import HatIndex

# Create index (1536 dimensions for OpenAI embeddings)
index = HatIndex(dim=1536)

# Add messages with automatic hierarchy
msg_id = index.add(embedding, text="...")  # Returns ID

# Session/document management
index.new_session()   # Start new conversation
index.new_document()  # Start new topic

# Query
results = index.search(query_embedding, k=10)

# Persistence
index.save("memory.hat")
index = HatIndex.load("memory.hat")
Rust
A minimal sketch (crate, type, and method names are illustrative; see src/ for the actual API):

use hat::{HatConfig, HatIndex};

// Create index
let config = HatConfig::default();
let mut index = HatIndex::new(config);

// Add points
let id = index.add(embedding, metadata);

// Query
let results = index.search(&query, 10);
Installation
Python
From Source (Rust)
Python Development
Project Structure
hat/
├── src/ # Rust implementation
│ ├── lib.rs # Library entry point
│ ├── index.rs # HatIndex implementation
│ ├── container.rs # Tree node types
│ ├── consolidation.rs # Background maintenance
│ └── persistence.rs # Save/load functionality
├── python/ # Python bindings (PyO3)
│ └── arms_hat/ # Python package
├── benchmarks/ # Performance comparisons
├── examples/ # Usage examples
├── paper/ # Research paper (PDF)
├── images/ # Figures and diagrams
└── tests/ # Test suite
Reproducing Results
# Run HAT vs HNSW benchmark
# Run real embedding dimension tests
# Run persistence tests
# Run end-to-end LLM demo
When to Use HAT
HAT is ideal for:
- AI conversation memory (chatbots, agents)
- Session-based retrieval systems
- Any hierarchically-structured vector data
- Systems requiring deterministic behavior
- Cold-start scenarios (no training needed)
Use HNSW instead for:
- Unstructured point clouds (random embeddings)
- Static knowledge bases (handbooks, catalogs)
- When approximate recall is acceptable
Beyond AI Memory: A New Database Paradigm
HAT represents a fundamentally new approach to indexing: exploiting known structure rather than learning it.
| Database Type | Structure | Semantics |
|---|---|---|
| Relational | Explicit (foreign keys) | None |
| Document | Implicit (nesting) | None |
| Vector (HNSW) | Learned from data | Yes |
| HAT | Explicit + exploited | Yes |
Traditional vector databases treat embeddings as unstructured point clouds, spending compute to discover topology. HAT inverts this: known hierarchy is free information - use it.
General Applications
Any domain with hierarchical structure + semantic similarity benefits from HAT (see the sketch after this list):
- Legal/Medical Documents: Case → Filing → Paragraph → Sentence
- Code Search: Repository → Module → Function → Line
- IoT/Sensor Networks: Facility → Zone → Device → Reading
- E-commerce: Catalog → Category → Product → Variant
- Research Corpora: Journal → Paper → Section → Citation
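For example, a small code-search corpus can reuse the Node and beam_search helpers from the How It Works sketch above, with Repository → Module → Function standing in for Session → Document → Chunk (the layout and placeholder embedding below are purely illustrative):

import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def embed(text, dim=64):
    # Placeholder code-embedding model: deterministic pseudo-vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return unit(rng.standard_normal(dim).astype(np.float32))

repo_src = {
    "auth.py":  ["def login(user): ...", "def logout(user): ..."],
    "store.py": ["def save(record): ...", "def load(key): ..."],
}

modules = []
for module, functions in repo_src.items():
    leaves = [Node(embed(src)) for src in functions]                               # Function -> chunk
    modules.append(Node(unit(np.mean([l.vector for l in leaves], axis=0)), leaves))  # Module
repository = Node(unit(np.mean([m.vector for m in modules], axis=0)), modules)       # Repository

hits = beam_search(repository, embed("def login(user): ..."), beam=2, k=2)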
The Core Insight
"Position IS relationship. No foreign keys needed - proximity defines connection."
HAT combines the structural guarantees of document databases with the semantic power of vector search, without the computational overhead of learning topology from scratch.
Citation
Paper
License
MIT License - see LICENSE for details.
Links
- Research Site: research.automate-capture.com/hat
- Main Site: automate-capture.com