DistX
DistX does not store vectors that represent objects.
It stores objects, and derives vectors from their structure.
A high-performance vector database with the Similarity Contract — a schema-driven approach to structured similarity search that is deterministic, explainable, and requires no external ML.
The Similarity Contract
The schema is not just configuration — it's a contract that governs:
| Aspect | What the Schema Controls |
|---|---|
| Ingest | How objects are converted to vectors (deterministic, reproducible) |
| Query | How similarity is computed across multiple field types |
| Ranking | How results are scored with structured distance functions |
| Explainability | How each field contributes to the final score |
This is an architectural difference, not just an API feature: reproducing it with Qdrant hybrid queries would mean reimplementing the schema-driven encoding, scoring, and per-field explanation logic client-side.
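To make the Ingest row concrete, here is a minimal sketch of deterministic object-to-vector encoding. It assumes a feature-hashing scheme with a fixed FNV-1a hash; the `Value` enum, the `encode` function, and the fixed-width per-field layout are hypothetical illustrations, not DistX's actual encoder.

```rust
use std::collections::BTreeMap;

// Hypothetical field values mirroring the schema's text/number/categorical/boolean types.
enum Value {
    Text(String),
    Number(f64),
    Categorical(String),
    Boolean(bool),
}

// FNV-1a (64-bit): a fixed hash, so bucket positions are identical on every run and machine.
// (std's default HashMap hasher is randomly seeded, which would break reproducibility.)
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

// Encode an object as `dims` slots per field. BTreeMap::values() visits fields in sorted
// name order, so objects with the same field names yield aligned, same-length vectors.
fn encode(object: &BTreeMap<&str, Value>, dims: usize) -> Vec<f32> {
    let bucket = |key: &str| (fnv1a(key.as_bytes()) % dims as u64) as usize;
    let mut out = Vec::with_capacity(object.len() * dims);
    for value in object.values() {
        let mut block = vec![0.0f32; dims];
        match value {
            // Text: count each lowercase token in its hashed bucket, then L2-normalize.
            Value::Text(s) => {
                for token in s.to_lowercase().split_whitespace() {
                    block[bucket(token)] += 1.0;
                }
                let norm = block.iter().map(|x| x * x).sum::<f32>().sqrt();
                if norm > 0.0 {
                    block.iter_mut().for_each(|x| *x /= norm);
                }
            }
            // Categorical: one-hot at the hashed bucket of the category label.
            Value::Categorical(c) => block[bucket(c.as_str())] = 1.0,
            // Number: raw value in slot 0 (a real encoder would rescale it using the schema).
            Value::Number(n) => block[0] = *n as f32,
            // Boolean: 1.0 or 0.0 in slot 0.
            Value::Boolean(b) => block[0] = if *b { 1.0 } else { 0.0 },
        }
        out.extend(block);
    }
    out
}

fn main() {
    let mut product = BTreeMap::new();
    product.insert("name", Value::Text("Prosciutto di Parma DOP".to_string()));
    product.insert("category", Value::Categorical("cured_meat".to_string()));
    product.insert("price", Value::Number(8.5));
    product.insert("in_stock", Value::Boolean(true));

    // Encoding the same object twice gives bit-identical vectors: no model, no drift.
    let v1 = encode(&product, 8);
    let v2 = encode(&product, 8);
    assert_eq!(v1, v2);
    println!("{} dimensions: {:?}", v1.len(), v1);
}
```

Because nothing in this pipeline is learned or randomly seeded, re-encoding a stored object always reproduces its vector, which is the property a deterministic contract relies on.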
What DistX Is (and Is Not)
| DistX IS | DistX is NOT |
|---|---|
| A contract-based similarity engine | A neural embedding model |
| Deterministic and reproducible | A probabilistic LLM system |
| Designed for structured/tabular data | A black-box recommender |
| Fully explainable (per-field scores) | Dependent on external ML APIs |
Target domains: ERP, e-commerce, CRM, financial data, any tabular dataset.
How It Works
┌──────────────────────────────────────────────────────────────────────────┐
│ Traditional Vector Database │
│ ─────────────────────────── │
│ Your Data → External ML API → Embeddings → Vector DB → Score: 0.87 │
│ (cost per call) (black box) (unexplained) │
│ (model drift) (retraining) (no breakdown) │
├──────────────────────────────────────────────────────────────────────────┤
│ DistX with Similarity Contract │
│ ────────────────────────────── │
│ Your Data → Schema (JSON) → Deterministic → Explainable Results │
│ (contract) (no drift) (name: 0.25, price: 0.22) │
│ (stable) (reproducible) (auditable) │
└──────────────────────────────────────────────────────────────────────────┘
📖 Detailed comparison with Qdrant, Pinecone, Elasticsearch →
Similarity Contract Engine
A schema-driven structured similarity engine with built-in explainability.
Define a Similarity Contract, insert your data, and query by example — vectors are derived automatically from object structure. No external ML, no embedding pipelines, no black-box scores.
# 1. Define similarity schema
# 2. Insert data (vectors auto-generated)
# 3. Query by example
Response includes per-field explainability:
┌──────┬─────────────────────────────┬───────┬──────────────────────────────────┐
│ Rank │ Product │ Score │ Contribution Breakdown │
├──────┼─────────────────────────────┼───────┼──────────────────────────────────┤
│ 1 │ Prosciutto di Parma DOP │ 0.71 │ name: 0.22, price: 0.22 │
│ 2 │ Prosciutto cotto │ 0.68 │ name: 0.25, category: 0.20 │
│ 3 │ Coppa di Parma │ 0.53 │ category: 0.20, price: 0.25 │
└──────┴─────────────────────────────┴───────┴──────────────────────────────────┘
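The breakdown above can be read as a weighted sum of per-field similarities. The following sketch shows one way such a score and its explanation could be computed; the `FieldSpec` struct, the token-set Jaccard measure for text, and the linear numeric falloff are assumptions chosen for illustration, not DistX's documented scoring functions.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical typed field values (text, number, categorical, boolean).
enum Value {
    Text(String),
    Number(f64),
    Categorical(String),
    Boolean(bool),
}

// Hypothetical schema entry: relative weight, plus a range used to scale numeric distance.
struct FieldSpec {
    weight: f64,
    scale: f64,
}

// Token-set Jaccard similarity on lowercase words, in [0, 1].
fn jaccard(x: &str, y: &str) -> f64 {
    let a: BTreeSet<String> = x.to_lowercase().split_whitespace().map(String::from).collect();
    let b: BTreeSet<String> = y.to_lowercase().split_whitespace().map(String::from).collect();
    let union = a.union(&b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(&b).count() as f64 / union as f64
}

// Per-field similarity in [0, 1]; mismatched types contribute nothing.
fn field_sim(a: &Value, b: &Value, scale: f64) -> f64 {
    match (a, b) {
        (Value::Text(x), Value::Text(y)) => jaccard(x, y),
        (Value::Number(x), Value::Number(y)) => (1.0 - (x - y).abs() / scale).max(0.0),
        (Value::Categorical(x), Value::Categorical(y)) => if x == y { 1.0 } else { 0.0 },
        (Value::Boolean(x), Value::Boolean(y)) => if x == y { 1.0 } else { 0.0 },
        _ => 0.0,
    }
}

// Weighted sum of per-field similarities. Weights are normalized to sum to 1, so the
// score lies in [0, 1] and is exactly the sum of the returned per-field contributions.
fn score(
    query: &BTreeMap<&str, Value>,
    item: &BTreeMap<&str, Value>,
    schema: &BTreeMap<&str, FieldSpec>,
) -> (f64, BTreeMap<String, f64>) {
    let total: f64 = schema.values().map(|f| f.weight).sum();
    let mut explain = BTreeMap::new();
    for (name, spec) in schema {
        // Fields missing from the query or the item contribute 0.
        if let (Some(q), Some(v)) = (query.get(name), item.get(name)) {
            explain.insert(name.to_string(), spec.weight / total * field_sim(q, v, spec.scale));
        }
    }
    let s: f64 = explain.values().sum();
    (s, explain)
}

// Helper that builds a hypothetical product payload.
fn product(name: &str, category: &str, price: f64) -> BTreeMap<&'static str, Value> {
    BTreeMap::from([
        ("name", Value::Text(name.to_string())),
        ("category", Value::Categorical(category.to_string())),
        ("price", Value::Number(price)),
    ])
}

fn main() {
    let mut schema = BTreeMap::new();
    schema.insert("name", FieldSpec { weight: 0.5, scale: 1.0 });
    schema.insert("category", FieldSpec { weight: 0.2, scale: 1.0 });
    schema.insert("price", FieldSpec { weight: 0.3, scale: 10.0 });

    let query = product("prosciutto crudo", "cured_meat", 8.0);
    let catalog = [
        ("Prosciutto di Parma DOP", product("Prosciutto di Parma DOP", "cured_meat", 9.5)),
        ("Coppa di Parma", product("Coppa di Parma", "cured_meat", 6.0)),
    ];
    for (label, item) in &catalog {
        let (s, explain) = score(&query, item, &schema);
        println!("{label}: {s:.2} {explain:?}");
    }
}
```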
Key Capabilities
| Capability | Description |
|---|---|
| Schema-Driven | Declarative field definitions with typed similarity (text, number, categorical, boolean) |
| Auto-Embedding | Deterministic vector generation from structured payloads |
| Query by Example | Natural JSON queries instead of raw vectors |
| Explainable Scoring | Per-field contribution breakdown for every result |
| Dynamic Weights | Override field importance at query time without re-indexing |
| Zero External Dependencies | Fully self-contained, works offline and air-gapped |
What You Can Do That Qdrant Cannot
Example 1: Change Similarity Semantics Without Re-embedding
# Same data, different meaning of "similar" — no re-indexing required
# Query 1: "Find similar products" (balanced)
# Query 2: "Find cheaper alternatives" (boost price)
# Query 3: "Find same brand, any price" (boost brand)
In Qdrant, you would need to re-embed everything or build complex client-side logic.
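A minimal sketch of why changing weights needs no re-indexing, assuming per-field similarities are combined with query-time weights: the stored `Product` records never change, only the `Weights` passed to `rank`. All names and numbers here are hypothetical.

```rust
// Hypothetical product record: stored once and never re-encoded when weights change.
struct Product {
    name: &'static str,
    brand: &'static str,
    price: f64,
}

// Query-time weights: the only input that differs between the queries below.
struct Weights {
    brand: f64,
    price: f64,
}

// Normalized weighted similarity between the example `q` and a stored product `p`.
fn score(q: &Product, p: &Product, w: &Weights) -> f64 {
    let brand_sim = if q.brand == p.brand { 1.0 } else { 0.0 };
    let price_sim = (1.0 - (q.price - p.price).abs() / q.price).max(0.0);
    (w.brand * brand_sim + w.price * price_sim) / (w.brand + w.price)
}

// Rank the catalog by descending score and return product names.
fn rank(q: &Product, catalog: &[Product], w: &Weights) -> Vec<&'static str> {
    let mut scored: Vec<(f64, &'static str)> =
        catalog.iter().map(|p| (score(q, p, w), p.name)).collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    let example = Product { name: "example", brand: "Rossi", price: 10.0 };
    let catalog = [
        Product { name: "Rossi Premium", brand: "Rossi", price: 18.0 },
        Product { name: "Bianchi Classic", brand: "Bianchi", price: 10.0 },
    ];
    // Same stored data, two meanings of "similar".
    let by_brand = rank(&example, &catalog, &Weights { brand: 0.8, price: 0.2 });
    let by_price = rank(&example, &catalog, &Weights { brand: 0.2, price: 0.8 });
    println!("brand-weighted: {by_brand:?}");
    println!("price-weighted: {by_price:?}");
}
```

With brand weighted 0.8, the same-brand item ranks first; with price weighted 0.8, the identically priced item from another brand overtakes it. Nothing stored was recomputed between the two calls.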
Example 2: Same Schema, Different Datasets
# One Similarity Contract works across domains:
# Products
# Suppliers
# Financial Assets
# Same schema, same queries, same explainability — across all datasets
This is not product-specific. The Similarity Contract is domain-agnostic.
📖 Documentation · Interactive Demo · Comparison with Alternatives
100% Qdrant API Compatible
DistX maintains full compatibility with the Qdrant API, so you can:
- ✅ Use existing Qdrant client libraries (Python, JavaScript, Rust, Go)
- ✅ Drop-in replace Qdrant in your stack
- ✅ Use Qdrant's Web Dashboard UI
- ✅ Migrate with zero code changes
The Similarity Engine is additive: all standard vector operations work exactly as they do in Qdrant:
# Standard Qdrant-compatible vector search still works!
Quick Start
1. Start DistX with Docker
# Pull and run (with persistent storage)
# Or with docker-compose
DistX is now running at:
- REST API: http://localhost:6333
- Web Dashboard: http://localhost:6333/dashboard
- gRPC: localhost:6334
2. Create a Collection with Similarity Schema
3. Insert Data (No Vectors Needed!)
4. Query by Example
# Find products similar to "prosciutto crudo around $8"
Response with explainable scores:
5. Dynamic Weight Overrides
# Find cheaper alternatives (boost price importance)
6. Query by Existing Point ID
# Find products similar to ID 4 (Parmigiano Reggiano)
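Querying by point ID can be understood as query by example where the example is an already-stored payload. A hedged sketch, assuming an in-memory `BTreeMap` of points and a fixed two-field similarity (both hypothetical): look up the payload, score every other point, and exclude the point itself.

```rust
use std::collections::BTreeMap;

// Hypothetical stored point payload.
struct Item {
    name: &'static str,
    category: &'static str,
    price: f64,
}

// Fixed-weight similarity: 0.6 for same category, 0.4 for relative price closeness.
fn similarity(a: &Item, b: &Item) -> f64 {
    let category = if a.category == b.category { 1.0 } else { 0.0 };
    let price = (1.0 - (a.price - b.price).abs() / a.price.max(b.price)).max(0.0);
    0.6 * category + 0.4 * price
}

// Query by point ID: fetch the stored payload, use it as the example, and exclude
// the point itself from its own results. Returns None if the ID does not exist.
fn similar_to_id(points: &BTreeMap<u64, Item>, id: u64, k: usize) -> Option<Vec<(u64, f64)>> {
    let example = points.get(&id)?;
    let mut hits: Vec<(u64, f64)> = points
        .iter()
        .filter(|(pid, _)| **pid != id)
        .map(|(pid, item)| (*pid, similarity(example, item)))
        .collect();
    // Highest score first; ties broken by ID so the order is deterministic.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    hits.truncate(k);
    Some(hits)
}

fn main() {
    let points = BTreeMap::from([
        (2, Item { name: "Prosciutto cotto", category: "cured_meat", price: 12.0 }),
        (3, Item { name: "Pecorino Romano", category: "cheese", price: 30.0 }),
        (4, Item { name: "Parmigiano Reggiano", category: "cheese", price: 12.0 }),
        (5, Item { name: "Grana Padano", category: "cheese", price: 10.0 }),
    ]);
    if let Some(hits) = similar_to_id(&points, 4, 3) {
        for (id, score) in hits {
            println!("{id} {} {score:.2}", points[&id].name);
        }
    }
}
```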
7. Run the Interactive Demo
# Full demo with sample data
# Or run specific demos
Alternative: Traditional Vector Search
DistX also supports standard Qdrant-compatible vector operations:
# Create collection with vectors
# Insert with vectors
# Vector search
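For the traditional path, a search with Cosine distance ranks stored vectors by cosine similarity to the query. The sketch below is an exact brute-force version for reference; an HNSW index (listed under Vector Database) approximates the same ranking without scanning every point. The function names are illustrative and not part of any DistX or Qdrant API.

```rust
// Cosine similarity between two equal-length vectors; 0.0 if either has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

// Exact top-k: score every stored vector, sort by descending similarity, keep k.
fn top_k(query: &[f32], points: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> =
        points.iter().map(|(id, v)| (*id, cosine(query, v))).collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let points: Vec<(u64, Vec<f32>)> = vec![
        (1, vec![1.0, 0.0]),
        (2, vec![0.0, 1.0]),
        (3, vec![1.0, 1.0]),
    ];
    for (id, score) in top_k(&[1.0, 0.0], &points, 2) {
        println!("id={id} score={score:.3}");
    }
}
```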
Installation Alternatives
# From crates.io
# From source
Performance
| Metric | Performance |
|---|---|
| Vector Insert | ~8,000 ops/sec |
| Vector Search | ~400-500 ops/sec |
| Search Latency (p50) | ~2ms |
| Search Latency (p99) | ~5ms |
| Similarity Query | <1ms overhead |
Benchmark setup: 5,000 vectors, 128 dimensions, Cosine distance.
All Features
Similarity Engine (NEW)
- Schema-driven similarity — Define what fields matter
- Auto-embedding — Vectors generated from payload
- Multi-type support — Text, number, categorical, boolean
- Explainable results — Per-field score breakdown
- Dynamic weights — Override at query time
Vector Database
- HNSW Index — Fast ANN with SIMD (AVX2, SSE, NEON)
- BM25 Text Search — Full-text ranking
- Payload Filtering — JSON metadata queries
- Dual API — REST + gRPC
- Persistence — WAL, snapshots, LMDB
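For reference, BM25 usually denotes the Okapi BM25 ranking function, shown below in its standard form. Which variant, IDF formula, and parameter defaults DistX actually uses is an assumption here, not taken from its source.

```latex
\operatorname{score}(D, Q) = \sum_{i=1}^{n} \operatorname{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\operatorname{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
```

Here $f(q_i, D)$ is the frequency of query term $q_i$ in document $D$, $|D|$ is the document length, $\mathrm{avgdl}$ is the average document length, $N$ is the number of documents, and $n(q_i)$ is the number of documents containing $q_i$. Common defaults are $k_1$ between 1.2 and 2.0 and $b = 0.75$.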
Operations
- Single Binary — ~6MB, no dependencies
- Docker Ready — Single command deployment
- Web Dashboard — Qdrant-compatible UI
Documentation
| Guide | Description |
|---|---|
| Similarity Engine | Schema-driven similarity for tabular data |
| Similarity Demo | Interactive walkthrough with examples |
| Comparison | DistX vs Qdrant, Pinecone, Elasticsearch |
| Quick Start | Get started in 5 minutes |
| Docker Guide | Container deployment |
| API Reference | REST and gRPC endpoints |
| Architecture | System design |
Use Cases
🛒 E-Commerce & Retail
Problem: "Show me products similar to this one" — but similarity means different things (style, price, brand).
# Similar products for "customers also viewed"
# Budget alternatives (boost price importance)
Use cases:
- "Similar products" on product pages
- "You might also like" recommendations
- Competitor price matching (find similar products, compare prices)
- Inventory substitution (out of stock → suggest alternatives)
🏭 ERP & Supply Chain
Problem: Find the best supplier match based on multiple criteria without building ML pipelines.
# Find suppliers similar to your top performer
Use cases:
- Supplier discovery and matching
- Vendor risk assessment (find vendors similar to flagged ones)
- Partner recommendations
- RFQ (Request for Quote) matching
👥 CRM & Customer Data
Problem: Find similar customers for segmentation, lead scoring, or churn prediction.
# Find customers similar to your best ones
# Find leads similar to closed-won deals
Use cases:
- Lead scoring (similar to converted leads?)
- Customer segmentation
- Churn prediction (similar to churned customers?)
- Account-based marketing (find lookalike companies)
🔍 Data Quality & Deduplication
Problem: Find duplicate or near-duplicate records without exact matching.
# Find potential duplicates
# Response shows WHY records might be duplicates
# → name: 0.35 (similar names)
# → company: 0.25 (same company)
# → email: 0.15 (different email domain)
Use cases:
- Contact/account deduplication
- Data cleansing before migration
- Master data management
- Merge candidate identification
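The deduplication flow above amounts to scoring candidate pairs and keeping those above a threshold, with the per-field breakdown showing why each pair matched. A minimal sketch follows; the `Contact` fields, the weights (name 0.5, company 0.3, email domain 0.2), and the 0.6 threshold are all hypothetical.

```rust
use std::collections::BTreeSet;

// Hypothetical contact record.
struct Contact {
    id: u32,
    name: &'static str,
    company: &'static str,
    email: &'static str,
}

fn tokens(s: &str) -> BTreeSet<String> {
    s.to_lowercase().split_whitespace().map(String::from).collect()
}

// Token-set Jaccard, so "Mario Rossi" and "Rossi Mario" count as identical names.
fn jaccard(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 { 0.0 } else { ta.intersection(&tb).count() as f64 / union as f64 }
}

fn indicator(same: bool) -> f64 {
    if same { 1.0 } else { 0.0 }
}

// Text after the last '@' (the whole string if there is none).
fn domain(email: &str) -> &str {
    email.rsplit('@').next().unwrap_or("")
}

// A candidate pair: both IDs, the total score, and the per-field breakdown.
type Candidate = (u32, u32, f64, Vec<(&'static str, f64)>);

// Compare every pair once and keep those scoring at or above `threshold`.
fn duplicate_candidates(records: &[Contact], threshold: f64) -> Vec<Candidate> {
    let mut out = Vec::new();
    for (i, a) in records.iter().enumerate() {
        for b in &records[i + 1..] {
            let explain = vec![
                ("name", 0.5 * jaccard(a.name, b.name)),
                ("company", 0.3 * indicator(a.company.eq_ignore_ascii_case(b.company))),
                ("email_domain", 0.2 * indicator(domain(a.email).eq_ignore_ascii_case(domain(b.email)))),
            ];
            let score: f64 = explain.iter().map(|(_, c)| c).sum();
            if score >= threshold {
                out.push((a.id, b.id, score, explain));
            }
        }
    }
    out
}

fn main() {
    let records = [
        Contact { id: 1, name: "Mario Rossi", company: "Acme SpA", email: "mario.rossi@acme.it" },
        Contact { id: 2, name: "Rossi Mario", company: "ACME SpA", email: "m.rossi@gmail.com" },
        Contact { id: 3, name: "Giulia Bianchi", company: "Beta Srl", email: "giulia@beta.it" },
    ];
    for (a, b, score, explain) in duplicate_candidates(&records, 0.6) {
        println!("{a} ~ {b}: {score:.2} {explain:?}");
    }
}
```

Comparing all pairs is quadratic in the number of records; on large datasets a common approach is to first narrow each record's candidates with a nearest-neighbour query and only then apply the pairwise breakdown.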
📊 Data Analysis & Exploration
Problem: Explore datasets by finding similar records without writing complex SQL.
# "Find transactions similar to this suspicious one"
# "Find properties similar to this sold one"
Use cases:
- Fraud pattern detection
- Anomaly investigation
- Comparable analysis (real estate, finance)
- Research dataset exploration
⚖️ Regulated Industries (Finance, Healthcare, Legal)
Problem: Similarity search must be fully auditable, which rules out black-box ML.
Why DistX:
- Explainable scores — Per-field contribution breakdown
- Deterministic — Same query always returns same explanation
- Auditable — Schema defines what matters, weights are transparent
- No external APIs — Data never leaves your infrastructure
# Healthcare: Find similar patient cases
# Response includes full explanation for audit trail:
# {
# "score": 0.78,
# "explain": {
# "diagnosis_code": 0.35, ← Same diagnosis
# "age_group": 0.25, ← Same age bracket
# "comorbidities": 0.18 ← Similar complexity
# }
# }
Use cases:
- Clinical trial patient matching
- Insurance claim similarity
- Legal case precedent search
- Compliance reporting
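Reproducible audit trails also depend on how explanations are serialized. In Rust, iterating a `std::collections::HashMap` follows a randomly seeded hash order, so the same explanation could be written with its fields in a different order on each run, while a `BTreeMap` always iterates in key order. A small sketch, where `audit_line` and the ID strings are hypothetical and the field values come from the patient-case example above:

```rust
use std::collections::BTreeMap;

// Render one result as a stable audit line. BTreeMap iterates in key order, so the same
// explanation always produces the same text regardless of insertion order.
fn audit_line(query_id: &str, result_id: &str, score: f64, explain: &BTreeMap<&str, f64>) -> String {
    let fields: Vec<String> = explain.iter().map(|(k, v)| format!("{k}={v:.2}")).collect();
    format!("query={query_id} result={result_id} score={score:.2} [{}]", fields.join(", "))
}

fn main() {
    // Per-field contributions from the example response; IDs are hypothetical.
    let explain = BTreeMap::from([
        ("diagnosis_code", 0.35),
        ("age_group", 0.25),
        ("comorbidities", 0.18),
    ]);
    println!("{}", audit_line("q-001", "case-17", 0.78, &explain));
}
```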
🏠 Real Estate & Property
# Find comparable properties for valuation
Use cases:
- Comparable property analysis (comps)
- Property valuation
- Investment opportunity matching
- Tenant-property matching
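Comparable-property valuation can build directly on similarity scores. The following sketch estimates a value as the similarity-weighted mean sale price of the top-k most similar sold properties, a simple k-nearest-neighbour weighting rather than a DistX feature; the `Property` fields, weights, and prices are hypothetical.

```rust
// Hypothetical property features used to find comparables ("comps").
struct Property {
    sqm: f64,
    bedrooms: u32,
    zone: &'static str,
}

// Weighted similarity: 0.5 size closeness, 0.2 same bedroom count, 0.3 same zone.
fn similarity(subject: &Property, other: &Property) -> f64 {
    let size = (1.0 - (subject.sqm - other.sqm).abs() / subject.sqm).max(0.0);
    let beds = if subject.bedrooms == other.bedrooms { 1.0 } else { 0.0 };
    let zone = if subject.zone == other.zone { 1.0 } else { 0.0 };
    0.5 * size + 0.2 * beds + 0.3 * zone
}

// Similarity-weighted mean sale price of the k most similar comps.
// Returns None when no comp has positive similarity.
fn estimate_value(subject: &Property, sold: &[(Property, f64)], k: usize) -> Option<f64> {
    let mut scored: Vec<(f64, f64)> = sold
        .iter()
        .map(|(p, price)| (similarity(subject, p), *price))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    let top = &scored[..k.min(scored.len())];
    let total_weight: f64 = top.iter().map(|(s, _)| s).sum();
    if total_weight <= 0.0 {
        return None;
    }
    Some(top.iter().map(|(s, price)| s * price).sum::<f64>() / total_weight)
}

fn main() {
    let subject = Property { sqm: 100.0, bedrooms: 3, zone: "centro" };
    let sold = [
        (Property { sqm: 100.0, bedrooms: 3, zone: "centro" }, 300_000.0),
        (Property { sqm: 50.0, bedrooms: 2, zone: "periferia" }, 100_000.0),
        (Property { sqm: 200.0, bedrooms: 3, zone: "centro" }, 500_000.0),
    ];
    if let Some(value) = estimate_value(&subject, &sold, 3) {
        println!("estimated value: {value:.0}");
    }
}
```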
🎯 HR & Recruiting
# Find candidates similar to your top performers
Use cases:
- Candidate matching to job requirements
- Internal mobility (find similar roles)
- Team composition analysis
- Succession planning
Use as a Library
[dependencies]
= "0.2.5"
= "0.2.5" # Similarity Engine
= "0.2.5" # Core data structures
use ;
use std::collections::HashMap;
// Define schema
let mut fields = HashMap::new();
fields.insert;
fields.insert;
fields.insert;
let schema = new;
let embedder = new;
// Auto-generate vector from payload
let payload = json!;
let vector = embedder.embed;
Links
- Crates.io: https://crates.io/crates/distx
- Documentation: https://docs.rs/distx
- GitHub: https://github.com/antonellof/distx
License
Licensed under MIT OR Apache-2.0 at your option.