vecstore 1.0.0

The perfect vector database - 100/100 score, embeddable, high-performance, production-ready with RAG toolkit
# VecStore Migration Guide

Complete guide for migrating from Pinecone, Qdrant, Weaviate, and other vector databases to VecStore.

## Table of Contents

- [Overview](#overview)
- [Migration from Pinecone](#migration-from-pinecone)
- [Migration from Qdrant](#migration-from-qdrant)
- [Migration from Weaviate](#migration-from-weaviate)
- [Feature Comparison](#feature-comparison)
- [Code Migration Examples](#code-migration-examples)
- [Performance Optimization](#performance-optimization)
- [Troubleshooting](#troubleshooting)

---

## Overview

VecStore provides automated migration tools and code equivalents for popular vector databases. This guide helps you:

1. **Export data** from your current database
2. **Import data** into VecStore using bulk migration tools
3. **Update application code** to use the VecStore API
4. **Optimize performance** for your use case

### Why Migrate to VecStore?

- **Embedded**: No external server required, runs in-process
- **Zero Cost**: No API fees, completely self-hosted
- **SQLite-like**: Simple file-based storage, easy backup/restore
- **Production Ready**: HNSW indexing, metadata filtering, persistence
- **Full-Featured**: Hybrid search, quantization, clustering, versioning

---

## Migration from Pinecone

### 1. Export Data from Pinecone

```python
# Export from Pinecone to JSON
# Uses the current Pinecone Python client (v3+); index.list() is only
# available on serverless indexes.
import json

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("your-index")

# Fetch all vectors, one page of IDs at a time
vectors = []
for ids in index.list():  # Each iteration yields a list of IDs
    fetch_response = index.fetch(ids=ids)
    vectors.extend(fetch_response.vectors.items())

# Save to JSON file
with open('pinecone_export.json', 'w') as f:
    json.dump({
        "vectors": [
            {
                "id": vec_id,
                "values": list(vec.values),
                "metadata": vec.metadata or {}
            }
            for vec_id, vec in vectors
        ]
    }, f)
```
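
Before importing, it can help to sanity-check the export file. The sketch below is a hypothetical helper (`check_export` and `check_export_file` are not part of Pinecone or VecStore). It assumes the `{"vectors": [...]}` layout written by the script above and reports duplicate IDs and vectors whose dimension differs from the most common one.

```python
import json
from collections import Counter


def check_export(records):
    """Return a list of problems found in a list of exported vector records."""
    problems = []

    # Every ID should appear exactly once.
    id_counts = Counter(r["id"] for r in records)
    for vec_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id {vec_id!r} ({count} times)")

    # All vectors should share one dimension; treat the most common as expected.
    dims = Counter(len(r["values"]) for r in records)
    if len(dims) > 1:
        expected = dims.most_common(1)[0][0]
        for r in records:
            if len(r["values"]) != expected:
                problems.append(
                    f"id {r['id']!r} has dimension {len(r['values'])}, expected {expected}"
                )
    return problems


def check_export_file(path):
    """Load an export written as {"vectors": [...]} and check it."""
    with open(path) as f:
        return check_export(json.load(f)["vectors"])
```

Calling `check_export_file("pinecone_export.json")` returns an empty list when nothing suspicious was found.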

### 2. Import into VecStore

```rust
use vecstore::bulk_migration::{PineconeMigration, MigrationConfig};
use vecstore::VecStore;

fn main() -> anyhow::Result<()> {
    let mut store = VecStore::open("my_vectors.db")?;

    let config = MigrationConfig {
        batch_size: 1000,
        validate: true,
        resume_from: None,
    };

    let migration = PineconeMigration::new(config)
        .with_progress(|current, total| {
            println!("Migrated {}/{} vectors", current, total);
        });

    let stats = migration.import_from_file("pinecone_export.json", &mut store)?;

    println!("Migration complete!");
    println!("  Vectors: {}", stats.total_vectors);
    println!("  Duration: {:?}", stats.duration);
    println!("  Throughput: {:.0} vectors/sec", stats.throughput);

    Ok(())
}
```

### 3. Code Migration

#### Pinecone Code:

```python
from pinecone import Pinecone

# Initialize (current Pinecone client; older releases used pinecone.init)
pc = Pinecone(api_key="...")
index = pc.Index("my-index")

# Upsert
index.upsert(vectors=[
    ("id1", [0.1, 0.2, 0.3], {"category": "tech"}),
    ("id2", [0.4, 0.5, 0.6], {"category": "science"})
])

# Query
results = index.query(
    vector=[0.15, 0.25, 0.35],
    top_k=10,
    filter={"category": "tech"}
)

for match in results['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}")
```

#### VecStore Equivalent:

```rust
use vecstore::{VecStore, Query};
use serde_json::json;

fn main() -> anyhow::Result<()> {
    // Initialize
    let mut store = VecStore::open("my-index.db")?;

    // Upsert
    store.upsert("id1", vec![0.1, 0.2, 0.3],
                 json!({"category": "tech"}))?;
    store.upsert("id2", vec![0.4, 0.5, 0.6],
                 json!({"category": "science"}))?;

    // Query
    let query = Query::new(vec![0.15, 0.25, 0.35])
        .with_limit(10)
        .with_filter("category = 'tech'");

    let results = store.query(query)?;

    for result in results {
        println!("ID: {}, Score: {:.4}", result.id, result.score);
    }

    Ok(())
}
```

### Feature Mapping: Pinecone → VecStore

| Pinecone Feature | VecStore Equivalent |
|-----------------|---------------------|
| `index.upsert()` | `store.upsert()` |
| `index.query()` | `store.query()` |
| `index.delete()` | `store.delete()` |
| `index.fetch()` | Use query with ID filter |
| `index.describe_index_stats()` | `store.len()`, `store.stats()` |
| Metadata filtering | `Query::with_filter()` |
| Sparse-dense hybrid | `HybridQuery` with BM25 |
| Namespaces | `PartitionedStore` |
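
Existing Pinecone filters are dictionaries, while the VecStore examples in this guide use a filter string such as `category = 'tech'`. The following is a minimal sketch of translating the simple equality case. `pinecone_filter_to_vecstore` is a hypothetical helper. Joining several conditions with `AND` is an assumption about the filter syntax, and values containing single quotes are rejected rather than guessing an escaping rule.

```python
def pinecone_filter_to_vecstore(flt):
    """Translate a flat Pinecone equality filter into a VecStore filter string.

    Only {"field": value} and {"field": {"$eq": value}} are handled.
    """
    clauses = []
    for field, cond in flt.items():
        # Unwrap the explicit {"$eq": value} form.
        if isinstance(cond, dict):
            if set(cond) != {"$eq"}:
                raise ValueError(f"unsupported operator(s) for {field!r}: {sorted(cond)}")
            cond = cond["$eq"]

        if isinstance(cond, bool):
            raise ValueError("boolean literals are not covered by this sketch")
        if isinstance(cond, str):
            if "'" in cond:
                # Quote escaping in VecStore filters is not documented in this guide.
                raise ValueError(f"cannot safely quote value for {field!r}")
            clauses.append(f"{field} = '{cond}'")
        elif isinstance(cond, (int, float)):
            clauses.append(f"{field} = {cond}")
        else:
            raise ValueError(f"unsupported value type for {field!r}")

    # Assumption: clauses can be combined with AND.
    return " AND ".join(clauses)
```

Anything beyond flat equality (for example `$in` or `$gt`) raises a `ValueError`, so unsupported filters are caught during migration instead of being silently mistranslated.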

---

## Migration from Qdrant

### 1. Export Data from Qdrant

```python
from qdrant_client import QdrantClient
import json

client = QdrantClient(host="localhost", port=6333)
collection_name = "my_collection"

# Get all points
offset = None
all_points = []

while True:
    points = client.scroll(
        collection_name=collection_name,
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=True
    )

    all_points.extend(points[0])

    if points[1] is None:
        break
    offset = points[1]

# Save to JSONL
with open('qdrant_export.jsonl', 'w') as f:
    for point in all_points:
        f.write(json.dumps({
            "id": str(point.id),
            "vector": point.vector,
            "payload": point.payload
        }) + '\n')
```

### 2. Import into VecStore

```rust
use vecstore::bulk_migration::{QdrantMigration, MigrationConfig};
use vecstore::VecStore;

fn main() -> anyhow::Result<()> {
    let mut store = VecStore::open("my_vectors.db")?;

    let config = MigrationConfig {
        batch_size: 1000,
        validate: true,
        resume_from: None,
    };

    let migration = QdrantMigration::new(config);
    let stats = migration.import_from_file("qdrant_export.jsonl", &mut store)?;

    println!("Migrated {} vectors", stats.total_vectors);

    Ok(())
}
```

### 3. Code Migration

#### Qdrant Code:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition

client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="my_collection",
    vectors_config=VectorParams(size=128, distance=Distance.COSINE)
)

# Upsert points
client.upsert(
    collection_name="my_collection",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, ...], payload={"city": "NYC"}),
        PointStruct(id=2, vector=[0.3, 0.4, ...], payload={"city": "SF"})
    ]
)

# Search
results = client.search(
    collection_name="my_collection",
    query_vector=[0.2, 0.3, ...],
    limit=10,
    query_filter=Filter(
        must=[FieldCondition(key="city", match={"value": "NYC"})]
    )
)
```

#### VecStore Equivalent:

```rust
use vecstore::{VecStore, Query, Config, Distance};
use serde_json::json;

fn main() -> anyhow::Result<()> {
    // Create store with config
    let config = Config {
        distance: Distance::Cosine,
        ..Default::default()
    };
    let mut store = VecStore::with_config("my_collection.db", config)?;

    // Upsert points
    store.upsert("1", vec![0.1, 0.2, /* ... */],
                 json!({"city": "NYC"}))?;
    store.upsert("2", vec![0.3, 0.4, /* ... */],
                 json!({"city": "SF"}))?;

    // Search
    let query = Query::new(vec![0.2, 0.3, /* ... */])
        .with_limit(10)
        .with_filter("city = 'NYC'");

    let results = store.query(query)?;

    for result in results {
        println!("ID: {}, Score: {:.4}", result.id, result.score);
    }

    Ok(())
}
```

### Feature Mapping: Qdrant → VecStore

| Qdrant Feature | VecStore Equivalent |
|----------------|---------------------|
| `client.upsert()` | `store.upsert()` |
| `client.search()` | `store.query()` |
| `client.delete()` | `store.delete()` |
| `client.retrieve()` | Query with ID filter |
| `client.scroll()` | Iterate over `store.query()` |
| Payload filtering | `Query::with_filter()` |
| Named vectors | Multiple `VecStore` instances |
| Snapshots | Built-in with `VersionedStore` |

---

## Migration from Weaviate

### 1. Export Data from Weaviate

```python
import weaviate
import json

client = weaviate.Client("http://localhost:8080")

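# Query objects. Add the property names you want to export to this list;
# with_limit caps the result, so larger classes need pagination.
result = client.query.get("MyClass", ["_additional { id vector }"]).with_limit(10000).do()

# Save to JSON
vectors = []
for obj in result['data']['Get']['MyClass']:
    vectors.append({
        "id": obj['_additional']['id'],
        "vector": obj['_additional']['vector'],
        # Keep only the object's own properties, not the _additional block
        "properties": {k: v for k, v in obj.items() if k != '_additional'}
    })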

with open('weaviate_export.json', 'w') as f:
    json.dump({"vectors": vectors}, f)
```
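
This guide shows bulk importers for Pinecone and Qdrant but no Weaviate-specific import step. One possible route, sketched here as an assumption, is to reshape the Weaviate export into the same JSONL layout (`id`, `vector`, `payload`) that the Qdrant export script earlier in this guide produces. The function names are hypothetical, and whether the Qdrant importer accepts the result has not been verified here.

```python
import json


def weaviate_to_jsonl_lines(export):
    """Yield one JSON line per vector in the id/vector/payload layout."""
    for item in export["vectors"]:
        # Drop Weaviate's bookkeeping block if it was copied into properties.
        payload = {
            k: v
            for k, v in item.get("properties", {}).items()
            if k != "_additional"
        }
        yield json.dumps({
            "id": str(item["id"]),
            "vector": item["vector"],
            "payload": payload,
        })


def convert_file(src_path, dst_path):
    """Read the JSON export and write one record per line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in weaviate_to_jsonl_lines(json.load(src)):
            dst.write(line + "\n")
```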

### 2. Code Migration

#### Weaviate Code:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# Add objects
client.batch.add_data_object(
    data_object={"title": "Document 1", "content": "..."},
    class_name="Document",
    vector=[0.1, 0.2, 0.3, ...]
)

# Search
result = client.query.get(
    "Document",
    ["title", "content"]
).with_near_vector({
    "vector": [0.15, 0.25, 0.35, ...]
}).with_limit(10).with_where({
    "path": ["category"],
    "operator": "Equal",
    "valueString": "tech"
}).do()
```

#### VecStore Equivalent:

```rust
use vecstore::{VecStore, Query};
use serde_json::json;

fn main() -> anyhow::Result<()> {
    let mut store = VecStore::open("documents.db")?;

    // Add objects
    store.upsert("doc1", vec![0.1, 0.2, 0.3, /* ... */],
                 json!({
                     "title": "Document 1",
                     "content": "...",
                     "category": "tech"
                 }))?;

    // Search
    let query = Query::new(vec![0.15, 0.25, 0.35, /* ... */])
        .with_limit(10)
        .with_filter("category = 'tech'");

    let results = store.query(query)?;

    for result in results {
        if let Some(title) = result.metadata.fields.get("title") {
            println!("Title: {}", title);
        }
    }

    Ok(())
}
```

---

## Feature Comparison

### Core Features

| Feature | Pinecone | Qdrant | Weaviate | VecStore |
|---------|----------|---------|----------|----------|
| HNSW Indexing |||||
| Metadata Filtering |||||
| Hybrid Search |||||
| Product Quantization |||||
| Persistence | ☁️  Cloud ||||
| Embedded Mode |||||
| Multi-tenancy | ✅ Namespaces | ✅ Collections | ✅ Tenants | ✅ Partitions |

### Advanced Features

| Feature | VecStore Implementation |
|---------|------------------------|
| Clustering | `KMeansClustering`, `DBSCAN`, `Hierarchical` |
| Anomaly Detection | `IsolationForest`, `LOF`, `ZScoreDetector` |
| Dimensionality Reduction | `PCA` with 8x compression |
| Recommendations | Content-based, Collaborative, Hybrid |
| Versioning | Full version history with rollback |
| Query Optimization | Cost estimation and hints |
| Bulk Migration | From Pinecone, Qdrant, ChromaDB |

---

## Performance Optimization

### After Migration

1. **Add Metadata Indexes** for frequently filtered fields:

```rust
// IndexType is assumed to be exported from the crate root alongside the manager
use vecstore::{IndexType, MetadataIndexManager};

let mut index_manager = MetadataIndexManager::new();

// Create BTree index for range queries
index_manager.create_index("price", IndexType::BTree);

// Create Hash index for equality queries
index_manager.create_index("category", IndexType::Hash);

// Create Inverted index for text search
index_manager.create_index("tags", IndexType::Inverted);
```

2. **Use Query Optimizer** to analyze performance:

```rust
use vecstore::QueryOptimizer;

let optimizer = QueryOptimizer::new(&store);
let analysis = optimizer.analyze_query(&query)?;

println!("Estimated cost: {:.2}ms", analysis.estimated_cost);

for hint in analysis.hints {
    println!("Hint: {}", hint.suggestion);
}
```

3. **Enable Product Quantization** for compression:

```rust
use vecstore::{PQConfig, PQVectorStore};

let pq_config = PQConfig {
    num_subvectors: 16,
    num_centroids: 256,
    training_iterations: 20,
};

// 8-32x memory reduction
let pq_store = PQVectorStore::new(pq_config)?;
```

4. **Use Partitioning** for multi-tenant applications:

```rust
// PartitionConfig is assumed to be exported from the crate root
use vecstore::{PartitionConfig, PartitionedStore};

let mut store = PartitionedStore::new("data", PartitionConfig::default())?;

// Each tenant gets isolated storage
store.insert("tenant_a", "doc1", vector, metadata)?;
store.insert("tenant_b", "doc2", vector, metadata)?;

// Fast tenant-specific queries
let results = store.query_partition("tenant_a", query)?;
```

---

## Troubleshooting

### Common Issues

#### 1. **Performance slower than expected**

**Solution:**
- Add metadata indexes for filtered fields
- Use Product Quantization for large datasets (>100K vectors)
- Enable partitioning for multi-tenant scenarios
- Run query optimizer to get specific hints

#### 2. **Out of memory with large datasets**

**Solution:**
- Enable Product Quantization (8-32x memory reduction)
- Use dimensionality reduction (PCA) to reduce vector dimensions
- Partition data into smaller chunks
- Use mmap-based storage for very large datasets
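
The 8-32x figure can be checked with a little arithmetic. The sketch below assumes standard product quantization: vectors are stored as 32-bit floats, and each subvector is replaced by a centroid index (one byte when `num_centroids` is 256). Whether VecStore's on-disk layout matches this exactly is an assumption, and `pq_memory_estimate` is a hypothetical helper.

```python
import math


def pq_memory_estimate(num_vectors, dim, num_subvectors, num_centroids=256):
    """Estimate raw vs. product-quantized storage in bytes."""
    if dim % num_subvectors != 0:
        raise ValueError("dim must be divisible by num_subvectors")

    raw_bytes = num_vectors * dim * 4  # one f32 per dimension

    # Each subvector is replaced by a centroid index.
    bits_per_code = math.ceil(math.log2(num_centroids))
    bytes_per_code = math.ceil(bits_per_code / 8)
    code_bytes = num_vectors * num_subvectors * bytes_per_code

    # Codebook: num_centroids centroids per subspace, each dim / num_subvectors
    # floats long, which adds up to num_centroids * dim floats overall.
    codebook_bytes = num_centroids * dim * 4

    return {
        "raw_bytes": raw_bytes,
        "pq_bytes": code_bytes + codebook_bytes,
        "code_ratio": raw_bytes / code_bytes,
    }
```

Under these assumptions, 128-dimensional vectors give a 32x reduction with 16 subvectors and an 8x reduction with 64 subvectors, which matches the range quoted above. The codebook adds a fixed overhead that is small next to the per-vector codes for large datasets.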

#### 3. **Filter queries are slow**

**Solution:**
- Create appropriate indexes:
  - BTree for range queries (age > 18, price < 100)
  - Hash for equality queries (category = 'tech')
  - Inverted for text search (tags contain 'machine learning')

```rust
let mut index_mgr = MetadataIndexManager::new();
index_mgr.create_index("category", IndexType::Hash);
```

#### 4. **Migration takes too long**

**Solution:**
- Increase batch size (default: 100, try 1000-10000)
- Disable validation for trusted data
- Use resume capability for interrupted migrations

```rust
let config = MigrationConfig {
    batch_size: 10000,  // Larger batches
    validate: false,     // Skip validation
    resume_from: Some(50000),  // Resume from vector 50000
};
```
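
To make the effect of `batch_size` and `resume_from` concrete, here is a small illustration of batching with a resume offset. It is not VecStore's implementation. `iter_batches` is a hypothetical helper, and it assumes `resume_from` counts records that were already imported.

```python
from itertools import islice


def iter_batches(records, batch_size, resume_from=None):
    """Yield (start_index, batch) pairs, skipping records before resume_from."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    start = resume_from or 0
    it = islice(iter(records), start, None)  # skip already-migrated records

    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield start, batch
        start += len(batch)
```

Logging the `start_index` of the last batch that completed gives a value to pass as `resume_from` if the run is interrupted.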

#### 5. **Need to migrate incremental updates**

**Solution:**
- Use versioning to track changes
- Export only new/modified vectors
- Use `upsert` for incremental updates

```rust
use vecstore::VersionedStore;

let mut store = VersionedStore::new("vectors.db")?;

// Tracks all changes automatically
store.update("doc1", new_vector, metadata,
             Some("Updated from Pinecone".to_string()))?;
```
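
Exporting only new or modified vectors needs some way to tell what changed between two exports. One approach, sketched below with hypothetical helper names, is to fingerprint each record and compare snapshots. It assumes the record layout of the Pinecone export (`id`, `values`, `metadata`).

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's vector and metadata."""
    payload = json.dumps(
        {"values": record["values"], "metadata": record.get("metadata", {})},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_exports(old_records, new_records):
    """Return (to_upsert, deleted_ids) between two export snapshots."""
    old = {r["id"]: fingerprint(r) for r in old_records}

    # New IDs have no old fingerprint; modified records have a different one.
    to_upsert = [r for r in new_records if old.get(r["id"]) != fingerprint(r)]

    new_ids = {r["id"] for r in new_records}
    deleted_ids = sorted(set(old) - new_ids)
    return to_upsert, deleted_ids
```

The `to_upsert` records can then be written to a smaller export file, and `deleted_ids` handled separately.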

---

## Getting Help

- **Documentation**: https://docs.rs/vecstore
- **Examples**: https://github.com/yourusername/vecstore/tree/main/examples
- **Issues**: https://github.com/yourusername/vecstore/issues

## Next Steps

After migration:

1. ✅ Verify all data migrated correctly
2. ✅ Update application code to use the VecStore API
3. ✅ Add metadata indexes for filtered fields
4. ✅ Run query optimizer to identify bottlenecks
5. ✅ Consider enabling Product Quantization for large datasets
6. ✅ Set up regular backups (simple file copy!)
7. ✅ Monitor performance with built-in metrics
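
For the verification step, one option is to compare a few query results from the migrated store against an exact brute-force search over the export file. The sketch below uses hypothetical helper names. It assumes cosine distance and the `values` key used in the Pinecone export. Because HNSW is an approximate index, recall slightly below 1.0 is not necessarily a sign of a failed migration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(records, query, k):
    """Return the k record IDs most similar to query, best first."""
    scored = [(cosine_similarity(r["values"], query), r["id"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vec_id for _, vec_id in scored[:k]]


def recall_at_k(expected_ids, actual_ids):
    """Fraction of the exact top-k that the migrated store also returned."""
    if not expected_ids:
        return 1.0
    return len(set(expected_ids) & set(actual_ids)) / len(expected_ids)
```

Pass the IDs from `exact_top_k` as `expected_ids` and the IDs returned by the migrated store as `actual_ids`.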

Welcome to VecStore! 🎉