ruvector-postgres 2.0.5

High-performance PostgreSQL vector database extension v2 - pgvector drop-in replacement with 230+ SQL functions, SIMD acceleration, Flash Attention, GNN layers, hybrid search, multi-tenancy, self-healing, and self-learning capabilities
# IVFFlat Index - Quick Reference

## Installation

```sql
-- 1. Load extension
CREATE EXTENSION ruvector;

-- 2. Create access method (run once)
\i sql/ivfflat_am.sql

-- 3. Verify
SELECT * FROM pg_am WHERE amname = 'ruivfflat';
```

## Create Index

```sql
-- Small dataset (< 10K vectors)
CREATE INDEX idx_name ON table_name
USING ruivfflat (embedding vector_l2_ops)
WITH (lists = 50);

-- Medium dataset (10K-100K vectors)
CREATE INDEX idx_name ON table_name
USING ruivfflat (embedding vector_l2_ops)
WITH (lists = 100);

-- Large dataset (> 100K vectors)
CREATE INDEX idx_name ON table_name
USING ruivfflat (embedding vector_l2_ops)
WITH (lists = 500);
```

## Distance Metrics

```sql
-- Euclidean (L2)
CREATE INDEX ON table USING ruivfflat (embedding vector_l2_ops);
SELECT * FROM table ORDER BY embedding <-> '[...]' LIMIT 10;

-- Cosine
CREATE INDEX ON table USING ruivfflat (embedding vector_cosine_ops);
SELECT * FROM table ORDER BY embedding <=> '[...]' LIMIT 10;

-- Inner Product
CREATE INDEX ON table USING ruivfflat (embedding vector_ip_ops);
SELECT * FROM table ORDER BY embedding <#> '[...]' LIMIT 10;
```

## Performance Tuning

```sql
-- Fast (70% recall)
SET ruvector.ivfflat_probes = 1;

-- Balanced (85% recall)
SET ruvector.ivfflat_probes = 5;

-- Accurate (95% recall)
SET ruvector.ivfflat_probes = 10;

-- Very accurate (98% recall)
SET ruvector.ivfflat_probes = 20;
```

## Common Operations

```sql
-- Get index stats
SELECT * FROM ruvector_ivfflat_stats('idx_name');

-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_name'));

-- Rebuild index
REINDEX INDEX idx_name;

-- Drop index
DROP INDEX idx_name;
```

## File Structure

```
Implementation Files (2,106 lines total):
├── src/index/ivfflat_am.rs (673 lines)      - Access method callbacks
├── src/index/ivfflat_storage.rs (347 lines) - Storage management
├── sql/ivfflat_am.sql (61 lines)            - SQL installation
├── docs/ivfflat_access_method.md (304 lines)- Architecture docs
├── examples/ivfflat_usage.md (472 lines)    - Usage examples
└── tests/ivfflat_am_test.sql (249 lines)    - Test suite
```

## Key Implementation Features

✅ **PostgreSQL Access Method**: Full IndexAmRoutine with all callbacks
✅ **Storage Layout**: Page 0 (metadata), 1-N (centroids), N+1-M (lists)
✅ **K-means Clustering**: K-means++ init + Lloyd's algorithm
✅ **Search Algorithm**: Probe nearest centroids, re-rank candidates
✅ **Zero-Copy**: Direct heap tuple access
✅ **GUC Variables**: Configurable via ruvector.ivfflat_probes
✅ **Multiple Metrics**: L2, Cosine, Inner Product, Manhattan

## Performance Guidelines

| Dataset Size | Lists | Probes | Expected QPS | Recall |
|--------------|-------|--------|--------------|--------|
| 10K          | 50    | 5      | 1000         | 85%    |
| 100K         | 100   | 10     | 500          | 92%    |
| 1M           | 500   | 10     | 250          | 95%    |
| 10M          | 1000  | 10     | 125          | 95%    |

## Troubleshooting

**Slow queries?**
```sql
SET ruvector.ivfflat_probes = 1;  -- Reduce probes
```

**Low recall?**
```sql
SET ruvector.ivfflat_probes = 20;  -- Increase probes
-- OR
CREATE INDEX ... WITH (lists = 1000);  -- More lists
```

**Index build fails?**
```sql
-- Reduce lists if memory constrained
CREATE INDEX ... WITH (lists = 50);
```

## Documentation

- **Architecture**: `docs/ivfflat_access_method.md`
- **Usage Examples**: `examples/ivfflat_usage.md`
- **Test Suite**: `tests/ivfflat_am_test.sql`
- **Overview**: `README_IVFFLAT.md`
- **Summary**: `IMPLEMENTATION_SUMMARY.md`