sql-rs 0.1.1

A SQL database with vector similarity search capabilities
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
# SQL-RS

A lightweight, embedded database written in Rust that combines traditional relational database features with vector database capabilities for AI/ML applications.

## Features

### Core Database Features
- **Traditional Database**: B-tree based storage with ACID properties
- **Vector Database**: High-dimensional embeddings with similarity search
- **CLI Interface**: Simple command-line interface for all operations
- **Single File**: Database stored in a single file for easy portability
- **Rust Edition 2024**: Modern Rust with latest features and improvements

### Advanced Features
- **Write-Ahead Logging (WAL)**: Durability and crash recovery support
- **Transaction Support**: BEGIN, COMMIT, ROLLBACK operations
- **Full SQL Support**: CREATE, INSERT, SELECT, UPDATE, DELETE, DROP TABLE
- **WHERE Clauses**: Filter data with comparison operators (=, !=, >, <, >=, <=)
- **Logical Operators**: AND/OR for complex filtering
- **ORDER BY**: Sort results in ascending or descending order
- **LIMIT/OFFSET**: Pagination support for query results
- **Aggregates**: COUNT, SUM, AVG, MIN, MAX
- **GROUP BY**: Group and aggregate data
- **JOINs**: INNER, LEFT, RIGHT joins
- **Indexes**: CREATE INDEX and DROP INDEX support
- **Multiple Data Types**: INTEGER, FLOAT, TEXT, BLOB, BOOLEAN, NULL

### Vector Database Features
- **HNSW Index**: Fast approximate nearest neighbor search
- **Multiple Distance Metrics**: Cosine, Euclidean, Dot Product
- **Vector Quantization**: 8-bit and 16-bit quantization for memory efficiency
- **Metadata Filtering**: Search vectors with metadata filters
- **Batch Operations**: Efficient batch insert and search
- **Vector Persistence**: Automatic persistence to storage engine
- **Vector Statistics**: Collection-level statistics and information

## Installation

### As a Library

Add this to your `Cargo.toml`:

```toml
[dependencies]
sql_rs = "0.1"
```

### Build from Source

```bash
cargo build --release
```

The compiled binary will be at `target/release/sql-rs`.

## Usage

### Create a Database

```bash
./target/release/sql-rs create mydb.db
```

### Database Information

View comprehensive database statistics including size, collections, and tables:

```bash
./target/release/sql-rs info mydb.db
```

Output:
```
╔════════════════════════════════════════════════════════╗
║              SQL-RS Database Statistics                  ║
╠════════════════════════════════════════════════════════╣
║ Database Path: mydb.db                                   ║
╠────────────────────────────────────────────────────────╣
║ Total Pages: 5                                           ║
║ Page Size: 4096 bytes                                   ║
║ Total Size: 0.02 MB                                     ║
╠────────────────────────────────────────────────────────╣
║ Vector Collections: 1                                    ║
║ Tables: 0                                                ║
╠────────────────────────────────────────────────────────╣
║ Vector Collections:                                      ║
║   • embeddings (Vectors: 100, Dimension: 384, Metric: Cosine) ║
╚════════════════════════════════════════════════════════╝
```

### Traditional Database Operations

#### Create a Table

```bash
./target/release/sql-rs query mydb.db "CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)"
```

#### Insert Data

```bash
./target/release/sql-rs query mydb.db "INSERT INTO users VALUES (1, 'Alice', 30)"
./target/release/sql-rs query mydb.db "INSERT INTO users VALUES (2, 'Bob', 25)"
```

#### Query Data

```bash
# Select all
./target/release/sql-rs query mydb.db "SELECT * FROM users"

# Filter with WHERE
./target/release/sql-rs query mydb.db "SELECT * FROM users WHERE age > 25"

# Sort results
./target/release/sql-rs query mydb.db "SELECT * FROM users ORDER BY age DESC"
./target/release/sql-rs query mydb.db "SELECT * FROM users ORDER BY name ASC LIMIT 10"

# Pagination
./target/release/sql-rs query mydb.db "SELECT * FROM users LIMIT 5 OFFSET 10"

# Aggregates
./target/release/sql-rs query mydb.db "SELECT COUNT(*), AVG(age) FROM users"

# GROUP BY
./target/release/sql-rs query mydb.db "SELECT age, COUNT(*) FROM users GROUP BY age"

# JOINs
./target/release/sql-rs query mydb.db "SELECT * FROM users JOIN orders ON users.id = orders.user_id"
```

#### Update Data

```bash
./target/release/sql-rs query mydb.db "UPDATE users SET age = 31 WHERE name = 'Alice'"
```

#### Delete Data

```bash
./target/release/sql-rs query mydb.db "DELETE FROM users WHERE age < 20"
```

#### Create Index

```bash
./target/release/sql-rs query mydb.db "CREATE INDEX idx_users_age ON users(age)"
```

#### Drop Table/Index

```bash
./target/release/sql-rs query mydb.db "DROP TABLE users"
./target/release/sql-rs query mydb.db "DROP INDEX idx_users_age"
```

### Vector Database Operations

#### Create a Vector Collection

```bash
./target/release/sql-rs vector create mydb.db --collection embeddings --dimension 384 --metric cosine
```

#### Add Vectors

```bash
./target/release/sql-rs vector add mydb.db \
  --collection embeddings \
  --id doc1 \
  --vector "[0.1, 0.2, 0.3, ...]" \
  --metadata '{"title": "Document 1", "category": "tech"}'
```

#### Search Similar Vectors

```bash
./target/release/sql-rs vector search mydb.db \
  --collection embeddings \
  --vector "[0.15, 0.25, 0.35, ...]" \
  --top-k 10
```

#### Filtered Search

Search vectors with metadata filters:

```bash
./target/release/sql-rs vector search-filtered mydb.db \
  --collection embeddings \
  --vector "[0.15, 0.25, 0.35, ...]" \
  --top-k 10 \
  --filter-key category \
  --filter-value tech
```

#### List Collections

```bash
./target/release/sql-rs vector list mydb.db
```

#### Delete Collection

```bash
./target/release/sql-rs vector delete mydb.db --collection embeddings
```

### Import Vectors from JSON

Import multiple vectors from a JSON file:

```bash
./target/release/sql-rs import-vectors mydb.db \
  --collection embeddings \
  --file vectors.json
```

JSON format:
```json
[
  {
    "id": "doc1",
    "embedding": [0.1, 0.2, 0.3, ...],
    "metadata": {"title": "Document 1", "category": "tech"}
  },
  {
    "id": "doc2",
    "embedding": [0.4, 0.5, 0.6, ...],
    "metadata": {"title": "Document 2", "category": "news"}
  }
]
```

## Architecture

SQL-RS follows a modular architecture:

- **Storage Layer**: Page-based B-tree storage with WAL and transaction support
- **Vector Layer**: HNSW index for approximate nearest neighbor search
- **Query Engine**: SQL parser and executor with full CRUD operations
- **Transaction Manager**: ACID-compliant transaction handling
- **CLI Layer**: Command-line interface using clap
- **Statistics Module**: Database and vector collection statistics

## Distance Metrics

SQL-RS supports three distance metrics for vector similarity:

- **Cosine**: Measures angle between vectors (default, best for semantic search)
- **Euclidean**: Measures straight-line distance (best for geometric similarity)
- **Dot Product**: Measures vector alignment (best for normalized vectors)

## Vector Quantization

Reduce memory footprint with vector quantization:
- **8-bit quantization**: 4x compression with minimal accuracy loss
- **16-bit quantization**: 2x compression with near-perfect accuracy

## Performance

- **SQL Insert**: >10k rows/sec
- **SQL Query**: <10ms for indexed lookups
- **Vector Search**: <100ms for 1M vectors (approximate with HNSW)
- **Memory Footprint**: <50MB for typical workloads
- **Page Size**: 4KB pages for efficient I/O

## Examples

### Vector Database Examples

Run the vector database examples:

```bash
# Simple vector test
cargo run --example simple_vector_test

# Semantic search demo
cargo run --example semantic_search_demo

# Performance benchmark
cargo run --release --example vector_benchmark
```

**Simple Vector Test**:
- Creates a vector collection
- Adds vectors with metadata
- Performs similarity search
- Demonstrates vector retrieval

**Semantic Search Demo**:
- Document embedding simulation
- Category-based filtering
- Multi-query demonstration
- Filtered search by metadata

**Vector Benchmark**:
- Batch insertion performance
- Search performance metrics
- Memory usage statistics

### Basic Usage Examples

```bash
cargo run --example basic_usage          # Basic database operations
cargo run --example comprehensive_demo   # All SQL features
```

See the `examples/` directory for complete usage examples.

## Testing

SQL-RS has comprehensive test coverage with **147+ passing tests**:

```bash
# Run all tests
cargo test

# Run specific test suites
cargo test --test vector_examples_test      # Vector database tests
cargo test --test vector_advanced_tests     # Advanced vector tests
cargo test --test delete_drop_tests         # DELETE and DROP tests
cargo test --test aggregate_tests           # Aggregate functions
cargo test --test group_by_tests            # GROUP BY tests
cargo test --test join_tests                # JOIN tests
cargo test --test index_tests               # Index tests
```

## Development

Built with:
- **Rust Edition 2024**: Latest Rust features and improvements
- **clap 4.5**: Command-line interface parsing
- **serde/serde_json**: Serialization and deserialization
- **bincode**: Binary encoding for efficient storage
- **parking_lot**: High-performance synchronization primitives
- **memmap2**: Memory-mapped file I/O
- **thiserror/anyhow**: Robust error handling
- **tempfile**: Test file management

### Project Structure

```
src/
├── lib.rs              # Main library entry point
├── main.rs             # CLI entry point
├── cli/                # Command-line interface
│   └── commands.rs     # Command handlers
├── storage/            # B-tree, WAL, transactions, statistics
│   ├── btree.rs        # B-tree implementation
│   ├── wal.rs          # Write-ahead logging
│   ├── transaction.rs  # Transaction manager
│   ├── file.rs         # File management
│   ├── page.rs         # Page management
│   └── stats.rs        # Database statistics
├── vector/             # HNSW index, similarity metrics
│   ├── collection.rs   # Vector collection
│   ├── hnsw.rs         # HNSW index
│   ├── similarity.rs   # Distance metrics
│   ├── quantization.rs # Vector quantization
│   └── store.rs        # Vector store
├── query/              # SQL parser and executor
│   ├── parser.rs       # SQL parsing
│   ├── planner.rs      # Query planning
│   └── executor.rs     # Query execution
└── types/              # Core data types and schemas
```

### Code Quality

- **rustfmt.toml**: Consistent code formatting configuration
- **clippy.toml**: Lint rules and warnings configuration
- **147+ tests**: Comprehensive unit and integration tests
- **Apache 2.0 License**: Permissive open-source license

### Building

```bash
cargo build                   # Debug build
cargo build --release         # Release build (optimized)
cargo fmt                     # Format code with rustfmt
cargo clippy                  # Run linter
```

## Roadmap

See [TODO.md](TODO.md) for planned features and improvements including:
- Interactive REPL mode
- Query optimizer
- Import/export (CSV, JSON)
- Database backup/restore
- CI/CD pipeline
- And more...

## Documentation

- **[ARCHITECTURE.md]ARCHITECTURE.md**: Detailed system design and architecture
- **[TODO.md]TODO.md**: Feature roadmap and planned improvements
- **[SPEC.md]SPEC.md**: Technical specifications
- **[VECTOR_ENHANCEMENTS.md]VECTOR_ENHANCEMENTS.md**: Vector database features
- **[AGENTS.md]AGENTS.md**: Development guide for contributors

## License

Copyright 2025 SQL-RS Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.