sql-rs 0.1.1

A SQL database with vector similarity search capabilities
Documentation

SQL-RS

A lightweight, embedded database written in Rust that combines traditional relational database features with vector database capabilities for AI/ML applications.

Features

Core Database Features

  • Traditional Database: B-tree based storage with ACID properties
  • Vector Database: High-dimensional embeddings with similarity search
  • CLI Interface: Simple command-line interface for all operations
  • Single File: Database stored in a single file for easy portability
  • Rust Edition 2024: Modern Rust with latest features and improvements

Advanced Features

  • Write-Ahead Logging (WAL): Durability and crash recovery support
  • Transaction Support: BEGIN, COMMIT, ROLLBACK operations
  • Full SQL Support: CREATE, INSERT, SELECT, UPDATE, DELETE, DROP TABLE
  • WHERE Clauses: Filter data with comparison operators (=, !=, >, <, >=, <=)
  • Logical Operators: AND/OR for complex filtering
  • ORDER BY: Sort results in ascending or descending order
  • LIMIT/OFFSET: Pagination support for query results
  • Aggregates: COUNT, SUM, AVG, MIN, MAX
  • GROUP BY: Group and aggregate data
  • JOINs: INNER, LEFT, RIGHT joins
  • Indexes: CREATE INDEX and DROP INDEX support
  • Multiple Data Types: INTEGER, FLOAT, TEXT, BLOB, BOOLEAN, NULL

Vector Database Features

  • HNSW Index: Fast approximate nearest neighbor search
  • Multiple Distance Metrics: Cosine, Euclidean, Dot Product
  • Vector Quantization: 8-bit and 16-bit quantization for memory efficiency
  • Metadata Filtering: Search vectors with metadata filters
  • Batch Operations: Efficient batch insert and search
  • Vector Persistence: Automatic persistence to storage engine
  • Vector Statistics: Collection-level statistics and information

Installation

As a Library

Add this to your Cargo.toml:

[dependencies]
sql_rs = "0.1"

Build from Source

cargo build --release

The compiled binary will be at target/release/sql-rs.

Usage

Create a Database

./target/release/sql-rs create mydb.db

Database Information

View comprehensive database statistics including size, collections, and tables:

./target/release/sql-rs info mydb.db

Output:

╔════════════════════════════════════════════════════════╗
║              SQL-RS Database Statistics                  ║
╠════════════════════════════════════════════════════════╣
║ Database Path: mydb.db                                   ║
╠────────────────────────────────────────────────────────╣
║ Total Pages: 5                                           ║
║ Page Size: 4096 bytes                                   ║
║ Total Size: 0.02 MB                                     ║
╠────────────────────────────────────────────────────────╣
║ Vector Collections: 1                                    ║
║ Tables: 0                                                ║
╠────────────────────────────────────────────────────────╣
║ Vector Collections:                                      ║
║   • embeddings (Vectors: 100, Dimension: 384, Metric: Cosine) ║
╚════════════════════════════════════════════════════════╝

Traditional Database Operations

Create a Table

./target/release/sql-rs query mydb.db "CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)"

Insert Data

./target/release/sql-rs query mydb.db "INSERT INTO users VALUES (1, 'Alice', 30)"
./target/release/sql-rs query mydb.db "INSERT INTO users VALUES (2, 'Bob', 25)"

Query Data

# Select all
./target/release/sql-rs query mydb.db "SELECT * FROM users"

# Filter with WHERE
./target/release/sql-rs query mydb.db "SELECT * FROM users WHERE age > 25"

# Sort results
./target/release/sql-rs query mydb.db "SELECT * FROM users ORDER BY age DESC"
./target/release/sql-rs query mydb.db "SELECT * FROM users ORDER BY name ASC LIMIT 10"

# Pagination
./target/release/sql-rs query mydb.db "SELECT * FROM users LIMIT 5 OFFSET 10"

# Aggregates
./target/release/sql-rs query mydb.db "SELECT COUNT(*), AVG(age) FROM users"

# GROUP BY
./target/release/sql-rs query mydb.db "SELECT age, COUNT(*) FROM users GROUP BY age"

# JOINs
./target/release/sql-rs query mydb.db "SELECT * FROM users JOIN orders ON users.id = orders.user_id"

Update Data

./target/release/sql-rs query mydb.db "UPDATE users SET age = 31 WHERE name = 'Alice'"

Delete Data

./target/release/sql-rs query mydb.db "DELETE FROM users WHERE age < 20"

Create Index

./target/release/sql-rs query mydb.db "CREATE INDEX idx_users_age ON users(age)"

Drop Table/Index

./target/release/sql-rs query mydb.db "DROP TABLE users"
./target/release/sql-rs query mydb.db "DROP INDEX idx_users_age"

Vector Database Operations

Create a Vector Collection

./target/release/sql-rs vector create mydb.db --collection embeddings --dimension 384 --metric cosine

Add Vectors

./target/release/sql-rs vector add mydb.db \
  --collection embeddings \
  --id doc1 \
  --vector "[0.1, 0.2, 0.3, ...]" \
  --metadata '{"title": "Document 1", "category": "tech"}'

Search Similar Vectors

./target/release/sql-rs vector search mydb.db \
  --collection embeddings \
  --vector "[0.15, 0.25, 0.35, ...]" \
  --top-k 10

Filtered Search

Search vectors with metadata filters:

./target/release/sql-rs vector search-filtered mydb.db \
  --collection embeddings \
  --vector "[0.15, 0.25, 0.35, ...]" \
  --top-k 10 \
  --filter-key category \
  --filter-value tech

List Collections

./target/release/sql-rs vector list mydb.db

Delete Collection

./target/release/sql-rs vector delete mydb.db --collection embeddings

Import Vectors from JSON

Import multiple vectors from a JSON file:

./target/release/sql-rs import-vectors mydb.db \
  --collection embeddings \
  --file vectors.json

JSON format:

[
  {
    "id": "doc1",
    "embedding": [0.1, 0.2, 0.3, ...],
    "metadata": {"title": "Document 1", "category": "tech"}
  },
  {
    "id": "doc2",
    "embedding": [0.4, 0.5, 0.6, ...],
    "metadata": {"title": "Document 2", "category": "news"}
  }
]

Architecture

SQL-RS follows a modular architecture:

  • Storage Layer: Page-based B-tree storage with WAL and transaction support
  • Vector Layer: HNSW index for approximate nearest neighbor search
  • Query Engine: SQL parser and executor with full CRUD operations
  • Transaction Manager: ACID-compliant transaction handling
  • CLI Layer: Command-line interface using clap
  • Statistics Module: Database and vector collection statistics

Distance Metrics

SQL-RS supports three distance metrics for vector similarity:

  • Cosine: Measures angle between vectors (default, best for semantic search)
  • Euclidean: Measures straight-line distance (best for geometric similarity)
  • Dot Product: Measures vector alignment (best for normalized vectors)

Vector Quantization

Reduce memory footprint with vector quantization:

  • 8-bit quantization: 4x compression with minimal accuracy loss
  • 16-bit quantization: 2x compression with near-perfect accuracy

Performance

  • SQL Insert: >10k rows/sec
  • SQL Query: <10ms for indexed lookups
  • Vector Search: <100ms for 1M vectors (approximate with HNSW)
  • Memory Footprint: <50MB for typical workloads
  • Page Size: 4KB pages for efficient I/O

Examples

Vector Database Examples

Run the vector database examples:

# Simple vector test
cargo run --example simple_vector_test

# Semantic search demo
cargo run --example semantic_search_demo

# Performance benchmark
cargo run --release --example vector_benchmark

Simple Vector Test:

  • Creates a vector collection
  • Adds vectors with metadata
  • Performs similarity search
  • Demonstrates vector retrieval

Semantic Search Demo:

  • Document embedding simulation
  • Category-based filtering
  • Multi-query demonstration
  • Filtered search by metadata

Vector Benchmark:

  • Batch insertion performance
  • Search performance metrics
  • Memory usage statistics

Basic Usage Examples

cargo run --example basic_usage          # Basic database operations
cargo run --example comprehensive_demo   # All SQL features

See the examples/ directory for complete usage examples.

Testing

SQL-RS has comprehensive test coverage with 147+ passing tests:

# Run all tests
cargo test

# Run specific test suites
cargo test --test vector_examples_test      # Vector database tests
cargo test --test vector_advanced_tests     # Advanced vector tests
cargo test --test delete_drop_tests         # DELETE and DROP tests
cargo test --test aggregate_tests           # Aggregate functions
cargo test --test group_by_tests            # GROUP BY tests
cargo test --test join_tests                # JOIN tests
cargo test --test index_tests               # Index tests

Development

Built with:

  • Rust Edition 2024: Latest Rust features and improvements
  • clap 4.5: Command-line interface parsing
  • serde/serde_json: Serialization and deserialization
  • bincode: Binary encoding for efficient storage
  • parking_lot: High-performance synchronization primitives
  • memmap2: Memory-mapped file I/O
  • thiserror/anyhow: Robust error handling
  • tempfile: Test file management

Project Structure

src/
├── lib.rs              # Main library entry point
├── main.rs             # CLI entry point
├── cli/                # Command-line interface
│   └── commands.rs     # Command handlers
├── storage/            # B-tree, WAL, transactions, statistics
│   ├── btree.rs        # B-tree implementation
│   ├── wal.rs          # Write-ahead logging
│   ├── transaction.rs  # Transaction manager
│   ├── file.rs         # File management
│   ├── page.rs         # Page management
│   └── stats.rs        # Database statistics
├── vector/             # HNSW index, similarity metrics
│   ├── collection.rs   # Vector collection
│   ├── hnsw.rs         # HNSW index
│   ├── similarity.rs   # Distance metrics
│   ├── quantization.rs # Vector quantization
│   └── store.rs        # Vector store
├── query/              # SQL parser and executor
│   ├── parser.rs       # SQL parsing
│   ├── planner.rs      # Query planning
│   └── executor.rs     # Query execution
└── types/              # Core data types and schemas

Code Quality

  • rustfmt.toml: Consistent code formatting configuration
  • clippy.toml: Lint rules and warnings configuration
  • 147+ tests: Comprehensive unit and integration tests
  • Apache 2.0 License: Permissive open-source license

Building

cargo build                   # Debug build
cargo build --release         # Release build (optimized)
cargo fmt                     # Format code with rustfmt
cargo clippy                  # Run linter

Roadmap

See TODO.md for planned features and improvements including:

  • Interactive REPL mode
  • Query optimizer
  • Import/export (CSV, JSON)
  • Database backup/restore
  • CI/CD pipeline
  • And more...

Documentation

License

Copyright 2025 SQL-RS Contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.