sql-rs 0.1.0

A SQL database with vector similarity search capabilities
# sql-rs Development Guide for Agentic Coding Agents

This file contains essential information for agentic coding agents working on the sql-rs codebase.

## Build, Test, and Development Commands

### Core Commands

```bash
# Build the project
cargo build

# Build release version
cargo build --release

# Run all tests
cargo test

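# Run tests and print their stdout/stderr (output capture disabled)
cargo test -- --nocapture

# Run tests whose names contain a filter string
cargo test test_name

# Run one integration test target (tests/storage_tests.rs)
cargo test --test storage_tests

# Example of the name filter: runs every test whose name contains test_put_and_get
cargo test test_put_and_get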

# Run example
cargo run --example basic_usage

# Run the CLI
cargo run -- --help
```

### No Lint/Formatting Commands

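This project has no custom rustfmt or clippy configuration. To format code, use `cargo fmt`, which applies rustfmt's defaults to the whole crate and picks up the edition from `Cargo.toml` (a bare `rustfmt` invocation does not):

```bash
cargo fmt
```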

## Project Structure

```
src/
├── lib.rs          # Main library entry point, error types
├── main.rs         # CLI entry point
├── cli/            # Command-line interface implementation
├── storage/        # B-tree based storage engine
├── vector/         # Vector database functionality (HNSW)
├── query/          # SQL parser and query executor
└── types/          # Core data types (Value, Schema, Column)
```

## Code Style Guidelines

### Rust Edition and Standards

- **Rust Edition 2024** - Use latest Rust features (this edition requires Rust 1.85 or newer)
- Follow standard Rust idioms and conventions
- Use `cargo fmt` for code formatting when available

### Imports and Dependencies

- Organize imports: std crates first, then external crates, then local modules
- Group related imports together
- Use `use` statements for frequently used items
- External dependencies include: clap, serde, serde_json, bincode, thiserror, anyhow, memmap2, parking_lot
- Dev dependencies: tempfile, criterion

### Error Handling

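- Use the custom error enum defined in `src/lib.rs` (written `sql-rsError` in this guide; a hyphen is not valid in a Rust identifier, so check `src/lib.rs` for the actual name)
- Fallible functions return the crate's `Result<T>` type alias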
- Error types: Io, Serialization, Storage, Query, Vector, NotFound, InvalidOperation
- Use `thiserror` for error derives and `anyhow` for context where appropriate
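
The conventions above can be sketched as follows. This is a minimal illustration, not the crate's actual definition: the enum name `DbError`, the payload types, the message strings, and the helpers `read_len` and `lookup` are all assumptions. The `Display` and `From` impls are written by hand so the sketch compiles with the standard library alone; in the crate itself, `thiserror` derives would replace them.

```rust
use std::fmt;
use std::io;

// Hypothetical stand-in for the crate's error enum. The variant names come
// from this guide; the enum name and payload types are assumptions.
#[derive(Debug)]
pub enum DbError {
    Io(io::Error),
    Serialization(String),
    Storage(String),
    Query(String),
    Vector(String),
    NotFound(String),
    InvalidOperation(String),
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DbError::Io(e) => write!(f, "I/O error: {e}"),
            DbError::Serialization(m) => write!(f, "serialization error: {m}"),
            DbError::Storage(m) => write!(f, "storage error: {m}"),
            DbError::Query(m) => write!(f, "query error: {m}"),
            DbError::Vector(m) => write!(f, "vector error: {m}"),
            DbError::NotFound(m) => write!(f, "not found: {m}"),
            DbError::InvalidOperation(m) => write!(f, "invalid operation: {m}"),
        }
    }
}

impl std::error::Error for DbError {}

// Lets `?` convert an io::Error into DbError::Io automatically.
impl From<io::Error> for DbError {
    fn from(e: io::Error) -> Self {
        DbError::Io(e)
    }
}

// Crate-wide alias so signatures read `Result<T>`.
pub type Result<T> = std::result::Result<T, DbError>;

// Hypothetical helper: propagates an I/O failure with `?`.
pub fn read_len(path: &str) -> Result<u64> {
    let meta = std::fs::metadata(path)?;
    Ok(meta.len())
}

// Hypothetical helper: reports a missing key with a domain-specific variant.
pub fn lookup(key: &str) -> Result<&'static str> {
    match key {
        "known" => Ok("value"),
        other => Err(DbError::NotFound(other.to_string())),
    }
}
```

Callers can then match on the variant to decide how to react, for example treating `NotFound` as recoverable while surfacing `Io` to the user.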

### Naming Conventions

- **Types**: PascalCase (e.g., `StorageEngine`, `VectorCollection`)
- **Functions**: snake_case (e.g., `create_database`, `search_vectors`)
- **Constants**: SCREAMING_SNAKE_CASE (e.g., `DEFAULT_PAGE_SIZE`)
- **Modules**: snake_case (e.g., `storage`, `vector_module`)
- **Files**: snake_case (e.g., `storage_engine.rs`, `value.rs`)

### Data Types and Patterns

- Core `Value` enum: Null, Integer(i64), Float(f64), Text(String), Blob(Vec<u8>), Boolean(bool)
- Use `From` traits for type conversions
- Serialize/deserialize with serde where needed
- Use `Option<T>` for nullable values
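
As a rough sketch of how the `Value` enum and `From` conversions fit together: the variant list is taken from this guide, while the derives and the exact set of `From` impls are assumptions rather than the crate's actual code.

```rust
// Variants as listed in this guide; the derives are an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
    Float(f64),
    Text(String),
    Blob(Vec<u8>),
    Boolean(bool),
}

impl From<i64> for Value {
    fn from(v: i64) -> Self {
        Value::Integer(v)
    }
}

impl From<f64> for Value {
    fn from(v: f64) -> Self {
        Value::Float(v)
    }
}

impl From<&str> for Value {
    fn from(v: &str) -> Self {
        Value::Text(v.to_string())
    }
}

impl From<String> for Value {
    fn from(v: String) -> Self {
        Value::Text(v)
    }
}

impl From<Vec<u8>> for Value {
    fn from(v: Vec<u8>) -> Self {
        Value::Blob(v)
    }
}

impl From<bool> for Value {
    fn from(v: bool) -> Self {
        Value::Boolean(v)
    }
}

// Maps a nullable Rust value onto the enum: None becomes Value::Null.
impl<T: Into<Value>> From<Option<T>> for Value {
    fn from(v: Option<T>) -> Self {
        v.map_or(Value::Null, Into::into)
    }
}
```

With these impls, call sites can write `Value::from(42i64)` or `"text".into()` instead of naming variants directly, and `Option<T>` values convert without a manual `match`.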

### API Design

- Traits: `Storage`, `StorageEngine`, `VectorStore`
- Builder patterns for complex objects (e.g., `Vector::new(...)` followed by `.with_metadata(...)`)
- Consistent error handling across all APIs
- Use references for read-only access, mutable references for modifications
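
A minimal sketch of these API conventions follows. The method names on `Storage`, the in-memory `MemoryStorage` type, and the fields of `Vector` are hypothetical; only the trait name, `Vector::new`, and `with_metadata` appear in this guide, and the real signatures may differ.

```rust
use std::collections::BTreeMap;

// Hypothetical shape for a key-value storage trait: `&self` for reads,
// `&mut self` for modifications.
pub trait Storage {
    fn get(&self, key: &[u8]) -> Option<&[u8]>;
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn delete(&mut self, key: &[u8]) -> bool;
}

// In-memory implementation used only for illustration.
#[derive(Default)]
pub struct MemoryStorage {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.map.get(key).map(Vec::as_slice)
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn delete(&mut self, key: &[u8]) -> bool {
        self.map.remove(key).is_some()
    }
}

// Builder-style construction: each `with_*` call consumes and returns self,
// so calls can be chained. Field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Vector {
    pub id: String,
    pub data: Vec<f32>,
    pub metadata: BTreeMap<String, String>,
}

impl Vector {
    pub fn new(id: impl Into<String>, data: Vec<f32>) -> Self {
        Vector {
            id: id.into(),
            data,
            metadata: BTreeMap::new(),
        }
    }

    pub fn with_metadata(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.metadata.insert(key.into(), value.into());
        self
    }
}
```

Taking `self` by value in `with_metadata` keeps the chain allocation-free apart from the inserted strings, at the cost of requiring a rebind (`let v = v.with_metadata(...)`) when used outside a chain.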

### Testing Patterns

- Unit tests in `src/` directories alongside implementation
- Integration tests in `tests/` directory
- Use `tempfile` for test database files
- Test naming: `test_[feature]_[scenario]`
- Test both success and error cases
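
The patterns above might look like the sketch below. To stay compilable with the standard library alone, it uses a small hypothetical `ScratchDir` guard as a stand-in for `tempfile::TempDir`; in the project's own tests, `tempfile` would take its place. The test names and the plain file I/O standing in for database calls are illustrative only.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Std-only stand-in for `tempfile::TempDir`: deletes the directory on drop.
pub struct ScratchDir(PathBuf);

impl ScratchDir {
    pub fn new(label: &str) -> std::io::Result<Self> {
        // The process id keeps concurrent test runs from colliding.
        let dir = std::env::temp_dir().join(format!("sqlrs-{label}-{}", std::process::id()));
        fs::create_dir_all(&dir)?;
        Ok(ScratchDir(dir))
    }

    pub fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot fail.
        let _ = fs::remove_dir_all(&self.0);
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;

    // Success case, named test_[feature]_[scenario].
    #[test]
    fn test_dbfile_write_then_read() {
        let dir = ScratchDir::new("write-read").unwrap();
        let db = dir.path().join("test.db");
        fs::write(&db, b"hello").unwrap();
        assert_eq!(fs::read(&db).unwrap(), b"hello");
    }

    // Error case: reading a file that was never created must fail.
    #[test]
    fn test_dbfile_read_missing_fails() {
        let dir = ScratchDir::new("missing").unwrap();
        assert!(fs::read(dir.path().join("absent.db")).is_err());
    }
}
```

Because cleanup happens in `Drop`, the directory is removed even when an assertion panics partway through a test.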

### Documentation

- Use `///` for public API documentation
- Use `//!` for module-level documentation
- Include examples in doc comments where helpful
- Document error conditions and edge cases
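
For illustration, a documented item following these conventions could look like this. The module `vector_math` and the function `normalize` are hypothetical and not part of the crate; the example inside the doc comment uses an indented code block.

```rust
pub mod vector_math {
    //! Small numeric helpers for vectors.
    //!
    //! Module-level documentation uses `//!` and describes the module as a whole.

    /// Scales `v` to unit length.
    ///
    /// # Edge cases
    ///
    /// Returns `None` when `v` is empty or has zero magnitude, because no
    /// direction can be derived from such a vector.
    ///
    /// # Examples
    ///
    ///     let unit = normalize(&[3.0, 4.0]).unwrap();
    ///     assert_eq!(unit, vec![0.6, 0.8]);
    pub fn normalize(v: &[f32]) -> Option<Vec<f32>> {
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm == 0.0 {
            return None;
        }
        Some(v.iter().map(|x| x / norm).collect())
    }
}
```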

## CLI Architecture

The CLI uses `clap` with derive macros:

- Main `Cli` struct with subcommands
- `Commands` enum: Create, Query, Insert, Info, Vector
- `VectorCommands` enum: Add, Search, Create
- All commands return `Result<()>` for consistent error handling
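
The command hierarchy can be pictured roughly as below. This sketch omits the `clap` derive attributes so that it compiles without external crates; in the real CLI the enums would carry `clap`'s derives. All field names, the `run` dispatcher, and its validation rules are assumptions made for illustration, and `Result<(), String>` stands in for the crate's `Result<()>` alias.

```rust
// Variant names come from this guide; every field is hypothetical.
#[derive(Debug, PartialEq)]
pub enum VectorCommands {
    Add { collection: String, id: String },
    Search { collection: String, top_k: usize },
    Create { collection: String, dimension: usize },
}

#[derive(Debug, PartialEq)]
pub enum Commands {
    Create { path: String },
    Query { path: String, sql: String },
    Insert { path: String, json: String },
    Info { path: String },
    Vector(VectorCommands),
}

// Single dispatch point: every command funnels into the same Result type,
// so main() can report failures uniformly.
pub fn run(cmd: &Commands) -> Result<(), String> {
    match cmd {
        Commands::Query { sql, .. } if sql.trim().is_empty() => {
            Err("query must not be empty".to_string())
        }
        Commands::Vector(VectorCommands::Create { dimension: 0, .. }) => {
            Err("vector dimension must be greater than zero".to_string())
        }
        // A real implementation would open the database and do the work here.
        _ => Ok(()),
    }
}
```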

## Database Operations

### Storage Engine

- Page-based B-tree storage (4 KB pages by default)
- Write-Ahead Logging (WAL) for durability
- ACID properties with basic transaction support
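
To make the page-based layout concrete, here is a small sketch of the arithmetic that maps page ids to byte offsets with 4 KB pages. The helper names `page_offset` and `locate` are hypothetical, and the sketch assumes pages are stored back to back starting at offset 0, which the actual file format may not do (for example, if it reserves a header).

```rust
// 4 KB default page size, as stated in this guide.
pub const DEFAULT_PAGE_SIZE: u64 = 4096;

/// Byte offset at which `page_id` starts, or None if the multiplication
/// would overflow u64.
pub fn page_offset(page_id: u64) -> Option<u64> {
    page_id.checked_mul(DEFAULT_PAGE_SIZE)
}

/// Splits a byte offset into (page id, offset within that page).
pub fn locate(byte_offset: u64) -> (u64, u64) {
    (byte_offset / DEFAULT_PAGE_SIZE, byte_offset % DEFAULT_PAGE_SIZE)
}
```

For instance, byte 4097 falls on page 1 at in-page offset 1, and page 3 begins at byte 12288.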

### Vector Operations

- HNSW indexing for approximate nearest neighbor search
- Distance metrics: Cosine, Euclidean, DotProduct
- Metadata storage alongside vectors
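
The three metrics can be sketched as follows. The enum name `DistanceMetric`, the `distance` method, and the convention that a smaller result means a closer match (so the dot product is negated) are assumptions for this example; the crate may expose similarity scores instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DistanceMetric {
    Cosine,
    Euclidean,
    DotProduct,
}

impl DistanceMetric {
    /// Distance between `a` and `b`, where smaller means more similar.
    /// Returns None for empty or mismatched inputs, and for cosine distance
    /// when either vector has zero magnitude.
    pub fn distance(self, a: &[f32], b: &[f32]) -> Option<f32> {
        if a.is_empty() || a.len() != b.len() {
            return None;
        }
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        match self {
            // Negated so that a larger dot product yields a smaller distance.
            DistanceMetric::DotProduct => Some(-dot),
            DistanceMetric::Euclidean => {
                let sum_sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
                Some(sum_sq.sqrt())
            }
            DistanceMetric::Cosine => {
                let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
                let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
                if norm_a == 0.0 || norm_b == 0.0 {
                    None
                } else {
                    // 1 - cosine similarity: 0 for parallel, 1 for orthogonal.
                    Some(1.0 - dot / (norm_a * norm_b))
                }
            }
        }
    }
}
```

Returning a distance for every metric lets an index such as HNSW rank candidates with a single "smaller is better" comparison regardless of the metric chosen.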

### Query Engine

- SQL parser for basic operations (CREATE, INSERT, SELECT)
- Query executor with basic optimization
- JSON data insertion support

## Performance Considerations

- Targets: >10k inserts/sec, <10 ms queries, <100 ms vector search
- Memory footprint: <50 MB for typical workloads
- Use efficient data structures (B-trees, HNSW)
- Minimize allocations in hot paths

## Development Workflow

1. Always run tests before committing: `cargo test`
2. Test CLI functionality: `cargo run -- --help`
3. Verify examples work: `cargo run --example basic_usage`
4. Check that database files are properly cleaned up in tests
5. Ensure error messages are descriptive and helpful

## Common Patterns

```rust
// Error handling
pub fn operation() -> Result<()> {
    risky_operation()?;
    Ok(())
}

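// JSON parsing with error context.
// `SqlRsError` is a placeholder name: `sql-rsError` is not a valid Rust
// identifier, so substitute the error enum's actual name from src/lib.rs.
let data: T = serde_json::from_str(json_str)
    .map_err(|e| SqlRsError::Serialization(format!("Invalid JSON: {e}")))?;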

// Vector operations
let vector = Vector::new("id", vec![1.0, 2.0, 3.0])
    .with_metadata("key", "value");
```

## Dependencies Philosophy

- Keep dependencies minimal and focused
- Prefer standard library over external crates when possible
- Use well-maintained, widely-used crates
- Avoid dependency bloat in this lightweight database

## License

Copyright 2025 SQL-RS Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.