# Git-Prolly SQL Manual
## Overview
The `git prolly sql` command enables SQL query capabilities on top of ProllyTree's versioned key-value store. This integration combines the power of relational databases with Git's version control, allowing you to query your data using standard SQL while maintaining full version history.
## Table of Contents
1. [Getting Started](#getting-started)
2. [Command Options](#command-options)
3. [Basic Usage](#basic-usage)
4. [SQL Operations](#sql-operations)
5. [Output Formats](#output-formats)
6. [Interactive Mode](#interactive-mode)
7. [Advanced Examples](#advanced-examples)
8. [Git Integration](#git-integration)
9. [Best Practices](#best-practices)
10. [Troubleshooting](#troubleshooting)
## Getting Started
### Prerequisites
1. Ensure you have initialized a git-prolly repository:
```bash
git prolly init
```
2. Verify the SQL feature is enabled (it's enabled by default):
```bash
git prolly sql --help
```
### Quick Start
```bash
# Create a table
git prolly sql "CREATE TABLE users (id INTEGER, name TEXT, email TEXT)"
# Insert data
git prolly sql "INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')"
# Query data
git prolly sql "SELECT * FROM users"
# Commit your changes
git prolly commit -m "Added users table with initial data"
```
## Command Options
```
git prolly sql [OPTIONS] [QUERY]
Arguments:
[QUERY] SQL query to execute
Options:
-f, --file <FILE> Execute query from file
-o, --format <FORMAT> Output format (table, json, csv)
-i, --interactive Start interactive SQL shell
--verbose Show detailed error messages
-b, --branch <BRANCH> Execute against specific branch or commit (SELECT queries only, requires clean status)
-h, --help Print help
```
## Basic Usage
### 1. Direct Query Execution
Execute a single SQL query directly from the command line:
```bash
git prolly sql "SELECT * FROM users WHERE age > 25"
```
### 2. File-based Execution
Execute SQL commands from a file:
```bash
# Create a SQL file
cat > schema.sql << EOF
CREATE TABLE products (
id INTEGER,
name TEXT,
price INTEGER,
category TEXT
);
INSERT INTO products VALUES
(1, 'Laptop', 1200, 'Electronics'),
(2, 'Book', 25, 'Education');
EOF
# Execute the file
git prolly sql -f schema.sql
```
### 3. Interactive Mode
Start an interactive SQL shell:
```bash
git prolly sql -i
```
In interactive mode:
- Type SQL queries and press Enter to execute
- Type `help` for available commands
- Type `exit` or `quit` to leave the shell
### 4. Historical Data Querying
Query data from specific branches or commits using the `-b` parameter:
```bash
# Query data from main branch
git prolly sql -b main "SELECT * FROM users"
# Query data from a specific commit
git prolly sql -b a1b2c3d4 "SELECT COUNT(*) FROM products"
# Query data from a feature branch
git prolly sql -b feature/new-schema "SELECT * FROM categories"
```
**Important Requirements:**
- Only `SELECT` statements are allowed when using `-b` parameter
- Your working directory must have clean status (no uncommitted staging changes)
- The branch/commit will be temporarily checked out and restored after execution
**Example with staging changes:**
```bash
# This will be blocked if you have uncommitted changes
git prolly set user:123 "John Doe" # Creates staging changes
git prolly sql -b main "SELECT * FROM users"
# Error: Cannot use -b/--branch parameter with uncommitted staging changes
# Commit your changes first
git prolly commit -m "Add new user"
git prolly sql -b main "SELECT * FROM users" # Now works
```
## SQL Operations
### Supported SQL Features
#### DDL (Data Definition Language)
- `CREATE TABLE` - Create new tables
- `DROP TABLE` - Remove tables (coming soon)
#### DML (Data Manipulation Language)
- `INSERT` - Add new rows
- `SELECT` - Query data
- `UPDATE` - Modify existing rows
- `DELETE` - Remove rows
#### Query Features
- `WHERE` clauses
- `ORDER BY` sorting
- `JOIN` operations (INNER, LEFT, RIGHT)
- `GROUP BY` aggregation
- Aggregate functions: `COUNT()`, `AVG()`, `MAX()`, `MIN()`
- `LIMIT` for result limiting
### Examples
#### Creating Tables
```sql
-- Simple table
CREATE TABLE users (
id INTEGER,
name TEXT,
email TEXT,
created_at TEXT
);
-- Table with more data types
CREATE TABLE products (
id INTEGER,
name TEXT,
price INTEGER,
in_stock BOOLEAN,
description TEXT
);
```
#### Inserting Data
```sql
-- Single row insert
INSERT INTO users (id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com');
-- Multiple row insert
INSERT INTO users (id, name, email) VALUES
(2, 'Bob Smith', 'bob@example.com'),
(3, 'Charlie Brown', 'charlie@example.com');
-- Insert without specifying columns (must match table structure)
INSERT INTO products VALUES
(1, 'Laptop', 1200, true, 'High-performance laptop');
```
#### Querying Data
```sql
-- Select all columns
SELECT * FROM users;
-- Select specific columns
SELECT name, email FROM users;
-- With WHERE clause
SELECT * FROM users WHERE id > 1;
SELECT * FROM products WHERE price < 100 AND in_stock = true;
-- With ORDER BY
SELECT * FROM users ORDER BY name ASC;
SELECT * FROM products ORDER BY price DESC;
-- With LIMIT
SELECT * FROM products ORDER BY price DESC LIMIT 5;
```
#### Joins
```sql
-- Create related tables
CREATE TABLE orders (
id INTEGER,
user_id INTEGER,
product_id INTEGER,
quantity INTEGER,
order_date TEXT
);
-- Inner join
SELECT u.name, p.name as product, o.quantity
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id;
-- Left join (shows all users, even without orders)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```
#### Aggregation
```sql
-- Count rows
SELECT COUNT(*) FROM users;
-- Group by with aggregation
SELECT category, COUNT(*) as product_count, AVG(price) as avg_price
FROM products
GROUP BY category;
-- Having clause
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;
```
#### Updates and Deletes
```sql
-- Update single row
UPDATE users SET email = 'newemail@example.com' WHERE id = 1;
-- Update multiple rows
UPDATE products SET price = price * 1.1 WHERE category = 'Electronics';
-- Delete rows
DELETE FROM orders WHERE order_date < '2023-01-01';
```
## Output Formats
### 1. Table Format (Default)
```bash
git prolly sql "SELECT * FROM users"
```
Output:
```
│ id │ name │ email │
├────────┼───────────────────┼──────────────────────────┤
│ I64(1) │ Str("Alice") │ Str("alice@example.com") │
│ I64(2) │ Str("Bob") │ Str("bob@example.com") │
```
### 2. JSON Format
```bash
git prolly sql -o json "SELECT * FROM users"
```
Output:
```json
[
{
"id": 1,
"name": "Alice",
"email": "alice@example.com"
},
{
"id": 2,
"name": "Bob",
"email": "bob@example.com"
}
]
```
### 3. CSV Format
```bash
git prolly sql -o csv "SELECT * FROM users"
```
Output:
```csv
id,name,email
I64(1),Str("Alice"),Str("alice@example.com")
I64(2),Str("Bob"),Str("bob@example.com")
```
## Interactive Mode
The interactive SQL shell provides a convenient environment for exploring your data:
```bash
git prolly sql -i
```
```
🌟 ProllyTree SQL Interactive Shell
====================================
Type 'exit' or 'quit' to exit
Type 'help' for available commands
prolly-sql> CREATE TABLE test (id INTEGER, value TEXT);
✓ Table created successfully
prolly-sql> INSERT INTO test VALUES (1, 'Hello'), (2, 'World');
✓ Inserted 2 rows
prolly-sql> SELECT * FROM test;
│ id │ value │
├────────┼───────────────┤
│ I64(1) │ Str("Hello") │
│ I64(2) │ Str("World") │
prolly-sql> exit
Goodbye!
```
### Interactive Mode with Historical Data
Use interactive mode to explore historical data:
```bash
# Start interactive mode against a specific branch
git prolly sql -b feature/analytics -i
```
```
🌟 ProllyTree SQL Interactive Shell
====================================
Executing against branch/commit: feature/analytics
⚠️ Only SELECT statements are allowed in this mode
Type 'exit' or 'quit' to exit
Type 'help' for available commands
prolly-sql> SELECT COUNT(*) FROM new_analytics_table;
│ COUNT(*) │
├──────────┤
│ I64(150) │
prolly-sql> SELECT * FROM products WHERE price > 1000;
│ id │ name │ price │
├────────┼─────────────────────┼───────────┤
│ I64(1) │ Str("Gaming PC") │ I64(1500) │
│ I64(2) │ Str("MacBook Pro") │ I64(2000) │
prolly-sql> INSERT INTO products VALUES (3, 'iPad', 800);
Error: Only SELECT statements are allowed when using -b/--branch parameter
Historical commits/branches are read-only for data integrity
prolly-sql> exit
Goodbye!
Restored to original branch: main
```
## Advanced Examples
### 1. Historical Data Analysis
Compare data across different points in time:
```bash
# Query current data
git prolly sql "SELECT COUNT(*) as current_users FROM users"
# Query data from last week's commit
git prolly sql -b 7d1a2b3c "SELECT COUNT(*) as users_last_week FROM users"
# Compare product prices between branches
git prolly sql -b main "SELECT name, price FROM products WHERE category = 'Electronics'"
git prolly sql -b feature/price-update "SELECT name, price FROM products WHERE category = 'Electronics'"
# Analyze data growth over time
git prolly sql -b v1.0 "SELECT COUNT(*) as v1_orders FROM orders"
git prolly sql -b v2.0 "SELECT COUNT(*) as v2_orders FROM orders"
```
### 2. Complex Data Analysis
```sql
-- Create sales data
CREATE TABLE sales (
id INTEGER,
product_id INTEGER,
customer_id INTEGER,
quantity INTEGER,
price INTEGER,
sale_date TEXT
);
-- Top selling products
SELECT p.name, SUM(s.quantity) as total_sold, SUM(s.quantity * s.price) as revenue
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY p.id, p.name
ORDER BY revenue DESC
LIMIT 10;
-- Customer purchase history
SELECT c.name, COUNT(DISTINCT s.id) as purchase_count,
SUM(s.quantity * s.price) as total_spent
FROM customers c
JOIN sales s ON c.id = s.customer_id
GROUP BY c.id, c.name
HAVING total_spent > 1000;
```
### 2. Data Migration Script
```bash
# Create migration script
cat > migrate_v2.sql << EOF
-- Add new columns to existing table
CREATE TABLE users_new (
id INTEGER,
name TEXT,
email TEXT,
created_at TEXT,
updated_at TEXT,
status TEXT
);
-- Copy existing data
INSERT INTO users_new (id, name, email)
SELECT id, name, email FROM users;
-- Update new fields
UPDATE users_new SET
created_at = '2024-01-01',
updated_at = '2024-01-01',
status = 'active';
-- Verify migration
SELECT COUNT(*) as migrated_count FROM users_new;
EOF
# Execute migration
git prolly sql -f migrate_v2.sql
```
### 3. Reporting and Analytics
```bash
# Generate daily report
git prolly sql -o json "
SELECT
DATE(order_date) as date,
COUNT(*) as orders,
SUM(quantity * price) as revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE(order_date)
ORDER BY date
" > daily_revenue.json
# Export customer list
git prolly sql -o csv "
SELECT id, name, email, created_at
FROM users
WHERE status = 'active'
ORDER BY name
" > active_customers.csv
```
## Git Integration
### Version Control for Your Data
All SQL operations are stored in ProllyTree and can be versioned with Git:
```bash
# Make changes
git prolly sql "INSERT INTO users VALUES (4, 'David', 'david@example.com')"
git prolly sql "UPDATE products SET price = 1100 WHERE id = 1"
# Check status
git prolly status
# Commit changes
git prolly commit -m "Added new user David and updated laptop price"
# View history
git prolly show HEAD
# Diff between commits
git prolly diff HEAD~1 HEAD
```
### Working with Branches
```bash
# Create a new branch for experimental changes
git checkout -b feature/new-schema
# Make schema changes
git prolly sql "CREATE TABLE categories (id INTEGER, name TEXT)"
git prolly sql "ALTER TABLE products ADD COLUMN category_id INTEGER"
# Commit on feature branch
git prolly commit -m "Added categories support"
# Switch back to main
git checkout main
# The new tables don't exist on main branch
git prolly sql "SELECT * FROM categories" # Error: table not found
# Query the new schema without switching branches
git prolly sql -b feature/new-schema "SELECT * FROM categories"
# Merge when ready
git merge feature/new-schema
```
### Cross-Branch Data Comparison
Compare data between branches without switching contexts:
```bash
# Compare user counts between branches
echo "Main branch users:"
git prolly sql -b main "SELECT COUNT(*) FROM users"
echo "Feature branch users:"
git prolly sql -b feature/user-management "SELECT COUNT(*) FROM users"
# Generate reports from different branches
git prolly sql -b production -o json "SELECT * FROM daily_metrics WHERE date = '2024-01-15'" > prod_metrics.json
git prolly sql -b staging -o json "SELECT * FROM daily_metrics WHERE date = '2024-01-15'" > staging_metrics.json
# Compare table schemas between versions
git prolly sql -b v1.0 "SELECT name FROM sqlite_master WHERE type='table'"
git prolly sql -b v2.0 "SELECT name FROM sqlite_master WHERE type='table'"
```
## Best Practices
### 1. Schema Design
- Use meaningful table and column names
- Define appropriate data types for columns
- Consider adding indexes for frequently queried columns (when supported)
### 2. Query Optimization
- Use specific column names instead of `SELECT *` when possible
- Add appropriate WHERE clauses to limit result sets
- Use JOINs efficiently - avoid cartesian products
### 3. Data Integrity
- Commit your changes regularly with meaningful messages
- Test queries in interactive mode before running in scripts
- Use transactions for multi-step operations (when supported)
### 4. Migration Strategy
```bash
# Always backup before major changes
git prolly sql "SELECT * FROM important_table" -o json > backup.json
# Test migrations on a branch first
git checkout -b migration-test
# ... run migration ...
# If successful, merge to main
```
### 5. Historical Data Querying
- **Commit changes before using `-b`**: Always commit your staging changes before querying historical data
- **Use for read-only analysis**: The `-b` parameter is perfect for generating reports without affecting current work
- **Branch-specific schemas**: Use `-b` to query data from branches with different table structures
- **Performance**: Historical queries access committed data, so they may be slower than current branch queries
```bash
# Good practice: commit first
git prolly commit -m "Save current work"
git prolly sql -b production "SELECT * FROM metrics"
# Avoid: Don't leave uncommitted changes
git prolly set user:new "data" # Uncommitted change
git prolly sql -b main "SELECT * FROM users" # Will be blocked
```
## Troubleshooting
### Common Issues
1. **"Failed to open dataset"**
- Ensure you're in a git-prolly initialized directory
- Run `git prolly init` if needed
2. **"Table not found"**
- Check table name spelling
- Ensure the table was created and changes were committed
- Use `git prolly list` to see all keys (tables are stored with `:__schema__` suffix)
3. **Query errors**
- Use `--verbose` flag for detailed error messages
- Check SQL syntax - the parser is strict about formatting
- Ensure column names match exactly (case-sensitive)
4. **"Cannot use -b/--branch parameter with uncommitted staging changes"**
- Check staging status with `git prolly status`
- Commit your changes first: `git prolly commit -m "Save changes"`
- Or discard changes if not needed
5. **"Only SELECT statements are allowed when using -b/--branch parameter"**
- Historical data is read-only for safety
- Use regular `git prolly sql` (without `-b`) for data modifications
- Switch to the target branch if you need to make changes there
6. **"Failed to checkout branch/commit"**
- Verify the branch/commit exists: `git branch -a` or `git log --oneline`
- Check branch name spelling (case-sensitive)
- Ensure you have access to the specified commit
### Performance Tips
1. **Large Result Sets**: Use LIMIT to restrict output
```bash
git prolly sql "SELECT * FROM large_table LIMIT 100"
```
2. **Complex Queries**: Break them into steps
```bash
git prolly sql "CREATE TABLE temp_results AS SELECT ..."
```
3. **Bulk Operations**: Use file-based execution
```bash
git prolly sql -f bulk_insert.sql
```
### Debug Mode
Use verbose mode for debugging:
```bash
git prolly sql --verbose "SELECT * FROM users"
```
This will show:
- Detailed error messages
- Query execution time
- Additional context for troubleshooting
## Limitations
Current limitations (may be addressed in future versions):
1. **No ALTER TABLE** - Create new table and migrate data instead
2. **Limited aggregate functions** - SUM() may have issues with empty groups
3. **No DISTINCT in some contexts** - Use GROUP BY as workaround
4. **No subqueries in some positions** - Use JOINs or temporary tables
5. **Case-sensitive identifiers** - Table and column names are case-sensitive
## Conclusion
The `git prolly sql` command brings the power of SQL to ProllyTree's versioned storage, enabling complex data analysis while maintaining Git's version control benefits. This unique combination allows you to:
- Query your data with familiar SQL syntax
- Version control your data changes
- Branch and merge data schemas
- Track data history over time
- Export data in multiple formats
For more examples and advanced usage, see the `examples/sql_example.rs` file in the repository.