prollytree 0.3.2

A prolly (probabilistic) tree for efficient storage, retrieval, and modification of ordered data.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
# Git-Prolly SQL Manual

## Overview

The `git prolly sql` command enables SQL query capabilities on top of ProllyTree's versioned key-value store. This integration combines the power of relational databases with Git's version control, allowing you to query your data using standard SQL while maintaining full version history.

## Table of Contents

1. [Getting Started]#getting-started
2. [Command Options]#command-options
3. [Basic Usage]#basic-usage
4. [SQL Operations]#sql-operations
5. [Output Formats]#output-formats
6. [Interactive Mode]#interactive-mode
7. [Advanced Examples]#advanced-examples
8. [Git Integration]#git-integration
9. [Best Practices]#best-practices
10. [Troubleshooting]#troubleshooting

## Getting Started

### Prerequisites

1. Ensure you have initialized a git-prolly repository:
```bash
git prolly init
```

2. Verify the SQL feature is enabled (it's enabled by default):
```bash
git prolly sql --help
```

### Quick Start

```bash
# Create a table
git prolly sql "CREATE TABLE users (id INTEGER, name TEXT, email TEXT)"

# Insert data
git prolly sql "INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')"

# Query data
git prolly sql "SELECT * FROM users"

# Commit your changes
git prolly commit -m "Added users table with initial data"
```

## Command Options

```
git prolly sql [OPTIONS] [QUERY]

Arguments:
  [QUERY]  SQL query to execute

Options:
  -f, --file <FILE>      Execute query from file
  -o, --format <FORMAT>  Output format (table, json, csv)
  -i, --interactive      Start interactive SQL shell
      --verbose          Show detailed error messages
  -b, --branch <BRANCH>  Execute against specific branch or commit (SELECT queries only, requires clean status)
  -h, --help             Print help
```

## Basic Usage

### 1. Direct Query Execution

Execute a single SQL query directly from the command line:

```bash
git prolly sql "SELECT * FROM users WHERE age > 25"
```

### 2. File-based Execution

Execute SQL commands from a file:

```bash
# Create a SQL file
cat > schema.sql << EOF
CREATE TABLE products (
    id INTEGER,
    name TEXT,
    price INTEGER,
    category TEXT
);

INSERT INTO products VALUES
    (1, 'Laptop', 1200, 'Electronics'),
    (2, 'Book', 25, 'Education');
EOF

# Execute the file
git prolly sql -f schema.sql
```

### 3. Interactive Mode

Start an interactive SQL shell:

```bash
git prolly sql -i
```

In interactive mode:
- Type SQL queries and press Enter to execute
- Type `help` for available commands
- Type `exit` or `quit` to leave the shell

### 4. Historical Data Querying

Query data from specific branches or commits using the `-b` parameter:

```bash
# Query data from main branch
git prolly sql -b main "SELECT * FROM users"

# Query data from a specific commit
git prolly sql -b a1b2c3d4 "SELECT COUNT(*) FROM products"

# Query data from a feature branch
git prolly sql -b feature/new-schema "SELECT * FROM categories"
```

**Important Requirements:**
- Only `SELECT` statements are allowed when using `-b` parameter
- Your working directory must have clean status (no uncommitted staging changes)
- The branch/commit will be temporarily checked out and restored after execution

**Example with staging changes:**
```bash
# This will be blocked if you have uncommitted changes
git prolly set user:123 "John Doe"  # Creates staging changes
git prolly sql -b main "SELECT * FROM users"
# Error: Cannot use -b/--branch parameter with uncommitted staging changes

# Commit your changes first
git prolly commit -m "Add new user"
git prolly sql -b main "SELECT * FROM users"  # Now works
```

## SQL Operations

### Supported SQL Features

#### DDL (Data Definition Language)
- `CREATE TABLE` - Create new tables
- `DROP TABLE` - Remove tables (coming soon)

#### DML (Data Manipulation Language)
- `INSERT` - Add new rows
- `SELECT` - Query data
- `UPDATE` - Modify existing rows
- `DELETE` - Remove rows

#### Query Features
- `WHERE` clauses
- `ORDER BY` sorting
- `JOIN` operations (INNER, LEFT, RIGHT)
- `GROUP BY` aggregation
- Aggregate functions: `COUNT()`, `AVG()`, `MAX()`, `MIN()`
- `LIMIT` for result limiting

### Examples

#### Creating Tables

```sql
-- Simple table
CREATE TABLE users (
    id INTEGER,
    name TEXT,
    email TEXT,
    created_at TEXT
);

-- Table with more data types
CREATE TABLE products (
    id INTEGER,
    name TEXT,
    price INTEGER,
    in_stock BOOLEAN,
    description TEXT
);
```

#### Inserting Data

```sql
-- Single row insert
INSERT INTO users (id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com');

-- Multiple row insert
INSERT INTO users (id, name, email) VALUES
    (2, 'Bob Smith', 'bob@example.com'),
    (3, 'Charlie Brown', 'charlie@example.com');

-- Insert without specifying columns (must match table structure)
INSERT INTO products VALUES
    (1, 'Laptop', 1200, true, 'High-performance laptop');
```

#### Querying Data

```sql
-- Select all columns
SELECT * FROM users;

-- Select specific columns
SELECT name, email FROM users;

-- With WHERE clause
SELECT * FROM users WHERE id > 1;
SELECT * FROM products WHERE price < 100 AND in_stock = true;

-- With ORDER BY
SELECT * FROM users ORDER BY name ASC;
SELECT * FROM products ORDER BY price DESC;

-- With LIMIT
SELECT * FROM products ORDER BY price DESC LIMIT 5;
```

#### Joins

```sql
-- Create related tables
CREATE TABLE orders (
    id INTEGER,
    user_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    order_date TEXT
);

-- Inner join
SELECT u.name, p.name as product, o.quantity
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id;

-- Left join (shows all users, even without orders)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```

#### Aggregation

```sql
-- Count rows
SELECT COUNT(*) FROM users;

-- Group by with aggregation
SELECT category, COUNT(*) as product_count, AVG(price) as avg_price
FROM products
GROUP BY category;

-- Having clause
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;
```

#### Updates and Deletes

```sql
-- Update single row
UPDATE users SET email = 'newemail@example.com' WHERE id = 1;

-- Update multiple rows
UPDATE products SET price = price * 1.1 WHERE category = 'Electronics';

-- Delete rows
DELETE FROM orders WHERE order_date < '2023-01-01';
```

## Output Formats

### 1. Table Format (Default)

```bash
git prolly sql "SELECT * FROM users"
```

Output:
```
│ id     │ name              │ email                    │
├────────┼───────────────────┼──────────────────────────┤
│ I64(1) │ Str("Alice")      │ Str("alice@example.com") │
│ I64(2) │ Str("Bob")        │ Str("bob@example.com")   │
```

### 2. JSON Format

```bash
git prolly sql -o json "SELECT * FROM users"
```

Output:
```json
[
  {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com"
  },
  {
    "id": 2,
    "name": "Bob",
    "email": "bob@example.com"
  }
]
```

### 3. CSV Format

```bash
git prolly sql -o csv "SELECT * FROM users"
```

Output:
```csv
id,name,email
I64(1),Str("Alice"),Str("alice@example.com")
I64(2),Str("Bob"),Str("bob@example.com")
```

## Interactive Mode

The interactive SQL shell provides a convenient environment for exploring your data:

```bash
git prolly sql -i
```

```
🌟 ProllyTree SQL Interactive Shell
====================================
Type 'exit' or 'quit' to exit
Type 'help' for available commands

prolly-sql> CREATE TABLE test (id INTEGER, value TEXT);
✓ Table created successfully

prolly-sql> INSERT INTO test VALUES (1, 'Hello'), (2, 'World');
✓ Inserted 2 rows

prolly-sql> SELECT * FROM test;
│ id     │ value         │
├────────┼───────────────┤
│ I64(1) │ Str("Hello")  │
│ I64(2) │ Str("World")  │

prolly-sql> exit
Goodbye!
```

### Interactive Mode with Historical Data

Use interactive mode to explore historical data:

```bash
# Start interactive mode against a specific branch
git prolly sql -b feature/analytics -i
```

```
🌟 ProllyTree SQL Interactive Shell
====================================
Executing against branch/commit: feature/analytics
⚠️  Only SELECT statements are allowed in this mode
Type 'exit' or 'quit' to exit
Type 'help' for available commands

prolly-sql> SELECT COUNT(*) FROM new_analytics_table;
│ COUNT(*) │
├──────────┤
│ I64(150) │

prolly-sql> SELECT * FROM products WHERE price > 1000;
│ id     │ name                │ price     │
├────────┼─────────────────────┼───────────┤
│ I64(1) │ Str("Gaming PC")    │ I64(1500) │
│ I64(2) │ Str("MacBook Pro")  │ I64(2000) │

prolly-sql> INSERT INTO products VALUES (3, 'iPad', 800);
Error: Only SELECT statements are allowed when using -b/--branch parameter
       Historical commits/branches are read-only for data integrity

prolly-sql> exit
Goodbye!
Restored to original branch: main
```

## Advanced Examples

### 1. Historical Data Analysis

Compare data across different points in time:

```bash
# Query current data
git prolly sql "SELECT COUNT(*) as current_users FROM users"

# Query data from last week's commit
git prolly sql -b 7d1a2b3c "SELECT COUNT(*) as users_last_week FROM users"

# Compare product prices between branches
git prolly sql -b main "SELECT name, price FROM products WHERE category = 'Electronics'"
git prolly sql -b feature/price-update "SELECT name, price FROM products WHERE category = 'Electronics'"

# Analyze data growth over time
git prolly sql -b v1.0 "SELECT COUNT(*) as v1_orders FROM orders"
git prolly sql -b v2.0 "SELECT COUNT(*) as v2_orders FROM orders"
```

### 2. Complex Data Analysis

```sql
-- Create sales data
CREATE TABLE sales (
    id INTEGER,
    product_id INTEGER,
    customer_id INTEGER,
    quantity INTEGER,
    price INTEGER,
    sale_date TEXT
);

-- Top selling products
SELECT p.name, SUM(s.quantity) as total_sold, SUM(s.quantity * s.price) as revenue
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY p.id, p.name
ORDER BY revenue DESC
LIMIT 10;

-- Customer purchase history
SELECT c.name, COUNT(DISTINCT s.id) as purchase_count,
       SUM(s.quantity * s.price) as total_spent
FROM customers c
JOIN sales s ON c.id = s.customer_id
GROUP BY c.id, c.name
HAVING total_spent > 1000;
```

### 2. Data Migration Script

```bash
# Create migration script
cat > migrate_v2.sql << EOF
-- Add new columns to existing table
CREATE TABLE users_new (
    id INTEGER,
    name TEXT,
    email TEXT,
    created_at TEXT,
    updated_at TEXT,
    status TEXT
);

-- Copy existing data
INSERT INTO users_new (id, name, email)
SELECT id, name, email FROM users;

-- Update new fields
UPDATE users_new SET
    created_at = '2024-01-01',
    updated_at = '2024-01-01',
    status = 'active';

-- Verify migration
SELECT COUNT(*) as migrated_count FROM users_new;
EOF

# Execute migration
git prolly sql -f migrate_v2.sql
```

### 3. Reporting and Analytics

```bash
# Generate daily report
git prolly sql -o json "
SELECT
    DATE(order_date) as date,
    COUNT(*) as orders,
    SUM(quantity * price) as revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE(order_date)
ORDER BY date
" > daily_revenue.json

# Export customer list
git prolly sql -o csv "
SELECT id, name, email, created_at
FROM users
WHERE status = 'active'
ORDER BY name
" > active_customers.csv
```

## Git Integration

### Version Control for Your Data

All SQL operations are stored in ProllyTree and can be versioned with Git:

```bash
# Make changes
git prolly sql "INSERT INTO users VALUES (4, 'David', 'david@example.com')"
git prolly sql "UPDATE products SET price = 1100 WHERE id = 1"

# Check status
git prolly status

# Commit changes
git prolly commit -m "Added new user David and updated laptop price"

# View history
git prolly show HEAD

# Diff between commits
git prolly diff HEAD~1 HEAD
```

### Working with Branches

```bash
# Create a new branch for experimental changes
git checkout -b feature/new-schema

# Make schema changes
git prolly sql "CREATE TABLE categories (id INTEGER, name TEXT)"
git prolly sql "ALTER TABLE products ADD COLUMN category_id INTEGER"

# Commit on feature branch
git prolly commit -m "Added categories support"

# Switch back to main
git checkout main

# The new tables don't exist on main branch
git prolly sql "SELECT * FROM categories"  # Error: table not found

# Query the new schema without switching branches
git prolly sql -b feature/new-schema "SELECT * FROM categories"

# Merge when ready
git merge feature/new-schema
```

### Cross-Branch Data Comparison

Compare data between branches without switching contexts:

```bash
# Compare user counts between branches
echo "Main branch users:"
git prolly sql -b main "SELECT COUNT(*) FROM users"

echo "Feature branch users:"
git prolly sql -b feature/user-management "SELECT COUNT(*) FROM users"

# Generate reports from different branches
git prolly sql -b production -o json "SELECT * FROM daily_metrics WHERE date = '2024-01-15'" > prod_metrics.json
git prolly sql -b staging -o json "SELECT * FROM daily_metrics WHERE date = '2024-01-15'" > staging_metrics.json

# Compare table schemas between versions
git prolly sql -b v1.0 "SELECT name FROM sqlite_master WHERE type='table'"
git prolly sql -b v2.0 "SELECT name FROM sqlite_master WHERE type='table'"
```

## Best Practices

### 1. Schema Design

- Use meaningful table and column names
- Define appropriate data types for columns
- Consider adding indexes for frequently queried columns (when supported)

### 2. Query Optimization

- Use specific column names instead of `SELECT *` when possible
- Add appropriate WHERE clauses to limit result sets
- Use JOINs efficiently - avoid cartesian products

### 3. Data Integrity

- Commit your changes regularly with meaningful messages
- Test queries in interactive mode before running in scripts
- Use transactions for multi-step operations (when supported)

### 4. Migration Strategy

```bash
# Always backup before major changes
git prolly sql "SELECT * FROM important_table" -o json > backup.json

# Test migrations on a branch first
git checkout -b migration-test
# ... run migration ...
# If successful, merge to main
```

### 5. Historical Data Querying

- **Commit changes before using `-b`**: Always commit your staging changes before querying historical data
- **Use for read-only analysis**: The `-b` parameter is perfect for generating reports without affecting current work
- **Branch-specific schemas**: Use `-b` to query data from branches with different table structures
- **Performance**: Historical queries access committed data, so they may be slower than current branch queries

```bash
# Good practice: commit first
git prolly commit -m "Save current work"
git prolly sql -b production "SELECT * FROM metrics"

# Avoid: Don't leave uncommitted changes
git prolly set user:new "data"  # Uncommitted change
git prolly sql -b main "SELECT * FROM users"  # Will be blocked
```

## Troubleshooting

### Common Issues

1. **"Failed to open dataset"**
   - Ensure you're in a git-prolly initialized directory
   - Run `git prolly init` if needed

2. **"Table not found"**
   - Check table name spelling
   - Ensure the table was created and changes were committed
   - Use `git prolly list` to see all keys (tables are stored with `:__schema__` suffix)

3. **Query errors**
   - Use `--verbose` flag for detailed error messages
   - Check SQL syntax - the parser is strict about formatting
   - Ensure column names match exactly (case-sensitive)

4. **"Cannot use -b/--branch parameter with uncommitted staging changes"**
   - Check staging status with `git prolly status`
   - Commit your changes first: `git prolly commit -m "Save changes"`
   - Or discard changes if not needed

5. **"Only SELECT statements are allowed when using -b/--branch parameter"**
   - Historical data is read-only for safety
   - Use regular `git prolly sql` (without `-b`) for data modifications
   - Switch to the target branch if you need to make changes there

6. **"Failed to checkout branch/commit"**
   - Verify the branch/commit exists: `git branch -a` or `git log --oneline`
   - Check branch name spelling (case-sensitive)
   - Ensure you have access to the specified commit

### Performance Tips

1. **Large Result Sets**: Use LIMIT to restrict output
   ```bash
   git prolly sql "SELECT * FROM large_table LIMIT 100"
   ```

2. **Complex Queries**: Break them into steps
   ```bash
   # Create intermediate results
   git prolly sql "CREATE TABLE temp_results AS SELECT ..."
   ```

3. **Bulk Operations**: Use file-based execution
   ```bash
   # More efficient than individual commands
   git prolly sql -f bulk_insert.sql
   ```

### Debug Mode

Use verbose mode for debugging:

```bash
git prolly sql --verbose "SELECT * FROM users"
```

This will show:
- Detailed error messages
- Query execution time
- Additional context for troubleshooting

## Limitations

Current limitations (may be addressed in future versions):

1. **No ALTER TABLE** - Create new table and migrate data instead
2. **Limited aggregate functions** - SUM() may have issues with empty groups
3. **No DISTINCT in some contexts** - Use GROUP BY as workaround
4. **No subqueries in some positions** - Use JOINs or temporary tables
5. **Case-sensitive identifiers** - Table and column names are case-sensitive

## Conclusion

The `git prolly sql` command brings the power of SQL to ProllyTree's versioned storage, enabling complex data analysis while maintaining Git's version control benefits. This unique combination allows you to:

- Query your data with familiar SQL syntax
- Version control your data changes
- Branch and merge data schemas
- Track data history over time
- Export data in multiple formats

For more examples and advanced usage, see the `examples/sql_example.rs` file in the repository.