vibesql 0.1.2

SQL database engine with SQL:1999 compliance
Documentation
# SQLite Source Code Reference

This directory contains the SQLite source code as a reference for learning and optimization.

## Purpose

SQLite is the **gold standard** for embedded SQL databases with decades of optimization experience. We include their source code as a reference to:

1. **Learn optimization techniques** - Study how they optimize specific operations
2. **Understand implementation patterns** - See how they structure aggregates, joins, constraint checking
3. **Compare approaches** - Validate our architectural decisions against battle-tested code
4. **Get specific insights** - Answer questions like "How does SQLite implement COUNT(*)?"

## License

SQLite is in the **public domain** - it's not even open source, it's more permissive!

From sqlite.org:
> SQLite is in the public domain and does not require a license.

This means we can:
- ✅ Use as reference
- ✅ Study implementation
- ✅ Include as submodule
- ✅ No attribution required (though we credit them anyway)

**Note**: We're learning from SQLite, not copying code. This is a reference for understanding proven database implementation patterns.

## Getting Started

### First Time Setup

If you just cloned the repository:

```bash
# Clone with submodules
git clone --recurse-submodules https://github.com/rjwalters/vibesql.git

# Or if already cloned
git submodule update --init --recursive
```

### Updating SQLite Reference

To update to the latest SQLite version:

```bash
cd docs/reference/sqlite
git pull origin master
cd ../../..
git add docs/reference/sqlite
git commit -m "docs: Update SQLite reference to latest version"
```

## Key Files to Study

### 1. Aggregate Optimization (COUNT, SUM, AVG, MIN, MAX)

**Files to study**:
- `src/vdbe.c` - Virtual Database Engine (query execution)
- `src/select.c` - SELECT statement processing
- `src/where.c` - WHERE clause optimization

**Questions to answer**:
- How do they optimize `COUNT(*)`?
- Do they have fast paths for simple aggregates?
- How do they handle aggregate + GROUP BY?

### 2. UPDATE Performance

**Files to study**:
- `src/update.c` - UPDATE statement implementation
- `src/vdbe.c` - Execution engine
- `src/btree.c` - B-tree storage layer

**Questions to answer**:
- How do they batch constraint checks?
- What's their UPDATE execution flow?
- How do they optimize primary key lookups?

### 3. Expression Evaluation

**Files to study**:
- `src/expr.c` - Expression evaluation
- `src/vdbe.c` - Expression execution in VDBE

**Questions to answer**:
- How do they minimize allocations?
- What's their expression evaluation strategy?
- How do they handle type coercion?

### 4. Query Optimization

**Files to study**:
- `src/where.c` - WHERE clause analysis and optimization
- `src/select.c` - Query planning
- `src/analyze.c` - Query statistics

**Questions to answer**:
- How do they detect fast paths?
- What optimizations do they apply?
- How do they handle index selection?

### 5. Storage and Indexing

**Files to study**:
- `src/btree.c` - B-tree implementation
- `src/pager.c` - Page cache
- `src/wal.c` - Write-Ahead Logging

**Questions to answer**:
- How do they structure their storage?
- What indexing strategies do they use?
- How do they maintain hash indexes?

## Searching the SQLite Source

### Finding Specific Functions

```bash
# Search for a specific function
grep -r "sqlite3_count" docs/reference/sqlite/src/

# Search for aggregate implementations
grep -r "Aggregate" docs/reference/sqlite/src/

# Search for optimization patterns
grep -r "optimization" docs/reference/sqlite/src/
```

### Finding Specific Features

```bash
# COUNT(*) optimization
grep -r "COUNT(\*)" docs/reference/sqlite/src/

# UPDATE statement handling
grep -r "UPDATE" docs/reference/sqlite/src/update.c

# Hash join implementation
grep -r "hash.*join" docs/reference/sqlite/src/
```

## Related Documentation

- **COMPARISONS.md** - Documents our implementation vs SQLite's approach
- **SQLite Official Docs** - https://www.sqlite.org/docs.html
- **SQLite Source Docs** - https://www.sqlite.org/src/doc/trunk/README.md

## Benefits

1. **Immediate reference** - Can quickly grep/search SQLite code when investigating issues
2. **Learning resource** - Team can study proven implementations
3. **Validation** - Confirm our approaches match or differ from SQLite intentionally
4. **Optimization ideas** - Discover techniques we haven't considered
5. **Documentation** - Can link to specific SQLite code in our docs/issues

## Non-Goals

- **Not copying code** - We're learning, not copying
-**Not porting SQLite** - We're building our own DB with our own design
-**Not benchmarking against SQLite internals** - Just studying implementation

## Related Issues

- #814 - COUNT(*) optimization
- #815 - UPDATE optimization
- #816 - Debug print removal