frozen-duckdb 0.1.0

# 🦆 Frozen DuckDB Binary

**Pre-compiled DuckDB binary that never needs compilation - Fast builds forever!**

This project provides architecture-specific DuckDB binaries that eliminate the slow compilation bottleneck in Rust projects using `duckdb-rs`. Instead of compiling DuckDB from source every time, this project uses pre-compiled official binaries for lightning-fast builds.

## 🚀 Performance

| Build Type | Before | After | Improvement |
|------------|--------|-------|-------------|
| **First Build** | 1-2 minutes | 7-10 seconds | **85% faster** |
| **Incremental** | 30 seconds | 0.11 seconds | **99% faster** |
| **Release Builds** | 1-2 minutes | 0.11 seconds | **99% faster** |

## 📦 What's Included

- **Architecture-specific DuckDB v1.4.0 binaries**:
  - `libduckdb_x86_64.dylib` (55MB) - Intel Macs
  - `libduckdb_arm64.dylib` (50MB) - Apple Silicon Macs
- **C/C++ headers** for development
- **Smart environment setup** with automatic architecture detection
- **Build integration** with Rust projects
- **Arrow compatibility patch** for version conflicts
- **CLI tool** for dataset management and TPC-H benchmark data generation

## 🛠️ Installation

### Option 1: Download Pre-built Binary

```bash
# Clone this repository
git clone https://github.com/seanchatmangpt/frozen-duckdb.git
cd frozen-duckdb

# Set up environment (automatically detects your architecture)
source prebuilt/setup_env.sh

# Use in your Rust project
cargo build -p your-duckdb-crate
```

### Option 2: Build from Source

```bash
# Clone this repository
git clone https://github.com/seanchatmangpt/frozen-duckdb.git
cd frozen-duckdb

# Build the frozen binary
./scripts/build_frozen_duckdb.sh

# Set up environment
source prebuilt/setup_env.sh
```

## 🔧 Integration with Your Rust Project

### 1. Update your `Cargo.toml`

```toml
[dependencies]
# Use prebuilt DuckDB - no compilation needed!
duckdb = { version = "1.4.0", default-features = false }
```

### 2. Add build script (`build.rs`)

```rust
use std::env;
use std::path::Path;

fn main() {
    // Check if we should use the prebuilt DuckDB
    if let Ok(lib_dir) = env::var("DUCKDB_LIB_DIR") {
        let lib_dir = Path::new(&lib_dir);
        let include_dir = env::var("DUCKDB_INCLUDE_DIR")
            .map(|p| Path::new(&p).to_path_buf())
            .unwrap_or_else(|_| lib_dir.join("include"));

        // Tell rustc where to find the DuckDB library and headers
        println!("cargo:rustc-link-search=native={}", lib_dir.display());
        println!("cargo:rustc-link-lib=dylib=duckdb");
        println!("cargo:include={}", include_dir.display());

        // This prevents the bundled build
        println!("cargo:rerun-if-env-changed=DUCKDB_LIB_DIR");
        println!("cargo:rerun-if-env-changed=DUCKDB_INCLUDE_DIR");
    } else {
        // Fall back to bundled if no prebuilt library is specified
        println!("cargo:warning=No DUCKDB_LIB_DIR specified, using bundled DuckDB");
    }
}
```

### 3. Set up environment before building

```bash
# In your project directory
source /path/to/frozen-duckdb/prebuilt/setup_env.sh
cargo build
```

## 🎯 Usage Examples

### Basic Usage

```bash
# Set up the frozen DuckDB environment (auto-detects architecture)
source prebuilt/setup_env.sh

# Build your project (now fast!)
cargo build -p your-duckdb-crate

# Run tests
cargo test -p your-duckdb-crate
```

### CLI Dataset Management

The frozen-duckdb CLI provides utilities for managing test datasets:

```bash
# Generate TPC-H benchmark data (industry standard)
cargo run -- download --dataset tpch --output-dir datasets

# Generate Chinook sample data
cargo run -- download --dataset chinook --output-dir datasets

# Convert between formats
cargo run -- convert --input data.csv --output data.parquet --input-format csv --output-format parquet

# Show system information
cargo run -- info
```

**TPC-H Dataset Features:**
- **8 realistic tables**: customer, lineitem, nation, orders, part, partsupp, region, supplier
- **Scalable data**: From tiny (SF 0.01) to massive datasets (SF 1000+)
- **Industry standard**: Used for database benchmarking worldwide
- **Fast generation**: SF 0.01 generates ~86k rows in <1 second

### Architecture Detection

The setup script automatically detects your architecture:

```bash
source prebuilt/setup_env.sh
# 🍎 Detected Apple Silicon (arm64), using 50MB binary
# 🖥️  Detected x86_64 architecture, using 55MB binary
```

### CI/CD Integration

```yaml
# GitHub Actions example
- name: Set up frozen DuckDB
  run: |
    git clone https://github.com/seanchatmangpt/frozen-duckdb.git
    source frozen-duckdb/prebuilt/setup_env.sh
    echo "DUCKDB_LIB_DIR=$DUCKDB_LIB_DIR" >> $GITHUB_ENV
    echo "DUCKDB_INCLUDE_DIR=$DUCKDB_INCLUDE_DIR" >> $GITHUB_ENV

- name: Build with frozen DuckDB
  run: cargo build --release
```

## 🔍 Troubleshooting

### Library Not Found

If you get "library not found" errors:

```bash
# Check environment variables
echo $DUCKDB_LIB_DIR
echo $DUCKDB_INCLUDE_DIR

# Verify library exists
ls -la $DUCKDB_LIB_DIR/libduckdb*

# Re-source the environment
source prebuilt/setup_env.sh
```

### Architecture Issues

If you need to force a specific architecture:

```bash
# Force x86_64 (Intel Mac)
ARCH=x86_64 source prebuilt/setup_env.sh

# Force arm64 (Apple Silicon)
ARCH=arm64 source prebuilt/setup_env.sh
```

### Arrow Compatibility Issues

If you encounter Arrow version conflicts:

```bash
# Apply the Arrow patch
./scripts/apply_arrow_patch.sh
```

## 🏗️ Architecture

```
frozen-duckdb/
├── prebuilt/                 # Pre-compiled binaries
│   ├── libduckdb_x86_64.dylib # Intel Mac binary (55MB)
│   ├── libduckdb_arm64.dylib  # Apple Silicon binary (50MB)
│   ├── duckdb.h              # C header
│   ├── duckdb.hpp            # C++ header
│   └── setup_env.sh          # Smart environment setup
├── scripts/                  # Build and setup scripts
│   ├── build_frozen_duckdb.sh
│   ├── download_duckdb_binaries.sh
│   └── apply_arrow_patch.sh
├── examples/                 # Usage examples
└── README.md                # This file
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- [DuckDB](https://duckdb.org/) - The amazing analytical database
- [duckdb-rs](https://github.com/duckdb/duckdb-rs) - Rust bindings for DuckDB
- [Apache Arrow](https://arrow.apache.org/) - Columnar in-memory analytics

## 📊 Benchmarks

### Before Frozen DuckDB
```
cargo build -p my-duckdb-crate
   Compiling libduckdb-sys v1.4.0
   Compiling duckdb v1.4.0
   Compiling my-duckdb-crate v0.1.0
    Finished dev profile [unoptimized + debuginfo] target(s) in 2m 15s
```

### After Frozen DuckDB
```
source prebuilt/setup_env.sh
cargo build -p my-duckdb-crate
   Compiling my-duckdb-crate v0.1.0
    Finished dev profile [unoptimized + debuginfo] target(s) in 0.11s
```

**Result: 99% faster builds!** 🚀

## 💡 Architecture-Specific Benefits

- **Smaller downloads**: Users only get the binary for their architecture
- **Faster setup**: No need to download unused architecture code
- **Better performance**: Native architecture optimization
- **Reduced storage**: 50-55MB per user instead of 105MB universal binary