# CLI API Reference
## Overview
The Frozen DuckDB CLI provides a **comprehensive command-line interface** for dataset management, format conversion, performance benchmarking, and **LLM operations** via the Flock extension.
## Command Structure
```bash
frozen-duckdb [OPTIONS] <COMMAND>
Options:
-v, --verbose... Increase verbosity (can be used multiple times)
-h, --help Print help
-V, --version Print version
Commands:
download Download and generate sample datasets
convert Convert datasets between different formats
info Show comprehensive system information
flock-setup Setup Ollama for LLM operations
complete Generate text completion
embed Generate embeddings for semantic search
search Perform semantic search
filter Filter data using LLM evaluation
summarize Summarize text collections
test Show testing guidance
benchmark Benchmark operations (coming soon)
```
## Dataset Management Commands
### `download` - Download and Generate Datasets
Downloads or generates sample datasets for testing and development.
```bash
frozen-duckdb download --dataset <DATASET> --format <FORMAT> [OPTIONS]
Arguments:
<DATASET> Dataset name to download or generate [possible values: chinook, tpch]
Options:
-o, --output-dir <DIR> Output directory for dataset files [default: datasets]
-f, --format <FORMAT> Output format [default: csv] [possible values: csv, parquet, duckdb]
-h, --help Print help
```
**Dataset Options:**
- **`chinook`**: Music database with artists, albums, tracks, and sales data
- **`tpch`**: TPC-H decision support benchmark with 8 tables
**Examples:**
```bash
# Download Chinook dataset in CSV format
frozen-duckdb download --dataset chinook --format csv
# Generate TPC-H dataset in Parquet format
frozen-duckdb download --dataset tpch --format parquet --output-dir ./data
```
**TPC-H Dataset Contents:**
| Table | Description | Rows | Size |
|-------|-------------|------|------|
| customer | Customer information | ~1,500 | 100KB-1MB |
| lineitem | Order line items | ~6,000 | 500KB-2MB |
| nation | Country information | ~25 | 1KB |
| orders | Customer orders | ~1,500 | 100KB-500KB |
| part | Parts catalog | ~2,000 | 200KB-1MB |
| partsupp | Part-supplier relationships | ~8,000 | 300KB-1MB |
| region | Geographic regions | ~5 | 1KB |
| supplier | Supplier information | ~100 | 10KB-50KB |
### `convert` - Format Conversion
Converts datasets between different file formats.
```bash
frozen-duckdb convert --input <INPUT> --output <OUTPUT> [OPTIONS]
Options:
-i, --input <INPUT> Input file path to convert from
-o, --output <OUTPUT> Output file path to convert to
-f, --input-format <FORMAT> Input file format [default: csv] [possible values: csv, parquet, json]
-t, --output-format <FORMAT> Output file format [default: parquet] [possible values: csv, parquet, json, arrow]
-h, --help Print help
```
**Supported Conversions:**
| From / To | CSV | Parquet | JSON | Arrow |
|-----------|-----|---------|------|-------|
| **CSV** | ✅ | ✅ | ✅ | ❌ |
| **Parquet** | ✅ | ✅ | ❌ | ❌ |
| **JSON** | ❌ | ❌ | ✅ | ❌ |
| **Arrow** | ❌ | ❌ | ❌ | ✅ |
**Examples:**
```bash
# Convert CSV to Parquet
frozen-duckdb convert --input data.csv --output data.parquet
# Convert Parquet to CSV with explicit formats
frozen-duckdb convert --input data.parquet --output data.csv --input-format parquet --output-format csv
```
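To convert every CSV file in a directory, the `convert` command can be wrapped in a small loop. The following is only a sketch: `parquet_path_for` and `convert_dir` are hypothetical helper names that are not part of the CLI, and the loop uses only the `--input`, `--output`, `--input-format`, and `--output-format` flags documented above.

```bash
#!/bin/bash
# Hypothetical helpers (not part of the CLI) for batch CSV -> Parquet conversion.

# Derive the output path by swapping a trailing .csv for .parquet.
parquet_path_for() {
    printf '%s\n' "${1%.csv}.parquet"
}

# Convert every *.csv file in the given directory (default: datasets).
convert_dir() {
    dir="${1:-datasets}"
    for input in "$dir"/*.csv; do
        [ -e "$input" ] || continue   # skip when the glob matches nothing
        frozen-duckdb convert \
            --input "$input" \
            --output "$(parquet_path_for "$input")" \
            --input-format csv \
            --output-format parquet || return 1
    done
}
```

Calling `convert_dir ./data` would then convert each CSV file in `./data` and stop at the first failure.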
## System Information Commands
### `info` - System Information
Displays comprehensive information about the Frozen DuckDB configuration.
```bash
frozen-duckdb info [OPTIONS]
Options:
-v, --verbose Show detailed information
-h, --help Print help
```
**Information Displayed:**
- **Version**: Frozen DuckDB version
- **Build Type**: Pre-compiled binary
- **Architecture**: Current system architecture
- **Available Extensions**: DuckDB extensions loaded
- **Environment Status**: Configuration validation
**Example Output:**
```bash
🦆 Frozen DuckDB Information
Version: 0.1.0
Build Type: Pre-compiled binary
Architecture: arm64
Target: darwin
Available Extensions: parquet, tpch, flock
```
## LLM Integration Commands
### `flock-setup` - Ollama Configuration
Sets up Ollama integration for LLM operations via the Flock extension.
```bash
frozen-duckdb flock-setup [OPTIONS]
Options:
-u, --ollama-url <URL> Ollama server URL [default: http://localhost:11434]
-s, --skip-verification Skip model verification
-h, --help Print help
```
**Setup Process:**
1. **Install Flock extension**: `INSTALL flock FROM community; LOAD flock;`
2. **Create Ollama secret**: `CREATE SECRET ollama_secret (TYPE OLLAMA, API_URL 'http://localhost:11434')`
3. **Create models**: `CREATE MODEL('coder', 'qwen3-coder:30b', 'ollama')` and `CREATE MODEL('embedder', 'qwen3-embedding:8b', 'ollama')`
4. **Verify setup**: Test basic LLM operations
**Required Models:**
- **qwen3-coder:30b**: Text generation and completion (30.5B parameters)
- **qwen3-embedding:8b**: Embedding generation (7.6B parameters)
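For inspection or manual setup, the SQL statements from the setup process can be printed for any Ollama URL. This is a sketch under assumptions: `flock_setup_sql` is a hypothetical helper, and whether `flock-setup` issues exactly these statements when given a custom `--ollama-url` is not confirmed by this reference.

```bash
#!/bin/bash
# Hypothetical helper: print the setup SQL listed above for a given Ollama URL.
flock_setup_sql() {
    url="${1:-http://localhost:11434}"
    cat <<EOF
INSTALL flock FROM community; LOAD flock;
CREATE SECRET ollama_secret (TYPE OLLAMA, API_URL '$url');
CREATE MODEL('coder', 'qwen3-coder:30b', 'ollama');
CREATE MODEL('embedder', 'qwen3-embedding:8b', 'ollama');
EOF
}
```

Running `flock_setup_sql http://192.168.1.100:11434` prints the statements so they can be reviewed before being executed in a DuckDB session.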
**Example:**
```bash
# Setup with default local Ollama
frozen-duckdb flock-setup
# Setup with custom Ollama URL
frozen-duckdb flock-setup --ollama-url http://192.168.1.100:11434
# Setup without verification (faster)
frozen-duckdb flock-setup --skip-verification
```
### `complete` - Text Completion
Generates a text completion using a configured LLM.
```bash
frozen-duckdb complete [OPTIONS]
Options:
-p, --prompt <PROMPT> Text to complete
-i, --input <FILE> Read prompt from file
-o, --output <FILE> Write response to file
-m, --model <MODEL> Model to use [default: coder] [possible values: coder, embedder]
-h, --help Print help
```
**Usage Modes:**
1. **Direct prompt**: `--prompt "Explain recursion in programming"`
2. **File input**: `--input prompt.txt`
3. **Interactive**: No arguments (reads from stdin)
**Examples:**
```bash
# Complete text directly
frozen-duckdb complete --prompt "Explain recursion in programming"
# Read prompt from file
frozen-duckdb complete --input my_prompt.txt --output response.txt
# Interactive mode (no arguments; reads the prompt from stdin)
frozen-duckdb complete
```
### `embed` - Embedding Generation
Generates embeddings for semantic search and similarity operations.
```bash
frozen-duckdb embed [OPTIONS]
Options:
-t, --text <TEXT> Text to generate embeddings for
-i, --input <FILE> Read texts from file (one per line)
-o, --output <FILE> Write embeddings to file as JSON
-m, --model <MODEL> Model to use [default: embedder] [possible values: coder, embedder]
-n, --normalize Normalize embeddings
-h, --help Print help
```
**Input Formats:**
- **Single text**: `--text "Python programming language"`
- **File input**: `--input texts.txt` (one text per line)
- **Multiple texts**: Both options can be combined
**Output Format:**
```json
[
  {
    "text": "Python programming language",
    "embedding": [0.123, 0.456, ...],
    "dimensions": 1024
  }
]
```
**Examples:**
```bash
# Generate embedding for single text
frozen-duckdb embed --text "machine learning"
# Generate embeddings for multiple texts from file
frozen-duckdb embed --input documents.txt --output embeddings.json
# Generate normalized embeddings
frozen-duckdb embed --text "artificial intelligence" --normalize
```
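This reference does not define what `--normalize` does. A common meaning is L2 normalization, which scales a vector to unit length so that a dot product equals cosine similarity. Assuming that interpretation, the idea can be sketched with `awk`; `l2_normalize` is a hypothetical helper, not a CLI feature.

```bash
#!/bin/bash
# Hypothetical helper: scale a whitespace-separated vector to unit (L2) length.
l2_normalize() {
    printf '%s\n' "$1" | awk '{
        sum = 0
        for (i = 1; i <= NF; i++) sum += $i * $i
        norm = sqrt(sum)
        if (norm == 0) { print; next }   # leave the zero vector unchanged
        for (i = 1; i <= NF; i++)
            printf "%.4f%s", $i / norm, (i < NF ? " " : "\n")
    }'
}

l2_normalize "3 4"   # prints: 0.6000 0.8000
```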
### `search` - Semantic Search
Performs semantic search using embeddings and similarity matching.
```bash
frozen-duckdb search [OPTIONS]
Options:
-q, --query <QUERY> Search query
-c, --corpus <FILE> Corpus file for search
-t, --threshold <FLOAT> Similarity threshold [default: 0.7]
-l, --limit <INT> Maximum results [default: 10]
-f, --format <FORMAT> Output format [default: text] [possible values: text, json]
-h, --help Print help
```
**Search Process:**
1. **Embed query**: Generate embedding for search query
2. **Compare embeddings**: Calculate similarity with corpus embeddings
3. **Rank results**: Sort by similarity score
4. **Filter results**: Apply threshold and limit
5. **Format output**: Return results in requested format
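The compare, rank, and filter steps above can be sketched over precomputed embeddings. Several things here are assumptions: the reference only says "similarity", so cosine similarity is used as a plausible metric; the tab-separated corpus layout is invented for the example; and `rank_by_similarity` is a hypothetical helper rather than the CLI's implementation.

```bash
#!/bin/bash
# rank_by_similarity QUERY_VECTOR CORPUS_FILE [THRESHOLD] [LIMIT]
# CORPUS_FILE holds one "label<TAB>v1 v2 v3 ..." entry per line (assumed layout).
rank_by_similarity() {
    awk -F '\t' -v query="$1" -v threshold="${3:-0.7}" '
        BEGIN {
            nq = split(query, q, " ")
            for (i = 1; i <= nq; i++) qnorm += q[i] * q[i]
            qnorm = sqrt(qnorm)
        }
        {
            n = split($2, v, " ")
            dot = 0; vnorm = 0
            # Missing query components count as 0 if the lengths differ.
            for (i = 1; i <= n; i++) { dot += q[i] * v[i]; vnorm += v[i] * v[i] }
            vnorm = sqrt(vnorm)
            if (qnorm == 0 || vnorm == 0) next
            score = dot / (qnorm * vnorm)
            # Thresholding before sorting gives the same result as after.
            if (score >= threshold) printf "%.4f\t%s\n", score, $1
        }
    ' "$2" | sort -rn | head -n "${4:-10}"
}
```

For a corpus file `corpus.tsv` in that layout, `rank_by_similarity "1 0 0" corpus.tsv 0.8 5` prints at most five `score<TAB>label` lines, highest score first. The defaults mirror the documented `--threshold 0.7` and `--limit 10`.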
**Examples:**
```bash
# Basic semantic search
frozen-duckdb search --query "machine learning" --corpus documents.txt
# Search with custom threshold and limit
frozen-duckdb search --query "database optimization" --corpus papers.txt --threshold 0.8 --limit 5
# JSON output format
frozen-duckdb search --query "rust programming" --corpus code.txt --format json
```
### `filter` - LLM-based Filtering
Filters data using LLM evaluation and criteria matching.
```bash
frozen-duckdb filter [OPTIONS]
Options:
-c, --criteria <CRITERIA> Filtering criteria
-p, --prompt <PROMPT> Custom evaluation prompt
-i, --input <FILE> Input file to filter (one item per line)
-o, --output <FILE> Output file for results
-m, --model <MODEL> Model to use [default: coder] [possible values: coder, embedder]
-h, --help Print help
```
**Filtering Modes:**
1. **Criteria-based**: `--criteria "Is this about technology?"`
2. **Custom prompt**: `--prompt "Answer yes or no: {{text}}"`
3. **Positive only**: Only show items that match criteria
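The custom prompt mode uses a `{{text}}` placeholder. Presumably each input item is substituted for the placeholder before the prompt is sent to the model; the reference does not spell this out, so the following only illustrates that assumed behavior. `render_prompt` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: replace every literal {{text}} in a template with an item.
# Uses only POSIX parameter expansion, so the item is never treated as a pattern.
render_prompt() {
    template=$1
    text=$2
    out=
    while :; do
        case $template in
            *'{{text}}'*)
                out=$out${template%%'{{text}}'*}$text   # text before the placeholder, then the item
                template=${template#*'{{text}}'}        # continue after the placeholder
                ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$out$template"
}

render_prompt 'Is this a programming language? Answer yes or no: {{text}}' 'Rust'
```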
**Examples:**
```bash
# Filter technology-related items
frozen-duckdb filter --criteria "Is this about technology?" --input items.txt
# Custom evaluation prompt
frozen-duckdb filter --prompt "Is this a programming language? Answer yes or no: {{text}}" --input languages.txt
# Save results to file
frozen-duckdb filter --criteria "Is this positive?" --input reviews.txt --output positive_reviews.txt
```
### `summarize` - Text Summarization
Summarizes collections of text using LLM capabilities.
```bash
frozen-duckdb summarize [OPTIONS]
Options:
-i, --input <FILE> Input file or directory
-o, --output <FILE> Output file for summary
-s, --strategy <STRATEGY> Summarization strategy [default: concise] [possible values: concise, detailed, bullet]
-l, --max-length <INT> Maximum summary length in words [default: 200]
-m, --model <MODEL> Model to use [default: coder] [possible values: coder, embedder]
-h, --help Print help
```
**Input Types:**
- **Single file**: `--input document.txt` (one text per line)
- **Directory**: `--input documents/` (reads all .txt files)
- **Multiple files**: Processes all text files in directory
**Summarization Strategies:**
- **`concise`**: Brief, to-the-point summary (default)
- **`detailed`**: Comprehensive summary with key points
- **`bullet`**: Bullet-point format for easy scanning
**Examples:**
```bash
# Summarize single document
frozen-duckdb summarize --input article.txt --strategy concise
# Summarize multiple documents in directory
frozen-duckdb summarize --input papers/ --output summary.txt --strategy detailed
# Bullet-point summary with custom length
frozen-duckdb summarize --input notes.txt --strategy bullet --max-length 100
```
## Utility Commands
### `test` - Testing Guidance
Shows information about running the test suite.
```bash
frozen-duckdb test
# Output:
🧪 Tests have been moved to the test suite
Run tests with: cargo test
Run specific tests with: cargo test <test_name>
Run all tests with: cargo test --all
```
### `benchmark` - Performance Benchmarking
Intended to run performance benchmarks. The feature is marked as coming soon, so the command may not perform any work yet.
```bash
frozen-duckdb benchmark [OPTIONS]
Options:
-o, --operation <OPERATION> Operation type to benchmark [default: query] [possible values: query, insert, export]
-n, --iterations <INT> Number of iterations [default: 1000]
-s, --size <SIZE> Dataset size [default: medium] [possible values: small, medium, large]
-h, --help Print help
```
## Error Handling
The CLI provides **clear error messages** and **consistent exit codes**:
### Exit Codes
| Code | Meaning | Description |
|------|---------|-------------|
| **0** | Success | Operation completed successfully |
| **1** | General error | Invalid arguments, file not found |
| **2** | Environment error | DUCKDB_LIB_DIR not set |
| **3** | Binary validation | No DuckDB binary found |
| **4** | Flock extension | Extension not available |
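Scripts can branch on these exit codes. The wrapper below is a minimal sketch: `describe_exit_code` and `run_checked` are hypothetical helper names, and the mapping simply mirrors the table above.

```bash
#!/bin/bash
# Hypothetical helper: map an exit code to the category from the table above.
describe_exit_code() {
    case "$1" in
        0) echo "Success" ;;
        1) echo "General error" ;;
        2) echo "Environment error" ;;
        3) echo "Binary validation" ;;
        4) echo "Flock extension" ;;
        *) echo "Unknown exit code $1" ;;
    esac
}

# Run a command, report the category on failure, and preserve the exit code.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "failed: $(describe_exit_code "$status") (exit $status)" >&2
    fi
    return "$status"
}
```

For example, `run_checked frozen-duckdb info` would print `failed: Environment error (exit 2)` to stderr if `DUCKDB_LIB_DIR` is not set.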
### Error Messages
**Environment Errors:**
```bash
❌ DUCKDB_LIB_DIR not set
Please run: source prebuilt/setup_env.sh
❌ No frozen DuckDB binary found in /path/to/lib
Check that binaries exist in prebuilt/
```
**LLM Errors:**
```bash
❌ Flock extension not available
Run 'frozen-duckdb flock-setup' first
❌ Model not found
Check if Ollama models are properly configured
```
**File Errors:**
```bash
❌ Failed to read input file 'missing.txt': No such file or directory
Check that the file exists and is readable
❌ Failed to write to output file 'readonly.txt': Permission denied
Check file permissions and try again
```
## Performance Characteristics
### Startup Performance
- **CLI startup**: <100ms
- **Command parsing**: <10ms
- **Environment validation**: <50ms
- **Extension loading**: <200ms (for Flock operations)
### Operation Performance
| Operation | Typical Time | Memory Usage |
|-----------|--------------|--------------|
| **Dataset generation** | <10s (small) | <100MB |
| **Format conversion** | <1s (typical files) | <50MB |
| **Text completion** | <5s (typical requests) | <200MB |
| **Embedding generation** | <3s (single text) | <150MB |
| **Semantic search** | <2s (small corpus) | <100MB |
| **Text summarization** | <10s (multiple documents) | <200MB |
## Verbosity Levels
The CLI supports **multiple verbosity levels** for debugging:
```bash
# Default (WARN and above)
frozen-duckdb info
# Info level (-v)
frozen-duckdb -v info
# Debug level (-vv)
frozen-duckdb -vv download --dataset chinook
# Trace level (-vvv)
frozen-duckdb -vvv complete --prompt "test"
```
**Log Levels:**
- **No flag**: WARN level and above
- **`-v`**: INFO level and above
- **`-vv`**: DEBUG level and above
- **`-vvv`**: TRACE level and above (most verbose)
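In wrapper scripts it can be convenient to pick the flag from a level name. This is a small sketch that assumes a hypothetical `LOG_LEVEL` environment variable and a hypothetical `verbosity_flag` helper; neither is read or provided by the CLI itself.

```bash
#!/bin/bash
# Hypothetical helper: translate a log level name into the matching -v flag.
verbosity_flag() {
    case "$1" in
        WARN|"") echo "" ;;      # default level needs no flag
        INFO)    echo "-v" ;;
        DEBUG)   echo "-vv" ;;
        TRACE)   echo "-vvv" ;;
        *) echo "unknown log level: $1" >&2; return 1 ;;
    esac
}
```

A call such as `frozen-duckdb $(verbosity_flag "${LOG_LEVEL:-WARN}") info` leaves the substitution unquoted on purpose, so that the default level expands to no argument at all.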
## Integration Examples
### Basic Usage Script
```bash
#!/bin/bash
# setup_frozen_duckdb.sh
# Set up environment
source prebuilt/setup_env.sh
# Verify setup
frozen-duckdb info
# Generate sample data
frozen-duckdb download --dataset chinook --format csv
# Convert to Parquet for better performance
frozen-duckdb convert --input datasets/chinook.csv --output datasets/chinook.parquet
echo "✅ Frozen DuckDB setup complete!"
```
### CI/CD Integration
```yaml
# .github/workflows/test.yml
- name: Setup frozen DuckDB
  run: |
    source frozen-duckdb/prebuilt/setup_env.sh
    echo "DUCKDB_LIB_DIR=$DUCKDB_LIB_DIR" >> $GITHUB_ENV
    echo "DUCKDB_INCLUDE_DIR=$DUCKDB_INCLUDE_DIR" >> $GITHUB_ENV
- name: Run tests
  run: cargo test --all
- name: Generate test data
  run: frozen-duckdb download --dataset tpch --format parquet --output-dir test_data
```
### LLM Pipeline Script
```bash
#!/bin/bash
# llm_pipeline.sh
# Usage: ./llm_pipeline.sh <query> [summary]
# Stop at the first failing command or unset variable
set -euo pipefail
query="${1:?usage: $0 <query> [summary]}"
# Setup Ollama and Flock
frozen-duckdb flock-setup
# Generate embeddings for documents
frozen-duckdb embed --input documents.txt --output embeddings.json
# Search for relevant content
frozen-duckdb search --query "$query" --corpus documents.txt --format json
# Generate summary if requested
if [[ "${2:-}" == "summary" ]]; then
    frozen-duckdb summarize --input documents.txt --output summary.txt
fi
```
## Troubleshooting
### Common Issues
#### 1. Environment Not Configured
```bash
❌ DUCKDB_LIB_DIR not set
# Solution:
source prebuilt/setup_env.sh
```
#### 2. Binary Not Found
```bash
❌ No frozen DuckDB binary found
# Solution:
ls -la prebuilt/libduckdb*
# Check that binaries exist
```
#### 3. Flock Extension Issues
```bash
❌ Flock extension not available
# Solution:
frozen-duckdb flock-setup
# Then verify Ollama is running
```
#### 4. Model Not Available
```bash
❌ Model not found
# Solution:
# Pull the required models (run 'ollama list' to see what is already installed)
ollama pull qwen3-coder:30b
ollama pull qwen3-embedding:8b
```
### Debug Information
```bash
# Show system information
frozen-duckdb info
# Show available extensions
frozen-duckdb -v info
# Test with maximum verbosity
frozen-duckdb -vvv complete --prompt "test"
```
## Performance Tuning
### Optimization Tips
1. **Use appropriate formats**: Parquet for analytical workloads, CSV for human-readable data
2. **Batch operations**: Process multiple files together when possible
3. **Memory management**: Monitor memory usage for large datasets
4. **Network optimization**: Use local Ollama for faster LLM operations
### Resource Requirements
| Resource | Minimum | Recommended | Notes |
|----------|---------|-------------|-------|
| **RAM** | 4GB | 16GB | For LLM operations |
| **Storage** | 100MB | 1GB | Including datasets |
| **Network** | N/A | Fast local | For Ollama communication |
| **CPU** | 2 cores | 4+ cores | For parallel operations |
## Security Considerations
### Data Privacy
- **Local processing**: All operations happen locally by default
- **No data transmission by default**: LLM operations use the local Ollama instance unless `--ollama-url` points to another host
- **File permissions**: Respect existing file system permissions
- **Temporary files**: Cleaned up automatically
### Network Security
- **Local Ollama by default**: Designed for `localhost:11434` unless a different URL is configured
- **No external APIs**: All LLM operations through local Ollama
- **Configurable URLs**: Can specify custom Ollama endpoints
- **Network isolation**: No internet connectivity required once the Flock extension and the Ollama models have been installed
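Because the Ollama URL is configurable, a script that must keep data on the local machine may want to check the URL before passing it to `--ollama-url`. The check below is a sketch with deliberate limits: `is_local_ollama_url` is a hypothetical helper, and it recognizes only `localhost` and `127.0.0.1` (no IPv6 or other loopback aliases).

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the URL's host is localhost or 127.0.0.1.
is_local_ollama_url() {
    authority="${1#*://}"           # drop the scheme
    authority="${authority%%/*}"    # drop any path
    case "$authority" in
        *@*) return 1 ;;            # reject user@host forms
    esac
    case "${authority%%:*}" in      # drop the port
        localhost|127.0.0.1) return 0 ;;
        *) return 1 ;;
    esac
}
```

A guard such as `is_local_ollama_url "$url" || exit 1` placed before `frozen-duckdb flock-setup --ollama-url "$url"` would refuse remote endpoints.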
## Summary
The CLI API provides a **comprehensive, user-friendly interface** for all Frozen DuckDB operations, from basic dataset management to advanced LLM capabilities. The design emphasizes **ease of use**, **performance**, and **reliability** while maintaining **complete compatibility** with existing workflows.
**Key Features:**
- **Intuitive commands**: Clear, consistent command structure
- **Comprehensive options**: Rich configuration for all use cases
- **Error handling**: Clear messages and actionable guidance
- **Performance optimization**: Fast operations with minimal overhead
- **LLM integration**: Seamless Flock extension support
- **Production ready**: Robust error handling and validation