# Text Completion Guide
## Overview
**Text completion** is the core LLM capability that enables **code generation**, **text writing**, **question answering**, and **content creation** directly within DuckDB using the **Flock extension** and **Ollama models**.
## Basic Text Completion
### Simple Completion
**Using CLI:**
```bash
# Generate text completion
frozen-duckdb complete --prompt "Explain recursion in programming"
# Expected output:
# Recursion in programming is a technique where a function calls itself to solve a problem...
```
**Using SQL:**
```sql
-- Basic text completion
SELECT llm_complete(
{'model_name': 'coder'},
{'prompt_name': 'complete', 'context_columns': [{'data': 'Explain recursion in programming'}]}
) as explanation;
```
### File-Based Completion
**Input from file:**
```bash
# Read prompt from file
echo "Write a function to calculate fibonacci numbers in Rust" > prompt.txt
frozen-duckdb complete --input prompt.txt --output response.txt
# Interactive mode
**Output to file:**
```bash
# Save response to file
frozen-duckdb complete --prompt "Write a Python web server" --output server_code.py
# Append to existing file
frozen-duckdb complete --prompt "Add error handling" --output server_code.py
```
## Code Generation
### Function Generation
**Basic function:**
```bash
frozen-duckdb complete --prompt "Write a Rust function to calculate factorial"
# Output:
# fn factorial(n: u64) -> u64 {
# match n {
# 0 | 1 => 1,
# _ => n * factorial(n - 1),
# }
# }
```
**Advanced function:**
```bash
frozen-duckdb complete --prompt "Write a Python function to merge two sorted lists"
# Output:
# def merge_sorted_lists(list1, list2):
# result = []
# i = j = 0
# while i < len(list1) and j < len(list2):
# if list1[i] < list2[j]:
# result.append(list1[i])
# i += 1
# else:
# result.append(list2[j])
# j += 1
# result.extend(list1[i:])
# result.extend(list2[j:])
# return result
```
### Class and Module Generation
**Class generation:**
```bash
frozen-duckdb complete --prompt "Write a Python class for a bank account"
# Output:
# class BankAccount:
# def __init__(self, account_number, balance=0):
# self.account_number = account_number
# self.balance = balance
#
# def deposit(self, amount):
# if amount > 0:
# self.balance += amount
# return True
# return False
#
# def withdraw(self, amount):
# if 0 < amount <= self.balance:
# self.balance -= amount
# return True
# return False
#
# def get_balance(self):
# return self.balance
```
**Module generation:**
```bash
frozen-duckdb complete --prompt "Write a Python module for data validation"
# Output:
# """
# Data validation utilities
# """
# import re
# from typing import List, Optional
#
# def validate_email(email: str) -> bool:
# pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
# return re.match(pattern, email) is not None
#
# def validate_phone(phone: str) -> bool:
# pattern = r'^\+?1?\d{9,15}$'
# return re.match(pattern, phone) is not None
#
# def validate_credit_card(card_number: str) -> bool:
# # Remove spaces and dashes
# card = re.sub(r'[\s-]', '', card_number)
# if not card.isdigit() or len(card) < 13 or len(card) > 19:
# return False
# # Luhn algorithm
# digits = [int(d) for d in reversed(card)]
# for i in range(1, len(digits), 2):
# digits[i] *= 2
# if digits[i] > 9:
# digits[i] -= 9
# return sum(digits) % 10 == 0
```
## Content Creation
### Article Writing
**Blog post generation:**
```bash
frozen-duckdb complete --prompt "Write a blog post about machine learning in healthcare"
# Output:
# # Machine Learning in Healthcare: Transforming Patient Care
#
# ## Introduction
# Machine learning is revolutionizing healthcare by enabling more accurate diagnoses,
# personalized treatments, and improved patient outcomes...
#
# ## Diagnostic Applications
# ### Medical Imaging
# Machine learning algorithms can analyze medical images with remarkable accuracy...
#
# ## Treatment Personalization
# ### Predictive Analytics
# By analyzing patient data, ML models can predict treatment responses...
#
# ## Challenges and Considerations
# ### Data Privacy
# Healthcare data is highly sensitive and must be protected...
#
# ## Conclusion
# Machine learning holds immense potential for improving healthcare delivery...
```
**Technical documentation:**
```bash
frozen-duckdb complete --prompt "Write documentation for a REST API endpoint"
# Output:
# ## User Profile API
#
# ### GET /api/users/{id}
# Retrieves user profile information.
#
# **Parameters:**
# - `id` (path): User ID (integer)
#
# **Response:**
# ```json
# {
# "id": 123,
# "username": "john_doe",
# "email": "john@example.com",
# "profile": {
# "first_name": "John",
# "last_name": "Doe",
# "bio": "Software developer",
# "avatar_url": "https://example.com/avatar.jpg"
# },
# "created_at": "2024-01-15T10:30:00Z"
# }
# ```
#
# **Error Responses:**
# - `404`: User not found
# - `401`: Unauthorized access
```
### Creative Writing
**Story generation:**
```bash
frozen-duckdb complete --prompt "Write a short story about a robot learning to paint"
# Output:
# In the quiet corners of the artist's studio, where canvases whispered secrets of color and form,
# there lived a robot named Palette. Unlike his mechanical brethren who calculated trajectories and
# optimized algorithms, Palette dreamed in hues and shades...
```
**Poetry generation:**
```bash
frozen-duckdb complete --prompt "Write a haiku about autumn"
# Output:
# Golden leaves descend
# Whispers of wind through branches
# Autumn's quiet song
```
## Question Answering
### Technical Q&A
**Programming questions:**
```bash
frozen-duckdb complete --prompt "How do I handle errors in async Rust code?"
# Output:
# In async Rust, error handling follows similar patterns to synchronous code but with some
# important differences. Here are the key approaches:
#
# ## Using Result<T, E>
# ```rust
# async fn fetch_data() -> Result<String, Box<dyn std::error::Error>> {
# // ... async operation that might fail
# }
# ```
#
# ## Error Propagation
# ```rust
# async fn process_data() -> Result<(), Box<dyn std::error::Error>> {
# let data = fetch_data().await?;
# // ... process data
# Ok(())
# }
# ```
#
# ## Custom Error Types
# ```rust
# #[derive(Debug)]
# enum ApiError {
# Network(reqwest::Error),
# Parse(serde_json::Error),
# Validation(String),
# }
# ```
#
# ## Best Practices
# - Use `?` operator for error propagation
# - Implement `From` traits for easy conversion
# - Consider using `anyhow` or `thiserror` for flexible error handling
# - Log errors appropriately for debugging
```
**Database questions:**
```bash
frozen-duckdb complete --prompt "How do I optimize a slow SQL query?"
# Output:
# Optimizing slow SQL queries involves several strategies:
#
# ## 1. Analyze Query Execution Plan
# ```sql
# EXPLAIN ANALYZE SELECT * FROM large_table WHERE condition;
# ```
#
# ## 2. Create Appropriate Indexes
# ```sql
# CREATE INDEX idx_condition ON large_table(condition_column);
# CREATE INDEX idx_composite ON large_table(col1, col2);
# ```
#
# ## 3. Query Optimization Techniques
# - Use SELECT only needed columns instead of SELECT *
# - Avoid functions in WHERE clauses when possible
# - Use JOIN instead of subqueries when appropriate
# - Consider partitioning for very large tables
#
# ## 4. Database Configuration
# - Adjust memory settings: `SET work_mem = '256MB'`
# - Enable query result caching when appropriate
# - Monitor slow query logs for patterns
#
# ## 5. Application-Level Optimization
# - Implement query result caching
# - Use prepared statements for repeated queries
# - Batch multiple simple queries when possible
```
### Educational Content
**Concept explanation:**
```bash
frozen-duckdb complete --prompt "Explain quantum entanglement for beginners"
# Output:
# Imagine two coins that are quantum-entangled. When you flip one coin and get heads,
# the other coin instantly shows tails, no matter how far apart they are. This isn't
# just a fast communication - it's as if the coins are connected in a way that
# transcends space and time...
```
**Tutorial creation:**
```bash
frozen-duckdb complete --prompt "Create a beginner's tutorial for Docker"
# Output:
# # Docker Tutorial for Beginners
#
# ## What is Docker?
# Docker is a platform that allows you to package applications and their dependencies
# into lightweight, portable containers...
#
# ## Installation
# ### Linux
# ```bash
# curl -fsSL https://get.docker.com -o get-docker.sh
# sudo sh get-docker.sh
# ```
#
# ### macOS
# ```bash
# brew install --cask docker
# ```
#
# ## Your First Container
# ```bash
# docker run hello-world
# ```
#
# ## Building Custom Images
# ```dockerfile
# FROM node:14
# WORKDIR /app
# COPY package*.json ./
# RUN npm install
# COPY . .
# EXPOSE 3000
# CMD ["npm", "start"]
# ```
#
# ## Common Commands
# - `docker build -t my-app .` - Build an image
# - `docker run -p 3000:3000 my-app` - Run a container
# - `docker ps` - List running containers
# - `docker images` - List available images
#
# ## Best Practices
# - Use multi-stage builds to reduce image size
# - Pin specific versions in your Dockerfile
# - Use .dockerignore to exclude unnecessary files
# - Scan images for vulnerabilities regularly
```
## Advanced Completion Patterns
### Multi-step Generation
**Iterative improvement:**
```bash
# Generate initial code
frozen-duckdb complete --prompt "Write a basic calculator in Python" --output calculator_v1.py
# Improve with error handling
frozen-duckdb complete --prompt "Add error handling to this calculator code" --input calculator_v1.py --output calculator_v2.py
# Add features
frozen-duckdb complete --prompt "Add scientific calculator functions" --input calculator_v2.py --output calculator_v3.py
```
### Context-Aware Generation
**Using conversation history:**
```bash
# Build context through multiple completions
echo "I'm learning Python and want to understand classes" > context.txt
frozen-duckdb complete --input context.txt --prompt "Explain Python classes with examples" --output explanation.txt
frozen-duckdb complete --input explanation.txt --prompt "Show me how to create a class for a bank account" --output bank_account.py
```
### Template-Based Generation
**Custom prompts for specific tasks:**
```sql
-- Create specialized prompts
CREATE PROMPT('api_docs', 'Generate API documentation for this endpoint: {{endpoint}}');
CREATE PROMPT('test_cases', 'Generate test cases for this function: {{function}}');
CREATE PROMPT('code_review', 'Review this code for bugs and improvements: {{code}}');
-- Use with parameters
SELECT llm_complete(
{'model_name': 'coder'},
{
'prompt_name': 'api_docs',
'context_columns': [{'endpoint': 'GET /api/users/{id}'}]
}
);
```
## Performance Optimization
### Model Selection
**Quality vs Speed Trade-off:**
```bash
# Fast responses (7B model)
frozen-duckdb complete --prompt "Quick explanation" --model fast_coder
# High quality (30B model)
frozen-duckdb complete --prompt "Detailed technical explanation" --model quality_coder
```
**Model configuration:**
```sql
-- Create models for different use cases
CREATE MODEL('fast_coder', 'qwen3-coder:7b', 'ollama');
CREATE MODEL('quality_coder', 'qwen3-coder:30b', 'ollama');
-- Use based on requirements
SELECT CASE
WHEN prompt_length < 100 THEN 'fast_coder'
ELSE 'quality_coder'
END as selected_model;
```
### Batch Processing
**Multiple completions:**
```bash
#!/bin/bash
# batch_completions.sh
PROMPTS=("Explain variables" "Explain functions" "Explain classes")
OUTPUT_DIR="python_tutorials"
for i in "${!PROMPTS[@]}"; do
prompt="${PROMPTS[$i]}"
output_file="$OUTPUT_DIR/tutorial_$((i+1)).txt"
frozen-duckdb complete --prompt "$prompt in Python" --output "$output_file"
echo "Generated: $output_file"
done
```
**Large document processing:**
```python
import subprocess
def process_document_chunks(file_path, chunk_size=1000):
"""Process large document in chunks"""
with open(file_path, 'r') as f:
content = f.read()
# Split into chunks
chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
results = []
for i, chunk in enumerate(chunks):
# Generate completion for each chunk
result = subprocess.run([
"frozen-duckdb", "complete",
"--prompt", f"Summarize this text chunk: {chunk[:200]}..."
], capture_output=True, text=True)
if result.returncode == 0:
results.append(result.stdout.strip())
return results
# Example usage
summaries = process_document_chunks("large_document.txt")
```
## Error Handling and Quality Assurance
### Input Validation
**Check prompt quality:**
```python
def validate_prompt(prompt):
"""Validate prompt before sending to LLM"""
if not prompt or len(prompt.strip()) < 10:
raise ValueError("Prompt too short or empty")
if len(prompt) > 10000:
raise ValueError("Prompt too long")
return prompt.strip()
# Example usage
try:
validated_prompt = validate_prompt(user_input)
response = llm_complete(validated_prompt)
except ValueError as e:
print(f"Invalid prompt: {e}")
```
### Output Validation
**Check response quality:**
```python
def validate_response(response, expected_format=None):
"""Validate LLM response quality"""
if not response or len(response.strip()) < 20:
raise ValueError("Response too short")
if expected_format == "code" and not any(keyword in response.lower() for keyword in ["def", "function", "class"]):
raise ValueError("Response doesn't appear to contain code")
return response
# Example usage
response = llm_complete("Write a Python function")
validated_response = validate_response(response, expected_format="code")
```
### Retry Logic
**Handle transient failures:**
```python
import time
def robust_completion(prompt, max_retries=3):
"""Generate completion with retry logic"""
for attempt in range(max_retries):
try:
response = llm_complete(prompt)
return validate_response(response)
except Exception as e:
if attempt == max_retries - 1:
raise e
print(f"Attempt {attempt + 1} failed: {e}")
time.sleep(2 ** attempt) # Exponential backoff
raise Exception("All retry attempts failed")
# Example usage
response = robust_completion("Write a complex algorithm")
```
## Integration Examples
### Rust Integration
```rust
use std::process::Command;
fn generate_documentation(function_name: &str) -> Result<String, Box<dyn std::error::Error>> {
let prompt = format!("Generate comprehensive documentation for the {} function", function_name);
let output = Command::new("frozen-duckdb")
.args(&["complete", "--prompt", &prompt])
.output()?;
if output.status.success() {
Ok(String::from_utf8(output.stdout)?)
} else {
Err(format!("Documentation generation failed: {}",
String::from_utf8(output.stderr)?).into())
}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let docs = generate_documentation("process_data")?;
println!("Generated documentation:\n{}", docs);
Ok(())
}
```
### Python Integration
```python
import subprocess
import json
class LLMClient:
def __init__(self):
self.model = "coder"
def complete(self, prompt, output_file=None):
"""Generate text completion"""
cmd = ["frozen-duckdb", "complete", "--prompt", prompt, "--model", self.model]
if output_file:
cmd.extend(["--output", output_file])
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0:
return result.stdout.strip()
else:
raise Exception(f"LLM completion failed: {result.stderr}")
def generate_code(self, description, language="python"):
"""Generate code for given description"""
prompt = f"Write {language} code for: {description}"
return self.complete(prompt)
def explain_code(self, code):
"""Generate explanation for code"""
prompt = f"Explain this code in detail: {code}"
return self.complete(prompt)
# Example usage
client = LLMClient()
code = client.generate_code("a function to calculate fibonacci numbers")
explanation = client.explain_code(code)
print("Generated code:", code)
print("Explanation:", explanation)
```
### Shell Script Integration
```bash
#!/bin/bash
# code_generator.sh
if [[ -z "$1" ]]; then
echo "Usage: $0 <description>"
echo "Example: $0 'a web server in Python'"
exit 1
fi
DESCRIPTION="$1"
OUTPUT_DIR="generated_code"
# Create output directory
mkdir -p "$OUTPUT_DIR"
# Generate code
echo "๐ค Generating code for: $DESCRIPTION"
frozen-duckdb complete --prompt "Write complete, runnable code for: $DESCRIPTION" --output "$OUTPUT_DIR/$(echo "$DESCRIPTION" | tr ' ' '_').py"
# Generate tests
echo "๐งช Generating tests..."
echo "๐ Generating documentation..."
frozen-duckdb complete --prompt "Write README documentation for this code" --input "$OUTPUT_DIR/$(echo "$DESCRIPTION" | tr ' ' '_').py" --output "$OUTPUT_DIR/README.md"
echo "โ
Code generation complete!"
echo "Generated files in: $OUTPUT_DIR/"
ls -la "$OUTPUT_DIR/"
```
## Best Practices
### 1. Prompt Engineering
**Clear and specific:**
```bash
# Good prompt
frozen-duckdb complete --prompt "Write a Python function that takes a list of numbers and returns the sum of all even numbers"
# Bad prompt
frozen-duckdb complete --prompt "Write code"
```
**Include context:**
```bash
# Good prompt with context
frozen-duckdb complete --prompt "Write a Rust function for a web server that handles GET requests. Include error handling and JSON responses."
# Bad prompt without context
frozen-duckdb complete --prompt "Write a function"
```
### 2. Quality Assurance
**Validate outputs:**
```python
def validate_code_response(response, language="python"):
"""Validate that response contains valid code"""
if language == "python":
# Check for Python syntax indicators
return any(indicator in response for indicator in ["def ", "class ", "import ", "from "])
elif language == "rust":
return any(indicator in response for indicator in ["fn ", "struct ", "impl ", "use "])
return True
# Example usage
response = llm_complete("Write Python code")
if validate_code_response(response, "python"):
print("โ
Response appears to contain valid code")
else:
print("โ Response may not contain expected code")
```
### 3. Performance Optimization
**Batch processing:**
```bash
# Process multiple prompts efficiently
cat > prompts.txt << 'EOF'
Explain variables in Python
Explain functions in Python
Explain classes in Python
EOF
# Process all at once (more efficient)
frozen-duckdb complete --input prompts.txt --output responses.txt
```
**Caching results:**
```python
import hashlib
import json
def get_cached_completion(prompt):
"""Get completion with caching"""
prompt_hash = hashlib.md5(prompt.encode()).hexdigest()
cache_file = f"cache/{prompt_hash}.json"
# Check cache first
try:
with open(cache_file, 'r') as f:
cached = json.load(f)
return cached['response']
except FileNotFoundError:
pass
# Generate new completion
response = llm_complete(prompt)
# Cache result
os.makedirs("cache", exist_ok=True)
with open(cache_file, 'w') as f:
json.dump({'prompt': prompt, 'response': response}, f)
return response
```
## Troubleshooting
### Common Issues
#### 1. Poor Quality Responses
**Problem:** Generated text is irrelevant or low quality
**Solutions:**
```bash
# Improve prompt clarity
frozen-duckdb complete --prompt "Write a detailed explanation of recursion in programming with examples"
# Use better model
frozen-duckdb complete --prompt "Explain recursion" --model quality_coder
# Add more context
frozen-duckdb complete --prompt "As a senior developer, explain recursion to a junior developer with concrete examples"
```
#### 2. Incomplete Responses
**Problem:** Responses cut off or incomplete
**Solutions:**
```bash
# Check for length limits
frozen-duckdb complete --prompt "Write a comprehensive guide" --model quality_coder
# Use streaming if available (future feature)
# frozen-duckdb complete --prompt "Long explanation" --stream
```
#### 3. Inconsistent Formatting
**Problem:** Code formatting issues
**Solutions:**
```bash
# Specify formatting requirements
frozen-duckdb complete --prompt "Write Python code with proper indentation and comments"
# Post-process formatting
frozen-duckdb complete --prompt "Format this code properly" --input unformatted_code.txt
```
## Summary
Text completion with Frozen DuckDB provides **powerful code generation**, **content creation**, and **question answering** capabilities with **local inference** for **privacy and performance**. The system supports **multiple programming languages**, **various content types**, and **advanced prompting techniques**.
**Key Capabilities:**
- **Code generation**: Functions, classes, modules in multiple languages
- **Content creation**: Articles, documentation, tutorials, creative writing
- **Question answering**: Technical explanations, concept clarification
- **Educational content**: Tutorials, explanations, learning materials
**Performance Characteristics:**
- **Response time**: 2-5 seconds for typical requests
- **Quality**: Excellent with qwen3-coder:30b model
- **Privacy**: Complete local processing
- **Reliability**: Consistent results with proper prompting
**Integration Options:**
- **CLI interface**: Direct command-line operations
- **Rust/Python/Node.js**: Native integration with popular languages
- **SQL queries**: Direct LLM operations within DuckDB
- **Batch processing**: Efficient handling of multiple requests
**Best Practices:**
- **Clear prompting**: Specific, contextual prompts for better results
- **Quality validation**: Check outputs for relevance and accuracy
- **Performance optimization**: Use appropriate models and batch processing
- **Error handling**: Implement retry logic and fallback strategies