# SQL Query Analyzer
Static analysis and LLM-powered optimization for SQL queries.
A comprehensive SQL analysis tool that combines fast, deterministic static analysis with optional AI-powered insights. Identifies performance issues, style violations, and security vulnerabilities in your SQL queries.
## Table of Contents
- Highlights
- Installation
- Quick Start
- Rules
- Configuration
- CLI Reference
- Example
- CI/CD Integration
- LLM Providers
- Architecture
- Performance
- Contributing
- Acknowledgements
- Coverage
- License
## Highlights
- 18 Built-in Rules — Performance, style, and security checks run instantly without API calls
- Schema-Aware Analysis — Validates queries against your database schema, suggests missing indexes
- Multiple Output Formats — Text, JSON, YAML, and SARIF for CI/CD integration
- Parallel Execution — Rules execute concurrently using rayon
- Optional LLM Analysis — Deep semantic analysis via OpenAI, Anthropic, or local Ollama
- Configurable — Disable rules, override severity levels, customize via TOML
## Installation
### From source
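The exact build commands are not shown here; a minimal sketch, assuming an installed Rust toolchain and a standard Cargo project layout:

```bash
# From the root of a local checkout of the repository:
cargo install --path .

# Or build a release binary without installing it:
cargo build --release
```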
### Pre-built binaries
Download from Releases.
## Quick Start

```bash
# Run static analysis (no API key required)
sql-query-analyzer analyze -s schema.sql -q queries.sql

# Output as SARIF for CI/CD
sql-query-analyzer analyze -s schema.sql -q queries.sql -f sarif > results.sarif

# Pipe queries from stdin
cat queries.sql | sql-query-analyzer analyze -s schema.sql -q -

# Enable LLM analysis
sql-query-analyzer analyze -s schema.sql -q queries.sql -p openai -a $LLM_API_KEY
```
## Rules

### Performance Rules

| ID | Rule | Severity | Description |
|---|---|---|---|
| `PERF001` | Select star without limit | Warning | SELECT * without LIMIT can return unbounded rows |
| `PERF002` | Leading wildcard | Warning | LIKE '%value' prevents index usage |
| `PERF003` | OR instead of IN | Info | Multiple OR conditions can be simplified to IN |
| `PERF004` | Large offset | Warning | OFFSET > 1000 causes performance degradation |
| `PERF005` | Missing join condition | Error | Cartesian product detected |
| `PERF006` | Distinct with order by | Info | Potentially redundant operations |
| `PERF007` | Scalar subquery | Warning | N+1 query pattern detected |
| `PERF008` | Function on column | Warning | Function calls prevent index usage |
| `PERF009` | NOT IN with subquery | Warning | Can cause unexpected NULL behavior |
| `PERF010` | UNION without ALL | Info | Unnecessary deduplication overhead |
| `PERF011` | Select without where | Info | Full table scan on large tables |
### Style Rules

| ID | Rule | Severity | Description |
|---|---|---|---|
| `STYLE001` | Select star | Info | Explicit column list preferred |
| `STYLE002` | Missing table alias | Info | Multi-table queries should use aliases |
### Security Rules

| ID | Rule | Severity | Description |
|---|---|---|---|
| `SEC001` | Missing WHERE in UPDATE | Error | Potentially dangerous bulk update |
| `SEC002` | Missing WHERE in DELETE | Error | Potentially dangerous bulk delete |
### Schema-Aware Rules

| ID | Rule | Severity | Description |
|---|---|---|---|
| `SCHEMA001` | Missing index on filter | Warning | WHERE/JOIN column lacks index |
| `SCHEMA002` | Column not in schema | Warning | Referenced column doesn't exist |
| `SCHEMA003` | Index suggestion | Info | ORDER BY column could benefit from index |
## Configuration

Configuration is loaded from (in order of precedence):

- Command-line arguments
- Environment variables
- `.sql-analyzer.toml` in the current directory
- `~/.config/sql-analyzer/config.toml`
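For instance, a command-line flag overrides the corresponding environment variable (an illustrative run; the flag and variable are both documented below):

```bash
# LLM_MODEL is set in the environment, but -m on the command line wins,
# so this run uses gpt-4.
export LLM_PROVIDER=openai
export LLM_MODEL=gpt-3.5-turbo
sql-query-analyzer analyze -s schema.sql -q queries.sql -p openai -m gpt-4
```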
### Example Configuration

```toml
# Note: key and section names in this example are reconstructed for
# illustration; the rule IDs under [severity] are examples.

[rules]
# Disable specific rules by ID
disabled = ["STYLE001", "PERF011"]

# Override default severity levels
[severity]
PERF001 = "error"  # Promote to error
PERF002 = "info"   # Demote to info

[llm]
provider = "ollama"
model = "codellama"
ollama_url = "http://localhost:11434"

[retry]
max_retries = 3
initial_delay_ms = 1000
timeout_ms = 30000
backoff_multiplier = 2.0
```
### Environment Variables

| Variable | Description |
|---|---|
| `LLM_API_KEY` | API key for OpenAI/Anthropic |
| `LLM_PROVIDER` | Provider name (openai, anthropic, ollama) |
| `LLM_MODEL` | Model identifier |
| `OLLAMA_URL` | Ollama base URL |
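For example, a hosted provider can be configured entirely through the environment (the key value below is a placeholder):

```bash
export LLM_PROVIDER=anthropic
export LLM_MODEL=claude-sonnet-4-20250514
export LLM_API_KEY="your-api-key"   # placeholder, not a real key
sql-query-analyzer analyze -s schema.sql -q queries.sql
```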
## CLI Reference

```bash
sql-query-analyzer analyze [OPTIONS] -s <SCHEMA> -q <QUERIES>
```

### Options

| Flag | Description | Default |
|---|---|---|
| `-s, --schema <FILE>` | Path to SQL schema file | required |
| `-q, --queries <FILE>` | Path to SQL queries file (use `-` for stdin) | required |
| `-p, --provider <PROVIDER>` | LLM provider: openai, anthropic, ollama | `ollama` |
| `-a, --api-key <KEY>` | API key (or use `LLM_API_KEY` env) | - |
| `-m, --model <MODEL>` | Model name | provider default |
| `--ollama-url <URL>` | Ollama base URL | `http://localhost:11434` |
| `--dialect <DIALECT>` | SQL dialect: generic, mysql, postgresql, sqlite | `generic` |
| `-f, --output-format <FMT>` | Output: text, json, yaml, sarif | `text` |
| `-v, --verbose` | Show complexity scores | `false` |
| `--dry-run` | Show what would be sent to LLM | `false` |
| `--no-color` | Disable colored output | `false` |
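For example, to preview what a PostgreSQL-dialect run would send to the LLM, with complexity scores and without making any API calls (paths are illustrative):

```bash
sql-query-analyzer analyze \
  -s db/schema.sql \
  -q db/queries.sql \
  --dialect postgresql \
  --verbose \
  --dry-run
```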
### Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success, no issues or only informational |
| 1 | Warnings found |
| 2 | Errors found |
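The exit codes make it easy to gate a pipeline. A minimal sketch that fails only on errors while tolerating warnings:

```bash
sql-query-analyzer analyze -s schema.sql -q queries.sql
status=$?
if [ "$status" -eq 2 ]; then
  echo "SQL errors found; failing the build"
  exit 1
elif [ "$status" -eq 1 ]; then
  echo "SQL warnings found (not failing the build)"
fi
```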
## Example

`schema.sql`:

```sql
CREATE TABLE users (
    id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    user_id INT NOT NULL,
    total DECIMAL(10,2),
    status VARCHAR(20)
);

-- index name chosen for illustration
CREATE INDEX idx_orders_user_id ON orders (user_id);
```

`queries.sql`:

```sql
SELECT * FROM users WHERE email = 'test@example.com';
SELECT * FROM orders WHERE user_id = 1 ORDER BY created_at DESC;
DELETE FROM users;
```
Output:

```text
=== Static Analysis ===
Found 1 error(s), 2 warning(s), 1 info

Query #1:
  [ WARN] SCHEMA001: Column 'email' in WHERE clause has no index
          → Consider adding index on 'email'

Query #2:
  [ WARN] SCHEMA001: Column 'created_at' in ORDER BY has no index
          → Consider adding index on 'created_at'
  [ INFO] SCHEMA003: ORDER BY column 'created_at' could benefit from index
          → CREATE INDEX idx_created_at ON table(created_at)

Query #3:
  [ERROR] SEC002: DELETE without WHERE clause is dangerous
          → Add WHERE clause to limit affected rows
```
## CI/CD Integration

### GitHub Actions

```yaml
name: SQL Analysis
on:
  pull_request:
    paths:
      - '**/*.sql'

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install sql-query-analyzer
        run: cargo install sql-query-analyzer
      - name: Analyze SQL
        run: |
          sql-query-analyzer analyze \
            -s db/schema.sql \
            -q db/queries.sql \
            -f sarif > results.sarif
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```
### GitLab CI

```yaml
sql-analysis:
  stage: test
  script:
    - cargo install sql-query-analyzer
    - sql-query-analyzer analyze -s schema.sql -q queries.sql -f sarif > gl-sast-report.json
  artifacts:
    reports:
      sast: gl-sast-report.json
```
### Pre-commit Hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: sql-analyzer
        name: SQL Query Analyzer
        entry: sql-query-analyzer analyze -s schema.sql -q
        language: system
        files: \.sql$
```
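With the hook in place, the usual pre-commit workflow applies:

```bash
pre-commit install                        # register the git hook
pre-commit run sql-analyzer --all-files   # run this hook against every tracked .sql file
```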
## LLM Providers

| Provider | Model Examples | Notes |
|---|---|---|
| OpenAI | `gpt-4`, `gpt-3.5-turbo` | Requires API key |
| Anthropic | `claude-sonnet-4-20250514` | Requires API key |
| Ollama | `llama3.2`, `codellama`, `mistral` | Local, no API key |
### Using Ollama (Recommended for Development)

```bash
# Install Ollama (Linux install script; see ollama.com for other platforms)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull codellama

# Run analysis
sql-query-analyzer analyze -s schema.sql -q queries.sql -p ollama -m codellama
```
## Architecture

```text
┌─────────────────────────────────────────────────────┐
│                    CLI Interface                    │
└─────────────────────────┬───────────────────────────┘
                          │
             ┌────────────┴────────────┐
             ▼                         ▼
    ┌─────────────────┐       ┌─────────────────┐
    │   SQL Parser    │       │  Schema Parser  │
    │   (sqlparser)   │       │   (sqlparser)   │
    └────────┬────────┘       └────────┬────────┘
             │                         │
             └────────────┬────────────┘
                          ▼
              ┌────────────────────────┐
              │    Static Analysis     │
              │  (18 rules, parallel)  │
              └───────────┬────────────┘
                          │
                          ▼
              ┌────────────────────────┐
              │   LLM Analysis (opt)   │
              │   OpenAI/Anthropic/    │
              │         Ollama         │
              └───────────┬────────────┘
                          │
                          ▼
              ┌────────────────────────┐
              │    Output Formatter    │
              │  Text/JSON/YAML/SARIF  │
              └────────────────────────┘
```
## Performance
- Parallel rule execution via rayon
- Query caching to avoid re-parsing identical queries
- Lazy evaluation for complexity scoring
- Memory-efficient string storage with CompactString
Typical performance: ~1000 queries analyzed in <100ms (static analysis only).
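To sanity-check that figure on your own workload, a plain `time` over a batch run is enough (paths are illustrative):

```bash
time sql-query-analyzer analyze -s schema.sql -q queries.sql -f json > /dev/null
```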
## Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
### Development

```bash
# Run tests
cargo test

# Run with all checks (a typical lint invocation; the project may define its own)
cargo clippy --all-targets -- -D warnings

# Generate docs
cargo doc --no-deps --open

# Format code
cargo fmt
```
## Acknowledgements

The idea for this tool came from Yegor Bugayenko:

> It would be great to have a tool that takes two inputs: 1) the entire database schema in SQL, and 2) all SQL queries that my web app issues to the database during unit testing. The tool should use an LLM to analyze the queries and identify which ones are suboptimal, especially with respect to the existing indexes.
## Coverage

Coverage reports include sunburst, grid, and icicle graphs, where each slice or block's size reflects the number of statements in a folder or file and its color reflects that item's coverage.
## License
MIT © 2025