# ccql
**Claude Code Query Language** - SQL query engine for Claude Code and Codex CLI data.
## Installation
### Homebrew (macOS/Linux)
```bash
brew install douglance/tap/ccql
```
### Shell script (macOS/Linux)
```bash
### npm
npm package pending 0.3.0 publish — use the shell script or Homebrew in the
meantime.
### Cargo
```bash
cargo install ccql
```
### From source
```bash
git clone https://github.com/douglance/devsql
cd devsql
cargo install --path crates/ccql
```
## Quick Start
```bash
# SQL is the default command - just pass a query
ccql "SELECT display FROM history ORDER BY timestamp DESC LIMIT 5"
ccql "SELECT session_id, text FROM jhistory ORDER BY ts DESC LIMIT 5"
ccql "SELECT tool_name, COUNT(*) as n FROM transcripts WHERE type='tool_use' GROUP BY tool_name"
ccql "SELECT content FROM todos WHERE status='pending'"
# Get help
ccql -h # Quick reference
ccql --help # Full documentation
ccql tables # Show table schemas
ccql examples # Show query examples
```
## Tables
| `history` | `history.jsonl` | User prompts |
| `jhistory` | `~/.codex/history.jsonl` | Codex CLI prompt history (virtual) |
| `codex_history` | Alias of `jhistory` | Codex CLI prompt history (virtual) |
| `transcripts` | `projects/<slug>/**/*.jsonl` (+ legacy `transcripts/*.jsonl`) | Conversation logs (virtual) |
| `sessions` | Same files as `transcripts` | One aggregated row per session (virtual) |
| `todos` | `todos/*.json` | Task items (virtual) |
### history
| `display` | TEXT | The prompt text |
| `timestamp` | INTEGER | Unix timestamp (ms) |
| `project` | TEXT | Project directory |
| `pastedContents` | OBJECT | Pasted content (JSON) |
### jhistory
| `session_id` | TEXT | Codex session ID |
| `ts` | INTEGER | Raw Unix timestamp (seconds) |
| `text` | TEXT | Raw prompt text |
| `display` | TEXT | Prompt text (normalized alias for `text`) |
| `timestamp` | INTEGER | Unix timestamp (milliseconds) |
`codex_history` exposes the exact same columns as `jhistory`.
### transcripts
Sourced from the modern Claude Code layout, scanned recursively:
- `projects/<slug>/<sessionId>.jsonl` — top-level sessions
- `projects/<slug>/<sessionId>/subagents/agent-<id>.jsonl` — subagent transcripts
- `transcripts/*.jsonl` — legacy flat layout (still read for backward compatibility)
The table is schemaless: every top-level JSON key in each record becomes a
column, so newer fields (`message.usage.*`, `compactMetadata`, record `type`s
like `ai-title` / `pr-link` / `mode`, etc.) surface automatically. Nested
objects are exposed as GlueSQL `MAP` values — drill into them with
`UNWRAP(<column>, '<dotted.path>')`.
| `_source_file` | TEXT | Source file name (e.g. `<sessionId>.jsonl`, `agent-<id>.jsonl`, legacy `ses_xxx.jsonl`) |
| `_session_id` | TEXT | Session ID (parent session ID for subagent files; `ses_` prefix stripped for legacy) |
| `_project` | TEXT | Project slug directory name (e.g. `-Users-you-Developer-app`); NULL for legacy flat files |
| `_agent_id` | TEXT | Subagent file stem (e.g. `agent-abc123`); NULL for non-subagent rows |
| `type` | TEXT | Record type: `user`, `assistant`, `tool_use`, `tool_result`, `ai-title`, `pr-link`, `mode`, … |
| `timestamp` | TEXT | ISO 8601 timestamp |
| `message` | OBJECT | Full message object, including `message.usage.*` token/cache counts |
| `content` | TEXT | Message text (type='user') |
| `tool_name` | TEXT | Tool name (type='tool_*') |
| `tool_input` | OBJECT | Tool parameters |
| `tool_output` | OBJECT | Tool response (type='tool_result') |
Assistant rows additionally get flattened convenience columns (only added when
the source value exists; a top-level JSON key with the same name is never
overwritten):
| `model` | TEXT | `message.model` |
| `usage_input_tokens` | INTEGER | `message.usage.input_tokens` |
| `usage_output_tokens` | INTEGER | `message.usage.output_tokens` |
| `usage_cache_read_input_tokens` | INTEGER | `message.usage.cache_read_input_tokens` |
| `usage_cache_creation_input_tokens` | INTEGER | `message.usage.cache_creation_input_tokens` |
| `usage_ephemeral_5m_input_tokens` | INTEGER | `message.usage.cache_creation.ephemeral_5m_input_tokens` |
| `usage_ephemeral_1h_input_tokens` | INTEGER | `message.usage.cache_creation.ephemeral_1h_input_tokens` |
| `usage_service_tier` | TEXT | `message.usage.service_tier` |
#### Querying cache / usage tokens
`message.usage` carries the token accounting. Use `UNWRAP` with a dotted path to
read nested values directly in SQL:
```sql
SELECT _project,
_session_id,
UNWRAP(message, 'usage.input_tokens') AS input_tokens,
UNWRAP(message, 'usage.output_tokens') AS output_tokens,
UNWRAP(message, 'usage.cache_read_input_tokens') AS cache_read,
UNWRAP(message, 'usage.cache_creation_input_tokens') AS cache_creation,
UNWRAP(message, 'usage.cache_creation.ephemeral_5m_input_tokens') AS ephemeral_5m
FROM transcripts
WHERE type = 'assistant'
AND UNWRAP(message, 'usage.cache_read_input_tokens') IS NOT NULL;
```
### sessions
One row per top-level session file (subagent files contribute only to
`subagent_count`; legacy flat files get a row with NULL `project`). Read-only,
like the other virtual tables.
| `session_id` | TEXT | Session ID (file stem; `ses_` prefix stripped for legacy files) |
| `project` | TEXT | Project slug directory name; NULL for legacy flat files |
| `cwd` | TEXT | Working directory (first seen in the session) |
| `git_branch` | TEXT | Git branch (first seen) |
| `version` | TEXT | Claude Code version (last seen) |
| `title` | TEXT | AI-generated session title (from the `ai-title` record, if any) |
| `first_timestamp` | TEXT | Earliest record timestamp (ISO 8601) |
| `last_timestamp` | TEXT | Latest record timestamp (ISO 8601) |
| `user_message_count` | INTEGER | Number of `type='user'` records |
| `assistant_message_count` | INTEGER | Number of `type='assistant'` records |
| `subagent_count` | INTEGER | Number of subagent transcript files for this session |
| `total_input_tokens` | INTEGER | Sum of `message.usage.input_tokens` over assistant records |
| `total_output_tokens` | INTEGER | Sum of `message.usage.output_tokens` |
| `total_cache_read_input_tokens` | INTEGER | Sum of `message.usage.cache_read_input_tokens` |
| `total_cache_creation_input_tokens` | INTEGER | Sum of `message.usage.cache_creation_input_tokens` |
| `pr_url` | TEXT | Linked PR URL (from the `pr-link` record, if any) |
| `pr_number` | INTEGER | Linked PR number (if any) |
Token totals cover the top-level session file only (not subagent transcripts).
```sql
-- Top 10 sessions by cache-read tokens
SELECT title, project, total_cache_read_input_tokens, last_timestamp
FROM sessions
ORDER BY total_cache_read_input_tokens DESC
LIMIT 10;
-- Sessions that opened a PR
SELECT title, pr_number, pr_url
FROM sessions
WHERE pr_url IS NOT NULL
ORDER BY last_timestamp DESC;
-- Daily output token totals by model (via flattened transcript columns)
SELECT SUBSTR(timestamp, 1, 10) AS day,
model,
SUM(usage_output_tokens) AS output_tokens
FROM transcripts
WHERE type = 'assistant' AND usage_output_tokens IS NOT NULL
GROUP BY SUBSTR(timestamp, 1, 10), model
ORDER BY SUBSTR(timestamp, 1, 10) DESC;
```
### todos
| `_source_file` | TEXT | Source filename |
| `_workspace_id` | TEXT | Workspace ID |
| `_agent_id` | TEXT | Agent ID |
| `content` | TEXT | Todo description |
| `status` | TEXT | `pending` \| `in_progress` \| `completed` |
| `activeForm` | TEXT | Display text when active |
## Examples
> **Gotcha:** GlueSQL silently ignores `ORDER BY` on a select-list alias
> (e.g. `ORDER BY n DESC` after `COUNT(*) as n`) — repeat the full expression
> in the `ORDER BY` clause instead.
### Filter by Current Project
Use the `project` column (history) or `_project` slug column (transcripts) to
limit queries to the folder you're working in:
```bash
# Only prompts from current project
ccql "SELECT display FROM history WHERE project = '$(pwd)' ORDER BY timestamp DESC LIMIT 10"
# Transcript activity for the current project (slug = pwd with '/' and '.' as '-')
ccql "SELECT _session_id, COUNT(*) as n FROM transcripts
WHERE _project = '$(pwd | tr '/.' '--')'
GROUP BY _session_id ORDER BY COUNT(*) DESC LIMIT 10"
```
### History Queries
```bash
# Recent prompts
ccql "SELECT display FROM history ORDER BY timestamp DESC LIMIT 10"
# Search prompts
ccql "SELECT display FROM history WHERE display LIKE '%error%'"
# Prompts by project
ccql "SELECT project, COUNT(*) as n FROM history GROUP BY project ORDER BY COUNT(*) DESC"
# Long prompts (likely pasted code)
ccql "SELECT LENGTH(display) as len, timestamp FROM history
ORDER BY LENGTH(display) DESC LIMIT 10"
# Recent Codex prompts
ccql "SELECT datetime(timestamp/1000, 'unixepoch') as time, display
FROM jhistory ORDER BY timestamp DESC LIMIT 10"
```
### Transcript Queries
```bash
# Tool usage stats
ccql "SELECT tool_name, COUNT(*) as n FROM transcripts
WHERE type='tool_use' GROUP BY tool_name ORDER BY COUNT(*) DESC"
# Most active sessions
ccql "SELECT _session_id, COUNT(*) as n FROM transcripts
GROUP BY _session_id ORDER BY COUNT(*) DESC LIMIT 10"
# Find sessions by top-level content (wrap in COALESCE: GlueSQL errors
# on NULL LIKE, and IS NOT NULL guards don't short-circuit)
ccql "SELECT DISTINCT _session_id FROM transcripts
WHERE COALESCE(content,'') LIKE '%compacted%'"
```
### Todo Queries
```bash
# Pending todos
ccql "SELECT content FROM todos WHERE status='pending'"
# Todo counts by status
ccql "SELECT status, COUNT(*) as n FROM todos GROUP BY status"
```
## Output Formats
```bash
ccql -f table "SELECT ..." # Pretty table (default)
ccql -f json "SELECT ..." # JSON array
ccql -f jsonl "SELECT ..." # JSON lines
ccql -f raw "SELECT ..." # Raw output
```
## Write Operations
Write operations require explicit flags for safety:
```bash
# Preview changes (dry run)
ccql --dry-run "DELETE FROM history WHERE timestamp < 1700000000000"
# Execute with backup
ccql --write "DELETE FROM history WHERE timestamp < 1700000000000"
```
## Other Commands
```bash
ccql prompts # Extract prompts with filtering
ccql sessions # List sessions
ccql search "term" # Full-text search
ccql todos --status pending # List todos
ccql stats # Usage statistics
ccql duplicates # Find repeated prompts
ccql query '.[]' history # jq-style queries
ccql query '.[]' jhistory # jq-style queries for Codex prompts
ccql query '.[]' codex_history
```
## Configuration
```bash
# Set data directory
export CLAUDE_DATA_DIR=~/.claude
export CODEX_HOME=~/.codex
# Or via flag
ccql --data-dir ~/.claude "SELECT ..."
```
## License
MIT