logdive
Fast, self-hosted query engine for structured JSON logs.
A single Rust binary that ingests structured logs, indexes them locally in SQLite, and lets you query them instantly from the CLI or an HTTP API. No infrastructure, no daemons, no cloud.
# Ingest JSON, logfmt, or plain-text logs from a file or stdin.
|
# Tail a growing log file in real time (Ctrl-C to stop).
# Query with AND and OR.
# Prune old entries to keep the index lean.
# Inspect the index.
# Expose a read-only HTTP API for remote querying.
Status: v0.2.0. Core feature set complete and tested. Adds multi-format ingestion, follow mode, OR queries, pruning, a Docker image, and a versioned HTTP API. See v1 non-goals for what is explicitly out of scope.
Table of contents
- Why logdive
- Install
- Quick start
- The
logdiveCLI - Running with Docker
- The
logdive-apiHTTP server - Query language reference
- Configuration reference
- Architecture
- Performance
- Development
- v1 non-goals
- License
Why logdive
Every backend engineer has hit the same wall: the app is producing JSON logs, something went wrong in production, and the options are grep, an unreadable chain of jq pipes, or spinning up a full observability stack (Loki, Datadog, Elastic) that requires infrastructure, cost, and configuration you don't have time for in a side project or small team.
logdive sits in the gap. It's a single binary you drop anywhere. Point it at a log file — or pipe Docker output into it — and you get a fast, queryable index on your local machine. You can ask it level=error AND service=payments last 2h and get results in milliseconds. You can expose a lightweight HTTP endpoint so a minimal UI or a curl script can query it remotely.
The target user is a backend engineer who wants jq with memory, filters, and time ranges — without YAML files, without a running daemon they didn't ask for, without a monthly bill.
Install
logdive ships two binaries: logdive (CLI) and logdive-api (HTTP server). They share a database format; you can ingest with the CLI and serve queries over HTTP, or vice versa.
From crates.io
Both binaries land in ~/.cargo/bin/ — make sure it's on your PATH.
From Docker
Official multi-arch images (linux/amd64 and linux/arm64) are available on GHCR:
See Running with Docker for usage.
From prebuilt binaries
Download the latest release for your platform from the GitHub Releases page. Binaries are currently built for:
- Linux x86_64
- macOS arm64
Extract the archive and move the binaries to any directory on your PATH.
From source
The compiled binaries will be at target/release/logdive and target/release/logdive-api.
MSRV: Rust 1.85 (edition 2024).
Quick start
The examples/ directory ships with two sample log files. Let's ingest them and run a few queries.
# Ingest both sample files into a throwaway database.
# See what we've got.
# Find every error across both files.
# Find errors or warnings from a specific service.
# Find slow nginx requests.
# Get structured output for further processing.
|
# Prune entries older than 7 days.
See examples/README.md for a longer walkthrough of what these files contain and what queries are interesting against them.
The logdive CLI
Four subcommands: ingest, query, stats, prune.
logdive ingest
Reads log lines from a file or stdin, parses them, and inserts them into the index.
# JSON (default)
# logfmt
# Plain text (whole line becomes `message`)
# Pipe from any source
|
|
# Tail a growing file in real time
Flags:
--file <PATH>/-f— Read from a file. Mutually exclusive with stdin.--format json|logfmt|plain— Input format. Defaultjson.--tag <TAG>/-t— Attach a tag to every ingested entry that does not already contain atagfield.--timestamp-now— Assign the current UTC time (RFC 3339) to entries that lack atimestampfield, instead of skipping them. Useful for formats that do not include timestamps.--follow— Keep the file open and ingest new lines as they are appended, similar totail -f. Detects log rotation (inode change) and truncation and reopens the file automatically. Ctrl-C exits cleanly. Requires--file.--db <PATH>— Override the default~/.logdive/index.dblocation (global, applies to all subcommands). Also settable via$LOGDIVE_DB.
Behavior:
- Deduplication: Every row is fingerprinted with a blake3 hash. Re-ingesting the same file (or a log rotation producing overlapping lines) results in zero duplicate rows.
- Graceful skip: Lines that cannot be parsed in the selected format are counted and skipped, not fatal. Blank lines are silently ignored.
- No-timestamp skip: By default, lines without a
timestampfield are skipped. Pass--timestamp-nowto assign the current UTC time to such entries instead. - Progress: TTY-aware status on stderr. A final summary always prints inserted / deduplicated / skipped counts.
logdive query
Runs a query against the index and renders matching entries.
Flags:
--format pretty|json— Output format. Defaultpretty(colored, human-readable).jsonis newline-delimited, pipe-friendly forjq.--limit <N>— Maximum results to return. Default1000. Use0for unlimited.--db <PATH>— Database path override. Also settable via$LOGDIVE_DB.
Pretty output honors NO_COLOR and auto-strips ANSI when piped. JSON output is identical in shape to the HTTP API's /query response.
See the Query language reference for the full grammar and operator list.
logdive stats
Reports aggregate metadata about the index.
Sample output:
logdive index: /home/user/.logdive/index.db
Entries: 42,317
Time range: 2026-03-14T08:22:01Z → 2026-04-22T19:45:03Z
Tags: api, nginx, payments, worker, (untagged)
DB size: 8.4 MB (8,400,000 bytes)
Errors out (exit code 1) if the configured index file does not exist. This catches typos in --db paths early.
logdive prune
Deletes entries from the index that fall outside a retention window, then vacuums the database file to reclaim disk space.
# Delete everything older than 30 days.
# Delete everything before a specific date.
# Skip the interactive confirmation prompt.
Flags:
--older-than <DURATION>— Delete entries older than this duration. Format: a positive integer followed bym(minutes),h(hours), ord(days). Examples:30d,24h,90m. Mutually exclusive with--before.--before <DATETIME>— Delete entries with a timestamp before this datetime. Accepts the same three formats as thesincequery operator (RFC 3339, ISO naive datetime, ISO date). Mutually exclusive with--older-than.--yes— Skip the interactive[y/N]confirmation. Useful in scripts and cron jobs.--db <PATH>— Database path override. Also settable via$LOGDIVE_DB.
By default prune shows the number of rows that would be deleted and asks for confirmation before proceeding. If the count is zero it exits immediately with "Nothing to prune."
Running with Docker
Official images for linux/amd64 and linux/arm64 are published to GHCR on every merge to main and on every version tag.
# or pin to a specific version:
Start the API server
# Create a named volume for the index.
# Start the server. The index is auto-created on first run.
Ingest logs with the CLI
The default entrypoint is logdive-api. Override it with --entrypoint logdive to run the CLI against the same volume:
Environment variables
The image pre-sets two variables for container-native behavior:
LOGDIVE_DB=/data/index.db— points both binaries at the persistent volume.LOGDIVE_API_HOST=0.0.0.0— binds the API to all container interfaces so-p 4000:4000works.
Override any variable with -e:
Health check
The image declares a Docker HEALTHCHECK on GET /version. No database access is involved — the endpoint returns compile-time constants and is always available once the process is up.
The logdive-api HTTP server
A read-only HTTP server for remote querying. Useful when you want a browser-based UI, a CI check, or a shell one-liner hitting a centrally hosted index.
Flags (with environment-variable fallbacks):
--db <PATH>/$LOGDIVE_DB— Database to serve. Defaults to~/.logdive/index.db.--port <N>/$LOGDIVE_API_PORT— Port to listen on. Default 4000.--host <HOST>/$LOGDIVE_API_HOST— Host to bind. Default127.0.0.1(loopback only). Set to0.0.0.0to expose beyond localhost.--cors-origins <ORIGINS>/$LOGDIVE_API_CORS_ORIGINS— Comma-separated list of allowed CORS origins. Use*to allow any origin. Omit to disable CORS (same-origin only). Invalid values cause a startup error.
# Allow a specific frontend origin.
# Allow any origin (useful for local development).
Endpoints
GET /query
Runs a query and returns matching entries as newline-delimited JSON.
Query parameters:
q(required) — Query expression. URL-encoded.limit(optional) — Maximum results. Default 1000.0means unlimited.
Response:
- Status 200:
Content-Type: application/x-ndjson, one JSON object per line. - Status 400:
{"error": "..."}on missing/emptyqor a malformed query expression. - Status 500:
{"error": "internal server error"}on storage failures (logged server-side).
|
GET /stats
Returns aggregate metadata as a single JSON object.
|
Response shape:
null in the tags array represents untagged rows. min_timestamp and max_timestamp are null on an empty index.
GET /version
Returns the server's version and supported capabilities as a JSON object. Designed for client-side feature detection — call this first to discover which formats and endpoints the running server supports.
|
Response shape:
Always returns 200 OK. Never touches the database.
Security
- Read-only: The API opens the database with
SQLITE_OPEN_READ_ONLY. Writes are rejected at the SQLite level. - No authentication in v0.2: The server assumes the network layer handles access control. Do not expose it publicly without a reverse proxy providing authentication.
- Auto-creates empty index on first run: If the configured database does not exist, the server creates it with an initialized schema and starts cleanly, returning zero results until logs are ingested via the CLI. Genuinely bad paths (wrong directory, permission denied) still cause a startup failure with a clear error message.
- CORS disabled by default: Cross-origin requests are blocked unless
--cors-originsis explicitly configured. - Graceful shutdown: Ctrl-C and SIGTERM (Unix) trigger a clean shutdown.
Query language reference
logdive queries are a small expression language supporting AND within groups and OR between groups.
Grammar
query := and_expr (OR and_expr)*
and_expr := clause (AND clause)*
clause := field OP value
| field CONTAINS string
| TIME_RANGE
field := [a-zA-Z_][a-zA-Z0-9_.]*
OP := "=" | "!=" | ">" | "<"
value := string | number | bool
string := '"' .* '"' | bare_word
TIME_RANGE := "last" duration | "since" datetime
duration := number ("m" | "h" | "d")
Keywords (AND, OR, CONTAINS, last, since, true, false) are case-insensitive.
Fields
Two kinds of fields are supported:
- Known fields —
timestamp,level,message,tag. These are indexed columns on the SQLite table. Queries on them are very fast. - Unknown fields — anything else. These are read from the JSON
fieldsblob via SQLite'sjson_extract(). Slower than known-field queries but works across arbitrary JSON shapes.
Field names must match [a-zA-Z_][a-zA-Z0-9_.]*. Nested access uses dot notation (e.g. user.id).
Operators
| Operator | Meaning | Example |
|---|---|---|
= |
Equals | level=error |
!= |
Not equals | level!=debug |
> |
Greater than | duration_ms > 1000 |
< |
Less than | status < 500 |
CONTAINS |
Substring match (case-insensitive) | message contains "timeout" |
last |
Time window ending now | last 2h |
since |
Time window starting at a given datetime | since 2026-01-01 |
Comparisons work on strings, integers, floats, and booleans. true/false are stored as 1/0.
Time ranges
last takes a number followed by a unit:
m— minutes (last 30m)h— hours (last 2h)d— days (last 7d)
since accepts three formats:
- RFC 3339 / ISO 8601 with timezone:
since 2024-01-01T10:00:00Z - ISO naive datetime (interpreted as UTC):
since "2024-01-01 10:00:00"orsince 2024-01-01T10:00:00 - ISO date (interpreted as UTC midnight):
since 2024-01-01
Timestamps in the index are compared as text. This is correct for ISO-8601-shaped timestamps because they sort lexicographically in chronological order.
Combining clauses
Clauses are joined with AND (case-insensitive). Since v0.2.0, groups of AND-clauses can be separated with OR to match entries satisfying any group. AND binds more tightly than OR. Parenthesised expressions are not yet supported — see v1 non-goals.
# AND only.
# OR between two simple clauses.
# AND within each OR branch.
# Equivalent to: (level=error AND service=payments) OR (level=warn AND tag=worker)
Quoting
Bare words work for simple values. Use double quotes for anything containing spaces, punctuation, or a value that starts with a digit and contains letters.
level=error # bare word
message contains "bad request" # quotes needed for space
version="3beta" # quotes needed for digit-letter mix
since "2024-01-01 10:00:00" # quotes needed for space
Examples
# All errors.
# Errors or warnings.
# Errors from the payments service in the last 2 hours.
# Anything mentioning "timeout" in the last day.
# Slow requests over 500ms.
# Everything from a specific user ID.
# Everything from a specific time range.
# Everything that isn't a health check.
# Errors from payments OR any warn from worker, last hour.
Configuration reference
All configuration is via command-line flags, with environment-variable fallbacks for convenience in containerized deployments.
Environment variables
| Variable | Applies to | Purpose |
|---|---|---|
LOGDIVE_LOG |
both binaries | Verbosity filter for internal diagnostics (passed to tracing_subscriber::EnvFilter). Default warn. Try info or debug for troubleshooting. |
LOGDIVE_DB |
both binaries | Database path fallback for --db. CLI flag takes precedence when both are set. Default ~/.logdive/index.db. |
LOGDIVE_API_PORT |
logdive-api |
Port fallback for --port. Default 4000. |
LOGDIVE_API_HOST |
logdive-api |
Bind host fallback for --host. Default 127.0.0.1. |
LOGDIVE_API_CORS_ORIGINS |
logdive-api |
Allowed CORS origins fallback for --cors-origins. Comma-separated list or *. Default: empty (CORS disabled). |
NO_COLOR |
logdive query |
Standard NO_COLOR convention — suppresses ANSI color output when set. |
HOME |
both binaries | Used to resolve the default ~/.logdive/index.db path on POSIX. |
Default paths
- Index database:
~/.logdive/index.db. Override with--dbor$LOGDIVE_DB. - Parent directory: Auto-created on first
logdive ingest(CLI) or on firstlogdive-apistartup when the database path does not yet exist.
Architecture
logdive is a three-crate Rust workspace:
logdive-core— Pure library. Owns the log entry type, the multi-format parser (JSON, logfmt, plain), the SQLite-backed indexer, the query AST + parser (AND + OR), and the query executor. No I/O at the module level. Publishable to crates.io as a reusable library.logdive— The CLI binary. Thin wrapper aroundlogdive-corethat addsclapparsing, follow-mode file tailing, progress output, and rendering.logdive-api— The HTTP server binary. Axum router overlogdive-core, opened in read-only mode.
Key architectural choices (see the project's design document for full rationale):
- SQLite via
rusqlitewith thebundledfeature — zero infrastructure, ships inside the binary, battle-tested. - Hybrid storage — known fields (
timestamp,level,message,tag) are real indexed columns; everything else is stored in a JSON blob and queried viajson_extract(). - Hand-written recursive descent query parser — ~300 lines of pure Rust enums, no parser combinator library, supports AND + OR with correct precedence.
- Blake3 row hashing for deduplication —
INSERT OR IGNOREon a unique hash column means re-ingesting a file is free. - Batched inserts at 1000 rows per transaction.
- Separate binaries — users who only want the CLI don't pay the Axum + Tokio compile cost.
Performance
Benchmarks live in crates/core/benches/ and run via:
Representative numbers on a modern laptop (Acer Nitro 5, Linux):
| Operation | Throughput / Latency |
|---|---|
| Ingestion, batched insert (10k rows) | ~210k lines/sec |
| Ingestion, parse + insert end-to-end (10k rows) | ~166k lines/sec |
| Query on known field, empty result (100k rows) | ~17 μs |
| Query on known field, 25% match (100k rows, LIMIT 1000) | ~39 ms |
| Query on JSON field, 25% match (100k rows, LIMIT 1000) | ~3.6 ms |
| Query on JSON field, 0% match — full scan (100k rows) | ~68 ms |
CONTAINS full-table scan (100k rows) |
~36–40 ms |
3-clause AND chain (100k rows) |
~22 ms |
Numbers from criterion benchmarks — run cargo bench for your own baseline.
Release-profile binary sizes:
logdive: 3.7 MBlogdive-api: 4.1 MB
Targets: both binaries under 10 MB. Run scripts/check-binary-size.sh to verify.
Development
# Clone and build.
# Run tests.
# Lints and formatting (run before every commit).
# Run the CLI during development.
# Run the API.
# Build the Docker image locally.
MSRV: Rust 1.85. Edition 2024.
Changelog
See CHANGELOG.md for release notes.
Contributing
Bug reports and pull requests welcome. Before submitting a PR, please ensure:
cargo test --workspacepasses.cargo clippy --workspace --all-targets -- -D warningsis clean.cargo fmt --all --checkis clean.- Any new feature lands behind a discussion in an issue first, to avoid scope creep against the v1 non-goals.
v1 non-goals
The following are intentionally out of scope and may or may not land in future versions:
- Parenthesised query expressions —
(level=error OR level=warn) AND service=payments— AND + OR without grouping shipped in v0.2; full parenthesisation is deferred to v0.3. - Authentication on the HTTP API — the API trusts its network layer.
- Ingestion over HTTP — the API is read-only. Ingestion goes through the CLI.
- Multi-machine or networked indexes — single-host only.
- Log shipping, agents, or daemons — logdive is a tool, not a service.
- A browser UI — curl and the CLI are the intended interfaces. Third parties can build UIs against the HTTP API.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.