dkit
Swiss Army knife for data format conversion and querying.
Convert between JSON, JSONL, CSV, TSV, YAML, TOML, XML, MessagePack, and Parquet with a single CLI, read Excel and SQLite files, and render tables as Markdown or HTML. Query nested data, compare files, preview data as tables, and pipe everything together.
Quick Start
# Install
# Convert JSON to YAML
# Query nested data
# Preview CSV as a table
Installation
From crates.io
From source
Supported Formats
| Format | Extensions | Read | Write |
|---|---|---|---|
| JSON | .json | O | O |
| JSONL | .jsonl, .ndjson | O | O |
| CSV | .csv | O | O |
| TSV | .tsv | O | O |
| YAML | .yaml, .yml | O | O |
| TOML | .toml | O | O |
| XML | .xml | O | O |
| MessagePack | .msgpack | O | O |
| Parquet | .parquet | O | O |
| Excel | .xlsx | O | - |
| SQLite | .db, .sqlite | O | - |
| Markdown | .md | - | O |
| HTML | - | - | O |
All conversion paths between supported read/write formats are available. Excel and SQLite are input-only formats. Markdown and HTML are output-only formats for table rendering.
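The rule above amounts to a capability check: a conversion is possible whenever the source format is readable and the target format is writable. The Python sketch below models that check with a hypothetical FORMATS table built from the extensions listed above. It is an illustration, not dkit's internal code, and it leaves out HTML because the table lists no extension for it.

```python
import os

# Hypothetical capability table mirroring the Supported Formats table:
# extension -> (format name, readable, writable)
FORMATS = {
    ".json": ("JSON", True, True),
    ".jsonl": ("JSONL", True, True),
    ".ndjson": ("JSONL", True, True),
    ".csv": ("CSV", True, True),
    ".tsv": ("TSV", True, True),
    ".yaml": ("YAML", True, True),
    ".yml": ("YAML", True, True),
    ".toml": ("TOML", True, True),
    ".xml": ("XML", True, True),
    ".msgpack": ("MessagePack", True, True),
    ".parquet": ("Parquet", True, True),
    ".xlsx": ("Excel", True, False),     # input-only
    ".db": ("SQLite", True, False),      # input-only
    ".sqlite": ("SQLite", True, False),  # input-only
    ".md": ("Markdown", False, True),    # output-only
}


def detect(path: str) -> tuple:
    """Look up a format entry from the file extension (case-insensitive)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in FORMATS:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    return FORMATS[ext]


def can_convert(src: str, dst: str) -> bool:
    """A conversion path exists when the source is readable and the target writable."""
    _, readable, _ = detect(src)
    _, _, writable = detect(dst)
    return readable and writable


if __name__ == "__main__":
    print(can_convert("users.xlsx", "users.json"))  # True
    print(can_convert("users.json", "users.xlsx"))  # False
```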
Commands
convert — Format conversion
# Basic conversion
# XML conversion
# JSONL (JSON Lines) conversion
# Output to file
# Batch conversion
# Pipe from stdin
# Options
# Markdown/HTML table output
# Excel (.xlsx) input
# SQLite (.db, .sqlite) input
# Encoding support
# Parquet (.parquet) input/output
# Streaming mode for large files (chunk-based processing)
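Chunk-based streaming keeps memory bounded by reading and writing a fixed number of records at a time instead of loading the whole file. The Python sketch below shows the idea for a CSV-to-JSONL conversion. The function names and the default chunk size of 1000 are assumptions for illustration, not dkit options.

```python
import csv
import io
import json
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(stream: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size CSV records, reading the input lazily."""
    reader = csv.DictReader(stream)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def csv_to_jsonl(src: TextIO, dst: TextIO, chunk_size: int = 1000) -> int:
    """Convert CSV to JSON Lines one chunk at a time; returns the record count."""
    count = 0
    for chunk in iter_chunks(src, chunk_size):
        for record in chunk:
            dst.write(json.dumps(record) + "\n")  # CSV values stay strings here
        count += len(chunk)
    return count


if __name__ == "__main__":
    src = io.StringIO("name,age\nalice,30\nbob,25\ncarol,41\n")
    out = io.StringIO()
    print(csv_to_jsonl(src, out, chunk_size=2))
    print(out.getvalue(), end="")
```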
query — Data querying
# Field access
# Nested path
# Array iteration
# Negative indexing
Query syntax:
| Syntax | Description |
|---|---|
| .field | Object field access |
| .field.sub | Nested field access |
| .[0] | Array index (0-based) |
| .[-1] | Negative index (from end) |
| .[] | Iterate all elements |
| where .field == value | Filter with comparison (==, !=, >, <, >=, <=) |
| where .field contains "str" | Filter with string operators (contains, starts_with, ends_with) |
| select .field1, .field2 | Select specific fields |
| sort .field / sort .field desc | Sort by field (ascending/descending) |
| limit N | Limit number of results |
| \| | Pipeline chaining (pass results between operations) |
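The path forms in the table (.field, .[N], .[-N], .[]) can be read as a sequence of steps, where .[] fans one value out into many. The Python sketch below is a hypothetical evaluator for just those path forms. It does not cover where, select, sort, limit, or pipes, and it accepts both .users[0] and .users.[0] because this section does not say which spelling dkit expects for chained segments.

```python
import re
from typing import Any, List, Tuple

# Segment forms from the query syntax table: .[N] / .[-N], .[], and .field
# (the dot before "[" is optional in this sketch).
SEGMENT = re.compile(r"\.?\[(-?\d+)\]|\.?\[\]|\.([A-Za-z_]\w*)")


def tokenize(path: str) -> List[Tuple[str, Any]]:
    tokens, pos = [], 0
    while pos < len(path):
        m = SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"unexpected input at {pos} in {path!r}")
        if m.group(1) is not None:
            tokens.append(("index", int(m.group(1))))
        elif m.group(2) is not None:
            tokens.append(("field", m.group(2)))
        else:
            tokens.append(("iterate", None))
        pos = m.end()
    return tokens


def evaluate(path: str, data: Any) -> List[Any]:
    """Apply a path to data; returns a list because .[] can yield many results."""
    if path == ".":
        return [data]
    results = [data]
    for kind, arg in tokenize(path):
        step = []
        for value in results:
            if kind == "iterate":
                step.extend(value)
            else:
                # Field names index dicts; Python's negative list indices
                # already count from the end, matching .[-1].
                step.append(value[arg])
        results = step
    return results
```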
# Advanced query examples
# Output query results in different formats
Aggregate functions:
| Function | Description | Example |
|---|---|---|
| count | Count elements | .[] \| count |
| count field | Count non-null values | .[] \| count email |
| sum field | Sum numeric field | .[] \| sum price |
| avg field | Average of numeric field | .[] \| avg score |
| min field | Minimum value | .[] \| min price |
| max field | Maximum value | .[] \| max price |
| distinct field | Unique values | .[] \| distinct category |
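Most of these aggregates reduce one field across the current set of records. The Python sketch below models them over a list of dicts under two assumptions that the table does not spell out: null or missing values are skipped (as the table states for count field), and distinct keeps values in first-seen order. The function names are illustrative only.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def values(rows: List[Record], field: str) -> List[Any]:
    """Non-null values of a field; missing keys count as null."""
    return [r[field] for r in rows if r.get(field) is not None]


def count(rows: List[Record], field: Optional[str] = None) -> int:
    # count -> every element; count field -> only non-null values
    return len(rows) if field is None else len(values(rows, field))


def agg_sum(rows: List[Record], field: str) -> float:
    return sum(values(rows, field))


def agg_avg(rows: List[Record], field: str) -> Optional[float]:
    vals = values(rows, field)
    return sum(vals) / len(vals) if vals else None


def agg_min(rows: List[Record], field: str) -> Any:
    return min(values(rows, field), default=None)


def agg_max(rows: List[Record], field: str) -> Any:
    return max(values(rows, field), default=None)


def distinct(rows: List[Record], field: str) -> List[Any]:
    seen: List[Any] = []
    for v in values(rows, field):
        if v not in seen:  # list membership also works for unhashable values
            seen.append(v)
    return seen
```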
# Aggregate examples
# GROUP BY examples
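GROUP BY splits records into buckets by a key and then applies an aggregate to each bucket. A minimal Python sketch of that two-step process follows. The group_by helper is hypothetical, accepts any Python reducer such as sum, max, or len, and skips null values in the aggregated field (an assumption carried over from count field).

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def group_by(rows: List[Record], key: str, field: str,
             reducer: Callable[[List[Any]], Any]) -> Dict[Any, Any]:
    """Bucket rows by rows[key], then reduce the non-null values of field per bucket."""
    buckets: Dict[Any, List[Any]] = {}
    for row in rows:
        bucket = buckets.setdefault(row.get(key), [])  # dicts keep first-seen order
        if row.get(field) is not None:
            bucket.append(row[field])
    return {k: reducer(vals) for k, vals in buckets.items()}


if __name__ == "__main__":
    staff = [
        {"dept": "eng", "salary": 100},
        {"dept": "ops", "salary": 80},
        {"dept": "eng", "salary": 120},
    ]
    print(group_by(staff, "dept", "salary", sum))
```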
Built-in functions (usable in select):
| Category | Functions |
|---|---|
| String | upper(), lower(), trim(), ltrim(), rtrim(), length(), substr(), concat(), replace(), split() |
| Math | round(), ceil(), floor(), abs(), sqrt(), pow() |
| Date | now(), date(), year(), month(), day() |
| Type | to_int(), to_float(), to_string(), to_bool() |
| Util | coalesce(), if_null() |
# Function examples
view — Table preview
# View as table
# Limit rows
# Navigate nested data
# Select columns
# Table customization
# Output in different formats
stats — Data statistics
# Show overall statistics
# Navigate to nested data
# Statistics for a specific column (numeric: sum, avg, median, std, p25, p75)
# String column stats (unique count, length distribution, top values)
# Histogram visualization
# Output formats
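For numeric columns, the statistics listed above can be computed with Python's standard statistics module. This sketch assumes population standard deviation and inclusive (linearly interpolated) quartiles for p25/p75; dkit may use different definitions, so treat the exact numbers as illustrative.

```python
import statistics
from typing import Dict, List


def numeric_stats(column: List[float]) -> Dict[str, float]:
    """Summary statistics for one numeric column (needs at least two values)."""
    # quantiles(n=4) returns the three cut points: p25, p50, p75.
    p25, _, p75 = statistics.quantiles(column, n=4, method="inclusive")
    return {
        "count": len(column),
        "sum": sum(column),
        "avg": statistics.fmean(column),
        "median": statistics.median(column),
        "std": statistics.pstdev(column),  # population standard deviation
        "p25": p25,
        "p75": p75,
    }


if __name__ == "__main__":
    print(numeric_stats([1, 2, 3, 4, 5]))
```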
schema — Data structure inspection
# Show schema as a tree
# From stdin
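A schema tree can be derived by walking a parsed document and recording each value's type. The Python sketch below shows one way to do it. The type names and the choice to describe an array by its first element are assumptions, and dkit's real output format may differ.

```python
from typing import Any, List


def type_name(value: Any) -> str:
    # bool is checked before int because bool is a subclass of int in Python
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def schema_lines(value: Any, name: str = "root", depth: int = 0) -> List[str]:
    """Render a type tree, indenting two spaces per nesting level."""
    lines = [f"{'  ' * depth}{name}: {type_name(value)}"]
    if isinstance(value, dict):
        for key, child in value.items():
            lines.extend(schema_lines(child, key, depth + 1))
    elif isinstance(value, list) and value:
        # Assumption: arrays are described by their first element only.
        lines.extend(schema_lines(value[0], "[]", depth + 1))
    return lines


if __name__ == "__main__":
    doc = {"name": "dkit", "tags": ["cli"], "stars": 3, "meta": {"draft": False}}
    print("\n".join(schema_lines(doc)))
```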
diff — Compare data files
# Compare same-format files
# Cross-format comparison
# Compare nested path only
# Quiet mode (exit code: 0=same, 1=different)
# Comparison modes
# Output formats
# Array comparison strategies
# Ignore options
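Conceptually, a structural diff walks both parsed documents in parallel and reports differences by path, which is also what makes cross-format comparison possible once both files are parsed. The Python sketch below compares arrays index by index, which is only one possible array strategy. The path notation and the diff/exit_code helpers are hypothetical; the 0/1 exit code mirrors the quiet-mode convention stated above.

```python
from typing import Any, List, Tuple

Change = Tuple[str, str, Any, Any]  # (path, kind, old, new)


def diff(a: Any, b: Any, path: str = "") -> List[Change]:
    """Recursively compare two parsed documents; arrays are compared by index."""
    if isinstance(a, dict) and isinstance(b, dict):
        changes: List[Change] = []
        # Keys of a in order, then keys that only exist in b.
        for key in list(a) + [k for k in b if k not in a]:
            p = f"{path}.{key}"
            if key not in b:
                changes.append((p, "removed", a[key], None))
            elif key not in a:
                changes.append((p, "added", None, b[key]))
            else:
                changes.extend(diff(a[key], b[key], p))
        return changes
    if isinstance(a, list) and isinstance(b, list):
        changes = []
        for i in range(max(len(a), len(b))):
            p = f"{path}[{i}]"
            if i >= len(b):
                changes.append((p, "removed", a[i], None))
            elif i >= len(a):
                changes.append((p, "added", None, b[i]))
            else:
                changes.extend(diff(a[i], b[i], p))
        return changes
    return [] if a == b else [(path or ".", "changed", a, b)]


def exit_code(a: Any, b: Any) -> int:
    """Quiet-mode convention: 0 when the documents match, 1 when they differ."""
    return 1 if diff(a, b) else 0
```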
validate — JSON Schema validation
# Validate data against JSON Schema
# Quiet mode (only valid/invalid)
# From stdin
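JSON Schema validation checks a document against declared types, required properties, and nested schemas. The Python sketch below implements only a small subset of JSON Schema (type, required, properties, items) to show how errors can be collected by path. It is not a conforming validator and is not how dkit implements the feature.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]

# JSON Schema type names mapped to Python types (integer/number handled below).
PY_TYPES = {"object": dict, "array": list, "string": str,
            "boolean": bool, "null": type(None)}


def is_type(value: Any, name: str) -> bool:
    if name in ("integer", "number"):
        if isinstance(value, bool):  # bool is a subclass of int in Python
            return False
        if name == "integer":
            return isinstance(value, int)
        return isinstance(value, (int, float))
    return isinstance(value, PY_TYPES[name])


def validate(value: Any, schema: Schema, path: str = "$") -> List[str]:
    """Return error messages for the type, required, properties, and items keywords."""
    expected = schema.get("type")
    if expected is not None and not is_type(value, expected):
        return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors
```

An empty list means the document is valid, which is the information a quiet valid/invalid mode needs.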
sample — Random/stratified sampling
# Random sampling
# Systematic sampling (every k-th element)
# Stratified sampling (proportional per group)
# Output format
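The three strategies differ in how rows are chosen: uniformly at random, every k-th row, or proportionally within each group. The Python sketch below illustrates them with hypothetical helper names and a seed parameter for reproducibility. Per-group counts use Python's round(), so the stratified total can differ slightly from the requested size, and dkit's allocation rule may differ.

```python
import random
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def random_sample(rows: List[Any], n: int, seed: Optional[int] = None) -> List[Any]:
    """Simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def systematic_sample(rows: List[Any], k: int, start: int = 0) -> List[Any]:
    """Every k-th element, beginning at index start."""
    return rows[start::k]


def stratified_sample(rows: List[Record], key: str, n: int,
                      seed: Optional[int] = None) -> List[Record]:
    """Proportional allocation: each group contributes round(n * group share) rows."""
    rng = random.Random(seed)
    groups: Dict[Any, List[Record]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    sample: List[Record] = []
    for members in groups.values():
        take = min(len(members), round(n * len(members) / len(rows)))
        sample.extend(rng.sample(members, take))
    return sample
```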
flatten / unflatten — Flatten/restore nested structures
# Flatten nested JSON
# Unflatten (restore nested structure)
# Roundtrip
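Flattening joins nested object keys into a single path per leaf value, and unflattening splits those paths to rebuild the nesting. The Python sketch below assumes a "." separator and treats arrays and empty objects as leaf values, so a roundtrip is lossless as long as no original key contains the separator. dkit's exact key format is not specified in this section.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Collapse nested objects into one level of 'a.b.c' keys.

    Arrays and empty objects are kept as leaf values.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Rebuild nesting by splitting each key on sep."""
    root: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root
```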
config — Configuration management
# Show current effective configuration (with source information)
# Create a default user config file
# Create a project-level config file (.dkit.toml in current directory)
Config file priority (highest to lowest):
- CLI options
- Project config (.dkit.toml in the current directory)
- User config ($XDG_CONFIG_HOME/dkit/config.toml or ~/.dkit.toml)
- Defaults
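The priority order above behaves like a layered lookup: each higher-priority layer overrides the ones below it, and each value can remember which layer supplied it, similar to the source information shown for the effective configuration. The Python sketch below models that with flat dictionaries. The layer names follow the list, but the option keys (indent, color) are hypothetical and not taken from dkit's configuration.

```python
from typing import Any, Dict, List, Tuple

Layer = Tuple[str, Dict[str, Any]]


def resolve(layers: List[Layer]) -> Dict[str, Tuple[Any, str]]:
    """Merge layers given lowest priority first; later layers override earlier ones.

    Each resolved option keeps the name of the layer that supplied it.
    """
    effective: Dict[str, Tuple[Any, str]] = {}
    for source, options in layers:
        for key, value in options.items():
            effective[key] = (value, source)
    return effective


if __name__ == "__main__":
    layers = [
        ("defaults", {"indent": 2, "color": True}),  # hypothetical keys
        ("user config", {"indent": 4}),
        ("project config", {"color": False}),
        ("cli", {"indent": 8}),
    ]
    for key, (value, source) in resolve(layers).items():
        print(f"{key} = {value!r}  (from {source})")
```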
alias — Command aliases
# List all aliases (built-in + user-defined)
# Register a user alias
# Remove a user alias
# Use a built-in alias (j2c, c2j, j2y, y2j, j2t, t2j, c2y, y2c)
completions — Shell completion scripts
# Generate and install shell completions
Watch mode
The convert and view commands support --watch, which automatically re-runs the command when the input file changes.
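A simple way to implement watch behavior is to poll the input file's modification time and re-run the command when it changes. The Python sketch below uses polling purely for illustration. dkit may rely on operating-system file notifications instead, and the poll_once/watch helpers and the 0.5-second interval are hypothetical.

```python
import os
import tempfile
import time
from typing import Callable, Optional, Tuple


def poll_once(path: str, last_mtime: Optional[float]) -> Tuple[bool, float]:
    """One polling step: report whether the mtime moved since last_mtime."""
    current = os.stat(path).st_mtime
    return (last_mtime is not None and current != last_mtime), current


def watch(path: str, action: Callable[[], None], interval: float = 0.5,
          max_runs: Optional[int] = None) -> None:
    """Run action once, then again after every detected change.

    Blocks until action has run max_runs times (forever when max_runs is None).
    """
    action()
    runs = 1
    _, last = poll_once(path, None)
    while max_runs is None or runs < max_runs:
        time.sleep(interval)
        changed, last = poll_once(path, last)
        if changed:
            action()
            runs += 1


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    _, first = poll_once(path, None)
    os.utime(path, (first + 10, first + 10))  # simulate an edit by bumping mtime
    print(poll_once(path, first)[0])
    os.remove(path)
```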
merge — Combine multiple files
# Merge JSON files
# Merge CSV files and convert to JSON
# Merge YAML configs
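When combining files, a reasonable result depends on the top-level shape: record arrays are usually concatenated, and configuration objects are merged key by key. The Python sketch below applies those rules, with later files winning on conflicting scalar keys. This is an assumed strategy for illustration, since this section does not state how dkit resolves conflicts.

```python
from functools import reduce
from typing import Any, List


def deep_merge(base: Any, override: Any) -> Any:
    """Objects merge key by key, arrays concatenate, anything else is replaced."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(merged[key], value) if key in merged else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        return base + override
    return override


def merge_all(documents: List[Any]) -> Any:
    """Fold documents left to right, so later files win on conflicts."""
    return reduce(deep_merge, documents)
```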
Comparison with Existing Tools
| Feature | dkit | jq | miller | yq |
|---|---|---|---|---|
| JSON | O | O | O | O |
| CSV/TSV | O | X | O | X |
| YAML | O | X | X | O |
| TOML | O | X | X | X |
| XML | O | X | X | O |
| MessagePack | O | X | X | X |
| Parquet | O | X | X | X |
| Excel (.xlsx) input | O | X | X | X |
| SQLite input | O | X | X | X |
| Markdown/HTML output | O | X | X | X |
| Cross-format convert | O | X | Partial | Partial |
| Table output | O | X | O | X |
| Query (where/select/sort) | O | O | O | O |
| Aggregate functions | O | O | O | X |
| GROUP BY | O | Partial | O | X |
| Built-in functions | O | O | O | X |
| Pipeline chaining | O | O | O | X |
| Streaming (large files) | O | X | O | X |
| Statistics | O | X | O | X |
| Schema inspection | O | X | X | X |
| File merging | O | X | O | X |
| File diff (modes/formats) | O | X | X | X |
| JSON Schema validation | O | X | X | X |
| Random/stratified sampling | O | X | X | X |
| Flatten/unflatten | O | X | X | X |
| Multi-encoding support | O | X | X | X |
| Watch mode (auto re-run) | O | X | X | X |
| Config file | O | X | X | X |
| Command aliases | O | X | X | X |
| Shell completions | O | O | O | O |
| Single binary | O | O | O | O |
dkit focuses on seamless conversion between all supported formats with a unified query syntax, eliminating the need for separate tools per format.
Building from Source
Contributing
Contributions are welcome! Please see the GitHub Issues for planned features and known issues.
- Fork the repository
- Create a feature branch (git checkout -b feat/my-feature)
- Commit your changes
- Push to the branch and open a Pull Request
Please ensure cargo test and cargo clippy -- -D warnings pass before submitting.