hawk
Modern data analysis tool for structured data (JSON, YAML, CSV)
hawk combines the simplicity of awk with the power of pandas for data exploration. Unlike traditional text tools that work line-by-line, hawk understands structured data natively. Unlike heavy data science tools that require complex setup, hawk brings analytics to your terminal with a single command.
Perfect for:
- Data Scientists: Quick CSV/JSON analysis without Python overhead
- DevOps Engineers: Kubernetes YAML, Docker Compose, Terraform analysis
- API Developers: REST response exploration and validation
- Business Analysts: Instant insights from structured datasets
Features
- Multi-format support: JSON, YAML, CSV with automatic detection (vs jq's JSON-only)
- Pandas-like operations: Filtering, grouping, aggregation (vs awk's line-based processing)
- Smart output formatting: Tables, lists, JSON based on data structure
- Fast and lightweight: Built in Rust for performance (vs pandas' Python overhead)
- Developer-friendly: Perfect for DevOps, data analysis, and API exploration
- Type-aware: Understands numbers, strings, booleans (vs text tools' string-only approach)
- Unified syntax: Same query language across all formats (vs format-specific tools)
Quick Start
Installation
# Install via Homebrew (macOS/Linux)
# Verify installation
Basic Usage
# Explore data structure
# Access fields
# Filter and aggregate
Query Syntax
Field Access
Filtering
# Numeric comparison
# String equality
# Boolean comparison
# Not equal
# Nested field filtering
Field Selection
# Select multiple fields
# Custom field subset
Aggregation
# Sum values
# Average values
# Minimum value
# Maximum value
Grouping
# Group by field
# Average by group
# Sum by group
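The filtering, grouping, and aggregation steps above compose into a pipeline: filter records, bucket them by a field, then reduce each bucket. As a rough illustration of what such a pipeline computes, here is a minimal plain-Python sketch. It is not hawk's implementation; the helper names (`select`, `group_by`, `aggregate`) only mirror the query vocabulary, and the sample records and `salary` values are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records; names and salaries are invented for illustration.
users = [
    {"name": "Alice", "age": 30, "department": "Engineering", "salary": 120},
    {"name": "Bob", "age": 25, "department": "Marketing", "salary": 80},
    {"name": "Carol", "age": 41, "department": "Engineering", "salary": 100},
]


def select(records, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [r for r in records if predicate(r)]


def group_by(records, field):
    """Bucket records by the value of one field, in first-seen order."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r)
    return dict(groups)


def aggregate(groups, field, func):
    """Apply func (sum, mean, min, max) to one field within each group."""
    return {key: func([r[field] for r in rows]) for key, rows in groups.items()}


# Filter, then group, then average: the same shape as a
# "select | group_by | avg" query.
adults = select(users, lambda r: r["age"] > 26)
by_dept = group_by(adults, "department")
avg_salary = aggregate(by_dept, "salary", mean)
sum_salary = aggregate(group_by(users, "department"), "salary", sum)
print(avg_salary)
print(sum_salary)
```

Each stage takes the previous stage's output as input, which is why the stages can be chained with `|` in a query.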
Complex Queries
# Multi-step analysis
# Multi-level array processing
# Field selection with filtering
# Data exploration workflow
# Filter active records
Use Cases
API Response Analysis
# Analyze GitHub API response
# Extract specific fields
DevOps & Infrastructure
# Kubernetes resource analysis
# AWS EC2 analysis
# Docker Compose services
Data Analysis
# Sales data analysis
# Multi-field analysis
# Log analysis
Configuration Management
# Ansible inventory analysis
# Terraform state analysis
Supported Formats
JSON
YAML
users:
  - name: Alice
    age: 30
    department: Engineering
  - name: Bob
    age: 25
    department: Marketing
CSV
name,age,department
Alice,30,Engineering
Bob,25,Marketing
All formats support the same query syntax!
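One way to picture why a single query syntax can work across formats: each input is first parsed into the same in-memory shape (a list of records with typed values), and queries run against that shape. The sketch below is an assumption about the general idea, not hawk's code. `infer`, `load_csv`, and `load_json` are hypothetical helpers, and `JSON_TEXT` is an invented JSON rendering of the Alice/Bob data shown above.

```python
import csv
import io
import json

CSV_TEXT = "name,age,department\nAlice,30,Engineering\nBob,25,Marketing\n"

# Assumed JSON equivalent of the same two records.
JSON_TEXT = (
    '{"users": ['
    '{"name": "Alice", "age": 30, "department": "Engineering"}, '
    '{"name": "Bob", "age": 25, "department": "Marketing"}]}'
)


def infer(value):
    """Turn a CSV cell into bool, int, or float when it parses cleanly."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # fall back to the original string


def load_csv(text):
    """Parse CSV text into a list of dicts with inferred value types."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: infer(cell) for key, cell in row.items()} for row in reader]


def load_json(text):
    """Parse JSON text and return the records under the 'users' key."""
    return json.loads(text)["users"]


csv_rows = load_csv(CSV_TEXT)
json_rows = load_json(JSON_TEXT)

# Once both sources share a shape, the same numeric filter works on either.
older_csv = [r["name"] for r in csv_rows if r["age"] > 26]
older_json = [r["name"] for r in json_rows if r["age"] > 26]
print(older_csv, older_json)
```

Without the type inference step, the CSV `age` column would stay a string and a numeric comparison such as `age > 26` would fail, which is the gap the "type-aware" feature is meant to close.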
Output Formats
Smart Auto-Detection (default)
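The exact rules hawk uses to choose an output format are not spelled out in this README. As a rough sketch of shape-based selection, assuming a list of objects is shown as a table, a list of scalars as a plain list, and anything else as JSON, the idea could look like this in Python. `choose_format` and `render` are hypothetical names, not part of hawk.

```python
import json


def choose_format(data):
    """Pick a display format from the shape of the data (assumed rules)."""
    if isinstance(data, list) and data and all(isinstance(x, dict) for x in data):
        return "table"
    if isinstance(data, list) and all(not isinstance(x, (dict, list)) for x in data):
        return "list"
    return "json"


def render(data):
    """Render data as an aligned table, a one-item-per-line list, or JSON."""
    fmt = choose_format(data)
    if fmt == "table":
        # Collect column names in first-seen order across all rows.
        columns = list(dict.fromkeys(key for row in data for key in row))
        widths = [
            max(len(col), *(len(str(row.get(col, ""))) for row in data))
            for col in columns
        ]
        lines = ["  ".join(col.ljust(w) for col, w in zip(columns, widths))]
        for row in data:
            cells = (str(row.get(col, "")).ljust(w) for col, w in zip(columns, widths))
            lines.append("  ".join(cells))
        return "\n".join(line.rstrip() for line in lines)
    if fmt == "list":
        return "\n".join(str(item) for item in data)
    return json.dumps(data, indent=2)


print(render([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]))
print(render(["Engineering", "Marketing"]))
print(render({"total": 2}))
```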
Explicit Format Control
Advanced Examples
Complex Data Analysis
# Multi-step pipeline analysis
# Nested data exploration
# Cross-format analysis
Real-world DevOps Scenarios
# Find all running containers with high memory usage
# Analyze Kubernetes deployments by namespace
# AWS EC2 cost analysis
# Extract configuration errors from logs
Data Processing Workflows
# 1. Explore structure
# 2. Filter relevant data
# 3. Multi-level processing
# 4. Group and analyze
# 5. Export results
Installation & Setup
Homebrew (Recommended)
# Install via Homebrew
# Or install from the main repository
Build from Source
# Prerequisites: Rust 1.70 or later
# Add to PATH
Binary Releases
Download pre-built binaries from GitHub Releases:
- Linux (x86_64)
- macOS (Intel & Apple Silicon)
Documentation
Command Line Options
Query Language Reference
| Operation | Syntax | Example |
|---|---|---|
| Field access | `.field` | `.name` |
| Array index | `.array[0]` | `.users[0]` |
| Array iteration | `.array[]` | `.users[]` |
| Multi-level arrays | `.array[].nested[]` | `.Reservations[].Instances[]` |
| Field selection | `\| select_fields(field1,field2)` | `\| select_fields(name,age)` |
| Filtering | `\| select(.field > value)` | `\| select(.age > 30)` |
| Nested filtering | `\| select(.nested.field == value)` | `\| select(.State.Name == "running")` |
| Grouping | `\| group_by(.field)` | `\| group_by(.department)` |
| Counting | `\| count` | `.users \| count` |
| Aggregation | `\| sum/avg/min/max(.field)` | `\| avg(.salary)` |
| Info | `\| info` | `. \| info` |
Supported Operators
- Comparison: `>`, `<`, `==`, `!=`
- Aggregation: `count`, `sum`, `avg`, `min`, `max`
- Grouping: `group_by`
- Filtering: `select`
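To make the path forms in the reference table concrete (field access, array index, array iteration, and multi-level arrays), here is a small Python sketch of how such a path could be evaluated against parsed data. It is an illustration under assumptions, not hawk's parser: `evaluate` and `TOKEN` are hypothetical names, only the path portion of a query is handled, and the sample document is invented.

```python
import re

# One step of a path: either ".field" or "[index]" / "[]".
TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d*)\]")


def evaluate(path, data):
    """Evaluate a path such as '.users[0].name' or '.a[].b[]'.

    Returns a single value when the path has no '[]' step,
    otherwise a flat list of every matched value.
    """
    if path == ".":
        return data
    results = [data]
    iterated = False
    pos = 0
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse path at position {pos}: {path!r}")
        field, index = match.group(1), match.group(2)
        if field is not None:
            results = [item[field] for item in results]      # .field
        elif index == "":
            results = [child for item in results for child in item]  # []
            iterated = True
        else:
            results = [item[int(index)] for item in results]  # [n]
        pos = match.end()
    return results if iterated else results[0]


# Invented sample document for illustration.
doc = {
    "users": [{"name": "Alice"}, {"name": "Bob"}],
    "Reservations": [
        {"Instances": [{"Id": "i-1"}, {"Id": "i-2"}]},
        {"Instances": [{"Id": "i-3"}]},
    ],
}

print(evaluate(".users[0].name", doc))
print(evaluate(".users[].name", doc))
print(evaluate(".Reservations[].Instances[].Id", doc))
```

Each `[]` step flattens one level of nesting, which is why `.Reservations[].Instances[]` yields one flat list of instances rather than a list of lists.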
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
Running Tests
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by the simplicity of `awk` and the power of `pandas`
- Built with the amazing Rust ecosystem
- Special thanks to the `serde`, `clap`, and `csv` crate maintainers
Related Tools & Comparison
| Tool | Best For | Limitations | hawk Advantage |
|---|---|---|---|
| awk | Text processing, log parsing | Line-based, no JSON/YAML support | Structured data focus, type-aware operations |
| jq | JSON transformation | JSON-only, complex syntax for data analysis | Multi-format, pandas-like analytics |
| pandas | Heavy data science | Requires Python setup, overkill for CLI | Lightweight, terminal-native |
| sed/grep | Text manipulation | No structured data understanding | Schema-aware processing |
Why Choose hawk?
For structured data analysis, hawk fills the gap between simple text tools and heavy data science frameworks:
# awk: Limited structured data support
# jq: JSON-only, verbose for analytics
# hawk: Unified, intuitive syntax across all formats
pandas power, awk simplicity:
# Complex analytics made simple
# vs pandas: requires Python script with imports, DataFrame setup, etc.
DevOps & IaC optimized:
# Kubernetes config analysis (YAML native)
# vs jq: requires YAML→JSON conversion first
Happy data exploring with hawk!
For questions, issues, or feature requests, please visit our GitHub repository.