# hawk
Modern data analysis tool for structured data and text files (JSON, YAML, CSV, Text)
hawk combines the simplicity of awk with the power of pandas for data exploration. Unlike traditional text tools that work line-by-line, hawk understands both structured data and plain text natively. Unlike heavy data science tools that require complex setup, hawk brings analytics to your terminal with a single command.
Perfect for:

- **Data Scientists**: Quick CSV/JSON analysis without Python overhead
- **DevOps Engineers**: Kubernetes YAML, Docker Compose, and log analysis
- **API Developers**: REST response exploration and validation
- **Business Analysts**: Instant insights from structured datasets
- **System Administrators**: Log file analysis and text processing
## ✨ Features
- **Universal format support**: JSON, YAML, CSV, and plain text with automatic detection
- **Pandas-like operations**: Filtering, grouping, aggregation, and string manipulation
- **Smart output formatting**: Colored tables, lists, or JSON depending on data structure
- **Fast and lightweight**: Built in Rust for performance
- **Developer-friendly**: Perfect for DevOps, data analysis, and API exploration
- **Type-aware**: Understands numbers, strings, and booleans with intelligent conversion
- **Unified syntax**: Same query language across all formats
- **String operations**: Powerful text processing capabilities
- **Statistical functions**: Built-in median, stddev, unique, and sort operations
- **Beautiful output**: Automatic color coding with TTY detection
## Quick Start

### Installation

```bash
# Install via Homebrew (macOS/Linux)
# Install via Cargo (if Rust is installed)
# Verify installation
```

### Basic Usage

```bash
# Explore data structure
# Access fields
# Filter and aggregate
# Process text files (NEW in v0.2.0!)
```
## Query Syntax

### Field Access
### Text Processing (NEW in v0.2.0!)

```bash
# String operations
. | map(. | upper)         # Convert to uppercase
. | map(. | lower)         # Convert to lowercase
. | map(. | trim)          # Remove whitespace (both ends)
. | map(. | trim_start)    # Remove leading whitespace
. | map(. | trim_end)      # Remove trailing whitespace
. | map(. | length)        # Get string length
. | map(. | reverse)       # Reverse string

# String manipulation
. | map(. | replace("a", "b"))   # Replace text
. | map(. | substring(0, 5))     # Extract substring
. | map(. | split(","))          # Split by delimiter
.array[] | join(", ")            # Join array elements

# String filtering
. | select(. | contains("text"))     # Contains pattern
. | select(. | starts_with("pre"))   # Starts with pattern
. | select(. | ends_with("suf"))     # Ends with pattern
```
### Statistical Operations (NEW in v0.2.0!)

```bash
. | unique    # Remove duplicates
. | sort      # Sort values
. | median    # Calculate median
. | stddev    # Calculate standard deviation
. | length    # Get array length

# With field specification
.department | unique    # Unique departments
.salary | median        # Median of values
.timestamp | sort       # Sort by timestamp
```
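As a cross-check on what these operations compute, here is a stdlib-Python sketch of the same statistics over a toy list (the numbers are invented for illustration, and hawk's `stddev` may use a different variance convention than the sample standard deviation shown here):

```python
from statistics import median, stdev

# Toy numeric field values, as hawk might extract them from a column
values = [3, 1, 4, 1, 5, 9, 2, 6]

print(sorted(values))       # sort   -> [1, 1, 2, 3, 4, 5, 6, 9]
print(sorted(set(values)))  # unique -> [1, 2, 3, 4, 5, 6, 9]
print(median(values))       # median -> 3.5
print(stdev(values))        # sample standard deviation, ~2.75
```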
### Filtering

```bash
. | select(.age > 30)                   # Numeric comparison
. | select(.name == "Alice")            # String equality
. | select(.active == true)             # Boolean comparison
. | select(.department != "Marketing")  # Not equal
. | select(.State.Name == "running")    # Nested field filtering

# Complex string filtering (NEW!)
. | map(. | lower) | select(. | contains("error"))   # Case-insensitive search
. | select(. | length > 10)                          # Filter by string length
```
### Data Transformation (NEW in v0.2.0!)

```bash
# Transform data with map
. | map(. | lower)              # Normalize emails
. | map(. | substring(0, 19))   # Extract timestamps
. | map(. | trim | lower)       # Clean and normalize

# Complex transformations
. | map(. | replace(".txt", ".md"))         # Change extensions
. | map(. | split(" ")) | map(. | length)   # Count words per message
```
### Field Selection

```bash
. | select_fields(name, age)          # Select multiple fields
. | select_fields(name, department)   # Custom field subset
```
### Aggregation

```bash
.users[] | sum(.salary)   # Sum values
. | avg(.salary)          # Average values
. | min(.age)             # Minimum value
. | max(.age)             # Maximum value
```
### Grouping

```bash
. | group_by(.department)                  # Group by field
. | group_by(.department) | avg(.salary)   # Average by group
. | group_by(.department) | sum(.salary)   # Sum by group
```
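For readers new to split-apply-combine, this plain-Python sketch shows what a `group_by(.department) | avg(.salary)` pipeline computes; the rows and salary figures are invented for illustration:

```python
from collections import defaultdict

# Hypothetical parsed rows (salary figures are made up)
rows = [
    {"department": "Engineering", "salary": 85000},
    {"department": "Marketing", "salary": 60000},
    {"department": "Engineering", "salary": 95000},
]

# Split: bucket salaries by department
groups = defaultdict(list)
for row in rows:
    groups[row["department"]].append(row["salary"])

# Apply + combine: average each bucket
print({dept: sum(s) / len(s) for dept, s in groups.items()})
# {'Engineering': 90000.0, 'Marketing': 60000.0}
```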
### Complex Queries

```bash
# Multi-step analysis
. | select(.age > 30) | group_by(.department) | avg(.salary)

# Multi-level array processing
.Reservations[].Instances[] | select(.State.Name == "running")

# Text processing pipeline (NEW!)
. | select(. | contains("ERROR")) | map(. | substring(0, 19)) | sort | unique

# Mixed data and text analysis
. | select(. | contains("WARN")) | unique | length
```
## Use Cases

### Log File Analysis (NEW in v0.2.0!)

```bash
# Extract error logs with timestamps
# Analyze log levels
# Find unique IP addresses in access logs
# Count warnings by hour
```

### Text Data Processing

```bash
# Clean and normalize text data
# Remove different types of whitespace
# Extract file extensions
# Join processed data
# Count words in documents
# Find long lines
```

### API Response Analysis

```bash
# Analyze GitHub API response
# Extract specific fields
```

### DevOps & Infrastructure

```bash
# Kubernetes resource analysis
# AWS EC2 analysis
# Docker Compose services
# Configuration file analysis
```

### Data Analysis

```bash
# Sales data analysis
# Statistical analysis (NEW!)
# Data cleaning and normalization
```
## Supported Formats

### JSON

```json
{
  "users": [
    {"name": "Alice", "age": 30, "department": "Engineering"},
    {"name": "Bob", "age": 25, "department": "Marketing"}
  ]
}
```

### YAML

```yaml
users:
  - name: Alice
    age: 30
    department: Engineering
  - name: Bob
    age: 25
    department: Marketing
```

### CSV

```csv
name,age,department
Alice,30,Engineering
Bob,25,Marketing
```

### Plain Text (NEW in v0.2.0!)

```text
2024-01-15 09:00:01 INFO Application started
2024-01-15 09:00:02 ERROR Failed to connect
2024-01-15 09:00:03 WARN High memory usage
```

All formats support the same query syntax!
## Output Formats

### Smart Auto-Detection (default)

### Explicit Format Control

### Colored Output (NEW in v0.2.0!)

- **Automatic TTY detection**: Colors in terminal, plain text in pipes
- **Beautiful tables**: Headers in blue, numbers in green, booleans in yellow
- **Readable JSON**: Syntax highlighting for better readability
- **NO_COLOR support**: Respects the NO_COLOR environment variable
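The detection rule described above amounts to a check like the following sketch (illustrative Python, not hawk's actual Rust implementation):

```python
import os
import sys

def use_color() -> bool:
    """Color only when stdout is a real terminal and NO_COLOR is unset."""
    return sys.stdout.isatty() and "NO_COLOR" not in os.environ

if __name__ == "__main__":
    print(use_color())
```

Piping output or exporting `NO_COLOR=1` both make the check return `False`, which is why piped hawk output stays plain.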
## Advanced Examples

### Complex Data Analysis

```bash
# Multi-step pipeline analysis
# Nested data exploration
# Cross-format analysis
```

### Real-world Log Analysis (NEW!)

```bash
# Extract error timestamps and analyze patterns
# Find most common error messages
# Analyze response times from access logs
# Extract unique user agents
```

### Text Processing Workflows

```bash
# 1. Clean configuration files
# 2. Analyze code files
# 3. Process CSV-like text data
# 4. Extract and analyze patterns
```

### Data Processing Workflows

```bash
# 1. Explore structure
# 2. Filter relevant data
# 3. Clean and normalize (NEW!)
# 4. Statistical analysis (NEW!)
# 5. Export results
```
## Installation & Setup

### Homebrew (Recommended)

```bash
# Install via Homebrew
# Or install from the main repository
```

### Cargo (Rust Package Manager)

### Build from Source

```bash
# Prerequisites: Rust 1.70 or later
# Add to PATH
```

### Binary Releases

Download pre-built binaries from GitHub Releases:

- Linux (x86_64)
- macOS (Intel & Apple Silicon)
## Documentation

### Command Line Options

### Query Language Reference

| Operation | Syntax | Example |
|---|---|---|
| Field access | `.field` | `.name` |
| Array index | `.array[0]` | `.users[0]` |
| Array iteration | `.array[]` | `.users[]` |
| Multi-level arrays | `.array[].nested[]` | `.Reservations[].Instances[]` |
| Text processing | `. \| map(. \| operation)` | `. \| map(. \| upper)` |
| String filtering | `. \| select(. \| contains("text"))` | `. \| select(. \| contains("ERROR"))` |
| String manipulation | `. \| map(. \| replace("a", "b"))` | `. \| map(. \| trim)` |
| Field selection | `\| select_fields(field1,field2)` | `\| select_fields(name,age)` |
| Filtering | `\| select(.field > value)` | `\| select(.age > 30)` |
| Nested filtering | `\| select(.nested.field == value)` | `\| select(.State.Name == "running")` |
| Grouping | `\| group_by(.field)` | `\| group_by(.department)` |
| Counting | `\| count` | `.users \| count` |
| Aggregation | `\| sum/avg/min/max(.field)` | `\| avg(.salary)` |
| Statistics | `\| median/stddev/unique/sort` | `\| median` |
| Info | `\| info` | `. \| info` |
### String Operations (NEW in v0.2.0!)

| Operation | Syntax | Description |
|---|---|---|
| `upper` | `. \| map(. \| upper)` | Convert to uppercase |
| `lower` | `. \| map(. \| lower)` | Convert to lowercase |
| `trim` | `. \| map(. \| trim)` | Remove whitespace (both ends) |
| `trim_start` | `. \| map(. \| trim_start)` | Remove leading whitespace |
| `trim_end` | `. \| map(. \| trim_end)` | Remove trailing whitespace |
| `length` | `. \| map(. \| length)` | Get string length |
| `reverse` | `. \| map(. \| reverse)` | Reverse string |
| `contains(pattern)` | `. \| select(. \| contains("text"))` | Check if contains pattern |
| `starts_with(pattern)` | `. \| select(. \| starts_with("pre"))` | Check if starts with pattern |
| `ends_with(pattern)` | `. \| select(. \| ends_with("suf"))` | Check if ends with pattern |
| `replace(old, new)` | `. \| map(. \| replace("a", "b"))` | Replace text |
| `substring(start, len)` | `. \| map(. \| substring(0, 5))` | Extract substring |
| `split(delimiter)` | `. \| map(. \| split(","))` | Split by delimiter |
| `join(delimiter)` | `.array[] \| join(", ")` | Join array elements with delimiter |
### Statistical Operations (NEW in v0.2.0!)

| Operation | Syntax | Description |
|---|---|---|
| `unique` | `. \| unique` | Remove duplicates |
| `sort` | `. \| sort` | Sort values |
| `median` | `. \| median` | Calculate median |
| `stddev` | `. \| stddev` | Calculate standard deviation |
| `length` | `. \| length` | Get array length |
### Supported Operators

- Comparison: `>`, `<`, `==`, `!=`
- Aggregation: `count`, `sum`, `avg`, `min`, `max`
- Statistics: `median`, `stddev`, `unique`, `sort`, `length`
- Grouping: `group_by`
- Filtering: `select`
- Transformation: `map`
## What's New in v0.2.0

### Major Features
- Plain Text Support: Process log files, configuration files, and any text data
- String Operations: Complete set of string manipulation functions
- Statistical Functions: Built-in median, standard deviation, unique, and sort operations
- Enhanced map() Function: Transform data with powerful string operations
- Colored Output: Beautiful, readable output with automatic TTY detection
### Improvements
- Better error messages with detailed context
- Improved pipeline processing with proper parentheses handling
- Enhanced type inference for CSV data
- More robust file format detection
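"Type inference for CSV" means mapping untyped cell text to booleans, integers, floats, or strings. A minimal sketch of such a rule (an illustration of the idea, not hawk's exact precedence):

```python
def infer(token: str):
    """Map a CSV cell to bool, int, or float; otherwise keep the string."""
    low = token.strip().lower()
    if low in ("true", "false"):
        return low == "true"
    for caster in (int, float):  # try int before float so "30" stays integral
        try:
            return caster(token)
        except ValueError:
            continue
    return token

print([infer(t) for t in ["30", "3.14", "true", "Alice"]])
# [30, 3.14, True, 'Alice']
```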
### New Use Cases Enabled
- Log file analysis and monitoring
- Text data cleaning and normalization
- Statistical analysis of numeric data
- Complex data transformation pipelines
- Configuration file processing
## Contributing

We welcome contributions! Please see our Contributing Guide for details.

### Development Setup

### Running Tests
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Acknowledgments

- Inspired by the simplicity of `awk` and the power of `pandas`
- Built with the amazing Rust ecosystem
- Special thanks to the `serde`, `clap`, `csv`, and `termcolor` crate maintainers
## Related Tools & Comparison
| Tool | Best For | Limitations | hawk Advantage |
|---|---|---|---|
| awk | Text processing, log parsing | Line-based, no JSON/YAML support | Structured data focus, type-aware operations, string functions |
| jq | JSON transformation | JSON-only, complex syntax for data analysis | Multi-format, pandas-like analytics, text processing |
| pandas | Heavy data science | Requires Python setup, overkill for CLI | Lightweight, terminal-native, instant startup |
| sed/grep | Text manipulation | No structured data understanding | Schema-aware processing, statistical functions |
### Why Choose hawk?

**For structured data analysis**, hawk fills the gap between simple text tools and heavy data science frameworks:
```bash
# awk: Limited structured data support
# jq: JSON-only, verbose for analytics
# hawk: Unified, intuitive syntax across all formats
```
**pandas power, awk simplicity:**

```bash
# Complex analytics made simple
# vs pandas: requires Python script with imports, DataFrame setup, etc.
```
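For comparison, the equivalent filter-group-average over the CSV sample above takes this much setup in plain Python (a pandas version is similar in spirit: `read_csv`, a boolean mask, `groupby`, `mean`):

```python
import csv
import io
from collections import defaultdict

data = "name,age,department\nAlice,30,Engineering\nBob,25,Marketing\n"

# select(.age > 20) | group_by(.department) | avg(.age), by hand
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(data)):
    if int(row["age"]) > 20:
        groups[row["department"]].append(int(row["age"]))

print({dept: sum(a) / len(a) for dept, a in groups.items()})
# {'Engineering': 30.0, 'Marketing': 25.0}
```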
**DevOps & log analysis optimized:**

```bash
# Kubernetes config analysis (YAML native)
# Log analysis (NEW in v0.2.0!)
```
Happy data exploring with hawk!
For questions, issues, or feature requests, please visit our GitHub repository.