# hawk
Modern data analysis tool for structured data and text files (JSON, YAML, CSV, Text)
[Rust](https://www.rust-lang.org/)
[MIT License](LICENSE)
[crates.io](https://crates.io/crates/hawk-data)
**hawk** combines the simplicity of `awk` with the power of `pandas` for data exploration. Unlike traditional text tools that work line-by-line, hawk understands both structured data and plain text natively. Unlike heavy data science tools that require complex setup, hawk brings analytics to your terminal with a single command.
**Perfect for:**
- **Data Scientists**: Quick CSV/JSON analysis without Python overhead
- **DevOps Engineers**: Kubernetes YAML, Docker Compose, log analysis
- **API Developers**: REST response exploration and validation
- **Business Analysts**: Instant insights from structured datasets
- **System Administrators**: Log file analysis and text processing
## Features
- **Universal format support**: JSON, YAML, CSV, **and plain text** with automatic detection
- **Pandas-like operations**: Filtering, grouping, aggregation, **and string manipulation**
- **Smart output formatting**: Colored tables, lists, or JSON based on data structure
- **Fast and lightweight**: Built in Rust for performance
- **Developer-friendly**: Perfect for DevOps, data analysis, and API exploration
- **Type-aware**: Understands numbers, strings, and booleans with intelligent conversion
- **Unified syntax**: Same query language across all formats
- **String operations**: Powerful text processing capabilities
- **Statistical functions**: Built-in median, stddev, unique, and sort operations
- **Beautiful output**: Automatic color coding with TTY detection
## Quick Start
### Installation
```bash
# Install via Homebrew (macOS/Linux)
brew install kyotalab/tools/hawk
# Install via Cargo (if Rust is installed)
cargo install hawk-data
# Verify installation
hawk --version
```
### Basic Usage
```bash
# Explore data structure
hawk '. | info' data.json
# Access fields
hawk '.users[0].name' users.json
hawk '.users.name' users.csv
# Filter and aggregate
hawk '.users[] | select(.age > 25) | avg(.age)' users.json
# Process text files (NEW in v0.2.0!)
hawk '. | select(. | contains("ERROR"))' app.log
```
## Query Syntax
### Field Access
```bash
.field # Access field
.array[0] # Access array element
.array[] # Access all array elements
.nested.field # Deep field access
.array[0].nested.field # Complex nested access
.array[].nested[] # Multi-level array expansion
```
### Text Processing (NEW in v0.2.0!)
```bash
# String operations
. | map(. | trim) # Remove whitespace (both ends)
. | map(. | trim_start) # Remove leading whitespace
. | map(. | trim_end) # Remove trailing whitespace
. | map(. | length) # Get string length
. | map(. | reverse) # Reverse string
# String manipulation
. | map(. | split(",")) # Split by delimiter
.array[] | join(", ") # Join array elements
# String filtering
. | select(. | contains("ERROR"))   # Contains pattern
. | select(. | starts_with("2024")) # Starts with pattern
. | select(. | ends_with(".log"))   # Ends with pattern
```
### Statistical Operations (NEW in v0.2.0!)
```bash
. | median # Calculate median
. | stddev # Calculate standard deviation
. | unique # Remove duplicates
. | sort # Sort values
. | length # Get array length
# With field specification
.data[] | sort(.timestamp) # Sort by timestamp
```
### Filtering
```bash
. | select(.active == true) # Boolean comparison
. | select(.status != "inactive") # Not equal
. | select(.State.Name == "running") # Nested field filtering
# Complex string filtering (NEW!)
. | select(. | contains("ERROR")) # Keep lines containing a pattern
```
### Data Transformation (NEW in v0.2.0!)
```bash
# Transform data with map
.data[] | map(.text | trim | upper) # Clean and normalize
# Complex transformations
.users[] | map(.name | trim | lower) # Field-level string pipeline
```
### Field Selection
```bash
.users[] | select_fields(name,age) # Keep only the listed fields
```
### Aggregation
```bash
. | count # Count items
. | sum(.amount) # Sum values
. | avg(.score) # Average values
. | min(.price) # Minimum value
. | max(.price) # Maximum value
```
### Grouping
```bash
. | group_by(.region) | avg(.sales) # Average by group
. | group_by(.type) | sum(.amount) # Sum by group
```
### Complex Queries
```bash
# Multi-step analysis
hawk '.employees[] | select(.age > 30) | group_by(.department) | avg(.salary)' data.json
# Multi-level array processing
hawk '.Reservations[].Instances[] | select(.State.Name == "running")' instances.json
# Text processing pipeline (NEW!)
hawk '. | select(. | contains("ERROR")) | count' app.log
# Mixed data and text analysis
hawk '.logs[] | map(.message | trim | upper)' events.json
```
## Use Cases
### Log File Analysis (NEW in v0.2.0!)
```bash
# Extract error logs
hawk '. | select(. | contains("ERROR"))' app.log
# Count warning lines
hawk '. | select(. | contains("WARN")) | count' app.log
# Deduplicate and sort log lines
hawk '. | unique | sort' access.log
```
### Text Data Processing
```bash
# Clean and normalize text data
hawk '. | map(. | trim | lower)' notes.txt
# Remove leading or trailing whitespace
hawk '. | map(. | trim_start)' data.txt
# Find files by extension
hawk '. | select(. | ends_with(".log"))' files.txt
# Join processed data
hawk '.items[] | join(", ")' data.json
# Measure line lengths
hawk '. | map(. | length)' document.txt
```
### API Response Analysis
```bash
# Analyze a GitHub API response
hawk '.[] | select(.stargazers_count > 100)' repos.json
# Extract specific fields
hawk '.[] | select_fields(name,language)' repos.json
```
### DevOps & Infrastructure
```bash
# Kubernetes resource analysis
hawk '.spec.containers[].image' deployment.yaml
# AWS EC2 analysis
hawk '.Reservations[].Instances[] | select(.State.Name == "running")' instances.json
# Docker Compose services
hawk '.services' docker-compose.yml
# Configuration file analysis
hawk '. | select(. | contains("="))' app.conf
```
### Data Analysis
```bash
# Sales data analysis
hawk '.sales[] | group_by(.region) | sum(.amount)' sales.json
# Statistical analysis (NEW!)
hawk '.scores[] | median' results.json
# Data cleaning and normalization
hawk '.records[] | map(.name | trim | upper)' data.csv
```
## Supported Formats
### JSON
```json
{
"users": [
{ "name": "Alice", "age": 30, "department": "Engineering" },
{ "name": "Bob", "age": 25, "department": "Marketing" }
]
}
```
### YAML
```yaml
users:
  - name: Alice
    age: 30
    department: Engineering
  - name: Bob
    age: 25
    department: Marketing
```
### CSV
```csv
name,age,department
Alice,30,Engineering
Bob,25,Marketing
```
### Plain Text (NEW in v0.2.0!)
```
2024-01-15 09:00:01 INFO Application started
2024-01-15 09:00:02 ERROR Failed to connect
2024-01-15 09:00:03 WARN High memory usage
```
All formats support the same query syntax!
## Output Formats
### Smart Auto-Detection (default)
```bash
hawk '.users[0].name' data.json # → Alice (list)
hawk '.users[]' data.json # → Colored table format
hawk '.config' data.json # → JSON format
```
### Explicit Format Control
```bash
hawk '.users[]' --format table # Force table
hawk '.users[]' --format json # Force JSON
hawk '.users.name' --format list # Force list
```
### Colored Output (NEW in v0.2.0!)
- **Automatic TTY detection**: Colors in terminal, plain text in pipes
- **Beautiful tables**: Headers in blue, numbers in green, booleans in yellow
- **Readable JSON**: Syntax highlighting for better readability
- **NO_COLOR support**: Respects NO_COLOR environment variable
## Advanced Examples
### Complex Data Analysis
```bash
# Multi-step pipeline analysis
hawk '.orders[] | select(.total > 100) | group_by(.customer) | sum(.total)' orders.json
# Nested data exploration
hawk '.teams[].members[] | select(.role == "lead")' org.yaml
# Cross-format analysis (same query works on JSON, YAML, and CSV)
hawk '.users[] | count' users.json
hawk '.users[] | count' users.yaml
```
### Real-world Log Analysis (NEW!)
```bash
# Extract error lines with their timestamps
hawk '. | select(. | contains("ERROR"))' app.log
# Count error lines
hawk '. | select(. | contains("ERROR")) | count' app.log
# List distinct matching lines
hawk '. | select(. | contains("ERROR")) | unique' app.log
```
### Text Processing Workflows
```bash
# 1. Clean configuration files
hawk '. | map(. | trim)' app.conf
# 2. Normalize case
hawk '. | map(. | lower)' names.txt
# 3. Split delimited text data
hawk '. | map(. | split(","))' data.txt
# 4. Extract lines matching a pattern
hawk '. | select(. | starts_with("export"))' profile.sh
```
### Data Processing Workflows
```bash
# 1. Explore structure
hawk '. | info' data.json
# 2. Filter relevant data
hawk '.records[] | select(.status == "active")' data.json
# 3. Clean and normalize (NEW!)
hawk '.records[] | map(.name | trim | upper)' data.json
# 4. Statistical analysis (NEW!)
hawk '.records[].score | median' data.json
# 5. Export results
hawk '.summary[]' data.json --format csv > results.csv
```
## Installation & Setup
### Homebrew (Recommended)
```bash
# Install via Homebrew
brew install kyotalab/tools/hawk
# Or tap the repository first, then install
brew tap kyotalab/tools
brew install hawk
```
### Cargo (Rust Package Manager)
```bash
cargo install hawk-data
```
### Build from Source
```bash
# Prerequisites: Rust 1.70 or later
git clone https://github.com/kyotalab/hawk.git
cd hawk
cargo build --release
# Add to PATH
sudo cp target/release/hawk /usr/local/bin/
```
### Binary Releases
Download pre-built binaries from [GitHub Releases](https://github.com/kyotalab/hawk/releases):
- Linux (x86_64)
- macOS (Intel & Apple Silicon)
## Documentation
### Command Line Options
```bash
hawk --help # Show help
hawk --version # Show version
hawk '.query' file.json # Basic usage
hawk '.query' --format json # Specify output format
```
### Query Language Reference
| Operation | Syntax | Example |
|---|---|---|
| Field access | `.field` | `.name` |
| Array index | `.array[0]` | `.users[0]` |
| Array iteration | `.array[]` | `.users[]` |
| Multi-level arrays | `.array[].nested[]` | `.Reservations[].Instances[]` |
| **Text processing** | `. \| map(. \| operation)` | `. \| map(. \| upper)` |
| **String filtering** | `. \| select(. \| contains("text"))` | `. \| select(. \| contains("ERROR"))` |
| **String manipulation** | `. \| map(. \| replace("a", "b"))` | `. \| map(. \| trim)` |
| Field selection | ` \| select_fields(field1,field2)` | `\| select_fields(name,age)` |
| Filtering | ` \| select(.field > value)` | `\| select(.age > 30)` |
| Nested filtering | ` \| select(.nested.field == value)` | `\| select(.State.Name == "running")` |
| Grouping | ` \| group_by(.field)` | `\| group_by(.department)` |
| Counting | ` \| count` | `.users \| count` |
| Aggregation | ` \| sum/avg/min/max(.field)` | `\| avg(.salary)` |
| **Statistics** | ` \| median/stddev/unique/sort` | `\| median` |
| Info | ` \| info` | `. \| info` |
### String Operations (NEW in v0.2.0!)
| Operation | Usage | Description |
|---|---|---|
| `upper` | `. \| map(. \| upper)` | Convert to uppercase |
| `lower` | `. \| map(. \| lower)` | Convert to lowercase |
| `trim` | `. \| map(. \| trim)` | Remove whitespace (both ends) |
| `trim_start` | `. \| map(. \| trim_start)` | Remove leading whitespace |
| `trim_end` | `. \| map(. \| trim_end)` | Remove trailing whitespace |
| `length` | `. \| map(. \| length)` | Get string length |
| `reverse` | `. \| map(. \| reverse)` | Reverse string |
| `contains(pattern)` | `. \| select(. \| contains("text"))` | Check if contains pattern |
| `starts_with(pattern)` | `. \| select(. \| starts_with("pre"))` | Check if starts with pattern |
| `ends_with(pattern)` | `. \| select(. \| ends_with("suf"))` | Check if ends with pattern |
| `replace(old, new)` | `. \| map(. \| replace("a", "b"))` | Replace text |
| `substring(start, len)` | `. \| map(. \| substring(0, 5))` | Extract substring |
| `split(delimiter)` | `. \| map(. \| split(","))` | Split by delimiter |
| `join(delimiter)` | `.array[] \| join(", ")` | Join array elements with delimiter |
### Statistical Operations (NEW in v0.2.0!)
| Operation | Usage | Description |
|---|---|---|
| `unique` | `. \| unique` | Remove duplicates |
| `sort` | `. \| sort` | Sort values |
| `median` | `. \| median` | Calculate median |
| `stddev` | `. \| stddev` | Calculate standard deviation |
| `length` | `. \| length` | Get array length |
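As a reference for what these statistics compute, here is a sketch using Python's standard `statistics` module. Whether hawk's `stddev` is the population or sample standard deviation is not stated here, so the population version below is an assumption:

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

median = statistics.median(values)   # middle value of the sorted data
stddev = statistics.pstdev(values)   # population standard deviation (assumed)
unique_sorted = sorted(set(values))  # effect of `unique` followed by `sort`

print(median, stddev, unique_sorted)
```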
### Supported Operators
- **Comparison**: `>`, `<`, `==`, `!=`
- **Aggregation**: `count`, `sum`, `avg`, `min`, `max`
- **Statistics**: `median`, `stddev`, `unique`, `sort`, `length`
- **Grouping**: `group_by`
- **Filtering**: `select`
- **Transformation**: `map`
## What's New in v0.2.0
### Major Features
- **Plain Text Support**: Process log files, configuration files, and any text data
- **String Operations**: Complete set of string manipulation functions
- **Statistical Functions**: Built-in median, standard deviation, unique, and sort operations
- **Enhanced map() Function**: Transform data with powerful string operations
- **Colored Output**: Beautiful, readable output with automatic TTY detection
### Improvements
- Better error messages with detailed context
- Improved pipeline processing with proper parentheses handling
- Enhanced type inference for CSV data
- More robust file format detection
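The CSV type inference mentioned above can be sketched in a few lines. This is an illustration of the general technique, not hawk's actual implementation:

```python
def infer(value: str):
    """Infer a typed value (bool, int, float, or str) from a raw CSV field."""
    v = value.strip()
    if v.lower() in ("true", "false"):
        return v.lower() == "true"
    try:
        return int(v)
    except ValueError:
        pass
    try:
        return float(v)
    except ValueError:
        return v  # fall back to plain string

row = [infer(x) for x in ["Alice", "30", "3.5", "true"]]
# row == ["Alice", 30, 3.5, True]
```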
### New Use Cases Enabled
- Log file analysis and monitoring
- Text data cleaning and normalization
- Statistical analysis of numeric data
- Complex data transformation pipelines
- Configuration file processing
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
git clone https://github.com/kyotalab/hawk.git
cd hawk
cargo build
cargo test
```
### Running Tests
```bash
cargo test # Run all tests
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Inspired by the simplicity of `awk` and the power of `pandas`
- Built with the amazing Rust ecosystem
- Special thanks to the `serde`, `clap`, `csv`, and `termcolor` crate maintainers
## Related Tools & Comparison
| Tool | Best For | Limitations | hawk's Advantage |
|---|---|---|---|
| **awk** | Text processing, log parsing | Line-based, no JSON/YAML support | Structured data focus, type-aware operations, string functions |
| **jq** | JSON transformation | JSON-only, complex syntax for data analysis | Multi-format, pandas-like analytics, text processing |
| **pandas** | Heavy data science | Requires Python setup, overkill for CLI | Lightweight, terminal-native, instant startup |
| **sed/grep** | Text manipulation | No structured data understanding | Schema-aware processing, statistical functions |
### Why Choose hawk?
**For structured data analysis**, hawk fills the gap between simple text tools and heavy data science frameworks:
```bash
# awk: Limited structured data support
awk -F',' '$3 > 30 {print $1}' data.csv
# jq: JSON-only, verbose for analytics
jq '.[] | select(.age > 30) | .name' data.json
# hawk: Unified, intuitive syntax across all formats
hawk '.[] | select(.age > 30) | .name' data.json # Same syntax for JSON
hawk '.[] | select(.age > 30) | .name' data.yaml # Same syntax for YAML
hawk '. | select(. | contains("age=30"))' data.txt # Even works for text!
```
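For contrast, the same filter in plain Python (stdlib only) shows the boilerplate a one-line hawk query replaces. The inline data stands in for a `data.json` file and is illustrative:

```python
import json

# Inline stand-in for the contents of data.json
raw = '[{"name": "Alice", "age": 35}, {"name": "Bob", "age": 25}]'
records = json.loads(raw)

# Equivalent of: hawk '.[] | select(.age > 30) | .name' data.json
names = [rec["name"] for rec in records if rec["age"] > 30]
print(names)  # ['Alice']
```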
**pandas power, awk simplicity**:
```bash
# Complex analytics made simple
hawk '.employees[] | group_by(.department) | avg(.salary)' staff.json
```
**DevOps & log analysis optimized**:
```bash
# Kubernetes config analysis (YAML native)
hawk '.spec.containers[].image' deployment.yaml
# Log analysis (NEW in v0.2.0!)
hawk '. | select(. | contains("ERROR")) | count' app.log
```
---
**Happy data exploring with hawk!**
For questions, issues, or feature requests, please visit our [GitHub repository](https://github.com/kyotalab/hawk).