parsm - Parse 'Em - An 'everything' parser, Sedder, Awkker, Grokker, Grepper
Parsm is the powerful command-line tool that understands structured text better than sed, awk, grep or grok.
Overview
parsm is a multi-format data processor that automatically detects and parses JSON, CSV, TOML, YAML, logfmt, and plain text. It provides powerful filtering and templating capabilities with a simple, intuitive syntax.
Installation
Or build from source:
Quick Start
# Basic usage
# Show comprehensive examples
# Extract a field (most common operation)
|
# Extract nested fields
|
# Filter data based on field values
|
# Filter and format output
|
# Simple template output
|
# Parse and understand text
|
Supported Input Formats
parsm automatically detects and parses these formats:
JSON
CSV
Alice,30,Engineer
Bob,25,Designer
YAML
name: Alice
age: 30
active: true
TOML
name = "Alice"
age = 30
active = true
Logfmt
level=error msg="Database connection failed" service=api duration=1.2s
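To make the key=value structure of that line concrete, here is a minimal Python sketch of how a logfmt line could be split into fields. It is an illustration only, not parsm's actual parser: the function name `parse_logfmt` is hypothetical, and every value is kept as a string.

```python
import shlex

def parse_logfmt(line):
    """Split a logfmt line into a dict of string fields (illustrative only)."""
    fields = {}
    # shlex.split honours double quotes, so msg="a b" stays a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore bare words that carry no '='
            fields[key] = value
    return fields

line = 'level=error msg="Database connection failed" service=api duration=1.2s'
print(parse_logfmt(line))
```

Once the line is a flat mapping like this, names such as `level` or `service` can be addressed like fields from any other format.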
Plain Text
Alice 30 Engineer
Bob 25 Designer
Filter Syntax
Comparison Operators
| Operator | Description | Example |
|---|---|---|
| `==` | Equal to | `name == "Alice"` |
| `!=` | Not equal to | `status != "inactive"` |
| `<` | Less than | `age < 30` |
| `<=` | Less than or equal | `score <= 95` |
| `>` | Greater than | `age > 18` |
| `>=` | Greater than or equal | `score >= 90` |
String Operations
| Operator | Description | Example |
|---|---|---|
| `~` | Contains substring | `email ~ "@company.com"` |
| `^=` | Starts with prefix | `name ^= "A"` |
| `$=` | Ends with suffix | `file $= ".log"` |
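The three string operators map onto familiar string tests. The Python sketch below shows the intended meaning of each one. It is not parsm's implementation: the `STRING_OPS` table and `apply_string_op` helper are hypothetical names, and case-sensitive matching is an assumption.

```python
# Hypothetical operator table: each entry takes (field value, operand).
STRING_OPS = {
    "~": lambda value, operand: operand in value,            # contains substring
    "^=": lambda value, operand: value.startswith(operand),  # starts with prefix
    "$=": lambda value, operand: value.endswith(operand),    # ends with suffix
}

def apply_string_op(op, value, operand):
    """Evaluate `value op operand`; the field value is coerced to a string."""
    return STRING_OPS[op](str(value), operand)

print(apply_string_op("~", "alice@company.com", "@company.com"))
print(apply_string_op("^=", "Alice", "A"))
print(apply_string_op("$=", "server.log", ".log"))
```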
Boolean Logic
| Operator | Description | Example |
|---|---|---|
| `&&` | Logical AND | `age > 18 && active == true` |
| `\|\|` | Logical OR | `role == "admin" \|\| role == "user"` |
| `!` | Logical NOT | `!(status == "disabled")` |
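The comparison and boolean operators compose in the usual way. As a rough mental model, each comparison is a predicate over one record, and `&&`, `||` and `!` combine predicates. The Python sketch below models that. The helper names (`compare`, `all_of`, `any_of`, `negate`) are hypothetical, and treating a missing field as "no match" is an assumption rather than documented parsm behaviour.

```python
import operator

COMPARISONS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def compare(field, op, expected):
    """Build a predicate for `field op expected`."""
    def predicate(record):
        if field not in record:
            return False  # assumption: a missing field never matches
        return COMPARISONS[op](record[field], expected)
    return predicate

def all_of(*predicates):   # models &&
    return lambda record: all(p(record) for p in predicates)

def any_of(*predicates):   # models ||
    return lambda record: any(p(record) for p in predicates)

def negate(predicate):     # models !
    return lambda record: not predicate(record)

record = {"name": "Alice", "age": 30, "active": True, "role": "admin"}

# age > 18 && active == true
print(all_of(compare("age", ">", 18), compare("active", "==", True))(record))
# role == "admin" || role == "user"
print(any_of(compare("role", "==", "admin"), compare("role", "==", "user"))(record))
# !(status == "disabled")
print(negate(compare("status", "==", "disabled"))(record))
```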
Field Access
Simple Fields
Nested Fields (JSON/YAML/TOML)
CSV Fields
CSV columns are automatically named field_0, field_1, etc.:
Text Words
Plain text words are named word_0, word_1, etc.:
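The positional naming described for CSV columns and plain-text words can be pictured with a short Python sketch. The function names are hypothetical, and the sketch only shows how the `field_N` and `word_N` names line up with positions. It does not reproduce parsm's parser.

```python
import csv
import io

def name_csv_fields(line):
    """Map CSV columns to field_0, field_1, ... by position."""
    row = next(csv.reader(io.StringIO(line)))
    return {f"field_{i}": value for i, value in enumerate(row)}

def name_text_words(line):
    """Map whitespace-separated words to word_0, word_1, ... by position."""
    return {f"word_{i}": word for i, word in enumerate(line.split())}

print(name_csv_fields("Alice,30,Engineer"))
print(name_text_words("Bob 25 Designer"))
```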
Syntax Overview
The parsm DSL has three main components with distinct, unambiguous syntax:
Field Selectors (Data Extraction)
Extract specific fields using simple, unambiguous syntax - the most common operation:
Key principle: Bare identifiers like name are ALWAYS field selectors, never filters or templates.
Cross-format compatibility: Field selector syntax works identically across JSON, YAML, TOML, and other structured formats:
# These work the same for JSON, YAML, and TOML:
Templates (Dynamic Output)
Templates format output with field values using explicit variable syntax:
} # Variables with ${...}
} # Mixed template with literals
} # Original input (requires braces)
} # Nested fields in templates
Literal Text (Static Output)
Braces without variables produce literal text:
} # Outputs literal text "name"
} # Outputs literal text "Hello world"
} # Outputs literal text with dollar sign
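The template and literal-text rules can be summarised as: `${...}` is replaced by the named field's value, and everything else is copied through unchanged, including a lone dollar sign. The Python sketch below models just that substitution step on the template body. The `render` and `lookup` names are hypothetical, the template's outer delimiters are not modelled, and rendering a missing field as an empty string is an assumption.

```python
import re

# Matches ${name} or ${user.email}; a bare "$" without braces never matches.
VARIABLE = re.compile(r"\$\{([^}]+)\}")

def lookup(record, path):
    """Follow a dotted path such as 'user.email' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def render(template_body, record):
    """Replace each ${path} with the field value; leave other text as-is."""
    def substitute(match):
        value = lookup(record, match.group(1).strip())
        return "" if value is None else str(value)  # assumption: missing -> ""
    return VARIABLE.sub(substitute, template_body)

record = {"name": "Alice", "age": 30, "user": {"email": "alice@example.com"}}
print(render("${name} is ${age} years old", record))
print(render("Email: ${user.email}", record))
print(render("Hello world", record))
print(render("Price: $5", record))
```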
Filters (Data Processing)
Filter data using comparison operators with field selectors:
!() # Negation
&&
Examples
# Field extraction (most common - simple syntax)
|
# Output: "Alice"
|
# Output: "alice@example.com"
# Template with variables (dynamic output)
|
# Output: Alice is 30 years old
|
# Output: Alice
# Literal templates (static output)
|
# Output: name
# Filtering with field selectors
|
# Output: {"name": "Alice", "age": 30}
# Combined filtering and templating
|
# Output: Alice is 30 years old
# Original input variable
|
# Output: Original: {"name": "Alice"} → Name: Alice
# CSV positional fields
|
# Output: Employee: Alice, Age: 30, Role: Engineer
# Nested JSON fields
| \
# Output: User: Alice, Email: alice@example.com
Field Selection
Extract specific fields with simple, unambiguous syntax - the most intuitive operation in parsm:
# Simple field extraction (bare identifiers)
|
# Output: "Alice"
|
# Output: 30
# Nested field access (dot notation)
|
# Output: "alice@example.com"
|
# Output: "localhost"
# Array element access (index notation)
|
# Output: "apple"
|
# Output: 87
# Complex nested structures
|
# Output: "Alice"
# Special field names (quoted when needed)
|
# Output: "value"
|
# Output: "data"
# Works consistently across all formats
| | |
# Quoted syntax works the same way
| | |
# Complex field names across formats
| | |
# Extract entire objects or arrays
|
# Output: {"status": "running", "pid": 1234}
|
# Output:
# "Alice"
# "Bob"
Key Benefits:
- Simplest syntax: `name` extracts the "name" field - no quotes needed
- Unambiguous: Bare identifiers are ALWAYS field selectors, never filters
- Intuitive: Works exactly as users expect for the most common operation
- Powerful: Supports nested objects, arrays, and complex data structures
- Cross-format: Same syntax works for JSON, YAML, TOML, and other formats
- Flexible quoting: Use quotes only when field names have special characters or spaces
Quoting Rules:
- Unquoted: `name`, `user.email`, `items.0` - for simple field names
- Quoted: `"field-name"`, `"field name"`, `"special.field"` - when needed for special characters or spaces
- Both work: `package.name` and `"package.name"` are identical - use whichever you prefer
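One way to picture how a selector resolves against parsed data is the Python sketch below. It strips optional surrounding quotes, splits on dots, and treats numeric segments as array indexes. The `select` function is hypothetical, and the quote handling is an assumption based on the rule that `package.name` and `"package.name"` are identical. parsm's real resolver may differ.

```python
def select(record, selector):
    """Resolve a selector such as user.email or items.0 against nested data."""
    path = selector.strip()
    # Assumption: surrounding double quotes only delimit the selector text.
    if len(path) >= 2 and path[0] == '"' and path[-1] == '"':
        path = path[1:-1]
    current = record
    for segment in path.split("."):
        if isinstance(current, list) and segment.isdigit():
            index = int(segment)
            if index >= len(current):
                return None  # index past the end of the array
            current = current[index]
        elif isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return None  # missing field
    return current

record = {
    "name": "Alice",
    "user": {"email": "alice@example.com"},
    "items": ["apple", "banana"],
    "field-name": "value",
    "package": {"name": "parsm"},
}
print(select(record, "user.email"))
print(select(record, "items.0"))
print(select(record, '"field-name"'))
print(select(record, '"package.name"') == select(record, "package.name"))
```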
Complete Examples
JSON Processing
# Extract specific fields (simple syntax)
|
|
# Basic filtering
|
# Filter and format
|
# Complex nested data
| \
# Array processing
|
# Extract entire objects
|
CSV Processing
# Filter CSV data
|
# Multiple conditions
|
# Include original data
|
Log Processing
# Filter error logs
| \
# Complex log filtering
|
# Performance monitoring
|
YAML/TOML Processing
# Extract configuration values
| | |
# Filter configuration
|
# Convert format with nested access
|
# Extract configuration sections
| |
# Real-world Cargo.toml examples
| | |
Multi-line Processing
# Process log files
|
# Filter and transform data
|
# Real-time monitoring
| \
Advanced Features
Complex Boolean Logic
# Multiple conditions
# Negation
# String operations
Error Handling
- First line errors: Fatal (format detection failure)
- Subsequent errors: Warnings with continued processing
- Missing fields: Warnings for templates, silent for filters
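The first two rules above can be sketched as a small streaming loop in Python. This is a simplified model that handles JSON lines only. The `FatalParseError` class and `process_lines` function are hypothetical names, and the warning text is made up for illustration.

```python
import json
import sys

class FatalParseError(Exception):
    """Raised when the very first line cannot be parsed."""

def process_lines(lines):
    """Yield parsed records one at a time; fail on line 1, warn afterwards."""
    for number, line in enumerate(lines, start=1):
        try:
            yield json.loads(line)
        except json.JSONDecodeError as error:
            if number == 1:
                # Nothing has been parsed yet, so the format is unknown: stop.
                raise FatalParseError(f"line 1: {error}") from error
            # Later failures are reported and skipped; processing continues.
            print(f"warning: line {number}: {error}", file=sys.stderr)

good = ['{"name": "Alice"}', "not json", '{"name": "Bob"}']
print(list(process_lines(good)))
```

Because `process_lines` is a generator, only one line is held at a time, which is the same property the streaming design relies on.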
Performance
- Streaming: Processes line-by-line for constant memory usage
- Format detection: Automatic with intelligent fallback
- Large files: Efficient processing of gigabyte-scale data
Command Line Interface
)
)
Usage Patterns
# Just parsing (convert to JSON)
|
# Field extraction (most common - simple syntax)
|
|
# Filtering only
|
# Template only (simple variable)
|
# Template only (complex formatting)
|
# Filter and template
|
# Literal text output
|
Comparison with Other Tools
| Feature | parsm | jq | awk | sed |
|---|---|---|---|---|
| Multi-format input | ✅ JSON, CSV, YAML, TOML, logfmt, text | JSON only | Text | Text |
| Auto-detection | ✅ Automatic | Manual | Manual | Manual |
| Field extraction | ✅ Simple `name` syntax | ✅ `.name` syntax | Limited | No |
| Filter syntax | ✅ Simple expressions | JQ query language | Programming | Regex |
| Template output | ✅ `${field}` syntax | ✅ Complex | ✅ `$1`, `$2` | Limited |
| Learning curve | ✅ Low | Medium-High | High | Medium |
| Boolean logic | ✅ `&&`, `\|\|`, `!` | ✅ Complex | ✅ Programming | Limited |
| Nested fields | ✅ `user.email` | ✅ `.user.email` | Limited | No |
| Performance | Good | Excellent | Excellent | Excellent |
When to use parsm
- Field extraction: When you need simple `name` syntax instead of jq's `.name`
- Multi-format data: When working with mixed JSON, CSV, YAML, etc.
- Simple filtering: When jq syntax is too complex
- Quick transformations: When awk programming is overkill
- Log processing: Especially structured logs (JSON, logfmt)
- Data exploration: Quick inspection and filtering of structured data
- Intuitive syntax: When you want field access to "just work" without quotes or dots
Migration from other tools
# From jq
# From awk
# From grep + cut
|
Architecture Overview
Data Flow
Input → Auto-detect Format → Parse → Normalize to JSON → Filter → Template → Output
Components
- Parser: Auto-detects and parses multiple formats
- Filter Engine: Evaluates boolean expressions
- Template Engine: Renders output with field interpolation
- DSL: Simple domain-specific language for expressions
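To show how these components fit together, here is a toy end-to-end pipeline in Python following the data flow above: detect, parse, normalize to a dict, filter, render. Everything in it is an assumption made for illustration. The detection heuristics are deliberately naive, format detection runs per line rather than once, and `detect_and_parse` and `run` are hypothetical names unrelated to parsm's internals.

```python
import json
import shlex

def detect_and_parse(line):
    """Return (format_name, record) using deliberately naive heuristics."""
    stripped = line.strip()
    if stripped.startswith("{"):
        try:
            return "json", json.loads(stripped)
        except json.JSONDecodeError:
            pass  # fall through to the other formats
    if "=" in stripped:
        try:
            tokens = shlex.split(stripped)
        except ValueError:  # unbalanced quotes: not logfmt
            tokens = []
        pairs = [token.partition("=") for token in tokens]
        if pairs and all(sep for _, sep, _ in pairs):
            return "logfmt", {key: value for key, _, value in pairs}
    if "," in stripped:
        cells = stripped.split(",")
        return "csv", {f"field_{i}": cell for i, cell in enumerate(cells)}
    words = stripped.split()
    return "text", {f"word_{i}": word for i, word in enumerate(words)}

def run(lines, keep, render):
    """Parse each line, keep records passing the filter, render the rest."""
    for line in lines:
        _, record = detect_and_parse(line)
        if keep(record):
            yield render(record)

lines = ['{"name": "Alice", "age": 30}', '{"name": "Bob", "age": 17}']
adults = run(
    lines,
    keep=lambda r: r.get("age", 0) > 18,
    render=lambda r: f'{r["name"]} is {r["age"]} years old',
)
print(list(adults))
```

Normalizing every format to the same dict shape is what lets one filter and one template work regardless of the input format.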
Key Design Decisions
- Unambiguous field selection: Bare identifiers like `name` are always field selectors
- JSON normalization: All formats convert to JSON for uniform processing
- Streaming processing: Line-by-line for memory efficiency
- Automatic format detection: Users don't specify input format
- Simple syntax: Easy to learn and remember, prioritizing the most common operations
- Error tolerance: Continues processing on non-fatal errors
Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Add tests for new functionality
- Ensure all tests pass: `cargo test`
- Run formatting: `cargo fmt`
- Run linting: `cargo clippy`
- Submit a pull request
Development
# Build
# Test
# Run with examples
# Test with sample data
|
License
Changelog
See CHANGELOG.md for version history.