parsm 0.2.0

Multi-format data processor that understands structured text better than sed or awk. Supports JSON, CSV, YAML, TOML, logfmt, and plain text with powerful filtering and templating.
Documentation

parsm - Parse 'Em - An 'everything' parser, Sedder, Awkker, Grokker, Grepper

Parsm is the powerful command-line tool that understands structured text better than sed, awk, grep or grok.

Overview

parsm is a multi-format data processor that automatically detects and parses JSON, CSV, TOML, YAML, logfmt, and plain text. It provides powerful filtering and templating capabilities with a simple, intuitive syntax.

Installation

cargo install --path .

Or build from source:

git clone <repository-url>
cd parsm
cargo build --release
./target/release/parsm --examples

Quick Start

# Basic usage
parsm [FILTER] [TEMPLATE]

# Show comprehensive examples
parsm --examples

# Extract a field (most common operation)
echo '{"name": "Alice", "age": 30}' | parsm 'name'

# Extract nested fields
echo '{"user": {"email": "alice@example.com"}}' | parsm 'user.email'

# Filter data based on field values
echo '{"name": "Alice", "age": 30}' | parsm 'age > 25'

# Filter and format output
echo '{"name": "Alice", "age": 30}' | parsm 'age > 25 {${name} is ${age} years old}'

# Simple template output
echo '{"name": "Alice", "age": 30}' | parsm '$name'

# Parse and understand text
echo "a dog is an excellent companion" | parsm 'word_1 == "dog" {The cat would not say $word_4}'

Supported Input Formats

parsm automatically detects and parses these formats:

JSON

{"name": "Alice", "age": 30, "active": true}

CSV

Alice,30,Engineer
Bob,25,Designer

YAML

name: Alice
age: 30
active: true

TOML

name = "Alice"
age = 30
active = true

Logfmt

level=error msg="Database connection failed" service=api duration=1.2s

Plain Text

Alice 30 Engineer
Bob 25 Designer

Filter Syntax

Comparison Operators

Operator Description Example
== Equal to name == "Alice"
!= Not equal to status != "inactive"
< Less than age < 30
<= Less than or equal score <= 95
> Greater than age > 18
>= Greater than or equal score >= 90

String Operations

Operator Description Example
~ Contains substring email ~ "@company.com"
^= Starts with prefix name ^= "A"
$= Ends with suffix file $= ".log"

Boolean Logic

Operator Description Example
&& Logical AND age > 18 && active == true
|| Logical OR role == "admin" || role == "user"
! Logical NOT !(status == "disabled")

Field Access

Simple Fields

name == "Alice"
age > 25
active == true

Nested Fields (JSON/YAML/TOML)

user.email == "alice@example.com"
config.database.host == "localhost"
metrics.cpu.usage > 80

CSV Fields

CSV columns are automatically named field_0, field_1, etc.:

field_0 == "Alice"    # First column
field_1 > "25"        # Second column (string comparison)
field_2 == "Engineer" # Third column

Text Words

Plain text words are named word_0, word_1, etc.:

word_0 == "Alice"     # First word
word_1 > "25"         # Second word
word_2 == "Engineer"  # Third word

Syntax Overview

The parsm DSL has three main components with distinct, unambiguous syntax:

Field Selectors (Data Extraction)

Extract specific fields using simple, unambiguous syntax - the most common operation:

name                     # Simple field extraction
user.email               # Nested field access
items.0                  # Array element access  
"field with spaces"      # Quoted field names (when needed)
'special-field'          # Single-quoted alternatives
"dev-dependencies.lib"   # Complex nested paths with special characters

Key principle: Bare identifiers like name are ALWAYS field selectors, never filters or templates.

Cross-format compatibility: Field selector syntax works identically across JSON, YAML, TOML, and other structured formats:

# These work the same for JSON, YAML, and TOML:
parsm 'package.name'           # Extract nested field
parsm '"package.name"'         # Same with quotes  
parsm '"field-with-hyphens"'   # Special characters
parsm '"field with spaces"'    # Spaces in field names

Templates (Dynamic Output)

Templates format output with field values using explicit variable syntax:

{${name} is ${age} years old}    # Variables with ${...}
$name                            # Simple variable shorthand
{Hello ${name}!}                 # Mixed template with literals
{${0}}                          # Original input (requires braces)
{User: ${user.name}}            # Nested fields in templates

Literal Text (Static Output)

Braces without variables produce literal text:

{name}                   # Outputs literal text "name"
{Hello world}            # Outputs literal text "Hello world"
{Price: $100}            # Outputs literal text with dollar sign

Filters (Data Processing)

Filter data using comparison operators with field selectors:

age > 25                 # Numeric comparison
name == "Alice"          # String equality
user.active == true      # Boolean comparison
!(status == "disabled")  # Negation
name == "Alice" && age > 25  # Boolean logic

Examples

# Field extraction (most common - simple syntax)
echo '{"name": "Alice", "age": 30}' | parsm 'name'
# Output: "Alice"

echo '{"user": {"email": "alice@example.com"}}' | parsm 'user.email'  
# Output: "alice@example.com"

# Template with variables (dynamic output)
echo '{"name": "Alice", "age": 30}' | parsm '{${name} is ${age} years old}'
# Output: Alice is 30 years old

echo '{"name": "Alice", "age": 30}' | parsm '$name'
# Output: Alice

# Literal templates (static output)
echo '{"name": "Alice", "age": 30}' | parsm '{name}'
# Output: name

# Filtering with field selectors
echo '{"name": "Alice", "age": 30}' | parsm 'age > 25'
# Output: {"name": "Alice", "age": 30}

# Combined filtering and templating
echo '{"name": "Alice", "age": 30}' | parsm 'age > 25 {${name} is ${age} years old}'
# Output: Alice is 30 years old

# Original input variable  
echo '{"name": "Alice"}' | parsm '{Original: ${0} → Name: ${name}}'
# Output: Original: {"name": "Alice"} → Name: Alice

# CSV positional fields
echo 'Alice,30,Engineer' | parsm '{Employee: ${1}, Age: ${2}, Role: ${3}}'
# Output: Employee: Alice, Age: 30, Role: Engineer

# Nested JSON fields
echo '{"user": {"name": "Alice", "email": "alice@example.com"}}' | \
  parsm '{User: ${user.name}, Email: ${user.email}}'
# Output: User: Alice, Email: alice@example.com

Field Selection

Extract specific fields with simple, unambiguous syntax - the most intuitive operation in parsm:

# Simple field extraction (bare identifiers)
echo '{"name": "Alice", "age": 30}' | parsm 'name'
# Output: "Alice"

echo '{"name": "Alice", "age": 30}' | parsm 'age'
# Output: 30

# Nested field access (dot notation)
echo '{"user": {"email": "alice@example.com"}}' | parsm 'user.email'
# Output: "alice@example.com"

echo '{"config": {"database": {"host": "localhost"}}}' | parsm 'config.database.host'
# Output: "localhost"

# Array element access (index notation)
echo '{"items": ["apple", "banana", "cherry"]}' | parsm 'items.0'
# Output: "apple"

echo '{"scores": [95, 87, 92]}' | parsm 'scores.1'
# Output: 87

# Complex nested structures
echo '{"users": [{"name": "Alice", "role": "admin"}]}' | parsm 'users.0.name'
# Output: "Alice"

# Special field names (quoted when needed)
echo '{"field name": "value"}' | parsm '"field name"'
# Output: "value"

echo '{"special-field": "data"}' | parsm "'special-field'"
# Output: "data"

# Works consistently across all formats
echo '{"package": {"name": "test"}}' | parsm 'package.name'           # JSON
echo 'package: {name: test}' | parsm 'package.name'                  # YAML  
echo '[package]\nname = "test"' | parsm 'package.name'               # TOML

# Quoted syntax works the same way
echo '{"package": {"name": "test"}}' | parsm '"package.name"'         # JSON
echo 'package: {name: test}' | parsm '"package.name"'                # YAML
echo '[package]\nname = "test"' | parsm '"package.name"'             # TOML

# Complex field names across formats
echo '{"dev-dependencies": {"my-lib": "1.0"}}' | parsm '"dev-dependencies.my-lib"'   # JSON
echo 'dev-dependencies:\n  my-lib: 1.0' | parsm '"dev-dependencies.my-lib"'         # YAML  
echo '[dev-dependencies]\nmy-lib = "1.0"' | parsm '"dev-dependencies.my-lib"'       # TOML

# Extract entire objects or arrays
echo '{"state": {"status": "running", "pid": 1234}}' | parsm 'state'
# Output: {"status": "running", "pid": 1234}

echo '[{"name": "Alice"}, {"name": "Bob"}]' | parsm 'name'
# Output:
# "Alice"
# "Bob"

Key Benefits:

  • Simplest syntax: name extracts the "name" field - no quotes needed
  • Unambiguous: Bare identifiers are ALWAYS field selectors, never filters
  • Intuitive: Works exactly as users expect for the most common operation
  • Powerful: Supports nested objects, arrays, and complex data structures
  • Cross-format: Same syntax works for JSON, YAML, TOML, and other formats
  • Flexible quoting: Use quotes only when field names have special characters or spaces

Quoting Rules:

  • Unquoted: name, user.email, items.0 - for simple field names
  • Quoted: "field-name", "field name", "special.field" - when needed for special characters or spaces
  • Both work: package.name and "package.name" are identical - use whichever you prefer

Complete Examples

Complete Examples

JSON Processing

# Extract specific fields (simple syntax)
echo '{"name": "Alice", "age": 30}' | parsm 'name'
echo '{"user": {"email": "alice@example.com"}}' | parsm 'user.email'

# Basic filtering
echo '{"name": "Alice", "age": 30}' | parsm 'age > 25'

# Filter and format
echo '{"name": "Alice", "age": 30}' | parsm 'age > 25 {${name} is ${age} years old}'

# Complex nested data
echo '{"user": {"name": "Alice", "profile": {"verified": true}}}' | \
  parsm 'user.profile.verified == true {Verified user: ${user.name}}'

# Array processing
echo '{"users": [{"name": "Alice"}, {"name": "Bob"}]}' | parsm 'users.0.name'

# Extract entire objects
echo '{"users": [{"name": "Alice"}, {"name": "Bob"}]}' | parsm 'users'

CSV Processing

# Filter CSV data
echo 'Alice,30,Engineer' | parsm 'field_1 > "25"' '{${1} works as ${3}}'

# Multiple conditions
users.csv | parsm 'field_1 > "25" && field_2 == "Engineer"' '{${1} (${2} years old)}'

# Include original data
echo 'Alice,30,Engineer' | parsm '{${0} → Name: ${1}, Age: ${2}}'

Log Processing

# Filter error logs
echo 'level=error msg="DB connection failed" service=api' | \
  parsm 'level == "error"' '{[${level}] ${msg}}'

# Complex log filtering
logs.txt | parsm 'level == "error" && service == "payment"' '{${timestamp}: ${msg}}'

# Performance monitoring
app.log | parsm 'duration > 1000' '{Slow request: ${path} took ${duration}ms}'

YAML/TOML Processing

# Extract configuration values
cat Cargo.toml | parsm 'package.name'                    # Get package name
cat Cargo.toml | parsm 'package.version'                 # Get version
cat Cargo.toml | parsm '"dependencies.serde_json"'       # Get dependency version

# Filter configuration
config.yaml | parsm 'database.enabled == true' '{DB: ${database.host}:${database.port}}'

# Convert format with nested access
echo 'name: Alice\nconfig: {debug: true}' | parsm '{${name}: debug=${config.debug}}'

# Extract configuration sections
config.toml | parsm '"server"'                           # Get entire server section
config.toml | parsm '"dev-dependencies"'                 # Get dev dependencies

# Real-world Cargo.toml examples
cat Cargo.toml | parsm 'package.description'            # Project description
cat Cargo.toml | parsm '"dependencies.clap"'             # Specific dependency
cat Cargo.toml | parsm 'package.keywords'                # Keywords array

Multi-line Processing

# Process log files
tail -f app.log | parsm 'level == "error"' '{${date}: ${msg}}'

# Filter and transform data
cat users.csv | parsm 'field_1 > "21"' '{{"name": "${1}", "age": ${2}}}'

# Real-time monitoring
docker stats --format "table {{.Name}},{{.CPUPerc}}" | \
  parsm 'field_1 ~ "%"' '{Container ${1} using ${2} CPU}'

Advanced Features

Complex Boolean Logic

# Multiple conditions
parsm 'name == "Alice" && (age > 25 || active == true)'

# Negation
parsm '!(status == "disabled" || role == "guest")'

# String operations
parsm 'email ~ "@company.com" && name ^= "A"'

Error Handling

  • First line errors: Fatal (format detection failure)
  • Subsequent errors: Warnings with continued processing
  • Missing fields: Warnings for templates, silent for filters

Performance

  • Streaming: Processes line-by-line for constant memory usage
  • Format detection: Automatic with intelligent fallback
  • Large files: Efficient processing of gigabyte-scale data

Command Line Interface

parsm [OPTIONS] [FILTER] [TEMPLATE]

Arguments:
  [FILTER]     Filter expression (optional)
  [TEMPLATE]   Template expression for output formatting (optional)

Options:
  --examples   Show comprehensive usage examples
  -h, --help   Print help information
  -V, --version Print version information

Usage Patterns

# Just parsing (convert to JSON)
cat data.yaml | parsm

# Field extraction (most common - simple syntax)
cat data.json | parsm 'name'
cat data.json | parsm 'user.email'

# Filtering only  
cat data.json | parsm 'age > 25'

# Template only (simple variable)
cat data.csv | parsm '$name'

# Template only (complex formatting)
cat data.csv | parsm '{${1}: ${2}}'

# Filter and template
cat data.log | parsm 'level == "error" {[${timestamp}] ${msg}}'

# Literal text output
cat data.json | parsm '{User Profile}'

Comparison with Other Tools

Feature parsm jq awk sed
Multi-format input ✅ JSON, CSV, YAML, TOML, logfmt, text JSON only Text Text
Auto-detection ✅ Automatic Manual Manual Manual
Field extraction ✅ Simple name syntax .name syntax Limited No
Filter syntax ✅ Simple expressions JQ query language Programming Regex
Template output ${field} syntax ✅ Complex ${1}, ${2} Limited
Learning curve ✅ Low Medium-High High Medium
Boolean logic &&, ||, ! ✅ Complex ✅ Programming Limited
Nested fields user.email .user.email Limited No
Performance Good Excellent Excellent Excellent

When to use parsm

  • Field extraction: When you need simple name syntax instead of jq's .name
  • Multi-format data: When working with mixed JSON, CSV, YAML, etc.
  • Simple filtering: When jq syntax is too complex
  • Quick transformations: When awk programming is overkill
  • Log processing: Especially structured logs (JSON, logfmt)
  • Data exploration: Quick inspection and filtering of structured data
  • Intuitive syntax: When you want field access to "just work" without quotes or dots

Migration from other tools

# From jq
jq '.name' data.json              → parsm 'name' < data.json
jq '.user.email' data.json        → parsm 'user.email' < data.json
jq 'select(.age > 25)' data.json  → parsm 'age > 25' < data.json

# From awk  
awk '$2 > 25' data.csv           → parsm 'field_1 > "25"' < data.csv
awk '{print $1, $2}' data.txt    → parsm '{${1} ${2}}' < data.txt

# From grep + cut
grep "error" logs | cut -d' ' -f3  → parsm 'word_0 == "error" ${3}' < logs

Architecture Overview

Data Flow

Input → Auto-detect Format → Parse → Normalize to JSON → Filter → Template → Output

Components

  • Parser: Auto-detects and parses multiple formats
  • Filter Engine: Evaluates boolean expressions
  • Template Engine: Renders output with field interpolation
  • DSL: Simple domain-specific language for expressions

Key Design Decisions

  1. Unambiguous field selection: Bare identifiers like name are always field selectors
  2. JSON normalization: All formats convert to JSON for uniform processing
  3. Streaming processing: Line-by-line for memory efficiency
  4. Automatic format detection: Users don't specify input format
  5. Simple syntax: Easy to learn and remember, prioritizing the most common operations
  6. Error tolerance: Continues processing on non-fatal errors

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Add tests for new functionality
  4. Ensure all tests pass: cargo test
  5. Run formatting: cargo fmt
  6. Run linting: cargo clippy
  7. Submit a pull request

Development

# Build
cargo build

# Test
cargo test

# Run with examples
cargo run -- --examples

# Test with sample data
echo '{"name": "Alice", "age": 30}' | cargo run -- 'age > 25' '{${name}: ${age}}'

License

LICENSE

Changelog

See CHANGELOG.md for version history.