# parsm - **Parse 'Em** - An 'everything' parser, Sedder, Awkker, Grokker, Grepper
Parsm is a powerful command-line tool that understands structured text better than sed, awk, grep, or grok.
<img src="eatcookie.jpg" alt="Eat more cookie!" width="25%">
## Overview
`parsm` is a multi-format data processor that automatically detects and parses JSON, CSV, TOML, YAML, logfmt, and plain text. It provides powerful filtering and templating capabilities with a simple, intuitive syntax.
## Installation
```bash
cargo install --path .
```
Or build from source:
```bash
git clone <repository-url>
cd parsm
cargo build --release
./target/release/parsm --examples
```
## Quick Start
```bash
# Basic usage
parsm [FILTER] [TEMPLATE]
# Show comprehensive examples
parsm --examples
# Extract a field (most common operation)
# Extract nested fields
# Filter data based on field values
# Filter and format output
# Simple template output
# Parse and understand text
```
## Supported Input Formats
`parsm` automatically detects and parses these formats:
### JSON
```json
{"name": "Alice", "age": 30, "active": true}
```
### CSV
```csv
Alice,30,Engineer
Bob,25,Designer
```
### YAML
```yaml
name: Alice
age: 30
active: true
```
### TOML
```toml
name = "Alice"
age = 30
active = true
```
### Logfmt
```
level=error msg="Database connection failed" service=api duration=1.2s
```
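For intuition, a logfmt line like the one above can be read as a flat record of key/value pairs. The Python sketch below only illustrates that idea and is not parsm's own parser. `parse_logfmt` is a hypothetical helper, and it assumes every value stays a string (so `duration` is `"1.2s"`, not a number).

```python
import shlex

def parse_logfmt(line):
    """Split a logfmt line into key/value pairs (illustrative only)."""
    record = {}
    # shlex.split honours double quotes, so msg="..." stays one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip bare words that carry no '='
            record[key] = value
    return record

line = 'level=error msg="Database connection failed" service=api duration=1.2s'
record = parse_logfmt(line)
print(record["msg"])
```

Once a line is in this shape, a filter such as `level == "error"` is just a lookup on the record.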
### Plain Text
```
Alice 30 Engineer
Bob 25 Designer
```
## Filter Syntax
### Comparison Operators
| Operator | Description | Example |
|----------|-------------|---------|
| `==` | Equal to | `name == "Alice"` |
| `!=` | Not equal to | `status != "inactive"` |
| `<` | Less than | `age < 30` |
| `<=` | Less than or equal | `score <= 95` |
| `>` | Greater than | `age > 18` |
| `>=` | Greater than or equal | `score >= 90` |
### String Operations
| Operator | Description | Example |
|----------|-------------|---------|
| `~` | Contains substring | `email ~ "@company.com"` |
| `^=` | Starts with prefix | `name ^= "A"` |
| `$=` | Ends with suffix | `file $= ".log"` |
### Boolean Logic
| Operator | Description | Example |
|----------|-------------|---------|
| `&&` | Logical AND | `age > 18 && active == true` |
| `\|\|` | Logical OR | `role == "admin" \|\| role == "user"` |
| `!` | Logical NOT | `!(status == "disabled")` |
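One way to picture the operators above is as a lookup table from operator text to a predicate. The following Python sketch is an assumption-laden illustration rather than parsm's filter engine: `OPERATORS` and `compare` are hypothetical names, and treating a missing field as a non-match is a guess about behavior the tables do not specify.

```python
import operator

# Each filter operator maps to a two-argument predicate.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "~": lambda value, needle: needle in value,
    "^=": lambda value, prefix: value.startswith(prefix),
    "$=": lambda value, suffix: value.endswith(suffix),
}

def compare(record, field, op, expected):
    """Evaluate one `field op value` comparison (illustrative)."""
    # Assumption: a missing field simply fails the comparison.
    if field not in record:
        return False
    return OPERATORS[op](record[field], expected)

record = {"name": "Alice", "age": 30, "active": True, "email": "alice@company.com"}

# age > 18 && active == true
both = compare(record, "age", ">", 18) and compare(record, "active", "==", True)
# !(name ^= "B")
negated = not compare(record, "name", "^=", "B")
```

`&&`, `||`, and `!` then correspond to combining these boolean results with `and`, `or`, and `not`.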
### Field Access
#### Simple Fields
```bash
name == "Alice"
age > 25
active == true
```
#### Nested Fields (JSON/YAML/TOML)
```bash
user.email == "alice@example.com"
config.database.host == "localhost"
metrics.cpu.usage > 80
```
#### CSV Fields
CSV columns are automatically named `field_0`, `field_1`, etc.:
```bash
field_0 == "Alice" # First column
field_1 > "25" # Second column (string comparison)
field_2 == "Engineer" # Third column
```
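The positional naming can be pictured as follows. This is a Python sketch, not parsm code, and `name_columns` is a hypothetical helper. It also shows why the "string comparison" note matters, assuming quoted values are compared lexicographically (the source does not spell this out).

```python
import csv
import io

def name_columns(row):
    """Label positional CSV values as field_0, field_1, ... (illustrative)."""
    return {f"field_{i}": value for i, value in enumerate(row)}

data = "Alice,30,Engineer\nBob,25,Designer\n"
rows = [name_columns(row) for row in csv.reader(io.StringIO(data))]

# Values are strings, so ordering is lexicographic rather than numeric.
lexicographic = "9" > "25"   # True, because "9" sorts after "2"
numeric = 9 > 25             # False
```

If that assumption holds, a filter like `field_1 > "25"` would also match a value such as `9`, which is worth keeping in mind when columns hold numbers of different widths.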
#### Text Words
Plain text words are named `word_0`, `word_1`, etc.:
```bash
word_0 == "Alice" # First word
word_1 > "25" # Second word
word_2 == "Engineer" # Third word
```
## Syntax Overview
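The parsm DSL has three main components (field selectors, templates, and filters), each with distinct, unambiguous syntax. Literal text is a template that contains no variables.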
### Field Selectors (Data Extraction)
Extract specific fields using simple, unambiguous syntax - **the most common operation**:
```bash
name # Simple field extraction
user.email # Nested field access
items.0 # Array element access
"field with spaces" # Quoted field names (when needed)
'special-field' # Single-quoted alternatives
"dev-dependencies.lib" # Complex nested paths with special characters
```
**Key principle**: Bare identifiers like `name` are ALWAYS field selectors, never filters or templates.
**Cross-format compatibility**: Field selector syntax works identically across JSON, YAML, TOML, and other structured formats:
```bash
# These work the same for JSON, YAML, and TOML:
parsm 'package.name' # Extract nested field
parsm '"package.name"' # Same with quotes
parsm '"field-with-hyphens"' # Special characters
parsm '"field with spaces"' # Spaces in field names
```
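Conceptually, a field selector is a path walked through the parsed data. The Python sketch below illustrates that idea under stated assumptions and is not parsm's resolver: `select` is a hypothetical helper, and it assumes that quotes only protect special characters while dots inside them still separate path segments.

```python
def select(doc, path):
    """Walk a dot-separated path through nested dicts and lists (illustrative)."""
    current = doc
    # Drop optional surrounding quotes, then follow one segment at a time.
    for segment in path.strip("\"'").split("."):
        if isinstance(current, list):
            current = current[int(segment)]   # items.0 -> first element
        else:
            current = current[segment]
    return current

doc = {
    "user": {"email": "alice@example.com"},
    "items": ["apple", "pear"],
    "dev-dependencies": {"my-lib": "1.0"},
}

email = select(doc, "user.email")
first_item = select(doc, "items.0")
lib_version = select(doc, '"dev-dependencies.my-lib"')
```

Because every supported format ends up as the same nested structure, the same path works regardless of whether the input was JSON, YAML, or TOML.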
### Templates (Dynamic Output)
Templates format output with field values using explicit variable syntax:
```bash
{${name} is ${age} years old} # Variables with ${...}
$name # Simple variable shorthand
{Hello ${name}!} # Mixed template with literals
{${0}} # Original input (requires braces)
{User: ${user.name}} # Nested fields in templates
```
### Literal Text (Static Output)
Braces without variables produce literal text:
```bash
{name} # Outputs literal text "name"
{Hello world} # Outputs literal text "Hello world"
{Price: $100} # Outputs literal text with dollar sign
```
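The difference between templates and literal text comes down to whether a `$` variable is present. The Python sketch below mimics the rules described above with a regular expression. It is an illustration only, not parsm's template engine, and `render`, `lookup`, and `VARIABLE` are hypothetical names.

```python
import re

# Match the ${path} form first, then the bare $name shorthand.
VARIABLE = re.compile(r"\$\{([^}]+)\}|\$([A-Za-z_]\w*)")

def lookup(record, path):
    """Follow a dot-separated path through nested dicts."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value

def render(template, record, original=""):
    """Render a template string against one record (illustrative)."""
    # Strip the outer braces that delimit a template, if present.
    wrapped = template.startswith("{") and template.endswith("}")
    body = template[1:-1] if wrapped else template

    def substitute(match):
        path = match.group(1) or match.group(2)
        if path == "0":
            return original          # ${0} stands for the original input
        return str(lookup(record, path))

    return VARIABLE.sub(substitute, body)

record = {"name": "Alice", "age": 30, "user": {"name": "Alice"}}
greeting = render("{${name} is ${age} years old}", record)
```

Text such as `{name}` or `{Price: $100}` contains nothing the pattern recognizes as a variable, so it passes through unchanged.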
### Filters (Data Processing)
Filter data using comparison operators with field selectors:
```bash
age > 25 # Numeric comparison
name == "Alice" # String equality
user.active == true # Boolean comparison
!(status == "disabled") # Negation
name == "Alice" && age > 25 # Boolean logic
```
### Examples
```bash
# Field extraction (most common - simple syntax)
# Template with variables (dynamic output)
# Literal templates (static output)
# Filtering with field selectors
# Combined filtering and templating
# Original input variable
# CSV positional fields
# Nested JSON fields
# Output: User: Alice, Email: alice@example.com
```
## Field Selection
Extract specific fields with simple, unambiguous syntax - the most intuitive operation in parsm:
```bash
# Simple field extraction (bare identifiers)
# Nested field access (dot notation)
# Array element access (index notation)
# Complex nested structures
# Special field names (quoted when needed)
# Works consistently across all formats
printf '[package]\nname = "test"\n' | parsm 'package.name' # TOML
# Quoted syntax works the same way
printf '[package]\nname = "test"\n' | parsm '"package.name"' # TOML
# Complex field names across formats
printf '[dev-dependencies]\nmy-lib = "1.0"\n' | parsm '"dev-dependencies.my-lib"' # TOML
# Extract entire objects or arrays
# "Alice"
# "Bob"
```
**Key Benefits:**
- **Simplest syntax**: `name` extracts the "name" field - no quotes needed
- **Unambiguous**: Bare identifiers are ALWAYS field selectors, never filters
- **Intuitive**: Works exactly as users expect for the most common operation
- **Powerful**: Supports nested objects, arrays, and complex data structures
- **Cross-format**: Same syntax works for JSON, YAML, TOML, and other formats
- **Flexible quoting**: Use quotes only when field names have special characters or spaces
**Quoting Rules:**
- **Unquoted**: `name`, `user.email`, `items.0` - for simple field names
- **Quoted**: `"field-name"`, `"field name"`, `"special.field"` - when needed for special characters or spaces
- **Both work**: `package.name` and `"package.name"` are identical - use whichever you prefer
## Complete Examples
### JSON Processing
```bash
# Extract specific fields (simple syntax)
# Basic filtering
# Filter and format
# Complex nested data
# Array processing
# Extract entire objects
```
### CSV Processing
```bash
# Filter CSV data
# Multiple conditions
# Include original data
```
### Log Processing
```bash
# Filter error logs
# Complex log filtering
# Performance monitoring
```
### YAML/TOML Processing
```bash
# Extract configuration values
cat Cargo.toml | parsm '"dependencies.serde_json"' # Get dependency version
# Filter configuration
# Convert format with nested access
printf 'name: Alice\nconfig: {debug: true}\n' | parsm '{${name}: debug=${config.debug}}'
# Extract configuration sections
# Real-world Cargo.toml examples
cat Cargo.toml | parsm 'package.keywords' # Keywords array
```
### Multi-line Processing
```bash
# Process log files
# Filter and transform data
# Real-time monitoring
```
## Advanced Features
### Complex Boolean Logic
```bash
# Multiple conditions
# Negation
# String operations
parsm 'email ~ "@company.com" && name ^= "A"'
```
### Error Handling
- **First line errors**: Fatal (format detection failure)
- **Subsequent errors**: Warnings with continued processing
- **Missing fields**: Warnings for templates, silent for filters
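The policy above (fatal on the first line, warnings afterwards) can be sketched as a streaming loop. This Python example is illustrative and not parsm's implementation: `parse_stream` is a hypothetical helper, JSON stands in for whichever format the first line selects, and the message wording is invented.

```python
import json
import sys

def parse_stream(lines):
    """Yield parsed records; abort on a bad first line, warn on later ones."""
    for number, line in enumerate(lines, start=1):
        try:
            yield json.loads(line)
        except json.JSONDecodeError as error:
            if number == 1:
                # The first line decides the format, so failing here is fatal.
                raise SystemExit(f"error: first line is not valid JSON: {error}")
            # Later failures are reported on stderr and processing continues.
            print(f"warning: line {number} skipped: {error}", file=sys.stderr)

lines = ['{"name": "Alice"}', "not json", '{"name": "Bob"}']
names = [record["name"] for record in parse_stream(lines)]
```

Because the function is a generator, records are handed on one at a time and the whole input never has to be held in memory.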
### Performance
- **Streaming**: Processes line-by-line for constant memory usage
- **Format detection**: Automatic with intelligent fallback
- **Large files**: Efficient processing of gigabyte-scale data
## Command Line Interface
```bash
parsm [OPTIONS] [FILTER] [TEMPLATE]
Arguments:
[FILTER] Filter expression (optional)
[TEMPLATE] Template expression for output formatting (optional)
Options:
--examples Show comprehensive usage examples
-h, --help Print help information
-V, --version Print version information
```
### Usage Patterns
```bash
# Just parsing (convert to JSON)
# Field extraction (most common - simple syntax)
# Filtering only
# Template only (simple variable)
# Template only (complex formatting)
# Filter and template
# Literal text output
```
## Comparison with Other Tools
| Feature | parsm | jq | awk | sed |
|---------|-------|----|-----|-----|
| **Multi-format input** | ✅ JSON, CSV, YAML, TOML, logfmt, text | JSON only | Text | Text |
| **Auto-detection** | ✅ Automatic | Manual | Manual | Manual |
| **Field extraction** | ✅ Simple `name` syntax | ✅ `.name` syntax | Limited | No |
| **Filter syntax** | ✅ Simple expressions | jq query language | Programming | Regex |
| **Template output** | ✅ `${field}` syntax | ✅ Complex | ✅ `$1, $2` | Limited |
| **Learning curve** | ✅ Low | Medium-High | High | Medium |
| **Boolean logic** | ✅ `&&`, `\|\|`, `!` | ✅ Complex | ✅ Programming | Limited |
| **Nested fields** | ✅ `user.email` | ✅ `.user.email` | Limited | No |
| **Performance** | Good | Excellent | Excellent | Excellent |
### When to use parsm
- **Field extraction**: When you need simple `name` syntax instead of jq's `.name`
- **Multi-format data**: When working with mixed JSON, CSV, YAML, etc.
- **Simple filtering**: When jq syntax is too complex
- **Quick transformations**: When awk programming is overkill
- **Log processing**: Especially structured logs (JSON, logfmt)
- **Data exploration**: Quick inspection and filtering of structured data
- **Intuitive syntax**: When you want field access to "just work" without quotes or dots
### Migration from other tools
```bash
# From jq
jq '.name' data.json → parsm 'name' < data.json
jq '.user.email' data.json → parsm 'user.email' < data.json
jq 'select(.age > 25)' data.json → parsm 'age > 25' < data.json
# From awk
awk -F, '$2 > 25' data.csv → parsm 'field_1 > "25"' < data.csv
awk '{print $1, $2}' data.txt → parsm '{${1} ${2}}' < data.txt
# From grep + cut
```
## Architecture Overview
### Data Flow
```
Input → Auto-detect Format → Parse → Normalize to JSON → Filter → Template → Output
```
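The flow above can be sketched end to end in a few lines. This Python example is a simplified illustration, not parsm's architecture: `normalize` and `run` are hypothetical names, only JSON and plain text are handled, and detection happens per line here purely to keep the sketch short.

```python
import json

def normalize(line):
    """Parse one line into a JSON-like dict, falling back to plain text."""
    try:
        parsed = json.loads(line)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Fallback: whitespace-separated words named word_0, word_1, ...
    return {f"word_{i}": word for i, word in enumerate(line.split())}

def run(lines, keep, render):
    """Input -> parse and normalize -> filter -> template -> output."""
    for line in lines:
        record = normalize(line.rstrip("\n"))
        if keep(record):
            yield render(record)

lines = ['{"name": "Alice", "age": 30}', '{"name": "Bob", "age": 20}']
output = list(run(
    lines,
    keep=lambda record: record["age"] > 25,
    render=lambda record: f"{record['name']} is {record['age']}",
))
```

Normalizing first means the filter and template stages only ever see one data shape, whatever the input format was.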
### Components
- **Parser**: Auto-detects and parses multiple formats
- **Filter Engine**: Evaluates boolean expressions
- **Template Engine**: Renders output with field interpolation
- **DSL**: Simple domain-specific language for expressions
### Key Design Decisions
1. **Unambiguous field selection**: Bare identifiers like `name` are always field selectors
2. **JSON normalization**: All formats convert to JSON for uniform processing
3. **Streaming processing**: Line-by-line for memory efficiency
4. **Automatic format detection**: Users don't specify input format
5. **Simple syntax**: Easy to learn and remember, prioritizing the most common operations
6. **Error tolerance**: Continues processing on non-fatal errors
## Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Add tests for new functionality
4. Ensure all tests pass: `cargo test`
5. Run formatting: `cargo fmt`
6. Run linting: `cargo clippy`
7. Submit a pull request
### Development
```bash
# Build
cargo build
# Test
cargo test
# Run with examples
cargo run -- --examples
# Test with sample data
```
## License
See [LICENSE](LICENSE) for license terms.
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for version history.