§parsm - Parse ’Em - Multi-Format Data Processor
A powerful library for parsing, filtering, and transforming structured data from various formats.
§Overview
parsm automatically detects and parses JSON, CSV, TOML, YAML, logfmt, and plain text, providing powerful filtering and templating capabilities with a simple, intuitive syntax.
§Quick Start
use parsm::{parse_command, process_stream, StreamingParser};
use std::io::Cursor;
// Parse a filter expression
let dsl = parse_command(r#"age > 25 {${name} is ${age} years old}"#)?;
// Process streaming data
let input = r#"{"name": "Alice", "age": 30}"#;
let mut output = Vec::new();
process_stream(Cursor::new(input), &mut output)?;
§Supported Formats
- JSON: {"name": "Alice", "age": 30}
- CSV: Alice,30,Engineer
- YAML: name: Alice\nage: 30
- TOML: name = "Alice"\nage = 30
- Logfmt: level=error msg="timeout" service=api
- Plain Text: Alice 30 Engineer
§Filter Syntax
- Comparison: age > 25, name == "Alice"
- String ops: email ~ "@company.com", name ^= "A", file $= ".log"
- Truthy checks: active?, user.verified?, !disabled? (checks if fields have truthy values)
- Boolean logic: age > 25 && active == true, name == "Alice" || name == "Bob"
- Nested fields: user.email == "alice@example.com"
- Parentheses: (age > 25) && (status == "active")
- Array membership: role in ["admin", "moderator"], user.id in allowed_ids
Note: Bare field names like name are field selectors, not filters. Use explicit comparisons: name == "Alice" instead of just name. For boolean fields, use the truthy operator: active? instead of just active.
§Template Syntax
Templates use ${variable} for field substitution. There are several ways to create templates:
- Braced templates: {${name} is ${age}} (explicit field variables)
- Simple variables: $name (becomes a field template)
- Mixed templates: {Hello ${name}!} (mix literals and variables)
- Interpolated text: Hello $name (variables in plain text)
- Literal templates: {name} (literal text, not field substitution)
- Indexed fields: {${1}, ${2}, ${3}} (1-based positional access, requires braces)
- Original input: {${0}} (entire original input, requires braces)
- Nested fields: {${user.email}} or $user.email
- Literal dollars: {Price: $12.50} (literal $ when not followed by a valid variable name)
Variable Mapping Rules:
- ${0} always refers to the original input text
- ${1}, ${2}, etc. refer to positional fields (1st, 2nd, etc.)
- $0, $1, $20 are treated as literal text unless in ${n} form
- Consistent across all data formats (CSV, JSON, text, logfmt, YAML, TOML)
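The mapping rules above can be illustrated with a small standalone sketch. This is not parsm's renderer: `render_sketch` and its `HashMap` input are hypothetical names, only the `${...}` form is handled (the `$name` shorthand is deliberately left out), and rendering a missing key as an empty string is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not part of parsm: replaces `${key}` placeholders
/// with values from `fields`; any other `$` is copied through as literal text.
fn render_sketch(template: &str, fields: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let key = &after[..end];
                // Assumption: a missing key renders as an empty string.
                out.push_str(fields.get(key).copied().unwrap_or(""));
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With keys "0", "1" and "2" holding the original line and its first two fields, a template such as `${1} is ${2}` is filled from the positional values, while `$1` and `Price: $12.50` pass through unchanged because they are not in `${n}` form.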
§Field Selection
Field selectors extract specific fields from data. Both quoted and unquoted syntax work:
- Simple fields: name, age, status
- Nested fields: user.email, config.database.host
- Quoted fields: "field name", "user.email" (for names with spaces or special chars)
use parsm::parse_command;
// Extract fields using simple syntax
let dsl = parse_command("user.email")?;
assert!(dsl.field_selector.is_some());
// Or quoted syntax for complex names
let dsl = parse_command(r#""field with spaces""#)?;
assert!(dsl.field_selector.is_some());
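To make the dotted-path idea concrete, here is a minimal standalone sketch of how a selector such as user.email or items.0 could be resolved against nested data. The `Node` type and `select_sketch` function are hypothetical and exist only for this illustration; parsm's own value representation and lookup logic may differ, and quoted names containing dots are not handled here.

```rust
use std::collections::HashMap;

/// Hypothetical nested value type used only for this sketch.
enum Node {
    Text(String),
    List(Vec<Node>),
    Map(HashMap<String, Node>),
}

/// Walks a dot-separated path: map segments are looked up by key,
/// list segments are parsed as zero-based indices.
fn select_sketch<'a>(root: &'a Node, path: &str) -> Option<&'a Node> {
    let mut current = root;
    for segment in path.split('.') {
        current = match current {
            Node::Map(map) => map.get(segment)?,
            Node::List(items) => items.get(segment.parse::<usize>().ok()?)?,
            // A scalar has no children, so the path cannot continue.
            Node::Text(_) => return None,
        };
    }
    Some(current)
}
```

Returning `Option` keeps a missing key, an out-of-range index, and a path that descends into a scalar all on the same "no value" branch, which is one simple way to get graceful behavior for absent fields.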
§Examples
§Basic Filtering
use parsm::{parse_command, FilterEngine};
use serde_json::json;
let dsl = parse_command(r#"age > 25"#)?;
let data = json!({"name": "Alice", "age": 30});
if let Some(filter) = &dsl.filter {
    let passes = FilterEngine::evaluate(filter, &data);
    assert!(passes);
}
§Template Rendering
use parsm::parse_command;
use serde_json::json;
let dsl = parse_command(r#"age > 25 {${name} is ${age} years old}"#)?;
let data = json!({"name": "Alice", "age": 30});
if let Some(template) = &dsl.template {
    let output = template.render(&data);
    assert_eq!(output, "Alice is 30 years old");
}
§Format Detection and Parsing
use parsm::StreamingParser;
// Create separate parsers for different formats
let mut json_parser = StreamingParser::new();
let json_result = json_parser.parse_line(r#"{"name": "Alice"}"#)?;
let mut csv_parser = StreamingParser::new();
let csv_result = csv_parser.parse_line("Alice,30,Engineer")?;
let mut logfmt_parser = StreamingParser::new();
let logfmt_result = logfmt_parser.parse_line("level=error msg=timeout")?;
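For intuition about what per-line format detection involves, the following is a deliberately naive standalone sketch. It is not how parsm detects formats: `FormatGuess` and `guess_format_sketch` are hypothetical, the checks are simple string heuristics, and their order is an assumption chosen so that the sample lines from the Supported Formats list are classified as expected.

```rust
#[derive(Debug, PartialEq)]
enum FormatGuess {
    Json,
    Logfmt,
    Toml,
    Yaml,
    Csv,
    Text,
}

/// Hypothetical heuristic: checks run from most to least specific,
/// and plain text is the fallback when nothing else matches.
fn guess_format_sketch(line: &str) -> FormatGuess {
    let trimmed = line.trim();
    let first_token = trimmed.split_whitespace().next().unwrap_or("");
    if trimmed.starts_with('{') || trimmed.starts_with('[') {
        FormatGuess::Json
    } else if first_token.contains('=') && !first_token.starts_with('=') {
        // key=value with no spaces around '=' looks like logfmt.
        FormatGuess::Logfmt
    } else if trimmed.contains(" = ") {
        FormatGuess::Toml
    } else if trimmed.contains(": ") {
        FormatGuess::Yaml
    } else if trimmed.contains(',') {
        FormatGuess::Csv
    } else {
        FormatGuess::Text
    }
}
```

A heuristic this simple misclassifies plenty of real input (a TOML table header such as [server] would be guessed as JSON, for example), which is why a real detector needs fallback logic beyond a single pass of prefix checks.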
§Architecture
The library consists of several key components:
- parse: Multi-format parser with automatic detection
- filter: Boolean expression evaluation engine
- dsl: Domain-specific language parser using Pest
- High-level functions for stream processing
§Error Handling
- First line errors: Fatal (format detection failure)
- Subsequent errors: Warnings with continued processing
- Missing fields: Graceful fallback behavior
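The first two rules above describe a policy that can be sketched independently of any particular format. In the standalone example below, `parse_line_sketch` is a hypothetical stand-in parser that accepts only integers, and `process_lines_sketch` is a hypothetical driver; neither is part of parsm, and the message wording is invented for illustration.

```rust
/// Hypothetical per-line parser used only for this sketch: accepts integers.
fn parse_line_sketch(line: &str) -> Result<i64, String> {
    line.trim()
        .parse::<i64>()
        .map_err(|e| format!("{:?}: {}", line, e))
}

/// A failure on the first line aborts processing; later failures are
/// collected as warnings and the remaining lines are still processed.
fn process_lines_sketch(input: &str) -> Result<(Vec<i64>, Vec<String>), String> {
    let mut values = Vec::new();
    let mut warnings = Vec::new();
    for (index, line) in input.lines().enumerate() {
        match parse_line_sketch(line) {
            Ok(value) => values.push(value),
            Err(e) if index == 0 => return Err(format!("fatal: {}", e)),
            Err(e) => warnings.push(format!("warning (line {}): {}", index + 1, e)),
        }
    }
    Ok((values, warnings))
}
```

Treating only the first line as fatal fits a design where that line fixes the expected format: if it cannot be parsed there is nothing to compare later lines against, whereas a single bad line further down can be skipped.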
§Performance
- Streaming: Line-by-line processing for constant memory usage
- Format detection: Efficient with intelligent fallback
- Large files: Scales to gigabyte-scale data processing
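Line-by-line streaming is what keeps memory usage independent of input size. The sketch below shows the general pattern using only the standard library; `copy_matching_lines_sketch` is a hypothetical function, and its substring test merely stands in for a real filter.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Reads `reader` one line at a time and writes the lines containing
/// `needle` to `writer`. Only the current line is held in memory.
fn copy_matching_lines_sketch<R: Read, W: Write>(
    reader: R,
    writer: &mut W,
    needle: &str,
) -> io::Result<usize> {
    let mut written = 0;
    for line in BufReader::new(reader).lines() {
        let line = line?;
        if line.contains(needle) {
            writeln!(writer, "{}", line)?;
            written += 1;
        }
    }
    Ok(written)
}
```

Because the function is generic over `Read` and `Write`, the same code works with an in-memory `Cursor` and `Vec<u8>` in tests and with standard input and output in a command-line tool.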
§Comprehensive Examples from README
All examples from the README are tested here to ensure documentation accuracy.
§Field Extraction Examples
use parsm::{parse_command, process_stream};
use serde_json::json;
use std::io::Cursor;
// Simple field extraction
let dsl = parse_command("name")?;
assert!(dsl.field_selector.is_some());
// Nested field access
let dsl = parse_command("user.email")?;
assert!(dsl.field_selector.is_some());
// Array element access
let dsl = parse_command("items.0")?;
assert!(dsl.field_selector.is_some());
// Process real data with field extraction
let input = r#"{"name": "Alice", "age": 30}"#;
let mut output = Vec::new();
process_stream(Cursor::new(input), &mut output)?;
let result = String::from_utf8(output)?;
assert!(result.contains("Alice") || result.contains("30"));
§Template Examples
use parsm::parse_command;
use serde_json::json;
// Variable template with braces
let dsl = parse_command(r#"{${name} is ${age} years old}"#)?;
assert!(dsl.template.is_some());
if let Some(template) = &dsl.template {
    let data = json!({"name": "Alice", "age": 30});
    let output = template.render(&data);
    assert_eq!(output, "Alice is 30 years old");
}
// Simple variable shorthand
let dsl = parse_command("$name")?;
assert!(dsl.template.is_some());
if let Some(template) = &dsl.template {
    let data = json!({"name": "Alice"});
    let output = template.render(&data);
    assert_eq!(output, "Alice");
}
// Literal template (no variables)
let dsl = parse_command("{name}")?;
assert!(dsl.template.is_some());
if let Some(template) = &dsl.template {
    let data = json!({"name": "Alice"});
    let output = template.render(&data);
    assert_eq!(output, "name");
}
// Original input variable (${0} always refers to entire input)
let dsl = parse_command(r#"{Original: ${0} → Name: ${name}}"#)?;
assert!(dsl.template.is_some());
if let Some(template) = &dsl.template {
    // Example showing ${0} referring to original input
    let data = json!({"$0": "Alice,30,Engineer", "name": "Alice"});
    let output = template.render(&data);
    assert_eq!(output, "Original: Alice,30,Engineer → Name: Alice");
}
// CSV positional fields (1-based indexing)
let dsl = parse_command(r#"{Employee: ${1}, Age: ${2}, Role: ${3}}"#)?;
assert!(dsl.template.is_some());
if let Some(template) = &dsl.template {
    // Example showing how CSV fields map to 1-based indices
    let data = json!({"1": "Alice", "2": "30", "3": "Engineer"});
    let output = template.render(&data);
    assert_eq!(output, "Employee: Alice, Age: 30, Role: Engineer");
}
// Nested JSON fields
let dsl = parse_command(r#"{User: ${user.name}, Email: ${user.email}}"#)?;
assert!(dsl.template.is_some());
§Filter Examples
use parsm::{parse_command, FilterEngine};
use serde_json::json;
// Basic filtering
let dsl = parse_command(r#"age > 25"#)?;
assert!(dsl.filter.is_some());
if let Some(filter) = &dsl.filter {
    let data = json!({"name": "Alice", "age": 30});
    assert!(FilterEngine::evaluate(filter, &data));
}
// String equality
let dsl = parse_command(r#"name == "Alice""#)?;
assert!(dsl.filter.is_some());
if let Some(filter) = &dsl.filter {
    let data = json!({"name": "Alice", "age": 30});
    assert!(FilterEngine::evaluate(filter, &data));
}
// Truthy field checks (using ? operator)
let dsl = parse_command("active?")?;
assert!(dsl.filter.is_some());
if let Some(filter) = &dsl.filter {
    let data = json!({"name": "Alice", "active": true});
    assert!(FilterEngine::evaluate(filter, &data));
}
// Truthy check with nested fields
let dsl = parse_command("user.verified?")?;
assert!(dsl.filter.is_some());
// Boolean comparison
let dsl = parse_command("user.active == true")?;
assert!(dsl.filter.is_some());
// Negation
let dsl = parse_command(r#"!(status == "disabled")"#)?;
assert!(dsl.filter.is_some());
// Boolean logic
let dsl = parse_command(r#"name == "Alice" && age > 25"#)?;
assert!(dsl.filter.is_some());
§Combined Filter and Template Examples
use parsm::{parse_command, FilterEngine};
use serde_json::json;
// Filter with template output
let dsl = parse_command(r#"age > 25 {${name} is ${age} years old}"#)?;
assert!(dsl.filter.is_some());
assert!(dsl.template.is_some());
let data = json!({"name": "Alice", "age": 30});
if let (Some(filter), Some(template)) = (&dsl.filter, &dsl.template) {
    if FilterEngine::evaluate(filter, &data) {
        let output = template.render(&data);
        assert_eq!(output, "Alice is 30 years old");
    }
}
§Field Selection Examples
use parsm::parse_command;
// Simple field extraction
let dsl = parse_command("name")?;
assert!(dsl.field_selector.is_some());
assert!(dsl.filter.is_none());
assert!(dsl.template.is_none());
// Nested field access
let dsl = parse_command("user.email")?;
assert!(dsl.field_selector.is_some());
// Array element access
let dsl = parse_command("items.0")?;
assert!(dsl.field_selector.is_some());
// Quoted field names
let dsl = parse_command(r#""field name""#)?;
assert!(dsl.field_selector.is_some());
let dsl = parse_command("'special-field'")?;
assert!(dsl.field_selector.is_some());
§String Operations Examples
use parsm::{parse_command, FilterEngine};
use serde_json::json;
// Contains substring
let dsl = parse_command(r#"email ~ "@company.com""#)?;
assert!(dsl.filter.is_some());
if let Some(filter) = &dsl.filter {
    let data = json!({"email": "alice@company.com"});
    assert!(FilterEngine::evaluate(filter, &data));
}
// Starts with prefix
let dsl = parse_command(r#"name ^= "A""#)?;
assert!(dsl.filter.is_some());
if let Some(filter) = &dsl.filter {
    let data = json!({"name": "Alice"});
    assert!(FilterEngine::evaluate(filter, &data));
}
// Ends with suffix
let dsl = parse_command(r#"file $= ".log""#)?;
assert!(dsl.filter.is_some());
if let Some(filter) = &dsl.filter {
    let data = json!({"file": "app.log"});
    assert!(FilterEngine::evaluate(filter, &data));
}
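The three string operators map naturally onto standard `str` methods. The standalone sketch below spells out that mapping; `StringOpSketch` and both functions are hypothetical names for illustration rather than parsm types, and case-sensitive matching is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StringOpSketch {
    Contains,   // ~
    StartsWith, // ^=
    EndsWith,   // $=
}

/// Maps an operator token to its sketch variant, or None if unknown.
fn parse_string_op_sketch(token: &str) -> Option<StringOpSketch> {
    match token {
        "~" => Some(StringOpSketch::Contains),
        "^=" => Some(StringOpSketch::StartsWith),
        "$=" => Some(StringOpSketch::EndsWith),
        _ => None,
    }
}

/// Applies the operator to a field value and a pattern (case-sensitive).
fn apply_string_op_sketch(op: StringOpSketch, value: &str, pattern: &str) -> bool {
    match op {
        StringOpSketch::Contains => value.contains(pattern),
        StringOpSketch::StartsWith => value.starts_with(pattern),
        StringOpSketch::EndsWith => value.ends_with(pattern),
    }
}
```

Under this reading, name ^= "A" matches "Alice" but not "alice", since no case folding is applied.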
§Comparison Operators Examples
use parsm::parse_command;
// All comparison operators
let operators = vec![
    r#"name == "Alice""#,
    r#"status != "inactive""#,
    "age < 30",
    "score <= 95",
    "age > 18",
    "score >= 90",
];
for op in operators {
    let dsl = parse_command(op)?;
    assert!(dsl.filter.is_some());
}
§Boolean Logic Examples
use parsm::parse_command;
// Logical AND
let dsl = parse_command("age > 18 && active == true")?;
assert!(dsl.filter.is_some());
// Logical OR
let dsl = parse_command(r#"role == "admin" || role == "user""#)?;
assert!(dsl.filter.is_some());
// Logical NOT
let dsl = parse_command(r#"!(status == "disabled")"#)?;
assert!(dsl.filter.is_some());
§Advanced Boolean Logic Examples
use parsm::parse_command;
// Multiple conditions with parentheses
let dsl = parse_command(r#"name == "Alice" && (age > 25 || active == true)"#)?;
assert!(dsl.filter.is_some());
// Complex negation
let dsl = parse_command(r#"!(status == "disabled" || role == "guest")"#)?;
assert!(dsl.filter.is_some());
// String operations with boolean logic
let dsl = parse_command(r#"email ~ "@company.com" && name ^= "A""#)?;
assert!(dsl.filter.is_some());
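The examples above only check that these expressions parse. To show how such an expression could then be evaluated, here is a minimal standalone sketch of a recursive evaluator over a tiny expression tree. `ExprSketch`, `ValueSketch`, and `eval_sketch` are hypothetical and much smaller than whatever the filter module actually defines; treating a missing or mistyped field as false is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical field value for this sketch.
enum ValueSketch {
    Int(i64),
    Text(&'static str),
    Bool(bool),
}

/// Hypothetical expression tree covering a few comparisons plus &&, ||, !.
enum ExprSketch {
    Gt(&'static str, i64),
    EqText(&'static str, &'static str),
    EqBool(&'static str, bool),
    And(Box<ExprSketch>, Box<ExprSketch>),
    Or(Box<ExprSketch>, Box<ExprSketch>),
    Not(Box<ExprSketch>),
}

/// Evaluates the tree recursively; a comparison against a missing or
/// differently typed field is false (an assumption of this sketch).
fn eval_sketch(expr: &ExprSketch, record: &HashMap<&str, ValueSketch>) -> bool {
    match expr {
        ExprSketch::Gt(field, limit) => {
            matches!(record.get(*field), Some(ValueSketch::Int(v)) if v > limit)
        }
        ExprSketch::EqText(field, want) => {
            matches!(record.get(*field), Some(ValueSketch::Text(v)) if v == want)
        }
        ExprSketch::EqBool(field, want) => {
            matches!(record.get(*field), Some(ValueSketch::Bool(v)) if v == want)
        }
        ExprSketch::And(a, b) => eval_sketch(a, record) && eval_sketch(b, record),
        ExprSketch::Or(a, b) => eval_sketch(a, record) || eval_sketch(b, record),
        ExprSketch::Not(inner) => !eval_sketch(inner, record),
    }
}
```

Parentheses in the source expression need no node of their own in this representation: they only decide how the `And`, `Or`, and `Not` nodes nest.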
§Format-Specific Examples
use parsm::{parse_command, StreamingParser};
// CSV field access patterns (legacy field names still supported)
let dsl = parse_command("field_0 == \"Alice\"")?;
assert!(dsl.filter.is_some());
let dsl = parse_command("field_1 > \"25\"")?;
assert!(dsl.filter.is_some());
// New 1-based positional access for CSV
let dsl = parse_command(r#"{${1}} {${2}} {${3}}"#)?;
assert!(dsl.template.is_some());
// Text word access patterns (legacy names still supported)
let dsl = parse_command("word_0 == \"Alice\"")?;
assert!(dsl.filter.is_some());
let dsl = parse_command("word_1 > \"25\"")?;
assert!(dsl.filter.is_some());
// New 1-based positional access for text
let dsl = parse_command(r#"{First: ${1}, Second: ${2}}"#)?;
assert!(dsl.template.is_some());
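The relationship between the legacy zero-based names and the 1-based positional variables can be made concrete with a small standalone sketch. `positional_fields_sketch` is hypothetical, the key names it produces are assumptions for illustration, and the naive comma split ignores quoting, so this is not a substitute for the crate's CSV parser.

```rust
use std::collections::HashMap;

/// Hypothetical helper: exposes one CSV-like line under the key "0"
/// (the whole line), 1-based keys "1", "2", ..., and legacy zero-based
/// keys "field_0", "field_1", ...
fn positional_fields_sketch(line: &str) -> HashMap<String, String> {
    let mut fields = HashMap::new();
    fields.insert("0".to_string(), line.to_string());
    // Naive split: quoted commas are not handled in this sketch.
    for (index, value) in line.split(',').enumerate() {
        let value = value.trim().to_string();
        fields.insert((index + 1).to_string(), value.clone());
        fields.insert(format!("field_{}", index), value);
    }
    fields
}
```

For the line Alice,30,Engineer, both "1" and "field_0" hold Alice: the two naming schemes point at the same column and differ only in where counting starts.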
§Disambiguation Rules
The parser follows specific rules to determine how expressions should be interpreted:
- Field selectors: Bare field names with no operators (name, user.email)
- Filter expressions: Explicit comparisons (age > 25, name == "Alice")
- Truthy checks: Field names with a ? suffix (active?, user.verified?)
- Templates: Expressions starting with $ or wrapped in {}
To avoid ambiguity:
- Always use field? syntax for truthy checks, not bare field names
- Avoid bare field names in boolean expressions (name && age is invalid)
- Don’t mix filter expressions with field selectors
use parsm::parse_command;
// These are unambiguous:
let dsl1 = parse_command("active?")?; // Filter using truthy check
let dsl2 = parse_command("name")?; // Field selector
let dsl3 = parse_command("name == \"Alice\" && age > 25")?; // Filter expression
// These would be ambiguous and will be rejected:
// parse_command("active && name"); // Ambiguous - both could be field selectors or truthy checks
// parse_command("name age"); // Ambiguous - missing operator or invalid syntax
§Re-exports
pub use dsl::ParsedDSL;
pub use dsl::parse_command;
pub use dsl::parse_separate_expressions;
pub use filter::ComparisonOp;
pub use filter::FieldPath;
pub use filter::FilterEngine;
pub use filter::FilterExpr;
pub use filter::FilterValue;
pub use filter::Template;
pub use filter::TemplateItem;
pub use format_detector::DetectedFormat;
pub use format_detector::FormatDetector;
pub use parse::ParsedLine;
pub use parse::StreamingParser;
pub use parser_registry::DocumentParser;
pub use parser_registry::ParserRegistry;
§Modules
- csv_parser
- dsl: DSL Parser - Converts Pest parse tree to AST with Unambiguous Syntax
- filter
- format_detector
- parse
- parser_registry
§Functions
- process_single_value: Process a single value with filter and template/field selector. This is a utility function used by both the main binary and CSV parser to ensure consistent behavior.
- process_stream: Process a stream of input data with optional DSL filter and template.