xerg 0.2.0

Ultra-fast grep implementation in Rust - built for maximum speed with direct output and parallel processing
Documentation

xerg - High-Performance Parallel Grep Tool

Crates.io Downloads GitHub License: MIT

Published on crates.io as xerg

An ultra-fast, parallel grep implementation in Rust with syntax highlighting and detailed search statistics. Built for performance with multi-core processing and optimized dependencies.

This repository is part of Pragmatic AI Labs Rust Bootcamp

Why xerg?

🚀 23x faster than system grep on large directory structures with beautiful structured output. Optional -x (xtreme) mode delivers 33x speedup when raw speed matters most.

Features

  • Parallel Processing: Multi-core file processing with intelligent thread pool management
  • Pattern Matching: Regular expression engine with optimized performance
  • Structured Streaming: Streams structured matches per file with headers and statistics
  • Directory Traversal: Recursive scanning with symlink support
  • Colorized Output: Customizable syntax highlighting (red, green, blue, bold)
  • Search Statistics: Optional detailed metrics with --stats flag
  • Quality Assurance: Comprehensive test suite and optimized dependencies

Performance Benchmarks

xerg excels in different scenarios depending on your use case:

Performance Comparison: xerg vs System grep

Scenario Tool/Mode Time CPU Usage Winner Performance Gain
Single File Search System grep 0.004s 88% ✅ System grep 92x faster
xerg (default) 0.369s 26%
xerg -x 0.369s 26%
Multi-Directory Search System grep 10.194s 87%
xerg (default) ~0.450s 650% xerg default 23x faster
xerg -x 0.310s 690% xerg xtreme 33x faster
Large Dataset (2971 files) System grep 10.194s Single-core
xerg (default) ~0.450s Multi-core xerg default 23x faster
xerg -x 0.310s Multi-core xerg xtreme 33x faster

Mode Comparison: xerg (default) vs xerg -x

Aspect xerg (default formatted) xerg -x (xtreme)
Output Format Pretty headers, structured Raw file:line:content
Speed Fast (23x vs grep) Fastest (33x vs grep)
Use Case Most users, development Raw speed, automation
Memory Usage Higher (buffering) Lower (direct output)
CPU Usage 650% (6.5 cores) 690% (6.9 cores)

Usage Recommendations

Use xerg (default mode)

Structured output with pretty formatting
Code exploration and development work
Human-readable results with file headers
Most users - balanced speed + readability

Use xerg -x (xtreme mode)

Maximum raw speed when structure isn't needed
Shell pipelines and automation scripts
CI/CD tasks where every millisecond counts
Grep-compatible output for tool integration

Use System grep

Single file searches
Simple shell scripting
One-off quick searches

Detailed Benchmark Results

Test Case Pattern Files Matches System grep xerg (default) xerg -x Best Speedup
Small project (src/) use 8 files 90 matches 0.004s 0.369s 0.369s 0.01x
Large dataset (deps/) use 2971 files 3465 matches 10.194s ~0.450s 0.310s 32.9x
Real codebase function Variable Variable Linear growth Parallel + pretty Maximum speed Scales

Test Environment: macOS, Multi-core system, Release builds
Methodology: Multiple runs averaged, time command measurements
Key Finding: Default mode provides excellent performance with readability; -x maximizes raw speed

Quick Start

Installation

Install from crates.io (Recommended)

cargo install xerg

Build from Source

  1. Clone the repository:

    git clone https://github.com/yinkam/xtreme-grep.git
    cd xtreme-grep
    
  2. Build the project:

    cargo build --release
    

Development Setup

Use the included Makefile for common development tasks:

make help        # Show all available commands
make build       # Build the project
make test        # Run all tests
make run         # Run with default parameters
make clean       # Clean build artifacts
make release     # Build optimized release binary
make all         # Build, test, and create release

Usage

Using the installed binary:

# Basic search in current directory
xerg "pattern" .

# Search with colored output, statistics, and specific path
xerg --color blue --stats "pattern" src/

For development (from source):

# Basic search in current directory
cargo run "pattern" .

# Search with colored output, statistics, and specific path
cargo run --color blue --stats "pattern" src/

# Or use the built binary directly
./target/release/xgrep --color green --stats "pattern" /path/to/search

Command-Line Options

Option Description Example
pattern Search pattern (required) "use"
path File or directory to search (optional, defaults to current directory) src/
--color <COLOR> Highlight color: red, green, blue, bold --color blue
--stats Show detailed search statistics --stats
--help Display help information --help
--version Show version information --version

Search Statistics

$ xerg --stats --color blue "use" src/
--- /Users/user/rust-grep/src/lib.rs ---
    8:  use colors::Color;
    9:  use crawler::get_files;
   10:  use result::print_result;
--- /Users/user/rust-grep/src/main.rs ---
   15:  use greprs::{run_search, Config};
   16:  use std::process;
  lines: 45, matches: 2, skipped: 0
result: files:8; lines:1186; matches:207; skipped:0; errors:0; time:0.012s;

Structured Result Format: Machine-readable summary with semicolon delimiters and millisecond-precision timing. Perfect for performance analysis and automated testing.

Metrics: files = processed files, lines = total lines read, matches = pattern occurrences, skipped = unreadable lines, errors = access failures, time = execution time

Architecture

The project follows a modular architecture with clear separation of concerns:

Core Modules

  • main.rs: CLI entry point and argument parsing
  • lib.rs: Core integration layer connecting all modules
  • search.rs: Parallel file processing with Rayon
  • crawler.rs: Directory traversal with symlink support
  • highlighter.rs: Regex-based text highlighting
  • colors.rs: ANSI color management
  • result.rs: Message handling and statistics formatting

Dependencies

Crate Purpose
clap CLI argument parsing
num_cpus Thread optimization
rayon Parallel processing
regex Pattern matching
walkdir Directory traversal

Binary Size: 2.2MB optimized release build with minimal feature flags enabled

Implementation Details

Multi-core Processing: Utilizes cores - 1 threads for optimal performance without system lock-up
Memory Efficient: Line-by-line processing handles files of any size
Structured Streaming: Streams structured matches per file as processing completes
Optimized I/O: Buffered reading and compiled regex reuse

Planned Features

  • Auto-color detection (--color=auto)
  • Silent mode (-s, --silent)
  • Case insensitive search (-i, --ignore-case)
  • Invert matching (-v, --invert-match)
  • Multi-pattern support
  • File type filtering
  • Line number display (-n, --line-number)

Contributing

See CONTRIBUTING.md for guidelines on how to contribute to this project.

This is a learning-focused project demonstrating comprehensive Rust development practices. Contributions that enhance the educational value are especially welcome.

License

This project is open source and available under the MIT License.


Built during the Pragmatic AI Labs Rust Bootcamp