ock 0.1.2

A simple, fast command line utility for working with table-like data

# Instructions for Agents

This file provides guidance to coding agents when working with code in this repository.

## Project Overview

`ock` is a command-line utility for working with table-like data, serving as a simpler and faster replacement for most awk use cases. It's written in Rust using the 2021 edition and uses a selector-based approach to extract specific rows and columns from structured text data.

### Dependencies
- `clap` (4.5.4) - Command-line argument parsing with derive features
- `regex` (1.7.0) - Regular expression support for selectors and delimiters
- `once_cell` (1.21.3) - Lazy static initialization for regex caching
- `lru` (0.16.0) - LRU cache for compiled regex patterns

## Development Commands

### Build
```bash
cargo build           # Debug build
cargo build --release # Optimized release build (applies aggressive optimizations from Cargo.toml)
```

### Release Process
The project uses automated release management via `release-plz`:
- **Automatic Version Bumping**: Based on conventional commits (feat → minor, fix → patch, BREAKING CHANGE → major)
- **Changelog Generation**: Automatically maintained in CHANGELOG.md
- **Release Workflow**: Push to main triggers release-plz to create/update a Release PR
- **Publishing**: Merging the Release PR publishes to crates.io and creates GitHub releases

To trigger a release:
1. Ensure all commits use conventional commit format
2. Merge changes into `main` through a pull request (direct pushes are blocked by branch protection)
3. Review and merge the automated Release PR created by release-plz
4. Binary builds automatically trigger on new version tags

**Required Setup**:
- `CARGO_REGISTRY_TOKEN` secret must be configured in GitHub repository settings for crates.io publishing

### Test
```bash
cargo test                           # Run all tests (unit and integration)
cargo test --test integration_test   # Run integration tests only
cargo test test_name                 # Run specific test by name
cargo test --lib                     # Run unit tests only
```

### Format & Lint
```bash
cargo fmt            # Format code
cargo fmt --check    # Check formatting without changes
cargo fmt --all      # Format all packages in the workspace
cargo clippy         # Run linter
cargo clippy -- -D warnings  # Lint with warnings as errors (CI)
```

### Installation
```bash
cargo install --path .  # Install ock locally for testing
```

### Run Examples
```bash
ps aux | cargo run -- -c pid -r 0:10  # Example: filter process list
```

## Code Style & Conventions

- Use rustfmt defaults (4-space indent); run `cargo fmt` before pushing
- Naming: functions/vars `snake_case`, types `CamelCase`, consts `SCREAMING_SNAKE_CASE`
- Add `///` docs for public items; keep examples small and runnable
- Prefer iterators and borrowing; keep `main` thin and move logic into modules

## Core Architecture

### Module Structure
- `main.rs` - Entry point containing the main parsing logic and output formatting
  - Handles input source detection (stdin, file, or literal text)
  - Manages the row/column selection pipeline
  - Implements column alignment for pretty-printed output
- `cli.rs` - Command-line argument parsing using the `clap` crate
  - Defines CLI interface with Args struct
  - Implements input parsing logic (parse_input function)
- `selector.rs` - Selector struct and parsing logic for row/column selection syntax
  - Implements Python-like slicing syntax (e.g., `1:10:2`)
  - Supports both numeric indices and regex patterns
  - Contains selector matching logic for rows and columns
- `utils.rs` - Utility functions for regex comparison and text splitting
  - Included via `include!()` macro in other modules
  - Provides `regex_compare` and `split` helper functions

### Test Organization
- Unit tests live in `#[cfg(test)]` modules and in separate test files:
  - `cli_tests.rs` - Tests for CLI parsing and input detection
  - `main_tests.rs` - Tests for main logic functions and output formatting
  - `selector_tests.rs` - Tests for selector parsing and matching logic
  - `utils_tests.rs` - Tests for utility functions (regex_compare, split)
- Integration tests in `tests/integration_test.rs`
  - End-to-end tests simulating actual CLI usage with various inputs
  - Tests for data processing scenarios with different delimiters and selectors
- Dev dependencies include `tempfile` (3.8.0) for temporary file testing

## Key Implementation Details

### Input Processing Flow
1. Parse CLI arguments via `clap`
2. Determine input source (stdin detection, file check, or literal text)
3. Parse row and column selectors into `Selector` structs
4. Split input into rows using row delimiter regex
5. For each row:
   - Check if it matches row selectors
   - Split into columns and extract matching column indices
   - Collect selected cells
6. Format output with aligned columns for pretty printing
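
Steps 4 through 6 can be sketched with the standard library alone. This is a minimal illustration, not the code in `main.rs`: the function name `select_and_align` is hypothetical, selectors are reduced to plain 1-based positions, and rows and columns are split on newlines and whitespace instead of on delimiter regexes.

```rust
/// Hypothetical helper: keep the rows and columns whose 1-based positions
/// are listed, then pad every column to a common width.
fn select_and_align(input: &str, rows: &[usize], cols: &[usize]) -> String {
    // Steps 4-5: split into rows, keep selected rows, split into columns.
    let mut table: Vec<Vec<&str>> = Vec::new();
    for (row_idx, line) in input.lines().enumerate() {
        if !rows.contains(&(row_idx + 1)) {
            continue;
        }
        let cells: Vec<&str> = line.split_whitespace().collect();
        // Positions that do not exist in this row are skipped.
        let selected: Vec<&str> = cols
            .iter()
            .filter_map(|&c| c.checked_sub(1).and_then(|i| cells.get(i).copied()))
            .collect();
        table.push(selected);
    }

    // Step 6: find the widest cell in each output column.
    let mut widths: Vec<usize> = Vec::new();
    for row in &table {
        for (i, cell) in row.iter().enumerate() {
            let len = cell.chars().count();
            if i >= widths.len() {
                widths.push(len);
            } else if len > widths[i] {
                widths[i] = len;
            }
        }
    }

    // Left-pad each cell to its column width and join with two spaces.
    table
        .iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .map(|(i, cell)| format!("{:<width$}", cell, width = widths[i]))
                .collect::<Vec<_>>()
                .join("  ")
                .trim_end()
                .to_string()
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Collecting the selected cells before printing is what makes alignment possible: column widths are only known once every selected row has been seen.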

### Selector System
- Single value: `5` (selects item at index 5, 1-based)
- Range: `1:10` (selects items 1 through 10, inclusive)
- Range with step: `1:10:2` (selects every 2nd item from 1 to 10)
- Regex: `pid` (case-insensitive partial match against content)
- Multiple selectors: `1,5,10` or `name,pid` (comma-separated)
- Mixed numeric and regex: Supported in the same selector list
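
As a rough sketch of how this syntax could be classified, the following uses a hypothetical `SelectorItem` enum and `parse_item` function. The real `Selector` struct in `selector.rs` may be shaped differently, and this sketch treats anything that is not purely numeric (including open-ended forms such as `:10`) as a pattern, which is an assumption.

```rust
/// Hypothetical representation of one selector item.
#[derive(Debug, PartialEq)]
enum SelectorItem {
    Index(usize),
    Range { start: usize, end: usize, step: usize },
    Pattern(String),
}

/// Parse one item such as `5`, `1:10`, `1:10:2`, or `pid`.
fn parse_item(text: &str) -> SelectorItem {
    let parts: Vec<&str> = text.split(':').collect();
    // `None` if any part fails to parse as an unsigned integer.
    let numbers: Option<Vec<usize>> = parts.iter().map(|p| p.parse().ok()).collect();
    match numbers.as_deref() {
        Some([index]) => SelectorItem::Index(*index),
        Some([start, end]) => SelectorItem::Range { start: *start, end: *end, step: 1 },
        Some([start, end, step]) => SelectorItem::Range {
            start: *start,
            end: *end,
            step: *step,
        },
        // Anything that is not one to three numbers is kept as a pattern.
        _ => SelectorItem::Pattern(text.to_string()),
    }
}

/// Parse a comma-separated list such as `1,5,10` or `name,pid`.
fn parse_selectors(text: &str) -> Vec<SelectorItem> {
    text.split(',').map(parse_item).collect()
}
```

With this shape, `parse_selectors("1:10:2,pid")` yields one `Range` and one `Pattern`, which is how a mixed list can be carried through a single selector vector.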

### Delimiter Handling
- Default row delimiter: `\n` (newline)
- Default column delimiter: `\s` (whitespace regex)
- Custom delimiters supported via `--row-delimiter` and `--column-delimiter`
- Delimiters are treated as regex patterns
- Special handling for regex metacharacters in delimiters

### Edge Cases and Behavior
- Empty input: Returns empty output
- Out-of-bounds indices: Silently ignored (no error); out-of-bounds column indices currently return the entire row instead (see Known Issues and TODOs)
- Regex selectors that don't match: No output for those selectors
- Invalid ranges (start > end): Returns empty selection
- Step value of 0: Treated as step 1
- Whitespace-only lines with default delimiter: Filtered out by split()
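
The range-related rules above can be expressed as a small function. This sketch shows the documented intent only; `expand_range` is a hypothetical name, clamping a start of 0 to 1 is a choice made for the example, and the current implementation may differ (see the step bug under Known Issues and TODOs).

```rust
/// Expand a 1-based inclusive range into the positions it selects,
/// given how many items exist. Hypothetical helper for illustration.
fn expand_range(start: usize, end: usize, step: usize, len: usize) -> Vec<usize> {
    // A step of 0 is treated as a step of 1.
    let step = step.max(1);
    // Positions past the end of the data are dropped, not reported.
    let last = end.min(len);
    // Assumption for this sketch: a start of 0 is clamped to 1.
    let first = start.max(1);
    // Covers both start > end and empty input.
    if first > last {
        return Vec::new();
    }
    (first..=last).step_by(step).collect()
}
```

Returning an empty vector for every degenerate case keeps the caller free of special handling: an empty selection simply produces no output.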

## Common Usage Patterns
```bash
# Select specific columns from process list
ps aux | ock -c 2,11

# Filter rows by regex and select columns
ps aux | ock -r python -c pid,command

# Process CSV files
ock -c 1,3,5 --column-delimiter "," data.csv

# Select row ranges with step
ock -r 1:100:10 large_file.txt  # Every 10th row from 1-100
```

## CI/CD & PR Guidelines

### GitHub Actions Workflows
The project uses three automated workflows:

1. **CI Workflow** (`ci.yml`): Runs on all PRs
   - Validates formatting with `cargo fmt`
   - Runs clippy linting
   - Executes all tests
   - Builds release binary

2. **Release Workflow** (`release.yml`): Runs on pushes to main
   - Uses release-plz for automated version management
   - Creates/updates Release PRs with version bumps and changelog
   - Publishes to crates.io when Release PR is merged
   - Creates git tags for new versions

3. **Binary Build Workflow** (`build-binaries.yml`): Triggers on version tags
   - Builds binaries for multiple platforms (Linux, macOS, Windows)
   - Supports x86_64 and aarch64 architectures
   - Creates GitHub Releases with downloadable binaries
   - Generates checksums for all artifacts

### Branch Protection
- The `main` branch has branch protection enabled and cannot be pushed to directly
- All changes must go through pull requests
- Force pushes to `main` are disallowed
- Create feature branches for all work and open PRs to merge into `main`

### Pull Request Requirements
- Commits: concise, imperative subject; reference issues (e.g., `Fix selector step off-by-one (#42)`)
- Always use conventional commits (`feat:`, `fix:`, `docs:`)
- PRs: describe problem, approach, and tradeoffs; link issues; include before/after examples when changing flags or output
- Ensure `cargo fmt`, `cargo clippy`, build, and tests pass before review (CI will verify this)
- Update `README.md` when altering flags, defaults, or examples

## Security Notes

- Be mindful of user-supplied regex and delimiters; avoid catastrophic backtracking
- Validate input and handle errors with clear messages; no new `panic!`s in production code paths
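
One way to follow the second point is to return a `Result` with a descriptive error instead of panicking. The sketch below is illustrative only: `SelectorError`, `validate_pattern`, and the 256-byte limit are all assumptions, not part of `ock`. Note that the `regex` crate guarantees linear-time matching, so the practical risk with user-supplied patterns is mostly pattern size and compile cost, which a length check like this addresses.

```rust
use std::fmt;

/// Hypothetical error type; `ock` itself may report errors differently.
#[derive(Debug, PartialEq)]
enum SelectorError {
    Empty,
    TooLong { len: usize, max: usize },
}

impl fmt::Display for SelectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SelectorError::Empty => write!(f, "selector is empty"),
            SelectorError::TooLong { len, max } => {
                write!(f, "selector is {len} bytes long; the limit is {max}")
            }
        }
    }
}

/// Assumed limit, chosen only for this example.
const MAX_PATTERN_LEN: usize = 256;

/// Validate a user-supplied pattern before compiling it as a regex.
fn validate_pattern(pattern: &str) -> Result<&str, SelectorError> {
    if pattern.is_empty() {
        return Err(SelectorError::Empty);
    }
    if pattern.len() > MAX_PATTERN_LEN {
        return Err(SelectorError::TooLong {
            len: pattern.len(),
            max: MAX_PATTERN_LEN,
        });
    }
    Ok(pattern)
}
```

A caller in `main` can print the `Display` text to stderr and exit with a non-zero status, which keeps the failure visible without adding a `panic!`.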

## Known Issues and TODOs
- Step values in selectors have a documented bug (see the comment in `test_row_range_with_step`)
- Out-of-bounds column indices return the entire row instead of an empty result (see `test_out_of_bounds_indices`)
- Consider error handling for invalid selector syntax instead of silent failures

## Performance Optimizations
- Release profile uses aggressive optimizations:
  - Strip symbols for smaller binary
  - Optimize for size (`opt-level = "z"`)
  - Link-time optimization enabled
  - Single codegen unit for better optimization
  - Panic=abort for smaller binary