# Instructions for Agents
This file provides guidance to coding agents when working with code in this repository.
## Project Overview
`ock` is a command-line utility for working with table-like data, serving as a simpler and faster replacement for most awk use cases. It's written in Rust using the 2021 edition and uses a selector-based approach to extract specific rows and columns from structured text data.
### Dependencies
- `clap` (4.5.4) - Command-line argument parsing with derive features
- `regex` (1.7.0) - Regular expression support for selectors and delimiters
- `once_cell` (1.21.3) - Lazy static initialization for regex caching
- `lru` (0.16.0) - LRU cache for compiled regex patterns
## Development Commands
### Build
```bash
cargo build # Debug build
cargo build --release # Optimized release build (applies aggressive optimizations from Cargo.toml)
```
### Release Process
The project uses automated release management via `release-plz`:
- **Automatic Version Bumping**: Based on conventional commits (feat → minor, fix → patch, BREAKING CHANGE → major)
- **Changelog Generation**: Automatically maintained in CHANGELOG.md
- **Release Workflow**: Push to main triggers release-plz to create/update a Release PR
- **Publishing**: Merging the Release PR publishes to crates.io and creates GitHub releases
To trigger a release:
1. Ensure all commits use conventional commit format
2. Push changes to main
3. Review and merge the automated Release PR created by release-plz
4. Binary builds automatically trigger on new version tags
**Required Setup**:
- `CARGO_REGISTRY_TOKEN` secret must be configured in GitHub repository settings for crates.io publishing
### Test
```bash
cargo test # Run all tests (unit and integration)
cargo test --test integration_test # Run integration tests only
cargo test test_name # Run specific test by name
cargo test --lib # Run unit tests only
```
### Format & Lint
```bash
cargo fmt # Format code
cargo fmt --check # Check formatting without changes
cargo fmt --all # Format all targets
cargo clippy # Run linter
cargo clippy -- -D warnings # Lint with warnings as errors (CI)
```
### Installation
```bash
cargo install --path . # Install ock locally for testing
```
### Run Examples
```bash
## Code Style & Conventions
- Use rustfmt defaults (4-space indent); run before pushing
- Naming: functions/vars `snake_case`, types `CamelCase`, consts `SCREAMING_SNAKE_CASE`
- Add `///` docs for public items; keep examples small and runnable
- Prefer iterators and borrowing; keep `main` thin and move logic into modules
## Core Architecture
### Module Structure
- `main.rs` - Entry point containing the main parsing logic and output formatting
- Handles input source detection (stdin, file, or literal text)
- Manages the row/column selection pipeline
- Implements column alignment for pretty-printed output
- `cli.rs` - Command-line argument parsing using the `clap` crate
- Defines CLI interface with Args struct
- Implements input parsing logic (parse_input function)
- `selector.rs` - Selector struct and parsing logic for row/column selection syntax
- Implements Python-like slicing syntax (e.g., `1:10:2`)
- Supports both numeric indices and regex patterns
- Contains selector matching logic for rows and columns
- `utils.rs` - Utility functions for regex comparison and text splitting
- Included via `include!()` macro in other modules
- Provides `regex_compare` and `split` helper functions
### Test Organization
- Unit tests are embedded in each module using `#[cfg(test)]` modules and separate test files:
- `cli_tests.rs` - Tests for CLI parsing and input detection
- `main_tests.rs` - Tests for main logic functions and output formatting
- `selector_tests.rs` - Tests for selector parsing and matching logic
- `utils_tests.rs` - Tests for utility functions (regex_compare, split)
- Integration tests in `tests/integration_test.rs`
- End-to-end tests simulating actual CLI usage with various inputs
- Tests for data processing scenarios with different delimiters and selectors
- Dev dependencies include `tempfile` (3.8.0) for temporary file testing
## Key Implementation Details
### Input Processing Flow
1. Parse CLI arguments via `clap`
2. Determine input source (stdin detection, file check, or literal text)
3. Parse row and column selectors into `Selector` structs
4. Split input into rows using row delimiter regex
5. For each row:
- Check if it matches row selectors
- Split into columns and extract matching column indices
- Collect selected cells
6. Format output with aligned columns for pretty printing
### Selector System
- Single value: `5` (selects item at index 5, 1-based)
- Range: `1:10` (selects items 1 through 10, inclusive)
- Range with step: `1:10:2` (selects every 2nd item from 1 to 10)
- Regex: `pid` (case-insensitive partial match against content)
- Multiple selectors: `1,5,10` or `name,pid` (comma-separated)
- Mixed numeric and regex: Supported in the same selector list
### Delimiter Handling
- Default row delimiter: `\n` (newline)
- Default column delimiter: `\s` (whitespace regex)
- Custom delimiters supported via `--row-delimiter` and `--column-delimiter`
- Delimiters are treated as regex patterns
- Special handling for regex metacharacters in delimiters
### Edge Cases and Behavior
- Empty input: Returns empty output
- Out-of-bounds indices: Silently ignored (no error)
- Regex selectors that don't match: No output for those selectors
- Invalid ranges (start > end): Returns empty selection
- Step value of 0: Treated as step 1
- Whitespace-only lines with default delimiter: Filtered out by split()
## Common Usage Patterns
```bash
# Select specific columns from process list
# Filter rows by regex and select columns
# Process CSV files
ock -c 1,3,5 --column-delimiter "," data.csv
# Select row ranges with step
ock -r 1:100:10 large_file.txt # Every 10th row from 1-100
```
## CI/CD & PR Guidelines
### GitHub Actions Workflows
The project uses three automated workflows:
1. **CI Workflow** (`ci.yml`): Runs on all PRs
- Validates formatting with `cargo fmt`
- Runs clippy linting
- Executes all tests
- Builds release binary
2. **Release Workflow** (`release.yml`): Runs on pushes to main
- Uses release-plz for automated version management
- Creates/updates Release PRs with version bumps and changelog
- Publishes to crates.io when Release PR is merged
- Creates git tags for new versions
3. **Binary Build Workflow** (`build-binaries.yml`): Triggers on version tags
- Builds binaries for multiple platforms (Linux, macOS, Windows)
- Supports x86_64 and aarch64 architectures
- Creates GitHub Releases with downloadable binaries
- Generates checksums for all artifacts
### Branch Protection
- The `main` branch has branch protection enabled and cannot be pushed to directly
- All changes must go through pull requests
- Force pushes to `main` are disallowed
- Create feature branches for all work and open PRs to merge into `main`
### Pull Request Requirements
- Commits: concise, imperative subject; reference issues (e.g., `Fix selector step off-by-one (#42)`)
- Always use conventional commits (`feat:`, `fix:`, `docs:`)
- PRs: describe problem, approach, and tradeoffs; link issues; include before/after examples when changing flags or output
- Ensure `cargo fmt`, `cargo clippy`, build, and tests pass before review (CI will verify this)
- Update `README.md` when altering flags, defaults, or examples
## Security Notes
- Be mindful of user-supplied regex and delimiters; avoid catastrophic backtracking
- Validate input and handle errors with clear messages; no new `panic!`s in production code paths
## Known Issues and TODOs
- Step values in selectors have a documented bug (see test_row_range_with_step comment)
- Out-of-bounds column indices return entire row instead of empty (see test_out_of_bounds_indices)
- Consider error handling for invalid selector syntax instead of silent failures
## Performance Optimizations
- Release profile uses aggressive optimizations:
- Strip symbols for smaller binary
- Optimize for size (`opt-level = "z"`)
- Link-time optimization enabled
- Single codegen unit for better optimization
- Panic=abort for smaller binary