pg-logstats 0.1.0

PostgreSQL log investigation CLI for query-family findings and follow-up SQL
# Testing Guide for pg-logstats

This guide explains how to run and extend the test suite for pg-logstats, a PostgreSQL log analysis tool. It covers unit, integration, property-based, and performance tests, along with test configuration, coverage, and debugging.

## Test Structure

The test suite is organized into several categories:

### 1. Unit Tests (`tests/unit/`)
- **Parser Tests** (`parser_tests.rs`): Tests for PostgreSQL stderr log parsing
- **Analytics Tests** (`analytics_tests.rs`): Tests for query analysis and metrics calculation
- **Output Tests** (`output_tests.rs`): Tests for text and JSON output formatting

### 2. Integration Tests (`tests/integration_tests.rs`)
- End-to-end CLI testing with sample log files
- Docker environment testing
- Error handling scenarios
- Performance benchmarks

### 3. Test Data (`tests/test_data/`)
- Utilities for generating various types of test log files
- Expected output files for validation
- Edge cases including empty files, malformed lines, and large files

## Running Tests

### Basic Test Execution

```bash
# Run the canonical local validation path
make check

# Run all tests
make test

# Run tests and show stdout/stderr from passing tests too
cargo test -- --nocapture

# Run specific test module
cargo test parser_tests
cargo test analytics_tests
cargo test output_tests
cargo test integration_tests

# Run tests matching a pattern
cargo test test_parse_simple_statement
```

### Unit Tests

```bash
# Run all unit tests
cargo test --test parser_tests
cargo test --test analytics_tests
cargo test --test output_tests

# Run specific unit test categories
cargo test parser_unit_tests
cargo test analytics_unit_tests
cargo test text_formatter_tests
cargo test json_formatter_tests
```

### Integration Tests

```bash
# Run all integration tests
cargo test --test integration_tests

# Run specific integration test categories
cargo test cli_basic_tests
cargo test file_processing_tests
cargo test error_handling_tests
cargo test performance_tests
```

### Property-Based Tests

```bash
# Run property-based tests
cargo test property_based_tests
cargo test property_based_analytics_tests
```

### Performance and Benchmark Tests

```bash
# Run performance tests
cargo test performance_tests

# Run with release mode for accurate performance measurements
cargo test --release performance_tests

# Run memory usage tests
cargo test memory_usage

# Run benchmark tests
cargo test benchmark
```

## Test Configuration

### Environment Variables

```bash
# Enable debug logging during tests
RUST_LOG=debug cargo test

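# Set per-test time limits as "warn_ms,critical_ms".
# These variables only take effect on nightly with --report-time or --ensure-time;
# tests under tests/ count as integration tests, so set both variables.
RUST_TEST_TIME_UNIT=1000,60000 RUST_TEST_TIME_INTEGRATION=1000,60000 \
  cargo +nightly test -- -Z unstable-options --ensure-time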

# Run tests in single thread (useful for debugging)
cargo test -- --test-threads=1
```

### Test Data Generation

The test suite includes utilities for generating various types of test data:

```bash
# Generate test data (done automatically during tests)
cargo test generate_test_data

# Test with large datasets
cargo test test_performance_with_large_dataset

# Test with edge cases
cargo test edge_case_tests
```

## Continuous Integration

### Buildkite Pipeline

`pg-logstats` should use Buildkite as its CI surface, matching the infra
direction used in `~/code/pgqrs`.

When CI is added or updated, prefer a checked-in `.buildkite/pipeline.yml` that
delegates to the same local validation commands developers run manually. Avoid
re-stating raw cargo commands inline when a local task runner exists.

Expected shape:

- Buildkite bootstrap/setup stays in pipeline or helper scripts
- validation runs through canonical local commands such as `make check`
- packaging smoke checks are first-class CI steps once added

### Test Coverage

```bash
# Install cargo-tarpaulin for coverage
cargo install cargo-tarpaulin

# Generate coverage report
cargo tarpaulin --out Html --output-dir coverage/

# Generate coverage for specific test targets
cargo tarpaulin --test parser_tests --test analytics_tests --test output_tests
```

## Test Categories and Best Practices

### 1. Parser Tests
- **Log Line Formats**: Tests various PostgreSQL log line formats
- **Edge Cases**: Empty lines, malformed entries, continuation lines
- **Multi-line Statements**: Complex queries spanning multiple lines
- **Timestamp Parsing**: Different timestamp formats and timezones
- **Error Handling**: Invalid log entries and parsing failures
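
As a rough illustration of the categories above, the sketch below parses one stderr log line and tests a well-formed line, a malformed line, an empty line, and a continuation line. The `LogLine` struct and `parse_line` function are hypothetical stand-ins, not pg-logstats' actual parser API, and the line layout assumes PostgreSQL's default `log_line_prefix` of `'%m [%p] '`.

```rust
/// Hypothetical parsed form of one stderr log line (not pg-logstats' real type).
#[derive(Debug, PartialEq)]
pub struct LogLine {
    pub timestamp: String,
    pub pid: u32,
    pub level: String,
    pub message: String,
}

/// Parse a line shaped like `<timestamp> [<pid>] <LEVEL>:  <message>`.
/// Returns `None` for empty, malformed, or continuation lines.
pub fn parse_line(line: &str) -> Option<LogLine> {
    // Everything before " [" is the timestamp (date, time, and zone).
    let (timestamp, rest) = line.split_once(" [")?;
    // The process id sits between the brackets.
    let (pid, rest) = rest.split_once("] ")?;
    // The severity ends at the first colon; the message follows it.
    let (level, message) = rest.split_once(':')?;
    if timestamp.is_empty() || level.is_empty() {
        return None;
    }
    Some(LogLine {
        timestamp: timestamp.to_string(),
        pid: pid.parse().ok()?,
        level: level.to_string(),
        message: message.trim_start().to_string(),
    })
}

#[cfg(test)]
mod parser_sketch_tests {
    use super::*;

    #[test]
    fn test_parse_line_simple_statement() {
        let line = "2024-01-15 10:30:45.123 UTC [12345] LOG:  statement: SELECT 1";
        let parsed = parse_line(line).expect("well-formed line should parse");
        assert_eq!(parsed.timestamp, "2024-01-15 10:30:45.123 UTC");
        assert_eq!(parsed.pid, 12345);
        assert_eq!(parsed.level, "LOG");
        assert_eq!(parsed.message, "statement: SELECT 1");
    }

    #[test]
    fn test_parse_line_rejects_malformed_and_empty_input() {
        assert_eq!(parse_line(""), None);
        assert_eq!(parse_line("not a log line"), None);
        // Continuation line of a multi-line statement: no prefix to parse.
        assert_eq!(parse_line("\tFROM users WHERE id = 1"), None);
    }
}
```

A real parser would also need to attach continuation lines to the preceding entry; this sketch only shows that they are not mistaken for new entries.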

### 2. Analytics Tests
- **Query Classification**: SELECT, INSERT, UPDATE, DELETE, DDL, OTHER
- **Query Normalization**: Parameter replacement, literal normalization
- **Performance Metrics**: Duration calculations, percentiles
- **Frequency Analysis**: Most frequent and slowest queries
- **Error Rate Calculation**: Error counting and rate calculation
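
To make the classification and normalization categories concrete, here is a minimal sketch of what such tests might check. The `QueryType` enum and the `classify` and `normalize` functions are hypothetical illustrations, not the crate's real API, and the normalizer is deliberately simplified (it ignores escaped quotes, comments, and dollar quoting).

```rust
/// Query families listed in this guide; the enum itself is a hypothetical stand-in.
#[derive(Debug, PartialEq)]
pub enum QueryType { Select, Insert, Update, Delete, Ddl, Other }

/// Classify a statement by its leading keyword (case-insensitive).
pub fn classify(sql: &str) -> QueryType {
    let first = sql.split_whitespace().next().unwrap_or("").to_ascii_uppercase();
    match first.as_str() {
        "SELECT" => QueryType::Select,
        "INSERT" => QueryType::Insert,
        "UPDATE" => QueryType::Update,
        "DELETE" => QueryType::Delete,
        "CREATE" | "ALTER" | "DROP" | "TRUNCATE" => QueryType::Ddl,
        _ => QueryType::Other,
    }
}

/// Replace single-quoted strings and standalone numbers with `?`.
pub fn normalize(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut chars = sql.chars().peekable();
    let mut prev_is_ident = false; // true while inside an identifier like `t1`
    while let Some(c) = chars.next() {
        if c == '\'' {
            // Skip to the closing quote and emit one placeholder.
            for q in chars.by_ref() {
                if q == '\'' { break; }
            }
            out.push('?');
            prev_is_ident = false;
        } else if c.is_ascii_digit() && !prev_is_ident {
            // Consume the rest of the number, including a decimal part.
            while let Some(&n) = chars.peek() {
                if n.is_ascii_digit() || n == '.' { chars.next(); } else { break; }
            }
            out.push('?');
            prev_is_ident = false;
        } else {
            out.push(c);
            prev_is_ident = c.is_ascii_alphanumeric() || c == '_';
        }
    }
    out
}

#[cfg(test)]
mod analytics_sketch_tests {
    use super::*;

    #[test]
    fn test_classify_is_case_insensitive() {
        assert_eq!(classify("select 1"), QueryType::Select);
        assert_eq!(classify("  DROP TABLE users"), QueryType::Ddl);
        assert_eq!(classify("VACUUM"), QueryType::Other);
        assert_eq!(classify(""), QueryType::Other);
    }

    #[test]
    fn test_normalize_groups_queries_that_differ_only_in_literals() {
        let a = normalize("SELECT * FROM t1 WHERE id = 42 AND name = 'bob'");
        let b = normalize("SELECT * FROM t1 WHERE id = 7 AND name = 'alice'");
        assert_eq!(a, "SELECT * FROM t1 WHERE id = ? AND name = ?");
        assert_eq!(a, b);
    }
}
```

The second test captures the property that matters for frequency analysis: two queries that differ only in literals should normalize to the same string.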

### 3. Output Tests
- **Text Formatting**: Human-readable output with colors
- **JSON Formatting**: Structured output with metadata
- **Edge Cases**: Special characters, Unicode, very long queries
- **Performance**: Large dataset formatting performance

### 4. Integration Tests
- **CLI Interface**: Command-line argument parsing and validation
- **File Processing**: Single files, multiple files, directories
- **Output Formats**: Text and JSON output validation
- **Error Scenarios**: Missing files, invalid arguments
- **Performance**: Large file processing benchmarks
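
One way to structure the CLI and error-scenario checks above is a small helper that runs a binary and captures its exit status and output, using only the standard library. This is a sketch, not the project's actual test harness: `CliResult` and `run_cli` are hypothetical names, and the binary name and argument shape in the trailing comment are assumptions.

```rust
use std::io;
use std::process::Command;

/// Captured result of one CLI invocation.
#[derive(Debug)]
pub struct CliResult {
    pub success: bool,
    pub stdout: String,
    pub stderr: String,
}

/// Run `program` with `args` and capture its exit status and output.
/// Returns `Err` if the program could not be started at all.
pub fn run_cli(program: &str, args: &[&str]) -> io::Result<CliResult> {
    let output = Command::new(program).args(args).output()?;
    Ok(CliResult {
        success: output.status.success(),
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
    })
}

#[cfg(test)]
mod cli_sketch_tests {
    use super::*;

    #[test]
    fn test_cli_missing_binary_reports_spawn_error() {
        let result = run_cli("/nonexistent/pg-logstats-sketch", &[]);
        assert!(result.is_err());
    }

    // Inside a Cargo integration test, the built binary's path is available via
    // `env!("CARGO_BIN_EXE_<name>")`. Assuming the binary is named `pg-logstats`
    // and takes log paths as positional arguments (both unverified), a
    // missing-file scenario might look like:
    //
    //     let result = run_cli(env!("CARGO_BIN_EXE_pg-logstats"), &["missing.log"]).unwrap();
    //     assert!(!result.success);
    //     assert!(!result.stderr.is_empty());
}
```

Keeping the spawn logic in one helper lets each scenario test focus on its arguments and expected output.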

## Debugging Tests

### Running Individual Tests

```bash
# Run a specific test with debug output
cargo test test_parse_simple_statement -- --nocapture

# Run tests with backtrace on panic
RUST_BACKTRACE=1 cargo test

# Run tests with full backtrace
RUST_BACKTRACE=full cargo test
```

### Test Debugging Tips

1. **Use `println!` or `dbg!`** for debugging test values
2. **Check test data** in `tests/test_data/` for expected inputs
3. **Verify file paths** when tests fail with file not found errors
4. **Check permissions** for file creation/deletion tests
5. **Use `--nocapture`** to see test output

## Performance Testing

### Benchmark Configuration

```bash
# Run performance tests in release mode
cargo test --release performance_tests

# Run the large-dataset performance test (checked against the thresholds below)
cargo test test_performance_with_large_dataset

# Memory usage validation
cargo test test_memory_usage
```

### Performance Thresholds

The test suite includes performance assertions:
- **Parser Performance**: < 1000ms for 1000 log lines
- **Analytics Performance**: < 1000ms for 1000 queries
- **Output Formatting**: < 1000ms for large datasets
- **Memory Usage**: Reasonable memory consumption for large inputs
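
A threshold assertion of this kind can be sketched with `std::time::Instant`. The helper `time_it` and the placeholder workload `count_duration_lines` are hypothetical; a real test would time the parser or analyzer under test instead.

```rust
use std::time::{Duration, Instant};

/// Run `work` once and return how long it took.
pub fn time_it<F: FnOnce()>(work: F) -> Duration {
    let start = Instant::now();
    work();
    start.elapsed()
}

/// Placeholder workload: count lines that contain a `duration:` entry.
/// A real test would call the code under test here.
pub fn count_duration_lines(lines: &[String]) -> usize {
    lines.iter().filter(|l| l.contains("duration:")).count()
}

#[cfg(test)]
mod performance_sketch_tests {
    use super::*;

    #[test]
    fn test_performance_1000_lines_under_threshold() {
        // Build 1000 synthetic log lines, matching the threshold's input size.
        let lines: Vec<String> = (0..1000)
            .map(|i| format!("2024-01-15 10:30:45.000 UTC [{}] LOG:  duration: 1.5 ms", i))
            .collect();
        let mut matched = 0;
        let elapsed = time_it(|| {
            matched = count_duration_lines(&lines);
        });
        assert_eq!(matched, 1000);
        // Fail with the measured time in the message to ease debugging.
        assert!(elapsed < Duration::from_millis(1000), "took {:?}", elapsed);
    }
}
```

Wall-clock assertions like this are sensitive to machine load and build profile, which is one reason to run them with `--release`.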

## Mocking and Test Doubles

### External Dependencies

The test suite uses test doubles and isolation for:
- **File System Operations**: Using `tempfile` for temporary directories
- **Time-based Tests**: Fixed timestamps for reproducible results
- **External Commands**: Mocked CLI interactions

### Test Data Management

- **Temporary Files**: Automatically cleaned up after tests
- **Deterministic Data**: Fixed seeds for reproducible test data
- **Edge Case Coverage**: Comprehensive edge case scenarios
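
The two ideas above, seeded data and automatic cleanup, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the suite's real generator: the project itself uses the `tempfile` crate for temporary files, while `Lcg`, `generate_log_lines`, and `TempLog` below are hypothetical names.

```rust
use std::fs;
use std::io::Write;
use std::path::PathBuf;

/// Tiny linear congruential generator so the same seed always yields the same data.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self { Lcg(seed) }
    pub fn next_u32(&mut self) -> u32 {
        // Constants from Knuth's MMIX generator; the high bits are the most random.
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Generate `count` synthetic log lines with pseudo-random pids and durations.
pub fn generate_log_lines(seed: u64, count: usize) -> Vec<String> {
    let mut rng = Lcg::new(seed);
    (0..count)
        .map(|_| {
            let pid = 1000 + rng.next_u32() % 9000;
            let millis = rng.next_u32() % 5000;
            format!("2024-01-15 10:30:45.000 UTC [{}] LOG:  duration: {}.000 ms", pid, millis)
        })
        .collect()
}

/// Temp file that is deleted when the guard goes out of scope.
pub struct TempLog { pub path: PathBuf }

impl TempLog {
    pub fn create(name: &str, lines: &[String]) -> std::io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{}-{}.log", name, std::process::id()));
        let mut file = fs::File::create(&path)?;
        for line in lines { writeln!(file, "{}", line)?; }
        Ok(TempLog { path })
    }
}

impl Drop for TempLog {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path); // best-effort cleanup
    }
}

#[cfg(test)]
mod test_data_sketch_tests {
    use super::*;

    #[test]
    fn test_generate_log_lines_same_seed_same_data() {
        assert_eq!(generate_log_lines(42, 100), generate_log_lines(42, 100));
    }

    #[test]
    fn test_temp_log_removed_on_drop() {
        let lines = generate_log_lines(7, 10);
        let path = {
            let log = TempLog::create("pg-logstats-sketch", &lines).unwrap();
            assert_eq!(fs::read_to_string(&log.path).unwrap().lines().count(), 10);
            log.path.clone()
        }; // `log` is dropped here, which removes the file
        assert!(!path.exists());
    }
}
```

Tying cleanup to `Drop` means the file is removed even when an assertion panics partway through a test.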

## Contributing to Tests

### Adding New Tests

1. **Unit Tests**: Add to appropriate module in `tests/unit/`
2. **Integration Tests**: Add to `tests/integration_tests.rs`
3. **Test Data**: Add generators to `tests/test_data/mod.rs`
4. **Documentation**: Update this guide with new test categories

### Test Naming Conventions

- **Unit Tests**: `test_function_name_scenario`
- **Integration Tests**: `test_cli_feature_scenario`
- **Property Tests**: `property_description`
- **Performance Tests**: `test_performance_scenario`

### Test Organization

- **Group related tests** in modules
- **Use descriptive test names** that explain the scenario
- **Include both positive and negative test cases**
- **Test edge cases and error conditions**
- **Validate both success and failure paths**
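
The layout below shows how these guidelines and the naming conventions might look together in one module. The function `parse_duration_ms` is a hypothetical helper invented for this example; only the structure of the tests is the point.

```rust
/// Hypothetical helper used only to illustrate test layout:
/// extracts the milliseconds from a message like `duration: 12.345 ms`.
pub fn parse_duration_ms(message: &str) -> Result<f64, String> {
    let rest = message
        .strip_prefix("duration: ")
        .ok_or_else(|| format!("missing `duration: ` prefix in {:?}", message))?;
    let number = rest
        .split_whitespace()
        .next()
        .ok_or_else(|| "missing duration value".to_string())?;
    number
        .parse::<f64>()
        .map_err(|e| format!("invalid duration {:?}: {}", number, e))
}

// Related tests are grouped in one module named after the unit under test.
#[cfg(test)]
mod parse_duration_ms_tests {
    use super::*;

    // Positive case, named test_<function>_<scenario>.
    #[test]
    fn test_parse_duration_ms_valid_message() {
        assert_eq!(parse_duration_ms("duration: 12.345 ms"), Ok(12.345));
    }

    // Edge case: a statement follows the duration on the same line.
    #[test]
    fn test_parse_duration_ms_with_trailing_statement() {
        let msg = "duration: 0.5 ms  statement: SELECT 1";
        assert_eq!(parse_duration_ms(msg), Ok(0.5));
    }

    // Negative cases: the failure path is asserted as well as the success path.
    #[test]
    fn test_parse_duration_ms_missing_prefix() {
        assert!(parse_duration_ms("statement: SELECT 1").is_err());
    }

    #[test]
    fn test_parse_duration_ms_non_numeric_value() {
        assert!(parse_duration_ms("duration: fast ms").is_err());
    }
}
```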

## Troubleshooting

### Common Test Failures

1. **File Not Found**: Check test data generation and paths
2. **Permission Denied**: Ensure test has write permissions
3. **Timeout**: Increase timeout for performance tests
4. **Assertion Failures**: Check expected vs actual values
5. **Docker Issues**: Ensure Docker is running for Docker tests

### Getting Help

- Check test output with `--nocapture`
- Use `RUST_LOG=debug` for detailed logging
- Review test data in `tests/test_data/`
- Check CI logs for environment-specific issues
- Run tests individually to isolate problems

## Test Metrics

The test suite aims for:
- **Code Coverage**: > 90%
- **Test Execution Time**: < 30 seconds for full suite
- **Performance Regression**: No degradation in benchmark tests
- **Memory Usage**: Stable memory consumption patterns
- **Error Coverage**: All error paths tested