logpile 0.2.0 - Docs.rs

# logpile Architecture

## Project Structure

```
logpile/
├── Cargo.toml                # Project dependencies and metadata
├── src/
│   ├── main.rs              # Entry point
│   ├── lib.rs               # Library exports
│   ├── cli.rs               # Command-line argument parsing (clap)
│   ├── timestamp.rs         # Timestamp parsing and auto-detection
│   ├── bucket.rs            # Time-based bucketing logic
│   ├── reader.rs            # File/stdin reading with gzip support
│   ├── output.rs            # Output formatters (table, CSV, JSON)
│   ├── plot.rs              # Plotting (ASCII and bitmap)
│   └── processor.rs         # Main processing orchestration
├── examples/
│   ├── sample.log           # Example log file for testing
│   └── sample.log.gz        # Gzipped example
├── test_examples.sh         # Test script
├── README.md                # User documentation
├── ARCHITECTURE.md          # This file
└── LICENSE                  # MIT License
```

## Module Overview

### `main.rs`
- Entry point for the binary
- Parses CLI arguments using clap
- Creates and runs the LogProcessor

### `cli.rs`
- Defines the `Args` struct with all CLI options
- Uses clap's derive macro for argument parsing
- Provides validation and helper methods
- Handles the special case of `--no-default-pattern`

### `timestamp.rs`
- `TimestampParser` struct for parsing timestamps from log lines
- Auto-detects common timestamp formats:
  - ISO 8601
  - Common log formats (YYYY-MM-DD HH:MM:SS)
  - Syslog format
  - Apache/Nginx formats
- Supports custom time format strings via `--time-format`
- Uses regex to extract timestamp candidates from log lines
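The candidate-extraction idea can be illustrated without `regex` or `chrono`. The following is a minimal std-only sketch, not logpile's implementation. The helpers `extract_timestamp` and `days_from_civil` are hypothetical. The sketch recognizes only the `YYYY-MM-DD HH:MM:SS` and ISO 8601 `YYYY-MM-DDTHH:MM:SS` shapes, reads them as UTC, and ignores any zone offset. The date-to-days conversion is Howard Hinnant's `days_from_civil` algorithm.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years are counted from March
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // March-based day of year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: returns the first `YYYY-MM-DD HH:MM:SS` or
/// `YYYY-MM-DDTHH:MM:SS` run in `line` as Unix seconds, read as UTC.
fn extract_timestamp(line: &str) -> Option<i64> {
    // Byte offsets that must be ASCII digits in a 19-byte candidate.
    const DIGIT_POS: [usize; 14] = [0, 1, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18];
    let bytes = line.as_bytes();
    if bytes.len() < 19 {
        return None;
    }
    for start in 0..=bytes.len() - 19 {
        let w = &bytes[start..start + 19];
        let shape_ok = DIGIT_POS.iter().all(|&i| w[i].is_ascii_digit())
            && w[4] == b'-'
            && w[7] == b'-'
            && (w[10] == b' ' || w[10] == b'T')
            && w[13] == b':'
            && w[16] == b':';
        if !shape_ok {
            continue;
        }
        // Decimal value of the ASCII digits in w[from..to].
        let num = |from: usize, to: usize| {
            w[from..to].iter().fold(0i64, |acc, &c| acc * 10 + i64::from(c - b'0'))
        };
        let (y, mo, d) = (num(0, 4), num(5, 7), num(8, 10));
        let (h, mi, s) = (num(11, 13), num(14, 16), num(17, 19));
        // Loose range checks only (e.g. February 31 is not rejected).
        if !(1..=12).contains(&mo) || !(1..=31).contains(&d) || h > 23 || mi > 59 || s > 60 {
            continue;
        }
        return Some(days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s);
    }
    None
}
```

With this sketch, `extract_timestamp("2024-01-15 10:30:00 ERROR db timeout")` yields `Some(1705314600)`. A regex-based parser generalizes the same approach: one candidate pattern per known format, tried in turn.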

### `bucket.rs`
- `TimeBucket` struct for time-based aggregation
- Supports fixed bucket sizes (in seconds)
- Supports automatic bucket size selection based on time range
- Uses `BTreeMap` for ordered bucket storage
- Tracks first/last timestamps for time range calculation
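The following is a minimal sketch of fixed-size bucketing over a `BTreeMap`. It assumes timestamps are Unix seconds and bucket keys are bucket start times. The struct layout, field names, and `span` method are hypothetical and are not taken from logpile's actual `TimeBucket`.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch of time-based aggregation; field and method names
/// are illustrative, not logpile's real `TimeBucket` API.
struct TimeBucket {
    bucket_secs: i64,
    counts: BTreeMap<i64, u64>, // bucket start (Unix seconds) -> match count
    first: Option<i64>,
    last: Option<i64>,
}

impl TimeBucket {
    fn new(bucket_secs: i64) -> Self {
        assert!(bucket_secs > 0, "bucket size must be positive");
        TimeBucket { bucket_secs, counts: BTreeMap::new(), first: None, last: None }
    }

    /// Records one match at Unix time `ts`.
    fn add(&mut self, ts: i64) {
        // div_euclid rounds toward negative infinity, so pre-1970 times bucket correctly.
        let start = ts.div_euclid(self.bucket_secs) * self.bucket_secs;
        *self.counts.entry(start).or_insert(0) += 1;
        self.first = Some(self.first.map_or(ts, |f| f.min(ts)));
        self.last = Some(self.last.map_or(ts, |l| l.max(ts)));
    }

    /// Seconds between the earliest and latest recorded match.
    fn span(&self) -> Option<i64> {
        Some(self.last? - self.first?)
    }
}
```

Because the map is keyed by bucket start, iterating `counts` already yields buckets in chronological order, even when lines arrive out of order.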

### `reader.rs`
- `LogReader` enum for different input sources:
  - Plain text files
  - Gzipped files (.gz)
  - Stdin
- Transparent decompression for gzipped files using flate2
- Provides unified iterator interface for all sources
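The following sketch shows how an input-source enum can collapse into one line iterator by boxing each reader as `dyn Read`. The enum name `InputSource`, the `from_path` constructor, and the rule that "no path means stdin" are assumptions for illustration. The sketch uses only the standard library, so the gzip arm returns an error. The comment in that arm shows where `flate2::read::GzDecoder` would wrap the file.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical input-source enum; logpile's real `LogReader` may differ.
enum InputSource {
    Stdin,
    Plain(String),
    Gzip(String),
}

impl InputSource {
    /// Assumed convention: no path means stdin; a ".gz" suffix selects gzip.
    fn from_path(path: Option<&str>) -> Self {
        match path {
            None => InputSource::Stdin,
            Some(p) if p.ends_with(".gz") => InputSource::Gzip(p.to_string()),
            Some(p) => InputSource::Plain(p.to_string()),
        }
    }

    /// Opens the source and returns the same line-iterator type for every kind.
    fn lines(&self) -> io::Result<io::Lines<BufReader<Box<dyn Read>>>> {
        let raw: Box<dyn Read> = match self {
            InputSource::Stdin => Box::new(io::stdin()),
            InputSource::Plain(path) => Box::new(File::open(path)?),
            // With the flate2 crate this arm would wrap the file instead:
            //   Box::new(flate2::read::GzDecoder::new(File::open(path)?))
            InputSource::Gzip(_) => {
                return Err(io::Error::new(io::ErrorKind::Unsupported, "gzip needs flate2"));
            }
        };
        Ok(BufReader::new(raw).lines())
    }
}
```

The processing loop only sees `io::Lines<...>`, so adding another source means adding an enum variant and a match arm. Nothing downstream changes.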

### `output.rs`
- Functions for different output formats:
  - `output_table()`: Human-readable table with borders
  - `output_csv()`: CSV format for data export
  - `output_json()`: JSON format with metadata
- Uses `serde` for JSON serialization
- Uses the `csv` crate for correctly quoted CSV output
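The following std-only sketch shows the general shape of a formatter that writes bucket data to any `io::Write`. Accepting a generic writer is what makes such functions easy to test against an in-memory buffer. The function name `write_csv` and the column names `bucket_start,count` are hypothetical. logpile itself uses the `csv` crate, which also handles quoting. The plain integers here never need quoting.

```rust
use std::io::{self, Write};

/// Hypothetical std-only stand-in for `output_csv()`: one header row, then
/// one `bucket_start,count` row per bucket, in the order given.
fn write_csv<W: Write>(out: &mut W, buckets: &[(i64, u64)]) -> io::Result<()> {
    writeln!(out, "bucket_start,count")?;
    for (start, count) in buckets {
        // Integers contain no commas or quotes, so no escaping is needed.
        writeln!(out, "{start},{count}")?;
    }
    Ok(())
}
```

Passing `&mut io::stdout()` writes to the terminal. Passing a `Vec<u8>` captures the output for tests.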

### `plot.rs`
- `plot_ascii()`: ASCII charts using textplots
  - Uses Braille characters for smooth lines
  - Shows time range and bucket information
- `plot_png()`: Bitmap charts using plotters
  - Despite the function name, generates a PPM image (which can be converted to PNG)
  - Includes line series and data points
  - Labeled axes with timestamps
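PPM output avoids image-encoding dependencies because the format is trivial. The binary variant (P6) is a short ASCII header followed by raw RGB bytes. The following sketch only illustrates that file layout. The function `write_ppm` is hypothetical, and logpile's actual rendering goes through plotters.

```rust
use std::io::{self, Write};

/// Hypothetical writer for a binary PPM (P6) image: an ASCII header
/// ("P6", width and height, max channel value 255), then width * height * 3
/// raw bytes in row-major RGB order.
fn write_ppm<W: Write>(out: &mut W, width: usize, height: usize, rgb: &[u8]) -> io::Result<()> {
    assert_eq!(rgb.len(), width * height * 3, "expected 3 bytes per pixel");
    write!(out, "P6\n{width} {height}\n255\n")?;
    out.write_all(rgb)
}
```

Most image viewers and converters can read such a file directly, which is why it can be turned into a PNG afterwards.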

### `processor.rs`
- `LogProcessor`: Main orchestration logic
- Implements two modes:
  - **Batch mode**: Process the input files once
  - **Follow mode**: Continuously monitor a file for new lines (like `tail -f`)
- Compiles regex patterns
- Iterates through log lines
- Extracts timestamps and matches patterns
- Aggregates matches into time buckets
- Calls appropriate output formatter
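The document does not say how follow mode waits for new data. A common portable approach is polling: keep calling `read_line`, and when it reports EOF (0 bytes), sleep and try again. The following sketch shows that approach under that assumption. The function `follow` and its `max_idle_polls` parameter are hypothetical. The parameter exists only so the sketch terminates, whereas a real follow mode would run until interrupted.

```rust
use std::io::{self, BufRead};
use std::thread;
use std::time::Duration;

/// Hypothetical polling sketch of follow mode. On EOF it sleeps for
/// `poll_interval` and retries, so lines appended later are still seen.
/// Assumes lines are written whole; rotation and truncation are not handled.
fn follow<R: BufRead>(
    mut reader: R,
    poll_interval: Duration,
    max_idle_polls: usize,
    mut on_line: impl FnMut(&str),
) -> io::Result<()> {
    let mut buf = String::new();
    let mut idle = 0;
    while idle < max_idle_polls {
        buf.clear();
        if reader.read_line(&mut buf)? == 0 {
            // EOF for now: wait and poll again.
            idle += 1;
            thread::sleep(poll_interval);
            continue;
        }
        idle = 0;
        on_line(buf.trim_end_matches(&['\n', '\r'][..]));
    }
    Ok(())
}
```

Polling costs a wake-up per interval even when the file is idle. The planned inotify-based implementation would remove that cost on Linux.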

## Data Flow

```
1. CLI Arguments → Args struct (clap parsing)
2. Args → LogProcessor initialization
   - Compile regex patterns
   - Create TimestampParser
   - Initialize TimeBucket
3. LogProcessor → Read input
   - Files or stdin
   - Decompress if .gz
4. For each line:
   - Check regex match
   - Parse timestamp
   - Add to bucket
5. After processing:
   - Get bucket data
   - Format output (table/CSV/JSON/plot)
```
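Steps 3 to 5 can be sketched as a single batch pass. In the following sketch, `str::contains` stands in for the compiled `regex::Regex`, and the `parse_ts` closure stands in for `TimestampParser`. Both substitutions are simplifications, and the function `count_matches` is hypothetical. The sketch also assumes that matching lines without a recognizable timestamp are skipped.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufRead};

/// Hypothetical batch pass: `str::contains` replaces the regex match and
/// `parse_ts` replaces `TimestampParser`. Returns bucket start -> count.
fn count_matches<R: BufRead>(
    input: R,
    pattern: &str,
    bucket_secs: i64,
    parse_ts: impl Fn(&str) -> Option<i64>,
) -> io::Result<BTreeMap<i64, u64>> {
    let mut buckets = BTreeMap::new();
    for line in input.lines() {
        let line = line?;
        // Step 4: test the match first, so only matching lines are timestamp-parsed.
        if !line.contains(pattern) {
            continue;
        }
        // Matching lines with no recognizable timestamp are skipped (an assumption).
        if let Some(ts) = parse_ts(&line) {
            let start = ts.div_euclid(bucket_secs) * bucket_secs;
            *buckets.entry(start).or_insert(0) += 1;
        }
    }
    // Step 5: the ordered map is what an output formatter would consume.
    Ok(buckets)
}
```

Because `input` is any `BufRead`, the same function works on a file, stdin, or a decompressed stream, and it never holds more than one line in memory.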

## Key Design Decisions

### 1. **Modular Architecture**
Each module has a single responsibility, which gives:
- Separation of concerns
- Easy to test
- Clear interfaces

### 2. **Regex-Based Timestamp Extraction**
- Flexible auto-detection
- Handles various formats without user configuration
- Falls back to custom format if provided

### 3. **BTreeMap for Buckets**
- Maintains sorted order by timestamp
- Efficient range queries
- Natural ordering for output
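`BTreeMap::range` is what makes the range queries cheap. The following sketch sums all buckets whose start falls inside a time window. The helper `count_between` is hypothetical, and the keys are assumed to be bucket start times in Unix seconds.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: total matches in buckets whose start lies in [from, to).
/// `BTreeMap::range` panics if `from > to`, so callers must order the bounds.
fn count_between(buckets: &BTreeMap<i64, u64>, from: i64, to: i64) -> u64 {
    buckets.range(from..to).map(|(_, count)| count).sum()
}
```

The lookup costs O(log n) to find the first bucket plus the number of buckets visited. Iterating the whole map yields keys in ascending order, which is the "natural ordering for output" noted above.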

### 4. **Iterator-Based File Reading**
- Memory efficient for large files
- Uniform interface for all input types
- Lazy evaluation

### 5. **Enum for Output Formats**
- Type-safe format selection
- Compile-time validation
- Easy to extend
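The following sketch shows what "compile-time validation" buys. Every `match` on the enum must be exhaustive, so adding a variant turns each unhandled site into a compile error. The enum name `OutputFormat`, its `FromStr` impl, and `describe` are hypothetical. With clap's derive API, a `#[derive(ValueEnum)]` can generate the string parsing instead.

```rust
use std::str::FromStr;

/// Hypothetical output-format enum; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    Table,
    Csv,
    Json,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Case-insensitive parsing of the format name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "table" => Ok(OutputFormat::Table),
            "csv" => Ok(OutputFormat::Csv),
            "json" => Ok(OutputFormat::Json),
            other => Err(format!("unknown output format: {other}")),
        }
    }
}

/// Exhaustive match: a new variant will not compile until it is handled here.
fn describe(format: OutputFormat) -> &'static str {
    match format {
        OutputFormat::Table => "human-readable table",
        OutputFormat::Csv => "CSV for data export",
        OutputFormat::Json => "JSON with metadata",
    }
}
```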

### 6. **Auto Bucket Size**
- Aims for ~30 buckets for good visualization
- Rounds to "nice" intervals (1m, 5m, 15m, 1h, 6h, 1d, etc.)
- Adapts to time range automatically
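The following sketch shows one way to implement the auto-sizing rule. It picks the smallest "nice" size that covers the time range in at most about 30 buckets. Only 1m, 5m, 15m, 1h, 6h, and 1d come from this document. The other entries in the candidate list, the helper `auto_bucket_size`, and the "smallest size at or above range / 30" rule are assumptions.

```rust
/// Candidate bucket sizes in seconds, ascending. The document names 1m, 5m,
/// 15m, 1h, 6h and 1d; the remaining entries are guesses.
const NICE_SIZES: &[i64] = &[
    1, 5, 10, 15, 30, 60, 120, 300, 600, 900, 1_800, 3_600, 7_200, 21_600, 43_200, 86_400,
];

/// Assumed target, from "aims for ~30 buckets".
const TARGET_BUCKETS: i64 = 30;

/// Hypothetical helper: smallest nice size that covers `first..=last`
/// (Unix seconds) in roughly `TARGET_BUCKETS` buckets or fewer.
fn auto_bucket_size(first: i64, last: i64) -> i64 {
    let range = (last - first).max(1);
    let ideal = (range + TARGET_BUCKETS - 1) / TARGET_BUCKETS; // ceil(range / 30)
    NICE_SIZES
        .iter()
        .copied()
        .find(|&s| s >= ideal)
        // Ranges longer than 30 days fall back to 1-day buckets (more than 30 of them).
        .unwrap_or(86_400)
}
```

Under these assumptions, a one-hour range gets 2-minute buckets (30 of them), a one-day range gets 1-hour buckets (24), and a one-week range gets 6-hour buckets (28).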

## Dependencies

### Core
- `clap`: CLI argument parsing with derive macros
- `chrono`: Timestamp parsing and manipulation
- `regex`: Pattern matching
- `anyhow`/`thiserror`: Error handling

### I/O
- `flate2`: Gzip decompression
- `csv`: CSV output formatting
- `serde`/`serde_json`: JSON serialization

### Visualization
- `textplots`: ASCII chart rendering
- `plotters`: Bitmap chart generation (minimal features to avoid system dependencies)

## Performance Considerations

1. **Streaming Processing**: Lines are processed as they are read; the input is never loaded into memory all at once
2. **Compiled Regexes**: Patterns are compiled once at startup
3. **Efficient Bucketing**: O(log n) insertion into the BTreeMap, where n is the number of buckets rather than the number of lines
4. **Lazy Evaluation**: Iterator-based pipeline

## Future Enhancements

### Planned Features
- [ ] Severity grouping (INFO/WARN/ERROR breakdown)
- [ ] Prometheus metrics export
- [ ] Interactive TUI with zoom/pan
- [ ] Multi-threaded file processing
- [ ] Event-driven `tail -f` implementation (inotify on Linux)
- [ ] Histogram distribution analysis
- [ ] Custom aggregation functions (min/max/avg)
- [ ] Support for structured logs (JSON logs)

### Possible Optimizations
- [ ] Parallel processing of multiple files
- [ ] Memoization of timestamp parsing
- [ ] Skip non-matching lines early
- [ ] Compressed output for large result sets