timberjack 0.1.0-beta.1

A CLI tool that fells log files with speed and insight
Documentation
# Timber Performance Optimizations Guide


## Overview


Timber is designed to be fast and efficient across various log analysis tasks. This document details the optimizations implemented and provides guidance on how to get the best performance for your use case.

## Key Performance Features


### SIMD Acceleration


Timber uses Single Instruction, Multiple Data (SIMD) CPU instructions to accelerate pattern matching and string processing operations.

**Benefits:**
- 2-5x faster pattern searching compared to traditional methods
- Automatically used when hardware supports it
- Particularly effective for simple text patterns

**Technical Details:**
- Uses CPU vector instructions (SSE/AVX on x86, NEON on ARM)
- Processes multiple bytes in parallel
- Falls back to standard methods when SIMD is unavailable

### Memory-Mapped File Processing


Timber uses memory mapping to efficiently process log files of any size without loading the entire file into memory.

**Benefits:**
- Efficient handling of very large files
- Lower memory usage
- Better cache usage by leveraging the operating system's page cache

**Technical Details:**
- Uses the operating system's virtual memory system to map file contents
- Processes files in chunks to maintain efficiency
- Handles partial lines across chunk boundaries correctly

### Parallel Processing


For large files, Timber automatically uses multiple CPU cores to accelerate processing.

**Benefits:**
- Significant speedup on multi-core systems
- Automatically scales with available CPU cores
- Can be forced on or off with command-line flags

**Technical Details:**
- Automatically enabled for files larger than 10MB
- Divides files into chunks at line boundaries
- Uses Rayon library for parallel processing
- Merges results from all threads efficiently

### Optimized Memory Usage


Timber implements several techniques to minimize memory usage:

**Benefits:**
- Reduced memory footprint
- Better cache locality
- More efficient processing of repeated log lines

**Technical Details:**
- String deduplication for repeated log lines
- Fast hash maps for lookup operations
- Limited storage of unique lines (capped at 10,000 entries by default)

### Count Mode


The `--count` flag provides a highly optimized mode for simply counting matching logs.

**Benefits:**
- 2-10x faster than full processing
- Minimal memory usage
- Competitive with specialized tools like grep and ripgrep

**Technical Details:**
- Skips result collection and formatting
- Uses optimized chunk processing
- Applies filters directly on the input stream

## Benchmark Results


Results from benchmarking against common tools (time in seconds, lower is better):

| Operation | 10K lines | 100K lines | 1M lines |
|-----------|-----------|------------|----------|
| timber --chop-count | 0.164s | 0.181s | 0.401s |
| grep | 0.166s | 0.181s | 0.296s |
| ripgrep | 0.198s | 0.183s | 0.236s |
| timber --level-count | 0.167s | 0.199s | 0.487s |
| timber --chop | 0.169s | 0.239s | 0.640s |
| timber --stats | 0.258s | 0.444s | 2.735s |

## Performance Tips


### General Tips


1. **Use the `--count` flag** when you only need counts
2. **Limit pattern complexity** - simple patterns are faster than complex regex
3. **Be specific with filters** - combining `--chop` and `--level` can be faster than broad searches

### For Large Files


1. **Let Timber auto-detect** whether to use parallel mode
2. **Consider using `--level` filter first** to reduce the dataset before applying patterns
3. **Skip `--stats` if not needed** - statistical analysis is more resource-intensive

### For Maximum Speed


1. **Use `--count` with a specific pattern** for fastest performance
2. **Avoid `--show-unique` on large files** as it requires more memory
3. **Prefer simple literal patterns** over complex regex when possible

### For Minimal Memory Usage


1. **Use `--sequential` for lower memory usage** at the cost of some speed
2. **Avoid collecting statistics** if not needed
3. **Apply specific filters** to reduce the number of matching lines

## Technical Implementation Details


### Pattern Matching


Timber uses a tiered approach to pattern matching:

1. **SIMD-accelerated literal matching** for simple patterns (fastest)
2. **Standard literal matching** as fallback
3. **Regex matching** for complex patterns (most flexible)

The pattern matcher selection is automatic based on pattern complexity and available hardware.

### File Processing


File processing follows these steps:

1. **Memory-map the file** for efficient access
2. **Divide into chunks** based on processing mode
3. **Process each chunk** with appropriate filters
4. **Merge results** from all chunks

For parallel processing, chunks are processed independently and results are merged afterward.

## Conclusion


Timber offers a balance of speed, efficiency, and rich analysis capabilities. For most operations, it's competitive with specialized tools while providing much more powerful analysis options.

The optimizations described in this document make Timber suitable for a wide range of log analysis tasks, from quick searches on small logs to comprehensive analysis of large log files.