alog 0.6.4

Anonymize 'Combined Log Format' data
Documentation
# alog

[![Crates.io](https://img.shields.io/crates/v/alog.svg)](https://crates.io/crates/alog)
[![CI](https://github.com/thyrc/alog/workflows/Rust/badge.svg)](https://github.com/thyrc/alog/actions?query=workflow%3ARust)
[![Documentation](https://docs.rs/alog/badge.svg)](https://docs.rs/alog)
[![GitHub license](https://img.shields.io/github/license/thyrc/alog.svg)](https://github.com/thyrc/alog/blob/master/LICENSE)

`alog` is a simple log file anonymizer.

## About

In fact by default `alog` just replaces the first *word* on every line of any input stream
with a customizable string.

With version 0.6

* you can (at a substantial cost of CPU cycles) replace the `$remote_user` with '-' as well and
* by default any leading Spaces or Tabs will be removed from every line before replacing any `$remote_addr`.

So "log file anonymizer" might be a bit of an overstatement, but `alog` can be used to (very
efficiently) replace the `$remote_addr` part in many access log formats, e.g. Nginx' default
combined log format:

```text
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
```

By default any parseable `$remote_addr` is replaced by it's *localhost* representation,

* any valid IPv4 address is replaced by *127.0.0.1*,
* any valid IPv6 address is replaced by *::1* and
* any String (what might be a domain name) with *localhost*.

Lines without a `$remote_addr` part will remain unchanged (but can be skipped).

## Building alog

With version 0.3 `[features]` where added, so that the library crate won't pull unneeded
dependencies anymore.

### Commandline Tool

To build the `alog` commandline tool you now have to expicitly add `--features`.

```shell
cargo build --features alog-cli
```

or

```shell
cargo build --all-features
```

## Usage

### Commandline tool

Run cli-tool with `--help`.

```shell
./target/release/alog --help
```

### Library

Calling `run()`

```rust
extern crate alog;

fn main() {
    let mut io_conf = alog::IOConfig::default();
    let mut conf = alog::Config::default();

    io_conf.push_input("/tmp/test.log");
    conf.set_ipv4_value("0.0.0.0");

    alog::run(&conf, &io_conf);
}
```

or `run_raw()`

```rust
extern crate alog;

use std::io::Cursor;

fn main() {
    let mut buffer = vec![];

    alog::run_raw(
        &alog::Config {
            ipv4: "XXX",
            ..Default::default()
        },
        Cursor::new(b"8.8.8.8 test line"),
        &mut buffer,
    );

    assert_eq!(buffer, b"XXX test line");
}
```

## About `Config::authuser`

With version 0.6 `alog` can be used to replace the `$remote_user` field with '-', but this
feature comes with a couple of peculiarities.

This feature should work fine with standard Common / Combined Log formatted files, but...

* There will be a significant hit on performance (synthetic benchmarking suggests ~625MB/s
  instead of ~1100MB/s on my machine, but still better than Perl's ~115MB/s ;)
* Used with `Config::trim` set to `false` and malformatted files the performance hit will be
  even worse and removal of the `$remote_user` field will fail altogether if no `$time_local`
  field is found.
* The `$time_local` field is expected to start with '[' followed by a decimal number. E.g.:
  "[10/Oct/2000:13:55:36 -0700]"
* There is an optimization in place to reduce the performance hit with real-life log files,
  but this leads to `$remote_user` fields *starting* with "- [" _not_ being replaced! So in
  
  `"8.8.8.8 - - [frank] [10/Oct/2000:13:55:36 -0700] GET /apache_pb.gif HTTP/1.0 200 2326"`

  "frank" will still be "frank". This optimization can be disabled.

## Project status

`alog` started as a replacement for a <10 line Perl script running on an old backup host.
So nothing shiny.. but it helped me learning some Rust (and crates.io) basics.

With version 0.6 `alog` is feature complete. It doesn't do much, but it does it quite well.
At some point I might re-use this crate and try harder to actually anonymize data. But for
now, this is it.

I will still fix bugs when (and if) I find them, so `alog` is now passively-maintained.