Crate alog

alog is a simple log file anonymizer.

About

In fact, by default alog just replaces the first word of every line in any input stream with a customizable string.

So “log file anonymizer” might be a bit of an overstatement, but alog can be used to (very efficiently) replace the $remote_addr part in many access log formats, e.g. Nginx’s default combined log format:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

By default, the $remote_addr is replaced by its localhost representation:

  • any valid IPv4 address is replaced by 127.0.0.1,
  • any valid IPv6 address is replaced by ::1, and
  • any other string (e.g. a domain name) is replaced by localhost.
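The three rules above can be sketched in a few lines of Rust. This is a minimal illustration, not alog's implementation: the helper names localhost_repr and anonymize_line are hypothetical, and the first word is assumed to end at the first ASCII whitespace character.

```rust
use std::net::IpAddr;

/// Hypothetical helper (not alog's API): map a "first word" to its
/// localhost representation, following the three rules listed above.
fn localhost_repr(word: &str) -> &'static str {
    match word.parse::<IpAddr>() {
        Ok(IpAddr::V4(_)) => "127.0.0.1",
        Ok(IpAddr::V6(_)) => "::1",
        // Anything that does not parse as an IP address, e.g. a host name.
        Err(_) => "localhost",
    }
}

/// Hypothetical helper: replace the first word of `line`, keeping the rest.
fn anonymize_line(line: &str) -> String {
    if line.is_empty() {
        // No first word: leave the line unchanged.
        return String::new();
    }
    // Assumption: the first word ends at the first ASCII whitespace character.
    let end = line
        .find(|c: char| c.is_ascii_whitespace())
        .unwrap_or(line.len());
    let (first, rest) = line.split_at(end);
    format!("{}{}", localhost_repr(first), rest)
}

fn main() {
    let line = r#"203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0""#;
    println!("{}", anonymize_line(line));
}
```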

Lines without a ‘first word’ will remain unchanged (but can be skipped with Config::skip set to true).
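A minimal sketch of how a skip flag could treat such lines, assuming (for illustration only) that a line has no first word exactly when it is empty. The function process and its fixed "localhost" replacement are hypothetical and not part of alog's API.

```rust
/// Hypothetical sketch of the `skip` behaviour: lines without a first word
/// (assumed here to be empty lines) are either copied unchanged or dropped.
fn process(lines: &[&str], skip: bool) -> Vec<String> {
    let mut out = Vec::new();
    for &line in lines {
        if line.is_empty() {
            if !skip {
                // Default: keep the line as it is.
                out.push(String::new());
            }
            continue;
        }
        // Simplified: replace the first word with a fixed string.
        let rest = line
            .find(|c: char| c.is_ascii_whitespace())
            .map_or("", |i| &line[i..]);
        out.push(format!("localhost{}", rest));
    }
    out
}

fn main() {
    let input = ["example.com - - [x]", "", "203.0.113.7 - - [y]"];
    println!("{:?}", process(&input, false));
    println!("{:?}", process(&input, true));
}
```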

By default, leading whitespace is removed from each line before any $remote_addr is replaced: since version 0.6 this applies to spaces and tabs (b'\t'), and since version 0.7 to all ASCII whitespace characters. To restore the previous behaviour, set Config::trim to false.
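The difference between the 0.6 and 0.7 trimming rules can be shown with the standard library alone. The helper names trim_v06 and trim_v07 are hypothetical; the sketch assumes alog's notion of ASCII whitespace matches Rust's char::is_ascii_whitespace, which covers space, tab, line feed, form feed and carriage return (but not vertical tab).

```rust
/// Hypothetical: 0.6-style trimming, removing leading spaces and tabs only.
fn trim_v06(line: &str) -> &str {
    line.trim_start_matches(|c: char| c == ' ' || c == '\t')
}

/// Hypothetical: 0.7-style trimming, removing all leading ASCII whitespace.
fn trim_v07(line: &str) -> &str {
    line.trim_start_matches(|c: char| c.is_ascii_whitespace())
}

fn main() {
    // A line starting with a form feed, a space and a tab.
    let line = "\x0c \t203.0.113.7 - -";
    println!("{:?}", trim_v06(line)); // the form feed is not removed
    println!("{:?}", trim_v07(line));
}
```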

Since version 0.6, the $remote_user can be replaced with "-" as well by setting Config::authuser to true (defaults to false).
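A sketch of the authuser replacement for a combined-format line, assuming the fields are separated by single spaces and that $remote_user itself contains no spaces. The helper replace_remote_user is hypothetical; alog's actual parsing may differ.

```rust
/// Hypothetical sketch: replace the `$remote_user` field of a combined-format
/// line ("$remote_addr - $remote_user [...] ...") with "-".
fn replace_remote_user(line: &str) -> String {
    // Split into at most 4 parts: addr, "-", user, and the untouched rest.
    let mut parts = line.splitn(4, ' ');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some(addr), Some(sep), Some(_user), Some(rest)) => {
            format!("{} {} - {}", addr, sep, rest)
        }
        // Not enough fields: leave the line unchanged.
        _ => line.to_string(),
    }
}

fn main() {
    let line = r#"127.0.0.1 - alice [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1""#;
    println!("{}", replace_remote_user(line));
}
```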

With Config::trim set to false, the first word of a line that starts with an ASCII whitespace character can be empty, i.e. the zero-width anchor ^ at the start of the line.
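To illustrate the zero-width case, the hypothetical helper below splits a line at its first ASCII whitespace character without trimming first. For a line that starts with whitespace, the first word is the empty string at position 0.

```rust
/// Hypothetical helper: split `line` at its first ASCII whitespace character.
/// Without trimming, a line starting with whitespace yields an empty
/// (zero-width) first word, i.e. the match sits at the `^` anchor.
fn split_first_word(line: &str) -> (&str, &str) {
    let end = line
        .find(|c: char| c.is_ascii_whitespace())
        .unwrap_or(line.len());
    line.split_at(end)
}

fn main() {
    println!("{:?}", split_first_word("203.0.113.7 - -"));
    println!("{:?}", split_first_word(" 203.0.113.7 - -"));
}
```

Under this sketch, replacing an empty first word would insert the replacement string in front of the leading whitespace and leave the original $remote_addr in place, which is worth keeping in mind before disabling trimming.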

Version 0.9 added the Config::thorough option. If it is set to true, every occurrence of the $remote_addr in the remainder of each line is replaced as well.
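A minimal sketch of thorough mode, assuming it replaces every literal occurrence of the original first word in the rest of the line. The function anonymize_thorough is hypothetical. A plain substring replacement like this would also rewrite 10.0.0.1 inside 110.0.0.12, so alog may well use stricter matching.

```rust
use std::net::IpAddr;

/// Hypothetical sketch of `thorough` mode: replace the first word and, if
/// `thorough` is set, every literal occurrence of it in the rest of the line.
fn anonymize_thorough(line: &str, thorough: bool) -> String {
    let end = line
        .find(|c: char| c.is_ascii_whitespace())
        .unwrap_or(line.len());
    let (first, rest) = line.split_at(end);
    if first.is_empty() {
        // No first word: leave the line unchanged.
        return line.to_string();
    }
    // Same localhost mapping as in the list of default replacements.
    let repl = match first.parse::<IpAddr>() {
        Ok(IpAddr::V4(_)) => "127.0.0.1",
        Ok(IpAddr::V6(_)) => "::1",
        Err(_) => "localhost",
    };
    // Naive substring replacement (see the caveat above).
    let rest = if thorough {
        rest.replace(first, repl)
    } else {
        rest.to_string()
    };
    format!("{}{}", repl, rest)
}

fn main() {
    let line = r#"203.0.113.7 - - [x] "GET /?ip=203.0.113.7 HTTP/1.1""#;
    println!("{}", anonymize_thorough(line, false));
    println!("{}", anonymize_thorough(line, true));
}
```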

Personal data in server logs

The default configurations of popular web servers, including Apache HTTP Server and Nginx, collect and store at least two of the following three types of logs:

  1. access logs
  2. error logs (including processing-language logs like PHP)
  3. security audit logs

All of these logs contain personal information by default. IP addresses are specifically defined as personal data by the GDPR. The logs can also contain usernames if your web service uses them as part of its URL structure, and even the referrer information that’s logged by default can contain personal information (or other sensitive data).

So keep in mind that removing only the IP address ($remote_addr) or the $remote_user part might not be enough to fully anonymize a given log file.

Structs

Config
Collection of replacement strings / config flags
IOConfig
INPUT / OUTPUT config
IOError

Functions

run
Creates a reader (defaults to std::io::Stdin) and a writer (defaults to std::io::Stdout) from alog::IOConfig, then uses both, together with alog::Config, to replace the first word of every line read from the reader with the replacement strings stored in alog::Config.
run_raw
Like alog::run, but lets you pass your own reader and writer. Replacement strings and config flags are still read from alog::Config.
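Being able to plug in any reader and writer makes it easy to process in-memory data, for example in tests. The sketch below shows that general pattern with generic BufRead and Write parameters; anonymize_stream is a hypothetical name, its signature is not alog's, and it uses a single fixed replacement string for brevity.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical `run_raw`-style function (NOT alog's actual signature):
/// generic over any buffered reader and any writer.
fn anonymize_stream<R: BufRead, W: Write>(reader: R, mut writer: W, repl: &str) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        if line.is_empty() {
            // No first word: copy the line unchanged.
            writeln!(writer, "{}", line)?;
            continue;
        }
        // Replace everything up to the first ASCII whitespace character.
        let end = line
            .find(|c: char| c.is_ascii_whitespace())
            .unwrap_or(line.len());
        writeln!(writer, "{}{}", repl, &line[end..])?;
    }
    writer.flush()
}

fn main() -> io::Result<()> {
    // An in-memory reader and writer instead of stdin/stdout.
    let input = io::Cursor::new("203.0.113.7 - - [x]\n\n2001:db8::1 - - [y]\n");
    let mut output = Vec::new();
    anonymize_stream(input, &mut output, "anon")?;
    print!("{}", String::from_utf8_lossy(&output));
    Ok(())
}
```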