alog is a simple log file anonymizer.
§About
In fact, by default alog just replaces the first word on every line of any input stream with a customizable string.
Since version 0.6 you can also replace the $remote_user with "-" by setting Config::authuser to true, at a substantial cost in CPU cycles. Config::authuser defaults to false.
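To illustrate what replacing the $remote_user amounts to, here is a minimal sketch. It is not alog's implementation. It assumes a line that starts with `$remote_addr - $remote_user` separated by single spaces, as in Nginx's combined log format, and the function name `blank_remote_user` is hypothetical.

```rust
/// Replaces the third space-separated field of `line` with "-".
/// Assumption: the line starts with `$remote_addr - $remote_user `.
/// Lines with fewer than four fields are returned unchanged.
fn blank_remote_user(line: &str) -> String {
    // Split into at most four parts: addr, ident, user, remainder.
    let mut fields = line.splitn(4, ' ');
    match (fields.next(), fields.next(), fields.next(), fields.next()) {
        (Some(addr), Some(ident), Some(_user), Some(rest)) => {
            format!("{} {} - {}", addr, ident, rest)
        }
        _ => line.to_string(),
    }
}
```

The extra field scan on every line hints at why this option costs more than replacing only the first word.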
With Config::trim set to false, the first word can be the (zero-width) anchor ^ or a single b' ' (space), separated by a b' ' from the remainder of the line. This was the default behaviour prior to version 0.6.
So “log file anonymizer” might be a bit of an overstatement, but alog can be used to (very efficiently) replace the $remote_addr part in many access log formats, e.g. Nginx's default combined log format:
log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
By default any parseable $remote_addr is replaced by its localhost representation:
- any valid IPv4 address is replaced by 127.0.0.1,
- any valid IPv6 address is replaced by ::1, and
- any String (which might be a domain name) is replaced by localhost.
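The three rules above can be sketched with the standard library's address parsers. This is an illustration, not alog's actual code. The function names are hypothetical, and the sketch assumes that a line with no space, or with an empty first word, has no first word to replace.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Picks the localhost representation for a single first word.
fn localhost_for(word: &str) -> &'static str {
    if word.parse::<Ipv4Addr>().is_ok() {
        "127.0.0.1"
    } else if word.parse::<Ipv6Addr>().is_ok() {
        "::1"
    } else {
        // Anything else, e.g. a domain name.
        "localhost"
    }
}

/// Replaces the first space-terminated word of `line`.
/// Assumption: lines without such a word are returned unchanged.
fn replace_first_word(line: &str) -> String {
    match line.split_once(' ') {
        Some((word, rest)) if !word.is_empty() => {
            format!("{} {}", localhost_for(word), rest)
        }
        _ => line.to_string(),
    }
}
```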
Lines without a ‘first word’ will remain unchanged (but can be skipped with Config::skip set to true).
Starting with version 0.6, all Space and Tabulator (b'\t') characters, and from version 0.7 on all ASCII whitespace characters, are removed from the beginning of each line before any $remote_addr is replaced. This is the default behaviour.
To switch back to the previous behaviour, just set Config::trim to false.
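As a rough sketch of the trimming step, the hypothetical helper below strips leading ASCII whitespace from a line given as bytes. It relies on Rust's `u8::is_ascii_whitespace` (space, tab, line feed, form feed, carriage return). Whether alog uses exactly this set of characters is an assumption.

```rust
/// Returns `line` without its leading ASCII whitespace bytes.
fn trim_start_ascii(line: &[u8]) -> &[u8] {
    // Index of the first non-whitespace byte, or the line length
    // if the line consists of whitespace only.
    let start = line
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(line.len());
    &line[start..]
}
```

After trimming, a line such as `  8.8.8.8 - -` starts with the address again, so the first word is no longer empty.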
§Personal data in server logs
The default configurations of popular web servers, including Apache Web Server and Nginx, collect and store at least two of the following three types of logs:
- access logs
- error logs (including processing-language logs like PHP)
- security audit logs
All of these logs contain personal information by default. IP addresses are specifically defined as personal data by the GDPR. The logs can also contain usernames if your web service uses them as part of its URL structure, and even the referral information that’s logged by default can contain personal information (or other sensitive data).
So keep in mind, just removing the IP / $remote_addr or $remote_user part might not be enough to fully anonymize any given log file.
Structs§
- Config: Collection of replacement strings / config flags
- IOConfig: INPUT / OUTPUT config
Functions§
- Creates a reader (defaults to std::io::Stdin) and writer (defaults to std::io::Stdout) from alog::IOConfig and uses both along with alog::Config to actually replace any first word in reader with strings stored in alog::Config.
- Like alog::run but will let you pass your own reader and writer. Replacement strings and config flags will still be read from alog::Config.
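The reader/writer design described above can be sketched with generic `BufRead` and `Write` parameters. The function below is a hypothetical stand-in, not alog's API: its name, signature and single `replacement` argument are assumptions made for illustration.

```rust
use std::io::{self, BufRead, Write};

/// Copies `reader` to `writer`, replacing the first space-terminated
/// word of every line with `replacement`.
/// Note: `lines()` requires valid UTF-8 and strips the line ending,
/// so every output line ends with a single '\n'.
fn replace_stream<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    replacement: &str,
) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        match line.split_once(' ') {
            Some((_, rest)) => writeln!(writer, "{} {}", replacement, rest)?,
            // No space: treated here as "no first word", left unchanged.
            None => writeln!(writer, "{}", line)?,
        }
    }
    writer.flush()
}
```

Accepting any reader and writer makes such a function easy to test with in-memory buffers, for example `std::io::Cursor` as input and a `Vec<u8>` as output, instead of stdin and stdout.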