fakelogs 0.1.10-75501d4

# Fakelogs

[Fakelogs](https://gitlab.com/ufoot/fakelogs) is a random log generator.
It can be used for load testing of log parsers.

It is written in [Rust](https://www.rust-lang.org/)
and is mostly a toy project to ramp up on the language.
It might however be useful. Use at your own risk.

![Fakelogs icon](https://gitlab.com/ufoot/fakelogs/raw/master/fakelogs.png)

# Status

[![Build Status](https://gitlab.com/ufoot/fakelogs/badges/master/pipeline.svg)](https://gitlab.com/ufoot/fakelogs/pipelines)

Current version is 0.1.10.

# Install

There is no install target yet. Copy the `fakelogs` binary into a directory in your `$PATH` if you wish, that's all.

A few commands which may prove useful:

```sh
cargo build             # build debug binary in ./target/debug/
cargo build --release   # build release binary in ./target/release/
cargo test              # launch tests
rustfmt src/*.rs        # format code
./docker-build.sh       # build Docker image with version tag
./bump-version.sh       # bump minor version number
```

# Usage

Simply launch:

```
cargo run
```

Or just run the binary directly:

```
./target/debug/fakelogs
./target/release/fakelogs
```

Alternatively, using Docker:

```
docker run ufoot/fakelogs
```

To pass options:

```
cargo run -- --csv -100
```

By default, the generated lines follow the [Apache Common Log Format](https://httpd.apache.org/docs/1.3/logs.html#common), so they look like:

```
127.0.0.1 - james [09/May/2018:16:00:39 +0000] "GET /report HTTP/1.0" 200 123
127.0.0.1 - jill [09/May/2018:16:00:41 +0000] "GET /api/user HTTP/1.0" 200 234
127.0.0.1 - frank [09/May/2018:16:00:42 +0000] "POST /api/user HTTP/1.0" 200 34
127.0.0.1 - mary [09/May/2018:16:00:42 +0000] "POST /api/user HTTP/1.0" 503 12
```
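
Since fakelogs is meant to feed log parsers, here is a minimal sketch of a parser for the default format, using only the Rust standard library. The `LogEntry` struct and the `parse_common_log` function are hypothetical names made up for this example and are not part of fakelogs. The sketch assumes the response size is always numeric, as in the sample above.

```rust
/// One parsed Common Log Format entry (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
struct LogEntry {
    host: String,
    user: String,
    timestamp: String,
    request: String,
    status: u16,
    size: u64,
}

/// Parse one line of the default output.
/// Returns None for anything that does not match the expected layout.
fn parse_common_log(line: &str) -> Option<LogEntry> {
    // Host, identity and user are the three leading space-separated fields.
    let (host, rest) = line.split_once(' ')?;
    let (_identity, rest) = rest.split_once(' ')?;
    let (user, rest) = rest.split_once(' ')?;
    // The timestamp is enclosed in square brackets.
    let rest = rest.strip_prefix('[')?;
    let (timestamp, rest) = rest.split_once("] \"")?;
    // The request is enclosed in double quotes.
    let (request, rest) = rest.split_once("\" ")?;
    // Status code and response size close the line.
    let (status, size) = rest.trim_end().split_once(' ')?;
    Some(LogEntry {
        host: host.to_string(),
        user: user.to_string(),
        timestamp: timestamp.to_string(),
        request: request.to_string(),
        status: status.parse().ok()?,
        size: size.parse().ok()?,
    })
}

fn main() {
    let line = r#"127.0.0.1 - james [09/May/2018:16:00:39 +0000] "GET /report HTTP/1.0" 200 123"#;
    match parse_common_log(line) {
        Some(entry) => println!("{:?}", entry),
        None => println!("unparsable line"),
    }
}
```

Returning `Option` rather than panicking lets a parser under test count lines it cannot parse instead of stopping at the first one.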

With the `-c` or `--csv` option, for example `fakelogs -c`, you get an alternate custom CSV format:

```
"10.0.0.4","-","apache",1549573860,"GET /api/user HTTP/1.0",200,1234
```
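
A CSV line like the one above can be split with a small hand-written routine. The sketch below is only an illustration: `split_csv_line` is a hypothetical helper, not something fakelogs provides, and reading the fourth field as a Unix timestamp in seconds is an assumption based on the sample value.

```rust
/// Split one CSV record into fields, honoring double-quoted fields.
/// A doubled quote ("") inside a quoted field is read as a literal quote.
fn split_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '"' if in_quotes && chars.peek() == Some(&'"') => {
                // Escaped quote inside a quoted field.
                current.push('"');
                chars.next();
            }
            // Opening or closing quote: toggle the state, keep no character.
            '"' => in_quotes = !in_quotes,
            // An unquoted comma ends the current field.
            ',' if !in_quotes => fields.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    fields.push(current);
    fields
}

fn main() {
    let line = r#""10.0.0.4","-","apache",1549573860,"GET /api/user HTTP/1.0",200,1234"#;
    let fields = split_csv_line(line);
    // Assumption: field 3 is a Unix timestamp in seconds, field 5 the HTTP status.
    let timestamp: u64 = fields[3].parse().expect("numeric timestamp");
    let status: u16 = fields[5].parse().expect("numeric status");
    println!("{} fields, timestamp={}, status={}", fields.len(), timestamp, status);
}
```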

If you pass an integer after a dash, it sets the average number of lines per second. The default is 1000 and the maximum is 1000000.
For example, to change the output to 10000 lines per second:

```
fakelogs -10000
```

Other standard options include:

* `-h`, `--help`: display a short help.
* `-v`, `--version`: display version.
* `--no-high-card`: disable high cardinality; the random 4-letter sections are replaced by `xxxx`.
* `--no-time-skew`: disable time skewing; logs look, on average, as if they were generated just now rather than 30 minutes ago.
* `--no-time-jitter`: disable time jittering; all logs have strictly increasing timestamps.
* `--no-header`: skip the header line.
* `--no-junk`: do not emit random junk lines.
* `--no-burst`: disable random burst behavior, so output is produced at a constant rate.

# Logs content

The logs may look random, but they follow a few patterns:

* IPs are chosen in a constant, finite list
* users are chosen in a constant, finite list
* HTTP codes are distributed with:
  * 50% of 2XXs
  * 25% of 3XXs
  * 20% of 4XXs
  * 5% of 5XXs
* request methods are distributed with:
  * 60% of GETs
  * 20% of POSTs
  * 20% of HEADs
* the URLs are of the form `/section/XXXX-file.ext` or `/XXXX/file.txt`, where `XXXX` is totally random and `section` is one of:
  * 50% of `yolo` (eg: `/yolo/wE5d-index.html`)
  * 15% of `foo/bar`
  * 15% of `bar/foo`
  * 15% of "no section" (so URL of the form `/w3QL/secret.txt`)
  * 5% of `pizzapino`
* size is uniformly distributed between 100 bytes and 19.9k (average is 10k).
* generally, timestamps match the generation time minus 30 minutes, so logs appear, on average, to be from half an hour ago.
* but 10% of the time, the timestamp is shifted into the past or the future by up to 2 minutes, with an average of 1 minute. This means timestamps are not monotonically increasing and lines are not strictly ordered by time.
* every 5 seconds, the rate may change: it is either just one line per second (slow output) or 2500 lines per second (fast output). The ratio is:
  * 40% of fast output
  * 60% of slow output
  * on average (including the slow output), the throughput should be slightly above 1000 lines per second.
* when the default output of 1000 lines per second is changed, all numbers above are scaled, but the slow output is always one line per second.
* about one line in 1000 is an invalid line containing `Your attention please, this is a hack!`.
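
The throughput and size figures in the list above can be checked with a little arithmetic. The sketch below is not taken from the fakelogs source: the constant and function names are hypothetical, and it assumes that the fast rate scales linearly with the requested rate (2.5 times the requested value) and that `k` means 1000 bytes.

```rust
/// Share of 5-second windows spent in fast mode, from the list above.
const FAST_SHARE: f64 = 0.4;
/// Fast rate relative to the requested rate (2500 for the default 1000).
/// Assumption: this factor stays the same when the rate is changed.
const FAST_FACTOR: f64 = 2.5;
/// Slow output is always one line per second.
const SLOW_RATE: f64 = 1.0;

/// Expected long-run throughput, in lines per second, for a requested rate.
fn expected_throughput(requested: f64) -> f64 {
    FAST_SHARE * FAST_FACTOR * requested + (1.0 - FAST_SHARE) * SLOW_RATE
}

/// Mean of a uniform distribution over [min, max].
fn uniform_mean(min: f64, max: f64) -> f64 {
    (min + max) / 2.0
}

fn main() {
    println!("default rate: {:.1} lines/s", expected_throughput(1000.0));
    println!("with -10000:  {:.1} lines/s", expected_throughput(10000.0));
    // Assumption: 19.9k is 19900 bytes.
    println!("mean size:    {:.0} bytes", uniform_mean(100.0, 19_900.0));
}
```

With the default rate this gives 0.4 × 2500 + 0.6 × 1 = 1000.6 lines per second, which matches "slightly above 1000".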

# License

Fakelogs is licensed under the [MIT](https://gitlab.com/ufoot/fakelogs/blob/master/LICENSE) license.