ungai 0.6.2

A unique name generator based on a Markov chain.

# Usage

This tutorial walks you through using the CLI.

>[!note]
> This tutorial assumes you have installed ungai (v0.6.1 or later) on your system.

## Reference

Below is the output of `ungai -h` for reference.

```txt
Usage: ungai [OPTIONS]

Options:
  -s, --smoothing <SMOOTHING>
          Defines how much smoothing to use for the Markov Chain [default: 0]

  -g, --generate
          Whether to generate a name or not

  -c, --count <COUNT>
          How many names to generate [default: 1]

  -w, --write-transitions <WRITE_TRANSITIONS>
          Whether to write transitions to a file for better performance
          in the next run

  -r, --read-transitions <READ_TRANSITIONS>
          Whether to read transitions from a file for better performance

  -f, --train-from-file <TRAIN_FROM_FILE>
          Provide a file from which to train the model

  -t, --temperature <TEMPERATURE>
          Provide temperature scaling for further creativity of the model [default: 1]

      --max <MAX>
          The maximum length a name can have [default: 10]

      --min <MIN>
          The minimum length a name can have [default: 2]

      --rerun <RERUN>
          Number of allowed reruns to generate a single name [default: 10]

  -h, --help
          Print help (see more with '--help')

  -V, --version
          Print version
```

We will be referring to this `help` message throughout the tutorial.


## 1. Basic name generation

For starters, let's generate a simple name.

```bash
ungai -g
```

The `-g` flag is the shorthand for `--generate`.
It tells the program that we want to generate a name.

If we run the CLI without any flags, it exits with code 0 (meaning success) without actually doing anything.

>[!note]
> The first time you run the CLI, it downloads a pre-trained dataset, whether or not you pass any flags.


To generate more than one name, pass the `-c` flag followed by the number of names you want.

e.g.
```bash
ungai -g -c 5
```

The command above generates up to 5 names.

>[!important]
> The CLI is not guaranteed to generate exactly 5 names. It may generate fewer, but it will never generate more than the requested count.
>
> This is because the names that `ungai` generates must follow 3 rules, and any name that breaks them is discarded.
>
> The 3 rules are explained below.


## 2. Naming Rules

There are 3 constraints on names that `ungai` generates.

These are:

1. When more than one name is generated, no name may be an exact repeat of an earlier one.

2. The length of a name must not exceed the value of the `--max` argument. The default value of `--max` is 10.

3. The length of a name must not be less than the value of the `--min` argument. The default value of `--min` is 2.


If a name is generated that violates these rules, then it is discarded.

Sometimes the CLI keeps producing names that violate these rules.
When you generate many names (say 50-100), it could otherwise end up in an infinite loop.

To prevent this, `ungai` is only allowed a set number of reruns to regenerate a name. The default limit is 10 reruns.
You can change this limit by passing a number to the `--rerun` argument.

e.g.

```bash
ungai -g -c 100 --rerun 20
```

>[!note]
> In my testing, I generated 1000 names 20 times, and `ungai` produced all of them with the default `--rerun` value.
>
> So you will rarely need it, but if you generate 100,000 names you might.

>[!note]
> If `ungai` hits the `--rerun` limit, it simply exits without doing anything else.
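
The three rules and the rerun limit can be pictured as a rejection loop. The sketch below is only an illustration, not `ungai`'s actual code: `generate_names` and `propose` are hypothetical names, and it assumes that the rerun budget applies per name and is counted in addition to the first attempt.

```python
def generate_names(propose, count, min_len=2, max_len=10, reruns=10):
    """Collect up to `count` names that pass the three naming rules."""
    accepted = []
    for _ in range(count):
        # Assumption: one initial attempt plus at most `reruns` retries per name.
        for _attempt in range(reruns + 1):
            candidate = propose()
            is_new = candidate not in accepted            # rule 1: no exact repeats
            fits = min_len <= len(candidate) <= max_len   # rules 2 and 3: length bounds
            if is_new and fits:
                accepted.append(candidate)
                break
        else:
            # Rerun budget exhausted: stop early with fewer names than requested.
            break
    return accepted


# Hypothetical candidate stream standing in for the Markov chain.
candidates = iter(["al", "x", "valor", "valor", "thaloriandraven", "nyxen"])
print(generate_names(lambda: next(candidates), count=3))
# prints ['al', 'valor', 'nyxen']
```

Rejected candidates are simply dropped, which is why a run can finish with fewer names than requested but never with more.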

## 3. Configuring The Length of Names

You can configure the length of generated names with the `--min` and `--max` arguments.

e.g.

Let's generate a name that is between 4 and 6 letters long.

```bash
ungai -g --min 4 --max 6
```

This will generate a name with a length of at least 4 and at most 6.

The defaults for `--min` and `--max` are 2 and 10, respectively.

## 4. Smoothing and Temperature

First, let's understand what `smoothing` and `temperature` are.

`smoothing` adds a small value to every transition, which fills gaps in the dataset.
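
To make the idea concrete, here is a minimal sketch of additive smoothing on the counts of characters that follow a given character. It is an assumption about how such smoothing typically works, not `ungai`'s implementation; the function name `smoothed_probabilities` and the sample counts are hypothetical.

```python
def smoothed_probabilities(counts, alphabet, smoothing=0.0):
    """Turn raw next-character counts into probabilities with additive smoothing."""
    # Every possible next character gets `smoothing` added to its count,
    # so transitions never seen in the dataset stop being impossible.
    weights = {ch: counts.get(ch, 0) + smoothing for ch in alphabet}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no observed transitions and no smoothing")
    return {ch: w / total for ch, w in weights.items()}


counts = {"e": 3, "a": 1}      # hypothetical: what followed the letter "r"
alphabet = ["a", "e", "o"]
print(smoothed_probabilities(counts, alphabet, smoothing=0.0))    # "o" has probability 0
print(smoothed_probabilities(counts, alphabet, smoothing=0.001))  # "o" becomes possible
```

With a large dataset most transitions already have counts, so a tiny smoothing value barely changes anything. With a small dataset it opens up combinations the data never contained.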

`temperature`, on the other hand, controls the randomness of the output: values above 1 increase variety and values below 1 reduce it.

>[!note]
> Both affect the creativity of the model.

>[!note]
> `-s` is the shorthand for `--smoothing`
>
> `-t` is the shorthand for `--temperature`


### Usage

Below is an example of adding `smoothing`:
```bash
ungai -g -s 0.001
```

Here is an example of adding `temperature`:
```bash
ungai -g -t 1.0005
```

>[!important]
> You usually don't need to add smoothing to a good and comprehensive dataset.
>
> Smoothing is for small (not so good) datasets.
>
> However, you can add `temperature` to any dataset. Just bear in mind that your chosen `temperature` might not produce a name you want.
>
> So you may need several tries with different `temperature`s.

The following is the output of `--help` for the `--temperature` flag:

```txt
-t, --temperature <TEMPERATURE>
          Provide temperature scaling for further creativity of the model

          This is similar to `smoothing`

          Temperature Scaling works in the following way-
          temperature > 1.0: More Creative/Random Names
          temperature < 1.0: More Predictable/Repetitive Names
          temperature = 1.0: No change

          [default: 1]
```
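
The behaviour described in the help text matches a common way of applying temperature to a probability distribution: raise each probability to the power `1 / temperature` and renormalise. The sketch below assumes that formula. This tutorial does not state the exact formula `ungai` uses, so treat `apply_temperature` as a hypothetical illustration only.

```python
def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution with temperature (assumed formula)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # p ** (1 / T): T > 1 flattens the distribution, T < 1 sharpens it.
    scaled = {ch: p ** (1.0 / temperature) for ch, p in probs.items()}
    total = sum(scaled.values())
    return {ch: s / total for ch, s in scaled.items()}


probs = {"e": 0.75, "a": 0.25}          # hypothetical next-character probabilities
print(apply_temperature(probs, 2.0))    # closer to uniform: more random names
print(apply_temperature(probs, 0.5))    # "e" dominates: more predictable names
print(apply_temperature(probs, 1.0))    # unchanged
```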

Now you know how to use temperatures.


## 5. Reading Names From a File

You can make `ungai` train on your dataset of names.

Just write a `.csv` or `.txt` (or really any UTF-8 encoded file) with the right format.

In the right format, every name is separated by a **`,`**. Newlines are allowed but not required, and they cannot replace commas as separators. Trailing commas are allowed.
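
As a rough picture of this format, the sketch below splits text on commas and ignores surrounding whitespace and empty entries. It is not `ungai`'s parser: `parse_names` is a hypothetical helper, handy for checking a dataset file before training on it.

```python
def parse_names(text):
    """Split comma-separated names, trimming whitespace and skipping empty entries."""
    # Newlines around a name are trimmed away; a trailing comma only
    # produces an empty entry, which is skipped.
    return [part.strip() for part in text.split(",") if part.strip()]


print(parse_names("Aerion,\nValric,\nThalor,"))
# prints ['Aerion', 'Valric', 'Thalor']

# Newlines alone do not separate names: this is read as a single entry.
print(len(parse_names("Aerion\nValric")))
# prints 1
```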

Let's train `ungai` on a list of names.

This is the list:

```txt
Aerion,
Valric,
Thalor,
Nyxen,
Draven,
Kaelith,
Zyros,
Morvain,
```

I am adding a smoothing of 0.001 because the above dataset is shallow.

```bash
ungai -g -f your_file.txt -s 0.001
```

When I ran this on my machine, `ungai` generated the name **`valor`**.

It might repeat the same names again and again because the dataset is shallow, but you can still try different `temperature`s and `smoothing`s.


## 6. Writing and Reading Transitions From a File

If you have a huge dataset, you can train `ungai` on it too.

Just remember that the dataset must be processed and follow the right format.

Training on a really huge dataset can be slow, though.

To improve performance, you can write the transitions (an internal struct used by `ungai`) to a file and read them back in later runs.
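
This tutorial does not describe the layout of the transitions struct. As a rough mental model only, a Markov chain over characters can be stored as a table of how often each character follows another. The sketch below assumes a first-order chain; the `START`/`END` markers, the lowercasing, and `count_transitions` are all hypothetical.

```python
from collections import defaultdict

START, END = "^", "$"  # hypothetical markers for the start and end of a name


def count_transitions(names):
    """Count how often each character is followed by each other character."""
    table = defaultdict(lambda: defaultdict(int))
    for name in names:
        # Lowercasing is an assumption made for this sketch.
        chars = [START, *name.lower(), END]
        for current, following in zip(chars, chars[1:]):
            table[current][following] += 1
    return table


table = count_transitions(["Aerion", "Valric"])
print(dict(table["r"]))  # characters seen after "r" and how often
```

Building a table like this means scanning the whole dataset, which is the work that loading saved transitions lets you skip.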

>[!note]
> Transitions are compressed when they are written to a file, so it is a good idea to use a file extension that marks the file as compressed. This is optional: it will still work without one.

We can write the transitions of our previous dataset to a file with the command below:

```bash
ungai -f your_dataset.txt -w your_transitions.zst
```

Note that I am using the extension `.zst` for the transitions file because `ungai` uses the [zstd](https://crates.io/crates/zstd) crate under the hood to compress the transitions struct.


Now if you want to read those transitions and generate a name, run:

```bash
ungai -r your_transitions.zst -g
```

Now you are ready to generate more names with `ungai`.

Refer to the Reference section for the full form of each flag.