# conf

`conf` is a `derive`-based config parser aimed at the practically-minded developer building large web projects and applications.

[![Crates.io](https://img.shields.io/crates/v/conf?style=flat-square)](https://crates.io/crates/conf)
[![Crates.io](https://img.shields.io/crates/d/conf?style=flat-square)](https://crates.io/crates/conf)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](LICENSE-APACHE)
[![License](https://img.shields.io/badge/license-MIT-blue?style=flat-square)](LICENSE-MIT)
[![Build Status](https://img.shields.io/github/actions/workflow/status/cbeck88/conf-rs/ci-rust.yml?branch=develop&style=flat-square)](https://github.com/cbeck88/conf-rs/actions/workflows/ci-rust.yml?query=branch%3Adevelop)

[API Docs](https://docs.rs/conf/latest/conf/) | [Proc-macro Reference](./REFERENCE.md) | [Examples](./examples)

## Overview

[`conf`](https://docs.rs/conf/latest/conf/) is a rewrite of `clap-derive` with a similar proc macro API, but a different architecture and different goals. It uses [`clap`](https://docs.rs/clap/4.5.8/clap/) under the hood to parse CLI arguments and generate help text, but it is not a fork. It offers some powerful features and support that `clap-derive` does not, which help with the configuration of large projects. In exchange, it omits some features of `clap`.

The features that you get for this bargain are:

* You can **assign a prefix to a structure's fields when flattening** it into another structure, and you can similarly do `env` prefixing in a controlled way.
* **You get ALL the errors and not just one of them** if some required env is missing and/or several of the values are invalid. In my searching I found that surprisingly few config crates out there actually do this. Very helpful if your deployments take a while.
* **Isolation & testability around `env`**. `clap` only supports reading env values from `std::env::var_os`.
  * If you want to test what happens when different variables are set, your tests can become racy.
  * If you want to test a component that takes config as an argument, and use `::parse_from` to initialize the config, then your tests will pass or fail depending on your local env.
  * If you want to implement `Default` based on the default values you declared on your structure, you can't really, because you can't isolate it from `env`.
  * `conf` lets you pass an iterator to represent a snapshot of the environment.
* **Support for `env` aliases**. `clap` supports aliases for command-line arguments but not for `env`. Make changes without breaking compatibility.
* **You can declare fields which are only read from `env`** and cannot be read from args at all.
* **You can declare fields which represent secrets.** This controls whether or not the entire value should be printed in error messages if it fails to parse.
* **Support for an optional-flatten syntax**. This can be simpler and more idiomatic than using argument groups and such in `clap-derive`.
* **Support for user-defined validation predicates**. This allows you to express constraints that can't be expressed in `clap`.
* **Support for introspection**. This means that after defining your `Conf` struct, you can inspect all the program options programmatically and generate other content, such as an `.env` file template. `Conf` also supports exposing the "value source" for each individual value that it loads, so that you can further understand not just what config was loaded, but where those values came from.
* **Support for layered config**. This means that you can use structured data loaded from a file as an additional source for config values, alongside args and env.
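
The isolation point can be illustrated without `conf` at all. The following is a minimal standard-library sketch, not `conf`'s API: `lookup_port`, `snapshot`, and the `PORT` variable are hypothetical names chosen for illustration. It shows why reading from an iterator snapshot, rather than from the process-global environment, makes tests deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical helper: reads `PORT` from an environment *snapshot*
/// instead of the process-global environment.
fn lookup_port<I>(env: I, default: u16) -> Result<u16, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    let env: HashMap<String, String> = env.into_iter().collect();
    match env.get("PORT") {
        Some(raw) => raw
            .parse::<u16>()
            .map_err(|e| format!("invalid value '{raw}' for PORT: {e}")),
        None => Ok(default),
    }
}

/// Convenience for tests: build a snapshot from string pairs.
fn snapshot(pairs: &[(&str, &str)]) -> Vec<(String, String)> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

In production code the same function could be handed `std::env::vars()`, while each test passes its own `snapshot(...)` and never touches, or depends on, the real environment.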

In addition to `args` and `env`, `conf` supports consuming config content in any [`serde`](https://docs.rs/serde/latest/serde/)-compatible format, such as JSON, YAML, TOML, etc., as a hierarchical config layer.
The same commitment to "All the errors and not just one of them" holds. There are several advantages of this integrated approach:

* Other popular approaches to hierarchical config include using [`clap`](https://docs.rs/clap/4.5.8/clap/) for CLI argument parsing only, and then folding the
  results of that into a library like [`figment`](https://docs.rs/figment/latest/figment) or [`config`](https://docs.rs/config/latest/config), which can also manage `env`, files, and compositing it all together.
  * However, typically this creates a maintenance burden, because if a required field could be read via `clap` or could be read from `env` or a config file, it needs to be `Option<T>`
    for `clap` and `T` in the final config structure, so you end up needing to maintain two or three parallel structures.
  * If these structures get out of sync, there isn't really any tooling to help you figure it out and the error messages may be confusing.
  * Dividing the information between multiple structures this way means that `clap` isn't aware of the other ways that a value can be read.
    But `clap` is responsible for generating the `--help` text, and so this causes the documentation of the config to be incomplete and makes it harder
    for users to figure out how to use your program.
  * It leads to poor quality error reporting, because crates like `figment` and `config` rely on `serde::Deserialize` to marshall the composited data onto your final structure.
    This precludes giving multiple error reports if there are multiple problems in different parts of the config. (See [MOTIVATION.md](./MOTIVATION.md) for more discussion.)
* When using `conf` instead, all of these problems are avoided. Notably, `conf` provides its own proc-macro, and so we can walk the `serde::Deserializer` ourselves and
  ensure that we get comprehensive error reporting, even if `serde_derive::Deserialize` would have stopped at the first error.
* `conf` can also be used together with `figment` advantageously. See [Multiple config files](#multiple-config-files) for more on this.
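
The "all the errors" behavior amounts to checking every field independently and accumulating failures instead of returning at the first one. A minimal standard-library sketch of that idea follows. The `Settings` struct, the variable names, and the `load` and `parse_field` functions are hypothetical, and the sketch says nothing about how `conf` is implemented internally.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Settings {
    url: String,
    port: u16,
    workers: u32,
}

/// Hypothetical loader that reports every problem it finds, not just the first.
fn load(env: &HashMap<&str, &str>) -> Result<Settings, Vec<String>> {
    let mut errors = Vec::new();

    // Each field is checked on its own; a failure is recorded and the walk continues.
    let url = parse_field::<String>(env, "URL", &mut errors);
    let port = parse_field::<u16>(env, "PORT", &mut errors);
    let workers = parse_field::<u32>(env, "WORKERS", &mut errors);

    match (url, port, workers) {
        (Some(url), Some(port), Some(workers)) => Ok(Settings { url, port, workers }),
        _ => Err(errors),
    }
}

/// Look up and parse one required value, pushing an error message on failure.
fn parse_field<T>(env: &HashMap<&str, &str>, key: &str, errors: &mut Vec<String>) -> Option<T>
where
    T: std::str::FromStr,
    T::Err: std::fmt::Display,
{
    match env.get(key) {
        None => {
            errors.push(format!("missing required env '{key}'"));
            None
        }
        Some(raw) => match raw.parse::<T>() {
            Ok(value) => Some(value),
            Err(e) => {
                errors.push(format!("invalid value '{raw}' for '{key}': {e}"));
                None
            }
        },
    }
}
```

With `URL` missing and `PORT` set to `eighty`, this sketch returns both problems in one `Err`, so a single failed deployment surfaces everything that needs fixing.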

In general, `conf` works with other libraries via dependency injection, and only has a hard dependency on the `clap` ecosystem. The `serde` integration is optional. You can bring any serde-compatible config file parser that you want, at whatever versions you want.

------

`conf` is heavily influenced by [`clap-derive`](https://docs.rs/clap/4.5.8/clap/) and the earlier [`structopt`](https://docs.rs/structopt/latest/structopt/), which I used for years. They are both great and became popular for a reason.

Where there is overlap, `conf` tries to stay extremely close to `clap-derive` syntax and behavior, in most cases, for familiarity and ease of migrating a large project.
In some cases, there are small deviations from the behavior of `clap-derive` to either help avoid mistakes, or to make the defaults closer to a good [12-factor app](https://12factor.net/config) behavior.
For some advanced features of `clap`, `conf` has a way to achieve the same thing, but we took a different approach. This is typically in an attempt to simplify how it works for the user of the `derive` macro, to have fewer named concepts, or to ease maintenance going forward. (Because we don't offer an analogue of the `clap_builder` API, the design tradeoffs are different.)

The public API here is restricted to the `Conf` and `Subcommands` traits, proc-macros to derive them, and one error type. It is hoped that this will both reduce the learning curve and ease future development and maintenance.

See [MOTIVATION.md](./MOTIVATION.md) for more discussion about this project and the other various alternatives out there.

* [Using conf in a cargo project](#using-conf-in-a-cargo-project)
* [A tour](#a-tour)
* [Topics](#topics)
  * [Reading files](#reading-files)
  * [Hierarchical config](#hierarchical-config)
  * [Secrets](#secrets)
  * [Argument groups and constraints](#argument-groups-and-constraints)
* [Who should use this crate?](#who-should-use-this-crate)
  * [When should clap-derive be preferred to this crate?](#when-should-clap-derive-be-preferred-to-this-crate)
* [License](#license)

## Using conf in a cargo project

First add `conf` to the dependencies in your `Cargo.toml` file:

```toml
[dependencies]
conf = "0.4"
```

Then, create a `struct` which represents the configuration data your application needs to read on startup.
This struct should derive the `Conf` trait, and the `conf` attributes should be used to describe how each field can be read.

```rust
use conf::Conf;
use http::Uri;

#[derive(Conf)]
pub struct Config {
    /// This is a string parameter, which can be read from args as `--my-param` or from env as `MY_PARAM`.
    #[arg(long, env)]
    my_param: String,

    /// This flag corresponds to `-f` or `--force` in args
    #[arg(short, long)]
    force: bool,

    /// URL to hit, which can be read from args as `--url` or from env as `URL`.
    #[arg(long, env)]
    url: Uri, // This works because Uri implements `FromStr`.
}
```

Finally, you can parse the config:

```rust,ignore
    let config = Config::parse();
```

Usually you would call that somewhere in `fn main()` and then use the `config` to initialize your application.

The `parse()` function will automatically add a `--help` option for users that contains auto-generated documentation, based on your doc strings.

Additionally, if parsing fails for some reason, it will display a helpful error message and exit.

(The `Conf` trait offers a few variants of this function, which you can read about in the docs.)

Generally, the CLI and the generated help text are meant to conform to POSIX and GNU conventions. Read more about this in the [`clap` documentation](https://docs.rs/clap/4.5.8/clap/).

## A tour

A field in your struct can be read from a few sources:

* `#[arg(short)]` means that it has an associated "short" command-line option, such as `-u`. By default the first letter of your field is used. This can be overridden with `#[arg(short='t')]` for example.
* `#[arg(long)]` means that it has an associated "long" command-line option, such as `--url`. By default the kebab-case name of your field is used. This can be overridden with `#[arg(long="target-url")]` for example.
* `#[arg(pos)]` means that the argument can be a "positional" command-line option, and doesn't have any associated switch.
* `#[arg(env)]` means that it has an associated environment variable, such as `URL`. By default the upper snake-case name of your field is used. This can be overridden with `#[arg(env="TARGET_URL")]` for example.
* `#[arg(default_value)]` specifies a default value for this field if none of the other possible sources provides one.

Such attributes can be combined by separating them with commas. For example, `#[arg(long, env, default_value="x")]` means the field has an associated long option, an associated environment variable, and a default value that is used if both of these are omitted.

Your field can have any type as long as it implements `FromStr`, and this will be used to parse it. You can also specify an alternative parsing function using the `value_parser` attribute.

The type `bool` is special and results in a "flag" being generated rather than a "parameter", which expects no string parameter to be passed during parsing.
`Option<T>` is also special, and indicates that the value is optional rather than required.
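
As an illustration of the `FromStr` requirement, here is a minimal sketch of a custom value type. `LogLevel` and its accepted spellings are hypothetical. Only the standard `FromStr` trait is used, so the block shows the parsing contract rather than anything specific to `conf`.

```rust
use std::str::FromStr;

/// Hypothetical config value type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    Error,
    Warn,
    Info,
    Debug,
}

impl FromStr for LogLevel {
    type Err = String;

    /// Case-insensitive parse; the error text names the accepted values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "error" => Ok(LogLevel::Error),
            "warn" => Ok(LogLevel::Warn),
            "info" => Ok(LogLevel::Info),
            "debug" => Ok(LogLevel::Debug),
            other => Err(format!(
                "unknown log level '{other}', expected one of: error, warn, info, debug"
            )),
        }
    }
}
```

A field of such a type could then be declared in the same way as the `url: Uri` field in the earlier example, since both types implement `FromStr`.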

So far this is almost exactly the same as `clap-derive`. Where it gets more interesting is the `flatten` option.

You may have one structure that derives `Conf` and declares a bunch of related config values:

```rust,ignore
#[derive(Conf)]
pub struct DbConfig {
    /// Database connection URL.
    #[arg(long)]
    pub db_url: String,

    /// Set the maximum number of connections of the pool.
    #[arg(long)]
    pub db_max_connections: Option<u32>,

    /// Set the minimum number of connections of the pool.
    #[arg(long)]
    pub db_min_connections: Option<u32>,

    /// Set the timeout duration when acquiring a connection.
    #[arg(long)]
    pub db_connect_timeout: Option<u64>,

    /// Set the maximum amount of time to spend waiting for acquiring a connection.
    #[arg(long)]
    pub db_acquire_timeout: Option<u64>,

    /// Set the idle duration before closing a connection.
    #[arg(long)]
    pub db_idle_timeout: Option<u64>,

    /// Set the maximum lifetime of individual connections.
    #[arg(long)]
    pub db_max_lifetime: Option<u64>
}
```

Then you can "flatten" it into a larger `Conf` structure using the `conf(flatten)` attribute.

```rust,ignore
#[derive(Conf)]
pub struct Config {
    /// Database
    #[conf(flatten)]
    db: DbConfig,
}
```

Intuitively, this is meant to read a lot like the [`serde(flatten)`](https://serde.rs/attr-flatten.html) attribute, and has a similar behavior.
During parsing, the parser behaves as if every field of `DbConfig` were declared within `Config`, and generates matching options, env, and help, but then the parsed values actually
get stored in subfields of the `.db` field.

Using `flatten` can save a lot of labor. For example, suppose your web application consists of ten different web services, and they all need a `DbConfig`. Instead of duplicating all the values,
any env, any defaults, any help text, in each `Config` that you have, you can write that once and then `flatten` it ten times. Then, later when you discover that `DbConfig` should contain another value,
you only have to add it to `DbConfig` once, and every service that uses `DbConfig` will get the new config parameter. Also, when you need to initialize your db connection, you can just pass it the entire `.db` field rather
than pick out needed config arguments one-by-one.

`conf` differs from `clap-derive` in that we expect that you will use `flatten` in your project quite a lot.

For example, you might need to do this:

```rust,ignore
#[derive(Conf)]
pub struct Config {
    #[conf(flatten)]
    pub auth_service: HttpClientConfig,

    #[conf(flatten)]
    pub friend_service: HttpClientConfig,

    #[conf(flatten)]
    pub snaps_service: HttpClientConfig,
}
```

because logically, you have three different http clients that you need to configure.

However with `clap-derive`, this is going to cause a problem, because when the fields from `HttpClientConfig` get flattened, their names will collide, and the parser will reject it as ambiguous. There aren't easy ways to fix this in `clap-derive` -- it doesn't support the "diamond pattern".

When using `conf`, you can resolve the problem by declaring a prefix.

```rust,ignore
#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix)]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix)]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix)]
    pub snaps_service: HttpClientConfig,
}
```

This will cause every option associated to the `auth_service` structure to get a prefix, derived from the field name, `auth_service`, on any long-form options and on any env variables. The prefix will be kebab-case for long-form options and upper snake-case for env variables. And similarly for `friend_service` and `snaps_service`.
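
To make the naming rule concrete, the following standard-library sketch computes the names that a field-derived prefix would be expected to produce. It is an illustration only: the helper functions are hypothetical, the inner field `url` is an assumed member of `HttpClientConfig`, and joining the prefix to the inner name with `-` (options) or `_` (env) is an assumption about the default behavior.

```rust
/// Kebab-case form of a snake_case Rust field name, as used for long options.
fn kebab(field: &str) -> String {
    field.replace('_', "-")
}

/// Upper snake-case form of a snake_case Rust field name, as used for env vars.
fn upper_snake(field: &str) -> String {
    field.to_ascii_uppercase()
}

/// Long option for `inner_field` when flattened at `outer_field` with a derived prefix.
/// Assumption: prefix and name are joined with `-`.
fn prefixed_long(outer_field: &str, inner_field: &str) -> String {
    format!("--{}-{}", kebab(outer_field), kebab(inner_field))
}

/// Env var for `inner_field` when flattened at `outer_field` with a derived prefix.
/// Assumption: prefix and name are joined with `_`.
fn prefixed_env(outer_field: &str, inner_field: &str) -> String {
    format!("{}_{}", upper_snake(outer_field), upper_snake(inner_field))
}
```

Under these assumptions, a `url` field inside `auth_service` would be read from `--auth-service-url` or `AUTH_SERVICE_URL`, and the same field inside `friend_service` from `--friend-service-url` or `FRIEND_SERVICE_URL`, so the three flattened structs no longer collide.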

You can also override this prefix:

```rust,ignore
#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: HttpClientConfig,
}
```

You can also configure env prefixes and option prefixes separately if you want that. Setting `env_prefix` will cause env vars to be prefixed, but not options. `long_prefix` will cause long-form options to be prefixed, but not env vars. (Short options are never prefixed, so there is not usually a great way to resolve a conflict among them. Conf offers a setting `skip_short_flags` on `flatten` sites which can be used to resolve collisions. Short switches should be used with caution in a large project.)

Finally, you can also declare prefixes at the level of a struct rather than a field. So for example, if you need every environment variable your program reads to be prefixed with `ACME_`, you can achieve that very easily.

```rust,ignore
#[derive(Conf)]
#[conf(env_prefix="ACME_")]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: HttpClientConfig,
}
```

`Option<T>` can also be used with a flattened structure, so if one of these services is optional, you can simply write:

```rust,ignore
#[derive(Conf)]
#[conf(env_prefix="ACME_")]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: Option<HttpClientConfig>,
}
```

When the `snaps_service` field has type `Option<HttpClientConfig>`, it means that if any of the `snaps_service` values appear, then all of them are required to produce a valid `HttpClientConfig`, and if none of them appear, then `snaps_service` is `None`.
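
The all-or-nothing rule can be sketched in plain Rust as follows. This is a simplified model, not `conf`'s implementation: the two fields of `HttpClientConfig`, the option names in the messages, and `resolve_optional_group` are hypothetical, and defaults or optional inner fields are ignored.

```rust
#[derive(Debug, PartialEq)]
struct HttpClientConfig {
    url: String,
    timeout_ms: u64,
}

/// Resolve an optional group from the raw presence of its (all required) members.
fn resolve_optional_group(
    url: Option<String>,
    timeout_ms: Option<u64>,
) -> Result<Option<HttpClientConfig>, Vec<String>> {
    match (url, timeout_ms) {
        // Nothing was provided: the whole group is absent.
        (None, None) => Ok(None),
        // Everything was provided: the group is present.
        (Some(url), Some(timeout_ms)) => Ok(Some(HttpClientConfig { url, timeout_ms })),
        // Partially provided: report every missing member.
        (url, timeout_ms) => {
            let mut errors = Vec::new();
            if url.is_none() {
                errors.push("'--snaps-url' is required when any snaps option is set".to_owned());
            }
            if timeout_ms.is_none() {
                errors.push(
                    "'--snaps-timeout-ms' is required when any snaps option is set".to_owned(),
                );
            }
            Err(errors)
        }
    }
}
```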

You can read about all the attributes and usage in the docs or the [REFERENCE.md](./REFERENCE.md), but hopefully this is enough to get started.

See also the [examples](./examples).

## Topics

This section discusses more advanced features and usage patterns, as well as alternatives.

### Reading files

Sometimes, a web service needs to read a file on startup.

One way this can be done in `conf` is by using the `value_parser` feature, which works much as it does in `clap`.

A `value_parser` is a function that takes a `&str` and returns either a value or an error.

For example, if you need to read a `.pem` file on startup, one way you could do that is

```rust
use conf::Conf;
use pem::Pem;
use std::{error::Error, fs};

#[derive(Conf)]
pub struct Config {
    #[conf(long, env, value_parser = |file: &str| -> Result<_, Box<dyn Error>> { Ok(pem::parse(&fs::read_to_string(&file)?)?) })]
    pub pem: Pem,
}
```

This will read a file path either from CLI args or from env, then attempt to open the file and parse its contents as PEM.

If your `value_parser` is complex or needs to be reused, the best practice is to put it in a named function.

```rust,ignore
#[derive(Conf)]
pub struct Config {
    #[conf(long, env, value_parser = utils::read_cert_file)]
    pub pem: Pem,
}
```

This can be a good pattern for things like reading a certificate or a cryptographic key from a file, which you want to check on startup.
This way you will fail fast if the file is not found or is invalid, but also report all other config problems at the same time.

(Note that we also support `value_parser_os`, which takes `&OsStr` and is a more portable and correct way to read file paths.)

This kind of approach would always read the key from a file, but would allow you to specify the file path either in args or in env.
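
For illustration, a named parser in the spirit of `utils::read_cert_file` might look like the sketch below. It is a hypothetical stand-in that uses only the standard library: it returns the file contents as a `String` rather than a `Pem`, and the "looks like PEM" check is a deliberately crude placeholder for real parsing. A `&OsStr` variant is included to show the shape of a function that takes the path without requiring valid UTF-8.

```rust
use std::{error::Error, ffi::OsStr, fs, path::Path};

/// Hypothetical parser taking the user-supplied string (a file path).
fn read_cert_file(path: &str) -> Result<String, Box<dyn Error>> {
    read_cert_file_os(OsStr::new(path))
}

/// Same idea, but taking `&OsStr`, so non-UTF-8 paths are handled too.
fn read_cert_file_os(path: &OsStr) -> Result<String, Box<dyn Error>> {
    let path = Path::new(path);

    // Include the path in the error so the user can see which file failed.
    let text = fs::read_to_string(path)
        .map_err(|e| format!("could not read '{}': {e}", path.display()))?;

    // Crude placeholder check; a real parser would fully validate the contents.
    if !text.trim_start().starts_with("-----BEGIN ") {
        return Err(format!("'{}' does not look like a PEM file", path.display()).into());
    }
    Ok(text)
}
```

Because the function returns an error instead of panicking, a missing or malformed file can be reported alongside any other config problems.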

### Hierarchical config

[Hierarchical config](https://rust-cli-recommendations.sunshowers.io/hierarchical-config.html) is the idea that config values should be merged in from files as well as from args and env.

> Applications *should* follow a hierarchical configuration structure. Use the following order, from highest priority to lowest.
>
>    1. Command-line arguments
>    2. Environment variables
>    3. Directory or repository-scoped configuration
>    4. User-scoped configuration
>    5. System-wide configuration
>    6. Default configuration shipped with the program.

`conf` has strong built-in support for (1), (2), and (6) right out of the box. To get the others, there are basically two approaches.

#### .env files

A simple approach is to use a crate like [`dotenvy`](https://crates.io/crates/dotenvy). This crate can search for an `.env` file, and then set `env` values if they are not already set in your program.
You can do this right before calling `Config::parse()`, and in this manner achieve hierarchical config, with `args > env > .env file > defaults`. You can load multiple `.env` files this way if you need to, searching user-provided paths, default paths, and so on.
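
The precedence `env > .env file` comes from only filling in values that are not already set. The sketch below models that with plain maps. It is not `dotenvy`: `parse_dotenv` is a hypothetical, heavily simplified parser (no quoting, no variable expansion, no `export` keyword), and `layer_env` is a hypothetical helper that builds a merged snapshot instead of mutating the process environment.

```rust
use std::collections::HashMap;

/// Simplified `.env` parsing: `KEY=VALUE` lines and `#` comments only.
fn parse_dotenv(text: &str) -> Vec<(String, String)> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.trim().to_owned(), v.trim().to_owned()))
        .collect()
}

/// Layer `.env` values underneath the real environment:
/// a key that is already set keeps its existing value.
fn layer_env(
    real: impl IntoIterator<Item = (String, String)>,
    dotenv: Vec<(String, String)>,
) -> HashMap<String, String> {
    let mut merged: HashMap<String, String> = real.into_iter().collect();
    for (key, value) in dotenv {
        merged.entry(key).or_insert(value);
    }
    merged
}
```

Loading several `.env` files in priority order is then just repeated layering, with the highest-priority file applied first.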

In web applications, I often use this approach for *development* and I recommend this approach especially for smaller projects.

* Helps new developers get it up and running locally much faster.
* The `.env` can be in the repo for development, but removed when you go to deployment so that config values that are only appropriate during development don't get shipped.
* Works well if you are using [`diesel`](https://crates.io/crates/diesel), because the `diesel` cli tool also [uses `dotenvy` to search for a `.env` file](https://diesel.rs/guides/getting-started) and find the `DATABASE_URL` when managing database migrations locally.
* You can also pass `.env` files directly to `docker run` if you want to test docker containers locally.

This is a very traditional approach to configuring 12-factor apps.

The biggest drawback is that you are limited to things that can easily be expressed in a `.env` format.
If your config structure logically contains arrays of structs, it may not be very natural to express that in `.env`.

Another drawback is that the `.env` format doesn't really have a spec, and there are many divergent parser implementations. Eventually you may run into incompatibilities between what `docker` does, what `bash` does,
and what the numerous `dotenv` libraries in different programming languages do. This is typically annoying but not insurmountable.

The `--help` output for `conf` always includes details about any `env` sources for program options.

`conf` also supports introspection via `Conf::program_options()`, so it's possible to auto-generate a .env file template which includes all the env var names and annotates them with the doc strings. See documentation for example code.

#### General config files

Alternatively, you may prefer that your application can load layered config from a file in a more structured format.

In the `conf` API, self-describing structured data like this is called a "document". (`conf` doesn't care if it actually came from a file.)

To use a document as a source for layered config in `conf`, you can do the following:

0. Enable the `serde` feature of `conf`, which is on by default.

   Annotate your structs with `#[conf(serde)]`. Fields in your structs might need to implement [`serde::Deserialize`](https://docs.rs/serde/latest/serde/trait.Deserialize.html) depending on how they are annotated (see [reference](./REFERENCE_derive_conf.md)).

1. Determine the file path and load the document content. For example,

   ```rust,ignore
   // Use the path in `CONFIG` if it is set, otherwise fall back to `config.yaml`.
   let config_path = std::env::var("CONFIG").unwrap_or_else(|_| "config.yaml".to_owned());

   let doc_content: serde_yaml::Value = serde_yaml::from_reader(std::fs::File::open(&config_path).unwrap()).unwrap();
   ```

   `conf` doesn't force you to use any particular library or error handling discipline here.

2. Use the builder API to parse an instance of your structure.

   ```rust,ignore
   let config = MyConfig::conf_builder()
                .doc(config_path, doc_content)
                .parse();
   ```

   The builder uses `std::env::vars_os` and `std::env::args_os` as env and args sources by default, but these can be overridden if desired.
   The `config_path` string parameter is used in error messages.


Intuitively what happens is, `conf` attempts to initialize your struct, mapping the yaml data onto it, similar to [`serde::Deserialize`](https://docs.rs/serde/latest/serde/trait.Deserialize.html).

However, `conf` doesn't implement [`serde::Deserialize`](https://docs.rs/serde/latest/serde/trait.Deserialize.html) on your structure -- instead it implements [`serde::DeserializeSeed`](https://docs.rs/serde/latest/serde/de/trait.DeserializeSeed.html), where the seed is an internal structure that contains the parsed args and env. When you call parse, it will gather args and env to create the seed, and invoke it, together with the doc content. This walks your struct in a manner very similar to `serde_derive`, but for each field in your `Conf` struct, if there are multiple value sources, the priority is `args > env > serde > defaults`. So values from the `serde::Deserializer` can be shadowed, and also holes in the `serde` data can be filled from defaults and so on.

Any `value_parser` is run only if necessary after the available value sources and their priorities have been resolved, so errors from parsing shadowed values don't happen. The struct is only actually instantiated once, and integrity checks only run once.
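
The resolution order and the "parse only the winner" rule can be modeled with a small standard-library sketch. This is a simplification under stated assumptions, not `conf`'s internals: every source is represented as an optional string (a real document value is structured `serde` data), and `Source` and `resolve` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    Args,
    Env,
    Document,
    Default,
}

/// Pick the highest-priority raw value, then parse only that one.
fn resolve<T>(
    args: Option<&str>,
    env: Option<&str>,
    doc: Option<&str>,
    default: Option<&str>,
) -> Result<Option<(T, Source)>, String>
where
    T: std::str::FromStr,
    T::Err: std::fmt::Display,
{
    // Ordered from highest to lowest priority.
    let candidates = [
        (args, Source::Args),
        (env, Source::Env),
        (doc, Source::Document),
        (default, Source::Default),
    ];

    // The first source that is present wins; shadowed values are never parsed.
    let Some((raw, source)) = candidates
        .into_iter()
        .find_map(|(value, source)| value.map(|raw| (raw, source)))
    else {
        return Ok(None);
    };

    raw.parse::<T>()
        .map(|value| Some((value, source)))
        .map_err(|e| format!("invalid value '{raw}' from {source:?}: {e}"))
}
```

Note that an unparseable env value produces no error in this model when an arg is also present, because the env value is shadowed and never reaches the parser. Returning the `Source` alongside the value mirrors the idea of tracking where each value came from.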

##### Caveats

This will work best if your config files use a "self-describing" format, which has a type like `serde_yaml::Value` or `serde_json::Value`
which can hold any valid yaml or json, and you deserialize into that first. In particular, it's not recommended to do the following, even if it would avoid some copies:

```rust,ignore
   // Builds, but not recommended
   let config = MyConfig::conf_builder()
                .doc(config_path, serde_yaml::Deserializer::from_reader(fs::File::open(&config_path).unwrap()))
                .parse();
```

If the file is not valid yaml or json, then at some point in the middle of the walk, the deserializer may be in a broken state, and any further attempts to interact with it will yield errors.
Then `conf` may report numerous errors as it tries to read data for different parts of your structure, giving up on failing branches and continuing to try on other branches.
These errors may distract from the root cause. By deserializing into a `Value` type first, and failing fast if that doesn't work, you can avoid this scenario.

See also [./examples/serde/basic.rs](./examples/serde/basic.rs).

#### Multiple config files

A limitation of `conf` is that you can only pass it one document in this manner -- you can't call [`crate::ConfBuilder::doc`] multiple times and pass a series of progressively lower-priority file contents.

However, you can use other libraries to help with this.

```rust
   use figment::{Figment, value::Value, providers::{Format, Toml}};

   let content: Value
     = Figment::new()
       .merge(Toml::file("file1"))
       .merge(Toml::file("file2"))
       .extract()
       .unwrap();
```

The [`Figment::extract` function](https://docs.rs/figment/latest/figment/struct.Figment.html#method.extract) invokes [`serde::Deserialize`](https://docs.rs/serde/latest/serde/trait.Deserialize.html), and so can only report one error. But extracting into a [`figment::Value`](https://docs.rs/figment/latest/figment/value/enum.Value.html) is not expected to fail, since this is the internal representation that `figment` uses.

The `figment::Value` can then be passed to `conf` as a document, since it implements [`serde::de::Deserializer`](https://docs.rs/serde/latest/serde/trait.Deserializer.html). Then `conf` is driving the initialization of your struct, and not `serde_derive`, which retains all the benefits of `conf`'s design.

This negates some of the challenges of using `figment`. For example, from their [docs](https://docs.rs/figment/latest/figment/#tips):

> Using #[serde(flatten)] [can break error attribution](https://github.com/SergioBenitez/Figment/issues/80#issuecomment-1701946622), so it’s best to avoid using it when possible.

When using `conf`, our `serde(flatten)` implementation doesn't have the same limitations as the stock serde, and none of the same caveats around it apply. This is because it is built around a [state machine abstraction](./src/state_machine.rs), instead of how `serde_derive` does it. This makes it very easy for us to add features like flatten-with-prefix in serde as well, and to make it compatible with all of our other features.

In this manner, you can get all 6 categories of hierarchical config in your app if needed, without significant restrictions on config file formats.

You can see a more complete [example](./examples/serde/figment.rs) and tests in the repo.

In the future, we may extend our API so that the [`figment::Metadata`](https://docs.rs/figment/latest/figment/struct.Metadata.html), which tracks the provenance of individual values, can also be passed on to `conf` and used in error messages.

#### Documenting the config file format

The suggested way to help users of your program understand the config file format is:

* Have some examples committed to your repo, and have tests that they parse correctly
* Either distribute these with the documentation, or along with the release artifacts, or bake them into the binary and add a CLI option which makes the binary emit them.

For example, the AWS CLI tool provides options to emit a config skeleton for many commands, such as, [`aws ecs register-task-definition --generate-cli-skeleton`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-definition-template.html).

### Secrets

`conf` tries to provide the most helpful and detailed errors that it can, and also to report as many problems as it can when parsing fails.

Usually, if a user-provided value cannot be parsed, we want to provide the value and the error in the error message to help debugging. But if the value represents a *secret*, then logging its value is bad.

To prevent `conf` from logging the value, you can mark the field as `secret`.

```rust,ignore
    #[arg(env, secret)]
    pub api_key: ApiKey
```

When `conf` knows that something is a secret, it will avoid revealing the value when generating any kind of error message or help text.
`conf` will also describe it with the `[secret]` tag in the help text.
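
The effect on error messages can be sketched as follows. The function name and message wording are hypothetical and are not `conf`'s actual output. The point is only that the raw value is omitted when the field is marked secret.

```rust
/// Build a parse-failure message, omitting the raw value for secret fields.
fn parse_error_message(name: &str, raw: &str, reason: &str, secret: bool) -> String {
    if secret {
        // The offending value is deliberately left out.
        format!("invalid value for '{name}' [secret]: {reason}")
    } else {
        format!("invalid value '{raw}' for '{name}': {reason}")
    }
}
```

One caveat for this sketch: the `reason` text comes from the parser's own error, so a parser that echoes its input in its error message could still leak the value.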

Handling secrets is a complex topic and much of the discussion is out of scope here.
We'll offer just three points of guidance around this tool.

1. The more valuable the secrets are, and the more challenging the threat model is, the more time it makes sense to spend working on defensive measures. The converse is also true.
   No one really has context to judge this except you, so instead of offering one-size-fits-all guidance, I prefer to think in terms of a sliding scale.
2. If you're at a point where *systematically marking things `secret`* seems like a good idea, then you should also be *using special types to manage the secrets*.
   For example, using [`SecretString` from the `secrecy` crate](https://docs.rs/secrecy/0.8.0/secrecy/type.SecretString.html) instead of `String` will prevent your password from appearing in debug logs *after* it has been loaded.
   There are alternatives out there if `secrecy` crate doesn't work for your use-case. This is usually a pretty low-effort improvement, and it goes hand-in-hand with what the `secret` marking does.
   * It's very easy to expose your secret by accident if you don't do something like this. For example, just by putting a `#[tracing::instrument]` annotation on a function that some day takes a `config` struct, you could accidentally log your password.
3. If you're at a point where you think you need to *systematically [zeroize](https://docs.rs/zeroize/latest/zeroize/) all copies* of your secret that reside in process memory when they are no longer needed, then you are past the point
   where you can use an environment variable to pass the secret value to the application. Your application most likely needs to *read the secret value from a file instead*.
   * The rust standard library handles environment values as `std::ffi::OsString` internally and in its API, but this type cannot be securely zeroized. There are no public APIs to mutably access the underlying bytes, and no public APIs that would otherwise do this for you.
   * At a lower level, `glibc` [exposes the environment as `char **environ`](https://www.gnu.org/software/libc/manual/html_node/Environment-Access.html), makes copies of the entire environment whenever it is changed using `set_var` or similar, and [leaks the old values](https://inbox.sourceware.org/libc-alpha/87le2od4xh.fsf@oldenburg.str.redhat.com/).
     It is difficult to systematically ensure that all of these copies are cleaned up if they contain sensitive data. `environ` often gets copied by other things very early in the process.
     The rust standard library also interacts with the environment via these `glibc` APIs, which means that typical rust libraries like `dotenvy` do as well.
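
To make point 2 concrete, here is a minimal std-only sketch of the idea behind secret-wrapper types. `Redacted`, `LeakyConfig` and `SaferConfig` are hypothetical names used only for illustration; this is not the `secrecy` API, just a demonstration of why a derived `Debug` on a plain `String` leaks while a wrapper with a hand-written `Debug` does not.

```rust
use std::fmt;

/// Hypothetical wrapper for illustration only; in real code prefer a
/// maintained crate such as `secrecy`.
pub struct Redacted(String);

impl Redacted {
    pub fn new(value: impl Into<String>) -> Self {
        Redacted(value.into())
    }

    /// Access is explicit, so every place that reads the secret is easy to find.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the inner value, whatever the formatting flags are.
        f.write_str("[REDACTED]")
    }
}

/// Derived `Debug` prints the password verbatim.
#[derive(Debug)]
pub struct LeakyConfig {
    pub password: String,
}

/// Derived `Debug` delegates to `Redacted`, which hides the value.
#[derive(Debug)]
pub struct SaferConfig {
    pub password: Redacted,
}
```

With these definitions, `format!("{:?}", ...)` on a `LeakyConfig` contains the password, while the same call on a `SaferConfig` yields `SaferConfig { password: [REDACTED] }`. Anything that logs via `Debug`, such as `#[tracing::instrument]`, sees only the redacted form.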

### Argument groups and constraints

`clap` has support for the concept of "argument groups" ([`ArgGroup`](https://docs.rs/clap/4.5.8/clap/struct.ArgGroup.html)) and also "dependencies" among [`Arg`](https://docs.rs/clap/4.5.8/clap/struct.Arg.html)'s. This is used to create additional conditions that must be satisfied for the config to be valid, and error messages if it is invalid.
`clap` [provides](https://docs.rs/clap/4.5.8/clap/struct.Arg.html#method.conflicts_with) [many](https://docs.rs/clap/4.5.8/clap/struct.Arg.html#method.exclusive) [functions](https://docs.rs/clap/4.5.8/clap/struct.Arg.html#method.overrides_with) [on](https://docs.rs/clap/4.5.8/clap/struct.Arg.html#method.required_if_eq) [`Arg`](https://docs.rs/clap/4.5.8/clap/struct.Arg.html) [and](https://docs.rs/clap/4.5.8/clap/struct.Arg.html#method.requires_if) [on](https://docs.rs/clap/4.5.8/clap/struct.Arg.html#method.required_unless_present) [`ArgGroup`](https://docs.rs/clap/4.5.8/clap/struct.ArgGroup.html#method.conflicts_with) which can be used to define various kinds of constraints, such as conditional dependency or mutual exclusion, between `Arg`'s or `ArgGroup`'s.

The main reason to use these features in `clap` is that it will generate nicely formatted errors if these constraints are violated, and then you don't have to worry about handling the situation in your application code.

`conf` similarly wants to support adding constraints in this manner that are checked during parsing, but the design goal is that all of these errors should be reportable alongside all the other types of errors.

For several reasons, `conf` chose to offer a different API than `clap` for these purposes.

* In `clap`, this API was designed first for the clap builder API, and then exposed via the `clap-derive` API.
* There are about a dozen functions exposed in total, and multiple named concepts (`Arg` is now joined by `ArgGroup`, which is different from `Args`).
* The API relies on explicit `id` values for `Arg`'s and `ArgGroup`s, but this is less idiomatic in the derive API. The derive API is simpler from the user's point of view if these `id`'s are not really exposed and are more like implementation details.
* The API often provides multiple ways to do the same thing, which makes code that uses it less predictable.
* The API has many defaults that I find hard to remember. For example, in an `ArgGroup`, does `required` default to `true` or `false`? Does `multiple` default to `true` or `false`? These defaults are different for an `Args`.
* Sometimes the API doesn't feel idiomatic. For example if I have a group of options where if one of them appears, all of them must appear, the most idiomatic thing is if the API can give me a single `Option` that includes all of them.
  Otherwise I have to unwrap a bunch of options in application code, on the assumption that my constraint works as expected.

`conf` provides one mechanism for idiomatically representing when some collection of arguments are optional-but-mutually-required. Then it provides a few one-offs to express exclusivity between arguments. Finally, it provides a very general mechanism that can express arbitrary constraints.

#### flatten-optional

`conf` supports the following syntax:

```rust,ignore
#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: Option<HttpClientConfig>,
}
```

Intuitively, this means that the `snaps_service` config is optional, and if none of those fields appear, that's not an error, and `snaps_service` will be `None` in the parsed config object.
However, if any of the fields of `snaps_service` appear, then all of its required fields must appear, and parsing the entire flattened object must succeed.

This allows the code that consumes the conditional config to be simpler -- you can just match on whether `snaps_service` is present or absent, and the type system encodes that when any of those fields are present, all are present.
And you can express which arguments in the group are required to be present or not by marking them optional or not (or giving them a default value), within `HttpClientConfig`.

This feature actually covers every use-case I've had in real-life for argument groups and constraints in `clap` across all my web projects, and I like it because I feel that it introduces fewer named concepts
and promotes code reuse. The same struct can be flattened in a required way in one setting and in an optional way in another setting.

Hopefully it's easy to remember what it means, just by looking at the type of the data, and thinking about what would have to happen for it to succeed.
If we can't see any of the (prefixed) substructure's fields appearing, then we return `None`. If we see some of them appearing, it indicates that we're supposed to be producing a `Some`. Once we decide that we're supposed to produce `Some`, it's an error if we can't do so in the normal (non-optional) manner for `flatten`'ed structures.
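
The rule above can be sketched in plain Rust. This is a simplified model under stated assumptions, not `conf`'s actual implementation: it only looks at an env-like map, the field names `URL` and `TIMEOUT_MS` are hypothetical, and the prefix is passed in already formatted (for example `SNAPS_`).

```rust
use std::collections::HashMap;

/// Hypothetical substructure: `url` is required, `timeout_ms` has a default.
#[derive(Debug, PartialEq)]
pub struct HttpClientConfig {
    pub url: String,
    pub timeout_ms: u64,
}

/// Illustrative field names; in `conf` these come from the derive macro.
const FIELDS: [&str; 2] = ["URL", "TIMEOUT_MS"];

/// Non-optional flatten: all required fields must be present and valid.
/// Problems are collected rather than stopping at the first one.
pub fn parse_required(
    prefix: &str,
    env: &HashMap<String, String>,
) -> Result<HttpClientConfig, Vec<String>> {
    let mut errors = Vec::new();

    let url_key = format!("{prefix}URL");
    let url = env.get(&url_key).cloned();
    if url.is_none() {
        errors.push(format!("missing required value: {url_key}"));
    }

    let timeout_key = format!("{prefix}TIMEOUT_MS");
    let timeout_ms = match env.get(&timeout_key) {
        None => Some(5000), // has a default, so absence is not an error
        Some(raw) => match raw.parse::<u64>() {
            Ok(value) => Some(value),
            Err(e) => {
                errors.push(format!("invalid value for {timeout_key}: {e}"));
                None
            }
        },
    };

    match (url, timeout_ms) {
        (Some(url), Some(timeout_ms)) => Ok(HttpClientConfig { url, timeout_ms }),
        _ => Err(errors),
    }
}

/// Optional flatten: `None` if no prefixed field appears at all,
/// otherwise the substructure must parse as if it were required.
pub fn parse_optional(
    prefix: &str,
    env: &HashMap<String, String>,
) -> Result<Option<HttpClientConfig>, Vec<String>> {
    let any_present = FIELDS
        .iter()
        .any(|field| env.contains_key(&format!("{prefix}{field}")));
    if !any_present {
        return Ok(None);
    }
    parse_required(prefix, env).map(Some)
}
```

In this model an empty environment gives `Ok(None)`, setting only `SNAPS_TIMEOUT_MS` gives an error about the missing `SNAPS_URL`, and setting both gives `Ok(Some(..))`.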

Also, this design makes it easy to use one struct as both the Conf struct and as a `serde` schema.

#### one_of_fields

`conf` provides a simple way to specify that some fields in a struct are mutually exclusive.

```rust
use conf::Conf;

#[derive(Conf)]
#[conf(at_most_one_of_fields(a, b, c))]
pub struct FooConfig {
    #[conf(short, long)]
    pub a: bool,
    #[conf(short, long)]
    pub b: Option<String>,
    #[conf(repeat, long, env)]
    pub c: Vec<String>,
}
```

When used with two fields, it provides a way to translate many usages of `conflicts_with` in the `clap-derive` API.

When used with all fields in a struct, it is similar to an `ArgGroup` with `multiple=false` and `required=false` in the `clap-derive` API.

This also works with the *flatten-optional* feature, so one or more optional flattened groups can be made exclusive with each other or with simple arguments in this structure.

However, it can only be used with fields on the struct where the attribute appears, and cannot be used with fields inside of flattened structs, or elsewhere in the structure.

`conf` provides a variation which requires *exactly* one of the fields to appear.

```rust
use conf::Conf;

#[derive(Conf)]
#[conf(one_of_fields(a, b, c))]
pub struct FooConfig {
    #[conf(short, long)]
    pub a: bool,
    #[conf(short, long)]
    pub b: Option<String>,
    #[conf(repeat, long, env)]
    pub c: Vec<String>,
}
```

When used with all fields in a struct, this is similar to an `ArgGroup` with `multiple=false` and `required=true` in the `clap-derive` API.

Finally, `conf` provides one more variation, which requires *at least* one of the fields to appear.

```rust
use conf::Conf;

#[derive(Conf)]
#[conf(at_least_one_of_fields(a, b, c))]
pub struct FooConfig {
    #[conf(short, long)]
    pub a: bool,
    #[conf(short, long)]
    pub b: Option<String>,
    #[conf(repeat, long, env)]
    pub c: Vec<String>,
}
```

When used with all fields in a struct, this is similar to an `ArgGroup` with `multiple=true` and `required=true` in the `clap-derive` API.

Any of these attributes can be used multiple times on the same struct to create multiple constraints that apply to that struct.

#### validation predicate

For more complex constraints, `conf` supports user-defined validation predicates.

A validation predicate is a function that takes `&T` where `T` is the struct at hand, and returns `Result<(), impl Display>`.

Example:

```rust,ignore
use conf::Conf;
use std::fmt::Display;

#[derive(Conf)]
#[conf(validation_predicate = FooConfig::validate)]
pub struct FooConfig {
    #[conf(short, long)]
    pub a: bool,
    #[conf(short, long)]
    pub b: Option<String>,
    #[conf(repeat, long, env)]
    pub c: Vec<String>,
}

impl FooConfig {
    fn validate(&self) -> Result<(), impl Display> {
        ...
    }
}

```

Any number of validation predicates can be specified.
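
As one possible shape for such a predicate, here is a small sketch. The rules it enforces (`a` requires `b`, and `c` holds at most three values) are hypothetical and chosen only for illustration. The `#[derive(Conf)]` and `#[conf(...)]` attributes are left out so that the snippet compiles on its own without the `conf` dependency.

```rust
use std::fmt::Display;

// Derive and field attributes omitted so this compiles standalone.
pub struct FooConfig {
    pub a: bool,
    pub b: Option<String>,
    pub c: Vec<String>,
}

impl FooConfig {
    /// Hypothetical rules: `a` requires `b`, and `c` may hold at most three values.
    pub fn validate(&self) -> Result<(), impl Display> {
        if self.a && self.b.is_none() {
            return Err("`b` must be provided when `a` is set".to_string());
        }
        if self.c.len() > 3 {
            return Err(format!(
                "`c` accepts at most 3 values, got {}",
                self.c.len()
            ));
        }
        Ok(())
    }
}
```

Because the predicate is ordinary Rust, the error message can name exactly which combination of fields was wrong, and the function can be unit-tested without going through argument parsing.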

The idea here is, rather than adding increasing numbers of one-off constraint types to `conf`, or enabling you to write non-local constraints using proc-macro attributes, it
will be more maintainable for you and for `conf` if you just express what you want in rust code, once your constraints get sophisticated enough.
There's both less API for you to learn and remember, and less API surface area for `conf` to test and maintain. You will also be able to generate very precise error messages when complex constraints fail.

If a predicate fails, `conf` is still able to report those errors and any other errors that occurred elsewhere in the tree.

For example, in this config struct:

```rust,ignore
#[derive(Conf)]
pub struct Config {
    #[conf(flatten, prefix="auth")]
    pub auth_service: HttpClientConfig,

    #[conf(flatten, prefix="friend")]
    pub friend_service: HttpClientConfig,

    #[conf(flatten, prefix="snaps")]
    pub snaps_service: Option<HttpClientConfig>,
}
```

It's possible that when parsing a `Config`, the `auth_service` fails to parse because of a missing required argument, `friend_service` fails to parse because of a missing argument and an invalid value, and `snaps_service` parses but fails its validation predicate. In this scenario `conf` will report all of these errors, which distinguishes it from other crates in this genre.
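
The general technique behind this kind of reporting is to parse every subtree independently and merge the error lists, instead of returning at the first failure. The sketch below shows that pattern in plain Rust; `ConfigError`, `take` and `combine` are hypothetical names, and this is not a description of `conf`'s internals.

```rust
/// Hypothetical error type covering the three failure kinds in the scenario.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    MissingRequired(String),
    InvalidValue { name: String, reason: String },
    PredicateFailed { path: String, message: String },
}

/// Move a subtree's errors into the shared list and keep its value, if any.
fn take<T>(
    result: Result<T, Vec<ConfigError>>,
    errors: &mut Vec<ConfigError>,
) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(mut errs) => {
            errors.append(&mut errs);
            None
        }
    }
}

/// Combine three independently parsed subtrees. Every subtree is inspected,
/// so the caller sees all errors at once rather than only the first.
pub fn combine<A, B, C>(
    auth: Result<A, Vec<ConfigError>>,
    friend: Result<B, Vec<ConfigError>>,
    snaps: Result<C, Vec<ConfigError>>,
) -> Result<(A, B, C), Vec<ConfigError>> {
    let mut errors = Vec::new();
    let auth = take(auth, &mut errors);
    let friend = take(friend, &mut errors);
    let snaps = take(snaps, &mut errors);

    match (auth, friend, snaps) {
        (Some(a), Some(b), Some(c)) => Ok((a, b, c)),
        _ => Err(errors),
    }
}
```

Applied to the scenario above, one missing argument for `auth_service`, a missing argument plus an invalid value for `friend_service`, and a failed predicate for `snaps_service` would come back as a single list of four errors.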

## Who should use this crate?

The crate is probably most attractive if:

* you have a medium-to-large project, and you run into the limits of `clap-derive`. You start to see a "diamond pattern" in your structs, and you need flatten-with-prefix.
* you want to do layered config, including with config files, but
  * you don't want to define the same config parameters over and over again (for args, for env, and for serde)
  * you want the auto-generated help to be useful
  * you want very complete error reporting
* you find clap's large API and documentation to be confusing, and you want to use something with less surface area and fewer right ways to do a particular thing

If you think that this crate is a good fit for you, the suggested way to use it is:

* Whenever you have a component that you think should use a value that is read on startup, you should create a config struct for that component.
  You should `derive(Conf)` on that struct, and pass that config struct to the component on initialization.
  The config struct should live in the same module as the component that it is configuring.
* If your component is initialized by a larger component, then that component should have its own config struct and you should use `flatten` to assemble it.
* Each binary target should have a config struct, and should `::parse()` it in `fn main()`. It should also have the `conf(test)` attribute.

### When should clap-derive be preferred to this crate?

This crate defines itself somewhat differently from [`clap-derive`](https://docs.rs/clap/4.5.8/clap/) and has different features and goals.

* `clap-derive` is meant to be an alternative to the clap builder API, and exposes essentially all of the features of the builder.
* `clap` itself is primarily a CLI argument parser [per its maintainers](https://github.com/clap-rs/clap/discussions/5432), and many simple features around `env` support, such as arguments that can only be read from `env`, are considered out of scope.

`conf` places emphasis on features differently.

* `env` is actually the most important thing for a 12-factor web app.
* `conf` has a different architecture, such that it's easier to pass information at runtime between a `struct` and the `struct` that it is flattened into, in both directions. This enables many new features. The details are not part of the public API, the way that they are in `clap`, so that we can add more features in the future without a breaking change.
* `conf` has very specific goals around error reporting. We want to return as many config errors as possible at once, because deployment might take a relatively long time.

In order to meet its goals, `conf` does not use `clap` to handle `env` at all. `clap` is only used to parse CLI arguments as strings, and to render help text, which are the two things that it is best at.

This crate can get closer towards the full feature set offered by `clap-derive`, but will probably never achieve feature parity -- in fact the use of `clap` is an implementation detail, and it's conceivable that we'll drop `clap` and just use `clap-lex` directly.

If you have very specific CLI argument parsing needs, or if you need pixel-perfect help text, you will be better off using `clap` directly instead of this crate, because you will have more control that way. `clap` is the most mature and feature-complete CLI argument parser out there, by a wide margin.

In many applications, you don't really have such needs. You aren't making very sophisticated use of `clap`, your project is small, and you don't particularly need any features of `conf` either, so you will be able to use `clap-derive` or `conf` equally well and not notice very much difference.

If you prefer, you can stick with `clap-derive`, and then only if you find that you need flatten-with-prefix or another feature, try to switch to `conf` at that point.

`conf` is designed to make this migration relatively easy for such projects. (Indeed, I started working on `conf` because I had several large projects on `clap-derive` and I was hitting limitations and being forced into workarounds that I wasn't happy with, and I couldn't find a wholly satisfactory alternative.) If you find that you get stuck when trying to migrate, you can open a discussion and we can try to help.

## License

Code is available under MIT or Apache 2 at your option.