Crate gong

A flexible, lightweight and simple-to-use library for processing command line arguments. A ‘getopt’ next-gen replacement.

About

The ‘getopt’ and related ‘getopt_long’ functions have, for a long time, served to assist in the processing of arguments supplied to a program. This library provides a “next-gen” replacement for use with Rust programs.

Licensed under the MIT license or the Apache license, Version 2.0, at your option.

Design

This is not the only solution available to projects built with the Rust programming language, and each one may have its own positive and negative aspects. Unlike some of these other “next-gen”/modern solutions, this library does not try to completely take over all aspects of argument handling; doing so can rather easily impose restrictions making a solution unsuitable for some program designs. This solution, like ‘getopt’, is used to assist in argument processing, not take over, and as such is highly flexible and lightweight.

The basic premise of usage is simple - provide the processing function with a set of available options and the input arguments to be processed, and it returns the results of its analysis. From there you can take further action - output error information if the user made a mistake; output help/usage information if requested; store state information from flag type options; and store data (converting values as necessary) from non-options and options with data, before proceeding with whatever your program was designed to do.

Some major differences to the old ‘getopt’/‘getopt_long’ solution include:

  1. All processing can be done in one go, rather than with repeated function calls;
  2. “Non-options” are not shuffled to the end of the list, unlike the default behaviour of ‘getopt’. Not doing this preserves possibly invaluable information;
  3. The “convenience” functionality of -W foo being treated as --foo is not supported (unnecessary complexity).

This library could also be used as a foundation for other libraries that want to take over more of the workload of argument handling than this library does.

Functionality

Basic feature support is on par with legacy ‘getopt_long’.

Option support

Two option processing modes are available, supporting two different popular styles of options.

Mode 1 - Standard (default)

This mode supports traditional long and short options.

An argument starting with a dash (-) and followed by additional characters is treated as an option argument; anything else is a non-option. An argument of -- followed by additional characters is a long option, which, after the -- prefix, consists of an option name (optionally followed by a data sub-argument, as discussed below). An argument of a single dash (-) followed by additional (non-dash) characters is a short option set, where each character after the - prefix is a short option character (except with respect to data sub-arguments, as mentioned below). An argument of exactly -- is special: it is an early terminator, signalling that interpretation of arguments should stop early and that all subsequent arguments should be treated as non-options (useful in some situations/designs for separating your option arguments from those to be passed on to something else).
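The classification rules above can be sketched in a few lines of plain Rust. This is only an illustration of the rules as described, not the library's own code; the `ArgKind` enum and `classify` function are hypothetical names, and the sketch does not track the "everything after -- is a non-option" state, which a real processing loop would need.

```rust
/// Shape of a single argument in standard mode (hypothetical, illustrative type).
#[derive(Debug, PartialEq)]
enum ArgKind<'a> {
    /// Exactly `--`: all following arguments are non-options.
    EarlyTerminator,
    /// `--` followed by at least one character; holds the text after the prefix.
    Long(&'a str),
    /// `-` followed by at least one character; holds the short option set.
    ShortSet(&'a str),
    /// Anything else, including a lone `-`.
    NonOption(&'a str),
}

/// Classify one argument purely by its shape (assumed reading of the rules above).
fn classify(arg: &str) -> ArgKind<'_> {
    if arg == "--" {
        ArgKind::EarlyTerminator
    } else if let Some(rest) = arg.strip_prefix("--") {
        // `rest` cannot be empty here, since exactly `--` was handled above.
        ArgKind::Long(rest)
    } else if let Some(rest) = arg.strip_prefix('-') {
        if rest.is_empty() {
            ArgKind::NonOption(arg)
        } else {
            ArgKind::ShortSet(rest)
        }
    } else {
        ArgKind::NonOption(arg)
    }
}
```

With this sketch, `classify("--foo=bar")` gives `ArgKind::Long("foo=bar")` and `classify("-abc")` gives `ArgKind::ShortSet("abc")`.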

Options may have a single mandatory data sub-argument. For long options, data is provided either in the next argument (e.g. --foo bar) or in the same argument, separated from the name by an = (e.g. --foo=bar). For short options, data is provided either in the next argument (e.g. -o arg), or if the option character is not the last character in the argument, the remaining characters are taken to be its data arg (e.g. -oarg). An argument can contain multiple short options grouped together as a set (e.g. -abc), but of course users need to be careful doing so with those requiring data - for correct interpretation only one short option with data can be grouped, and it must be the last one in the set. (If in -abc all three characters are valid options, and b takes data, c will be consumed as b’s data instead of being interpreted as an option).
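To make the grouping rule concrete, here is a minimal sketch of walking a short option set. It is not the library's implementation: `split_short_set` is a hypothetical helper, and the `takes_data` slice stands in for whatever option description the caller has. An option that takes data ends the walk, and whatever follows it in the same argument becomes its data.

```rust
/// Walk a short option set (the text after the `-` prefix).
/// `takes_data` lists the option characters that require data (hypothetical input).
/// Each entry is the option char plus `Some(data)` if data was found in the same argument.
fn split_short_set<'a>(set: &'a str, takes_data: &[char]) -> Vec<(char, Option<&'a str>)> {
    let mut found = Vec::new();
    for (idx, ch) in set.char_indices() {
        if takes_data.contains(&ch) {
            // Everything after this char (by byte offset) is its data.
            let rest = &set[idx + ch.len_utf8()..];
            // An empty remainder means the data must come from the next argument.
            found.push((ch, if rest.is_empty() { None } else { Some(rest) }));
            break;
        }
        found.push((ch, None));
    }
    found
}
```

For the set `abc` where `b` takes data, this yields `a` with no data and `b` with data `c`; `c` is never seen as an option.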

If a long option is encountered where the argument contains one or more = characters, then the portion to the left of the first = character is taken to be the long option name, and the portion to the right as a data sub-argument; thus valid available option names cannot contain = characters. If the name does not match any available long option, a failed match is reported and the data sub-arg is completely ignored. If there is a match and it requires a data sub-arg, but the = was the last character in the argument (e.g. --foo=), then the data sub-arg is taken to be an empty string. If there is a match with an option that does not require a data sub-arg, but one was provided and it is not an empty string, this will be noted as unexpected in the results of analysis.
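The name/data split described here amounts to splitting on the first = only. A minimal sketch using the standard library's `str::split_once` follows; `split_long` is a hypothetical helper name and the library may do this differently internally.

```rust
/// Split the text after the `--` prefix into an option name and optional data.
/// Only the first `=` separates the two; later `=` characters belong to the data.
fn split_long(text: &str) -> (&str, Option<&str>) {
    match text.split_once('=') {
        Some((name, data)) => (name, Some(data)),
        None => (text, None),
    }
}
```

Note that `split_long("foo=")` returns `("foo", Some(""))`, matching the empty-string case described above, whereas `split_long("foo")` returns `("foo", None)`.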

Abbreviated long option name matching is supported, i.e. the feature that users can use an abbreviated form of a long option’s name and get a match, so long as the abbreviation uniquely matches a single long option. As an example, if foo and foobar are available long options, then for the possible input arguments of { --f, --fo, --foo, --foob, --fooba, and --foobar }, --foo and --foobar are exact matches for foo and foobar respectively; --f and --fo are invalid as being ambiguous (and noted as such in the results); and --foob and --fooba both uniquely match foobar and so are valid. This feature is enabled by default, but can be disabled if desired.
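The matching behaviour in the foo/foobar example can be sketched as an exact-match check followed by a unique-prefix check. The `LongMatch` enum and `match_long` function below are hypothetical names used for illustration only, not the library's types.

```rust
/// Outcome of matching a long option name (hypothetical, illustrative type).
#[derive(Debug, PartialEq)]
enum LongMatch<'a> {
    Exact(&'a str),
    Abbreviated(&'a str),
    Ambiguous,
    NoMatch,
}

/// Match `input` against the available names, allowing unique abbreviations.
fn match_long<'a>(input: &str, available: &[&'a str]) -> LongMatch<'a> {
    // An exact match always wins, even when it is also a prefix of other names.
    if let Some(name) = available.iter().copied().find(|n| *n == input) {
        return LongMatch::Exact(name);
    }
    // Otherwise the input must be a prefix of exactly one available name.
    let mut candidates = available.iter().copied().filter(|n| n.starts_with(input));
    match (candidates.next(), candidates.next()) {
        (Some(name), None) => LongMatch::Abbreviated(name),
        (Some(_), Some(_)) => LongMatch::Ambiguous,
        _ => LongMatch::NoMatch,
    }
}
```

Checking for an exact match first is what lets --foo resolve to foo even though it is also a prefix of foobar.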

Mode 2 - Alternate

This mode is very similar to mode 1, with the main difference simply being that only long options are supported, and long options use a single dash (-) as a prefix rather than two, i.e. -help rather than --help. Some people simply prefer this style, and support for it was very easy to add.

Note: Short options can still be added to the option set in this mode, and it will still pass as valid; they will simply be ignored when performing matching.

Mismatch suggestions

This library does not (currently) itself provide any suggestion mechanism for failed option matches - i.e. the ability to take an unmatched long option and pick the most likely of the available options that the user may have actually meant to use, to suggest to them when reporting the error. There is nothing however stopping users of this library from running unmatched options through a third-party library to obtain the suggestion to display.
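As a rough idea of what such a suggestion step could look like without any extra dependency, the sketch below uses a plain Levenshtein edit distance. Both `edit_distance` and `suggest` are hypothetical helpers written for this illustration; they are not part of this library, and a dedicated third-party crate may well do a better job.

```rust
/// Levenshtein edit distance between two strings, counted in `char`s.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // `prev[j]` is the distance between the processed prefix of `a` and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Pick the closest available name, provided it is within `max_distance` edits.
fn suggest<'a>(unmatched: &str, available: &[&'a str], max_distance: usize) -> Option<&'a str> {
    available
        .iter()
        .copied()
        .map(|name| (edit_distance(unmatched, name), name))
        .filter(|(dist, _)| *dist <= max_distance)
        .min_by_key(|(dist, _)| *dist)
        .map(|(_, name)| name)
}
```

A program could call `suggest` with the name of an unmatched long option and the list of names it made available, then mention the result in its error message.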

Utf-8 support

This library expects all provided strings to be valid Utf-8.

Native Utf-8 support in Rust makes handling Utf-8 strings largely trivial. It is important to understand that in Rust a char is four bytes (it is only one byte in older languages like C), but a sequence of chars is typically stored more efficiently than this in a string. This widened char type broadens the range of possible characters that can be used as short options, without us worrying about any multi-byte complexity. This allows for instance 💖 (the “sparkle heart” char) to be a short option, if you wanted, along with a huge set of other characters of various types to choose from. (The “sparkle heart” char takes four bytes in a Utf-8 string, and would not have been easy to support in C with the legacy ‘getopt’ solution).
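The difference between the fixed in-memory size of a char and its variable encoded size in a string can be checked with the standard library alone. The helper name `char_sizes` is made up for this sketch.

```rust
use std::mem::size_of;

/// Returns (in-memory size of a `char`, bytes `c` occupies when encoded in a `str`).
fn char_sizes(c: char) -> (usize, usize) {
    (size_of::<char>(), c.len_utf8())
}
```

For example, `char_sizes('o')` is `(4, 1)`, `char_sizes('\u{F6}')` (o with a built-in umlaut) is `(4, 2)`, and `char_sizes('\u{2764}')` (the “black heart”) is `(4, 3)`.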

With respect to long options, --foo, --föö and --föö are all different options (the last two may look the same, but read on), and are all perfectly valid options to make available. The first consists of simple latin characters only. The second and third use “umlauts” (diaeresis) above the o’s, however the first of these uses a char with the umlaut built in (U+F6) and the second uses the standard o (U+6F) followed by the special umlaut combining char (U+0308), thus they appear the same but are actually different “under the hood”. (It would not be efficient or worthwhile to try to handle the latter two as being the same option).

Only single chars are supported for short options. A char paired with one or more special combinator/selector chars thus cannot be specified as an available short option. Such special chars are treated by this library as perfectly valid available short options in their own right. Thus, whilst -ö (using U+F6) results in a single matched/unmatched entry in the results returned from the process function, -ö (using U+6F followed by the U+0308 combinator) will result in two entries, for what looks visibly to be one character. As another example, ❤ is the “black heart” character, and ❤️ is it along with the U+FE0F “variant #16 - emoji” selector char; with the selector, --❤️ is a single matched/unmatched long option, while -❤️ is a pair of matched/unmatched short options, one for the “black heart” char and one for the selector char.
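The effect of combining and selector chars on short option sets follows directly from how Rust iterates a string by char. The sketch below is illustrative only; `short_option_chars` is a hypothetical helper, and the escapes are written out so that the two visually identical forms stay distinct.

```rust
/// Returns the chars a short option set would be split into, one per `char`,
/// with no grouping of combining or selector characters.
fn short_option_chars(set: &str) -> Vec<char> {
    set.chars().collect()
}
```

Here `short_option_chars("\u{F6}")` has one entry, while `short_option_chars("o\u{308}")` and `short_option_chars("\u{2764}\u{FE0F}")` each have two. Likewise the strings `"f\u{F6}\u{F6}"` and `"fo\u{308}o\u{308}"` compare as unequal, which is why they are distinct long option names.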

Usage

To use this library, start by adding a dependency entry for it in your project’s Cargo.toml file; then make sure to declare use of the crate at the root of the module hierarchy (src/main.rs or src/lib.rs):

extern crate gong;

Now proceed with the following steps.

Step #1: Describe the available options

First, you need to compile a list of available options. For example:

let mut opts = gong::Options::new(6, 4); //Estimate counts for efficiency
opts.add_long("help")
    .add_short('h')
    .add_long("foo")
    .add_long("version")
    .add_long("foobar")
    .add_long("ábc")      // Using a combining char (accent)
    .add_long_data("hah") // This one expects a data arg
    .add_short('❤')
    .add_short('x')
    .add_short_data('o'); // So does this one
debug_assert!(opts.is_valid());

Note: The underlying data structures used to represent options actually have publicly accessible attributes, thus leaving open less tidy, but more efficient means of declaring a data set, bypassing the function calls used here, if desired.

Set mode

If you want to use alternate option mode rather than standard (default), as discussed above, the Options::set_mode method is available.

You can control whether or not to allow abbreviated matching with the Options::set_allow_abbreviations method.

Validation

Some validation is performed by the add_* methods, but for full validation (including checking for duplicates) the Options::is_valid method is provided, as used above. Details of any problems identified by this method are output to stderr. It is recommended that you only use it in a debug assert variant, as here, to allow catching mistakes in development, but otherwise avoid wasting energy for option sets in release builds that you know must be perfectly valid.

Note: With respect to what is or is not a duplicate, only the name/char matters; the expects_data attribute makes no difference.

Step #2: Gather arguments to be processed

You also need to retrieve (or build) a set of arguments to be processed. This must be a set of String objects (as Rust provides actual program args from the environment as String not &str). You can collect program arguments as follows:

let args: Vec<String> = std::env::args().collect();

The very first entry in the list is the program path/name, and often you will not be interested in it. You can skip it in two easy ways, either: a) when passing the arguments to the processing function in the next step, use &args[1..] instead of &args[..]; or b) use the iterator skip method, as here:

let args: Vec<String> = std::env::args().skip(1).collect();

Note: Of course you do not have to provide the real program args, you can provide any set of String objects, and you can even of course take the real set and modify it first if you wish.
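For instance, when experimenting or writing tests it can be handy to build the argument set from string literals. A minimal sketch follows; `to_args` is a hypothetical helper, not something this library provides.

```rust
/// Build an owned argument list from string literals (hypothetical helper).
fn to_args(items: &[&str]) -> Vec<String> {
    items.iter().map(|&s| String::from(s)).collect()
}
```

The result, e.g. `to_args(&["--foo", "bar"])`, is an ordinary `Vec<String>` and can be sliced with `&args[..]` just like the collected program arguments.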

Step #3: Processing

With data gathered, you now simply need to give it to the process function. This function will perform an analysis and return a set of results that describe what it identified.

let results = gong::process(&args[..], &opts);

Of course if for any reason you do not want to process all arguments in one go, you always have the option of processing one argument at a time (or in groups of whatever number you choose), calling process for each. (Beware, though, that doing so complicates the handling of data sub-arguments that are supplied in the next argument).

Step #4: Take action

It is now up to you to take appropriate action in response to what was found.

The Results object returned by the process function contains error and warn booleans, which give a quick indication of problems. It also contains a list of items, describing in detail what was found. The items in the item list are stored in the same order as found in the input arguments.

The entries in the item list are ItemClass variants, which wrap variants of Item, ItemW or ItemE (okay/warn/error), thus making it simple to match by class. All variants of each item class hold a usize value to be used for indicating the index of the argument in which the item was found. For simple scenarios, this may be ignored, but in some situations it is highly valuable information. Similarly, information is returned where applicable with data sub-args as to whether the data arg was located in the same argument or the next.

Note: some item variants that may be returned in the result set hold &str references to strings from the argument and option data that were given to process. This is done for efficiency. Beware of this with respect to lifetimes.

Have a play

The source code repository includes a small test application for trying out the library’s analysis capabilities. It has a small set of built-in example options of different kinds, and when run, outputs details of them along with details of analysing any provided arguments against them.

To use it, see the instructions in the README.md file found in the bin sub-directory.

Structs

Description of an available long option
Used to supply the set of information about available options to match against
Result data returned from analysing an argument list
Description of an available short option

Enums

Used to describe where data was located, for options that require data.
Non-problematic items. See ItemClass documentation for details.
The possible classes of items identified and extracted from command line arguments.
Error-level items. See ItemClass documentation for details.
Warn-level items. See ItemClass documentation for details.
Used to assert which option processing mode to use

Functions

Process provided command-line arguments