Crate grep_cli

This crate provides common routines used in command line applications, with a focus on routines useful for search oriented applications. As a utility library, there is no central type or function. However, a key focus of this crate is to improve failure modes and provide user friendly error messages when things go wrong.

To the extent possible, everything in this crate works on Windows, macOS and Linux.

Standard I/O

The is_readable_stdin, is_tty_stderr, is_tty_stdin and is_tty_stdout routines query aspects of standard I/O. is_readable_stdin determines whether stdin can be usefully read from, while the tty methods determine whether a tty is attached to stdin/stdout/stderr.

is_readable_stdin is useful when writing an application that changes behavior based on whether the application was invoked with data on stdin. For example, rg foo might recursively search the current working directory for occurrences of foo, but rg foo < file might only search the contents of file.

The tty methods are useful for similar reasons. Namely, commands like ls will change their output depending on whether they are printing to a terminal or not. For example, ls shows a file on each line when stdout is redirected to a file or a pipe, but condenses the output to show possibly many files on each line when stdout is connected to a tty.
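To make the idea concrete, here is a minimal sketch that uses only the standard library's `std::io::IsTerminal` trait rather than this crate's routines. The helper names `listing_layout`, `search_target` and `detect` are hypothetical. Treating "stdin is not a tty" as "stdin is readable" is only a rough approximation of what is_readable_stdin is described as doing.

```rust
use std::io::{self, IsTerminal};

/// Hypothetical helper: choose an `ls`-style layout from the tty check.
pub fn listing_layout(stdout_is_tty: bool) -> &'static str {
    if stdout_is_tty {
        "columns" // many entries per line for interactive use
    } else {
        "one-per-line" // easy to consume from a pipe or a file
    }
}

/// Hypothetical helper: choose what to search, in the spirit of
/// `rg foo` versus `rg foo < file`.
pub fn search_target(stdin_is_tty: bool) -> &'static str {
    if stdin_is_tty {
        "current directory"
    } else {
        "stdin"
    }
}

/// Query the real standard streams and apply both decisions.
pub fn detect() -> (&'static str, &'static str) {
    (
        listing_layout(io::stdout().is_terminal()),
        search_target(io::stdin().is_terminal()),
    )
}
```

Keeping the decision logic separate from the tty query makes the behavior easy to test without a real terminal.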

Coloring and buffering

The stdout, stdout_buffered_block and stdout_buffered_line routines are alternative constructors for StandardStream. A StandardStream implements termcolor::WriteColor, which provides a way to emit colors to terminals. Its key use is the encapsulation of buffering style. Namely, stdout will return a line buffered StandardStream if and only if stdout is connected to a tty, and will otherwise return a block buffered StandardStream. Line buffering is important for use with a tty because it typically decreases the latency at which the end user sees output. Block buffering is used otherwise because it is faster, and redirecting stdout to a file typically doesn't benefit from the decreased latency that line buffering provides.

The stdout_buffered_block and stdout_buffered_line routines can be used to explicitly set the buffering strategy regardless of whether stdout is connected to a tty or not.
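The buffering choice can be illustrated with standard library types alone. The sketch below is not this crate's StandardStream: `BufferedSketch` is a hypothetical type that wraps `std::io::LineWriter` or `std::io::BufWriter` depending on a tty flag, and it leaves out coloring entirely.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical writer that is either line buffered or block buffered.
pub enum BufferedSketch<W: Write> {
    Line(LineWriter<W>),
    Block(BufWriter<W>),
}

impl<W: Write> BufferedSketch<W> {
    /// Line buffer when attached to a tty, block buffer otherwise.
    pub fn new(inner: W, is_tty: bool) -> Self {
        if is_tty {
            BufferedSketch::Line(LineWriter::new(inner))
        } else {
            BufferedSketch::Block(BufWriter::new(inner))
        }
    }

    /// Borrow the underlying writer, e.g. to see what has reached it.
    pub fn get_ref(&self) -> &W {
        match self {
            BufferedSketch::Line(w) => w.get_ref(),
            BufferedSketch::Block(w) => w.get_ref(),
        }
    }
}

impl<W: Write> Write for BufferedSketch<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            BufferedSketch::Line(w) => w.write(buf),
            BufferedSketch::Block(w) => w.write(buf),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            BufferedSketch::Line(w) => w.flush(),
            BufferedSketch::Block(w) => w.flush(),
        }
    }
}
```

With the line buffered variant, a completed line reaches the underlying writer as soon as its newline is written. The block buffered variant holds small writes until its buffer fills or flush is called, which is where the throughput advantage comes from.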

Escaping

The escape, escape_os, unescape and unescape_os routines provide a user friendly way of dealing with UTF-8 encoded strings that can express arbitrary bytes. For example, you might want to accept a string containing arbitrary bytes as a command line argument, but most interactive shells make such strings difficult to type. Instead, we can ask users to use escape sequences.

For example, a\xFFz is itself a valid UTF-8 string corresponding to the following bytes:

[b'a', b'\\', b'x', b'F', b'F', b'z']

However, we can interpret \xFF as an escape sequence with the unescape/unescape_os routines, which will yield

[b'a', b'\xFF', b'z']

instead. For example:

use grep_cli::unescape;

// Note the use of a raw string!
assert_eq!(vec![b'a', b'\xFF', b'z'], unescape(r"a\xFFz"));

The escape/escape_os routines provide the reverse transformation, which makes it easy to show user friendly error messages involving arbitrary bytes.
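To show how the two directions relate, here is a simplified, standard-library-only sketch. It is not this crate's implementation: `unescape_sketch` and `escape_sketch` are hypothetical names, only the `\xHH` form is handled, and every byte outside printable ASCII (plus the backslash itself) is written as `\xHH`.

```rust
/// Hypothetical, simplified unescaper: decodes `\xHH` and copies
/// everything else through unchanged.
pub fn unescape_sketch(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A complete escape needs four bytes: `\`, `x` and two hex digits.
        if bytes[i] == b'\\' && i + 3 < bytes.len() && bytes[i + 1] == b'x' {
            let hi = (bytes[i + 2] as char).to_digit(16);
            let lo = (bytes[i + 3] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                out.push((hi * 16 + lo) as u8);
                i += 4;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

/// Hypothetical, simplified escaper: printable ASCII other than the
/// backslash is kept as-is, every other byte becomes `\xHH`.
pub fn escape_sketch(bytes: &[u8]) -> String {
    let mut out = String::new();
    for &b in bytes {
        if (0x20..=0x7E).contains(&b) && b != b'\\' {
            out.push(b as char);
        } else {
            out.push_str(&format!("\\x{:02X}", b));
        }
    }
    out
}
```

Because the escaper also encodes the backslash as `\x5C`, unescaping the escaped form always gives back the original bytes in this sketch.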

Building patterns

Typically, regular expression patterns must be valid UTF-8. However, command line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the standard library's UTF-8 conversion functions from OsStrs do not provide good error messages. However, the pattern_from_bytes and pattern_from_os routines do, including reporting exactly where the first invalid UTF-8 byte is seen.
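The kind of error reporting described here can be sketched with the standard library's `std::str::from_utf8`, whose error exposes the offset of the first invalid byte through `valid_up_to`. The function name `pattern_from_bytes_sketch` and the message wording are assumptions for illustration, not this crate's actual output.

```rust
use std::str;

/// Hypothetical helper: convert bytes to a pattern, reporting where the
/// first invalid UTF-8 byte was seen.
pub fn pattern_from_bytes_sketch(bytes: &[u8]) -> Result<&str, String> {
    str::from_utf8(bytes).map_err(|err| {
        format!(
            "found invalid UTF-8 in pattern at byte offset {}",
            err.valid_up_to()
        )
    })
}
```

Reporting the offset lets the user find the offending byte without having to bisect the pattern by hand.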

Additionally, it can be useful to read patterns from a file while reporting good error messages that include line numbers. The patterns_from_path, patterns_from_reader and patterns_from_stdin routines do just that. If any pattern is found that is invalid UTF-8, then the error includes the file path (if available) along with the line number and the byte offset at which the first invalid UTF-8 byte was observed.
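A minimal sketch of the same idea for line-oriented input, using only the standard library: each line is read as raw bytes, converted to UTF-8, and any failure is reported with its line number and byte offset. The name `patterns_from_reader_sketch` and the error text are hypothetical, and file paths are left out for brevity.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: read one pattern per line, failing with the
/// line number and byte offset of the first invalid UTF-8 byte.
pub fn patterns_from_reader_sketch<R: BufRead>(rdr: R) -> io::Result<Vec<String>> {
    let mut patterns = Vec::new();
    // `split` yields each line as raw bytes with the `\n` removed.
    for (i, line) in rdr.split(b'\n').enumerate() {
        let bytes = line?;
        match String::from_utf8(bytes) {
            Ok(pattern) => patterns.push(pattern),
            Err(err) => {
                let msg = format!(
                    "line {}: invalid UTF-8 at byte offset {}",
                    i + 1,
                    err.utf8_error().valid_up_to()
                );
                return Err(io::Error::new(io::ErrorKind::InvalidData, msg));
            }
        }
    }
    Ok(patterns)
}
```

Since `&[u8]` implements `BufRead`, the function can be exercised with an in-memory byte slice as well as with a buffered file or stdin.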

Read process output

Sometimes a command line application needs to execute another process and read its stdout in a streaming fashion. The CommandReader provides this functionality with an explicit goal of improving failure modes. In particular, if the process exits with an error code, then stderr is read and converted into a normal Rust error to show to end users. This makes the underlying failure modes explicit and gives more information to end users for debugging the problem.

As a special case, DecompressionReader provides a way to decompress arbitrary files by matching their file extensions up with corresponding decompression programs (such as gzip and xz). This is useful as a means of performing simplistic decompression in a portable manner without binding to specific compression libraries. This does come with some overhead though, so if you need to decompress lots of small files, this may not be an appropriate convenience to use.
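The extension matching can be pictured as a lookup from file extension to a decompression command. The sketch below is illustrative only: the function name, the two-entry table and the exact command-line flags are assumptions, and it uses a plain extension comparison rather than this crate's matcher.

```rust
use std::path::Path;

/// Assumed command lines; the flags here are illustrative.
pub const GZIP_CMD: &[&str] = &["gzip", "-d", "-c"];
pub const XZ_CMD: &[&str] = &["xz", "-d", "-c"];

/// Hypothetical helper: pick a decompression command from the file
/// extension, or `None` if the file does not look compressed.
pub fn decompression_command_sketch(path: &Path) -> Option<&'static [&'static str]> {
    match path.extension()?.to_str()? {
        "gz" => Some(GZIP_CMD),
        "xz" => Some(XZ_CMD),
        _ => None,
    }
}
```

Each matched file costs one spawned process, which is the overhead that makes this approach a poor fit for many small files.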

Each reader has a corresponding builder for additional configuration, such as whether to read stderr asynchronously in order to avoid deadlock (which is enabled by default).
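The failure handling and the deadlock concern can be sketched with `std::process::Command`. This is a simplified illustration under stated assumptions, not CommandReader itself: `read_stdout_sketch` is a hypothetical name, it collects all of stdout at once instead of streaming, and the error message format is invented.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: run a command, return its stdout, and turn a
/// non-zero exit status into an error that carries stderr.
pub fn read_stdout_sketch(cmd: &mut Command) -> io::Result<Vec<u8>> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let mut stderr = child.stderr.take().expect("stderr was piped");

    // Drain stderr on its own thread. If it were read only after stdout,
    // a child that fills the stderr pipe could block forever.
    let stderr_thread = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = stderr.read_to_end(&mut buf);
        buf
    });

    let mut out = Vec::new();
    stdout.read_to_end(&mut out)?;
    let status = child.wait()?;
    let err_bytes = stderr_thread.join().unwrap_or_default();

    if !status.success() {
        let msg = String::from_utf8_lossy(&err_bytes).trim().to_string();
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("command failed ({}): {}", status, msg),
        ));
    }
    Ok(out)
}
```

A caller would pass something like `Command::new("gzip").arg("-d").arg("-c").arg(path)` and show the resulting error to the user as-is.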

Miscellaneous parsing

The parse_human_readable_size routine parses strings like 2M and converts them to the corresponding number of bytes (2 * (1 << 20) = 2097152 in this case). If an invalid size is found, then a good error message is crafted that typically tells the user how to fix the problem.
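As a rough illustration of this kind of parsing, the sketch below accepts a run of ASCII digits followed by an optional suffix. The function name, the assumed suffix set (K, M and G as binary multiples) and the error wording are hypothetical and may differ from what parse_human_readable_size accepts and reports.

```rust
/// Hypothetical helper: parse sizes such as `2M` into a byte count.
pub fn parse_size_sketch(size: &str) -> Result<u64, String> {
    // Split the input into its leading digits and whatever follows.
    let digits_end = size
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(size.len());
    let (digits, suffix) = size.split_at(digits_end);
    if digits.is_empty() {
        return Err(format!(
            "invalid size '{}': expected a number with an optional K, M or G suffix",
            size
        ));
    }
    let value: u64 = digits
        .parse()
        .map_err(|_| format!("invalid size '{}': number is too big", size))?;
    // Assumed binary multiples: K = 2^10, M = 2^20, G = 2^30.
    let shift = match suffix {
        "" => 0,
        "K" => 10,
        "M" => 20,
        "G" => 30,
        _ => {
            return Err(format!(
                "invalid size '{}': unrecognized suffix '{}', use K, M or G",
                size, suffix
            ))
        }
    };
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("invalid size '{}': number is too big", size))
}
```

Using `checked_mul` means an oversized input produces an error message instead of a silently wrapped value.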

Structs

CommandError

An error that can occur while running a command and reading its output.

CommandReader

A streaming reader for a command's output.

CommandReaderBuilder

Configures and builds a streaming reader for process output.

DecompressionMatcher

A matcher for determining how to decompress files.

DecompressionMatcherBuilder

A builder for a matcher that determines which files get decompressed.

DecompressionReader

A streaming reader for decompressing the contents of a file.

DecompressionReaderBuilder

Configures and builds a streaming reader for decompressing data.

InvalidPatternError

An error that occurs when a pattern could not be converted to valid UTF-8.

ParseSizeError

An error that occurs when parsing a human readable size description.

StandardStream

A writer that supports coloring with either line or block buffering.

Functions

escape

Escapes arbitrary bytes into a human readable string.

escape_os

Escapes an OS string into a human readable string.

is_readable_stdin

Returns true if and only if stdin is believed to be readable.

is_tty_stderr

Returns true if and only if stderr is believed to be connected to a tty or a console.

is_tty_stdin

Returns true if and only if stdin is believed to be connected to a tty or a console.

is_tty_stdout

Returns true if and only if stdout is believed to be connected to a tty or a console.

parse_human_readable_size

Parse a human readable size like 2M into a corresponding number of bytes.

pattern_from_bytes

Convert arbitrary bytes into a regular expression pattern.

pattern_from_os

Convert an OS string into a regular expression pattern.

patterns_from_path

Read patterns from a file path, one per line.

patterns_from_reader

Read patterns from any reader, one per line.

patterns_from_stdin

Read patterns from stdin, one per line.

stdout

Returns a possibly buffered writer to stdout for the given color choice.

stdout_buffered_block

Returns a block buffered writer to stdout for the given color choice.

stdout_buffered_line

Returns a line buffered writer to stdout for the given color choice.

unescape

Unescapes a string.

unescape_os

Unescapes an OS string.