/*!
This crate provides common routines used in command line applications, with a
focus on routines useful for search-oriented applications. As a utility
library, there is no central type or function. However, a key focus of this
crate is to improve failure modes and provide user-friendly error messages
when things go wrong.

To the greatest extent possible, everything in this crate works on Windows,
macOS and Linux.


# Standard I/O

[`is_readable_stdin`] determines whether stdin can be usefully read from. It
is useful when writing an application that changes behavior based on whether
the application was invoked with data on stdin. For example, `rg foo` might
recursively search the current working directory for occurrences of `foo`, but
`rg foo < file` might only search the contents of `file`.
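
As a rough illustration, the decision between searching stdin and searching a
directory might look like the following std-only sketch. `Input` and
`choose_inputs` are hypothetical names, not part of this crate, and the
readability check in `main` is a crude stand-in: it only asks whether stdin is
a terminal, whereas `is_readable_stdin` also inspects the file type of stdin
(see its implementation below).

```rust
use std::io::IsTerminal;

/// Where a hypothetical search tool reads from.
#[derive(Debug, PartialEq)]
enum Input {
    Stdin,
    Path(String),
}

/// Hypothetical rule: explicit paths always win (`-` names stdin); with no
/// paths, search stdin if it looks readable and `./` otherwise.
fn choose_inputs(paths: &[&str], stdin_readable: bool) -> Vec<Input> {
    if paths.is_empty() {
        return if stdin_readable {
            vec![Input::Stdin]
        } else {
            vec![Input::Path("./".to_string())]
        };
    }
    paths
        .iter()
        .map(|&p| if p == "-" { Input::Stdin } else { Input::Path(p.to_string()) })
        .collect()
}

fn main() {
    // Crude stand-in for `is_readable_stdin`: it only asks "is stdin a
    // terminal?", so it would also treat e.g. `/dev/null` as readable.
    let stdin_readable = !std::io::stdin().is_terminal();
    println!("{:?}", choose_inputs(&[], stdin_readable));
}
```

Keeping the readability check as a parameter makes the rule easy to test, and
explicit paths (including `-` for stdin) always override the heuristic.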


# Coloring and buffering

The [`stdout`], [`stdout_buffered_block`] and [`stdout_buffered_line`] routines
are alternative constructors for [`StandardStream`]. A `StandardStream`
implements `termcolor::WriteColor`, which provides a way to emit colors to
terminals. Its key use is the encapsulation of buffering style. Namely,
`stdout` will return a line buffered `StandardStream` if and only if
stdout is connected to a tty, and will otherwise return a block buffered
`StandardStream`. Line buffering is important for use with a tty because it
typically decreases the latency at which the end user sees output. Block
buffering is used otherwise because it is faster, and redirecting stdout to a
file typically doesn't benefit from the decreased latency that line buffering
provides.

The `stdout_buffered_block` and `stdout_buffered_line` routines can be used to
explicitly set the buffering strategy regardless of whether stdout is connected
to a tty or not.
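
A minimal std-only sketch of the same policy follows, assuming plain `std::io`
writers in place of `StandardStream` (so there is no color support).
`Buffering`, `buffering_for` and `stdout_writer` are hypothetical names. Note
that std's `Stdout` already does some line buffering of its own, so the
`BufWriter` in this sketch only batches writes before they reach it.

```rust
use std::io::{self, BufWriter, IsTerminal, LineWriter, Write};

#[derive(Debug, PartialEq)]
enum Buffering {
    Line,
    Block,
}

/// The rule described above: line buffering if and only if output is a tty.
fn buffering_for(is_tty: bool) -> Buffering {
    if is_tty {
        Buffering::Line
    } else {
        Buffering::Block
    }
}

/// Wrap stdout according to the chosen strategy (std-only, no colors).
fn stdout_writer(buffering: Buffering) -> Box<dyn Write> {
    match buffering {
        Buffering::Line => Box::new(LineWriter::new(io::stdout())),
        Buffering::Block => Box::new(BufWriter::with_capacity(64 * 1024, io::stdout())),
    }
}

fn main() -> io::Result<()> {
    let mut out = stdout_writer(buffering_for(io::stdout().is_terminal()));
    for i in 0..3 {
        writeln!(out, "match {i}")?;
    }
    // Flush explicitly: dropping a BufWriter also flushes, but ignores errors.
    out.flush()
}
```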


# Escaping

The [`escape`](crate::escape()), [`escape_os`], [`unescape`] and
[`unescape_os`] routines provide a user-friendly way of dealing with UTF-8
encoded strings that can express arbitrary bytes. For example, you might want
to accept a string containing arbitrary bytes as a command line argument, but
most interactive shells make such strings difficult to type. Instead, we can
ask users to use escape sequences.

For example, `a\xFFz` is itself a valid UTF-8 string corresponding to the
following bytes:

```ignore
[b'a', b'\\', b'x', b'F', b'F', b'z']
```

However, we can interpret `\xFF` as an escape sequence with the
`unescape`/`unescape_os` routines, which will yield

```ignore
[b'a', b'\xFF', b'z']
```

instead. For example:

```
use grep_cli::unescape;

// Note the use of a raw string!
assert_eq!(vec![b'a', b'\xFF', b'z'], unescape(r"a\xFFz"));
```

The `escape`/`escape_os` routines provide the reverse transformation, which
makes it easy to show user-friendly error messages involving arbitrary bytes.
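
To make the transformation concrete, here is a simplified, self-contained
sketch of both directions. `escape_bytes` and `unescape_hex` are hypothetical
helpers, not this crate's `escape`/`unescape`: they understand only `\\` and
`\xHH`, which is likely a subset of what the real routines accept. Valid UTF-8
is passed through, and each byte of an invalid UTF-8 sequence is shown as
`\xHH`.

```rust
/// Hypothetical sketch: valid UTF-8 passes through, `\` becomes `\\`, and each
/// byte of an invalid UTF-8 sequence becomes `\xHH`.
fn escape_bytes(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        match std::str::from_utf8(bytes) {
            Ok(s) => {
                out.push_str(&s.replace('\\', r"\\"));
                return out;
            }
            Err(e) => {
                let (valid, rest) = bytes.split_at(e.valid_up_to());
                // The prefix before `valid_up_to` is guaranteed to be valid UTF-8.
                out.push_str(&std::str::from_utf8(valid).unwrap().replace('\\', r"\\"));
                // `error_len` is `None` when input ends mid-sequence: escape it all.
                let bad = e.error_len().unwrap_or(rest.len());
                for b in &rest[..bad] {
                    out.push_str(&format!(r"\x{:02X}", b));
                }
                bytes = &rest[bad..];
            }
        }
    }
}

/// Hypothetical inverse of `escape_bytes`: understands only `\\` and `\xHH`;
/// any other backslash is kept literally.
fn unescape_hex(s: &str) -> Vec<u8> {
    let b = s.as_bytes();
    let mut out = Vec::with_capacity(b.len());
    let mut i = 0;
    while i < b.len() {
        if b[i] == b'\\' && b.get(i + 1) == Some(&b'\\') {
            out.push(b'\\');
            i += 2;
            continue;
        }
        if b[i] == b'\\' && b.get(i + 1) == Some(&b'x') {
            let hex = b.get(i + 2..i + 4).unwrap_or(&[]);
            if hex.len() == 2 && hex.iter().all(u8::is_ascii_hexdigit) {
                let hex = std::str::from_utf8(hex).unwrap();
                out.push(u8::from_str_radix(hex, 16).unwrap());
                i += 4;
                continue;
            }
        }
        out.push(b[i]);
        i += 1;
    }
    out
}

fn main() {
    let raw = b"a\xFFz\\";
    let shown = escape_bytes(raw);
    println!("{shown}"); // prints a\xFFz\\
    assert_eq!(unescape_hex(&shown), raw.to_vec());
}
```

Because a literal backslash is doubled on the way out, the round trip is
lossless: text that merely looks like an escape (such as the bytes `\x41`) is
not reinterpreted.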


# Building patterns

Typically, regular expression patterns must be valid UTF-8. However, command
line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the standard
library's UTF-8 conversion functions from `OsStr`s do not provide good error
messages. However, the [`pattern_from_bytes`] and [`pattern_from_os`] routines
do, including reporting exactly where the first invalid UTF-8 byte is seen.
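
The standard library does expose the information needed for such a message:
`std::str::from_utf8` returns a `Utf8Error` whose `valid_up_to` method gives
the offset of the first invalid byte, while `OsStr::to_str` only returns
`None`. A sketch, in which the function name and the message wording are
assumptions rather than this crate's API:

```rust
/// Hypothetical sketch: convert a pattern to `&str`, reporting the offset and
/// value of the first invalid UTF-8 byte on failure.
fn pattern_from_bytes_sketch(bytes: &[u8]) -> Result<&str, String> {
    std::str::from_utf8(bytes).map_err(|e| {
        let offset = e.valid_up_to();
        format!(
            "found invalid UTF-8 in pattern at byte offset {}: \\x{:02X}",
            offset, bytes[offset]
        )
    })
}

fn main() {
    match pattern_from_bytes_sketch(b"foo\xFFbar") {
        Ok(p) => println!("pattern: {p}"),
        // found invalid UTF-8 in pattern at byte offset 3: \xFF
        Err(msg) => eprintln!("{msg}"),
    }
}
```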

Additionally, it can be useful to read patterns from a file while reporting
good error messages that include line numbers. The [`patterns_from_path`],
[`patterns_from_reader`] and [`patterns_from_stdin`] routines do just that. If
any pattern is found that is invalid UTF-8, then the error includes the file
path (if available) along with the line number and the byte offset at which the
first invalid UTF-8 byte was observed.
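
A hypothetical sketch of reading one pattern per line from any `std::io::Read`
source is shown here. It reports the line number and the offset of the first
invalid byte within that line; the real routines also include the file path
when one is available, which this sketch omits. The name
`patterns_from_reader_sketch` and the message format are assumptions.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical sketch: one pattern per line, stopping at the first line that
/// is not valid UTF-8.
fn patterns_from_reader_sketch<R: Read>(rdr: R) -> io::Result<Vec<String>> {
    let mut rdr = BufReader::new(rdr);
    let mut patterns = Vec::new();
    let mut line = Vec::new();
    let mut line_number = 0u64;
    loop {
        line.clear();
        if rdr.read_until(b'\n', &mut line)? == 0 {
            return Ok(patterns);
        }
        line_number += 1;
        // Strip a trailing `\n` or `\r\n`.
        if line.last() == Some(&b'\n') {
            line.pop();
            if line.last() == Some(&b'\r') {
                line.pop();
            }
        }
        match std::str::from_utf8(&line) {
            Ok(p) => patterns.push(p.to_string()),
            Err(e) => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!(
                        "line {}: invalid UTF-8 at byte offset {}",
                        line_number,
                        e.valid_up_to()
                    ),
                ));
            }
        }
    }
}

fn main() {
    let input: &[u8] = b"foo\r\nb\xFFr\n";
    match patterns_from_reader_sketch(input) {
        Ok(ps) => println!("{ps:?}"),
        // line 2: invalid UTF-8 at byte offset 1
        Err(err) => eprintln!("{err}"),
    }
}
```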


# Read process output

Sometimes a command line application needs to execute other processes and
read their stdout in a streaming fashion. The [`CommandReader`] provides this
functionality with an explicit goal of improving failure modes. In particular,
if the process exits with an error code, then stderr is read and converted into
a normal Rust error to show to end users. This makes the underlying failure
modes explicit and gives more information to end users for debugging the
problem.
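
The following std-only sketch shows the general shape of this approach; it is
not `CommandReader`'s implementation. The hypothetical `stream_command`
streams a child's stdout to a callback, drains stderr on a separate thread (so
a child that writes a lot to stderr cannot fill that pipe and stall), and turns
a non-zero exit into an `io::Error` carrying the child's stderr. The demo
assumes a POSIX `sh` on Unix.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper (not this crate's API): run `cmd`, pass each chunk of
/// its stdout to `on_chunk` as it arrives, and report a failed exit as an
/// error that includes whatever the child wrote to stderr.
fn stream_command(cmd: &mut Command, mut on_chunk: impl FnMut(&[u8])) -> io::Result<()> {
    let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()?;

    // Drain stderr concurrently so the child never blocks on a full stderr
    // pipe while we are busy reading its stdout.
    let mut stderr = child.stderr.take().expect("stderr is piped");
    let stderr_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = stderr.read_to_end(&mut buf);
        buf
    });

    // Stream stdout in fixed-size chunks instead of buffering it all.
    let mut stdout = child.stdout.take().expect("stdout is piped");
    let mut buf = [0u8; 8 * 1024];
    loop {
        let n = stdout.read(&mut buf)?;
        if n == 0 {
            break;
        }
        on_chunk(&buf[..n]);
    }

    let status = child.wait()?;
    let stderr_bytes = stderr_reader.join().unwrap_or_default();
    if status.success() {
        return Ok(());
    }
    let msg = String::from_utf8_lossy(&stderr_bytes);
    Err(io::Error::new(
        io::ErrorKind::Other,
        format!("command failed ({status}): {}", msg.trim()),
    ))
}

fn main() {
    if cfg!(unix) {
        // Assumes a POSIX `sh`; the printed error ends with the child's stderr.
        let result = stream_command(
            Command::new("sh").args(["-c", "echo boom >&2; exit 3"]),
            |_| {},
        );
        if let Err(err) = result {
            eprintln!("{err}");
        }
    }
}
```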

As a special case, [`DecompressionReader`] provides a way to decompress
arbitrary files by matching their file extensions up with corresponding
decompression programs (such as `gzip` and `xz`). This is useful as a means of
performing simplistic decompression in a portable manner without binding to
specific compression libraries. This does come with some overhead though, so
if you need to decompress lots of small files, this may not be an appropriate
convenience to use.
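
A sketch of the extension-to-program mapping follows. The table is
hypothetical and is not this crate's `DecompressionMatcher`: `gzip` and `xz`
come from the text above, while `bzip2` and `zstd` are added as assumed
examples. Each command uses `-d -c` so that the decompressed contents are
written to stdout, where they could be consumed as a process's output, as with
`CommandReader`.

```rust
use std::path::Path;
use std::process::Command;

/// Build a `Command` from a program name and its arguments.
fn command(program: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new(program);
    cmd.args(args);
    cmd
}

/// Hypothetical extension table: pick a program that writes the decompressed
/// contents of `path` to stdout, or `None` if the file should be read as is.
fn decompression_command(path: &Path) -> Option<Command> {
    let mut cmd = match path.extension()?.to_str()? {
        "gz" => command("gzip", &["-d", "-c"]),
        "bz2" => command("bzip2", &["-d", "-c"]),
        "xz" => command("xz", &["-d", "-c"]),
        "zst" => command("zstd", &["-q", "-d", "-c"]),
        _ => return None,
    };
    cmd.arg(path);
    Some(cmd)
}

fn main() {
    // Only prints the planned commands; nothing is executed.
    for name in ["logs.txt.gz", "data.xz", "notes.txt"] {
        match decompression_command(Path::new(name)) {
            Some(cmd) => println!("{name}: {cmd:?}"),
            None => println!("{name}: read directly"),
        }
    }
}
```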

Each reader has a corresponding builder for additional configuration, such as
whether to read stderr asynchronously in order to avoid deadlock (which is
enabled by default).


# Miscellaneous parsing

The [`parse_human_readable_size`] routine parses strings like `2M` and converts
them to the corresponding number of bytes (`2 * (1 << 20)` in this case). If an
invalid size is found, then a good error message is crafted that typically
tells the user how to fix the problem.
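
A minimal sketch of such a parser, assuming a decimal integer with an optional
`K`, `M` or `G` suffix interpreted as a power of 1024. The accepted syntax and
the error messages are assumptions, not necessarily what
`parse_human_readable_size` accepts or prints; `checked_mul` ensures that
overflow is reported instead of silently wrapping.

```rust
/// Hypothetical sketch: a decimal integer with an optional `K`, `M` or `G`
/// suffix (powers of 1024).
fn parse_size(s: &str) -> Result<u64, String> {
    let (digits, shift) = match s.as_bytes().last() {
        Some(b'K') => (&s[..s.len() - 1], 10),
        Some(b'M') => (&s[..s.len() - 1], 20),
        Some(b'G') => (&s[..s.len() - 1], 30),
        _ => (s, 0),
    };
    let value: u64 = digits.parse().map_err(|_| {
        format!("invalid size '{s}': expected a number with an optional K, M or G suffix, e.g. 2M")
    })?;
    value
        .checked_mul(1 << shift)
        .ok_or_else(|| format!("size '{s}' is too big to fit in 64 bits"))
}

fn main() {
    for input in ["2M", "10K", "2MB"] {
        match parse_size(input) {
            Ok(n) => println!("{input} = {n} bytes"),
            Err(msg) => eprintln!("{msg}"),
        }
    }
}
```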
*/

#![deny(missing_docs)]

mod decompress;
mod escape;
mod hostname;
mod human;
mod pattern;
mod process;
mod wtr;

pub use crate::{
    decompress::{
        resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
        DecompressionReader, DecompressionReaderBuilder,
    },
    escape::{escape, escape_os, unescape, unescape_os},
    hostname::hostname,
    human::{parse_human_readable_size, ParseSizeError},
    pattern::{
        pattern_from_bytes, pattern_from_os, patterns_from_path,
        patterns_from_reader, patterns_from_stdin, InvalidPatternError,
    },
    process::{CommandError, CommandReader, CommandReaderBuilder},
    wtr::{
        stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
    },
};

/// Returns true if and only if stdin is believed to be readable.
///
/// When stdin is readable, command line programs may choose to behave
/// differently than when stdin is not readable. For example, `command foo`
/// might search the current directory for occurrences of `foo`, whereas
/// `command foo < some-file` or `cat some-file | command foo` might instead
/// only search stdin for occurrences of `foo`.
///
/// Note that this isn't perfect and essentially corresponds to a heuristic.
/// When things are unclear (such as if an error occurs during introspection to
/// determine whether stdin is readable), this prefers to return `false`. That
/// means it's possible for an end user to pipe something into your program and
/// have this return `false` and thus potentially lead to ignoring the user's
/// stdin data. While not ideal, this is perhaps better than falsely assuming
/// stdin is readable, which would result in blocking forever on reading stdin.
/// Regardless, commands should always provide explicit fallbacks to override
/// behavior. For example, `rg foo -` will explicitly search stdin and
/// `rg foo ./` will explicitly search the current working directory.
pub fn is_readable_stdin() -> bool {
    use std::io::IsTerminal;

    #[cfg(unix)]
    fn imp() -> bool {
        use std::{
            fs::File,
            os::{fd::AsFd, unix::fs::FileTypeExt},
        };

        let stdin = std::io::stdin();
        let fd = match stdin.as_fd().try_clone_to_owned() {
            Ok(fd) => fd,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Unix, \
                     could not clone stdin file descriptor \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let file = File::from(fd);
        let md = match file.metadata() {
            Ok(md) => md,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Unix, \
                     could not get file metadata for stdin \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let ft = md.file_type();
        let is_file = ft.is_file();
        let is_fifo = ft.is_fifo();
        let is_socket = ft.is_socket();
        let is_readable = is_file || is_fifo || is_socket;
        log::debug!(
            "for heuristic stdin detection on Unix, \
             found that \
             is_file={is_file}, is_fifo={is_fifo} and is_socket={is_socket}, \
             and thus concluded that is_stdin_readable={is_readable}",
        );
        is_readable
    }

    #[cfg(windows)]
    fn imp() -> bool {
        let stdin = winapi_util::HandleRef::stdin();
        let typ = match winapi_util::file::typ(stdin) {
            Ok(typ) => typ,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Windows, \
                     could not get file type of stdin \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let is_disk = typ.is_disk();
        let is_pipe = typ.is_pipe();
        let is_readable = is_disk || is_pipe;
        log::debug!(
            "for heuristic stdin detection on Windows, \
             found that is_disk={is_disk} and is_pipe={is_pipe}, \
             and thus concluded that is_stdin_readable={is_readable}",
        );
        is_readable
    }

    #[cfg(not(any(unix, windows)))]
    fn imp() -> bool {
        log::debug!("on non-{{Unix,Windows}}, assuming stdin is not readable");
        false
    }

    !std::io::stdin().is_terminal() && imp()
}

/// Returns true if and only if stdin is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdin() -> bool {
    use std::io::IsTerminal;
    std::io::stdin().is_terminal()
}

/// Returns true if and only if stdout is believed to be connected to a tty
/// or a console.
///
/// This is useful when you want your command line program to produce
/// different output depending on whether it's printing directly to a user's
/// terminal or whether it's being redirected somewhere else. For example,
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will show condensed output when printing to a tty.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdout() -> bool {
    use std::io::IsTerminal;
    std::io::stdout().is_terminal()
}

/// Returns true if and only if stderr is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stderr() -> bool {
    use std::io::IsTerminal;
    std::io::stderr().is_terminal()
}