grep_cli/lib.rs
/*!
This crate provides common routines used in command line applications, with a
focus on routines useful for search oriented applications. As a utility
library, there is no central type or function. However, a key focus of this
crate is to improve failure modes and provide user friendly error messages
when things go wrong.

To the best extent possible, everything in this crate works on Windows, macOS
and Linux.


# Standard I/O

[`is_readable_stdin`] determines whether stdin can be usefully read from. It
is useful when writing an application that changes behavior based on whether
the application was invoked with data on stdin. For example, `rg foo` might
recursively search the current working directory for occurrences of `foo`, but
`rg foo < file` might only search the contents of `file`.
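
As a rough illustration of that decision, the sketch below picks what to
search from the explicit paths and a stdin-readability flag. The `Target` type
and `choose_target` function are hypothetical names used only for this example
and are not part of this crate. In a real program, the flag would presumably
come from `is_readable_stdin`.

```rust
/// Hypothetical type naming what a search tool would read.
#[derive(Debug, PartialEq)]
enum Target {
    Stdin,
    Paths(Vec<String>),
}

/// Explicit arguments always win. The stdin heuristic is only consulted
/// when the user gave no paths at all.
fn choose_target(paths: &[String], stdin_readable: bool) -> Target {
    if paths.iter().any(|p| p.as_str() == "-") {
        // `-` explicitly requests stdin, regardless of the heuristic.
        Target::Stdin
    } else if !paths.is_empty() {
        Target::Paths(paths.to_vec())
    } else if stdin_readable {
        Target::Stdin
    } else {
        // Fall back to the current working directory.
        Target::Paths(vec!["./".to_string()])
    }
}
```

Keeping the heuristic as the last resort means a wrong guess can always be
overridden by passing `-` or a path explicitly.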


# Coloring and buffering

The [`stdout`], [`stdout_buffered_block`] and [`stdout_buffered_line`] routines
are alternative constructors for [`StandardStream`]. A `StandardStream`
implements `termcolor::WriteColor`, which provides a way to emit colors to
terminals. Its key use is the encapsulation of buffering style. Namely,
`stdout` will return a line buffered `StandardStream` if and only if
stdout is connected to a tty, and will otherwise return a block buffered
`StandardStream`. Line buffering is important for use with a tty because it
typically decreases the latency at which the end user sees output. Block
buffering is used otherwise because it is faster, and redirecting stdout to a
file typically doesn't benefit from the decreased latency that line buffering
provides.

The `stdout_buffered_block` and `stdout_buffered_line` routines can be used to
explicitly set the buffering strategy regardless of whether stdout is connected
to a tty or not.
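
The buffering choice itself can be sketched with only the standard library.
The `buffered` helper below is a hypothetical name, not this crate's
implementation, and it ignores color support entirely. It only shows the
line-versus-block decision described above.

```rust
use std::io::{BufWriter, LineWriter, Write};

/// Wraps `inner` in a line buffered writer when `is_tty` is true, and in a
/// block buffered writer otherwise.
fn buffered<'a, W: Write + 'a>(inner: W, is_tty: bool) -> Box<dyn Write + 'a> {
    if is_tty {
        // Flushes whenever a newline is written, so output appears promptly.
        Box::new(LineWriter::new(inner))
    } else {
        // Flushes only when the internal buffer fills up or on drop.
        Box::new(BufWriter::new(inner))
    }
}
```

A caller could pass `std::io::stdout()` together with the result of
`std::io::IsTerminal::is_terminal` on stdout to get the tty-dependent behavior.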


# Escaping

The [`escape`](crate::escape()), [`escape_os`], [`unescape`] and
[`unescape_os`] routines provide a user friendly way of dealing with UTF-8
encoded strings that can express arbitrary bytes. For example, you might want
to accept a string containing arbitrary bytes as a command line argument, but
most interactive shells make such strings difficult to type. Instead, we can
ask users to use escape sequences.

For example, `a\xFFz` is itself a valid UTF-8 string corresponding to the
following bytes:

```ignore
[b'a', b'\\', b'x', b'F', b'F', b'z']
```

However, we can interpret `\xFF` as an escape sequence with the
`unescape`/`unescape_os` routines, which will yield

```ignore
[b'a', b'\xFF', b'z']
```

instead. For example:

```
use grep_cli::unescape;

// Note the use of a raw string!
assert_eq!(vec![b'a', b'\xFF', b'z'], unescape(r"a\xFFz"));
```

The `escape`/`escape_os` routines provide the reverse transformation, which
makes it easy to show user friendly error messages involving arbitrary bytes.
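
To make the transformation concrete, here is a deliberately simplified decoder.
The name `unescape_sketch` is hypothetical. It is assumed here to recognize
only `\xNN` and `\\`, and it copies everything else through unchanged, so it
should not be read as a description of how `unescape` is implemented.

```rust
/// Simplified stand-in for an unescaping routine: decodes `\xNN` (two hex
/// digits) and `\\`, and leaves all other input bytes untouched.
fn unescape_sketch(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && i + 1 < bytes.len() {
            match bytes[i + 1] {
                b'\\' => {
                    out.push(b'\\');
                    i += 2;
                    continue;
                }
                b'x' if i + 3 < bytes.len() => {
                    let hi = (bytes[i + 2] as char).to_digit(16);
                    let lo = (bytes[i + 3] as char).to_digit(16);
                    if let (Some(hi), Some(lo)) = (hi, lo) {
                        out.push((hi * 16 + lo) as u8);
                        i += 4;
                        continue;
                    }
                }
                _ => {}
            }
        }
        // Not a recognized escape sequence: copy the byte as-is.
        out.push(bytes[i]);
        i += 1;
    }
    out
}
```

Incomplete sequences such as a trailing `\x` are passed through literally
rather than rejected, which keeps the sketch total over all inputs.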


# Building patterns

Typically, regular expression patterns must be valid UTF-8. However, command
line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the standard
library's UTF-8 conversion functions from `OsStr`s do not provide good error
messages. However, the [`pattern_from_bytes`] and [`pattern_from_os`] routines
do, including reporting exactly where the first invalid UTF-8 byte is seen.
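
One way to recover that offset with the standard library is
`Utf8Error::valid_up_to`. The `pattern_from_bytes_sketch` function and its
error text below are hypothetical and only illustrate the idea. The error type
and message produced by this crate differ.

```rust
/// Converts raw bytes to a pattern string, reporting the offset of the first
/// invalid UTF-8 byte on failure.
fn pattern_from_bytes_sketch(bytes: &[u8]) -> Result<&str, String> {
    std::str::from_utf8(bytes).map_err(|err| {
        // `valid_up_to` is the length of the longest valid UTF-8 prefix,
        // which is also the index of the first offending byte.
        format!(
            "invalid UTF-8 in pattern at byte offset {}",
            err.valid_up_to(),
        )
    })
}
```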

Additionally, it can be useful to read patterns from a file while reporting
good error messages that include line numbers. The [`patterns_from_path`],
[`patterns_from_reader`] and [`patterns_from_stdin`] routines do just that. If
any pattern is found that is invalid UTF-8, then the error includes the file
path (if available) along with the line number and the byte offset at which the
first invalid UTF-8 byte was observed.
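
A minimal sketch of line-aware pattern reading is shown below. The function
name and the error format are assumptions made for this example. It also does
not strip a trailing `\r`, which a more careful implementation may want to
handle.

```rust
use std::io::BufRead;

/// Reads one pattern per line, failing with a 1-based line number and byte
/// offset when a line is not valid UTF-8.
fn patterns_from_reader_sketch<R: BufRead>(
    rdr: R,
) -> Result<Vec<String>, String> {
    let mut patterns = Vec::new();
    // `split` yields each line as raw bytes without the `\n` terminator.
    for (i, line) in rdr.split(b'\n').enumerate() {
        let line_number = i + 1;
        let bytes = line.map_err(|err| format!("{line_number}: {err}"))?;
        let pattern = String::from_utf8(bytes).map_err(|err| {
            format!(
                "{line_number}: invalid UTF-8 at byte offset {}",
                err.utf8_error().valid_up_to(),
            )
        })?;
        patterns.push(pattern);
    }
    Ok(patterns)
}
```

Reading lines as bytes first, rather than with `BufRead::lines`, is what makes
it possible to report the offset instead of a generic I/O error.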


# Read process output

Sometimes a command line application needs to execute another process and
read its stdout in a streaming fashion. The [`CommandReader`] provides this
functionality with an explicit goal of improving failure modes. In particular,
if the process exits with an error code, then stderr is read and converted into
a normal Rust error to show to end users. This makes the underlying failure
modes explicit and gives more information to end users for debugging the
problem.
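
The failure-mode idea can be sketched with `std::process::Command` alone. Note
that the hypothetical `run_and_capture` helper below buffers all of stdout
instead of streaming it, so it only illustrates how a failing exit status and
stderr can be folded into an ordinary error value.

```rust
use std::process::Command;

/// Runs `program` with `args` and returns its stdout, or an error that
/// includes the exit status and whatever the process wrote to stderr.
fn run_and_capture(program: &str, args: &[&str]) -> Result<Vec<u8>, String> {
    // `output` collects stdout and stderr concurrently, which avoids the
    // deadlock that can occur when one pipe fills up while the other is read.
    let output = Command::new(program)
        .args(args)
        .output()
        .map_err(|err| format!("failed to run '{program}': {err}"))?;
    if output.status.success() {
        Ok(output.stdout)
    } else {
        let stderr = String::from_utf8_lossy(&output.stderr);
        Err(format!(
            "'{program}' failed ({}): {}",
            output.status,
            stderr.trim(),
        ))
    }
}
```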

As a special case, [`DecompressionReader`] provides a way to decompress
arbitrary files by matching their file extensions up with corresponding
decompression programs (such as `gzip` and `xz`). This is useful as a means of
performing simplistic decompression in a portable manner without binding to
specific compression libraries. This does come with some overhead though, so
if you need to decompress lots of small files, this may not be an appropriate
convenience to use.
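
The extension matching can be pictured as a small lookup table. The
`decompression_command` function and its two entries are illustrative
assumptions and do not reproduce the rules this crate ships with. The `-d` and
`-c` flags ask `gzip` and `xz` to decompress to stdout.

```rust
use std::path::Path;

/// Maps a file extension to a decompression program and its arguments, or
/// returns `None` when the extension is missing or unrecognized.
fn decompression_command(
    path: &Path,
) -> Option<(&'static str, Vec<&'static str>)> {
    match path.extension()?.to_str()? {
        "gz" => Some(("gzip", vec!["-d", "-c"])),
        "xz" => Some(("xz", vec!["-d", "-c"])),
        _ => None,
    }
}
```

Spawning one such process per file is where the overhead mentioned above comes
from.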

Each reader has a corresponding builder for additional configuration, such as
whether to read stderr asynchronously in order to avoid deadlock (which is
enabled by default).


# Miscellaneous parsing

The [`parse_human_readable_size`] routine parses strings like `2M` and converts
them to the corresponding number of bytes (`2 * (1 << 20)` in this case). If an
invalid size is found, then a good error message is crafted that typically
tells the user how to fix the problem.
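
A small parser in the same spirit might look like the following. The
`parse_size_sketch` name, the accepted suffixes (`K`, `M` and `G`) and the
error wording are assumptions for this example, not a specification of
`parse_human_readable_size`.

```rust
/// Parses sizes such as `512`, `4K`, `2M` or `1G` into a number of bytes.
fn parse_size_sketch(s: &str) -> Result<u64, String> {
    // Split the input into its leading ASCII digits and the remaining suffix.
    let digits_end =
        s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, suffix) = s.split_at(digits_end);
    let value: u64 = digits.parse().map_err(|_| {
        format!("invalid size '{s}': expected a number like 2M")
    })?;
    let shift = match suffix {
        "" => 0,
        "K" => 10,
        "M" => 20,
        "G" => 30,
        _ => {
            return Err(format!(
                "invalid size '{s}': unknown suffix '{suffix}' \
                 (expected K, M or G)",
            ));
        }
    };
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("invalid size '{s}': number is too big"))
}
```

Each error names the offending input and what was expected, which is the kind
of message that lets a user correct the argument without consulting the docs.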
*/

#![deny(missing_docs)]

mod decompress;
mod escape;
mod hostname;
mod human;
mod pattern;
mod process;
mod wtr;

pub use crate::{
    decompress::{
        resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
        DecompressionReader, DecompressionReaderBuilder,
    },
    escape::{escape, escape_os, unescape, unescape_os},
    hostname::hostname,
    human::{parse_human_readable_size, ParseSizeError},
    pattern::{
        pattern_from_bytes, pattern_from_os, patterns_from_path,
        patterns_from_reader, patterns_from_stdin, InvalidPatternError,
    },
    process::{CommandError, CommandReader, CommandReaderBuilder},
    wtr::{
        stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
    },
};

/// Returns true if and only if stdin is believed to be readable.
///
/// When stdin is readable, command line programs may choose to behave
/// differently than when stdin is not readable. For example, `command foo`
/// might search the current directory for occurrences of `foo` whereas
/// `command foo < some-file` or `cat some-file | command foo` might instead
/// only search stdin for occurrences of `foo`.
///
/// Note that this isn't perfect and essentially corresponds to a heuristic.
/// When things are unclear (such as if an error occurs during introspection to
/// determine whether stdin is readable), this prefers to return `false`. That
/// means it's possible for an end user to pipe something into your program and
/// have this return `false` and thus potentially lead to ignoring the user's
/// stdin data. While not ideal, this is perhaps better than falsely assuming
/// stdin is readable, which would result in blocking forever on reading stdin.
/// Regardless, commands should always provide explicit fallbacks to override
/// behavior. For example, `rg foo -` will explicitly search stdin and `rg foo
/// ./` will explicitly search the current working directory.
pub fn is_readable_stdin() -> bool {
    use std::io::IsTerminal;

    #[cfg(unix)]
    fn imp() -> bool {
        use std::{
            fs::File,
            os::{fd::AsFd, unix::fs::FileTypeExt},
        };

        let stdin = std::io::stdin();
        let fd = match stdin.as_fd().try_clone_to_owned() {
            Ok(fd) => fd,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Unix, \
                     could not clone stdin file descriptor \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let file = File::from(fd);
        let md = match file.metadata() {
            Ok(md) => md,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Unix, \
                     could not get file metadata for stdin \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let ft = md.file_type();
        let is_file = ft.is_file();
        let is_fifo = ft.is_fifo();
        let is_socket = ft.is_socket();
        let is_readable = is_file || is_fifo || is_socket;
        log::debug!(
            "for heuristic stdin detection on Unix, \
             found that \
             is_file={is_file}, is_fifo={is_fifo} and is_socket={is_socket}, \
             and thus concluded that is_stdin_readable={is_readable}",
        );
        is_readable
    }

    #[cfg(windows)]
    fn imp() -> bool {
        let stdin = winapi_util::HandleRef::stdin();
        let typ = match winapi_util::file::typ(stdin) {
            Ok(typ) => typ,
            Err(err) => {
                log::debug!(
                    "for heuristic stdin detection on Windows, \
                     could not get file type of stdin \
                     (thus assuming stdin is not readable): {err}",
                );
                return false;
            }
        };
        let is_disk = typ.is_disk();
        let is_pipe = typ.is_pipe();
        let is_readable = is_disk || is_pipe;
        log::debug!(
            "for heuristic stdin detection on Windows, \
             found that is_disk={is_disk} and is_pipe={is_pipe}, \
             and thus concluded that is_stdin_readable={is_readable}",
        );
        is_readable
    }

    #[cfg(not(any(unix, windows)))]
    fn imp() -> bool {
        log::debug!("on non-{{Unix,Windows}}, assuming stdin is not readable");
        false
    }

    !std::io::stdin().is_terminal() && imp()
}

/// Returns true if and only if stdin is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdin() -> bool {
    use std::io::IsTerminal;
    std::io::stdin().is_terminal()
}

/// Returns true if and only if stdout is believed to be connected to a tty
/// or a console.
///
/// This is useful for when you want your command line program to produce
/// different output depending on whether it's printing directly to a user's
/// terminal or whether it's being redirected somewhere else. For example,
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will condense output when printing to a tty.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdout() -> bool {
    use std::io::IsTerminal;
    std::io::stdout().is_terminal()
}

/// Returns true if and only if stderr is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stderr() -> bool {
    use std::io::IsTerminal;
    std::io::stderr().is_terminal()
}