Crate quotemeta

Shell-quoting, à la Perl’s quotemeta function.

This crate currently provides a quotemeta function which shell-escapes a filename or similar data, and its corresponding inverse, unquotemeta.

It is anticipated that the crate may expand to include fine-tuning of the escaping strategy, but for now quotemeta returns a string which bash and zsh will expand back into the original string. unquotemeta correctly round-trips the output of quotemeta, but does not support arbitrary shell expansions.

At present, this crate is Unix-only. Windows filenames (and thus OsStr) have edge cases which cannot be round-tripped in a way that is compatible with a Unix implementation.

Rationale and implementation details

This crate exists because I transferred a peculiar design trope from some of my Perl utility scripts to their Rust replacements, which is to emit a shell script to do a task rather than do it directly. This separation of responsibilities aligns with the Unix philosophy of composing small tools and gives a lot of interesting benefits: the script can be saved and executed later (or not at all) rather than piped into a shell, and not necessarily even on the same machine which generated it.

But you’re not here to listen to a sales pitch on Unix.

The problem with generating a shell script is that Unix filenames are an arbitrary sequence of octets, many of them being shell metacharacters which need to be quoted before passing to a shell. Perl’s quotemeta simply backslashes “all ASCII characters not matching /[A-Za-z_0-9]/” which works most of the time, but often only by accident. For example, a backslash-quoted newline is removed by POSIX-compliant shells. Non-ASCII characters are not quoted at all, but this might still work provided the locale settings are just so and the Perl string isn’t marked as UTF-8. Good luck!
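To make the failure mode concrete, here is a minimal sketch of the Perl-style rule quoted above, applied to raw bytes. The function name `perl_style_quotemeta` is hypothetical and is not part of this crate; it only imitates the "backslash every ASCII character outside `[A-Za-z_0-9]`" behaviour so the edge cases are visible.

```rust
/// Hypothetical imitation of the Perl rule described above: put a backslash
/// before every ASCII byte that is not in [A-Za-z_0-9]. Bytes with the high
/// bit set are passed through untouched.
fn perl_style_quotemeta(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 2);
    for &b in input {
        let is_word = b.is_ascii_alphanumeric() || b == b'_';
        if b.is_ascii() && !is_word {
            out.push(b'\\');
        }
        out.push(b);
    }
    out
}
```

A space becomes backslash-space, which the shell handles fine. A newline becomes backslash-newline, which a POSIX shell treats as a line continuation and drops entirely, so the filename silently changes. High-bit-set bytes come out unquoted, which is where the locale dependence creeps in.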

Rust introduces a new gotcha because it is less laissez-faire with string types than Perl and shells. One cannot println! text which is not valid UTF-8, and so filenames which are not valid UTF-8 cannot be printed as-is. Write::write() makes it possible to write non-UTF-8 text to stdout, but &[u8] lacks a lot of useful string-handling functions and has an unhelpful Debug representation, and working with it generally just makes the code harder to write and less readable. So the quoted form really needs to be valid UTF-8, and the path of least resistance is plain ASCII. Sometimes you just want to println!("cat {}", quotemeta(path)) and have it work properly.
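The gotcha can be seen with nothing but the standard library. The sketch below is Unix-only, and the helper names `printable` and `write_raw` are made up for illustration. It shows that a filename containing a stray Latin-1 byte cannot be turned into a &str for println!, while writing its raw bytes still works.

```rust
use std::ffi::OsStr;
use std::io::{self, Write};
use std::os::unix::ffi::OsStrExt; // Unix-only: exposes the raw bytes of an OsStr

/// Returns the name as &str only if it is valid UTF-8, which is what
/// formatting it with `{}` in println! would need.
fn printable(name: &OsStr) -> Option<&str> {
    name.to_str()
}

/// Writes the raw octets of the name followed by a newline, whether or not
/// they form valid UTF-8.
fn write_raw<W: Write>(out: &mut W, name: &OsStr) -> io::Result<()> {
    out.write_all(name.as_bytes())?;
    out.write_all(b"\n")
}
```

For a name such as the bytes `caf\xe9.txt`, `printable` returns None, so the only direct route to stdout is the byte-oriented `write_raw`. That is workable, but every later string operation then has to be done on &[u8] as well.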

I’m specifically targeting bash, and the only way to shell-quote high-bit-set octets without actually including them literally is to use ANSI-C quoting. Pure POSIX shells do not understand this, but zsh does too, thus the two major religions are covered. fish doesn’t seem to allow encoding high-bit-set octets at all, dash needs different syntax, and we don’t talk about csh in polite company. There is sadly no one-size-fits-all solution. Welcome to Unix.
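As a rough illustration of ANSI-C quoting, the sketch below wraps arbitrary bytes in `$'...'` and emits anything outside printable ASCII as a `\xHH` escape. The function `ansi_c_quote` is hypothetical. It is not this crate's implementation, and the real quotemeta may well choose a different or more compact form for the same input.

```rust
/// Hypothetical sketch: quote arbitrary bytes as a bash/zsh ANSI-C string.
/// The result is always plain ASCII, so it can be passed to println!.
fn ansi_c_quote(bytes: &[u8]) -> String {
    let mut out = String::from("$'");
    for &b in bytes {
        match b {
            // Backslash and single quote are special inside $'...'.
            b'\\' => out.push_str("\\\\"),
            b'\'' => out.push_str("\\'"),
            // Common control characters get their mnemonic escapes.
            b'\n' => out.push_str("\\n"),
            b'\t' => out.push_str("\\t"),
            // Remaining printable ASCII is copied literally.
            0x20..=0x7e => out.push(b as char),
            // Everything else (other control bytes, high-bit-set octets)
            // becomes a two-digit hex escape.
            _ => out.push_str(&format!("\\x{:02x}", b)),
        }
    }
    out.push('\'');
    out
}
```

Always quoting is wasteful for simple names like `README`, but it keeps the sketch short. A newline survives here as `\n` rather than being swallowed as a line continuation, and a Latin-1 `é` travels as `\xe9` regardless of locale.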

Functions

  • quotemeta: Shell-quotes the given OS string into a string.
  • unquotemeta: Shell-unquotes a string into an OS string.