Crate sscanf

A Rust crate with a sscanf (inverse of format!()) Macro based on Regex

sscanf is originally a C function that takes a string, a format string with placeholders, and several variables (replaced with types in the Rust version). It then parses the input string and writes the values found at the placeholders into the variables (in Rust: returns them as a tuple). sscanf can be thought of as reversing a call to format!():

// format: takes format string and values, returns String
let msg = format!("Hello {}{}!", "World", 5);
assert_eq!(msg, "Hello World5!");

// sscanf: takes string, format string and types, returns tuple
let parsed = sscanf::sscanf!(msg, "Hello {}{}!", str, usize);

// parsed is Result<(&str, usize), ...>
assert_eq!(parsed.unwrap(), ("World", 5));

// alternative syntax:
let parsed2 = sscanf::sscanf!(msg, "Hello {str}{usize}!");
assert_eq!(parsed2.unwrap(), ("World", 5));

sscanf!() takes a format string like format!() does, but instead of writing values into the placeholders ({}), it extracts the values found at those {} into the return tuple.

If matching the format string failed, an Error is returned:

let msg = "Text that doesn't match the format string";
let parsed = sscanf::sscanf!(msg, "Hello {str}{usize}!");
assert!(matches!(parsed, Err(sscanf::Error::MatchFailed)));

Types in Placeholders:

The types can be given either as separate parameters after the format string or directly inside the {} placeholders.
The first option allows for autocomplete while typing, syntax highlighting, and better compiler errors generated by sscanf when the wrong types are given.
The second option imitates the behavior of Rust's format!() since 1.58. It gives worse compiler errors on stable Rust, but is otherwise identical to the first option.

More examples of the capabilities of sscanf:

use sscanf::sscanf;
use std::num::NonZeroUsize;

let input = "<x=3, y=-6, z=6>";
let parsed = sscanf!(input, "<x={i32}, y={i32}, z={i32}>");
assert_eq!(parsed.unwrap(), (3, -6, 6));

let input = "Move to N36E21";
let parsed = sscanf!(input, "Move to {char}{usize}{char}{usize}");
assert_eq!(parsed.unwrap(), ('N', 36, 'E', 21));

let input = "Escape literal { } as {{ and }}";
let parsed = sscanf!(input, "Escape literal {{ }} as {{{{ and }}}}");
assert_eq!(parsed.unwrap(), ());

let input = "Indexing types: N36E21";
let parsed = sscanf!(input, "Indexing types: {1}{0}{1}{0}", NonZeroUsize, char);
// output is in the order of the placeholders
assert_eq!(parsed.unwrap(), ('N', NonZeroUsize::new(36).unwrap(),
                             'E', NonZeroUsize::new(21).unwrap()));

let input = "A Sentence with Spaces. Another Sentence.";
// str and String do the same, but String clones from the input string
// to take ownership instead of borrowing.
let (a, b) = sscanf!(input, "{String}. {str}.").unwrap();
assert_eq!(a, "A Sentence with Spaces");
assert_eq!(b, "Another Sentence");

// Number format options
let input = "ab01  127  101010  1Z";
let parsed = sscanf!(input, "{usize:x}  {i32:o}  {u8:b}  {u32:r36}");
let (a, b, c, d) = parsed.unwrap();
assert_eq!(a, 0xab01);     // Hexadecimal
assert_eq!(b, 0o127);      // Octal
assert_eq!(c, 0b101010);   // Binary

assert_eq!(d, 71);         // any radix (r36 = Radix 36)
assert_eq!(d, u32::from_str_radix("1Z", 36).unwrap());

let input = "color: #D4AF37";
// Number types take their size into account, and hexadecimal u8 can
// have at most 2 digits => only possible match is 2 digits each.
let (r, g, b) = sscanf!(input, "color: #{u8:x}{u8:x}{u8:x}").unwrap();
assert_eq!((r, g, b), (0xD4, 0xAF, 0x37));

The input in this case is a &'static str, but it can be String, &str, &String, … Basically anything that implements Deref<Target=str>, and without taking ownership. See here for a few examples of possible inputs.

The parsing part of this macro has very few limitations, since it replaces the {} with a Regular Expression (regex) that corresponds to that type. For example:

  • char is just one character (regex ".")
  • str is any sequence of characters (regex ".+?")
  • Numbers are any sequence of digits (regex "[-+]?\d+")

And so on. The actual implementation for numbers tries to take the size of the type into account and some other details, but that is the gist of the parsing.

This means that any sequence of replacements is possible as long as the Regex finds a combination that works. In the char, usize, char, usize example above it manages to assign the N and E to the chars because they cannot be matched by the usizes.
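To make the replacement idea concrete, here is a simplified, std-only sketch of how a format string could be turned into a regex string. It is not the crate's actual implementation: Part, regex_for, escape_literal and build_regex are hypothetical names, and the per-type patterns are the simplified ones from the list above.

```rust
/// One piece of a format string: literal text or a typed placeholder.
enum Part<'a> {
    Literal(&'a str),
    Placeholder(&'a str), // holds the type name, e.g. "char"
}

/// Simplified per-type regex, mirroring the list above.
fn regex_for(ty: &str) -> Option<&'static str> {
    match ty {
        "char" => Some("."),
        "str" => Some(".+?"),
        "usize" | "i32" => Some(r"[-+]?\d+"),
        _ => None,
    }
}

/// Escapes regex metacharacters so literal text matches itself.
fn escape_literal(text: &str) -> String {
    let mut out = String::new();
    for c in text.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Builds one anchored regex with a capture group per placeholder.
fn build_regex(parts: &[Part]) -> Option<String> {
    let mut regex = String::from("^");
    for part in parts {
        match part {
            Part::Literal(text) => regex.push_str(&escape_literal(text)),
            Part::Placeholder(ty) => {
                regex.push('(');
                regex.push_str(regex_for(ty)?);
                regex.push(')');
            }
        }
    }
    regex.push('$');
    Some(regex)
}

fn main() {
    // Corresponds to "Move to {char}{usize}{char}{usize}".
    let parts = [
        Part::Literal("Move to "),
        Part::Placeholder("char"),
        Part::Placeholder("usize"),
        Part::Placeholder("char"),
        Part::Placeholder("usize"),
    ];
    println!("{}", build_regex(&parts).unwrap());
}
```

With such a pattern, the regex engine is what decides that N and E go to the char groups, because the digit-only groups cannot match them.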

Format Options

All options are inside '{' '}' and after a :, so either as {<type>:<option>} or as {:<option>}. Note: The type might still have a path that contains ::. Any double colons are ignored and only single colons are used to separate the options.

Procedural macros don’t have any reliable type info and can only compare types by name. This means that the number options below only work with a literal type like “i32”, NOT with paths (std::i32), wrappers (struct Wrapper(i32);) or aliases (type Alias = i32;). ONLY i32, usize, u16, …

config            description                  possible types
{:/ <regex> /}    custom regex                 any
{:x}              hexadecimal numbers          integers
{:o}              octal numbers                integers
{:b}              binary numbers               integers
{:r2} - {:r36}    radix 2 - radix 36 numbers   integers
#                 “alternate” form             various types

Custom Regex:

  • {:/.../}: Match according to the Regex between the / /

For example:

let input = "random Text";
let parsed = sscanf::sscanf!(input, "{str:/[^m]+/}{str}");

// regex  [^m]+  matches anything that isn't an 'm'
// => stops at the 'm' in 'random'
assert_eq!(parsed.unwrap(), ("rando", "m Text"));

The regex uses the same escaping logic as JavaScript’s /.../ syntax, meaning that the normal regex escaping with \d for digits etc. is in effect, with the addition that any / needs to be escaped as \/ since it is used to end the regex.

NOTE: You should use raw strings for a format string containing a regex, since otherwise you need to escape any \ as \\:

use sscanf::sscanf;
let input = "1234";
let parsed = sscanf!(input, r"{u8:/\d{2}/}{u8}"); // regex  \d{2}  matches 2 digits
let _ =      sscanf!(input, "{u8:/\\d{2}/}{u8}"); // the same with a non-raw string
assert_eq!(parsed.unwrap(), (12, 34));

Note: If you use any unescaped ( ) in your regex, you have to prevent them from forming a capture group by adding a ?: at the beginning: {:/..(..)../} becomes {:/..(?:..)../}. This won’t change their functionality in any way, but is necessary for sscanf’s parsing process to work.

This also means that custom regexes cannot be used on custom types that derive FromScanf since those rely on having an exact number of capture groups inside of their regex.

Radix Options:

These only work on primitive integer types (u8, …, u128, i8, …, i128, usize, isize).

  • x: hexadecimal Number (Digits 0-9 and a-f or A-F), optional prefix 0x or 0X
  • o: octal Number (Digits 0-7), optional prefix 0o or 0O
  • b: binary Number (Digits 0-1), optional prefix 0b or 0B
  • r2 - r36: any radix Number (Digits 0-9 and a-z or A-Z for higher radices)

Alternate form:

If used alongside a radix option: makes the number require a prefix (0x, 0o, 0b).

A note on prefixes: r2, r8 and r16 match the same numbers as b, o and x respectively, but without a prefix. Thus:

  • {:x} may have a prefix, matching numbers like 0xab or ab
  • {:r16} has no prefix and would only match ab
  • {:#x} must have a prefix, matching only 0xab
  • {:#r16} gives a compile error
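The prefix rules can be modelled with a small std-only sketch. This is not how the crate implements them: the Prefix enum and parse_hex function are hypothetical, and the sketch only mirrors the accept/reject behavior listed above for {:x}, {:r16} and {:#x}.

```rust
/// Hypothetical model of the three prefix behaviors for hexadecimal input.
#[derive(Clone, Copy)]
enum Prefix {
    Optional,  // like {:x}
    Forbidden, // like {:r16}
    Required,  // like {:#x}
}

fn parse_hex(input: &str, prefix: Prefix) -> Option<u32> {
    // Some(rest) if the input starts with 0x or 0X, None otherwise.
    let stripped = input
        .strip_prefix("0x")
        .or_else(|| input.strip_prefix("0X"));

    let digits = match (prefix, stripped) {
        (Prefix::Optional, Some(rest)) | (Prefix::Required, Some(rest)) => rest,
        (Prefix::Optional, None) | (Prefix::Forbidden, None) => input,
        (Prefix::Required, None) | (Prefix::Forbidden, Some(_)) => return None,
    };
    u32::from_str_radix(digits, 16).ok()
}

fn main() {
    assert_eq!(parse_hex("0xab", Prefix::Optional), Some(0xab));
    assert_eq!(parse_hex("ab", Prefix::Optional), Some(0xab));
    assert_eq!(parse_hex("ab", Prefix::Forbidden), Some(0xab));
    assert_eq!(parse_hex("0xab", Prefix::Forbidden), None);
    assert_eq!(parse_hex("0xab", Prefix::Required), Some(0xab));
    assert_eq!(parse_hex("ab", Prefix::Required), None);
}
```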

More uses for # may be added in the future. Let me know if you have a suggestion for this.

Custom Types

sscanf works with most primitive Types from std as well as String by default. The full list can be seen here: Implementations of RegexRepresentation.

To add more types there are three options:

The simplest option is to use derive:

#[derive(sscanf::FromScanf)]
#[sscanf(format = "#{r:x}{g:x}{b:x}")] // matches '#' followed by 3 hexadecimal u8s
struct Color {
    r: u8,
    g: u8,
    b: u8,
}

let input = "color: #ff00cc";
let parsed = sscanf::sscanf!(input, "color: {Color}").unwrap();
assert!(matches!(parsed, Color { r: 0xff, g: 0x00, b: 0xcc }));

Also works for enums:

#[derive(sscanf::FromScanf)]
enum HasChanged {
    #[sscanf(format = "received {added} additions and {deleted} deletions")]
    Yes {
        added: usize,
        deleted: usize,
    },
    #[sscanf("has not changed")] // the `format =` part can be omitted
    No
}

let input = "Your file has not changed since your last visit!";
let parsed = sscanf::sscanf!(input, "Your file {HasChanged} since your last visit!").unwrap();
assert!(matches!(parsed, HasChanged::No));

let input = "Your file received 325 additions and 15 deletions since your last visit!";
let parsed = sscanf::sscanf!(input, "Your file {HasChanged} since your last visit!").unwrap();
assert!(matches!(parsed, HasChanged::Yes { added: 325, deleted: 15 }));

More details can be found in the FromScanf documentation and the derive documentation.

Changelog

See Changelog.md

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.

A Note on Compiler Errors

Errors in the format string would ideally point to the exact position in the string that caused the error. This already works if you compile or check with nightly, but not on stable, at least until Rust Issue #54725 has progressed far enough for the required method to be callable from stable.

Compiler Errors on nightly currently look like this:

sscanf!("", "Too many placeholders: {}{}{}", usize);
error: more placeholders than types provided
  |
4 | sscanf!("", "Too many placeholders: {}{}{}", usize);
  |                                       ^^

But on stable, you are limited to only pointing at the entire format string:

4 | sscanf!("", "Too many placeholders: {}{}{}", usize);
  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The current workaround is to replicate that behavior in the error message itself:

error: more placeholders than types provided:
       At "Too many placeholders: {}{}{}"
                                    ^^
  |
4 | sscanf!("", "Too many placeholders: {}{}{}", usize);
  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The alternative is to run cargo +nightly check to see the better errors whenever something goes wrong, or to set your editor plugin to check with nightly.

This does not influence the functionality in any way. This Crate works entirely on stable with no drawbacks in functionality or performance. The only difference is the compiler errors that you get while writing format strings.

Modules

Macros

  • A Macro to parse a string based on a format-string, similar to sscanf in C
  • Same as sscanf, but returns the regex without running it. Useful for debugging or efficiency.
  • Same as sscanf, but allows use of Regex in the format String.

Structs

  • FullF32 (Deprecated)
    An obsolete type, currently identical to f32
  • FullF64 (Deprecated)
    An obsolete type, currently identical to f64
  • HexNumber (Deprecated)
    Matches a Hexadecimal Number with optional 0x prefix. Deprecated in favor of format options

Enums

Traits

  • A trait that allows you to use a custom regex for parsing a type.
  • A Trait used by sscanf to obtain the Regex of a Type

Derive Macros