Module commands::tokenizer

Command Tokenization

The command parser needs to be able to tokenize commands into their constituent words and whitespace.

The tokenizer breaks source text into a vector of tokens, each of which is either whitespace or a word. Single and double quotes can be used to produce a single token that includes whitespace.

Tokens also track their location within the source text. This allows a parser built on the tokenizer to provide better error highlighting and other functionality.
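As a sketch of how such source locations enable error highlighting, consider a hypothetical helper (not part of this module) that underlines a character range within the source text:

```rust
// Hypothetical helper: given a character range within a single-line
// source text, return the text followed by a caret line marking the range.
fn highlight(source: &str, start: usize, end: usize) -> String {
    let mut out = String::new();
    out.push_str(source);
    out.push('\n');
    // Pad up to the start of the range, then underline it.
    for _ in 0..start {
        out.push(' ');
    }
    for _ in start..end {
        out.push('^');
    }
    out
}

fn main() {
    // Underline the second word of "show interface" (chars 5..14).
    println!("{}", highlight("show interface", 5, 14));
}
```

A parser could derive the `start` and `end` values here from a token's tracked location when reporting an error.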

Examples

use commands::tokenizer::{tokenize, TokenType};

if let Ok(tokens) = tokenize("word") {
    assert_eq!(tokens.len(), 1);
}

// This is 3 tokens due to the whitespace token
// between the 2 words.
if let Ok(tokens) = tokenize("show interface") {
    assert_eq!(tokens.len(), 3);
    assert_eq!(tokens[1].token_type, TokenType::Whitespace);
}

// Double quoted strings are treated as a single token.
if let Ok(tokens) = tokenize(r#"echo -n "a b c""#) {
    assert_eq!(tokens.len(), 5);
    assert_eq!(tokens[0].text, "echo");
    assert_eq!(tokens[2].text, "-n");
    assert_eq!(tokens[4].text, r#""a b c""#);
}

// Single quoted strings are treated as a single token
// as well.
if let Ok(tokens) = tokenize(r#"'"One token"' 'and another'"#) {
    assert_eq!(tokens.len(), 3);
}

// Or you can use a \ to escape a space.
if let Ok(tokens) = tokenize(r#"ls My\ Documents"#) {
    assert_eq!(tokens.len(), 3);
    assert_eq!(tokens[2].text, r#"My\ Documents"#);
}
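The quoting and escaping behavior shown above can be sketched as a small standalone loop over the input characters. This is a simplified illustration of the technique, not this module's actual implementation; the `Kind` enum and `split` function are hypothetical names:

```rust
// Simplified sketch: split text into word/whitespace tokens, keeping
// quoted runs and backslash-escaped characters inside a single word.
#[derive(Debug, PartialEq)]
enum Kind { Word, Whitespace }

fn split(text: &str) -> Vec<(Kind, String)> {
    let mut tokens = Vec::new();
    let mut chars = text.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Collect a run of whitespace into one token.
            let mut run = String::new();
            while let Some(&w) = chars.peek() {
                if !w.is_whitespace() { break; }
                run.push(w);
                chars.next();
            }
            tokens.push((Kind::Whitespace, run));
        } else {
            // Collect a word, honoring quotes and backslash escapes.
            let mut word = String::new();
            while let Some(&c) = chars.peek() {
                match c {
                    '\\' => {
                        // Escape: keep the backslash and the next character.
                        word.push(c);
                        chars.next();
                        if let Some(e) = chars.next() { word.push(e); }
                    }
                    '\'' | '"' => {
                        // Quoted run: consume through the matching quote.
                        word.push(c);
                        chars.next();
                        while let Some(q) = chars.next() {
                            word.push(q);
                            if q == c { break; }
                        }
                    }
                    w if w.is_whitespace() => break,
                    _ => {
                        word.push(c);
                        chars.next();
                    }
                }
            }
            tokens.push((Kind::Word, word));
        }
    }
    tokens
}

fn main() {
    let toks = split(r#"ls My\ Documents"#);
    assert_eq!(toks.len(), 3);
    assert_eq!(toks[2].1, r#"My\ Documents"#);
}
```

The real tokenizer additionally records source locations and reports errors (for example, unterminated quotes) rather than silently accepting them.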

Structs

SourceLocation

A range within a body of text.

SourceOffset

A position within a body of text.

Token

A token from a body of text.

Enums

TokenType

The role that a token plays: Whitespace or Word.

TokenizerError

Errors that can occur while tokenizing text.

Functions

tokenize

Tokenize a body of text.