Struct xxcalc::tokenizer::Tokenizer
pub struct Tokenizer { /* fields omitted */ }
Tokenizer performs the very first step of parsing a mathematical expression: it splits the input string into Tokens. These tokens can then be processed by a TokensProcessor.
Tokenizer is a state machine that can be reused multiple times. Internally it stores a buffer of Tokens, which is reused between calls without requesting new memory from the operating system. If a Tokenizer lives long enough, this behaviour can greatly reduce the time spent on memory allocations.
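The buffer reuse described above can be illustrated with a minimal, self-contained sketch. It does not use xxcalc's actual internals: `CharTokenizer` and its methods are hypothetical names, and each "token" is simply a non-whitespace character with its position. The point is only that clearing a `Vec` keeps its allocation, so later calls can write into the same memory.

```rust
/// Hypothetical stand-in for a tokenizer that owns its output buffer.
/// This is NOT the xxcalc implementation, only an illustration of buffer reuse.
pub struct CharTokenizer {
    buffer: Vec<(usize, char)>,
}

impl CharTokenizer {
    /// Preallocates room for `capacity` tokens; the buffer can still grow later.
    pub fn with_capacity(capacity: usize) -> Self {
        CharTokenizer { buffer: Vec::with_capacity(capacity) }
    }

    /// Fills the internal buffer with (position, character) pairs,
    /// skipping whitespace, and returns a borrowed view of it.
    pub fn process(&mut self, input: &str) -> &[(usize, char)] {
        // `clear` drops the old tokens but keeps the allocated capacity.
        self.buffer.clear();
        self.buffer
            .extend(input.char_indices().filter(|&(_, c)| !c.is_whitespace()));
        &self.buffer
    }

    /// Currently allocated capacity of the internal buffer.
    pub fn capacity(&self) -> usize {
        self.buffer.capacity()
    }
}
```

Calling `process` repeatedly on inputs that fit within the existing capacity leaves `capacity()` unchanged, which is the effect the long-lived Tokenizer relies on.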
Examples
    // Imports assumed to match the crate layout (Token and StringProcessor at the crate root).
    use xxcalc::tokenizer::Tokenizer;
    use xxcalc::StringProcessor;
    use xxcalc::Token;

    let mut tokenizer = Tokenizer::default();

    {
      // Each entry is a (position, token) pair.
      let tokens = tokenizer.process("2.0+2");
      assert_eq!(tokens[0], (0, Token::Number(2.0)));
      assert_eq!(tokens[1], (3, Token::Operator('+')));
      assert_eq!(tokens[2], (4, Token::Number(2.0)));
    }

    {
      // The same tokenizer is reused; identifiers are stored by index.
      let tokens = tokenizer.process("x+log10(100)+x");
      assert_eq!(tokens[0], (0, Token::Identifier(0)));
      assert_eq!(tokens.identifiers[0], "x");
      assert_eq!(tokens[1], (1, Token::Operator('+')));
      assert_eq!(tokens[2], (2, Token::Identifier(1)));
      assert_eq!(tokens.identifiers[1], "log10");
      assert_eq!(tokens[3], (7, Token::BracketOpening));
      assert_eq!(tokens[4], (8, Token::Number(100.0)));
      assert_eq!(tokens[5], (11, Token::BracketClosing));
      assert_eq!(tokens[6], (12, Token::Operator('+')));
      assert_eq!(tokens[7], (13, Token::Identifier(0)));
    }
Trait Implementations
impl Default for Tokenizer
Creates a new default Tokenizer.
Such a tokenizer is optimized for (but not limited to) values of up to 10 characters and expressions of up to 10 tokens. These are only the default capacities; the buffers grow dynamically when needed.
impl StringProcessor for Tokenizer
This is the main processing unit of the tokenizer. It takes a string expression and, using a state machine, creates a list of tokens representing that string.
This tokenizer supports floating point numbers in traditional and scientific notation (as well as shorthand point notation), text identifiers, and the operators +, -, *, /, ^ and =. Parentheses () and the comma , are supported too.
Whitespace is always skipped, and unrecognized characters are wrapped in an Unknown token.
Signed numbers are detected when the sign cannot be mistaken for the operator + or -. Implicit multiplication before an identifier or a parenthesis is replaced with explicit multiplication using the * operator.
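The sign detection and implicit multiplication rules can be sketched with a small stand-alone tokenizer. This is a simplified illustration, not the state machine used by xxcalc: the `Tok` enum and the `tokenize` function are hypothetical, scientific notation is left out, and two rules are assumptions made for the sketch: a sign is attached to a number only at the start of the input or after an operator, an opening bracket or a comma, and `*` is inserted only when a number is directly followed by an identifier or an opening bracket.

```rust
/// Hypothetical token type for this sketch (not xxcalc's `Token`).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Number(f64),
    Operator(char),
    Identifier(String),
    BracketOpening,
    BracketClosing,
    Separator,
    Unknown(char),
}

pub fn tokenize(input: &str) -> Vec<Tok> {
    let chars: Vec<char> = input.chars().collect();
    let mut out: Vec<Tok> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1; // whitespace is always skipped
            continue;
        }
        // Assumption: a sign belongs to a number only where a binary
        // operator could not appear.
        let sign_allowed = matches!(
            out.last(),
            None | Some(Tok::Operator(_)) | Some(Tok::BracketOpening) | Some(Tok::Separator)
        );
        let digit_follows = chars.get(i + 1).map_or(false, |n| n.is_ascii_digit());
        let is_sign = (c == '+' || c == '-') && sign_allowed && digit_follows;
        if c.is_ascii_digit() || is_sign {
            let start = i;
            i += 1;
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            match text.parse::<f64>() {
                Ok(value) => out.push(Tok::Number(value)),
                Err(_) => out.push(Tok::Unknown(c)), // e.g. "1.2.3"
            }
            continue;
        }
        // Assumption: implicit multiplication is inserted only after a number.
        let after_number = matches!(out.last(), Some(Tok::Number(_)));
        if c.is_alphabetic() {
            let start = i;
            while i < chars.len() && chars[i].is_alphanumeric() {
                i += 1;
            }
            if after_number {
                out.push(Tok::Operator('*'));
            }
            out.push(Tok::Identifier(chars[start..i].iter().collect()));
            continue;
        }
        match c {
            '(' => {
                if after_number {
                    out.push(Tok::Operator('*'));
                }
                out.push(Tok::BracketOpening);
            }
            ')' => out.push(Tok::BracketClosing),
            ',' => out.push(Tok::Separator),
            '+' | '-' | '*' | '/' | '^' | '=' => out.push(Tok::Operator(c)),
            other => out.push(Tok::Unknown(other)),
        }
        i += 1;
    }
    out
}
```

With these rules, `2x` yields a number, an explicit `*` and an identifier, while in `2-3` the minus stays an operator and in `2*-3` it becomes part of the number.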
Extending
New features can be added to the tokenizer in two ways: by embedding this tokenizer in a new one that replaces Unknown tokens with other tokens, or by implementing a TokensProcessor that takes the output of this tokenizer and replaces Unknown tokens, or some combination of tokens, with other ones.
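The second approach, a pass over the token list that rewrites Unknown tokens, might look roughly like the sketch below. It is written against a hypothetical `SketchToken` enum rather than the real xxcalc types, and `promote_unknown` is an illustrative helper, not part of the crate; a real extension would implement the TokensProcessor trait instead.

```rust
/// Hypothetical token type for this sketch (not xxcalc's `Token`).
#[derive(Debug, Clone, PartialEq)]
pub enum SketchToken {
    Number(f64),
    Operator(char),
    Unknown(char),
}

/// Replaces every Unknown token whose character is listed in
/// `extra_operators` with an Operator token, keeping positions intact.
/// Returns how many tokens were replaced.
pub fn promote_unknown(
    tokens: &mut [(usize, SketchToken)],
    extra_operators: &[char],
) -> usize {
    let mut replaced = 0;
    for (_, token) in tokens.iter_mut() {
        // `c` is copied out, so the token can be overwritten afterwards.
        if let SketchToken::Unknown(c) = *token {
            if extra_operators.contains(&c) {
                *token = SketchToken::Operator(c);
                replaced += 1;
            }
        }
    }
    replaced
}
```

For example, passing `&['%']` would turn an unrecognized `%` into an operator for a later evaluation stage, while other Unknown tokens are left untouched so they can still be reported as errors.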
State machine
The complete, hand-designed state machine used by this StringProcessor is shown in the image below: