Struct TokenizerConfig 

pub struct TokenizerConfig {
    pub keywords: HashMap<String, TokenType>,
    pub single_tokens: HashMap<char, TokenType>,
    pub quotes: HashMap<String, String>,
    pub identifiers: HashMap<char, char>,
    pub comments: HashMap<String, Option<String>>,
    pub string_escapes: Vec<char>,
    pub nested_comments: bool,
    pub escape_follow_chars: Vec<char>,
    pub b_prefix_is_byte_string: bool,
    pub numeric_literals: HashMap<String, String>,
    pub identifiers_can_start_with_digit: bool,
    pub hex_number_strings: bool,
    pub hex_string_is_integer_type: bool,
    pub string_escapes_allowed_in_raw_strings: bool,
}

Tokenizer configuration for a dialect

Fields§

§keywords: HashMap<String, TokenType>

Keywords mapping (uppercase keyword -> token type)

§single_tokens: HashMap<char, TokenType>

Single character tokens

§quotes: HashMap<String, String>

Quote characters (start -> end)

§identifiers: HashMap<char, char>

Identifier quote characters (start -> end)

§comments: HashMap<String, Option<String>>

Comment definitions (start -> optional end)

§string_escapes: Vec<char>

String escape characters

§nested_comments: bool

Whether to support nested comments
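
As an illustration of what nesting changes, here is a minimal sketch of locating the end of a block comment. The function `block_comment_end` is hypothetical and not part of this crate; the delimiters are passed in the way a `comments` entry such as `"/*" -> Some("*/")` would supply them.

```rust
/// Hypothetical helper (not from the crate): returns the byte offset just past
/// the end of a block comment that begins at the start of `input`, or `None`
/// if `input` does not start with `start` or the comment is unterminated.
fn block_comment_end(input: &str, start: &str, end: &str, nested: bool) -> Option<usize> {
    if !input.starts_with(start) {
        return None;
    }
    let mut depth = 1usize;
    let mut i = start.len();
    while i < input.len() {
        let rest = &input[i..];
        if rest.starts_with(end) {
            // A closing delimiter always reduces the depth.
            depth -= 1;
            i += end.len();
            if depth == 0 {
                return Some(i);
            }
        } else if nested && rest.starts_with(start) {
            // An opening delimiter only counts when nesting is enabled.
            depth += 1;
            i += start.len();
        } else {
            // Skip one character, staying on a UTF-8 boundary.
            i += rest.chars().next().map_or(1, |c| c.len_utf8());
        }
    }
    None
}

fn main() {
    let sql = "/* a /* b */ c */ SELECT 1";
    println!("flat:   {:?}", block_comment_end(sql, "/*", "*/", false));
    println!("nested: {:?}", block_comment_end(sql, "/*", "*/", true));
}
```

Without nesting the comment ends at the first closing delimiter; with nesting it ends only once every inner comment has been closed.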

§escape_follow_chars: Vec<char>

Valid escape follow characters (for MySQL-style escaping). When a backslash is followed by a character NOT in this list, the backslash is discarded. When the list is empty, no backslash is discarded: unrecognized sequences keep their backslash.
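
The rule above might be sketched as follows. The function `apply_escape_follow_chars` is hypothetical, and it assumes that recognized sequences are left untouched (backslash included) for a later decoding step, which this page does not specify.

```rust
/// Hypothetical helper (not from the crate): applies the follow-character rule
/// to the body of a string literal.
fn apply_escape_follow_chars(body: &str, follow: &[char]) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                chars.next();
                if !follow.is_empty() && !follow.contains(&next) {
                    // Not a valid follow character: discard the backslash.
                    out.push(next);
                } else {
                    // Valid follow character, or empty list: keep the backslash.
                    // Assumption: decoding of the sequence happens elsewhere.
                    out.push('\\');
                    out.push(next);
                }
                continue;
            }
        }
        out.push(c);
    }
    out
}

fn main() {
    // With 'n' as the only valid follow character, `\q` loses its backslash.
    println!("{}", apply_escape_follow_chars(r"a\qb\n", &['n']));
    // With an empty list, every backslash is preserved.
    println!("{}", apply_escape_follow_chars(r"a\qb\n", &[]));
}
```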

§b_prefix_is_byte_string: bool

Whether b'…' is a byte string (true for BigQuery) or bit string (false for standard SQL). Default is false (bit string).

§numeric_literals: HashMap<String, String>

Numeric literal suffixes (uppercase suffix -> type name), e.g. {"L": "BIGINT", "S": "SMALLINT"}. Used by Hive/Spark to parse 1L as CAST(1 AS BIGINT).
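
A minimal sketch of how such a suffix map could be applied. The function `rewrite_suffixed_number` is hypothetical, handles only plain integer digits followed by a suffix, and produces the CAST text directly, whereas the real tokenizer presumably emits tokens.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not from the crate): rewrites a literal such as `1L`
/// into `CAST(1 AS BIGINT)` using an uppercase-suffix -> type-name map.
/// Returns `None` when there is no suffix or the suffix is not in the map.
fn rewrite_suffixed_number(literal: &str, suffixes: &HashMap<String, String>) -> Option<String> {
    // Split at the first non-digit character.
    let digits_end = literal
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(literal.len());
    let (digits, suffix) = literal.split_at(digits_end);
    if digits.is_empty() || suffix.is_empty() {
        return None;
    }
    // Keys are uppercase, so normalize the suffix before the lookup.
    let type_name = suffixes.get(&suffix.to_uppercase())?;
    Some(format!("CAST({digits} AS {type_name})"))
}

fn main() {
    let mut suffixes = HashMap::new();
    suffixes.insert("L".to_string(), "BIGINT".to_string());
    suffixes.insert("S".to_string(), "SMALLINT".to_string());

    println!("{:?}", rewrite_suffixed_number("1L", &suffixes));
    println!("{:?}", rewrite_suffixed_number("7s", &suffixes));
    println!("{:?}", rewrite_suffixed_number("42", &suffixes));
}
```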

§identifiers_can_start_with_digit: bool

Whether unquoted identifiers can start with a digit (e.g., 1a, 1_a). When true, a number followed by letters/underscore is treated as an identifier. Used by Hive, Spark, MySQL, ClickHouse.
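
The effect of this flag can be sketched as below. The `Token` enum and `scan_word` function are hypothetical stand-ins, not the crate's types. The split into a number followed by an identifier when the flag is false is an assumption; the sketch ignores exponents and numeric suffixes and expects a non-empty word.

```rust
/// Hypothetical token type for this sketch only.
#[derive(Debug, PartialEq)]
enum Token {
    Number(String),
    Identifier(String),
}

/// Hypothetical helper (not from the crate): scans one whitespace-delimited word.
fn scan_word(word: &str, identifiers_can_start_with_digit: bool) -> Vec<Token> {
    // Split the word into its leading digits and whatever follows.
    let split = word
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(word.len());
    let (digits, rest) = word.split_at(split);
    if rest.is_empty() {
        return vec![Token::Number(digits.to_string())];
    }
    let rest_is_word = rest.chars().all(|c| c.is_alphanumeric() || c == '_');
    if digits.is_empty() || (identifiers_can_start_with_digit && rest_is_word) {
        // With the flag set, digits followed by letters/underscores form one identifier.
        return vec![Token::Identifier(word.to_string())];
    }
    // Assumption: without the flag, the digits and the remainder are separate tokens.
    vec![
        Token::Number(digits.to_string()),
        Token::Identifier(rest.to_string()),
    ]
}

fn main() {
    println!("{:?}", scan_word("1_a", true));
    println!("{:?}", scan_word("1_a", false));
    println!("{:?}", scan_word("123", true));
}
```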

§hex_number_strings: bool

Whether the 0x/0X prefix should be treated as introducing a hex literal. When true, 0XCC is tokenized as a single hex literal instead of Number("0") + Identifier("XCC"). Used by BigQuery, SQLite, Teradata.

§hex_string_is_integer_type: bool

Whether hex string literals from 0x prefix represent integer values. When true (BigQuery), 0xA is tokenized as HexNumber (integer in hex notation). When false (SQLite, Teradata), 0xCC is tokenized as HexString (binary/blob value).
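
The interaction between `hex_number_strings` and `hex_string_is_integer_type` might look like the following sketch. The `Token` enum and `scan_hex` function are hypothetical; in particular, storing the digits without the `0x` prefix is an assumption made here for illustration.

```rust
/// Hypothetical token type for this sketch only.
#[derive(Debug, PartialEq)]
enum Token {
    Number(String),
    Identifier(String),
    HexNumber(String),
    HexString(String),
}

/// Hypothetical helper (not from the crate): scans a word that starts with
/// `0x` or `0X`. Returns `None` for any other input.
fn scan_hex(word: &str, hex_number_strings: bool, hex_string_is_integer_type: bool) -> Option<Vec<Token>> {
    let digits = word.strip_prefix("0x").or_else(|| word.strip_prefix("0X"))?;
    let valid = !digits.is_empty() && digits.chars().all(|c| c.is_ascii_hexdigit());
    if hex_number_strings && valid {
        // The second flag picks between an integer and a binary/blob value.
        let token = if hex_string_is_integer_type {
            Token::HexNumber(digits.to_string())
        } else {
            Token::HexString(digits.to_string())
        };
        return Some(vec![token]);
    }
    // Fallback: the leading zero is a number and the remainder an identifier.
    Some(vec![
        Token::Number("0".to_string()),
        Token::Identifier(word[1..].to_string()),
    ])
}

fn main() {
    println!("{:?}", scan_hex("0XCC", false, false));
    println!("{:?}", scan_hex("0xA", true, true));
    println!("{:?}", scan_hex("0xCC", true, false));
}
```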

§string_escapes_allowed_in_raw_strings: bool

Whether string escape sequences (like \') are allowed in raw strings. When true (BigQuery default), \' inside r'…' escapes the quote. When false (Spark/Databricks), backslashes in raw strings are always literal. Corresponds to STRING_ESCAPES_ALLOWED_IN_RAW_STRINGS in Python sqlglot (default True).
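
To make the difference concrete, here is a minimal sketch of finding where a raw string body ends. The function `raw_string_body` is hypothetical, and treating a doubled backslash as an escaped backslash is an assumption of the sketch.

```rust
/// Hypothetical helper (not from the crate): given the text right after the
/// opening quote of a raw string, returns the body up to the closing quote,
/// or `None` if the string is unterminated.
fn raw_string_body(after_open: &str, quote: char, escapes_allowed: bool) -> Option<&str> {
    let mut prev_backslash = false;
    for (i, c) in after_open.char_indices() {
        // A quote closes the string unless escapes are allowed and it is escaped.
        if c == quote && !(escapes_allowed && prev_backslash) {
            return Some(&after_open[..i]);
        }
        // Assumption: `\\` is an escaped backslash and does not escape what follows.
        prev_backslash = c == '\\' && !prev_backslash;
    }
    None
}

fn main() {
    // The text following the opening quote of r'a\'b' rest
    let input = r"a\'b' rest";
    println!("{:?}", raw_string_body(input, '\'', true));
    println!("{:?}", raw_string_body(input, '\'', false));
}
```

With escapes allowed the body runs to the second quote; with escapes disallowed the backslash is literal and the first quote closes the string.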

Trait Implementations§

impl Clone for TokenizerConfig

fn clone(&self) -> TokenizerConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TokenizerConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for TokenizerConfig

fn default() -> Self

Returns the “default value” for a type.
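
Because the struct implements Default and all fields are public, a dialect configuration can be built by overriding only the fields that differ. The sketch below uses a reduced, hypothetical stand-in for TokenizerConfig with a derived Default, so its default values (empty maps, false flags) are not necessarily those of the real implementation; the comment delimiters are illustrative.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the crate's TokenizerConfig, reduced to a few fields.
#[derive(Clone, Debug, Default)]
struct TokenizerConfig {
    comments: HashMap<String, Option<String>>,
    nested_comments: bool,
    b_prefix_is_byte_string: bool,
    hex_number_strings: bool,
    hex_string_is_integer_type: bool,
}

/// Builds a BigQuery-flavoured configuration from the flags described in the
/// field documentation, leaving everything else at its default.
fn bigquery_like_config() -> TokenizerConfig {
    let mut comments = HashMap::new();
    comments.insert("--".to_string(), None); // line comment, no end delimiter
    comments.insert("/*".to_string(), Some("*/".to_string())); // block comment
    TokenizerConfig {
        comments,
        b_prefix_is_byte_string: true,
        hex_number_strings: true,
        hex_string_is_integer_type: true,
        // Struct update syntax fills the remaining fields from Default.
        ..Default::default()
    }
}

fn main() {
    let config = bigquery_like_config();
    println!("{config:#?}");
}
```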

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.