Trait TokenType 

pub trait TokenType:
    Copy
    + Eq
    + Hash
    + Send
    + Sync
    + 'static
    + Debug {
    type Role: TokenRole;

    const END_OF_STREAM: Self;

    // Required method
    fn role(&self) -> Self::Role;

    // Provided methods
    fn is_role(&self, role: Self::Role) -> bool { ... }
    fn is_universal(&self, role: UniversalTokenRole) -> bool { ... }
    fn is_comment(&self) -> bool { ... }
    fn is_whitespace(&self) -> bool { ... }
    fn is_error(&self) -> bool { ... }
    fn is_ignored(&self) -> bool { ... }
    fn is_end_of_stream(&self) -> bool { ... }
}

Token type definitions for the parsing system.

The TokenType trait is the foundation for defining the kinds of tokens a language produces. It categorizes tokens and provides methods for identifying their roles in the language grammar.

§Universal Grammar Philosophy

The role mechanism in Oak is inspired by the concept of “Universal Grammar”. While every language has its own unique “Surface Structure” (its specific token kinds), most share a common “Deep Structure” (syntactic roles).

By mapping language-specific kinds to UniversalTokenRole, we enable generic tools like highlighters and formatters to work across 100+ languages without deep knowledge of each one’s specific grammar.
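As a rough illustration of this idea, the sketch below runs one generic highlighter over two toy token sets. The trait and enum definitions are local stand-ins written so the example compiles on its own: the `universal()` method on `TokenRole`, the `SqlToken` and `AsmToken` enums, and the `highlight_class` function are all assumptions made for illustration, not Oak's actual API.

```rust
use std::fmt::Debug;
use std::hash::Hash;

// Stand-in definitions so the sketch compiles on its own.
// In real code these come from Oak; `universal()` is a hypothetical method.
#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum UniversalTokenRole { Name, Literal, Operator, Comment, Whitespace, Error, None }

trait TokenRole: Copy + Eq {
    fn universal(&self) -> UniversalTokenRole;
}

impl TokenRole for UniversalTokenRole {
    fn universal(&self) -> UniversalTokenRole { *self }
}

trait TokenType: Copy + Eq + Hash + Send + Sync + 'static + Debug {
    type Role: TokenRole;
    const END_OF_STREAM: Self;
    fn role(&self) -> Self::Role;
}

// Surface structure: token kinds of two unrelated toy languages.
#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum SqlToken { Column, StringLiteral, Equals, EndOfStream }

#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum AsmToken { Label, Immediate, EndOfStream }

impl TokenType for SqlToken {
    type Role = UniversalTokenRole;
    const END_OF_STREAM: Self = SqlToken::EndOfStream;
    fn role(&self) -> Self::Role {
        match self {
            SqlToken::Column => UniversalTokenRole::Name,
            SqlToken::StringLiteral => UniversalTokenRole::Literal,
            SqlToken::Equals => UniversalTokenRole::Operator,
            SqlToken::EndOfStream => UniversalTokenRole::None,
        }
    }
}

impl TokenType for AsmToken {
    type Role = UniversalTokenRole;
    const END_OF_STREAM: Self = AsmToken::EndOfStream;
    fn role(&self) -> Self::Role {
        match self {
            AsmToken::Label => UniversalTokenRole::Name,
            AsmToken::Immediate => UniversalTokenRole::Literal,
            AsmToken::EndOfStream => UniversalTokenRole::None,
        }
    }
}

// Deep structure: one highlighter that never names a concrete language.
#[allow(dead_code)]
fn highlight_class<T: TokenType>(token: T) -> &'static str {
    match token.role().universal() {
        UniversalTokenRole::Name => "name",
        UniversalTokenRole::Literal => "literal",
        UniversalTokenRole::Operator => "operator",
        UniversalTokenRole::Comment => "comment",
        _ => "plain",
    }
}
```

Both `highlight_class(SqlToken::StringLiteral)` and `highlight_class(AsmToken::Immediate)` resolve to the same class, because the function only inspects the universal role.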

§Implementation Guidelines

When implementing this trait for a specific language:

  • Use an enum with discriminant values for efficient matching
  • Derive Copy, Clone, PartialEq, Eq, Hash, and Debug on the enum, since the trait's supertraits require them
  • Include an END_OF_STREAM variant to signal input termination
  • Define a Role associated type and implement the role() method to provide syntactic context

§Examples

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum SimpleToken {
    Identifier,
    Number,
    Plus,
    EndOfStream,
}

impl TokenType for SimpleToken {
    const END_OF_STREAM: Self = SimpleToken::EndOfStream;
    type Role = UniversalTokenRole; // Or a custom Role type

    fn role(&self) -> Self::Role {
        match self {
            SimpleToken::Identifier => UniversalTokenRole::Name,
            SimpleToken::Number => UniversalTokenRole::Literal,
            SimpleToken::Plus => UniversalTokenRole::Operator,
            _ => UniversalTokenRole::None,
        }
    }

    // The remaining methods have default implementations.
}

Required Associated Constants§

const END_OF_STREAM: Self

A constant representing the end of the input stream.

This special token type is used to signal that there are no more tokens to process in the input. It’s essential for parsers to recognize when they’ve reached the end of the source code.

§Implementation Notes

This should be a specific variant of your token enum that represents the end-of-stream condition. It’s used throughout the parsing framework to handle boundary conditions and termination logic.
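One way a parser might rely on this constant is a cursor that yields END_OF_STREAM once its input runs out, so callers never handle an `Option`. The sketch below is a minimal illustration under stated assumptions: the `TokenType` trait is a trimmed-down local stand-in, the default body of `is_end_of_stream` is a guess at the obvious implementation, and `Cursor` is a hypothetical helper rather than part of the framework.

```rust
use std::fmt::Debug;

// Trimmed-down stand-in for Oak's `TokenType`, so this sketch compiles alone.
// The default body of `is_end_of_stream` is an assumption.
trait TokenType: Copy + Eq + Debug {
    const END_OF_STREAM: Self;

    fn is_end_of_stream(&self) -> bool {
        *self == Self::END_OF_STREAM
    }
}

#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum SimpleToken {
    Identifier,
    Plus,
    EndOfStream,
}

impl TokenType for SimpleToken {
    const END_OF_STREAM: Self = SimpleToken::EndOfStream;
}

/// Hypothetical cursor over an already-lexed token slice.
#[allow(dead_code)]
struct Cursor<'a, T: TokenType> {
    tokens: &'a [T],
    position: usize,
}

#[allow(dead_code)]
impl<'a, T: TokenType> Cursor<'a, T> {
    fn new(tokens: &'a [T]) -> Self {
        Self { tokens, position: 0 }
    }

    /// Current token, or `T::END_OF_STREAM` once the slice is exhausted.
    fn current(&self) -> T {
        self.tokens
            .get(self.position)
            .copied()
            .unwrap_or(T::END_OF_STREAM)
    }

    /// Moves forward one token; stays put at the end.
    fn advance(&mut self) {
        if self.position < self.tokens.len() {
            self.position += 1;
        }
    }
}
```

With this shape, advancing past the last token is harmless: `current()` keeps returning the end-of-stream token, which gives termination logic a single condition to test.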

Required Associated Types§

type Role: TokenRole

The associated role type for this token kind.

Required Methods§

fn role(&self) -> Self::Role

Returns the general syntactic role of this token.

This provides a language-agnostic way for tools to understand the purpose of a token (e.g., is it a name, a literal, or a keyword) across diverse languages like SQL, ASM, YAML, or Rust.

Provided Methods§

fn is_role(&self, role: Self::Role) -> bool

Returns true if this token matches the specified language-specific role.

fn is_universal(&self, role: UniversalTokenRole) -> bool

Returns true if this token matches the specified universal role.
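The difference between is_role and is_universal shows up once a language defines its own Role type. The following sketch assumes a hypothetical shell-like language whose custom `ShellRole` distinguishes variables from commands, while both collapse to the universal `Name` role. The stand-in traits, the `universal()` conversion method, and the default method bodies are assumptions for illustration; the real definitions in Oak may differ.

```rust
use std::fmt::Debug;
use std::hash::Hash;

// Stand-ins so the sketch compiles alone; `universal()` and the default
// method bodies below are assumptions, not Oak's confirmed implementation.
#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum UniversalTokenRole { Name, Literal, Operator, Comment, Whitespace, Error, None }

trait TokenRole: Copy + Eq {
    fn universal(&self) -> UniversalTokenRole;
}

trait TokenType: Copy + Eq + Hash + Send + Sync + 'static + Debug {
    type Role: TokenRole;
    const END_OF_STREAM: Self;
    fn role(&self) -> Self::Role;

    // Compares against the language-specific role.
    fn is_role(&self, role: Self::Role) -> bool {
        self.role() == role
    }

    // Compares against the role after mapping it to the universal set.
    fn is_universal(&self, role: UniversalTokenRole) -> bool {
        self.role().universal() == role
    }
}

// Hypothetical language-specific roles, finer-grained than the universal ones.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum ShellRole { Variable, Command, Pipe, Other }

impl TokenRole for ShellRole {
    fn universal(&self) -> UniversalTokenRole {
        match self {
            ShellRole::Variable | ShellRole::Command => UniversalTokenRole::Name,
            ShellRole::Pipe => UniversalTokenRole::Operator,
            ShellRole::Other => UniversalTokenRole::None,
        }
    }
}

#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum ShellToken { EnvVar, Word, PipeSymbol, EndOfStream }

impl TokenType for ShellToken {
    type Role = ShellRole;
    const END_OF_STREAM: Self = ShellToken::EndOfStream;
    fn role(&self) -> Self::Role {
        match self {
            ShellToken::EnvVar => ShellRole::Variable,
            ShellToken::Word => ShellRole::Command,
            ShellToken::PipeSymbol => ShellRole::Pipe,
            ShellToken::EndOfStream => ShellRole::Other,
        }
    }
}
```

Here `ShellToken::EnvVar.is_role(ShellRole::Command)` is false, yet both `EnvVar` and `Word` answer true to `is_universal(UniversalTokenRole::Name)`. Language-aware tools can use the finer check, while generic tools use the universal one.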

fn is_comment(&self) -> bool

Returns true if this token represents a comment.

§Default Implementation

Based on UniversalTokenRole::Comment.

fn is_whitespace(&self) -> bool

Returns true if this token represents whitespace.

§Default Implementation

Based on UniversalTokenRole::Whitespace.

fn is_error(&self) -> bool

Returns true if this token represents an error condition.

§Default Implementation

Based on UniversalTokenRole::Error.

fn is_ignored(&self) -> bool

Returns true if this token represents trivia (whitespace, comments, etc.).

Trivia tokens are typically ignored during parsing but preserved for formatting and tooling purposes. They don’t contribute to the syntactic structure of the language but are important for maintaining the original source code formatting.

§Default Implementation

The default implementation treats a token as trivia if it is either whitespace or a comment. Language implementations can override this method if they have additional trivia categories.

§Examples
// Skip over trivia tokens during parsing
while current_token.is_ignored() {
    advance_to_next_token();
}
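Overriding the default might look like the sketch below, in which a hypothetical language treats line continuations as trivia in addition to whitespace and comments. The stand-in traits, the `universal()` method, the `MacroToken` enum, and the `significant` helper are assumptions for illustration; the default bodies simply follow the behavior described above.

```rust
use std::fmt::Debug;
use std::hash::Hash;

// Stand-ins so the sketch compiles alone; `universal()` is hypothetical.
#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum UniversalTokenRole { Name, Literal, Operator, Comment, Whitespace, Error, None }

trait TokenRole: Copy + Eq {
    fn universal(&self) -> UniversalTokenRole;
}

impl TokenRole for UniversalTokenRole {
    fn universal(&self) -> UniversalTokenRole { *self }
}

trait TokenType: Copy + Eq + Hash + Send + Sync + 'static + Debug {
    type Role: TokenRole;
    const END_OF_STREAM: Self;
    fn role(&self) -> Self::Role;

    fn is_whitespace(&self) -> bool {
        self.role().universal() == UniversalTokenRole::Whitespace
    }
    fn is_comment(&self) -> bool {
        self.role().universal() == UniversalTokenRole::Comment
    }
    // Default as documented: trivia is whitespace or a comment.
    fn is_ignored(&self) -> bool {
        self.is_whitespace() || self.is_comment()
    }
}

#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum MacroToken { Identifier, Space, LineComment, LineContinuation, EndOfStream }

impl TokenType for MacroToken {
    type Role = UniversalTokenRole;
    const END_OF_STREAM: Self = MacroToken::EndOfStream;

    fn role(&self) -> Self::Role {
        match self {
            MacroToken::Identifier => UniversalTokenRole::Name,
            MacroToken::Space => UniversalTokenRole::Whitespace,
            MacroToken::LineComment => UniversalTokenRole::Comment,
            MacroToken::LineContinuation | MacroToken::EndOfStream => UniversalTokenRole::None,
        }
    }

    // Override: line continuations are an extra trivia category here.
    fn is_ignored(&self) -> bool {
        matches!(self, MacroToken::LineContinuation)
            || self.is_whitespace()
            || self.is_comment()
    }
}

// Generic helper that keeps only the tokens a parser would look at.
#[allow(dead_code)]
fn significant<T: TokenType>(tokens: &[T]) -> Vec<T> {
    tokens.iter().copied().filter(|t| !t.is_ignored()).collect()
}
```

Because `significant` calls `is_ignored` through the trait, it picks up the override without knowing anything about `MacroToken`.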

fn is_end_of_stream(&self) -> bool

Returns true if this token represents the end of the input stream.

This method provides a convenient way to check if a token is the special END_OF_STREAM token without directly comparing with the constant.

§Examples
// Loop until we reach the end of the input
while !current_token.is_end_of_stream() {
    process_token(current_token);
    current_token = next_token();
}

Dyn Compatibility§

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.
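In practice this means `dyn TokenType` cannot be used as a type. Among other reasons, the trait declares an associated constant and has `Copy` as a supertrait, which implies `Sized`. Code that works with tokens is therefore typically written with generics. The sketch below illustrates that pattern, plus one possible way to mix tokens from several languages by reducing them to `UniversalTokenRole` values first. The stand-in traits, the `universal()` method, the `JsonToken` enum, and the `universal_roles` helper are assumptions for illustration only.

```rust
use std::fmt::Debug;
use std::hash::Hash;

// Stand-ins so the sketch compiles alone; `universal()` is hypothetical.
#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum UniversalTokenRole { Name, Literal, Operator, Comment, Whitespace, Error, None }

trait TokenRole: Copy + Eq {
    fn universal(&self) -> UniversalTokenRole;
}

impl TokenRole for UniversalTokenRole {
    fn universal(&self) -> UniversalTokenRole { *self }
}

trait TokenType: Copy + Eq + Hash + Send + Sync + 'static + Debug {
    type Role: TokenRole;
    const END_OF_STREAM: Self;
    fn role(&self) -> Self::Role;
}

#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum SimpleToken { Identifier, Number, Plus, EndOfStream }

#[allow(dead_code)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum JsonToken { Text, EndOfStream }

impl TokenType for SimpleToken {
    type Role = UniversalTokenRole;
    const END_OF_STREAM: Self = SimpleToken::EndOfStream;
    fn role(&self) -> Self::Role {
        match self {
            SimpleToken::Identifier => UniversalTokenRole::Name,
            SimpleToken::Number => UniversalTokenRole::Literal,
            SimpleToken::Plus => UniversalTokenRole::Operator,
            SimpleToken::EndOfStream => UniversalTokenRole::None,
        }
    }
}

impl TokenType for JsonToken {
    type Role = UniversalTokenRole;
    const END_OF_STREAM: Self = JsonToken::EndOfStream;
    fn role(&self) -> Self::Role {
        match self {
            JsonToken::Text => UniversalTokenRole::Literal,
            JsonToken::EndOfStream => UniversalTokenRole::None,
        }
    }
}

// Not possible, because the trait is not dyn compatible:
// fn inspect(token: &dyn TokenType) { /* ... */ }

// Static dispatch instead: one copy of this function per token type.
#[allow(dead_code)]
fn universal_roles<T: TokenType>(tokens: &[T]) -> Vec<UniversalTokenRole> {
    tokens.iter().map(|t| t.role().universal()).collect()
}
```

The returned vectors share one element type, so roles gathered from `SimpleToken` and `JsonToken` streams can be stored in the same collection even though the token types themselves cannot.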

Implementors§