

Function decode_byte_level_token 

pub fn decode_byte_level_token(raw_token: &str) -> Vec<u8> 

Decodes a byte-level BPE token (e.g. "Ġhello") to its raw bytes by reversing the GPT-2 byte→unicode table. Characters not found in the table fall back to their UTF-8 encoding. This is a defensive path that should not be reached for valid vocabulary entries.
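The following is a minimal sketch of the reversal described above. It assumes the standard GPT-2 `bytes_to_unicode` mapping: printable bytes (`!`..=`~`, `¡`..=`¬`, `®`..=`ÿ`) map to the code point of the same value, and the remaining 68 bytes are shifted into U+0100 onward in ascending byte order, which is why a space (0x20) becomes `Ġ` (U+0120). The helper `unicode_to_bytes` is hypothetical, and this is not necessarily how the crate builds or caches its table.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds the inverse of GPT-2's `bytes_to_unicode`
/// table, mapping each stand-in `char` back to the byte it represents.
fn unicode_to_bytes() -> HashMap<char, u8> {
    let mut map = HashMap::with_capacity(256);
    let mut n: u32 = 0; // offset for bytes that have no printable stand-in
    for b in 0u32..=255 {
        let printable = (33..=126).contains(&b)
            || (161..=172).contains(&b)
            || (174..=255).contains(&b);
        let c = if printable {
            // Printable bytes are represented by the code point of equal value.
            char::from_u32(b).unwrap()
        } else {
            // Other bytes are assigned U+0100, U+0101, ... in ascending byte order.
            let c = char::from_u32(256 + n).unwrap();
            n += 1;
            c
        };
        map.insert(c, b as u8);
    }
    map
}

/// Sketch of the decoding step: look each char up in the inverse table and
/// fall back to the char's UTF-8 bytes if it is not a GPT-2 stand-in.
pub fn decode_byte_level_token(raw_token: &str) -> Vec<u8> {
    // Rebuilt on every call for simplicity; a real implementation would cache it.
    let table = unicode_to_bytes();
    let mut out = Vec::with_capacity(raw_token.len());
    for ch in raw_token.chars() {
        match table.get(&ch) {
            Some(&b) => out.push(b),
            None => {
                // Defensive fallback: emit the character's UTF-8 encoding.
                let mut buf = [0u8; 4];
                out.extend_from_slice(ch.encode_utf8(&mut buf).as_bytes());
            }
        }
    }
    out
}
```

For example, under this mapping `"Ġhello"` decodes to the bytes of `" hello"`, and `"Ã©"` decodes to `[0xC3, 0xA9]`, the UTF-8 encoding of `é`. That illustrates why the function returns `Vec<u8>` rather than a `String`: a single token may hold only part of a multi-byte UTF-8 sequence.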