Module mixed816leengine


Deencoding engine for a mixed UTF-8/UTF-16LE scheme

This encoding scheme is equivalent to UTF-8 for scalars in ASCII, and equivalent to UTF-16LE for other scalars.
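As a rough illustration of the rule above, here is a minimal sketch of the encoding direction. The function name `encode_mixed` is hypothetical and is not this crate's API; the sketch only assumes what the description says, namely one byte for ASCII scalars and UTF-16LE code units (two bytes each, surrogate pairs included) for everything else.

```rust
/// Hypothetical helper (not part of this crate's API): encodes `input`
/// with the mixed scheme described above.
fn encode_mixed(input: &str) -> Vec<u8> {
    let mut out = Vec::new();
    // A scalar needs at most two UTF-16 code units.
    let mut units = [0u16; 2];
    for c in input.chars() {
        if c.is_ascii() {
            // ASCII scalars are emitted as a single byte, as in UTF-8.
            out.push(c as u8);
        } else {
            // Every other scalar is emitted as UTF-16LE:
            // each code unit becomes two bytes, low byte first.
            for unit in c.encode_utf16(&mut units).iter() {
                out.extend_from_slice(&unit.to_le_bytes());
            }
        }
    }
    out
}
```

With this sketch, `encode_mixed("Clément")` gives the bytes `43 6C E9 00 6D 65 6E 74`. Nothing in the output marks which bytes are 1-byte units and which belong to 2-byte units, so a scalar such as U+0100 comes out as `00 01`, which is indistinguishable from two ASCII control characters.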

“But cigix, such an encoding is extremely dumb, surely nobody is actually doing that”, you might say.

Well, yes. I couldn’t find any existing encoding scheme matching this one, so I had to roll my own implementation. And indeed, it breaks in all sorts of ways: there are even strings that you can encode but then not decode.

But I once had an insurance card on which "Clément" had become "Cl<some CJK ideogram>ent": it had mangled not only the 'é', but somehow also the following 'm'. The presence of a CJK ideogram made me think that two 1-byte units had been interpreted as one 2-byte UTF-16 unit, but then how come the rest of the string was not mangled? The quest to find out what happened is ultimately how I ended up making this crate and this custom encoding scheme. The scheme does yield some funky results, including one that could be what was on that insurance card (which I have since lost): encoding "Clément" as Latin-1 and then decoding it with this scheme gives "Cl淩ent".
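The "Clément" result can be reproduced with a small decoding sketch. The name `decode_mixed` and the exact decoding rule are assumptions made for illustration, not this crate's actual implementation: a byte below 0x80 is taken as an ASCII scalar, and any other byte is taken together with the next byte as one UTF-16LE code unit.

```rust
/// Hypothetical helper (not part of this crate's API): decodes `bytes`
/// under the assumed rule described above. Returns `None` when a 2-byte
/// unit is truncated or the resulting UTF-16 sequence is invalid.
fn decode_mixed(bytes: &[u8]) -> Option<String> {
    let mut units: Vec<u16> = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] < 0x80 {
            // ASCII byte: a single 1-byte unit.
            units.push(u16::from(bytes[i]));
            i += 1;
        } else {
            // Non-ASCII byte: consume it and the next byte as UTF-16LE.
            let high = *bytes.get(i + 1)?;
            units.push(u16::from_le_bytes([bytes[i], high]));
            i += 2;
        }
    }
    // Validates surrogate pairs and builds the final string.
    String::from_utf16(&units).ok()
}
```

In Latin-1, "Clément" is the bytes `43 6C E9 6D 65 6E 74`. Under the assumed rule, `E9` is not ASCII, so it swallows the following `6D` ('m') and the pair is read as the code unit 0x6DE9, which is '淩'; the remaining bytes are ASCII and pass through untouched, giving "Cl淩ent". The same rule also shows one way a string can be encodable but not decodable: U+8041 would be encoded as `41 80`, which this sketch reads as 'A' followed by a truncated 2-byte unit, so it returns `None`.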

Structs§

Mixed816LEEngine