Expand description
Deencoding engine for a mixed UTF-8/UTF-16LE scheme
This encoding scheme is equivalent to UTF-8 for scalars in ASCII, and equivalent to UTF-16LE for other scalars.
“But cigix, such an encoding is extremely dumb, surely nobody is actually doing that”, you might say.
Well, yes. I couldn’t find any encoding scheme matching this one, so I had to roll out my own implementation, and indeed it breaks in all sorts of ways, you can even have strings that you can encode but then not decode.
But I once had an insurance card on which "Clément" had become "Cl<some CJK ideogram>ent", that is, it didn’t only mangle the 'é', but also
somehow the following 'm'. The presence of a CJK ideogram made me think it
would have interpreted 2 1-byte units into 1 2-byte UTF-16 unit, but then
how come the rest of the string was not mangled… The quest for what
happened is ultimately how I ended up making this crate, and this custom
encoding scheme, which does yield some funky results, including one that
could be what was on that insurance card (which I have since lost): encoding
"Clément" as Latin-1 and then decoding it with this scheme gives
"Cl淩ent".