<h1 align="center" width="100%">POSIX Portable String Conversion</h1>

Converts Unicode strings to strings containing only characters from the [POSIX Portable File Name Character Set (Wikipedia)](https://en.wikipedia.org/wiki/Portable_character_set#Portable_filename_character_set). Other characters are converted to their closest ASCII representation using [deunicode (docs.rs)](https://docs.rs/deunicode/latest/deunicode/) where possible and removed otherwise. Delimiters are inserted automatically where necessary, using the algorithm described further down. The converted strings can then be used as user-facing filenames or as keys in systems where portability is required. For filenames, a maximum length of 255 is also enforced, and the extension is preserved when a name has to be shortened.
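
The length limit can be pictured with a minimal sketch. This is not the crate's implementation: `truncate_filename` and `MAX_LEN` are hypothetical names, the input is assumed to be already-converted ASCII, and the extension is assumed to be everything from the last `.` onward.

```rust
/// Hypothetical limit, taken from the 255 mentioned above.
const MAX_LEN: usize = 255;

/// Shortens `name` to at most `MAX_LEN` bytes while keeping its extension.
/// Assumes `name` is ASCII, so byte indices are also character indices.
fn truncate_filename(name: &str) -> String {
    debug_assert!(name.is_ascii());
    if name.len() <= MAX_LEN {
        return name.to_string();
    }
    match name.rfind('.') {
        // A dot that is not the first character, followed by an extension
        // short enough to leave room for at least part of the stem.
        Some(dot) if dot > 0 && name.len() - dot < MAX_LEN => {
            let ext = &name[dot..];
            let stem_len = MAX_LEN - ext.len();
            format!("{}{}", &name[..stem_len], ext)
        }
        // No usable extension: cut the name at the limit.
        _ => name[..MAX_LEN].to_string(),
    }
}
```

With this sketch, a name of 300 `a` characters followed by `.txt` is cut down to 251 `a` characters plus `.txt`, while a short name such as `short.txt` is returned unchanged.
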
## Examples
- "Horsey 🦄🦄" → `horsey_unicorn_unicorn`
- "Næstved" → `naestved`
- "晒后假日" → `shai_hou_jia_ri`
- "Београд - Добановци" → `beograd-dobanovtsi`
- " 🌵 . 🌵 Prickly/delimiters 🌵!" → `cactus.cactus_prickly_delimiters_cactus`
## Documentation
Read the [documentation on docs.rs](https://docs.rs/posix_string/latest/posix_string/).
## Delimiter insertion algorithm
The goal is to insert a delimiting `_` before and after a conversion such as `😃 → smiley`, so that e.g. `晒后假日` is converted to `shai_hou_jia_ri` and not `shaihoujiari`. However, this can't be done unconditionally, since the input symbol may already be adjacent to a delimiter or a string terminal (the start or end of the string); we would otherwise get conversions like `😃.😃 → _smiley_._smiley_`. Instead of inserting a delimiter directly, we therefore insert a special _marker_ character, which indicates "a delimiter is needed here". A marker is then reified as a delimiting `_` if both of the following conditions are met:
- The next character is not a delimiter, string terminal, or marker.
- The previous non-marker character is not a delimiter or string terminal.

The "non-marker" clauses above ensure that a run of consecutive markers is reified as at most one delimiter.
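
The two reification conditions can be sketched as follows. This is an illustration under stated assumptions rather than the crate's code: `MARKER`, `is_delimiter`, and `reify_markers` are hypothetical names, `\0` is an arbitrary choice of marker, and the sketch works on an already-marked string and allocates a `String` for brevity.

```rust
/// Hypothetical marker value; any char that cannot occur in the output would do.
const MARKER: char = '\u{0}';

/// The delimiters allowed in the output.
fn is_delimiter(c: char) -> bool {
    matches!(c, '.' | '_' | '-')
}

/// Replaces each marker in `input` with `_` when both conditions hold,
/// and drops it otherwise.
fn reify_markers(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    // Last character written to `out`; `None` stands for the string start.
    let mut prev: Option<char> = None;
    while let Some(c) = chars.next() {
        if c != MARKER {
            out.push(c);
            prev = Some(c);
            continue;
        }
        // Condition 1: the next character is not a delimiter, the string end,
        // or another marker.
        let next_ok = matches!(chars.peek(), Some(&n) if n != MARKER && !is_delimiter(n));
        // Condition 2: the previous non-marker character is not a delimiter
        // or the string start.
        let prev_ok = matches!(prev, Some(p) if !is_delimiter(p));
        if next_ok && prev_ok {
            out.push('_');
            prev = Some('_');
        }
    }
    out
}
```

With `\0` as the marker, `reify_markers("\0smiley\0.\0smiley\0")` yields `smiley.smiley`, and `reify_markers("a\0\0\0b")` yields `a_b`, since only the last marker in the run has a non-marker character after it.
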
Markers are inserted around a conversion if either of the following conditions is met:
- There was no viable conversion. In that case, we assume that the input character was non-alphabetic and is therefore best represented as a delimiter.
- The conversion is longer than one character and came from a non-alphabetic input character. This ensures that no markers are added around e.g. `ä` in `aäa → aaa` or `æ` in `aæa → aaea`.

Additionally, an input character is wholly replaced with a marker if it is an ASCII symbol that is not among the allowed ones (`._-`). We do this instead of replacing it directly with an allowed delimiter because it ensures that a run of consecutive non-allowed symbols is replaced with at most one delimiter. E.g., `a!"#b` is converted to `a_b` and not `a___b`.
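
The marker insertion rules can be sketched per input character. The names `MARKER` and `push_marked` are hypothetical, and several points are assumptions: the transliteration is supplied by the caller instead of coming from deunicode, an empty conversion counts as "no viable conversion", "ASCII symbol" means any ASCII character that is neither alphanumeric nor one of `._-`, and "alphabetic" means `char::is_alphabetic`. Lowercasing is left out.

```rust
/// Hypothetical marker value meaning "a delimiter is needed here".
const MARKER: char = '\u{0}';

/// Appends the marked form of one input character to `out`.
/// `conversion` is the ASCII transliteration of `input`, if there is one.
fn push_marked(out: &mut String, input: char, conversion: Option<&str>) {
    if input.is_ascii() {
        if input.is_ascii_alphanumeric() || matches!(input, '.' | '_' | '-') {
            // Already portable: keep as is.
            out.push(input);
        } else {
            // Non-allowed ASCII symbol: wholly replaced with a marker.
            out.push(MARKER);
        }
        return;
    }
    match conversion {
        // No viable conversion: represent the character as a marker.
        None | Some("") => out.push(MARKER),
        // Multi-character conversion of a non-alphabetic input: surround it.
        Some(text) if text.len() > 1 && !input.is_alphabetic() => {
            out.push(MARKER);
            out.push_str(text);
            out.push(MARKER);
        }
        // Alphabetic input or single-character conversion: no markers.
        Some(text) => out.push_str(text),
    }
}
```

Feeding `a`, `!`, `😃` (with conversion `smiley`), and `æ` (with conversion `ae`) through this sketch produces `a`, two markers, `smiley`, one marker, and `ae`, in that order. The two adjacent markers are what the reification step later collapses into a single `_`.
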
Note that the same delimiter requirements could be met more simply by building an intermediate buffer and filtering out superfluous delimiters, but that would require dynamic allocation.