Function mangling::mangle

pub fn mangle<T>(name: impl IntoIterator<Item = T>) -> String where
    T: Borrow<u8>, 

Takes an iterator over bytes and produces a String whose contents obey the rules for an identifier in the C language.

The length N of the output in bytes, relative to the input length K, obeys the following rules, which future implementations are also required to satisfy:

  • N > K
  • N ≤ 4 * K + 2
  • N ≤ ceil(3.5 * K) + 2 when K > 1
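
The three bounds can be expressed as a single integer-only predicate, which may be handy when testing an alternative implementation. This is a minimal sketch; satisfies_length_bounds is a hypothetical helper and is not part of the crate.

```rust
/// Returns true when an output of `n` bytes is an acceptable length for an
/// input of `k` bytes under the three documented rules.
/// Hypothetical helper for illustration; not part of the `mangling` crate.
fn satisfies_length_bounds(k: usize, n: usize) -> bool {
    // Rule 1: the output is always strictly longer than the input.
    let strictly_longer = n > k;
    // Rule 2: general upper bound.
    let within_general_bound = n <= 4 * k + 2;
    // Rule 3: tighter bound, which only applies when k > 1.
    // ceil(3.5 * k) is computed in integers as (7 * k + 1) / 2.
    let within_tight_bound = k <= 1 || n <= (7 * k + 1) / 2 + 2;
    strictly_longer && within_general_bound && within_tight_bound
}
```

For instance, the pair (K = 3, N = 10) taken from "123" and "_03_313233" in the examples passes, while (K = 3, N = 14) satisfies the general bound but fails the tighter one.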

The current implementation additionally satisfies these exact relations:

  • N = 1 + ceil(log10(K + 1)) + K when input matches ^[A-Za-z_]*$
  • N = 3 + ceil(log10(K + 1)) + 2 * K when input matches ^[^A-Za-z_]+$
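
As a cross-check, both sizes can be recomputed by counting the pieces of the mangled form: the leading underscore, the decimal length, and the payload, plus the literal 0 and _ that frame a non-printable group. The sketch below does this with integer arithmetic only; all three function names are hypothetical and not part of the crate.

```rust
/// Number of decimal digits used to write `k`, with 0 digits for k = 0.
/// This equals ceil(log10(k + 1)) without any floating-point arithmetic.
fn decimal_digits(mut k: usize) -> usize {
    let mut digits = 0;
    while k > 0 {
        digits += 1;
        k /= 10;
    }
    digits
}

/// Output length for an input of `k` bytes forming at most one printable
/// group: leading `_`, decimal length, then the literal bytes.
fn printable_only_len(k: usize) -> usize {
    1 + decimal_digits(k) + k
}

/// Output length for an input of `k >= 1` bytes forming exactly one
/// non-printable group: leading `_`, literal `0`, decimal length,
/// literal `_`, then two hexadecimal digits per byte.
fn non_printable_only_len(k: usize) -> usize {
    3 + decimal_digits(k) + 2 * k
}
```

These values agree with the examples: "GCD" (K = 3) mangles to the 5-byte "_3GCD", and "123" (K = 3) mangles to the 10-byte "_03_313233".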

Examples

use mangling::mangle;

let mangle_list = &[
    (""                , "_"                           ),
    ("_123"            , "_4_123"                      ),
    ("123"             , "_03_313233"                  ),
    ("(II)I"           , "_01_282II01_291I"            ),
    ("<init>"          , "_01_3c4init01_3e"            ),
    ("<init>:()V"      , "_01_3c4init04_3e3a28291V"    ),
    ("GCD"             , "_3GCD"                       ),
    ("StackMapTable"   , "_13StackMapTable"            ),
    ("java/lang/Object", "_4java01_2f4lang01_2f6Object"),
];

for &(before, after) in mangle_list {
    assert_eq!(after, mangle(before.bytes()));
}

Implementation details

The resulting symbol begins with an underscore character _, and is followed by zero or more groups of two types: printables and non-printables. The content of the input byte stream determines which type of group comes first, after which the two types alternate strictly.

  • A printable group corresponds to the longest substring of the input that can be consumed while matching the (case-insensitive) regular expression [a-z_][a-z0-9_]*. The mangled form is Naaa where N is the unbounded decimal length of the substring in the original input, and aaa is the literal substring.
  • A non-printable group represents the shortest substring in the input that can be consumed before a printable substring begins to match. The mangled form is 0N_xxxxxx where 0 and _ are literal, N is the unbounded decimal length of the substring in the original input, and xxxxxx is the lowercase hexadecimal expansion of the original bytes (two hexadecimal digits per input byte, most significant nybble first).

Note that despite the description above, the current implementation does not actually use regular expressions for matching.
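
The grouping rules can be illustrated with a short hand-written scanner that, like the real implementation, uses no regular expressions. This is only a sketch under stated assumptions: mangle_sketch is a hypothetical name, it takes a byte slice rather than an iterator, and it assumes that a printable group may begin with a letter or an underscore (as the "_123" example implies). It is not the crate's actual code.

```rust
/// Illustrative re-implementation of the documented grouping rules.
/// Hypothetical function; not the `mangling` crate's implementation.
fn mangle_sketch(input: &[u8]) -> String {
    // Assumption: a printable group starts with an ASCII letter or `_`
    // and continues with ASCII letters, digits, or `_`.
    let begins = |b: u8| b.is_ascii_alphabetic() || b == b'_';
    let continues = |b: u8| b.is_ascii_alphanumeric() || b == b'_';

    // Every mangled symbol starts with a single underscore.
    let mut out = String::from("_");
    let mut i = 0;
    while i < input.len() {
        let start = i;
        if begins(input[i]) {
            // Printable group: consume the longest identifier-like run.
            i += 1;
            while i < input.len() && continues(input[i]) {
                i += 1;
            }
            // Emit the decimal length followed by the literal bytes.
            out.push_str(&(i - start).to_string());
            // Every byte in this run is ASCII, so byte-to-char is lossless.
            out.extend(input[start..i].iter().map(|&b| b as char));
        } else {
            // Non-printable group: consume the shortest run that ends
            // where a printable group could begin.
            while i < input.len() && !begins(input[i]) {
                i += 1;
            }
            // Emit `0`, the decimal length, `_`, then two lowercase
            // hexadecimal digits per byte, most significant nybble first.
            out.push_str(&format!("0{}_", i - start));
            for &b in &input[start..i] {
                out.push_str(&format!("{:02x}", b));
            }
        }
    }
    out
}
```

Called as mangle_sketch("java/lang/Object".as_bytes()), this sketch reproduces the "_4java01_2f4lang01_2f6Object" entry from the examples. Because the two branches can never fire twice in a row, the strict alternation of group types falls out of the loop structure without extra bookkeeping.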