cfmt - Formatted output without Rust formatting code in the binary, to reduce the final binary size
On embedded systems, where space is restricted, the goal of cfmt is to reduce
the final binary size. cfmt lets you avoid Rust's formatted-print functions by
converting them into formatted-print calls in C.
Usage
The specification of the formatted strings is defined as follows:
format-spec = {:d|u|x|p|e|cs|rs|rb|cc|rc}
d: print a signed int as decimal digits, see %lld
u: print an unsigned int as decimal digits, see %llu
x: print an int as lowercase hexadecimal (digits a-f), see %llx
p: print a pointer, see %p
e: print a floating-point number, see %e
cs: print a C string pointer, see %s
rs: print a Rust string &str, see %.*s
rb: print a Rust byte slice &[u8], see %.*s
cc: print an ASCII char, passed as int in C, see %c
rc: print a Rust char (a Unicode scalar value) converted to a UTF-8 string, see %s
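The table can be read as a mechanical rewrite. The sketch below, a hypothetical helper rather than cfmt's own code, turns a format string written in this notation into the C format string that would be passed to dprintf. It handles only the specifiers listed above, doubles literal % signs so that C does not treat them as conversions, and reports unknown specifiers as errors, which is the kind of check a proc macro can perform at compile time.

```rust
/// Maps one specifier (the text between "{:" and "}") to its C conversion,
/// following the table above. Hypothetical helper, not part of cfmt's API.
fn c_conversion(spec: &str) -> Option<&'static str> {
    match spec {
        "d" => Some("%lld"),
        "u" => Some("%llu"),
        "x" => Some("%llx"),
        "p" => Some("%p"),
        "e" => Some("%e"),
        "cs" | "rc" => Some("%s"),
        "rs" | "rb" => Some("%.*s"),
        "cc" => Some("%c"),
        _ => None,
    }
}

/// Rewrites every "{:spec}" in `fmt` into its C conversion.
/// Literal '%' characters are doubled so printf prints them verbatim.
/// Returns Err for an unknown specifier or an unterminated "{:".
fn to_c_format(fmt: &str) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = fmt;
    while let Some(start) = rest.find("{:") {
        // Copy the literal text before the placeholder, escaping '%'.
        out.push_str(&rest[..start].replace('%', "%%"));
        let after = &rest[start + 2..];
        let end = after.find('}').ok_or_else(|| String::from("unterminated {:"))?;
        let spec = &after[..end];
        let conv = c_conversion(spec).ok_or_else(|| format!("unknown spec {:?}", spec))?;
        out.push_str(conv);
        rest = &after[end + 1..];
    }
    // Copy the trailing literal text, escaping '%'.
    out.push_str(&rest.replace('%', "%%"));
    Ok(out)
}
```

For example, to_c_format("x = {:d}, name = {:rs}") yields "x = %lld, name = %.*s"; the rs argument then has to be passed as a length followed by a pointer.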
The converted code calls a C function with the signature int dprintf(int fd, const char *format, ...);
which must be provided by the user's code (the buffer-writing macros below use snprintf instead).
The first argument fd selects the output: 1 stands for stdout, 2 for stderr.
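As an illustration of such a generated call, the sketch below declares dprintf with the signature above and calls it the way a cprintln! with one {:d} argument might be expanded. It assumes a hosted target whose C library already provides dprintf (glibc, musl and macOS libc do); on an embedded target the symbol would come from the user's own implementation. print_answer is a hypothetical name, and the code cfmt actually generates may differ.

```rust
use std::os::raw::{c_char, c_int};

// Assumption: the C library provides dprintf. `unsafe extern` blocks need
// Rust 1.82 or newer; older compilers take plain `extern "C"`.
unsafe extern "C" {
    fn dprintf(fd: c_int, format: *const c_char, ...) -> c_int;
}

const STDOUT: c_int = 1;
const STDERR: c_int = 2;

/// Roughly what a cprintln! with format "answer = {:d}" could become:
/// the C format string is NUL-terminated, "\n" is appended, and the
/// argument is widened to i64 so that it matches %lld.
/// Returns the number of bytes written, or a negative value on error.
fn print_answer(fd: c_int, x: i32) -> c_int {
    let wide: i64 = i64::from(x);
    unsafe { dprintf(fd, b"answer = %lld\n\0".as_ptr() as *const c_char, wide) }
}
```

Because dprintf writes straight to the file descriptor, its output is not synchronized with Rust's buffered std::io::stdout, so mixing the two in one program can reorder output.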
cfmt provides the following macros:
// print to stdout, converted into dprintf(1, format, ...)
cprint!(...);
print!(...);
// append \n to cprint!, converted into dprintf(1, format "\n", ...)
cprintln!(...);
println!(...);
// print to stderr, converted into dprintf(2, format, ...)
ceprint!(...);
eprint!(...);
// append \n to ceprint!, converted into dprintf(2, format "\n", ...)
ceprintln!(...);
eprintln!(...);
// write to buf, converted into snprintf(buf.as_bytes().as_ptr(), buf.len(), format, ...)
csprint!(...);
sprint!(...);
// write to buf, converted into snprintf(buf.as_ptr(), buf.len(), format, ...)
cbprint!(...);
bprint!(...);
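The snprintf-based conversions can be sketched the same way. The hypothetical format_into below writes output for a "{:d}/{:x}"-style format into a byte buffer. Since snprintf writes through its first argument, the sketch passes buf.as_mut_ptr() rather than buf.as_ptr(). It assumes the C library provides snprintf, which is standard C; the exact macro arguments cfmt uses are not reproduced here.

```rust
use std::os::raw::{c_char, c_int};

// Assumption: the C library provides snprintf (`unsafe extern` needs Rust 1.82+).
unsafe extern "C" {
    fn snprintf(buf: *mut c_char, size: usize, format: *const c_char, ...) -> c_int;
}

/// Hand-written equivalent of a byte-buffer print with "{:d}/{:x}":
/// a is widened to i64 for %lld, b to u64 for %llx.
fn format_into(buf: &mut [u8], a: i64, b: u64) -> c_int {
    unsafe {
        snprintf(
            buf.as_mut_ptr() as *mut c_char,
            buf.len(),
            b"%lld/%llx\0".as_ptr() as *const c_char,
            a,
            b,
        )
    }
}

/// Returns the text before the first NUL byte (helper for inspecting the buffer).
fn c_str_in(buf: &[u8]) -> &str {
    let end = buf.iter().position(|&c| c == 0).unwrap_or(buf.len());
    std::str::from_utf8(&buf[..end]).unwrap_or("")
}
```

snprintf never writes more than buf.len() bytes (including the terminating NUL) and returns the length the untruncated output would have had, so a return value of buf.len() or more signals truncation.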
The usage in Rust is shown as follows:
extern "C"
After cargo expand, the above code becomes:
extern "C"
Design Rationale
When mixing Rust and C, unconditionally converting Rust's formatted prints into calls to the C API completely removes the dependency on the Display/Debug traits, eliminating the overhead of Rust formatted printing and minimizing the binary size.
Ideally, formatted printing would follow Rust's own format specification, as follows:
After expansion by the proc macro cprintln!, it would become:
extern "C"
To implement the above, the proc macro would have to satisfy the following requirements:
- Rust strings must be terminated with \0 to be used in C;
- the size of each Rust argument must be known to the proc macro in order to choose the C conversion, e.g., %d or %lld;
- the type of each Rust argument must be known to the proc macro: a string must have its length specified in the format and be passed as a *const u8 pointer plus a length, and char arguments need separate treatment.
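Rust string data carries no trailing \0, while C's %s expects one. The sketch below shows the two usual ways to hand a &str to C: copy it into a NUL-terminated std::ffi::CString, or pass its length and pointer for %.*s, which is the conversion the rs specifier uses. The helper names are hypothetical.

```rust
use std::ffi::CString;

/// Option 1: copy the &str into a NUL-terminated CString for %s.
/// This allocates, and fails if the string contains an interior NUL byte.
fn as_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

/// Option 2: pass (length, pointer) for %.*s. C reads at most `len` bytes,
/// so no terminator is needed. The precision argument of %.*s is a C int;
/// strings longer than i32::MAX bytes would be truncated by this cast.
fn as_len_ptr(s: &str) -> (i32, *const u8) {
    (s.len() as i32, s.as_ptr())
}
```

The second option needs no copy, which suits embedded targets, but it is what turns one Rust argument into two C arguments.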
Unfortunately, proc macros cannot achieve all of this. When they are expanded, type checking has not yet run, so the types of variables are unknown. Consider the i32 type in the following code:
type my_i32 = i32;
let i: my_i32 = 100;
cprintln!("{}", i);
At best, the proc macro can tell that i is declared as my_i32; it cannot know that my_i32 is actually an alias for i32.
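The gap between tokens and types can be observed directly. In the sketch below, a macro_rules! macro stands in for the proc macro (both run before type checking and only see tokens), while std::any::TypeId, which is resolved after type checking, shows that the alias and i32 are the same type. The macro name type_tokens is hypothetical.

```rust
use std::any::TypeId;

// The alias from the example above, keeping its original name.
#[allow(non_camel_case_types)]
type my_i32 = i32;

/// After type checking, the alias and i32 are the very same type.
fn same_type() -> bool {
    TypeId::of::<my_i32>() == TypeId::of::<i32>()
}

/// A macro only receives tokens: it sees the text "my_i32", not i32.
macro_rules! type_tokens {
    ($t:ty) => {
        stringify!($t)
    };
}
```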
In more complex cases, an argument may be an arbitrary expression, such as the value returned by a function call. It is therefore unrealistic to expect the proc macro to recognize argument types, and the ideal solution above cannot be realized.
Rust's own formatting machinery solves this type problem with the Display/Debug traits: all formattable types are unified behind these traits, and the conversion is performed through the trait interfaces.
Since our goal is to eliminate the Display/Debug traits as well, the argument types have to be determined from the format string instead.
Rust itself already uses special characters in the format string, such as '?', to choose between the Display and Debug traits.
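For contrast, here is the mechanism cfmt is designed to avoid: arguments of different types are unified behind the Display trait, and the format string, not the argument, selects Display ({}) or Debug ({:?}). This code uses core::fmt and is shown only to illustrate the principle; join_display and display_vs_debug are hypothetical helper names.

```rust
use std::fmt::Display;

/// Heterogeneous arguments unified behind `&dyn Display` and formatted
/// through the trait interface, which is how core::fmt sidesteps the type problem.
fn join_display(args: &[&dyn Display]) -> String {
    args.iter().map(|a| a.to_string()).collect::<Vec<_>>().join(" ")
}

/// The same value printed through Display ("{}") and Debug ("{:?}"):
/// only the format string differs.
fn display_vs_debug(s: &str) -> (String, String) {
    (format!("{}", s), format!("{:?}", s))
}
```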
Following the same principle, the format string can carry the argument types, as follows:
This makes a proc-macro implementation feasible. However, one problem remains:
the format string would also fix the argument sizes. For example, {:x} would
take an int, while {:lld} would take a long long int in C. The programmer would
then have to keep the format string consistent with each argument's size;
otherwise the C function would misread its variadic arguments, which can cause
invalid memory accesses and undermines the safety of the code. We therefore
simplify: the format string specifies only the data type, never its size, which
in effect widens every argument to long long int or double in C.
As a result, the proc macro generates the following code:
This keeps the Rust code safe: if an argument of the wrong type is passed, the compiler rejects it instead of hiding the problem.
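The safety argument relies on each argument being converted to the single C type its specifier implies, in a way the compiler checks. The sketch below shows one way to get that property with the standard From/Into conversions; it is an assumption for illustration, not a description of cfmt's actual expansion, and the helper names are hypothetical. It treats {:u} and {:x} as unsigned, since C's %llu and %llx take an unsigned long long.

```rust
// Hypothetical helpers, not cfmt's generated code.

/// {:d}: signed integers become i64 (C long long, matching %lld).
fn widen_signed<T: Into<i64>>(v: T) -> i64 {
    v.into()
}

/// {:u} / {:x}: unsigned integers become u64 (C unsigned long long, %llu / %llx).
fn widen_unsigned<T: Into<u64>>(v: T) -> u64 {
    v.into()
}

/// {:e}: floats become f64 (C double, %e).
fn widen_float<T: Into<f64>>(v: T) -> f64 {
    v.into()
}

// Each of these is rejected at compile time, because the standard library
// only provides lossless From/Into conversions:
// widen_signed(2.5_f64);   // f64 does not implement Into<i64>
// widen_signed("text");    // &str does not implement Into<i64>
// widen_signed(1_u64);     // u64 -> i64 could overflow, so no Into<i64>
// Note: usize/isize have no Into<u64>/Into<i64> either, so a real macro
// would need a separate path for them.
```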
Special treatment of strings
For strings, the length must be passed as well, so one Rust argument is converted into two C arguments (for %.*s, the length followed by the pointer), which can cause a side effect. This is illustrated below:
cprintln!("{:rs}", get_str());
The generated code reads as follows:
unsafe
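The problem can be reproduced in plain Rust. Below, the hypothetical function c_side stands in for the C side of %.*s (it receives the precision and the pointer), and a counter records how often get_str() runs when the argument expression is pasted into both C arguments.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls so that the double evaluation becomes visible.
static CALLS: AtomicUsize = AtomicUsize::new(0);

fn get_str() -> &'static str {
    CALLS.fetch_add(1, Ordering::SeqCst);
    "hello"
}

/// Hypothetical stand-in for the C side of "%.*s": precision, then pointer.
fn c_side(len: i32, ptr: *const u8) -> String {
    // Safety: len and ptr must describe one valid string. Here both calls to
    // get_str() return the same literal, so they match; in general two calls
    // need not return the same string, which is another reason to avoid this.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len as usize) };
    String::from_utf8_lossy(bytes).into_owned()
}

/// A naive expansion pastes the argument expression once per C argument,
/// so get_str() runs twice.
fn naive_expansion() -> String {
    c_side(get_str().len() as i32, get_str().as_ptr())
}
```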
Note that get_str() is invoked twice. This is the same pitfall as a C macro that expands an argument more than once: any side effect of the argument expression happens once per expansion.
This problem must be avoided.
The simplest remedy would be to require programmers to pass only variables as string arguments, never function calls that return strings, but that would reduce usability.
A better choice is for the macro to detect whether the argument is a function call and, if so, bind it to a temporary variable. Alternatively, it can bind every string argument
to a temporary variable unconditionally, and report an explicit error when the argument is not a &str.
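The second strategy can be sketched with a macro_rules! macro standing in for the proc macro. The names print_rs, c_side and CALLS are hypothetical. The argument expression is evaluated once into a temporary, and the : &str annotation turns a non-&str argument (an i32, or a String passed by value) into an ordinary compile-time type error.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls so that single evaluation can be checked.
static CALLS: AtomicUsize = AtomicUsize::new(0);

fn get_str() -> &'static str {
    CALLS.fetch_add(1, Ordering::SeqCst);
    "hello"
}

/// Hypothetical stand-in for the C side of "%.*s": precision, then pointer.
fn c_side(len: i32, ptr: *const u8) -> String {
    // Safety: the caller passes the length and pointer of one valid &str.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len as usize) };
    String::from_utf8_lossy(bytes).into_owned()
}

/// Binds the argument to a typed temporary before splitting it in two.
macro_rules! print_rs {
    ($s:expr) => {{
        let tmp: &str = $s; // evaluated exactly once; must be a &str
        c_side(tmp.len() as i32, tmp.as_ptr())
    }};
}
```

A String can still be passed by reference, e.g. print_rs!(&owned), because &String coerces to &str.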
Special treatment of Rust char
A Rust char is a Unicode scalar value, so printing it requires converting it into a UTF-8 string, e.g., with char::encode_utf8. However,
using char::encode_utf8 pulls in symbols from core::fmt, bloating the binary.
To avoid pulling in core::fmt (a module of the core crate), we need our own conversion for Rust char.
The first version of the implementation is shown below:
Although this code does nothing with core::fmt explicitly, the binary still includes related symbols from core::fmt.
These symbols come from Rust's runtime array bounds checks, which prevent buffer overflows: a failed check panics, and the panic message is built with core::fmt. To keep this code out of the binary, every array index check has to be eliminated, which leads to the following version:
This is a reminder that user code must also avoid runtime-checked array indexing if it is to stay free of the core::fmt dependency.
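One way to perform the conversion without any runtime indexing is sketched below; it is an illustration, not cfmt's implementation, and char_to_c_utf8 is a hypothetical name. Each branch returns a complete array literal, so there is no bounds check that could panic, and the result is NUL-terminated so that it can be passed to %s as the rc specifier requires. Whether core::fmt actually disappears from a given binary also depends on the target, the optimization settings and the rest of the code, so it is worth confirming with a symbol listing.

```rust
/// Encodes `c` as UTF-8 followed by a NUL terminator, ready for "%s".
/// Every branch builds the whole array as a literal: no `buf[i]` indexing,
/// hence no bounds check and no panic path.
fn char_to_c_utf8(c: char) -> [u8; 5] {
    let v = c as u32;
    if v < 0x80 {
        // 1 byte: 0xxxxxxx
        [v as u8, 0, 0, 0, 0]
    } else if v < 0x800 {
        // 2 bytes: 110xxxxx 10xxxxxx
        [0xC0 | (v >> 6) as u8, 0x80 | (v & 0x3F) as u8, 0, 0, 0]
    } else if v < 0x1_0000 {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        [
            0xE0 | (v >> 12) as u8,
            0x80 | ((v >> 6) & 0x3F) as u8,
            0x80 | (v & 0x3F) as u8,
            0,
            0,
        ]
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        [
            0xF0 | (v >> 18) as u8,
            0x80 | ((v >> 12) & 0x3F) as u8,
            0x80 | ((v >> 6) & 0x3F) as u8,
            0x80 | (v & 0x3F) as u8,
            0,
        ]
    }
}
```

The returned array would then be passed by pointer to a %s conversion, e.g. as buf.as_ptr() after let buf = char_to_c_utf8(c);.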