cfi\_types
==========

CFI types for cross-language LLVM CFI support.

Installation
------------

To install the `cfi_types` crate:

1. On a command prompt or terminal with your package root's directory as the
   current working directory, run the following command:

       cargo add cfi-types

Or:

1. Add the `cfi_types` crate to your package root's `Cargo.toml` file and
   replace `X.Y.Z` by the version you want to install:

       [dependencies]
       cfi-types = "X.Y.Z"

2. On a command prompt or terminal with your package root's directory as the
   current working directory, run the following command:

       cargo fetch

Usage
-----

To use the `cfi_types` crate:

1. Import the CFI types from the `cfi_types` crate. E.g.:

       use cfi_types::c_long;

2. Replace uses of C type aliases by CFI types. E.g.:

       extern "C" {
           fn func(arg: c_long);
       }

       fn main() {
           unsafe { func(c_long(5)) };
       }

Background
----------

### Type metadata

LLVM uses [type metadata](https://llvm.org/docs/TypeMetadata.html) to allow IR
modules to aggregate pointers by their types. This type metadata is used by
LLVM CFI to test whether a given pointer is associated with a type identifier
(i.e., test type membership).

Clang uses the [Itanium C++
ABI](https://itanium-cxx-abi.github.io/cxx-abi/abi.html)'s [virtual tables and
RTTI](https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-special-vtables)
`typeinfo` structure name as type metadata identifiers for function pointers.

For cross-language LLVM CFI support, a compatible encoding must be used. The
compatible encoding chosen for cross-language LLVM CFI support is the Itanium
C++ ABI mangling with vendor extended type qualifiers and types for Rust types
that are not used across the FFI boundary (see Type metadata in the [design
document](https://rcvalle.com/docs/rust-cfi-design-doc.pdf)).

### Encoding C integer types

Rust defines `char` as a Unicode scalar value, while C defines `char` as an
integer type. Rust also defines explicitly-sized integer types (e.g., `i8`,
`i16`, `i32`, …), while C defines abstract integer types (e.g., `char`,
`short`, `long`, …), whose actual sizes are implementation defined and may vary
across different data models. This causes ambiguity if Rust integer types are
used in `extern "C"` function types that represent C functions because the
Itanium C++ ABI specifies encodings for C integer types (e.g., `char`, `short`,
`long`, …), not their defined representations (e.g., 8-bit signed integer,
16-bit signed integer, 32-bit signed integer, …).

For example, the Rust compiler is currently unable to identify whether the
following declaration represents `void func(long arg)` or
`void func(long long arg)` in an LP64 or equivalent data model:

```rust
extern "C" {
    fn func(arg: i64);
}
```

Fig. 1. Example extern "C" function using Rust integer type.

For cross-language LLVM CFI support, the Rust compiler must be able to identify
and correctly encode C types in `extern "C"` function types indirectly called
across the FFI boundary when CFI is enabled.

For convenience, Rust provides some C-like type aliases for use when
interoperating with foreign code written in C, and these C type aliases may be
used for disambiguation. However, at the time types are encoded, all type
aliases are already resolved to their respective `ty::Ty` type representations
(i.e., their respective Rust aliased types), making it currently impossible to
identify the use of C type aliases from their resolved types.

For example, the Rust compiler is also currently unable to identify that the
following declaration uses the `c_long` type alias, and so cannot disambiguate
between it and an `extern "C" fn func(arg: c_longlong)` in an LP64 or
equivalent data model:

```rust
extern "C" {
    fn func(arg: c_long);
}
```

Fig. 2. Example extern "C" function using C type alias.

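The alias resolution described above can be observed from ordinary Rust code. The following is a minimal sketch, not part of this crate: the helper `resolved_integer_type` is a hypothetical name introduced here for illustration, and it only checks the two signed integer types that `c_long` and `c_longlong` are expected to resolve to.

```rust
use std::any::TypeId;
use std::ffi::{c_long, c_longlong};
use std::mem::size_of;

/// Hypothetical helper: returns the name of the Rust integer type that `T`
/// is, limited to the two candidates checked here.
fn resolved_integer_type<T: 'static>() -> Option<&'static str> {
    let id = TypeId::of::<T>();
    if id == TypeId::of::<i32>() {
        Some("i32")
    } else if id == TypeId::of::<i64>() {
        Some("i64")
    } else {
        None
    }
}

fn main() {
    // Type aliases are not distinct types, so only the aliased Rust integer
    // type is visible here.
    println!("c_long resolves to {:?}", resolved_integer_type::<c_long>());
    println!("c_longlong resolves to {:?}", resolved_integer_type::<c_longlong>());

    // When both aliases have the same size (e.g., in an LP64 data model),
    // they are the same Rust type and cannot be told apart.
    if size_of::<c_long>() == size_of::<c_longlong>() {
        assert_eq!(TypeId::of::<c_long>(), TypeId::of::<c_longlong>());
    }
}
```

Because the two aliases share a `TypeId` whenever they resolve to the same integer type, nothing that operates on resolved types can recover which alias the programmer wrote.
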
Consequently, the Rust compiler is unable to identify and correctly encode C
types in `extern "C"` function types indirectly called across the FFI boundary
when CFI is enabled:
```c
#include <stdio.h>
#include <stdlib.h>

// This definition has the type id "_ZTSFvlE".
void
hello_from_c(long arg)
{
    printf("Hello from C!\n");
}

// This definition has the type id "_ZTSFvPFvlElE"--this can be ignored for the
// purposes of this example.
void
indirect_call_from_c(void (*fn)(long), long arg)
{
    // This call site tests whether the destination pointer is a member of the
    // group derived from the same type id of the fn declaration, which has the
    // type id "_ZTSFvlE".
    //
    // Notice that since the test is at the call site and is generated by Clang,
    // the type id used in the test is encoded by Clang.
    fn(arg);
}
```

Fig. 3. Example C library using C integer types and Clang encoding.

```rust
use std::ffi::c_long;

#[link(name = "foo")]
extern "C" {
    // This declaration would have the type id "_ZTSFvlE", but at the time types
    // are encoded, all type aliases are already resolved to their respective
    // Rust aliased types, so this is encoded either as "_ZTSFvu3i32E" or
    // "_ZTSFvu3i64E", depending on which type the c_long type alias resolves
    // to, which currently uses the u<length><type-name> vendor extended type
    // encoding for the Rust integer types--this is the problem demonstrated in
    // this example.
    fn hello_from_c(_: c_long);

    // This declaration would have the type id "_ZTSFvPFvlElE", but is encoded
    // either as "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E"
    // (compressed), similarly to the hello_from_c declaration above--this can
    // be ignored for the purposes of this example.
    fn indirect_call_from_c(f: unsafe extern "C" fn(c_long), arg: c_long);
}

// This definition would have the type id "_ZTSFvlE", but is encoded either as
// "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust(_: c_long) {
    println!("Hello, world!");
}

// This definition would have the type id "_ZTSFvlE", but is encoded either as
// "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust_again(_: c_long) {
    println!("Hello from Rust again!");
}

// This definition would also have the type id "_ZTSFvPFvlElE", but is encoded
// either as "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E"
// (compressed), similarly to the hello_from_c declaration above--this can be
// ignored for the purposes of this example.
fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) {
    // This indirect call site tests whether the destination pointer is a member
    // of the group derived from the same type id of the f declaration, which
    // would have the type id "_ZTSFvlE", but is encoded either as
    // "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c
    // declaration above.
    //
    // Notice that since the test is at the call site and is generated by the
    // Rust compiler, the type id used in the test is encoded by the Rust
    // compiler.
    unsafe { f(arg) }
}

// This definition has the type id "_ZTSFvvE"--this can be ignored for the
// purposes of this example.
fn main() {
    // This demonstrates an indirect call within Rust-only code using the same
    // encoding for hello_from_rust and the test at the indirect call site at
    // indirect_call (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E").
    indirect_call(hello_from_rust, 5);

    // This demonstrates an indirect call across the FFI boundary with the Rust
    // compiler and Clang using different encodings for hello_from_c and the
    // test at the indirect call site at indirect_call (i.e., "_ZTSFvu3i32E" or
    // "_ZTSFvu3i64E" vs "_ZTSFvlE").
    //
    // When using rustc LTO (i.e., -Clto), this works because the type id used
    // is from the Rust-declared hello_from_c, which is encoded by the Rust
    // compiler (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E").
    //
    // When using (proper) LTO (i.e., -Clinker-plugin-lto), this does not work
    // because the type id used is from the C-defined hello_from_c, which is
    // encoded by Clang (i.e., "_ZTSFvlE").
    indirect_call(hello_from_c, 5);

    // This demonstrates an indirect call to a function passed as a callback
    // across the FFI boundary with the Rust compiler and Clang using different
    // encodings for the hello_from_rust_again and the test at the indirect call
    // site at indirect_call_from_c (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E" vs
    // "_ZTSFvlE").
    //
    // When Rust functions are passed as callbacks across the FFI boundary to be
    // called back from C code, the tests are also at the call site but
    // generated by Clang instead, so the type ids used in the tests are encoded
    // by Clang, which do not match the type ids of declarations encoded by the
    // Rust compiler (e.g., hello_from_rust_again). (The same happens the other
    // way around for C functions passed as callbacks across the FFI boundary to
    // be called back from Rust code.)
    unsafe {
        indirect_call_from_c(hello_from_rust_again, 5);
    }
}
```

Fig. 4. Example Rust program using Rust integer types and the Rust compiler
encoding.

Whenever there is an indirect call across the FFI boundary or an indirect call
to a function passed as a callback across the FFI boundary, the Rust compiler
and Clang use different encodings for C integer types for function definitions
and declarations, and at indirect call sites when CFI is enabled (see Figs.
3–4).
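
To make the mismatch concrete, the type ids quoted in the figures can be reproduced with a few lines of string building. This is a simplified sketch under stated assumptions, not how the Rust compiler or Clang implement the encoding: the `Ty` enum and the `encode` and `type_id` functions are hypothetical names introduced here, and the sketch covers only flat function types (no pointers, qualifiers, or substitution compression such as `S_`).

```rust
/// Hypothetical, simplified model of the types that appear in the examples.
enum Ty {
    /// C `void`, encoded as `v`.
    Void,
    /// C `long`, encoded as `l`.
    CLong,
    /// C `long long`, encoded as `x`.
    CLongLong,
    /// A Rust integer type, encoded as a vendor extended type
    /// `u<length><type-name>`.
    RustInt(&'static str),
}

/// Encodes a single type in this simplified model.
fn encode(ty: &Ty) -> String {
    match ty {
        Ty::Void => "v".to_string(),
        Ty::CLong => "l".to_string(),
        Ty::CLongLong => "x".to_string(),
        Ty::RustInt(name) => format!("u{}{}", name.len(), name),
    }
}

/// Builds the typeinfo name for a flat function type: `_ZTS`, then `F`, the
/// return type, the parameter types (`v` if there are none), and `E`.
fn type_id(ret: &Ty, params: &[Ty]) -> String {
    let mut id = String::from("_ZTSF");
    id.push_str(&encode(ret));
    if params.is_empty() {
        id.push('v');
    } else {
        for param in params {
            id.push_str(&encode(param));
        }
    }
    id.push('E');
    id
}

fn main() {
    // void func(long), as encoded by Clang.
    println!("{}", type_id(&Ty::Void, &[Ty::CLong]));
    // void func(long long), as encoded by Clang.
    println!("{}", type_id(&Ty::Void, &[Ty::CLongLong]));
    // extern "C" fn(i64), using the vendor extended type encoding.
    println!("{}", type_id(&Ty::Void, &[Ty::RustInt("i64")]));
}
```

The first and third strings differ even though both describe a function taking a 64-bit signed integer in an LP64 data model, which is why the type membership test fails across the FFI boundary.
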

### The cfi\_types crate

To solve the problem of encoding C integer types, this crate provides a new set
of C types as user-defined types using the `cfi_encoding` attribute and
`repr(transparent)` to be used for cross-language LLVM CFI support.

```rust
use cfi_types::c_long;

#[link(name = "foo")]
extern "C" {
    // This declaration has the type id "_ZTSFvlE" because it uses the CFI types
    // for cross-language LLVM CFI support. The cfi_types crate provides a new
    // set of C types as user-defined types using the cfi_encoding attribute and
    // repr(transparent) to be used for cross-language LLVM CFI support. This
    // new set of C types allows the Rust compiler to identify and correctly
    // encode C types in extern "C" function types indirectly called across the
    // FFI boundary when CFI is enabled.
    fn hello_from_c(_: c_long);

    // This declaration has the type id "_ZTSFvPFvlElE" because it uses the CFI
    // types for cross-language LLVM CFI support--this can be ignored for the
    // purposes of this example.
    fn indirect_call_from_c(f: unsafe extern "C" fn(c_long), arg: c_long);
}

// This definition has the type id "_ZTSFvlE" because it uses the CFI types for
// cross-language LLVM CFI support, similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust(_: c_long) {
    println!("Hello, world!");
}

// This definition has the type id "_ZTSFvlE" because it uses the CFI types for
// cross-language LLVM CFI support, similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust_again(_: c_long) {
    println!("Hello from Rust again!");
}

// This definition also has the type id "_ZTSFvPFvlElE" because it uses the CFI
// types for cross-language LLVM CFI support, similarly to the hello_from_c
// declaration above--this can be ignored for the purposes of this example.
fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) {
    // This indirect call site tests whether the destination pointer is a member
    // of the group derived from the same type id of the f declaration, which
    // has the type id "_ZTSFvlE" because it uses the CFI types for
    // cross-language LLVM CFI support, similarly to the hello_from_c
    // declaration above.
    unsafe { f(arg) }
}

// This definition has the type id "_ZTSFvvE"--this can be ignored for the
// purposes of this example.
fn main() {
    // This demonstrates an indirect call within Rust-only code using the same
    // encoding for hello_from_rust and the test at the indirect call site at
    // indirect_call (i.e., "_ZTSFvlE").
    indirect_call(hello_from_rust, c_long(5));

    // This demonstrates an indirect call across the FFI boundary with the Rust
    // compiler and Clang using the same encoding for hello_from_c and the test
    // at the indirect call site at indirect_call (i.e., "_ZTSFvlE").
    indirect_call(hello_from_c, c_long(5));

    // This demonstrates an indirect call to a function passed as a callback
    // across the FFI boundary with the Rust compiler and Clang using the same
    // encoding for the hello_from_rust_again and the test at the indirect call
    // site at indirect_call_from_c (i.e., "_ZTSFvlE").
    unsafe {
        indirect_call_from_c(hello_from_rust_again, c_long(5));
    }
}
```

Fig. 5. Example Rust program using Rust integer types and the Rust compiler
encoding with the cfi\_types crate types.

This new set of C types allows the Rust compiler to identify and correctly
encode C types in `extern "C"` function types indirectly called across the FFI
boundary when CFI is enabled (see Fig. 5).

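
As a rough illustration of the approach, a CFI type can be pictured as a transparent wrapper around the corresponding C type alias. The sketch below is an assumption about the general shape, not this crate's actual source: the local `c_long` struct is a stand-in, and the `cfi_encoding` attribute is left in comments because it requires a nightly compiler with the `cfi_encoding` feature enabled.

```rust
// In a nightly build, the crate root would also need:
// #![feature(cfi_encoding)]
use std::ffi;
use std::mem::{align_of, size_of};

/// Stand-in for a CFI type such as `cfi_types::c_long` (assumed shape).
#[allow(non_camel_case_types)]
// #[cfi_encoding = "l"] // nightly-only; "l" is the Itanium encoding of `long`
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct c_long(pub ffi::c_long);

fn main() {
    let value = c_long(5);

    // repr(transparent) guarantees the wrapper has the same layout as the
    // wrapped C type, so it can be passed across the FFI boundary unchanged.
    assert_eq!(size_of::<c_long>(), size_of::<ffi::c_long>());
    assert_eq!(align_of::<c_long>(), align_of::<ffi::c_long>());

    println!("{:?} wraps {}", value, value.0);
}
```

Because the wrapper is a distinct user-defined type rather than an alias, it survives alias resolution, which gives the compiler something to attach a C-compatible encoding to.
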

Contributing
------------

See [CONTRIBUTING.md](CONTRIBUTING.md).

License
-------

Licensed under the Apache License, Version 2.0 or the MIT License. See
[LICENSE-APACHE](LICENSE-APACHE) or [LICENSE-MIT](LICENSE-MIT) for license text
and copyright information.