cfi-types 0.0.8

CFI types for cross-language LLVM CFI support
Documentation
cfi\_types
==========

![Build Status](https://github.com/rcvalle/rust-crate-cfi-types/workflows/build/badge.svg)

CFI types for cross-language LLVM CFI support.


Installation
------------

To install the `cfi_types` crate:

1. On a command prompt or terminal with your package root's directory as the
   current working directory, run the following command:

       cargo add cfi-types

Or:

1. Add the `cfi_types` crate to your package root's `Cargo.toml` file and
   replace `X.Y.Z` by the version you want to install:

       [dependencies]
       cfi-types = "X.Y.Z"

2. On a command prompt or terminal with your package root's directory as the
   current working directory, run the following command:

       cargo fetch


Usage
-----

To use the `cfi_types` crate:

1. Import the CFI types from the `cfi_types` crate. E.g.:

       use cfi_types::c_long;

2. Replace uses of C type aliases by CFI types. E.g.:

       extern "C" {
           fn func(arg: c_long);
       }

       fn main() {
           unsafe { func(c_long(5)) };
       }


Background
----------

### Type metadata

LLVM uses [type metadata](https://llvm.org/docs/TypeMetadata.html) to allow IR
modules to aggregate pointers by their types. This type metadata is used by
LLVM CFI to test whether a given pointer is associated with a type identifier
(i.e., test type membership).

Clang uses the [Itanium C++
ABI](https://itanium-cxx-abi.github.io/cxx-abi/abi.html)'s [virtual tables and
RTTI](https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-special-vtables)
`typeinfo` structure name as type metadata identifiers for function pointers.

For cross-language LLVM CFI support, a compatible encoding must be used. The
compatible encoding chosen for cross-language LLVM CFI support is the Itanium
C++ ABI mangling with vendor extended type qualifiers and types for Rust types
that are not used across the FFI boundary (see Type metadata in the [design
document](https://rcvalle.com/docs/rust-cfi-design-doc.pdf)).


### Encoding C integer types

Rust defines `char` as an Unicode scalar value, while C defines `char` as an
integer type. Rust also defines explicitly-sized integer types (i.e., `i8`,
`i16`, `i32`, …), while C defines abstract integer types (i.e., `char`,
`short`, `long`, …), which actual sizes are implementation defined and may vary
across different data models. This causes ambiguity if Rust integer types are
used in `extern "C"` function types that represent C functions because the
Itanium C++ ABI specifies encodings for C integer types (e.g., `char`, `short`,
`long`, …), not their defined representations (e.g., 8-bit signed integer,
16-bit signed integer, 32-bit signed integer, …).

For example, the Rust compiler currently is unable to identify if an

```rust
extern "C" {
    fn func(arg: i64);
}
```
Fig. 1. Example extern "C" function using Rust integer type.

represents a `void func(long arg)` or `void func(long long arg)` in an LP64 or
equivalent data model.

For cross-language LLVM CFI support, the Rust compiler must be able to identify
and correctly encode C types in `extern "C"` function types indirectly called
across the FFI boundary when CFI is enabled.

For convenience, Rust provides some C-like type aliases for use when
interoperating with foreign code written in C, and these C type aliases may be
used for disambiguation. However, at the time types are encoded, all type
aliases are already resolved to their respective `ty::Ty` type representations
(i.e., their respective Rust aliased types), making it currently impossible to
identify C type aliases use from their resolved types.

For example, the Rust compiler currently is also unable to identify that an

```rust
extern "C" {
    fn func(arg: c_long);
}
```
Fig. 2. Example extern "C" function using C type alias.

used the `c_long` type alias and is not able to disambiguate between it and an
`extern "C" fn func(arg: c_longlong)` in an LP64 or equivalent data model.

Consequently, the Rust compiler is unable to identify and correctly encode C
types in `extern "C"` function types indirectly called across the FFI boundary
when CFI is enabled:

```c
#include <stdio.h>
#include <stdlib.h>

// This definition has the type id "_ZTSFvlE".
void
hello_from_c(long arg)
{
    printf("Hello from C!\n");
}

// This definition has the type id "_ZTSFvPFvlElE"--this can be ignored for the
// purposes of this example.
void
indirect_call_from_c(void (*fn)(long), long arg)
{
    // This call site tests whether the destination pointer is a member of the
    // group derived from the same type id of the fn declaration, which has the
    // type id "_ZTSFvlE".
    //
    // Notice that since the test is at the call site and is generated by Clang,
    // the type id used in the test is encoded by Clang.
    fn(arg);
}
```
Fig. 3. Example C library using C integer types and Clang encoding.

```rust
use std::ffi::c_long;

#[link(name = "foo")]
extern "C" {
    // This declaration would have the type id "_ZTSFvlE", but at the time types
    // are encoded, all type aliases are already resolved to their respective
    // Rust aliased types, so this is encoded either as "_ZTSFvu3i32E" or
    // "_ZTSFvu3i64E", depending to what type c_long type alias is resolved to,
    // which currently uses the u<length><type-name> vendor extended type
    // encoding for the Rust integer types--this is the problem demonstrated in
    // this example.
    fn hello_from_c(_: c_long);

    // This declaration would have the type id "_ZTSFvPFvlElE", but is encoded
    // either as "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E"
    // (compressed), similarly to the hello_from_c declaration above--this can
    // be ignored for the purposes of this example.
    fn indirect_call_from_c(f: unsafe extern "C" fn(c_long), arg: c_long);
}

// This definition would have the type id "_ZTSFvlE", but is encoded either as
// "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust(_: c_long) {
    println!("Hello, world!");
}

// This definition would have the type id "_ZTSFvlE", but is encoded either as
// "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust_again(_: c_long) {
    println!("Hello from Rust again!");
}

// This definition would also have the type id "_ZTSFvPFvlElE", but is encoded
// either as "_ZTSFvPFvu3i32ES_E" (compressed) or "_ZTSFvPFvu3i64ES_E"
// (compressed), similarly to the hello_from_c declaration above--this can be
// ignored for the purposes of this example.
fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) {
    // This indirect call site tests whether the destination pointer is a member
    // of the group derived from the same type id of the f declaration, which
    // would have the type id "_ZTSFvlE", but is encoded either as
    // "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c
    // declaration above.
    //
    // Notice that since the test is at the call site and is generated by the
    // Rust compiler, the type id used in the test is encoded by the Rust
    // compiler.
    unsafe { f(arg) }
}

// This definition has the type id "_ZTSFvvE"--this can be ignored for the
// purposes of this example.
fn main() {
    // This demonstrates an indirect call within Rust-only code using the same
    // encoding for hello_from_rust and the test at the indirect call site at
    // indirect_call (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E").
    indirect_call(hello_from_rust, 5);

    // This demonstrates an indirect call across the FFI boundary with the Rust
    // compiler and Clang using different encodings for hello_from_c and the
    // test at the indirect call site at indirect_call (i.e., "_ZTSFvu3i32E" or
    // "_ZTSFvu3i64E" vs "_ZTSFvlE").
    //
    // When using rustc LTO (i.e., -Clto), this works because the type id used
    // is from the Rust-declared hello_from_c, which is encoded by the Rust
    // compiler (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E").
    //
    // When using (proper) LTO (i.e., -Clinker-plugin-lto), this does not work
    // because the type id used is from the C-defined hello_from_c, which is
    // encoded by Clang (i.e., "_ZTSFvlE").
    indirect_call(hello_from_c, 5);

    // This demonstrates an indirect call to a function passed as a callback
    // across the FFI boundary with the Rust compiler and Clang using different
    // encodings for the hello_from_rust_again and the test at the indirect call
    // site at indirect_call_from_c (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E" vs
    // "_ZTSFvlE").
    //
    // When Rust functions are passed as callbacks across the FFI boundary to be
    // called back from C code, the tests are also at the call site but
    // generated by Clang instead, so the type ids used in the tests are encoded
    // by Clang, which do not match the type ids of declarations encoded by the
    // Rust compiler (e.g., hello_from_rust_again). (The same happens the other
    // way around for C functions passed as callbacks across the FFI boundary to
    // be called back from Rust code.)
    unsafe {
        indirect_call_from_c(hello_from_rust_again, 5);
    }
}
```
Fig. 4. Example Rust program using Rust integer types and the Rust compiler
encoding.

Whenever there is an indirect call across the FFI boundary or an indirect call
to a function passed as a callback across the FFI boundary, the Rust compiler
and Clang use different encodings for C integer types for function definitions
and declarations, and at indirect call sites when CFI is enabled (see Figs.
3–4).


### The cfi\_types crate

To solve the encoding C integer types problem, this crate provides a new set of
C types as user-defined types using the `cfi_encoding` attribute and
`repr(transparent)` to be used for cross-language LLVM CFI support.

```rust
use cfi_types::c_long;

#[link(name = "foo")]
extern "C" {
    // This declaration has the type id "_ZTSFvlE" because it uses the CFI types
    // for cross-language LLVM CFI support. The cfi_types crate provides a new
    // set of C types as user-defined types using the cfi_encoding attribute and
    // repr(transparent) to be used for cross-language LLVM CFI support. This
    // new set of C types allows the Rust compiler to identify and correctly
    // encode C types in extern "C" function types indirectly called across the
    // FFI boundary when CFI is enabled.
    fn hello_from_c(_: c_long);

    // This declaration has the type id "_ZTSFvPFvlElE" because it uses the CFI
    // types for cross-language LLVM CFI support--this can be ignored for the
    // purposes of this example.
    fn indirect_call_from_c(f: unsafe extern "C" fn(c_long), arg: c_long);
}

// This definition has the type id "_ZTSFvlE" because it uses the CFI types for
// cross-language LLVM CFI support, similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust(_: c_long) {
    println!("Hello, world!");
}

// This definition has the type id "_ZTSFvlE" because it uses the CFI types for
// cross-language LLVM CFI support, similarly to the hello_from_c declaration
// above.
unsafe extern "C" fn hello_from_rust_again(_: c_long) {
    println!("Hello from Rust again!");
}

// This definition also has the type id "_ZTSFvPFvlElE" because it uses the CFI
// types for cross-language LLVM CFI support, similarly to the hello_from_c
// declaration above--this can be ignored for the purposes of this example.
fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) {
    // This indirect call site tests whether the destination pointer is a member
    // of the group derived from the same type id of the f declaration, which
    // has the type id "_ZTSFvlE" because it uses the CFI types for
    // cross-language LLVM CFI support, similarly to the hello_from_c
    // declaration above.
    unsafe { f(arg) }
}

// This definition has the type id "_ZTSFvvE"--this can be ignored for the
// purposes of this example.
fn main() {
    // This demonstrates an indirect call within Rust-only code using the same
    // encoding for hello_from_rust and the test at the indirect call site at
    // indirect_call (i.e., "_ZTSFvlE").
    indirect_call(hello_from_rust, c_long(5));

    // This demonstrates an indirect call across the FFI boundary with the Rust
    // compiler and Clang using the same encoding for hello_from_c and the test
    // at the indirect call site at indirect_call (i.e., "_ZTSFvlE").
    indirect_call(hello_from_c, c_long(5));

    // This demonstrates an indirect call to a function passed as a callback
    // across the FFI boundary with the Rust compiler and Clang the same
    // encoding for the hello_from_rust_again and the test at the indirect call
    // site at indirect_call_from_c (i.e., "_ZTSFvlE").
    unsafe {
        indirect_call_from_c(hello_from_rust_again, c_long(5));
    }
}
```
Fig. 5. Example Rust program using Rust integer types and the Rust compiler
encoding with the cfi\_types crate types.

This new set of C types allows the Rust compiler to identify and correctly
encode C types in `extern "C"` function types indirectly called across the FFI
boundary when CFI is enabled (see Fig 5).


Contributing
------------

See [CONTRIBUTING.md](CONTRIBUTING.md).


License
-------

Licensed under the Apache License, Version 2.0 or the MIT License. See
[LICENSE-APACHE](LICENSE-APACHE) or [LICENSE-MIT](LICENSE-MIT) for license text
and copyright information.