cargo-compiler-interrupts 3.0.1

Cargo subcommands that integrate the Compiler Interrupts to the package
# Documentation

## Objective

Since [Compiler Interrupts](https://pldi21.sigplan.org/details/pldi-2021-papers/82/Frequent-Background-Polling-on-a-Shared-Thread-using-Light-Weight-Compiler-Interrupt) is an LLVM pass, we want to extend `cargo` so that it can seamlessly apply a third-party LLVM pass to the binary during compilation. `cargo-compiler-interrupts` is made specifically to integrate Compiler Interrupts with a single command.

## Usage

`cargo-compiler-interrupts` provides three subcommands:

``` sh
cargo-build-ci
Compile and integrate the Compiler Interrupts to a local package

USAGE:
    cargo-build-ci [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -r, --release    Build artifacts in release mode
    -v, --verbose    Use verbose output (-vv very verbose output)
    -V, --version    Prints version information

OPTIONS:
    -t, --target <TRIPLE>    Build for the target triple
```

``` sh
cargo-run-ci
Run a Compiler Interrupts-integrated binary

USAGE:
    cargo-run-ci [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -r, --release    Run the binary in release mode
    -v, --verbose    Use verbose output (-vv very verbose output)
    -V, --version    Prints version information

OPTIONS:
    -b, --bin <BINARY_NAME>    Name of the binary
    -t, --target <TRIPLE>      Target triple for the binary
```

``` sh
cargo-lib-ci
Manage the Compiler Interrupts library

USAGE:
    cargo-lib-ci [FLAGS] [OPTIONS] <--install|--uninstall>

FLAGS:
    -h, --help         Prints help information
    -i, --install      Install the library
    -u, --uninstall    Uninstall the library
    -v, --verbose      Use verbose output (-vv very verbose output)
    -V, --version      Prints version information

OPTIONS:
    -a, --args <args>...    Set default arguments for the library
    -p, --path <path>       Path to the library when installing
```
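
A typical session would plausibly chain the three subcommands: install the library once, build the package, then run the binary. The sketch below assembles those invocations with `std::process::Command`, using only flags listed in the help output above. The ordering, the helper name `ci_workflow`, and the binary name `my-app` are assumptions for illustration, and the commands are only constructed and printed, not executed.

```rust
use std::process::Command;

/// Builds the three invocations in the order a first-time user would
/// plausibly run them (assumed order: install, build, run).
fn ci_workflow(release: bool, bin: &str) -> Vec<Command> {
    // One-time setup: install the Compiler Interrupts library.
    let mut install = Command::new("cargo");
    install.args(["lib-ci", "--install"]);

    // Compile the package and integrate the Compiler Interrupts.
    let mut build = Command::new("cargo");
    build.arg("build-ci");

    // Run the CI-integrated binary, selected by name.
    let mut run = Command::new("cargo");
    run.args(["run-ci", "--bin", bin]);

    // Build and run must agree on the build mode.
    if release {
        build.arg("--release");
        run.arg("--release");
    }
    vec![install, build, run]
}

fn main() {
    // `my-app` is a hypothetical binary name.
    for cmd in ci_workflow(true, "my-app") {
        // Print instead of executing; call `status()` on a mutable
        // `Command` to actually run it.
        println!("{:?}", cmd);
    }
}
```

Keeping the `--release` flag consistent between `build-ci` and `run-ci` matters because both subcommands take the build mode as an input.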

## Project structure

`cargo-compiler-interrupts` has the following project structure; some files are omitted for brevity.

``` sh
├── Cargo.toml
├── crates
│   └── cargo-util
└── src
    ├── args.rs
    ├── bin
    │   ├── build.rs
    │   ├── library.rs
    │   └── run.rs
    ├── cargo.rs
    ├── config.rs
    ├── error.rs
    ├── lib.rs
    ├── ops
    │   ├── build.rs
    │   ├── library.rs
    │   ├── mod.rs
    │   └── run.rs
    └── util.rs
```

## What are those files?

- Root directory
  - `Cargo.toml` is the manifest file that contains the configuration of the package.
  - `crates` contains local external dependencies. For now, it has only `cargo-util` which is extracted from `cargo`.
  - `src` contains the source code.
  
- `src` directory
  - `args.rs` — handles the CLI interface.
  - `cargo.rs` — `cargo` commands.
  - `config.rs` — handles library configuration.
  - `error.rs` — defines errors.
  - `util.rs` — helper functions.
  - `bin` directory — entry function of subcommands.
  - `ops` directory — main routine of subcommands.

## How does it work?

1. `cargo build-ci` invokes `cargo build` with `RUSTC_LOG=rustc_codegen_ssa::back::link=info` so that the internal linker invocations are logged. It also adds the following extra flags to every `rustc` invocation:
    - `--emit=llvm-ir`: emit LLVM IR in the textual LLVM assembly language format.
    - `-C save-temps=y`: keep all temporary output files produced during the compilation.
    - `-C relocation-model=static`: do not generate PIC/PIE.
    - `-C passes=...`: LLVM optimization passes that reduce CI overhead.
2. After `cargo build` completes, we should have the following:
    - Output from `cargo build` that contains the internal linker commands generated by `rustc` for every library and binary.
    - Object `*.o` files and LLVM IR `*.ll` files (in the LLVM assembly language) in the `$CARGO_TARGET_DIR/<build_mode>/deps` directory. Moreover, each file should have a corresponding intermediate version that contains `rcgu` (Rust codegen unit) in its name.
    - Rust static libraries with extra metadata (`*.rlib` files). These files are generated if the project has extra modules and dependencies.
3. Run `opt` on all intermediate LLVM IR `*.ll` files to integrate the Compiler Interrupts. All CI-integrated files have the suffix `_ci` in their name.
4. Run `llc` to convert the CI-integrated LLVM IR `*.ll` files to object `*.o` files.
5. Parse the output from `cargo build` to get the linker command for the binary. The linker command consists of a variety of arguments relating to the output file, linking rust-std/system libraries, and specifying `*.rlib` dependencies for the binary.
6. Find the allocator shim, which is a special intermediate object file that contains the symbols for the Rust memory allocator. `rustc` generates the allocator shim automatically behind the scenes.
7. Replace the object file in the `*.rlib` with the CI-integrated one.
8. Execute the linker command again to output the final CI-integrated binary.
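
Steps 1 and 3 above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the crate's actual implementation: forwarding the flags through the `RUSTFLAGS` environment variable, the helper names, the sample file name, and the exact placement of the `_ci` suffix (before the extension) are all assumptions. The `-C passes=...` flag is left out because the pass list is not given here.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Step 1: a `cargo build` invocation that logs linker commands and
/// passes the extra flags to every `rustc` call.
/// Assumption: the flags are forwarded through `RUSTFLAGS`.
fn cargo_build_command(release: bool) -> Command {
    let flags = [
        "--emit=llvm-ir",
        "-C", "save-temps=y",
        "-C", "relocation-model=static",
    ];
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .env("RUSTC_LOG", "rustc_codegen_ssa::back::link=info")
        .env("RUSTFLAGS", flags.join(" "));
    if release {
        cmd.arg("--release");
    }
    cmd
}

/// Step 3 input filter: intermediate IR files are `*.ll` files whose
/// name contains `rcgu`.
fn is_intermediate_ir(path: &Path) -> bool {
    let is_ll = path.extension().map_or(false, |ext| ext == "ll");
    let has_rcgu = path
        .file_name()
        .map_or(false, |name| name.to_string_lossy().contains("rcgu"));
    is_ll && has_rcgu
}

/// Step 3 output name: insert `_ci` before the extension.
/// Assumption: the real tool may place the suffix differently.
fn ci_output_path(input: &Path) -> PathBuf {
    let stem = input
        .file_stem()
        .map_or(String::new(), |s| s.to_string_lossy().into_owned());
    let ext = input
        .extension()
        .map_or(String::new(), |e| e.to_string_lossy().into_owned());
    input.with_file_name(format!("{}_ci.{}", stem, ext))
}

fn main() {
    // Constructed and printed only; nothing is executed here.
    let cmd = cargo_build_command(true);
    println!("{:?}", cmd);

    // Hypothetical file name in the `deps` directory.
    let ir = Path::new("target/release/deps/demo-1a2b.demo.3c4d.rcgu.ll");
    if is_intermediate_ir(ir) {
        println!("{} -> {}", ir.display(), ci_output_path(ir).display());
    }
}
```

The remaining steps (`opt`, `llc`, parsing the linker command, and relinking) depend on details of the Compiler Interrupts library and of the `rustc` log format, so they are not sketched here.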

## Limitations

- `cargo build` first produces the artifacts, we then replace the object files with the CI-integrated ones, and finally we invoke the linker one more time to produce the new CI-integrated binary. Therefore, we have to compile the binary twice.
- Assuming the Compiler Interrupts does not depend on built-in `opt` optimizations, we could change `rustc` so that it loads and registers a third-party LLVM pass during compilation. This would eliminate the separate `opt` stage and the linking step after it, so the whole process could be done in one go. As a matter of fact, `clang` supports loading and registering a third-party LLVM pass by running `clang -Xclang -load -Xclang mypass.so`, although its usage is more complicated than `opt` and it does not support the built-in passes from `opt`. Currently, there is a [request](https://github.com/rust-lang/compiler-team/issues/419) to the Rust compiler team to enable this functionality.
- Since we have to depend on the build output, `cargo-compiler-interrupts` might not be robust against major changes to that output.
- Compiler Interrupts integration is slow on large LLVM IR files from crates such as `clap`, `derive`, `proc`, `regex`, `serde`, `syn`, `toml`, and others. We roughly estimate that the integration process takes about an hour for 500,000 lines of IR on an x86-64 quad-core machine.
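
To get a feel for the last point, the rough figure above can be turned into a back-of-the-envelope estimate. The sketch below assumes that integration time scales linearly with the number of IR lines, which is an assumption on our part since only a single data point is given; the function name and the sample line counts are hypothetical.

```rust
/// Throughput taken from the rough estimate above:
/// about 500,000 IR lines per hour.
const LINES_PER_HOUR: f64 = 500_000.0;

/// Estimated integration time in minutes.
/// Assumption: time grows linearly with the line count.
fn estimated_minutes(ir_lines: u64) -> f64 {
    ir_lines as f64 / LINES_PER_HOUR * 60.0
}

fn main() {
    // Hypothetical project sizes, in lines of IR.
    for lines in [50_000u64, 500_000, 2_000_000] {
        println!(
            "{:>9} lines: about {:.0} min",
            lines,
            estimated_minutes(lines)
        );
    }
}
```

Actual times will depend on the machine, the build mode, and the shape of the IR, so treat the result as an order-of-magnitude guide only.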