timing_shield/barriers.rs
1/// Identity function accepting a `u8` as input and outputting that same `u8` as
2/// output, while blocking the compiler from applying optimizations across this node in the data
3/// dependence graph.
4///
5/// This is an **internal utility** of rust-timing-shield that users of the framework will not
6/// normally need to use.
7///
8/// # Background
9///
10/// rust-timing-shield has essentially two goals: (1) prevent the programmer from writing code that
11/// they shouldn't, and (2) prevent the compiler from transforming a programmer's secure code
12/// into vulnerable code. This function is a small primitive that assists in accomplishing the 2nd
13/// goal.
14///
15/// An important rule of protecting code from timing attacks is to ensure that a program never
16/// branches on a secret value. In rust-timing-shield terminology, we should never emit a
17/// branch conditional on a `TpBool`. The Rust compiler uses many LLVM optimization passes that may
18/// produce a conditional branch where none was written by the programmer, so we must remove the
19/// compiler's ability to identify situations where a conditional branch could reasonably be
20/// introduced. The most problematic optimizations identified so far output [an LLVM IR `select`
21/// instruction](https://llvm.org/docs/LangRef.html#select-instruction).
22///
23/// Of course, it is legal for the compiler to emit `select` instructions in any situation as long
24/// as the output and side effects of the code remain the same. For example, it could expand this
25/// code:
26///
27/// ```
28/// fn add(a: u8, b: u8) -> u8 {
29/// a + b
30/// }
31/// ```
32///
33/// to:
34///
35/// ```
36/// fn add(a: u8, b: u8) -> u8 {
37/// if a == 5 && b == 5 {
38/// return 10;
39/// }
40/// if a == 7 && b == 5 {
41/// return 12;
42/// }
43/// a + b
44/// }
45/// ```
46///
47/// The current implementation of rust-timing-shield assumes that this is ridiculous and wouldn't
48/// happen. In particular, an assumption is made that the compiler will only insert a `select` for
49/// "values of boolean origin". For example, this branchless code:
50///
51/// ```
52/// fn add_bool(number: u8, b: bool) -> u8 {
53/// number + (b as u8)
54/// }
55/// ```
56///
57/// could reasonably, under this assumption, be rewritten with a branch:
58///
59/// ```
60/// fn add_bool(number: u8, b: bool) -> u8 {
61/// if b {
62/// return number + 1;
63/// }
64///
65/// number
66/// }
67/// ```
68///
69/// since `b as u8` is a "value of boolean origin". Based on the empirical tests of the Rust
70/// compiler and a review of how LLVM optimization passes are implemented, this assumption appears
71/// to be reasonable.
72///
73/// Now, with this assumption in place, rust-timing-shield only needs to ensure that all
74/// timing-protected values either are not of boolean origin or hide their origin from the
75/// compiler. This is accomplished in two steps:
76///
77/// 1. All timing-protected boolean values (`TpBool`) are stored as `u8` values and never as
78/// `bool`, even in intermediate computations.
79/// 2. `TpBool::protect(bool)` converts the `bool` to a `u8` and uses an optimization barrier (this
80/// function) to hide the origin of the `u8`.
81///
82/// # Usage
83///
84/// `optimization_barrier_u8` is a low-cost (but not zero-cost) way to hide a value's origin from the compiler. If a
85/// value passes through an optimization barrier, the compiler will be able to perform
86/// optimizations on computations prior to the barrier and optimizations on computations after the
87/// barrier, but will be unable to perform optimizations *across* the barrier.
88///
89/// It is important that the barrier interrupt the flow of a value from one part of the program to
90/// the next. As an example of incorrect usage:
91///
92/// ```
93/// # use timing_shield::barriers::optimization_barrier_u8;
94/// #
95/// fn add_bool(number: u8, secret_condition: bool) -> u8 {
96/// let secret_condition_u8 = secret_condition as u8;
97///
98/// // WRONG: barrier does not interrupt data flow
99/// optimization_barrier_u8(secret_condition_u8);
100///
101/// number + secret_condition_u8
102/// }
103/// ```
104///
105/// Here, the return value from `optimization_barrier_u8` is unused, so the optimization barrier
106/// has no effect. An optimization barrier must be the single connection between two parts of the
107/// program's data dependence graph:
108///
109/// ```
110/// # use timing_shield::barriers::optimization_barrier_u8;
111/// #
112/// fn add_bool(number: u8, secret_condition: bool) -> u8 {
113/// let secret_condition_u8 = secret_condition as u8;
114///
115/// // Override the previous definition to avoid accidentally using the pre-barrier value
116/// let secret_condition_u8 = optimization_barrier_u8(secret_condition_u8);
117///
118/// number + secret_condition_u8
119/// }
120/// ```
121///
122/// In rust-timing-shield, optimization barriers are used to hide when a `u8` is of boolean origin.
123/// The use of a barrier prevents the compiler from identifying that the `u8` value after the
124/// barrier is the same value that was cast from a `bool` before the barrier. This suppresses the
125/// many optimizations that would transform the timing-leak-proof branchless computations with `u8`
126/// values that rust-timing-shield produces into branching computations that leak boolean values.
127///
128/// # Performance considerations
129///
130/// `optimization_barrier_u8` is currently implemented as an empty inline assembly block. The `u8`
131/// input value is provided as an input/output register to the assembly block and then immediately
132/// returned as is. Although the assembly block is a no-op, LLVM is forced to assume that the value
133/// may have changed and can make no assumptions about what the output may be. The Rust Unstable
134/// reference [describes the assembly block as a black
135/// box](https://doc.rust-lang.org/unstable-book/library-features/asm.html):
136///
137/// > “The compiler cannot assume that the instructions in the asm are the ones that will
138/// actually end up executed. This effectively means that the compiler must treat the `asm!` as a
139/// black box and only take the interface specification into account, not the instructions
140/// themselves.”
141///
142/// Since no actual assembler instructions are provided, it might seem that this function call
143/// would have zero overhead after inlining. However, there are other considerations that may
144/// affect performance:
145///
146/// - The barrier will (obviously) prevent matching on code patterns that span across the barrier.
147/// This is intended.
148/// - The barrier will force the compiler to schedule an actual register to hold the value
149/// temporarily.
150/// - The barrier will force the value into a single register, which may impair the compiler's
151/// ability to perform optimizations such as auto-vectorization.
152/// - Constant folding cannot proceed past a barrier.
153///
154/// For these reasons, optimization barriers are only used where necessary to minimize any
155/// potential impact on performance, keeping rust-timing-shield zero-cost for as many applications
156/// as possible.
157#[inline(always)]
158pub fn optimization_barrier_u8(mut value: u8) -> u8 {
159 unsafe {
160 std::arch::asm!(
161 // Rust requires us to use every register defined, so we use it inside of a comment.
162 "/* optimization_barrier_u8 {unused} */",
163
164 // Define a single input/output register called "unused".
165 // The Rust compiler will perceive this as a mutation of `value`.
166 unused = inout(reg_byte) value,
167
168 // By guaranteeing more invariants we improve the compiler's ability to optimize.
169 // Since the assembly block is a no-op, we easily uphold all of these invariants.
170 options(pure, nomem, nostack, preserves_flags)
171 );
172 }
173
174 value
175}
176
177#[cfg(test)]
178mod tests {
179 use super::*;
180 use quickcheck::quickcheck;
181
182 quickcheck! {
183 fn optimization_barrier_is_identity(value: u8) -> bool {
184 optimization_barrier_u8(value) == value
185 }
186 }
187
188 // TODO I would like to add a test that checks the compiled LLVM IR for `select` instructions.
189 // This would help confirm that the optimization barrier is working correctly.
190 //
191 // According to my tests, the following function compiles to a single `select`:
192 //
193 // pub fn select(cond: bool, a: u32, b: u32) -> u32 {
194 // let mask = (cond as u32) - 1;
195 //
196 // (a & (!mask)) | (b & mask)
197 // }
198 //
199 // while this function is branchless in LLVM IR and x86_64:
200 //
201 // pub fn select(cond: bool, a: u32, b: u32) -> u32 {
202 // let mask = (optimization_barrier_u8(cond as u8) as u32) - 1;
203 //
204 // (a & (!mask)) | (b & mask)
205 // }
206 //
207 // Tested with `cargo llvm-ir` from the `cargo-asm` tool.
208}