In the last post we covered Wasmjit’s mitigations for Spectre Variant 1, also known as Bounds Check Bypass, or BCB. In this post we’ll cover Wasmjit’s mitigations for Spectre Variant 2, also known as Branch Target Injection, or BTI.

Description of the Vulnerability

Like BCB, the primary danger of BTI is unintended information leakage through the CPU’s cache after speculative execution down a mispredicted path. Specifically, BTI is a vulnerability in how the branch predictor handles indirect branches on certain CPUs. An indirect branch is a branch whose destination is not encoded in the instruction itself but read at run time from a register or from memory, for example:

    jmp *%rax

With detailed knowledge of a CPU’s branch predictor internals, an attacker can exploit weaknesses in the predictor to control the destination of an indirect branch during speculative execution. Once the attacker can redirect mis-speculated execution to arbitrary locations, they can use its observable effects on the cache to infer the value of data that would otherwise be inaccessible.

There are multiple ways for an attacker to manipulate the CPU’s indirect branch predictor. One particularly insidious way allows an attacker to influence the indirect branch predictor from another thread. As in the last post, I’ll direct you to Google Project Zero’s post for more details.

Retpoline Mitigation for BTI

Different CPUs may use different methods for predicting indirect branches and, in turn, may require different BTI mitigations. For x86_64 CPUs, the de facto mitigation technique was invented by engineers at Google and is called “retpoline”. The retpoline technique is also recommended by Intel.

In many modern x86_64 CPUs, the destination of a ret instruction is predicted differently from the destination of an indirect jmp or call. ret is predicted using a return stack buffer, which records the return addresses pushed by recent call instructions, while indirect jmp and call are predicted by the indirect branch predictor, which is what BTI attacks. Retpoline takes advantage of that difference by performing indirect branches with the ret instruction instead of the usual jmp or call instructions. To show how that works, first consider an indirect jmp:

    jmp *%rax

Now consider the equivalent retpoline sequence:

    call go
loop:
    pause
    lfence
    jmp loop
go:
    mov %rax, (%rsp)
    ret

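When the call instruction executes, the CPU pushes the return address, i.e. the address of loop:, onto the top of the stack at (%rsp) and branches to go:. At go:, retpoline overwrites the top of the stack with the desired destination address in %rax. ret then pops that address off the stack and branches to it. If the destination address is not yet known when ret executes, the CPU predicts the target from its return stack buffer, which recorded loop: as the return address when call executed. It therefore speculatively executes the pause/lfence loop, which does no useful work, until the real destination is resolved and the misprediction is discarded. Because this prediction does not come from the indirect branch predictor, an attacker who has mistrained that predictor cannot steer speculation to a destination of their choosing.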
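
To make the layout concrete, here is a minimal C sketch that writes the x86_64 machine code for the retpoline above into a buffer, the way a JIT might. The function and macro names are hypothetical and are not Wasmjit’s actual API. The bytes are the standard encodings of each instruction, and the labels loop: and go: become fixed offsets, so the call and jmp displacements are computed relative to the end of each branch instruction, as the comments show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length in bytes of the retpoline replacement for "jmp *%rax". */
#define RETPOLINE_JMP_RAX_LEN 17

/* Hypothetical emitter: copies the encoded retpoline into buf and returns
 * the number of bytes written. Offsets in the comments are relative to the
 * start of the sequence. */
static size_t emit_retpoline_jmp_rax(uint8_t *buf, size_t cap)
{
    static const uint8_t code[RETPOLINE_JMP_RAX_LEN] = {
        0xE8, 0x07, 0x00, 0x00, 0x00, /*  0: call go  (rel32 = 12 - 5 = 7)  */
        0xF3, 0x90,                   /*  5: loop: pause                    */
        0x0F, 0xAE, 0xE8,             /*  7: lfence                         */
        0xEB, 0xF9,                   /* 10: jmp loop (rel8 = 5 - 12 = -7)  */
        0x48, 0x89, 0x04, 0x24,       /* 12: go: mov %rax, (%rsp)           */
        0xC3,                         /* 16: ret                            */
    };
    assert(cap >= sizeof code); /* caller must provide enough space */
    memcpy(buf, code, sizeof code);
    return sizeof code;
}
```

A JIT would emit these 17 bytes wherever it would otherwise emit the two-byte plain jmp *%rax (FF E0), trading a little code size and a guaranteed misprediction for immunity to a poisoned indirect branch predictor.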

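The corresponding retpoline sequence for call *%rax works on the same principle, with one addition: it first executes a call that pushes the real return address, and only then enters a retpoline like the one above. When the destination function eventually executes its own ret, it returns to the instruction following the whole sequence, just as it would after a plain call.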
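
One way to lay this out is sketched below in the same style as before. The label and function names are hypothetical, and this is an illustration of the technique under those assumptions rather than the exact sequence Wasmjit emits. The outer call at offset 19 pushes the address just past the sequence, then jumps back to an inner retpoline that redirects to %rax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length in bytes of the retpoline replacement for "call *%rax". */
#define RETPOLINE_CALL_RAX_LEN 24

/*
 * Hypothetical emitter for a retpoline version of "call *%rax":
 *
 *         jmp  set_up_return
 *   inner_call:
 *         call set_up_target
 *   capture_spec:
 *         pause
 *         lfence
 *         jmp  capture_spec
 *   set_up_target:
 *         mov  %rax, (%rsp)
 *         ret
 *   set_up_return:
 *         call inner_call    (pushes the real return address)
 *   execution resumes here when the target returns
 */
static size_t emit_retpoline_call_rax(uint8_t *buf, size_t cap)
{
    static const uint8_t code[RETPOLINE_CALL_RAX_LEN] = {
        0xEB, 0x11,                   /*  0: jmp set_up_return (19 - 2)     */
        0xE8, 0x07, 0x00, 0x00, 0x00, /*  2: call set_up_target (14 - 7)    */
        0xF3, 0x90,                   /*  7: pause                          */
        0x0F, 0xAE, 0xE8,             /*  9: lfence                         */
        0xEB, 0xF9,                   /* 12: jmp capture_spec (7 - 14)      */
        0x48, 0x89, 0x04, 0x24,       /* 14: mov %rax, (%rsp)               */
        0xC3,                         /* 18: ret                            */
        0xE8, 0xEA, 0xFF, 0xFF, 0xFF, /* 19: call inner_call (2 - 24 = -22) */
    };
    assert(cap >= sizeof code); /* caller must provide enough space */
    memcpy(buf, code, sizeof code);
    return sizeof code;
}
```

When the target function returns, the stack top is the address pushed by the outer call (offset 24), so execution continues right after the sequence, and the return stack buffer entry for that outer call still matches the real return address.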

Wasmjit Mitigations

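Every indirect jump in Wasmjit’s own compiled code is potentially vulnerable to BTI. Fortunately, for code compiled ahead of time, the compiler can apply the mitigation automatically. GCC, for example, offers -mindirect-branch=thunk and Clang offers -mretpoline; both route indirect jumps and calls through retpoline thunks.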
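
As an illustration of the kind of code these options affect, the C sketch below (not Wasmjit code; all names are made up for the example) calls through a table of function pointers. Because the destination is only known at run time, the compiler has to emit an indirect call or jump here. The compiler commands in the comment are examples; the exact assembly and thunk names produced depend on the compiler version and other options.

```c
#include <assert.h>

/*
 * Illustrative example: calling through a function pointer is an indirect
 * branch, so on x86_64 the compiler typically emits something like
 * "call *%rax" or an indirect jmp for the call in apply().
 *
 * Compiling with a retpoline option, for example
 *   gcc   -O2 -S -mindirect-branch=thunk apply.c
 *   clang -O2 -S -mretpoline apply.c
 * should instead route that branch through a retpoline thunk, which can be
 * checked by reading the generated assembly.
 */
typedef int (*binop_fn)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

static const binop_fn ops[] = { add, sub, mul };

int apply(unsigned op, int a, int b)
{
    assert(op < sizeof ops / sizeof ops[0]); /* op must index into ops */
    return ops[op](a, b);                    /* the indirect call */
}
```

For example, apply(2, 6, 3) dispatches to mul and returns 18. The source does not change at all; only the compiler flag decides whether the branch goes through a retpoline.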

Since Wasmjit is a JIT, however, it must also make sure not to emit vulnerable indirect branches at run time. Two WebAssembly instructions require an indirect branch: br_table and call_indirect. Additionally, Wasmjit emits indirect jumps in a few other places for convenience of implementation. Mitigating all of these only required changing the instruction sequences Wasmjit emits in those places to use retpolines.
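
To see why br_table needs an indirect branch, consider the choice it makes at run time. Per the WebAssembly spec, br_table carries a vector of labels plus a default label, and an operand i selects labels[i] when i is in range and the default label otherwise. The C sketch below models that choice using plain code addresses; the function name and the address representation are hypothetical. call_indirect is similar: it looks up a function in a table at a run-time index (trapping on an out-of-range index or a signature mismatch) and calls it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of br_table target selection. A JIT typically
 * implements this as a lookup in a table of code addresses followed by an
 * indirect jump to the result, which is the jump that must be emitted as
 * a retpoline.
 */
static uintptr_t br_table_target(const uintptr_t *labels, size_t num_labels,
                                 uintptr_t default_label, uint32_t i)
{
    assert(num_labels == 0 || labels != NULL);
    /* Out-of-range operands fall through to the default label. */
    return i < num_labels ? labels[i] : default_label;
}
```

Since i is only known at run time, the final jump cannot be a direct branch, so this is exactly where a plain jmp *%rax would otherwise appear.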

Final Words

The burden BTI places on general software developers is relatively low compared to BCB. It really only affects projects written in assembly, or projects that generate machine code, like compilers and JITs. The impact on Wasmjit is minimal but persistent: BTI will need to be considered each time a new JIT backend is added. For now, Wasmjit only supports x86_64, but it may need to address BTI again when AArch64 support is added.