//! Heuristics for correcting instruction pointers based on the CPU architecture.
use crate;
const SIGILL: u32 = 4;
const SIGBUS: u32 = 10;
const SIGSEGV: u32 = 11;
/// Helper to work with instruction addresses.
///
/// Directly symbolicated stack traces may show the wrong calling symbols, as the stack frame's
/// return addresses point a few bytes past the original call site, which may place the address
/// within a different symbol entirely.
///
/// The most useful function is [`caller_address`], which applies some heuristics to determine the
/// call site of a function call based on the return address.
///
/// # Examples
///
/// ```
/// use symbolic_common::{Arch, InstructionInfo};
///
/// const SIGSEGV: u32 = 11;
///
/// let caller_address = InstructionInfo::new(Arch::Arm64, 0x1337)
///     .is_crashing_frame(false)
///     .signal(Some(SIGSEGV))
///     .ip_register_value(Some(0x4242))
///     .caller_address();
///
/// assert_eq!(caller_address, 0x1330);
/// ```
///
/// # Background
///
/// When *calling* a function, it is necessary for the *called* function to know where it should
/// return to upon completion. To support this, a *return address* is supplied as part of the
/// standard function call semantics. This return address specifies the instruction that the called
/// function should jump to upon completion of its execution.
///
/// When a crash reporter generates a backtrace, it first collects the thread state of all active
/// threads, including the **actual** current execution address. The reporter then iterates over
/// those threads, walking backwards to find calling frames – what it's actually finding during this
/// process are the **return addresses**. The actual address of the call instruction is not recorded
/// anywhere. The only address available is the address at which execution should resume after
/// function return.
///
/// To make things more complicated, there is no guarantee that a return address be set to exactly
/// one instruction after the call. It's entirely proper for a function to remove itself from the
/// call stack by setting a different return address entirely. This is why you never see
/// `objc_msgSend` in your backtrace unless you actually crash inside of `objc_msgSend`. When
/// `objc_msgSend` jumps to a method's implementation, it leaves its caller's return address in
/// place, and `objc_msgSend` itself disappears from the stack trace. In the case of `objc_msgSend`,
/// the loss of that information is of no great importance, but it's hardly the only function that
/// elides its own code from the return address.
///
/// # Heuristics
///
/// To resolve this particular issue, it is necessary for the symbolication implementor to apply
/// per-architecture heuristics to the return addresses, and thus derive the **likely** address of
/// the actual calling instruction. There is a high probability of correctness, but absolutely no
/// guarantee.
///
/// This derived address **should** be used as the symbolication address, but **should not** replace
/// the return address in the crash report. This derived address is a best guess, and if you replace
/// the return address in the report, the end-user will have lost access to the original canonical
/// data from which they could have made their own assessment.
///
/// These heuristics must not be applied to frame #0 on any thread. The first frame of all threads
/// contains the actual register state of that thread at the time that it crashed (if it's the
/// crashing thread), or at the time it was suspended (if it is a non-crashing thread). These
/// heuristics should only be applied to frames *after* frame #0 – that is, starting with frame #1.
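///
/// As a rough sketch of the two rules above, the following keeps the reported address untouched,
/// stores the derived address separately, and leaves frame #0 unadjusted. The `Frame` struct, the
/// `derive_frames` function, and the `adjust` callback are hypothetical names used for illustration
/// only; they are not part of this crate.
///
/// ```rust
/// // Hypothetical frame record: the reported address is kept next to the derived one.
/// #[derive(Debug, PartialEq)]
/// struct Frame {
///     // Address exactly as recorded in the crash report; never overwritten.
///     reported_address: u64,
///     // Best-guess address that is handed to the symbolicator.
///     symbolication_address: u64,
/// }
///
/// // Applies `adjust` (some per-architecture heuristic) to every frame except frame #0.
/// fn derive_frames(addresses: &[u64], adjust: fn(u64) -> u64) -> Vec<Frame> {
///     addresses
///         .iter()
///         .enumerate()
///         .map(|(index, &address)| Frame {
///             reported_address: address,
///             // Frame #0 holds the actual register state, so it is used unmodified.
///             symbolication_address: if index == 0 { address } else { adjust(address) },
///         })
///         .collect()
/// }
/// ```
///
/// With an `adjust` callback that subtracts one byte, the addresses `[0x1000, 0x2005]` yield the
/// symbolication addresses `0x1000` and `0x2004`, while both reported addresses stay unchanged.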
///
/// Additionally, these heuristics assume that your symbolication implementation correctly handles
/// addresses that occur within an instruction, rather than directly at the start of a valid
/// instruction. This should be the case for any reasonable implementation, but is something to be
/// aware of when deploying these changes.
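///
/// To illustrate what handling such addresses can look like, here is a minimal sketch of a
/// range-based symbol lookup that accepts any address inside a symbol, not only its start. The
/// `Symbol` struct and the `lookup` function are hypothetical and assume a table sorted by start
/// address; real symbolicators may use debug line tables or other structures instead.
///
/// ```rust
/// // Hypothetical symbol table entry covering the half-open range `[start, start + size)`.
/// struct Symbol {
///     name: &'static str,
///     start: u64,
///     size: u64,
/// }
///
/// // Finds the symbol containing `address`, assuming `symbols` is sorted by `start`.
/// fn lookup<'a>(symbols: &'a [Symbol], address: u64) -> Option<&'a Symbol> {
///     // Index of the first symbol that starts after `address`; the candidate precedes it.
///     let index = symbols.partition_point(|symbol| symbol.start <= address);
///     let candidate = symbols.get(index.checked_sub(1)?)?;
///     // Accept the candidate only if the address falls inside its range.
///     (address - candidate.start < candidate.size).then_some(candidate)
/// }
/// ```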
///
/// ## x86 and x86-64
///
/// x86 uses variable-width instruction encodings, so the start of the calling instruction cannot be
/// derived from the return address alone. Subtract one byte from the return address to obtain an
/// address that should lie within the calling instruction directly preceding it.
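///
/// A minimal sketch of this adjustment, using a hypothetical helper `x86_caller_address` that is
/// not part of this crate:
///
/// ```rust
/// // Hypothetical helper: derives a symbolication address from an x86 return address.
/// fn x86_caller_address(return_address: u64) -> u64 {
///     // Step back one byte so the address lands inside the preceding call instruction.
///     // Saturate so that a zero (invalid) return address does not underflow.
///     return_address.saturating_sub(1)
/// }
/// ```
///
/// For a return address of `0x1005`, this yields `0x1004`.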
///
/// ## ARMv6 and ARMv7
///
/// - **Step 1:** Strip the low-order Thumb bit from the return address. ARM uses the low bit to
///   inform the processor that it should enter Thumb mode when jumping to the return address. Since
///   all instructions are at least 2-byte aligned, an actual instruction address will never have
///   the low bit set.
///
/// - **Step 2:** Subtract 2 bytes. 32-bit ARM instructions are either 2 or 4 bytes long, depending
///   on the use of Thumb. This will place the symbolication address within the likely calling
///   instruction.
///
/// ## ARM64
///
/// All ARM64 instructions are 4 bytes long; subtract 4 bytes from the return address to derive the
/// likely address of the calling instruction.
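///
/// The ARM adjustments described above can be sketched as follows. Both helper functions are
/// hypothetical and follow the prose literally; they are not this crate's implementation. Note
/// that the example at the top of this page yields an aligned address (`0x1330` for `0x1337`), so
/// [`caller_address`] may return a different address within the same instruction.
///
/// ```rust
/// // Hypothetical helper for 32-bit ARM (ARMv6 and ARMv7).
/// fn arm32_caller_address(return_address: u64) -> u64 {
///     // Step 1: clear the low-order Thumb bit.
///     let stripped = return_address & !1;
///     // Step 2: step back 2 bytes into the likely calling instruction.
///     stripped.saturating_sub(2)
/// }
///
/// // Hypothetical helper for ARM64, where every instruction is 4 bytes long.
/// fn arm64_caller_address(return_address: u64) -> u64 {
///     return_address.saturating_sub(4)
/// }
/// ```
///
/// For example, a 32-bit ARM return address of `0x1005` becomes `0x1002`, and an ARM64 return
/// address of `0x1338` becomes `0x1334`.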
///
/// # More Information
///
/// The above information was taken and slightly updated from the now-gone *PLCrashReporter Wiki*.
/// An old copy can still be found in the [Internet Archive].
///
/// [internet archive]: https://web.archive.org/web/20161012225323/https://opensource.plausible.coop/wiki/display/PLCR/Automated+Crash+Report+Analysis
/// [`caller_address`]: struct.InstructionInfo.html#method.caller_address