Function dr_register_bb_event


pub unsafe extern "C" fn dr_register_bb_event(
    func: Option<unsafe extern "C" fn(drcontext: *mut c_void, tag: *mut c_void, bb: *mut instrlist_t, for_trace: bool_, translating: bool_) -> dr_emit_flags_t>,
)


Registers a callback function for the basic block event. DR calls \p func before inserting a new basic block into the code cache. When adding a basic block to a new trace, DR calls \p func again with \p for_trace set to true, giving the client the opportunity to keep its same instrumentation in the trace, or to change it. The original basic block’s instrumentation is unchanged by whatever action is taken in the \p for_trace call.

DR constructs dynamic basic blocks, which are distinct from a compiler’s classic basic blocks. DR does not know all entry points ahead of time, and will end up duplicating the tail of a basic block if a later entry point is discovered that targets the middle of a block created earlier, or if a later entry point targets straight-line code that falls through into code already present in a block.
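The tail duplication described above can be illustrated with a small model. This is only a sketch: `Code`, `build_block`, and `shared_tail` are hypothetical names invented for the illustration and are not part of DR, and real block building involves far more than a single "ends block" bit.

```rust
/// Hypothetical model of straight-line application code: each entry is
/// (address, ends_block). This is not a DynamoRIO type.
type Code = [(usize, bool)];

/// Builds the dynamic block for `entry`: every instruction from the entry
/// point up to and including the first block-ending instruction.
fn build_block(code: &Code, entry: usize) -> Vec<usize> {
    let mut block = Vec::new();
    for &(addr, ends_block) in code.iter().skip_while(|&&(a, _)| a != entry) {
        block.push(addr);
        if ends_block {
            break;
        }
    }
    block
}

/// Addresses present in both blocks, i.e. the duplicated tail.
fn shared_tail(first: &[usize], second: &[usize]) -> Vec<usize> {
    first.iter().copied().filter(|a| second.contains(a)).collect()
}
```

In this model, an entry point discovered later in the middle of an earlier block produces a second block whose instructions all also appear in the first one, so a client sees the same application addresses in two different callbacks.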

DR may call \p func again if it needs to translate from code cache addresses back to application addresses, which happens on faulting instructions as well as in certain situations involving suspended threads or forcibly relocated threads. The \p translating parameter distinguishes the two types of calls and is further explained below.

\p drcontext is a pointer to the input program’s machine context. Clients should not inspect or modify the context; it is provided as an opaque pointer (i.e., void *) to be passed to API routines that require access to this internal data. drcontext is specific to the current thread, but in normal configurations the basic block being created is thread-shared: thus, when allocating data structures with the same lifetime as the basic block, usually global heap (#dr_global_alloc()) is a better choice than heap tied to the thread that happened to first create the basic block (#dr_thread_alloc()). Thread-private heap is fine for temporary structures such as instr_t and instrlist_t.
\p tag is a unique identifier for the basic block fragment. Use dr_fragment_app_pc() to translate it to an application address.
\p bb is a pointer to the list of instructions that comprise the basic block. Clients can examine, manipulate, or completely replace the instructions in the list.
\p translating indicates whether this callback is for basic block creation (false) or is for address translation (true). This is further explained below.

The callback function should return a #dr_emit_flags_t flag.
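A callback matching the signature above might look like the following sketch. The type aliases, the opaque `instrlist_t`, and the value of `DR_EMIT_DEFAULT` are local stand-ins so that the example compiles on its own; in a real client they would come from the generated bindings, and their exact representation there is an assumption. `on_basic_block` and `BLOCKS_CREATED` are hypothetical names.

```rust
use std::os::raw::{c_char, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-ins for the generated bindings (assumed representations).
#[allow(non_camel_case_types)]
type bool_ = c_char;
#[allow(non_camel_case_types)]
type dr_emit_flags_t = u32;
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct instrlist_t {
    _private: [u8; 0], // opaque: only ever handled through a pointer
}
const DR_EMIT_DEFAULT: dr_emit_flags_t = 0; // assumed value

/// Hypothetical client state: number of blocks seen at creation time.
static BLOCKS_CREATED: AtomicUsize = AtomicUsize::new(0);

/// Callback with the shape expected by `dr_register_bb_event`.
unsafe extern "C" fn on_basic_block(
    _drcontext: *mut c_void,
    _tag: *mut c_void,
    _bb: *mut instrlist_t,
    for_trace: bool_,
    translating: bool_,
) -> dr_emit_flags_t {
    // Count only genuine creation calls: neither trace re-calls nor
    // translation calls should change global state.
    if for_trace == 0 && translating == 0 {
        BLOCKS_CREATED.fetch_add(1, Ordering::Relaxed);
    }
    DR_EMIT_DEFAULT
}
```

Given the `Option`-wrapped parameter in the signature, registration from client initialization would presumably be `dr_register_bb_event(Some(on_basic_block))`.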

The user is free to inspect and modify the block before it executes, but must adhere to the following restrictions:

If there is more than one application branch, only the last can be conditional.
An application conditional branch must be the final instruction in the block.
An application direct call must be the final instruction in the block unless it is inserted by DR for elision and the subsequent instructions are the callee.
There can only be one indirect branch (call, jump, or return) in a basic block, and it must be the final instruction in the block.
There can only be one far branch (call, jump, or return) in a basic block, and it must be the final instruction in the block.
The AArch64 instruction ISB must be the final instruction in the block.
The exit control-flow of a block ending in a system call or int instruction cannot be changed, nor can instructions be inserted after the system call or int instruction itself, unless the system call or int instruction is removed entirely.
The number of an interrupt cannot be changed. (Note that the parameter to a system call, normally kept in the eax register, can be freely changed in a basic block, but not in a trace.)
A system call or interrupt instruction can only be added if it satisfies the above constraints: i.e., if it is the final instruction in the block and the only system call or interrupt.
All IT blocks must be legal. For example, application instructions inside an IT block cannot be removed or added to without also updating the OP_it instruction itself. Clients can use the combination of dr_remove_it_instrs() and dr_insert_it_instrs() to more easily manage IT blocks while maintaining the simplicity of examining individual instructions in isolation.
The block’s application source code (as indicated by the translation targets, set by #instr_set_translation()) must remain within the original bounds of the block (the one exception to this is that a jump can translate to its target). Otherwise, DR’s cache consistency algorithms cannot guarantee to properly invalidate the block if the source application code is modified. To send control to other application code regions, truncate the block and use a direct jump to target the desired address, which will then materialize in the subsequent block, rather than embedding the desired instructions in this block.
There is a limit on the size of a basic block in the code cache. DR performs its own modifications, especially on memory writes for cache consistency of self-modifying (or false sharing) code regions. If an assert fires in debug build indicating a limit was reached, either truncate blocks or use the -max_bb_instrs runtime option to ask DR to make them smaller.
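A few of the placement restrictions above can be expressed as a check over a simplified model of a block. This is a sketch only: `Kind` and `check_block` are hypothetical, the model covers just the conditional-branch, direct-call, and indirect-branch rules, and it ignores the elision exception for direct calls.

```rust
/// Simplified, hypothetical classification of application instructions.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    Other,
    ConditionalBranch,
    DirectCall,
    IndirectBranch,
}

/// Checks a subset of the listed restrictions on a modelled block and
/// returns the first violation found, if any.
fn check_block(block: &[Kind]) -> Result<(), &'static str> {
    let last = block.len().saturating_sub(1);
    let mut indirect = 0;
    for (i, kind) in block.iter().enumerate() {
        match kind {
            Kind::ConditionalBranch if i != last => {
                return Err("conditional branch must be the final instruction");
            }
            Kind::DirectCall if i != last => {
                return Err("direct call must be the final instruction");
            }
            Kind::IndirectBranch => {
                indirect += 1;
                if indirect > 1 || i != last {
                    return Err("indirect branch must be unique and final");
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```

A check of this kind could be run in a debug build of a client after it has rewritten a block, to catch violations before DR does.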

To support transparent fault handling, DR must translate a fault in the code cache into a fault at the corresponding application address. DR must also be able to translate when a suspended thread is examined by the application or by DR itself for internal synchronization purposes. If the client is only adding observational instrumentation (i.e., meta instructions, which should not fault: see #instr_set_meta()) and is not modifying, reordering, or removing application instructions, these details can be ignored. In that case the client should return #DR_EMIT_DEFAULT and set up its basic block callback to be deterministic and idempotent. If the client is performing modifications, then for DR to properly translate a code cache address the client must use #instr_set_translation() in the basic block creation callback to set the corresponding application address for each modified instruction and each added application instruction (see #instr_set_app()). That address is the one that should be presented to the application as the faulting address, or the one at which execution should restart after a suspend.

There are two methods for using the translated addresses:

-# Return #DR_EMIT_STORE_TRANSLATIONS from the basic block creation callback. DR will then store the translation addresses and use the stored information on a fault. The basic block callback for \p tag will not be called with \p translating set to true. Note that unless #DR_EMIT_STORE_TRANSLATIONS is also returned for \p for_trace calls (or #DR_EMIT_STORE_TRANSLATIONS is returned in the trace callback), each constituent block comprising the trace will need to be re-created with both \p for_trace and \p translating set to true. Storing translations uses additional memory that can be significant: up to 20% in some cases, as it prevents DR from using its simple data structures and forces it to fall back to its complex, corner-case design. This is why DR does not store all translations by default.
-# Return #DR_EMIT_DEFAULT from the basic block creation callback. DR will then call the callback again during fault translation with \p translating set to true. All modifications to \p bb that were performed on the creation callback must be repeated on the translating callback. This option is only possible when basic block modifications are deterministic and idempotent, but it saves memory. Naturally, global state changes triggered by block creation should be wrapped in checks for \p translating being false. Even in this case, #instr_set_translation() should be called for application instructions even when \p translating is false, as DR may decide to store the translations at creation time for reasons of its own.
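The difference between the two methods can be sketched with a model of the callback. Everything here is illustrative: `Instr`, `instrument`, and `bb_event` are hypothetical, the block is a plain `Vec` rather than an instrlist_t, and the flag type and values are local stand-ins whose representation is an assumption about the bindings.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-ins for the generated bindings (assumed representation and values).
#[allow(non_camel_case_types)]
type dr_emit_flags_t = u32;
const DR_EMIT_DEFAULT: dr_emit_flags_t = 0;
const DR_EMIT_STORE_TRANSLATIONS: dr_emit_flags_t = 0x01;

/// Hypothetical modelled instruction, not a DynamoRIO type.
#[derive(Clone, Debug, PartialEq)]
enum Instr {
    App(usize), // application instruction at this address
    Meta,       // instrumentation added by the client
}

/// Global client state that must change only at creation time.
static BLOCKS_CREATED: AtomicUsize = AtomicUsize::new(0);

/// Deterministic modification: one meta instruction before each incoming
/// instruction. The result depends only on the incoming block.
fn instrument(bb: &mut Vec<Instr>) {
    let original = std::mem::take(bb);
    for instr in original {
        bb.push(Instr::Meta);
        bb.push(instr);
    }
}

/// Modelled callback. `store` selects between the two methods.
fn bb_event(bb: &mut Vec<Instr>, translating: bool, store: bool) -> dr_emit_flags_t {
    // Global state changes are guarded by `translating` being false.
    if !translating {
        BLOCKS_CREATED.fetch_add(1, Ordering::Relaxed);
    }
    // The same modifications are applied on creation and on translation.
    instrument(bb);
    if store { DR_EMIT_STORE_TRANSLATIONS } else { DR_EMIT_DEFAULT }
}
```

Because `instrument` reads nothing but its input, running the modelled callback on the same original block with `translating` set to true reproduces exactly the list built at creation time, which is the property the second method relies on.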

Furthermore, if the client’s modifications change any part of the machine state besides the program counter, the client should use #dr_register_restore_state_event() or #dr_register_restore_state_ex_event() to restore the registers and application memory to their original application values.

For meta instructions that do not reference application memory (i.e., they should not fault), leave the translation field as NULL. A NULL value instructs DR to use the subsequent application instruction’s translation as the application address, and to fail when translating the full state. Since the full state will only be needed when relocating a thread (as stated, there will not be a fault here), failure indicates that this is not a valid relocation point, and DR’s thread synchronization scheme will use another spot. If the translation field is set to a non-NULL value, the client should be willing to also restore the rest of the machine state at that point (restore spilled registers, etc.) via #dr_register_restore_state_event() or #dr_register_restore_state_ex_event(). This is necessary for meta instructions that reference application memory. DR takes care of such potentially-faulting instructions added by its own API routines (#dr_insert_clean_call() arguments that reference application data, #dr_insert_mbr_instrumentation()’s read of application indirect branch data, etc.)
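The NULL-translation rule can be modelled as a lookup over the instruction list. This is a sketch under simplifying assumptions: `Instr`, `app_address`, and `is_relocation_point` are hypothetical, a NULL translation field is represented as `None`, and DR's actual translation logic is considerably more involved.

```rust
/// Hypothetical modelled instruction: `translation` is `None` when the
/// translation field was left NULL.
struct Instr {
    translation: Option<usize>,
}

/// Application address reported for `instrs[idx]`: its own translation if
/// set, otherwise that of the next instruction that has one.
fn app_address(instrs: &[Instr], idx: usize) -> Option<usize> {
    instrs.get(idx..)?.iter().find_map(|i| i.translation)
}

/// In this model, full-state translation fails at an instruction whose
/// translation is NULL, so such an instruction is not a relocation point.
fn is_relocation_point(instrs: &[Instr], idx: usize) -> bool {
    instrs.get(idx).map_or(false, |i| i.translation.is_some())
}
```

For a meta instruction followed by an application instruction at some address, `app_address` returns that address for both, while only the application instruction counts as a relocation point.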

\note In order to present a more straightforward code stream to clients, this release of DR disables several internal optimizations. As a result, some applications may see a performance degradation. Applications making heavy use of system calls are the most likely to be affected. Future releases may allow clients some control over performance versus visibility. The \ref op_speed “-opt_speed” option can regain some of this performance at the cost of more complex basic blocks that cross control transfers.

\note If multiple clients are present, the instruction list for a basic block passed to earlier-registered clients will contain the instrumentation and modifications put in place by later-registered clients.

\note Basic blocks can be deleted due to hitting capacity limits or cache consistency events (when the source application code of a basic block is modified). In that case, the client will see a new basic block callback if the block is then executed again after deletion. The deletion event (#dr_register_delete_event()) will be raised at deletion time.

\note If the -thread_private runtime option is specified, clients should expect to see duplicate tags for separate threads, albeit with different drcontext values. Additionally, DR employs a cache-sizing algorithm for thread private operation that proactively deletes fragments. Even with thread-shared caches enabled, however, certain situations cause DR to emit thread-private basic blocks (e.g., self-modifying code). In this case, clients should be prepared to see duplicate tags without an intermediate deletion.

\note A client can change the control flow of the application by changing the control transfer instruction at the end of the basic block. If a basic block ends with a non-control transfer instruction, an application jump instruction can be inserted. If a basic block ends with a conditional branch, \p instrlist_set_fall_through_target can be used to change the fall-through target. If a basic block ends with a call instruction, \p instrlist_set_return_target can be used to change the return target of the call.
