Trait yaxpeax_core::analyses::DFG
pub trait DFG<V: Value, A: Arch + ValueLocations, When = <A as Arch>::Address>
where
    When: Copy,
{
    type Indirect: IndirectQuery<V>;

    fn read_loc(&self, when: When, loc: A::Location) -> V;
    fn write_loc(&mut self, when: When, loc: A::Location, value: V);
    fn indirect_loc(&self, _when: When, _loc: A::Location) -> Self::Indirect;

    fn read<T: ToDFGLoc<A::Location>>(&self, when: When, loc: &T) -> V { ... }
    fn write<T: ToDFGLoc<A::Location>>(&mut self, when: When, loc: &T, value: V) { ... }
    fn indirect<T: ToDFGLoc<A::Location>>(
        &self,
        when: When,
        loc: &T
    ) -> Self::Indirect { ... }
    fn query_at(
        &self,
        when: When
    ) -> DFGLocationQueryCursor<'_, When, V, A, Self> { ... }
    fn query_at_mut(
        &mut self,
        when: When
    ) -> DFGLocationQueryCursorMut<'_, When, V, A, Self> { ... }
}
interface to query a data flow graph (dfg). this interface is …. in flux.
TODOs in order of “how hard i think they are”:
- it should be possible to look up a def site for a value
- it should be possible to iterate the use sites of a value
- perhaps it should be possible to insert new values to the dfg? optionally? this approaches supporting general patching
- it should be possible to detach and move values
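as a rough illustration of the first two TODO items, here is a toy def-use index. none of this exists in yaxpeax_core: DefUse, ValueId, def_of, and uses_of are hypothetical names, and sites are assumed to be plain u64 addresses rather than anything architecture-specific.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a value in a toy graph.
type ValueId = usize;

/// Toy def-use index. Assumption: sites are plain `u64` addresses.
#[derive(Default)]
struct DefUse {
    def_site: HashMap<ValueId, u64>,
    use_sites: HashMap<ValueId, Vec<u64>>,
}

impl DefUse {
    /// Remember the address at which `value` was written.
    fn record_def(&mut self, value: ValueId, addr: u64) {
        self.def_site.insert(value, addr);
    }

    /// Remember one more address at which `value` was read.
    fn record_use(&mut self, value: ValueId, addr: u64) {
        self.use_sites.entry(value).or_default().push(addr);
    }

    /// First TODO item: look up where a value was defined.
    fn def_of(&self, value: ValueId) -> Option<u64> {
        self.def_site.get(&value).copied()
    }

    /// Second TODO item: iterate the places a value is used.
    /// Yields nothing for a value with no recorded uses.
    fn uses_of(&self, value: ValueId) -> impl Iterator<Item = u64> + '_ {
        self.use_sites.get(&value).into_iter().flatten().copied()
    }
}
```

the later TODO items (inserting, detaching, and moving values) would need these maps to be kept consistent under edits, which is presumably why they are ranked as harder.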
conceptually, these graphs have vertices at places where values are read or written, edges
from uses to some write, and a value associated with the write describing what subsequent reads
will see. these graphs describe the relation between values in a machine with
architecture-defined locations for values to exist. in many cases these graphs are operated on
in a manner consistent with the most atomic changes for a given architecture - typically an
instruction’s execution. in an ideal world, this means DFG would have vertices at a tuple
(A::Address, A::Instruction, A::Location); “at a given address in memory, with a
corresponding instruction, the value at a specific architectural location is ___”.
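the vertex idea can be sketched with a toy table keyed by (address, location). this is a simplification and not how yaxpeax_core stores anything: ToyDfg and Loc are hypothetical, addresses and values are assumed to be plain u64s, and the instruction component of the key is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `A::Location`: a couple of named registers.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Loc {
    Rdi,
    Rsi,
}

/// Toy graph: one value per (address, location) vertex.
/// Assumption: addresses and values are plain `u64`s.
#[derive(Default)]
struct ToyDfg {
    values: HashMap<(u64, Loc), u64>,
}

impl ToyDfg {
    /// Same argument shape as `DFG::write_loc`: record the value written at `when`.
    fn write_loc(&mut self, when: u64, loc: Loc, value: u64) {
        self.values.insert((when, loc), value);
    }

    /// Same argument shape as `DFG::read_loc`, except that this toy returns
    /// `None` for a vertex nothing has been written to.
    fn read_loc(&self, when: u64, loc: Loc) -> Option<u64> {
        self.values.get(&(when, loc)).copied()
    }
}
```

writing 7 to (0x1234, Loc::Rdi) and reading the same key gives Some(7), while (0x1234, Loc::Rsi) and (0x1236, Loc::Rdi) stay None: each vertex is independent in this toy, with no edges from uses back to writes.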
why is using an (Address, Location) pair, like (0x1234, rdi), not sufficient to uniquely
identify a location? because, dear reader, data at an address is not constant. if you decode
data at address 0x1234, is that before or after relocations are applied? if that address is
known to be modified after loading, is the instruction there before or after the modification?
different answers to this temporal question mean the architectural locations referenced by the
corresponding instruction can be totally different!
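a small sketch of that temporal problem, using a made-up one-byte instruction set in which the byte value selects the register that gets written. the encoding, decode_written_loc, and written_loc_at are all hypothetical and exist only to show one address yielding two different locations.

```rust
/// Hypothetical locations for a made-up one-byte instruction set.
#[derive(Debug, PartialEq, Eq)]
enum Loc {
    Rdi,
    Rsi,
}

/// Assumed encoding: 0x01 writes rdi, 0x02 writes rsi, anything else is unknown.
fn decode_written_loc(byte: u8) -> Option<Loc> {
    match byte {
        0x01 => Some(Loc::Rdi),
        0x02 => Some(Loc::Rsi),
        _ => None,
    }
}

/// Decode the instruction at `addr`, where `memory` is mapped starting at `base`.
/// Returns `None` when `addr` is outside `memory` or the byte does not decode.
fn written_loc_at(memory: &[u8], base: u64, addr: u64) -> Option<Loc> {
    let offset = addr.checked_sub(base)? as usize;
    decode_written_loc(*memory.get(offset)?)
}
```

with memory mapped at 0x1234 and a first byte of 0x01, written_loc_at reports Loc::Rdi for address 0x1234. after that byte is overwritten with 0x02 (a relocation, or self-modifying code), the same call reports Loc::Rsi, so (0x1234, rdi) only names a vertex for one of the two snapshots.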
so, really, a DFG describes the architectural state of a program at every discrete point of
change for any point in the program. an eventual TODO is to key on (Address, Generation)
where a “Generation” describes some series of memory edits. this is approximately supported in
SSA-based DFG construction, where Memory is a single architectural location that can be
versioned - perhaps “the program” may be inferred to be a distinct memory region from
unknown-destination memory accesses by default? in a way, a DFG might be self-describing if
at some location (0x1234, Gen1) the instruction modifies code memory by writing (0x1236, Gen2),
where finding bytes to decode the next instruction would have to be a DFG query? this
suggests that in the most precise case, a DFG might be backed by a MemoryRepr with a series
of edits for each generation layered on top? it’s not clear how this might interact with
disjoint memory regions that are versioned independently.
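one possible shape for the “edits layered on top” idea, as a sketch only: LayeredMemory is hypothetical and does not implement or wrap MemoryRepr. it assumes a single contiguous base image, one byte edit per generation, and generation 0 meaning the unedited image.

```rust
use std::collections::BTreeMap;

/// Hypothetical layered memory: a base image plus per-generation byte edits.
struct LayeredMemory {
    base_addr: u64,
    base: Vec<u8>,
    /// `edits[g]` holds the bytes overwritten when moving to generation `g + 1`.
    edits: Vec<BTreeMap<u64, u8>>,
}

impl LayeredMemory {
    fn new(base_addr: u64, base: Vec<u8>) -> Self {
        LayeredMemory { base_addr, base, edits: Vec::new() }
    }

    /// Record a new generation containing one edit and return its number.
    fn write(&mut self, addr: u64, byte: u8) -> usize {
        let mut layer = BTreeMap::new();
        layer.insert(addr, byte);
        self.edits.push(layer);
        self.edits.len()
    }

    /// Read the byte at `addr` as seen in `generation` (0 is the base image).
    fn read(&self, addr: u64, generation: usize) -> Option<u8> {
        // Only layers up to `generation` are visible; the newest visible edit wins.
        let visible = generation.min(self.edits.len());
        for layer in self.edits[..visible].iter().rev() {
            if let Some(byte) = layer.get(&addr) {
                return Some(*byte);
            }
        }
        // No edit applies, so fall back to the base image.
        let offset = addr.checked_sub(self.base_addr)? as usize;
        self.base.get(offset).copied()
    }
}
```

reading (0x1236, generation 0) from an image of 0x90 bytes mapped at 0x1234 gives the original byte, and after a write of 0xc3 to 0x1236 the same address at generation 1 gives 0xc3. a single generation counter like this does not address the open question of disjoint regions that are versioned independently.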