Crate pai


§Process Analyzer and Instrumenter

A tool to analyze and instrument running processes. Currently, only Linux is supported, and ptrace is the only supported tracing backend.

§API

The main interface for controlling the tracee is the set of Context objects.

§Context

This is the interface a script has to control the tracee. It has three layers:

  1. ctx::Main
  • The first context created; there is only one of these per session.
  • This is where you spawn or connect to your target, and it is the entry point for everything.
  2. ctx::Secondary
  3. Client

§Design of scripts

The scripts are generally designed to be event-driven: you register the events you are interested in and provide callbacks for those events. A callback can in turn register new events to monitor. This can be a bit cumbersome in the beginning, so there is also a mechanism to run until some event has happened and then continue with the script as usual from there on. See the example scripts for more details.
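
The event-driven flow described above can be illustrated without pai at all. The following is a minimal standalone sketch, not pai's API: `Event`, `Action`, `Dispatcher`, `on_entry`, `on_syscall` and `run` are all hypothetical names invented for this illustration. It shows a callback for one event registering a handler for another event from inside the callback, while sharing a piece of state between handlers.

```rust
use std::collections::HashMap;

/// Hypothetical event kinds; these are NOT pai types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Event {
    Entry,
    Syscall,
}

/// What a callback asks the dispatcher to do once it returns (hypothetical).
enum Action {
    /// Leave the registered handlers unchanged.
    None,
    /// Start monitoring one more event with the given handler.
    Register(Event, Callback),
}

/// A callback receives the shared state and returns an `Action`.
type Callback = fn(&mut usize) -> Action;

struct Dispatcher {
    handlers: HashMap<Event, Callback>,
    state: usize,
}

impl Dispatcher {
    fn new() -> Self {
        Self { handlers: HashMap::new(), state: 0 }
    }

    fn register(&mut self, ev: Event, cb: Callback) {
        self.handlers.insert(ev, cb);
    }

    /// Deliver one event; returns true if a handler was registered for it.
    fn dispatch(&mut self, ev: Event) -> bool {
        let cb = match self.handlers.get(&ev).copied() {
            Some(cb) => cb,
            None => return false, // nobody is interested in this event
        };
        if let Action::Register(new_ev, new_cb) = cb(&mut self.state) {
            self.register(new_ev, new_cb);
        }
        true
    }
}

/// Count every syscall event seen.
fn on_syscall(count: &mut usize) -> Action {
    *count += 1;
    Action::None
}

/// Once the entry event fires, start monitoring syscall events.
fn on_entry(_count: &mut usize) -> Action {
    Action::Register(Event::Syscall, on_syscall)
}

/// Feed a fixed event sequence through a dispatcher and return the final state.
fn run(events: &[Event]) -> usize {
    let mut d = Dispatcher::new();
    d.register(Event::Entry, on_entry);
    for ev in events {
        d.dispatch(*ev);
    }
    d.state
}

fn main() {
    // The first syscall event arrives before entry, so no handler sees it.
    let n = run(&[Event::Syscall, Event::Entry, Event::Syscall, Event::Syscall]);
    println!("counted {n} syscalls after entry");
}
```

In this sketch the syscall handler only exists after the entry handler has run, which is the same shape as a script that waits for the entry point before registering further handlers. The actual registration calls and return types are the ones shown in the example scripts.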

§Error handling

  • User script errors

    • Any call that interacts with the tracee can fail, for instance when you try to read from an invalid memory address.
    • These errors should be packaged up and sent to the calling thread. It is then up to the user script to handle the error appropriately.
    • Most errors returned from crate::ctx::Main, crate::ctx::Secondary and crate::Client fall into this category.
  • Unexpected non-fatal errors

    • We detected some error when calling some code, but we can recover from it.
    • This generally happens because of a bug in this crate, but we try to be a bit resilient in handling it.
    • It is reported by sending a crate::api::Response::Error as the response to the client request that generated the error.
    • The caller may use this error to work around the issue, but it should be reported as a bug.
  • Unexpected results

    • There are some cases where we expect certain errors to happen, and they are not necessarily a bug in this crate.
    • One example is when we think a syscall argument is a pointer and try to read from it, but it is actually an int. This will cause an error.
    • This will be logged, but no other action is taken.
    • If you receive unexpected results, the log may hold information as to why.
  • Fatal error

    • An error is generated and propagated to the main thread.
    • This is usually one of two cases:
        1. The target dies
        2. There’s a bug in this code.
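
The four categories above can be summarized as a small lookup from category to how the error surfaces. This is only an illustrative sketch: `Category`, `Handling` and `handling` are hypothetical names and are not part of pai, whose real error type is listed under Enums.

```rust
/// Hypothetical classification mirroring the four categories above.
/// This is NOT pai's error type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    UserScript,
    NonFatal,
    UnexpectedResult,
    Fatal,
}

/// How each category surfaces, as described in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Handling {
    /// Returned to the calling thread; the user script decides what to do.
    ReturnToCaller,
    /// Sent back as an error response to the request that triggered it.
    ErrorResponse,
    /// Written to the log; no other action is taken.
    LogOnly,
    /// Propagated to the main thread.
    PropagateToMain,
}

fn handling(category: Category) -> Handling {
    match category {
        Category::UserScript => Handling::ReturnToCaller,
        Category::NonFatal => Handling::ErrorResponse,
        Category::UnexpectedResult => Handling::LogOnly,
        Category::Fatal => Handling::PropagateToMain,
    }
}

fn main() {
    let all = [
        Category::UserScript,
        Category::NonFatal,
        Category::UnexpectedResult,
        Category::Fatal,
    ];
    for category in all {
        println!("{category:?} -> {:?}", handling(category));
    }
}
```

A practical consequence is that a script normally only needs explicit handling for the first two categories; unexpected results show up in the log, and fatal errors end the session.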

§Features

  • Syscall tracing: to get details about each syscall argument, enable the syscalls feature
  • Manage breakpoints
  • Single stepping
  • Call function / system call in target context
  • Resolve symbols in ELF-files
  • Read/write memory to process
  • Allocate memory in process
  • Multiple clients can trace a process, unaware of each other

§Examples

All the examples listed here and more can be found in the examples/ folder.

minimal.rs

Below is a minimal example spawning a program and tracing it. Since no handlers are registered, it doesn’t do anything useful.

This is the example minimal.rs

use pai::ctx;
fn main() -> anyhow::Result<()> {
	// We need something to run, so just spawn a program
	let cmd = std::process::Command::new("true");

	// To start, one would typically use ctx::Main::new_{spawn|attach|main}
	let ctx: ctx::Main<(), pai::Error> = ctx::Main::new_spawn(cmd, ())?;

	// Here we would typically register callback(s)

	// Run until program finishes or we are detached.
	ctx.loop_until_exit()?;
	Ok(())
}

strace.rs

A slightly more complicated example is the strace-like program below. Enable the syscalls feature to run it, for example: cargo run --features=syscalls --example strace

This is the example strace.rs

A more feature-complete strace program can be found in pai-strace.

use pai::{api::messages::CbAction, ctx};
fn main() -> anyhow::Result<()> {
	let cmd = std::process::Command::new("true");
	let mut ctx: ctx::Main<(), pai::Error> = ctx::Main::new_spawn(cmd, ())?;

	// Register callback to be executed on every system call
	#[cfg(feature = "syscalls")]
	ctx.secondary_mut()
		.set_generic_syscall_handler_exit(|_cl, sys| {
			println!("{sys}");
			Ok(CbAction::None)
		});
	#[cfg(not(feature = "syscalls"))]
	println!(
		"program will do nothing without 'syscalls' \
		feature enabled, run: cargo run --features=syscalls --example strace"
	);

	ctx.loop_until_exit()?;
	Ok(())
}

state.rs

The second argument passed to ctx::Main::new_spawn is a state which the caller can access in each callback. The following example is very similar to the previous one, but it counts the number of system calls instead of printing them.

This is the example state.rs

use pai::{api::messages::CbAction, ctx};
fn main() -> anyhow::Result<()> {
	let cmd = std::process::Command::new("true");
	let mut ctx: ctx::Main<usize, pai::Error> = ctx::Main::new_spawn(cmd, 0_usize)?;
	let sec = ctx.secondary_mut();

	#[cfg(feature = "syscalls")]
	sec.set_generic_syscall_handler_entry(|cl, sys| {
		assert!(sys.is_entry());
		*(cl.data_mut()) += 1;
		Ok(CbAction::None)
	});
	#[cfg(not(feature = "syscalls"))]
	println!(
		"program will do nothing without 'syscalls' \
		feature enabled, run: cargo run --features=syscalls --example state"
	);

	let (_, count) = ctx.loop_until_exit()?;
	println!("hit {count} syscalls");
	Ok(())
}

breakpoint.rs

This shows an example of inserting a breakpoint.

This is the example breakpoint.rs

use pai::{api::messages::BpRet, ctx};
fn main() -> anyhow::Result<()> {
	env_logger::init();
	let cmd = std::process::Command::new("true");
	let mut ctx: ctx::Main<usize, pai::Error> = ctx::Main::new_spawn(cmd, 0_usize)?;

	// Get a handle to secondary context, for more concise code.
	let sec = ctx.secondary_mut();

	// Some commands require a target thread id to interact with. The target has
	// stopped, so just get the first thread id which is stopped.
	let tid = sec.get_first_stopped()?;

	// Program has not executed any code yet, so resolve ELF entry and run until
	// we hit it.
	let entry = sec.resolve_entry()?;

	// Register callback to be executed on entry point.
	sec.register_breakpoint_handler(tid, entry, |cl, tid, _addr| {
		*(cl.data_mut()) += 1; // So we can check afterwards

		// With libraries loaded, we can resolve `getpid` and call it
		if let Some(getpid) = cl.lookup_symbol_in_any("getpid")? {
			log::info!("getpid {getpid:?}");
			let v = cl.call_func(tid, getpid.value, &[]).unwrap();
			log::info!("getpid -> {v}");

			// The thread id we get when hitting the BP should be the same as
			// the value returned when injecting a function call to `getpid`
			assert!(v == tid.into());
		}
		Ok(BpRet::Keep) // keep breakpoint, is never hit again
	})?;

	let (_, res) = ctx.loop_until_exit()?;
	assert_eq!(res, 1); // Check that we've hit our breakpoint
	Ok(())
}

breakpoint-noevent.rs

This shows an example of inserting a breakpoint without using the event-driven method.

This is the example breakpoint-noevent.rs

use pai::{api::Response, ctx};
fn main() -> anyhow::Result<()> {
	env_logger::init();
	let cmd = std::process::Command::new("true");
	let mut ctx: ctx::Main<usize, pai::Error> = ctx::Main::new_spawn(cmd, 0_usize)?;
	let sec = ctx.secondary_mut();

	// Run until we've hit entry
	let entry = sec.resolve_entry()?;
	let stopped = sec.run_until_entry()?;

	// Verify that entry was hit, this is just to check against bugs in pai
	assert_eq!(stopped.expect("didn't hit breakpoint"), entry);

	// Now we can resolve functions in libraries loaded
	let tid = sec.get_first_stopped()?;
	if let Some(getpid) = sec.lookup_symbol_in_any("getpid")? {
		log::info!("getpid {getpid:?}");
		let v = sec.call_func(tid, getpid.value, &[])?;
		assert!(v == tid.into());
	}
	let (r, _res) = ctx.loop_until_exit()?;
	assert_eq!(r, Response::TargetExit);
	Ok(())
}

§Internal stuff

§Benchmarking and speed

Speed is not the main goal of this crate, but it is still recognized as an important attribute of tracing. There are some key benchmark tests to evaluate speed over time:

  • bench_baseline_true
    • Execute the true command to get a baseline for how long it takes to run
  • bench_trace_inner / bench_trace_outer
    • Execute program under tracing, but don’t do anything
    • Tracing directly at the ptrace-level and at the Context level
    • This is used to measure the overhead of tracing and Context-level code
  • bench_baseline_strace
    • Execute command under strace
    • Gives us something to compare against
  • bench_trace_strace_raw / bench_trace_strace_basic / bench_trace_strace_full
    • Trace syscalls while reading various levels of detail about each call
    • If you run these tests, you will likely see a spike in time for bench_trace_strace_full
      • If you’re tracing something time-critical, this is something to be aware of.
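
To make the baseline idea concrete, here is a minimal standalone sketch of timing the true command using only the standard library. It is not the crate's benchmark harness: `mean_time` is a hypothetical helper, and the sketch assumes that a `true` binary is available on `PATH`, as is usual on Linux.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` `iters` times and return the mean
/// wall-clock time per run.
fn mean_time<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    // Baseline: spawn `true` untraced and wait for it to exit.
    // Assumption: a `true` binary is available on PATH.
    let baseline = mean_time(20, || {
        let status = Command::new("true")
            .status()
            .expect("failed to spawn `true`");
        assert!(status.success());
    });
    println!("baseline: {baseline:?} per run");
}
```

Timing the same command under a tracer with the same helper and comparing against this baseline gives a rough estimate of tracing overhead. A dedicated benchmark harness will give more stable numbers than a hand-rolled loop like this one.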

Modules§

  • Different structures, enums and functions serving as the available API
  • Architecture-specific code
  • Context object(s) the client should hold to control the traced process.
  • Various code related to the specific target traced.
  • Various utility functions not fitting in anywhere else.

Enums§

  • All the different error-values the crate can generate
