agentpprof 0.1.0

pprof-style semantic profiler for local AI coding-agent sessions

agentpprof

agentpprof is a Rust CLI that builds pprof-style semantic profiles from local AI coding-agent history. It reads local Codex and Claude JSONL sessions through AgentSight's agent-session crate, asks a llama.cpp-compatible server for a single lowercase word that tags each session, prompt, and LLM call, and writes reusable JSON, folded stacks, SVG flamegraphs, and a local dashboard.

These are semantic profiles, not CPU profiles: stack width represents tool events or token counts, depending on the projection.

Install

cargo install agentpprof

From this repository during development:

cargo run --manifest-path agentpprof/Cargo.toml -- run \
  --project-root . \
  --out .agentsight/agentpprof/latest

Run

Start a local llama.cpp server with a real GGUF model:

llama-server -m /path/to/model.gguf --port 8080

Generate a report:

agentpprof run --project-root /path/to/repo

To analyze a specific set of local sessions instead of scanning the newest files under the Codex and Claude roots, pass --session-file /path/to/session.jsonl once per session.

The default llama.cpp API endpoint is http://127.0.0.1:8080. Override the endpoint and model name with:

agentpprof run \
  --llama-url http://127.0.0.1:8080 \
  --model local

agentpprof has no heuristic labeling fallback. If the LLM server is unreachable, or if the model still does not return a single valid lowercase word after retrying, the run fails. By default, sessions and prompts are tagged for the system-effect views, and each LLM call is also tagged for the token views. For a faster exploratory run, pass --tag-llm-calls false to skip per-call tagging; the default is true.
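The validity rule is not specified beyond "one lowercase word". The following is a minimal Rust sketch of a fail-instead-of-fallback tagging loop. It assumes a valid tag is a non-empty run of ASCII lowercase letters after trimming whitespace; `normalize_tag` and `tag_with_retry` are hypothetical names, not agentpprof's API.

```rust
/// Hypothetical validity rule (an assumption, not agentpprof's exact check):
/// after trimming whitespace, the reply must be a non-empty run of
/// ASCII lowercase letters.
fn normalize_tag(raw: &str) -> Option<String> {
    let word = raw.trim();
    if !word.is_empty() && word.chars().all(|c| c.is_ascii_lowercase()) {
        Some(word.to_string())
    } else {
        None
    }
}

/// Ask the model up to `max_attempts` times. There is no heuristic
/// fallback: if no reply is valid, return an error so the run fails.
fn tag_with_retry<F>(mut ask: F, max_attempts: usize) -> Result<String, String>
where
    F: FnMut() -> String,
{
    let mut last = String::new();
    for _ in 0..max_attempts {
        last = ask();
        if let Some(tag) = normalize_tag(&last) {
            return Ok(tag);
        }
    }
    Err(format!(
        "no valid one-word tag after {max_attempts} attempts; last output: {last:?}"
    ))
}
```

Returning an error rather than a guessed label keeps every tag in the profile traceable to an actual model reply.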

Outputs

Default output directory:

.agentsight/agentpprof/latest/

Important files:

  • agentpprof.json: redacted machine-readable analysis for AgentSight or other tools.
  • tags.json: reusable local tag cache containing one-word tags, hashes, and LLM provenance, not raw prompt text.
  • index.html: dashboard with tag bars, command/effect bars, timeline, semantic flamegraphs, dimension projections, and mixed baseline buckets.
  • *.svg: standalone charts.
  • semantic-system.folded.txt: prompt/session-tagged system footprint stacks.
  • semantic-token.folded.txt: prompt/session/LLM-tagged token stacks.
  • session-system.folded.txt, prompt-system.folded.txt, session-token.folded.txt, prompt-token.folded.txt, llm-token.folded.txt: dimension projections.
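The folded files can also be post-processed outside the dashboard. The sketch below assumes the common folded layout used by flamegraph tooling: semicolon-separated frames, a space, then an integer width. It sums the width per value of one segment kind, such as `prompt`. The function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Parse one folded-stack line of the form `frame;frame;... count`.
/// Assumes the count is the last space-separated token on the line.
fn parse_folded_line(line: &str) -> Option<(Vec<&str>, u64)> {
    let (stack, count) = line.trim().rsplit_once(' ')?;
    let count = count.parse::<u64>().ok()?;
    Some((stack.split(';').collect(), count))
}

/// Sum widths per value of one segment kind, e.g. kind = "prompt"
/// groups by the `<tag>` in each `prompt:<tag>` frame.
/// Lines without that segment, and malformed lines, are skipped.
fn totals_by_segment(folded: &str, kind: &str) -> BTreeMap<String, u64> {
    let prefix = format!("{kind}:");
    let mut totals = BTreeMap::new();
    for (frames, count) in folded.lines().filter_map(parse_folded_line) {
        if let Some(value) = frames.iter().find_map(|f| f.strip_prefix(prefix.as_str())) {
            *totals.entry(value.to_string()).or_insert(0) += count;
        }
    }
    totals
}
```

Run against semantic-token.folded.txt, the totals would be token counts per prompt tag; against a system file, they would be tool-event counts.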

Folded Stack Shape

System-effect stacks use:

project:<repo>;agent:<agent>;session:<sessionTag>;prompt:<promptTag>;call:tool/<kind>;process:<p0>;process:<p1>;effect:<effect>;path:<group>;status:<status>

Token stacks use:

project:<repo>;agent:<agent>;session:<sessionTag>;prompt:<promptTag>;call:llm/<llmCallTag>;model:<model>;kind:<tokenKind>
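To make the frame order concrete, here is a sketch that renders both stack shapes from plain strings. The `ToolEvent` struct, its field names, and the sample values used with it are illustrative assumptions; only the segment order comes from the templates above.

```rust
/// Illustrative inputs for one tool event (hypothetical field names).
struct ToolEvent<'a> {
    repo: &'a str,
    agent: &'a str,
    session_tag: &'a str,
    prompt_tag: &'a str,
    tool_kind: &'a str,
    processes: &'a [&'a str], // outermost first; one `process:` frame each
    effect: &'a str,
    path_group: &'a str,
    status: &'a str,
}

/// Render a system-effect stack in the documented segment order.
fn system_stack(e: &ToolEvent) -> String {
    let mut frames = vec![
        format!("project:{}", e.repo),
        format!("agent:{}", e.agent),
        format!("session:{}", e.session_tag),
        format!("prompt:{}", e.prompt_tag),
        format!("call:tool/{}", e.tool_kind),
    ];
    // The process segment repeats once per level of the process chain.
    frames.extend(e.processes.iter().map(|p| format!("process:{p}")));
    frames.push(format!("effect:{}", e.effect));
    frames.push(format!("path:{}", e.path_group));
    frames.push(format!("status:{}", e.status));
    frames.join(";")
}

/// Render a token stack in the documented segment order.
fn token_stack(
    repo: &str,
    agent: &str,
    session_tag: &str,
    prompt_tag: &str,
    llm_call_tag: &str,
    model: &str,
    token_kind: &str,
) -> String {
    format!(
        "project:{repo};agent:{agent};session:{session_tag};prompt:{prompt_tag};call:llm/{llm_call_tag};model:{model};kind:{token_kind}"
    )
}
```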

The process:* segment can repeat. Offline session-history mode derives the visible process entrypoint from shell commands, including simple shell wrappers such as bash -lc. Exact child-process nesting is supplied by AgentSight runtime trace data when the report is correlated with a captured snapshot.
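The offline entrypoint derivation could look roughly like the sketch below. It is a hypothetical simplification, not agentpprof's parser. It keeps the basename of the first word. When that word is `bash`, `sh`, or `zsh` followed by a single-dash flag cluster containing `c` (such as `-c` or `-lc`), it strips one layer of outer quotes from the script and repeats. It does not implement full shell quoting or long options such as `--norc`.

```rust
/// Return the visible process chain for a shell command, outermost first.
/// Hypothetical rules: only simple `shell -c SCRIPT` / `shell -lc SCRIPT`
/// wrappers are unwrapped; everything else stops at the first program.
fn process_chain(command: &str) -> Vec<String> {
    let mut chain = Vec::new();
    let mut rest = command.trim();
    loop {
        let mut parts = rest.splitn(2, char::is_whitespace);
        let Some(first) = parts.next().filter(|s| !s.is_empty()) else { break };
        // Keep only the program name, e.g. `/usr/bin/git` -> `git`.
        let program = first.rsplit('/').next().unwrap_or(first);
        chain.push(program.to_string());
        if !matches!(program, "bash" | "sh" | "zsh") {
            break;
        }
        // Expect a single-dash flag cluster containing `c`, e.g. `-c` or `-lc`.
        let tail = parts.next().unwrap_or("").trim_start();
        let mut flag_parts = tail.splitn(2, char::is_whitespace);
        let flag = flag_parts.next().unwrap_or("");
        if !(flag.starts_with('-') && !flag.starts_with("--") && flag.contains('c')) {
            break;
        }
        rest = strip_outer_quotes(flag_parts.next().unwrap_or("").trim());
    }
    chain
}

/// Remove one matching pair of surrounding single or double quotes.
fn strip_outer_quotes(s: &str) -> &str {
    for q in ['\'', '"'] {
        if s.len() >= 2 && s.starts_with(q) && s.ends_with(q) {
            return &s[1..s.len() - 1];
        }
    }
    s
}
```

With these rules, `bash -lc 'cargo test --all'` yields the frames process:bash;process:cargo. As noted above, exact child-process nesting still needs AgentSight runtime trace data.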

JSON Contract

agentpprof.json uses stable top-level sections:

  • project: project name and root.
  • inputs: session roots and scan limits.
  • llm_tagger: LLM request/cache/failure stats.
  • sessions: per-session counts and redacted prompt tag rows.
  • summary: stack totals, top prompt tags, command summaries, timeline, and baseline-mixing examples.
  • prompt_tags: prompt hash to tag mapping.
  • artifacts: relative paths to folded stacks and dashboard files.

This contract is meant to be consumed by AgentSight Web without re-reading raw agent history.
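The tag cache and prompt_tags map hashes, not raw prompt text, to tags. The sketch below shows how a hash-keyed cache can avoid asking the model again for a prompt it has already tagged. The hash function agentpprof actually uses is not documented here. FNV-1a is chosen only because it is short and stable across runs, and `TagCache` is a hypothetical name.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: a small, stable hash used for this sketch only.
fn fnv1a64(text: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in text.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical cache that stores only `hex(hash) -> tag`,
/// never the prompt text itself.
#[derive(Default)]
struct TagCache {
    by_hash: HashMap<String, String>,
}

impl TagCache {
    fn key(prompt: &str) -> String {
        format!("{:016x}", fnv1a64(prompt))
    }

    /// Return the cached tag, or compute one with `tag_fn`
    /// (for example an LLM request) and remember it.
    fn get_or_tag<F: FnOnce(&str) -> String>(&mut self, prompt: &str, tag_fn: F) -> String {
        self.by_hash
            .entry(Self::key(prompt))
            .or_insert_with(|| tag_fn(prompt))
            .clone()
    }
}
```

A production cache would likely prefer a collision-resistant digest and would also record LLM provenance alongside each tag, as tags.json does.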

Benchmark Models

Benchmark real local models by letting agentpprof start one llama.cpp server per model:

cargo run --manifest-path agentpprof/Cargo.toml -- bench \
  --llama-server /path/to/llama-server \
  --runs 2 \
  --out .agentsight/agentpprof/model-benchmarks.json \
  --model 3b=/path/to/model-3b.gguf \
  --model 1b=/path/to/model-1b.gguf \
  --model 0.6b=/path/to/model-0.6b.gguf

Pass --server-arg repeatedly to forward model-specific llama.cpp options, for example --server-arg=--reasoning --server-arg=off to disable thinking for tag runs.

The benchmark writes latency, success count, and invalid-output errors for each real model. It does not synthesize model responses.
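As a rough illustration of these per-model numbers, the sketch below reduces a list of run records to a run count, a success count, the invalid outputs, and mean latency. The `RunResult` and `ModelSummary` types are hypothetical and do not describe the schema of model-benchmarks.json.

```rust
/// Hypothetical record of one tagging request in a benchmark run.
struct RunResult {
    latency_ms: f64,
    output: Result<String, String>, // Err holds the invalid raw model output
}

/// Hypothetical per-model summary.
#[derive(Debug)]
struct ModelSummary {
    runs: usize,
    successes: usize,
    invalid_outputs: Vec<String>,
    mean_latency_ms: f64,
}

fn summarize(results: &[RunResult]) -> ModelSummary {
    let successes = results.iter().filter(|r| r.output.is_ok()).count();
    // Keep the real invalid outputs; nothing is synthesized.
    let invalid_outputs = results
        .iter()
        .filter_map(|r| r.output.as_ref().err().cloned())
        .collect();
    let total: f64 = results.iter().map(|r| r.latency_ms).sum();
    let mean_latency_ms = if results.is_empty() {
        0.0
    } else {
        total / results.len() as f64
    };
    ModelSummary { runs: results.len(), successes, invalid_outputs, mean_latency_ms }
}
```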

Python Prototype

The earlier Python pprof exporter now lives under docs/visexp/agentpprof-python/. It is kept as research material and is not the default user entrypoint.

Development Test

cargo test --manifest-path agentpprof/Cargo.toml