Evaluating AI frameworks for agentic AI use.
Where `languages` profiles the language an agent writes in, this module
profiles the AI framework an agent builds with: the library it must
discover, drive, and debug autonomously. It scores the same four axes,
adapted to frameworks:
- token efficiency — how many tokens a working model/pipeline costs (API verbosity, config boilerplate, import surface).
- determinism — seeded-run reproducibility, version stability, and whether artifacts (checkpoints, graphs) are byte-stable.
- reliability — when the agent misuses the API, does it get an early structured error (shape checks at graph build) or a runtime tensor explosion three layers deep?
- safety — does loading/running third-party artifacts execute arbitrary code (pickle!), and is the compute surface effect-gated?
Plus one framework-specific axis the others don’t need:
- discoverability — can an agent learn the surface from the framework itself (machine-readable schemas/ontology, introspectable ops, stable programmatic docs) instead of scraping prose?
Profiles are curated 0.0–1.0 static judgments with evidence, like the
language profiles — deterministic, serializable, comparable.
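As a rough illustration of what such a profile might hold, here is a minimal sketch. The type `ProfileSketch`, its field names, and the unweighted-mean `fitness` are assumptions made for this example, not the crate's actual definitions; the scores and evidence strings are placeholders, not real judgments about any framework.

```rust
/// Hypothetical profile shape: five axis scores plus supporting evidence.
/// Field names are assumed for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct ProfileSketch {
    name: &'static str,
    token_efficiency: f64,
    determinism: f64,
    reliability: f64,
    safety: f64,
    discoverability: f64,
    evidence: Vec<&'static str>,
}

impl ProfileSketch {
    /// The five axis scores in a fixed order.
    fn axes(&self) -> [f64; 5] {
        [
            self.token_efficiency,
            self.determinism,
            self.reliability,
            self.safety,
            self.discoverability,
        ]
    }

    /// True when every score lies in the closed range 0.0..=1.0.
    fn is_valid(&self) -> bool {
        self.axes().iter().all(|s| (0.0..=1.0).contains(s))
    }

    /// Assumed aggregate: the unweighted mean of the five axes.
    fn fitness(&self) -> f64 {
        self.axes().iter().sum::<f64>() / 5.0
    }
}

/// Placeholder profile with made-up scores.
fn example() -> ProfileSketch {
    ProfileSketch {
        name: "ExampleFramework",
        token_efficiency: 0.5,
        determinism: 0.75,
        reliability: 0.5,
        safety: 0.25,
        discoverability: 0.5,
        evidence: vec![
            "placeholder note on API verbosity",
            "placeholder note on seeded runs",
            "placeholder note on artifact loading",
        ],
    }
}

fn main() {
    let p = example();
    assert!(p.is_valid());
    println!(
        "{}: fitness {:.2} from {} evidence notes",
        p.name,
        p.fitness(),
        p.evidence.len()
    );
}
```

Because the values are plain static data, two calls to `example()` compare equal, which is the sense in which a curated profile is deterministic and comparable.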
use agentic_eval::frameworks::{profile, rank_frameworks, Framework};
let torch = profile(Framework::PyTorch);
assert!(torch.evidence.len() >= 3);
let ranked = rank_frameworks();
assert!(ranked[0].fitness() >= ranked[ranked.len() - 1].fitness());
Structs§
- FrameworkComparison
- Compare two frameworks: positive deltas mean `a` fits agentic use better.
- FrameworkProfile
- A curated agentic profile of an AI framework: the four shared axes plus framework-specific discoverability, with evidence.
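The sign convention for a comparison can be sketched as a per-axis subtraction of the baseline from the candidate. The types `Axes` and `Deltas`, the function `compare`, and the sample scores below are hypothetical stand-ins, not the crate's `FrameworkComparison` API.

```rust
/// Hypothetical stand-in for the five axis scores of one profile.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Axes {
    token_efficiency: f64,
    determinism: f64,
    reliability: f64,
    safety: f64,
    discoverability: f64,
}

/// Per-axis deltas of `a` relative to baseline `b` (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Deltas {
    token_efficiency: f64,
    determinism: f64,
    reliability: f64,
    safety: f64,
    discoverability: f64,
}

impl Deltas {
    /// Number of axes on which `a` scores strictly higher than the baseline.
    fn axes_improved(&self) -> usize {
        [
            self.token_efficiency,
            self.determinism,
            self.reliability,
            self.safety,
            self.discoverability,
        ]
        .iter()
        .filter(|d| **d > 0.0)
        .count()
    }
}

/// Subtract the baseline `b` from `a`, so a positive delta favors `a`.
fn compare(a: &Axes, b: &Axes) -> Deltas {
    Deltas {
        token_efficiency: a.token_efficiency - b.token_efficiency,
        determinism: a.determinism - b.determinism,
        reliability: a.reliability - b.reliability,
        safety: a.safety - b.safety,
        discoverability: a.discoverability - b.discoverability,
    }
}

/// Made-up scores for two imaginary frameworks.
fn sample_pair() -> (Axes, Axes) {
    let a = Axes {
        token_efficiency: 0.75,
        determinism: 0.5,
        reliability: 0.5,
        safety: 0.25,
        discoverability: 1.0,
    };
    let b = Axes {
        token_efficiency: 0.5,
        determinism: 0.5,
        reliability: 0.75,
        safety: 0.75,
        discoverability: 0.5,
    };
    (a, b)
}

fn main() {
    let (a, b) = sample_pair();
    let d = compare(&a, &b);
    println!("{:?}", d);
    println!("a leads on {} of 5 axes", d.axes_improved());
}
```

Swapping the arguments flips every sign, so the result always has to be read relative to the baseline.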
Enums§
- Framework
- AI frameworks with curated agentic profiles.
Functions§
- compare_frameworks
- Compare framework `a` against baseline `b` across all five axes.
- profile
- The curated profile for `fw` (static, documented judgments; see module docs).
- profiles
- Profiles for all frameworks, in `Framework::all` order (deterministic).
- rank_frameworks
- All profiles ranked best-first by `FrameworkProfile::fitness` (stable order on ties).
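One way to get a best-first ranking that keeps tied entries in their original order is a stable sort with a descending comparator. The sketch below uses plain `(name, fitness)` pairs and a hypothetical `rank_best_first` helper; it is an illustration of the technique, not the crate's implementation of `rank_frameworks`.

```rust
/// Rank `(name, fitness)` pairs best-first.
/// `slice::sort_by` is a stable sort, so entries with equal fitness
/// keep the relative order they had in the input.
fn rank_best_first(mut items: Vec<(&'static str, f64)>) -> Vec<(&'static str, f64)> {
    // Compare b against a to sort in descending order of fitness.
    items.sort_by(|a, b| b.1.total_cmp(&a.1));
    items
}

fn main() {
    // Made-up names and scores; "alpha" and "gamma" tie on purpose.
    let input = vec![("alpha", 0.5), ("beta", 0.75), ("gamma", 0.5)];
    for (name, fitness) in rank_best_first(input) {
        println!("{name}: {fitness:.2}");
    }
}
```

`f64::total_cmp` gives a total order over floats, which avoids unwrapping the `Option` returned by `partial_cmp`; for scores confined to 0.0 through 1.0 both approaches order values the same way.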