Evaluating programming languages for agentic AI use.
The other modules score a program. This module scores the language a program is written in — the standing properties that determine how well an LLM agent can write, verify, and recover in it, on the same four axes:
- token efficiency — how many tokens typical code costs (syntax weight, boilerplate, type annotations) and how much standing context (imports, project config) a working snippet drags in.
- determinism — does the toolchain behave reproducibly (lockfiles, hermetic builds, stable formatting) so agent-driven edit→run loops converge?
- reliability — when the agent gets it wrong, does the language catch it (static types, compile errors with spans, no undefined behavior) and is the error message structured enough to self-correct from?
- safety — what blast radius does running generated code have by default (memory safety, sandboxability, capability gating)?
Scores range from 0.0 to 1.0. Profiles are static: curated, documented
judgments encoded as data, so they are deterministic, comparable, and
serializable. They are not measurements of your codebase; use the
program-level axes for that. Each profile carries evidence strings so an
agent can see why a score is what it is, and the per-axis rationale
survives serialization.
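As a rough illustration of the profile shape described above, the sketch below models four axis scores plus evidence strings. Everything in it is an assumption made for illustration: `SketchProfile`, its field names, `is_valid`, `example_profile`, and the unweighted-mean `fitness` are hypothetical and are not taken from the crate, whose real `LanguageProfile` may be defined and weighted differently. Serialization is left out to keep the sketch free of dependencies.

```rust
/// Hypothetical stand-in for the crate's profile type. Field names follow the
/// four axes described in the module docs; the real definition may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchProfile {
    pub name: &'static str,
    pub token_efficiency: f64,
    pub determinism: f64,
    pub reliability: f64,
    pub safety: f64,
    /// Human-readable reasons behind the scores.
    pub evidence: Vec<&'static str>,
}

impl SketchProfile {
    /// Assumed composite: the unweighted mean of the four axes.
    pub fn fitness(&self) -> f64 {
        (self.token_efficiency + self.determinism + self.reliability + self.safety) / 4.0
    }

    /// True when every axis score lies in the documented 0.0 to 1.0 range.
    pub fn is_valid(&self) -> bool {
        [self.token_efficiency, self.determinism, self.reliability, self.safety]
            .iter()
            .all(|score| (0.0..=1.0).contains(score))
    }
}

/// Placeholder values for illustration; these are not the crate's curated scores.
pub fn example_profile() -> SketchProfile {
    SketchProfile {
        name: "example-lang",
        token_efficiency: 0.6,
        determinism: 0.8,
        reliability: 0.9,
        safety: 0.7,
        evidence: vec![
            "reliability: compiler reports errors with source spans",
            "determinism: lockfile pins dependency versions",
        ],
    }
}
```

Keeping the evidence next to the numbers means a consumer that only sees the data can still explain each score without going back to the documentation.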
use agentic_eval::languages::{profile, rank_languages, Language};
let rust = profile(Language::Rust);
assert!(rust.reliability >= 0.8); // compiler catches agent mistakes
let ranked = rank_languages();
assert_eq!(ranked.len(), Language::all().len());
// Ranked best-first by composite fitness:
assert!(ranked[0].fitness() >= ranked[ranked.len() - 1].fitness());
Structs§
- `LanguageComparison` - Compare two languages: positive means `a` fits agentic use better.
- `LanguageProfile` - A curated agentic profile of a language: four 0.0–1.0 axis scores plus the evidence behind them.
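One way to read "positive means `a` fits agentic use better" is as a per-axis difction `a - b`. The sketch below follows that reading. `Axes`, `AxisDeltas`, `compare`, and `overall` are hypothetical names introduced here, and the unweighted mean in `overall` is an assumption; the crate's `LanguageComparison` may expose different fields or weighting.

```rust
/// Hypothetical four-axis score set, each value in 0.0 to 1.0.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Axes {
    pub token_efficiency: f64,
    pub determinism: f64,
    pub reliability: f64,
    pub safety: f64,
}

/// Hypothetical comparison result: each field holds `a - b` for that axis,
/// so a positive value favors `a` and a negative value favors `b`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AxisDeltas {
    pub token_efficiency: f64,
    pub determinism: f64,
    pub reliability: f64,
    pub safety: f64,
}

/// Compare `a` against baseline `b`, axis by axis.
pub fn compare(a: Axes, b: Axes) -> AxisDeltas {
    AxisDeltas {
        token_efficiency: a.token_efficiency - b.token_efficiency,
        determinism: a.determinism - b.determinism,
        reliability: a.reliability - b.reliability,
        safety: a.safety - b.safety,
    }
}

impl AxisDeltas {
    /// Assumed summary: the unweighted mean of the four deltas.
    pub fn overall(&self) -> f64 {
        (self.token_efficiency + self.determinism + self.reliability + self.safety) / 4.0
    }
}
```

Reporting the deltas per axis, not only the summary, keeps trade-offs visible: a language can win overall while still losing on token efficiency.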
Enums§
- `Language` - Languages with curated agentic profiles.
Functions§
- `compare_languages` - Compare language `a` against baseline `b` across all four axes.
- `profile` - The curated profile for `lang`. Scores are static, documented judgments (see module docs); evidence strings carry the rationale.
- `profiles` - Profiles for all languages, in `Language::all` order (deterministic).
- `rank_languages` - All profiles ranked best-first by `LanguageProfile::fitness` (ties broken by the fixed `Language::all` order, so output is deterministic).
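The deterministic tie-breaking described for `rank_languages` can be sketched with an explicit two-key comparison. This is a minimal illustration, not the crate's implementation: `Entry`, its `order` field (standing in for a language's position in `Language::all`), and `rank` are hypothetical names.

```rust
/// Hypothetical ranking entry. `order` stands in for the language's position
/// in the fixed `Language::all` sequence.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub name: &'static str,
    pub order: usize,
    pub fitness: f64,
}

/// Rank best-first by fitness; ties fall back to the fixed order, so the
/// result does not depend on how the input happened to be arranged.
pub fn rank(mut entries: Vec<Entry>) -> Vec<Entry> {
    entries.sort_by(|x, y| {
        y.fitness
            .total_cmp(&x.fitness) // descending fitness
            .then(x.order.cmp(&y.order)) // ties: ascending fixed order
    });
    entries
}
```

`f64::total_cmp` gives a total order over floats, so the comparator never has to handle an incomparable pair, and the explicit `order` key makes the tie-break independent of sort stability.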