Skip to main content

Module languages

Module languages 

Source
Expand description

Evaluating programming languages for agentic AI use.

The other modules score a program. This module scores the language a program is written in — the standing properties that determine how well an LLM agent can write, verify, and recover in it, on the same four axes:

  • token efficiency — how many tokens typical code costs (syntax weight, boilerplate, type annotations) and how much standing context (imports, project config) a working snippet drags in.
  • determinism — does the toolchain behave reproducibly (lockfiles, hermetic builds, stable formatting) so agent-driven edit→run loops converge?
  • reliability — when the agent gets it wrong, does the language catch it (static types, compile errors with spans, no undefined behavior) and is the error message structured enough to self-correct from?
  • safety — what blast radius does running generated code have by default (memory safety, sandboxability, capability gating)?

Scores are 0.0–1.0 static profiles: curated, documented judgments encoded as data — deterministic, comparable, and serializable — not measurements of your codebase (use the program-level axes for that). Each profile carries evidence strings so an agent can see why a score is what it is, and the per-axis rationale survives serialization.

use agentic_eval::languages::{profile, rank_languages, Language};
let rust = profile(Language::Rust);
assert!(rust.reliability >= 0.8); // compiler catches agent mistakes
let ranked = rank_languages();
assert_eq!(ranked.len(), Language::all().len());
// Ranked best-first by composite fitness:
assert!(ranked[0].fitness() >= ranked[ranked.len() - 1].fitness());

Structs§

LanguageComparison
Compare two languages: positive means a fits agentic use better.
LanguageProfile
A curated agentic profile of a language: four 0.0–1.0 axis scores plus the evidence behind them.

Enums§

Language
Languages with curated agentic profiles.

Functions§

compare_languages
Compare language a against baseline b across all four axes.
profile
The curated profile for lang. Scores are static, documented judgments (see module docs); evidence strings carry the rationale.
profiles
Profiles for all languages, in Language::all order (deterministic).
rank_languages
All profiles ranked best-first by LanguageProfile::fitness (ties broken by the fixed Language::all order, so output is deterministic).