# Cost Tracking
The library tracks usage and cost per request, and is honest about when a cost is
actually known.
## Token buckets
Input tokens are split into three **disjoint, additive** buckets so caching is
priced correctly across every provider's differing wire conventions:
- `uncached_input_tokens`: full-price prompt tokens,
- `cache_read_tokens`: served from a warm cache (cheap),
- `cache_write_tokens`: written to the cache this request (a premium).
Total input is the sum of the three; cost is a clean weighted sum, no subtraction.
## Resolution: never a fake $0
Every reported `CostInfo` carries a `CostResolution`:
| `Resolved` | The USD cost is authoritative (native, or tokens × a configured `TokenPrice`) |
| `Unpriced` | Tokens are real, but no native cost and no `TokenPrice` was set. `cost` is `0.0` but must NOT be treated as a free request. Set a `TokenPrice` to resolve it. |
| `Unknown` | Cost could not be determined at all (no usage, and any out-of-band query failed) |
Check `resolution` before trusting `cost`.
## A callback per completion
```rust,no_run
use minillmlib::{ChatNode, GeneratorInfo, NodeCompletionParameters, CompletionParameters, CostInfo};
use std::sync::{Arc, Mutex};
# async fn run() -> minillmlib::Result<()> {
let gen = GeneratorInfo::openrouter("google/gemini-2.5-flash-lite");
let total = Arc::new(Mutex::new(0.0));
let sink = total.clone();
let params = NodeCompletionParameters::new()
.with_params(CompletionParameters::new().with_max_tokens(200))
.with_cost_tracking(true)
.with_cost_callback(move |info: CostInfo| {
// info.cost, .prompt_tokens, .completion_tokens,
// .cache_read_tokens, .cache_write_tokens, .resolution
*sink.lock().unwrap() += info.cost;
});
let root = ChatNode::root("You are helpful.");
root.add_user("Hi").complete(&gen, Some(¶ms)).await?;
println!("total spent: {}", *total.lock().unwrap());
# Ok(()) }
```
## Enforced tracking via CompletionContext
When you want cost reporting to be structurally guaranteed (not opt-in per call),
wrap the generator in a `CompletionContext` and use `complete_tracked`. It always
reports cost through the context's async callback, and on a cancelled or
usage-less stream it resolves out-of-band (e.g. OpenRouter's `/generation` query)
or reports `Unknown`, rather than silently booking `$0`.
```rust,no_run
use minillmlib::{CompletionContext, CostInfo, AsyncCostCallback, CompletionMeta, GeneratorInfo, ChatNode};
use std::sync::Arc;
# async fn run() -> minillmlib::Result<()> {
# let generator = GeneratorInfo::openrouter("m");
let _answer = root.add_user("Hi").complete_tracked(&ctx, None).await?;
# Ok(()) }
```
For streaming, `complete_streaming_tracked` returns a `TrackedStream` that settles
cost when it finishes or is cancelled (use `cancel().await` for a reliable
settle; a plain drop is best-effort).