pub struct UsageTracker { /* private fields */ }
Tracks API usage using a sliding window algorithm for accurate quota management.
UsageTracker monitors both requests-per-minute (RPM) and tokens-per-minute (TPM)
usage against configured quotas. It uses a bucket-based sliding window approach
where each bucket represents a 1-second slot, providing more accurate rate limit
tracking than traditional fixed-window counters.
§Sliding Window Algorithm
The tracker maintains BUCKET_COUNT buckets (default: 60), each covering a 1-second
window. When checking usage or calculating backoff times, only buckets within the
last BUCKETS_WINDOW_S seconds (default: 60) are considered valid.
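The bucket scheme described above can be sketched roughly as follows. This is an illustration only, not the crate's actual implementation: the `Bucket` and `SlidingWindow` types, their fields, and the explicit `now_s` parameter are assumptions made for the example, and usage is counted as plain `u64` values. Only the names `BUCKET_COUNT` and `BUCKETS_WINDOW_S` come from the text.

```rust
// Hypothetical stand-ins for the tracker's internals (assumed, not the crate's code).
const BUCKET_COUNT: usize = 60;
const BUCKETS_WINDOW_S: u64 = 60;

#[derive(Clone, Copy, Default)]
struct Bucket {
    second: u64, // absolute second this slot currently represents
    count: u64,  // usage recorded during that second
}

struct SlidingWindow {
    buckets: [Bucket; BUCKET_COUNT],
}

impl SlidingWindow {
    fn new() -> Self {
        Self { buckets: [Bucket::default(); BUCKET_COUNT] }
    }

    /// Records `amount` units of usage at time `now_s` (in seconds).
    fn add(&mut self, now_s: u64, amount: u64) {
        let slot = &mut self.buckets[(now_s % BUCKET_COUNT as u64) as usize];
        if slot.second != now_s {
            // The slot still holds data from an earlier lap; start it over.
            *slot = Bucket { second: now_s, count: 0 };
        }
        slot.count += amount;
    }

    /// Sums usage over buckets that fall within the last BUCKETS_WINDOW_S seconds.
    fn usage(&self, now_s: u64) -> u64 {
        self.buckets
            .iter()
            .filter(|b| b.second <= now_s && now_s - b.second < BUCKETS_WINDOW_S)
            .map(|b| b.count)
            .sum()
    }
}
```

Because every bucket remembers which second it belongs to, stale slots are ignored on read and overwritten on write, so usage ages out one second at a time instead of all at once at a fixed window boundary.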
§Example
use thryd::tracker::{UsageTracker, count_token};
let mut tracker = UsageTracker::with_quota(
    Some(100_000), // TPM quota
    Some(60),      // RPM quota
);
// Add a request (input + output tokens)
tracker.add_request_raw(
    "Hello, world!".to_string(),
    "Hi there!".to_string(),
);
// Check remaining quota
let remaining_tpm = tracker.remaining_tpm_quota();
let remaining_rpm = tracker.remaining_rpm_quota();
// Check if we can make another request
if tracker.has_capacity() {
    println!("Can make request immediately");
} else {
    let wait_ms = tracker.need_wait_for_string("Another request".to_string());
    println!("Wait {}ms", wait_ms);
}
§Thread Safety
UsageTracker uses interior mutability and is not Sync. For multi-threaded
use, wrap it in a mutex or confine it to a single thread.
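One way to follow the mutex advice is sketched below. The `Tracker` struct is a hypothetical stand-in that only counts requests, so that the example compiles without the crate, and `record_from_threads` is a helper invented for illustration. The sketch also assumes the real tracker is `Send`, which `Arc<Mutex<T>>` requires before it can be shared across threads.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for UsageTracker; it only counts requests.
struct Tracker {
    requests: u64,
}

impl Tracker {
    fn add_request(&mut self) -> &mut Self {
        self.requests += 1;
        self
    }
}

/// Spawns `workers` threads that each record `per_worker` requests
/// through one tracker shared behind a mutex, then returns the total.
fn record_from_threads(workers: usize, per_worker: u64) -> u64 {
    let shared = Arc::new(Mutex::new(Tracker { requests: 0 }));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // The lock is held only for the duration of this statement.
                    shared.lock().unwrap().add_request();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = shared.lock().unwrap().requests;
    total
}
```

Calling `record_from_threads(4, 25)` records 100 requests in total; the mutex serializes every update, so no increments are lost.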
§Implementations
impl UsageTracker
pub fn with_quota(tpm_quota: Option<Quota>, rpm_quota: Option<Quota>) -> Self
Creates a new UsageTracker with the specified quotas.
Pass None for a quota type to disable tracking for that dimension.
§Arguments
tpm_quota - Tokens-per-minute quota limit, or None to disable TPM tracking
rpm_quota - Requests-per-minute quota limit, or None to disable RPM tracking
§Example
use thryd::tracker::UsageTracker;
// Track both RPM and TPM
let tracker = UsageTracker::with_quota(Some(100_000), Some(60));
// Track only TPM (unlimited RPM)
let tracker = UsageTracker::with_quota(Some(100_000), None);
pub fn add_request(&mut self, used_token: Quota) -> &mut Self
pub fn add_request_raw(&mut self, input_text: String, output_text: String) -> &mut Self
Records a request by automatically counting tokens in the input and output text.
This is a convenience method that calls count_token() on both strings and
then calls add_request() with the sum.
§Arguments
input_text - The input/prompt text sent to the model
output_text - The model's response text
§Returns
&mut Self - Returns self for method chaining
§Example
use thryd::tracker::UsageTracker;
let mut tracker = UsageTracker::with_quota(Some(100_000), Some(60));
tracker.add_request_raw(
    "What is the capital of France?".to_string(),
    "The capital of France is Paris.".to_string(),
);
pub fn rpm_usage(&self) -> Option<Quota>
Returns the total number of requests in the current sliding window.
§Returns
Option<Quota> - Current RPM usage, or None if RPM tracking is disabled
pub fn remaining_rpm_quota(&self) -> Option<Quota>
Returns the remaining request quota available in the current window.
§Returns
Option<Quota> - Remaining RPM quota, or None if RPM tracking is disabled
pub fn tpm_usage(&self) -> Option<Quota>
Returns the total number of tokens used in the current sliding window.
§Returns
Option<Quota> - Current TPM usage, or None if TPM tracking is disabled
pub fn remaining_tpm_quota(&self) -> Option<Quota>
Returns the remaining token quota available in the current window.
§Returns
Option<Quota> - Remaining TPM quota, or None if TPM tracking is disabled
pub fn need_wait_for(&self, input_token: Quota) -> u64
Calculates the minimum wait time needed before a request with the given token count can be made without violating rate limits.
This considers both RPM and TPM limits, returning the maximum wait time required by either constraint.
§Arguments
input_token - Number of tokens in the incoming request
§Returns
u64 - Milliseconds to wait before the request can proceed. Returns 0 if there is sufficient capacity.
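The "maximum of both constraints" rule can be illustrated with a rough sketch. Nothing here is taken from the crate: the `(second, amount)` entry layout, the explicit `now_ms` argument, the `WINDOW_S` constant, and the free functions `wait_ms` and `need_wait_for` are all assumptions, with `u64` standing in for `Quota`. Entries are assumed to be sorted oldest first.

```rust
// Assumed window length in seconds (hypothetical constant for this sketch).
const WINDOW_S: u64 = 60;

/// Millisecond timestamp at which usage recorded in `second` leaves the window.
fn expiry_ms(second: u64) -> u64 {
    (second + WINDOW_S) * 1000
}

/// Wait until `needed` more units fit under `quota` for a single dimension.
/// `entries` holds (second, amount) pairs, oldest first.
fn wait_ms(entries: &[(u64, u64)], now_ms: u64, quota: Option<u64>, needed: u64) -> u64 {
    let quota = match quota {
        Some(q) => q,
        None => return 0, // tracking disabled: this dimension never blocks
    };
    let live: Vec<(u64, u64)> = entries
        .iter()
        .copied()
        .filter(|&(second, _)| expiry_ms(second) > now_ms)
        .collect();
    let mut used: u64 = live.iter().map(|&(_, amount)| amount).sum();
    if used + needed <= quota {
        return 0;
    }
    // Let the oldest entries expire one by one until the request fits.
    // If `needed` exceeds the quota itself, this returns the time until
    // the window is empty.
    let mut wait = 0;
    for (second, amount) in live {
        used -= amount;
        wait = expiry_ms(second) - now_ms;
        if used + needed <= quota {
            break;
        }
    }
    wait
}

/// Combined wait: the larger of the RPM wait and the TPM wait.
fn need_wait_for(
    requests: &[(u64, u64)],
    tokens: &[(u64, u64)],
    now_ms: u64,
    rpm_quota: Option<u64>,
    tpm_quota: Option<u64>,
    input_token: u64,
) -> u64 {
    let rpm_wait = wait_ms(requests, now_ms, rpm_quota, 1);
    let tpm_wait = wait_ms(tokens, now_ms, tpm_quota, input_token);
    rpm_wait.max(tpm_wait)
}
```

Taking the maximum matters because satisfying only one limit is not enough: a small request may fit the token budget immediately yet still have to wait for a request slot to free up.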
pub fn need_wait_for_string(&self, input_string: String) -> u64
Calculates wait time for a request with text that will be token-counted first.
Convenience method that counts tokens in the input string and calls
need_wait_for() with the result.
§Arguments
input_string - The input text to count tokens for
§Returns
u64 - Milliseconds to wait before the request can proceed
pub fn has_capacity(&self) -> bool
Checks whether there is capacity to make a request without rate limiting.
This is a convenience check that verifies both RPM and TPM have remaining quota.
§Returns
bool - true if at least 1 RPM and 1 TPM quota remain, false otherwise. A dimension whose tracking is disabled (None quota) is treated as having capacity.
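The capacity rule, including the treatment of disabled dimensions, can be sketched as below. This is a minimal illustration under stated assumptions rather than the crate's code: `remaining` and `has_capacity` are free functions invented for the example, `u64` stands in for `Quota`, and the usage figures are passed in explicitly instead of being read from the tracker.

```rust
/// Remaining quota for one dimension; None means tracking is disabled.
/// Usage above the quota is clamped to zero remaining.
fn remaining(quota: Option<u64>, used: u64) -> Option<u64> {
    quota.map(|q| q.saturating_sub(used))
}

/// True when every tracked dimension has at least one unit left.
/// A disabled dimension (None quota) never blocks a request.
fn has_capacity(
    rpm_quota: Option<u64>,
    rpm_used: u64,
    tpm_quota: Option<u64>,
    tpm_used: u64,
) -> bool {
    let dimension_ok =
        |quota, used| remaining(quota, used).map_or(true, |left| left >= 1);
    dimension_ok(rpm_quota, rpm_used) && dimension_ok(tpm_quota, tpm_used)
}
```

Note that a check of this shape only says some quota is left; it does not say whether a specific request fits, which is what the wait-time methods are for.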