
Module rate_limiting


Rate limiting for API call throttling and concurrency control.

This module provides RateLimiter, which controls both how many API calls run concurrently and how frequently they are issued, helping to avoid provider-side throttling and to respect service rate limits.

§Main Types

  • RateLimiter: Dual-level rate limiter with semaphore-based concurrency control and time-based request throttling

§Features

  • Separate rate limiting for LLM and embedding API calls
  • Semaphore-based concurrency control (max N simultaneous calls)
  • Time-based rate limiting (max N calls per second)
  • Automatic waiting when limits are reached
  • RAII-style permit handling with automatic release
  • Health checking for congestion detection
  • Per-second rate window with automatic reset
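The RAII-style permit handling listed above can be sketched with std primitives alone. This is a minimal illustration, not the crate's actual implementation: the `Permits`, `Permit`, and `acquire` names are hypothetical, and a blocking `Condvar` stands in for the async semaphore. The key idea is that returning a permit happens in `Drop`, so release is automatic even on early return or panic.

```rust
use std::sync::{Arc, Condvar, Mutex};

// Hypothetical permit pool (illustrative names, not the crate's API).
struct Permits {
    count: Mutex<usize>,
    cv: Condvar,
}

// RAII guard: holding a Permit means holding one unit of capacity.
struct Permit {
    pool: Arc<Permits>,
}

impl Drop for Permit {
    // Dropping the guard returns the permit and wakes one waiter.
    fn drop(&mut self) {
        *self.pool.count.lock().unwrap() += 1;
        self.pool.cv.notify_one();
    }
}

// Block until a permit is free, then take it.
fn acquire(pool: &Arc<Permits>) -> Permit {
    let mut n = pool.count.lock().unwrap();
    while *n == 0 {
        n = pool.cv.wait(n).unwrap(); // re-check after wakeup
    }
    *n -= 1;
    Permit { pool: Arc::clone(pool) }
}

fn main() {
    let pool = Arc::new(Permits { count: Mutex::new(1), cv: Condvar::new() });
    {
        let _permit = acquire(&pool);
        assert_eq!(*pool.count.lock().unwrap(), 0); // capacity in use
    } // _permit dropped here: released automatically
    assert_eq!(*pool.count.lock().unwrap(), 1);
}
```

Because release lives in `Drop`, callers never need to pair an explicit `release()` with every `acquire()`, which is the same guarantee the real permit type provides when it goes out of scope.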

§Rate Limiting Strategy

The rate limiter implements a two-tier approach:

  1. Concurrency Control: Uses semaphores to limit how many API calls can run simultaneously. This prevents overwhelming the system with too many parallel requests.

  2. Time-Based Rate Limiting: Tracks requests per second and automatically waits when the limit is reached. The counter resets every second.
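The two tiers above can be sketched as a single blocking type. This is an illustrative std-only model, not the crate's RateLimiter: `SimpleRateLimiter`, `acquire`, and `release` are assumed names, a `Mutex`/`Condvar` pair plays the role of the semaphore, and threads sleep instead of awaiting. It shows both tiers in order: wait for a concurrency permit, then wait for the per-second window to reset if the rate budget is spent.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

// Illustrative two-tier limiter (hypothetical names, blocking instead of async).
struct SimpleRateLimiter {
    // Tier 1: counting semaphore for concurrency control.
    permits: Mutex<usize>,
    cv: Condvar,
    // Tier 2: per-second window (window start, calls made in this window).
    window: Mutex<(Instant, u32)>,
    max_per_second: u32,
}

impl SimpleRateLimiter {
    fn new(max_concurrent: usize, max_per_second: u32) -> Self {
        Self {
            permits: Mutex::new(max_concurrent),
            cv: Condvar::new(),
            window: Mutex::new((Instant::now(), 0)),
            max_per_second,
        }
    }

    /// Block until both a concurrency permit and rate budget are available.
    fn acquire(&self) {
        // Tier 1: wait for a free permit.
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.cv.wait(permits).unwrap();
        }
        *permits -= 1;
        drop(permits);

        // Tier 2: the counter resets every second; if the budget is spent,
        // sleep out the remainder of the current window.
        let mut w = self.window.lock().unwrap();
        if w.0.elapsed() >= Duration::from_secs(1) {
            *w = (Instant::now(), 0);
        }
        if w.1 >= self.max_per_second {
            let wait = Duration::from_secs(1).saturating_sub(w.0.elapsed());
            drop(w);
            std::thread::sleep(wait);
            w = self.window.lock().unwrap();
            *w = (Instant::now(), 0);
        }
        w.1 += 1;
    }

    /// Return a permit (the real API does this via RAII drop).
    fn release(&self) {
        let mut permits = self.permits.lock().unwrap();
        *permits += 1;
        self.cv.notify_one();
    }
}

fn main() {
    let start = Instant::now();
    let limiter = SimpleRateLimiter::new(2, 3);
    for _ in 0..4 {
        limiter.acquire();
        limiter.release();
    }
    // The fourth call exceeds the 3-per-second budget, so it waits
    // for the window to reset before proceeding.
    assert!(start.elapsed() >= Duration::from_secs(1));
}
```

The real limiter exposes the same behavior asynchronously: acquiring a permit awaits rather than blocks, so neither tier ties up an executor thread while waiting.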

§Basic Usage

use graphrag_core::async_processing::{AsyncConfig, RateLimiter};

async fn example() -> Result<(), Box<dyn std::error::Error>> {
    let config = AsyncConfig {
        max_concurrent_llm_calls: 3,
        llm_rate_limit_per_second: 2.0,
        max_concurrent_embeddings: 5,
        embedding_rate_limit_per_second: 10.0,
        ..Default::default()
    };

    let rate_limiter = RateLimiter::new(&config);

    // Acquire a permit for an LLM call (waits if limits are reached)
    let permit = rate_limiter.acquire_llm_permit().await?;
    // ... make LLM API call ...
    drop(permit); // permit is released automatically when dropped

    // Check available capacity
    let available = rate_limiter.get_available_llm_permits();
    println!("Available LLM permits: {}", available);

    // Health check for congestion detection
    let _status = rate_limiter.health_check();

    Ok(())
}

Structs§

RateLimiter
Rate limiter for controlling API call frequency and concurrency