turnkeeper 1.1.0

An asynchronous, recurring job scheduler for Tokio with support for CRON, interval, and weekday/time schedules, plus retries, cancellation, and observability.
Documentation
# TurnKeeper API Reference

This document provides a quick reference to the main public API of the `TurnKeeper` crate.

## Core Struct

### `TurnKeeper`

The main scheduler struct. Manages job lifecycle, worker pool, and state.

**Creation:**

```rust
use turnkeeper::{TurnKeeper, scheduler::PriorityQueueType, job::RecurringJobId, error::BuildError};
use std::time::Duration;

# async fn example() -> Result<(), BuildError> {
// Use the builder pattern
let builder = TurnKeeper::builder();

// Configure (examples)
let scheduler = builder
    .max_workers(4) // Required: Max concurrent jobs
    .priority_queue(PriorityQueueType::HandleBased) // Optional: Choose PQ type
    .staging_buffer_size(16) // Optional: Set buffer sizes (default 128)
    .build()?; // Returns Result<TurnKeeper, BuildError>
# Ok(())
# }
```

See [`SchedulerBuilder`](#schedulerbuilder) for all configuration options.

**Methods:**

*   `TurnKeeper::builder() -> SchedulerBuilder`
    *   Creates a new builder instance to configure the scheduler.

*   `async fn add_job_async<F>(&self, request: RecurringJobRequest, exec_fn: F) -> Result<RecurringJobId, SubmitError>`
    *   Submits a job for scheduling. Waits asynchronously if the internal staging buffer is full.
    *   `request`: A [`RecurringJobRequest`] defining the job's schedule and parameters. Use the relevant constructors (`from_week_day`, `from_cron`, etc.).
    *   `exec_fn`: An async function or closure matching [`BoxedExecFn`].
    *   Returns the unique `RecurringJobId` (lineage ID) assigned to the job upon successful submission to the staging queue. This ID is generated by the scheduler handle and used consistently internally.
    *   **Errors:** [`SubmitError::ChannelClosed`] - Contains the original `(request, exec_fn)` tuple.

*   `fn try_add_job<F>(&self, request: RecurringJobRequest, exec_fn: F) -> Result<RecurringJobId, SubmitError>`
    *   Attempts to submit a job non-blockingly.
    *   Returns `Ok(RecurringJobId)` on immediate successful submission to the staging queue. The ID is generated by the scheduler handle and used consistently internally.
    *   **Errors:** [`SubmitError::StagingFull`], [`SubmitError::ChannelClosed`] - Both contain the original `(request, exec_fn)` tuple.

*   `async fn cancel_job(&self, job_id: RecurringJobId) -> Result<(), QueryError>`
    *   Requests cancellation of a job lineage. Operation is idempotent.
    *   Effect depends on [`PriorityQueueType`] used during build (HandleBased attempts proactive removal).
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`], [`QueryError::JobNotFound`]

*   `async fn get_job_details(&self, job_id: RecurringJobId) -> Result<JobDetails, QueryError>`
    *   Retrieves detailed information about a specific job lineage. The `schedule` field in `JobDetails` will be of type [`Schedule`].
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`], [`QueryError::JobNotFound`]

*   `async fn list_all_jobs(&self) -> Result<Vec<JobSummary>, QueryError>`
    *   Retrieves summary information for all known job lineages (including potentially cancelled ones).
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`]

*   `async fn get_metrics_snapshot(&self) -> Result<MetricsSnapshot, QueryError>`
    *   Retrieves a snapshot of the scheduler's current internal metrics.
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`]

*   `async fn shutdown_graceful(&self, timeout: Option<Duration>) -> Result<(), ShutdownError>`
    *   Initiates graceful shutdown, waiting for active jobs to complete.
    *   Waits for tasks to exit or until the optional `timeout` (from `std::time::Duration`) elapses.
    *   **Errors:** [`ShutdownError::SignalFailed`], [`ShutdownError::Timeout`], [`ShutdownError::TaskPanic`]

*   `async fn shutdown_force(&self, timeout: Option<Duration>) -> Result<(), ShutdownError>`
    *   Initiates immediate shutdown, potentially interrupting active jobs.
    *   Waits for tasks to attempt exit or until the optional `timeout` (from `std::time::Duration`) elapses.
    *   **Errors:** [`ShutdownError::SignalFailed`], [`ShutdownError::Timeout`], [`ShutdownError::TaskPanic`]

## Configuration

### `SchedulerBuilder`

Used to configure and build a `TurnKeeper` instance. Get via `TurnKeeper::builder()`.

**Methods:**

*   `max_workers(usize) -> Self`: (Required) Sets the maximum number of concurrently running jobs. Must be > 0.
*   `priority_queue(PriorityQueueType) -> Self`: (Optional) Sets the internal queue type (`BinaryHeap` or `HandleBased`). Default: `HandleBased`.
*   `staging_buffer_size(usize) -> Self`: (Optional) Sets the capacity of the incoming job submission buffer. Default: 128.
*   `command_buffer_size(usize) -> Self`: (Optional) Sets the capacity of the internal command buffer (queries, etc.). Default: 128.
*   `job_dispatch_buffer_size(usize) -> Self`: (Optional) Sets the capacity of the coordinator-to-worker dispatch channel. Must be >= 1. Default: 1.
*   `build() -> Result<TurnKeeper, BuildError>`: Consumes the builder and attempts to create and start the `TurnKeeper`.

### `PriorityQueueType` (Enum)

Determines the internal scheduling mechanism.

*   `BinaryHeap`: Standard library implementation. Lazy cancellation (job discarded when it's next to run). No efficient job updates before execution. Minimal dependencies.
*   `HandleBased`: Uses `priority-queue` crate. Allows proactive cancellation removal (attempts direct removal from queue). Foundation for efficient job updates before execution. Adds dependency.

## Job Definition

### `RecurringJobRequest`

Defines a job's configuration and schedule. Passed to `add_job` methods.

**Creation (Examples):**

```rust
use turnkeeper::job::{RecurringJobRequest, Schedule}; // Import Schedule enum
use chrono::{Weekday, NaiveTime, Utc, DateTime, Duration as ChronoDuration};
use std::time::Duration as StdDuration;

// Weekday/Time schedule
let mut req_weekday = RecurringJobRequest::from_week_day(
    "Weekday Job",
    vec![(Weekday::Mon, NaiveTime::from_hms_opt(9,0,0).unwrap())],
    3 // max_retries
);

// CRON schedule (every 5 minutes)
let req_cron = RecurringJobRequest::from_cron(
    "Cron Job",
    "0 */5 * * * * *", // Cron expression (includes seconds)
    1
);

// Interval schedule (every 30 seconds)
let mut req_interval = RecurringJobRequest::from_interval(
    "Interval Job",
    StdDuration::from_secs(30),
    2
);

// One-time schedule
let run_at = Utc::now() + ChronoDuration::minutes(10);
let req_once = RecurringJobRequest::from_once(
    "One Time Task",
    run_at,
    0
);

// No automatic schedule (needs manual trigger or initial run time)
let mut req_never = RecurringJobRequest::never(
    "Manual Trigger Job",
    0
);

// Set an initial run time (useful for Interval, Never, or delaying first run)
let first_run_time = Utc::now() + ChronoDuration::seconds(5);
req_interval.with_initial_run_time(first_run_time);
req_never.with_initial_run_time(first_run_time);
```

**Constructors & Methods:**

*   `from_week_day(name: &str, weekday_times: Vec<(Weekday, NaiveTime)>, max_retries: u32) -> Self`: Creates a job using a `Schedule::WeekdayTimes`.
*   `from_cron(name: &str, cron_expression: &str, max_retries: u32) -> Self`: Creates a job using `Schedule::Cron`. Requires `cron` crate.
*   `from_interval(name: &str, interval: StdDuration, max_retries: u32) -> Self`: Creates a job using `Schedule::FixedInterval`. First run typically needs `with_initial_run_time` or occurs at `Now + Interval`.
*   `from_once(name: &str, run_at: DateTime<Utc>, max_retries: u32) -> Self`: Creates a job using `Schedule::Once`. The first run time is set automatically.
*   `never(name: &str, max_retries: u32) -> Self`: Creates a job using `Schedule::Never`. It will only run if `with_initial_run_time` is called.
*   `with_initial_run_time(&mut self, run_time: DateTime<Utc>)`: Sets a specific first run time, overriding any schedule calculation for the *first* run only. Modifies the request in place.

### `Schedule` (Enum)

Represents the different ways a job can be scheduled. Held within `RecurringJobRequest`.

*   `WeekdayTimes(Vec<(Weekday, NaiveTime)>)`: Run at specific UTC times on given weekdays.
*   `Cron(String)`: Run based on a standard CRON expression (UTC). Requires `cron` crate.
*   `FixedInterval(StdDuration)`: Run repeatedly at a fixed interval after the last scheduled/run time.
*   `Once(DateTime<Utc>)`: Run only once at the specified time.
*   `Never`: No automatic schedule. Requires `with_initial_run_time` to run even once.

### `BoxedExecFn` (Type Alias)

The required signature for the job execution function:

```rust
use std::pin::Pin;
use std::future::Future;

// Must be async, Send + Sync + 'static
type BoxedExecFn = Box<
    dyn Fn() -> Pin<Box<dyn Future<Output = bool> + Send + 'static>>
    + Send + Sync + 'static,
>;
// Return `true` for success, `false` for logical failure (triggers retry).
// Panics within the function are caught and treated as failures.
```

**Example Job Function:**

```rust
let job_fn = || {
    Box::pin(async move {
        println!("Job running!");
        tokio::time::sleep(std::time::Duration::from_millis(100)).await;
        // Perform work...
        let success = true; // Determine success/failure
        success
    })
};
```

## IDs

*   `RecurringJobId = uuid::Uuid`: Identifies a job lineage (the definition). Returned by `add_job` methods. Used in `cancel_job`, `get_job_details`.
*   `InstanceId = uuid::Uuid`: Identifies a specific scheduled *instance* of a job in the queue or executing. Seen in internal logs and `JobDetails.next_run_instance`.

## Query Results & Metrics

*   `JobDetails`: Struct with detailed configuration and current state of a job lineage (includes `schedule: Schedule`).
*   `JobSummary`: Struct with summary information for listing jobs.
*   `MetricsSnapshot`: Struct holding a point-in-time snapshot of internal counters and gauges. Includes:
    *   `jobs_submitted`, `jobs_executed_success`, `jobs_executed_fail`, `jobs_panicked`, `jobs_retried`
    *   `jobs_lineage_cancelled`: Count of `cancel_job` calls that marked a lineage.
    *   `jobs_instance_discarded_cancelled`: Count of instances discarded due to cancellation.
    *   `jobs_permanently_failed`
    *   `staging_submitted_total`, `staging_rejected_full`
    *   `job_queue_scheduled_current`, `job_staging_buffer_current`, `workers_active_current` (Gauges)
    *   `job_execution_duration_count`, `job_execution_duration_sum_micros` (Histogram data)

## Errors

*   `BuildError`: Errors during scheduler construction (`SchedulerBuilder::build`).
*   `SubmitError<(RecurringJobRequest, Arc<BoxedExecFn>)>`: Errors during job submission (`try_add_job`, `add_job_async`). Contains original job request/fn on failure.
*   `QueryError`: Errors during queries or cancellation requests (`get_...`, `list_...`, `cancel_job`).
*   `ShutdownError`: Errors during shutdown (`shutdown_graceful`, `shutdown_force`).