turnkeeper 1.0.0

An asynchronous, recurring job scheduler for Tokio with cron-like (weekday/time) scheduling, retries, observability and flexible backends.
Documentation
# TurnKeeper API Reference

This document provides a quick reference to the main public API of the `TurnKeeper` crate.

## Core Struct

### `TurnKeeper`

The main scheduler struct. Manages job lifecycle, worker pool, and state.

**Creation:**

```rust
use turnkeeper::{TurnKeeper, scheduler::PriorityQueueType, job::RecurringJobId, error::BuildError};
use std::time::Duration;

# async fn example() -> Result<(), BuildError> {
// Use the builder pattern
let builder = TurnKeeper::builder();

// Configure (examples)
let scheduler = builder
    .max_workers(4) // Required: Max concurrent jobs
    .priority_queue(PriorityQueueType::HandleBased) // Optional: Choose PQ type
    .staging_buffer_size(16) // Optional: Set buffer sizes (default 128)
    .build()?; // Returns Result<TurnKeeper, BuildError>
# Ok(())
# }
```

See [`SchedulerBuilder`](#schedulerbuilder) for all configuration options.

**Methods:**

*   `TurnKeeper::builder() -> SchedulerBuilder`
    *   Creates a new builder instance to configure the scheduler.

*   `async fn add_job_async<F>(&self, request: RecurringJobRequest, exec_fn: F) -> Result<RecurringJobId, SubmitError>`
    *   Submits a job for scheduling. Waits asynchronously if the internal staging buffer is full.
    *   `request`: A [`RecurringJobRequest`] defining the job's schedule and parameters.
    *   `exec_fn`: An async function or closure matching [`BoxedExecFn`].
    *   Returns the unique `RecurringJobId` (lineage ID) assigned to the job upon successful submission to the staging queue. This ID is generated by the scheduler handle and used consistently internally.
    *   **Errors:** [`SubmitError::ChannelClosed`] - Contains the original `(request, exec_fn)` tuple.

*   `fn try_add_job<F>(&self, request: RecurringJobRequest, exec_fn: F) -> Result<RecurringJobId, SubmitError>`
    *   Attempts to submit a job non-blockingly.
    *   Returns `Ok(RecurringJobId)` on immediate successful submission to the staging queue. The ID is generated by the scheduler handle and used consistently internally.
    *   **Errors:** [`SubmitError::StagingFull`], [`SubmitError::ChannelClosed`] - Both contain the original `(request, exec_fn)` tuple.

*   `async fn cancel_job(&self, job_id: RecurringJobId) -> Result<(), QueryError>`
    *   Requests cancellation of a job lineage. Operation is idempotent.
    *   Effect depends on [`PriorityQueueType`] used during build (HandleBased attempts proactive removal).
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`], [`QueryError::JobNotFound`]

*   `async fn get_job_details(&self, job_id: RecurringJobId) -> Result<JobDetails, QueryError>`
    *   Retrieves detailed information about a specific job lineage.
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`], [`QueryError::JobNotFound`]

*   `async fn list_all_jobs(&self) -> Result<Vec<JobSummary>, QueryError>`
    *   Retrieves summary information for all known job lineages (including potentially cancelled ones).
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`]

*   `async fn get_metrics_snapshot(&self) -> Result<MetricsSnapshot, QueryError>`
    *   Retrieves a snapshot of the scheduler's current internal metrics.
    *   **Errors:** [`QueryError::SchedulerShutdown`], [`QueryError::ResponseFailed`]

*   `async fn shutdown_graceful(&self, timeout: Option<Duration>) -> Result<(), ShutdownError>`
    *   Initiates graceful shutdown, waiting for active jobs to complete.
    *   Waits for tasks to exit or until the optional `timeout` (from `std::time::Duration`) elapses.
    *   **Errors:** [`ShutdownError::SignalFailed`], [`ShutdownError::Timeout`], [`ShutdownError::TaskPanic`]

*   `async fn shutdown_force(&self, timeout: Option<Duration>) -> Result<(), ShutdownError>`
    *   Initiates immediate shutdown, potentially interrupting active jobs.
    *   Waits for tasks to attempt exit or until the optional `timeout` (from `std::time::Duration`) elapses.
    *   **Errors:** [`ShutdownError::SignalFailed`], [`ShutdownError::Timeout`], [`ShutdownError::TaskPanic`]

## Configuration

### `SchedulerBuilder`

Used to configure and build a `TurnKeeper` instance. Get via `TurnKeeper::builder()`.

**Methods:**

*   `max_workers(usize) -> Self`: (Required) Sets the maximum number of concurrently running jobs. Must be > 0.
*   `priority_queue(PriorityQueueType) -> Self`: (Optional) Sets the internal queue type (`BinaryHeap` or `HandleBased`). Default: `HandleBased`.
*   `staging_buffer_size(usize) -> Self`: (Optional) Sets the capacity of the incoming job submission buffer. Default: 128.
*   `command_buffer_size(usize) -> Self`: (Optional) Sets the capacity of the internal command buffer (queries, etc.). Default: 128.
*   `job_dispatch_buffer_size(usize) -> Self`: (Optional) Sets the capacity of the coordinator-to-worker dispatch channel. Must be >= 1. Default: 1.
*   `build() -> Result<TurnKeeper, BuildError>`: Consumes the builder and attempts to create and start the `TurnKeeper`.

### `PriorityQueueType` (Enum)

Determines the internal scheduling mechanism.

*   `BinaryHeap`: Standard library implementation. Lazy cancellation (job discarded when it's next to run). No efficient job updates before execution. Minimal dependencies.
*   `HandleBased`: Uses `priority-queue` crate. Allows proactive cancellation removal (attempts direct removal from queue). Foundation for efficient job updates before execution. Adds dependency.

## Job Definition

### `RecurringJobRequest`

Defines a job's configuration and schedule. Passed to `add_job` methods.

**Creation:**

```rust
use turnkeeper::job::RecurringJobRequest;
use chrono::{Weekday, NaiveTime, Utc, DateTime, Duration}; // Added DateTime, Duration

// Recurring job
let mut req = RecurringJobRequest::new( // Use let mut if using methods below
    "My Job",                    // name: &str
    vec![(Weekday::Mon, NaiveTime::from_hms_opt(9,0,0).unwrap())], // schedule: Vec<(Weekday, NaiveTime)>
    3                            // max_retries: u32
);

// One-time job (or setting specific first run time)
let mut one_time_req = RecurringJobRequest::new("One Time Task", vec![], 0);
let first_run_time = Utc::now() + Duration::seconds(10);
one_time_req.with_initial_run_time(first_run_time); // Call method on mutable request
```

**Methods:**

*   `new(name: &str, weekday_times: Vec<(Weekday, NaiveTime)>, max_retries: u32) -> Self`: Creates a new request. `next_run` is initially `None`.
*   `with_initial_run_time(&mut self, run_time: DateTime<Utc>)`: Sets a specific first run time, overriding schedule calculation for the first run. Modifies the request in place. Required for one-time jobs (empty `weekday_times`).

### `BoxedExecFn` (Type Alias)

The required signature for the job execution function:

```rust
use std::pin::Pin;
use std::future::Future;

// Must be async, Send + Sync + 'static
type BoxedExecFn = Box<
    dyn Fn() -> Pin<Box<dyn Future<Output = bool> + Send + 'static>>
    + Send + Sync + 'static,
>;
// Return `true` for success, `false` for logical failure (triggers retry).
// Panics within the function are caught and treated as failures.
```

**Example Job Function:**

```rust
let job_fn = || {
    Box::pin(async move {
        println!("Job running!");
        tokio::time::sleep(std::time::Duration::from_millis(100)).await;
        // Perform work...
        let success = true; // Determine success/failure
        success
    })
};
```

## IDs

*   `RecurringJobId = uuid::Uuid`: Identifies a job lineage (the definition). Returned by `add_job` methods.
*   `InstanceId = uuid::Uuid`: Identifies a specific scheduled *instance* of a job in the queue or executing. Seen in internal logs and potentially `JobDetails`.

## Query Results & Metrics

*   `JobDetails`: Struct with detailed configuration and current state of a job lineage.
*   `JobSummary`: Struct with summary information for listing jobs.
*   `MetricsSnapshot`: Struct holding a point-in-time snapshot of internal counters and gauges. Includes:
    *   `jobs_submitted`, `jobs_executed_success`, `jobs_executed_fail`, `jobs_panicked`, `jobs_retried`
    *   `jobs_lineage_cancelled`: Count of `cancel_job` calls that marked a lineage.
    *   `jobs_instance_discarded_cancelled`: Count of instances discarded due to cancellation.
    *   `jobs_permanently_failed`
    *   `staging_submitted_total`, `staging_rejected_full`
    *   `job_queue_scheduled_current`, `job_staging_buffer_current`, `workers_active_current` (Gauges)
    *   `job_execution_duration_count`, `job_execution_duration_sum_micros` (Histogram data)

## Errors

*   `BuildError`: Errors during scheduler construction (`SchedulerBuilder::build`).
*   `SubmitError<(RecurringJobRequest, Arc<BoxedExecFn>)>`: Errors during job submission (`try_add_job`, `add_job_async`). Contains original job request/fn on failure.
*   `QueryError`: Errors during queries or cancellation requests (`get_...`, `list_...`, `cancel_job`).
*   `ShutdownError`: Errors during shutdown (`shutdown_graceful`, `shutdown_force`).