Crate slurm

Interface to the Slurm workload manager.

Slurm is a system for scheduling and running jobs on large computing clusters. It is often used in scientific HPC (high-performance computing) contexts.

This crate provides hooks for submitting new jobs and interrogating their status. Support for other kinds of operations, such as canceling jobs or altering their runtime parameters, would be entirely appropriate but has not yet been implemented.

Example: querying a running job

extern crate failure;
extern crate slurm;

fn print_random_job_information(jobid: slurm::JobId) -> Result<(), failure::Error> {
    let info = slurm::get_job_info(jobid)?;
    println!("Job ID: {}", info.job_id()); // same as what we put in
    println!("Job's partition: {}", info.partition());
    Ok(())
}

Example: querying a completed job

To gather information about jobs that have completed, you must connect to the Slurm accounting database and query it.

extern crate chrono;
extern crate failure;
extern crate slurm;

fn print_other_job_information(jobid: slurm::JobId) -> Result<(), failure::Error> {
    let mut filter = slurm::JobFiltersOwned::default();
    filter.step_list_mut().append(slurm::JobStepFilterOwned::new(jobid));

    let db = slurm::DatabaseConnectionOwned::new()?;
    let jobs = db.get_jobs(&filter)?;
    let now = chrono::Utc::now();

    for job in jobs.iter() {
        println!("Job ID {}, name {}", job.job_id(), job.job_name());

        if let Some(d) = job.wait_duration() {
            println!("  job started; wait time: {} s", d.num_seconds());
        } else if let Some(t_el) = job.eligible_time() {
            let wait = now.signed_duration_since(t_el).num_seconds();
            println!("  job not yet started; time since eligibility: {} s", wait);
        } else {
            println!("  job not yet eligible to run");
        }
    }

    Ok(())
}
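The three-way branch in the loop above can be pulled out into a pure function, which can then be tested without a Slurm cluster. This is a minimal sketch under stated assumptions: `WaitStatus` and `classify` are hypothetical names, timestamps are plain Unix seconds rather than `chrono` types, and the sketch assumes that `wait_duration()` equals start time minus eligible time and is unset until the job has both.

```rust
/// Hypothetical summary of where a job stands, mirroring the loop's three branches.
#[derive(Debug, PartialEq)]
pub enum WaitStatus {
    /// The job has started; assumed wait = start time - eligible time.
    Started { wait_secs: i64 },
    /// The job is eligible but has not started; seconds since eligibility.
    Waiting { since_eligible_secs: i64 },
    /// The job is not yet eligible to run.
    NotEligible,
}

/// Classify a job from its (optional) eligible and start times, in Unix seconds.
pub fn classify(eligible: Option<i64>, start: Option<i64>, now: i64) -> WaitStatus {
    match (eligible, start) {
        // Both times known: the job started, so report how long it waited.
        (Some(e), Some(s)) => WaitStatus::Started { wait_secs: s - e },
        // Eligible but not started: report time elapsed since eligibility.
        (Some(e), None) => WaitStatus::Waiting { since_eligible_secs: now - e },
        // No eligible time recorded: matches the loop's final `else` branch.
        (None, _) => WaitStatus::NotEligible,
    }
}
```

For example, a job that became eligible at t = 100 and started at t = 160 classifies as `Started { wait_secs: 60 }`.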

Example: submitting a “Hello World” job

extern crate failure;
extern crate slurm;

fn submit_hello_world() -> Result<slurm::JobId, failure::Error> {
    let cwd = std::env::current_dir()?;

    let log = {
        let mut p = cwd.clone();
        p.push("%j.log");
        p.to_str().ok_or_else(|| failure::err_msg("cannot stringify log path"))?.to_owned()
    };

    let mut desc = slurm::JobDescriptorOwned::new();

    desc.set_name("helloworld")
        .set_argv(&["helloworld"])
        .inherit_environment()
        .set_stderr_path(&log)
        .set_stdin_path("/dev/null")
        .set_stdout_path(&log)
        .set_work_dir_cwd()?
        .set_script("#! /bin/bash\n\
                     set -e -x\n\
                     echo hello world \"$@\"\n")
        .set_gid_current() // setters on JobDescriptor return &mut JobDescriptor, so they must follow the JobDescriptorOwned setters
        .set_num_tasks(1)
        .set_time_limit(5)
        .set_uid_current();

    let msg = desc.submit_batch()?;
    println!("new job id: {}", msg.job_id());
    Ok(msg.job_id())
}
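The comment about return types in the chain above can be illustrated with a small sketch. One plausible reading: setters that may allocate are defined on the owned type and return `&mut` to it, while scalar setters are defined on the borrowed type, reached through `DerefMut`, and return `&mut` to the borrowed type. Once the chain has switched to the borrowed type, the owned-only setters are out of reach. `Descriptor` and `DescriptorOwned` are hypothetical stand-ins, not this crate's real types.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical borrowed descriptor: its setters touch only scalar fields
/// and return `&mut Descriptor`.
pub struct Descriptor {
    name: String,
    num_tasks: u32,
}

impl Descriptor {
    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn num_tasks(&self) -> u32 {
        self.num_tasks
    }

    pub fn set_num_tasks(&mut self, n: u32) -> &mut Descriptor {
        self.num_tasks = n;
        self
    }
}

/// Hypothetical owned wrapper: setters that may allocate live here and
/// return `&mut DescriptorOwned`.
pub struct DescriptorOwned(Descriptor);

impl DescriptorOwned {
    pub fn new() -> Self {
        DescriptorOwned(Descriptor { name: String::new(), num_tasks: 0 })
    }

    pub fn set_name(&mut self, name: &str) -> &mut DescriptorOwned {
        self.0.name = name.to_owned();
        self
    }
}

// Deref/DerefMut let the owned type call the borrowed type's methods.
impl Deref for DescriptorOwned {
    type Target = Descriptor;
    fn deref(&self) -> &Descriptor {
        &self.0
    }
}

impl DerefMut for DescriptorOwned {
    fn deref_mut(&mut self) -> &mut Descriptor {
        &mut self.0
    }
}
```

With this layout, `d.set_name("helloworld").set_num_tasks(1)` compiles, but `d.set_num_tasks(1).set_name("helloworld")` does not, because `Descriptor` has no `set_name` method.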

A note on memory management

The Slurm C library uses a (primitive) custom memory allocator for its data structures. Because we must maintain compatibility with this allocator, we have to allocate all of our data structures from the heap rather than the stack. Almost all of the structures exposed here come in both “borrowed” and “owned” flavors; they are largely equivalent, but only the owned versions free their data when they go out of scope. Borrowed structures need not be immutable, but it is not possible to modify them in ways that require freeing or allocating memory associated with their sub-structures.
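The owned/borrowed split can be sketched in plain Rust. This is a minimal illustration, not the crate's actual code. `RawRecord`, `Record`, `RecordOwned`, `c_alloc`, and `c_free` are hypothetical, and `Box` stands in for Slurm's allocator. The borrowed `Record` can change a plain integer field but never frees anything. `RecordOwned` derefs to `Record` and releases the memory in its `Drop` impl. The `LIVE` counter exists only so the frees can be observed.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Count of live allocations made by the stand-in allocator below.
pub static LIVE: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical C-side struct.
#[repr(C)]
pub struct RawRecord {
    pub job_id: u32,
}

// Hypothetical stand-ins for the C library's allocate/free pair.
fn c_alloc(job_id: u32) -> *mut RawRecord {
    LIVE.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(RawRecord { job_id }))
}

/// Safety: `ptr` must come from `c_alloc` and must not be freed twice.
unsafe fn c_free(ptr: *mut RawRecord) {
    LIVE.fetch_sub(1, Ordering::SeqCst);
    unsafe { drop(Box::from_raw(ptr)) };
}

/// Borrowed flavor: a view onto C-allocated memory that never frees it.
pub struct Record(*mut RawRecord);

impl Record {
    pub fn job_id(&self) -> u32 {
        unsafe { (*self.0).job_id }
    }

    // Overwriting a plain integer needs no allocation, so borrowed views may do it.
    pub fn set_job_id(&mut self, id: u32) {
        unsafe { (*self.0).job_id = id }
    }
}

/// Owned flavor: the same data, freed through the C allocator on drop.
pub struct RecordOwned(Record);

impl RecordOwned {
    pub fn new(job_id: u32) -> Self {
        RecordOwned(Record(c_alloc(job_id)))
    }
}

impl Deref for RecordOwned {
    type Target = Record;
    fn deref(&self) -> &Record {
        &self.0
    }
}

impl DerefMut for RecordOwned {
    fn deref_mut(&mut self) -> &mut Record {
        &mut self.0
    }
}

impl Drop for RecordOwned {
    fn drop(&mut self) {
        // Only the owned flavor frees; a borrowed Record going out of scope does nothing.
        unsafe { c_free((self.0).0) }
    }
}
```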

Structs

DatabaseConnection

A connection to the Slurm accounting database.

DatabaseConnectionOwned

An owned version of DatabaseConnection.

JobDescriptor

A description of a batch job to submit.

JobDescriptorOwned

An owned version of JobDescriptor.

JobFilters

A set of filters for identifying jobs of interest when querying the Slurm accounting database.

JobFiltersOwned

An owned version of JobFilters.

JobInfo

Information about a running job.

JobRecord

Accounting information about a job.

JobStepFilter

A filter for selecting jobs and job steps.

JobStepFilterOwned

An owned version of JobStepFilter.

SingleJobInfoMessage

Information about a single job.

SingleJobInfoMessageOwned

An owned version of SingleJobInfoMessage.

SlurmList

A list of some kind of object known to Slurm.

SlurmListIteratorOwned

An iterator over the items in a SlurmList.

SlurmListOwned

An owned version of SlurmList.

SlurmStringListIteratorOwned

A helper for iterating through lists of strings.

StepRecord

Accounting information about a step within a job.

SubmitResponseMessage

Information returned by Slurm upon job submission.

SubmitResponseMessageOwned

An owned version of SubmitResponseMessage.

Enums

JobState

States that a job or job step can be in.

SlurmError

Specifically enumerated errors that we can get from the Slurm API.
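For readers mapping raw job-state values from the C API, the sketch below shows one way to decode them. It is hypothetical and is modeled on the base states of `enum job_states` in recent versions of Slurm's `slurm.h`. There, the low byte (`JOB_STATE_BASE`, `0xff`) holds the base state and higher bits carry flags. This crate's `JobState` variants may differ, and older Slurm releases lack some of these states.

```rust
/// Hypothetical job-state enum, in the numeric order of slurm.h's job_states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JobState {
    Pending,
    Running,
    Suspended,
    Complete,
    Cancelled,
    Failed,
    Timeout,
    NodeFail,
    Preempted,
    BootFail,
    Deadline,
    OutOfMemory,
}

/// Mask selecting the base state; the remaining bits are flags.
const JOB_STATE_BASE: u32 = 0xff;

impl JobState {
    /// Decode a raw state value, ignoring flag bits; unknown values give None.
    pub fn from_raw(raw: u32) -> Option<JobState> {
        use JobState::*;
        const ALL: [JobState; 12] = [
            Pending, Running, Suspended, Complete, Cancelled, Failed,
            Timeout, NodeFail, Preempted, BootFail, Deadline, OutOfMemory,
        ];
        ALL.get((raw & JOB_STATE_BASE) as usize).copied()
    }

    /// True once the job can no longer run (every state except the first three).
    pub fn is_finished(self) -> bool {
        !matches!(self, JobState::Pending | JobState::Running | JobState::Suspended)
    }
}
```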

Traits

JobStepRecordSharedFields

A trait for accessing fields common to SlurmDB job records and step records.

UnownedFromSlurmPointer

A helper trait that lets us generically iterate over lists. It must be public so that we can expose Iterator for SlurmListIteratorOwned.
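Generic iteration over Slurm lists can be sketched as follows. The names here (`RawJob`, `Job`, `UnownedFromPointer`, `ListIter`) are hypothetical stand-ins, and a slice of untyped pointers stands in for Slurm's C list. The trait turns each untyped pointer into a borrowed `&T`, which lets one iterator type serve every element type. Because the public `Iterator` impl is bounded on the trait, the trait is public here too, in line with the note above.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

/// Hypothetical C-side record.
#[repr(C)]
pub struct RawJob {
    pub job_id: u32,
}

/// Borrowed Rust view with exactly the same layout as RawJob.
#[repr(transparent)]
pub struct Job(RawJob);

impl Job {
    pub fn job_id(&self) -> u32 {
        self.0.job_id
    }
}

/// Hypothetical analogue of UnownedFromSlurmPointer.
pub trait UnownedFromPointer {
    /// Safety: `ptr` must point to a live C value of the matching type for `'a`.
    unsafe fn from_ptr<'a>(ptr: *mut c_void) -> &'a Self;
}

impl UnownedFromPointer for Job {
    unsafe fn from_ptr<'a>(ptr: *mut c_void) -> &'a Self {
        // Sound only because Job is repr(transparent) over RawJob.
        unsafe { &*(ptr as *const Job) }
    }
}

/// One iterator type for every element type, driven by the trait bound.
pub struct ListIter<'a, T> {
    items: std::slice::Iter<'a, *mut c_void>,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> ListIter<'a, T> {
    /// Safety: every pointer in `items` must point to a live value viewable as `T` for `'a`.
    pub unsafe fn new(items: &'a [*mut c_void]) -> Self {
        ListIter { items: items.iter(), _marker: PhantomData }
    }
}

impl<'a, T: UnownedFromPointer + 'a> Iterator for ListIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        // The safety contract of `new` covers this conversion.
        self.items.next().map(|&p| unsafe { T::from_ptr(p) })
    }
}
```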

Functions

get_job_info

Get information about a single job.

Type Definitions

JobId

A job identifier number; this will always be u32.

StepId

A job-step identifier number; this will always be u32.