Crate slurm
Interface to the Slurm workload manager.
Slurm is a system for scheduling and running jobs on large computing clusters. It is often used in scientific HPC (high-performance computing) contexts.
This crate provides hooks for submitting new jobs and interrogating their status. Support for other kinds of operations, such as canceling jobs or altering their runtime parameters, would be entirely appropriate but has not yet been implemented.
Example: querying a running job
    extern crate failure;
    extern crate slurm;

    fn print_random_job_information(jobid: slurm::JobId) -> Result<(), failure::Error> {
        let info = slurm::get_job_info(jobid)?;
        println!("Job ID: {}", info.job_id()); // same as what we put in
        println!("Job's partition: {}", info.partition());
        Ok(())
    }
Example: querying a completed job
To gather information about jobs that have completed, you must connect to the Slurm accounting database and query it.
    extern crate chrono;
    extern crate failure;
    extern crate slurm;

    fn print_other_job_information(jobid: slurm::JobId) -> Result<(), failure::Error> {
        let mut filter = slurm::JobFiltersOwned::default();
        filter.step_list_mut().append(slurm::JobStepFilterOwned::new(jobid));

        let db = slurm::DatabaseConnectionOwned::new()?;
        let jobs = db.get_jobs(&filter)?;
        let now = chrono::Utc::now();

        for job in jobs.iter() {
            println!("Job ID {}, name {}", job.job_id(), job.job_name());

            if let Some(d) = job.wait_duration() {
                println!("  job started; wait time: {} s", d.num_seconds());
            } else if let Some(t_el) = job.eligible_time() {
                let wait = now.signed_duration_since(t_el).num_seconds();
                println!("  job not yet started; time since eligibility: {} s", wait);
            } else {
                println!("  job not yet eligible to run");
            }
        }

        Ok(())
    }
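The three-way branch in the loop above can be isolated as a pure function, which makes it testable without a Slurm installation. The following is a minimal sketch under stated assumptions: `QueueStatus` and `queue_status` are hypothetical names that are not part of this crate, times are plain Unix timestamps in seconds rather than `chrono` values, and the wait time is assumed to be the start time minus the eligibility time.

```rust
/// Hypothetical summary of a job's queue state; not a type from the `slurm` crate.
#[derive(Debug, PartialEq)]
pub enum QueueStatus {
    /// The job has started; it waited this many seconds after becoming eligible.
    Started { waited_s: i64 },
    /// The job is eligible but has not started; seconds elapsed since eligibility.
    Waiting { since_eligible_s: i64 },
    /// The job is not yet eligible to run.
    NotEligible,
}

/// Classify a job from optional eligibility and start times (Unix seconds).
/// Assumption: a wait duration exists only when both times are known.
pub fn queue_status(eligible: Option<i64>, start: Option<i64>, now: i64) -> QueueStatus {
    match (eligible, start) {
        (Some(e), Some(s)) => QueueStatus::Started { waited_s: s - e },
        (Some(e), None) => QueueStatus::Waiting { since_eligible_s: now - e },
        (None, _) => QueueStatus::NotEligible,
    }
}
```

Keeping the classification separate from the printing means the same logic could feed a report or a metric instead of `println!`.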
Submitting a “Hello World” job
    extern crate failure;
    extern crate slurm;

    fn submit_hello_world() -> Result<slurm::JobId, failure::Error> {
        let cwd = std::env::current_dir()?;

        // Slurm replaces "%j" in the file name with the job ID.
        let log = {
            let mut p = cwd.clone();
            p.push("%j.log");
            p.to_str().ok_or(failure::err_msg("cannot stringify log path"))?.to_owned()
        };

        let mut desc = slurm::JobDescriptorOwned::new();

        desc.set_name("helloworld")
            .set_argv(&["helloworld"])
            .inherit_environment()
            .set_stderr_path(&log)
            .set_stdin_path("/dev/null")
            .set_stdout_path(&log)
            .set_work_dir_cwd()?
            // A trailing `\` in a Rust string literal swallows the line break,
            // so each script line needs an explicit `\n`.
            .set_script("#! /bin/bash\n\
                         set -e -x\n\
                         echo hello world \"$@\"\n")
            .set_gid_current() // JobDescriptor args must come after due to the return type
            .set_num_tasks(1)
            .set_time_limit(5)
            .set_uid_current();

        let msg = desc.submit_batch()?;
        println!("new job id: {}", msg.job_id());
        Ok(msg.job_id())
    }
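The job script is passed as a single string, so its line breaks have to be part of the string itself. In a Rust string literal, a backslash at the end of a source line skips the newline and the leading whitespace of the next line, so a script split across source lines needs an explicit `\n` wherever the shell should see a new line. A minimal sketch using only the standard library; `hello_script` and `collapsed_script` are hypothetical helpers for illustration, not part of this crate:

```rust
/// Hypothetical helper: the text of a three-line batch script.
pub fn hello_script() -> String {
    // Each `\n` is a real line break. The trailing `\` only continues the
    // literal on the next source line and drops that line's indentation.
    "#!/bin/bash\n\
     set -e -x\n\
     echo hello world \"$@\"\n"
        .to_owned()
}

/// The same style of literal without `\n`: everything lands on one line.
pub fn collapsed_script() -> String {
    "#!/bin/bash \
     set -e -x"
        .to_owned()
}
```

With the second form the shell commands become part of the interpreter line, which is rarely what is intended.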
A note on memory management
The Slurm C library uses a (primitive) custom memory allocator for its data structures. Because we must maintain compatibility with this allocator, we have to allocate all of our data structures from the heap rather than the stack. Almost all of the structures exposed here come in both “borrowed” and “owned” flavors; they are largely equivalent, but only the owned versions free their data when they go out of scope. Borrowed structures need not be immutable, but it is not possible to modify them in ways that require freeing or allocating memory associated with their sub-structures.
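The borrowed/owned split described above follows a common Rust pattern for wrapping C allocations. The sketch below illustrates the general idea only and is not taken from this crate's source: `RawRecord`, `Record`, and `RecordOwned` are hypothetical names, and `Box` stands in for the Slurm allocator.

```rust
use std::ops::{Deref, DerefMut};

/// Stand-in for a C struct that the library would allocate on the heap.
pub struct RawRecord {
    pub job_id: u32,
}

/// "Borrowed" flavor: wraps a pointer it does not own and never frees it.
pub struct Record(*mut RawRecord);

impl Record {
    pub fn job_id(&self) -> u32 {
        // Safety: in this sketch the pointer always comes from
        // `RecordOwned::new` and stays valid while the owner is alive.
        unsafe { (*self.0).job_id }
    }

    /// In-place mutation is fine: it needs no allocation or freeing.
    pub fn set_job_id(&mut self, id: u32) {
        unsafe { (*self.0).job_id = id }
    }
}

/// "Owned" flavor: same accessors, but releases the allocation on drop.
pub struct RecordOwned(Record);

impl RecordOwned {
    pub fn new(job_id: u32) -> Self {
        // `Box` is an assumption standing in for the C allocator.
        let ptr = Box::into_raw(Box::new(RawRecord { job_id }));
        RecordOwned(Record(ptr))
    }
}

impl Deref for RecordOwned {
    type Target = Record;
    fn deref(&self) -> &Record {
        &self.0
    }
}

impl DerefMut for RecordOwned {
    fn deref_mut(&mut self) -> &mut Record {
        &mut self.0
    }
}

impl Drop for RecordOwned {
    fn drop(&mut self) {
        // Reclaim the heap allocation exactly once.
        unsafe { drop(Box::from_raw((self.0).0)) }
    }
}
```

Because the owned type dereferences to the borrowed one, every accessor is written once on the borrowed type and is available on both.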
Structs
DatabaseConnection | A connection to the Slurm accounting database.
DatabaseConnectionOwned | An owned version of
JobDescriptor | A description of a batch job to submit.
JobDescriptorOwned | An owned version of
JobFilters | A set of filters for identifying jobs of interest when querying the Slurm accounting database.
JobFiltersOwned | An owned version of
JobInfo | Information about a running job.
JobRecord | Accounting information about a job.
JobStepFilter | A filter for selecting jobs and job steps.
JobStepFilterOwned | An owned version of
SingleJobInfoMessage | Information about a single job.
SingleJobInfoMessageOwned | An owned version of
SlurmList | A list of some kind of object known to Slurm.
SlurmListIteratorOwned |
SlurmListOwned | An owned version of
SlurmStringListIteratorOwned | A helper for iterating through lists of strings.
StepRecord | Accounting information about a step within a job.
SubmitResponseMessage | Information returned by Slurm upon job submission.
SubmitResponseMessageOwned | An owned version of
Enums
JobState | States that a job or job step can be in.
SlurmError | Specifically-enumerated errors that we can get from the Slurm API.
Traits
JobStepRecordSharedFields | A trait for accessing fields common to SlurmDB job records and step records.
UnownedFromSlurmPointer | A helper trait that lets us generically iterate over lists. It must be public so that we can expose
Functions
get_job_info | Get information about a single job.
Type Definitions
JobId | A job identifier number; this will always be
StepId | A job-step identifier number; this will always be