Upgrading the SLURM Plugin
This page covers SLURM-specific considerations when upgrading waldur-site-agent-slurm.
Read the general upgrade guide first.
Required backend_settings keys
The SLURM backend reads the following keys from backend_settings.
Required keys must be present or the agent will fail to start.
| Key | Required | Notes |
|---|---|---|
| default_account | Yes | Root SLURM account; must exist in the cluster |
| customer_prefix | Yes | Prefix for customer-level SLURM accounts |
| project_prefix | Yes | Prefix for project-level SLURM accounts |
| allocation_prefix | Yes | Prefix for allocation accounts |
| cluster_name | No | Must match the offering's backend_id in Waldur; required in multi-cluster setups |
| slurm_bin_path | No | Default: /usr/bin |
| parent_account | No | Set for flat hierarchies (no customer tier); omit for nested hierarchy |
| default_partition | No | Fallback SLURM partition |
| enforce_offering_partitions | No | Default: false |
| enable_user_homedir_account_creation | No | Default: true |
| default_homedir_umask | No | Default: 0077 |
Check the CHANGELOG for any new required keys before upgrading.
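Before restarting the agent on the new version, you may want to check your configuration against the table above. The sketch below is a hypothetical pre-flight helper, not the agent's own validation code: check_backend_settings, REQUIRED_KEYS and OPTIONAL_DEFAULTS are illustrative names, and the defaults simply mirror the Notes column. It assumes backend_settings has already been loaded into a Python dict (for example from the agent's YAML configuration).

```python
from typing import Any, Dict

# Keys marked "Yes" in the Required column; the agent fails to start without them.
REQUIRED_KEYS = ("default_account", "customer_prefix", "project_prefix", "allocation_prefix")

# Documented defaults for optional keys (keys without a listed default are left out).
# Value types here are an assumption; the umask is kept as a string.
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "slurm_bin_path": "/usr/bin",
    "enforce_offering_partitions": False,
    "enable_user_homedir_account_creation": True,
    "default_homedir_umask": "0077",
}


def check_backend_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-flight check: fail on missing required keys, fill documented defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("backend_settings is missing required keys: " + ", ".join(missing))
    # Values explicitly set in settings override the documented defaults.
    return {**OPTIONAL_DEFAULTS, **settings}


if __name__ == "__main__":
    example = {
        "default_account": "root",
        "customer_prefix": "c_",
        "project_prefix": "p_",
        "allocation_prefix": "a_",
    }
    effective = check_backend_settings(example)
    print(effective["slurm_bin_path"])  # /usr/bin
```

Running a check like this against the new version's CHANGELOG list of required keys catches a missing key before the agent refuses to start.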
QoS configuration
QoS state is driven by the paused and downscaled flags set by Waldur Mastermind
(via policy or manual action). The agent maps these flags to the SLURM QoS names
configured in backend_settings (qos_downscaled and qos_paused).
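The flag-to-QoS mapping can be pictured as a small pure function. This is a hypothetical sketch, not the agent's code. It assumes that paused takes precedence over downscaled when both flags are set, and it uses a hypothetical qos_default key for the case where neither flag is set; only qos_downscaled and qos_paused are keys named on this page.

```python
from typing import Mapping, Optional


def select_qos(paused: bool, downscaled: bool, settings: Mapping[str, str]) -> Optional[str]:
    """Return the SLURM QoS name to apply for the given Mastermind flags.

    Assumptions (not confirmed by the agent's source): a paused resource uses
    qos_paused even if it is also downscaled, and "qos_default" is a
    hypothetical key used when neither flag is set.
    """
    if paused:
        return settings["qos_paused"]
    if downscaled:
        return settings["qos_downscaled"]
    # None means this sketch has no QoS to apply when no default is configured.
    return settings.get("qos_default")


if __name__ == "__main__":
    example = {"qos_paused": "paused", "qos_downscaled": "limited", "qos_default": "normal"}
    print(select_qos(paused=False, downscaled=True, settings=example))  # limited
```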
Optional per-account QoS creation during resource provisioning is available via
qos_management.
Account hierarchy and sync_resource_project
When a project is moved to a different customer in Waldur, the SLURM account's parent
must be updated to reflect the new customer account. The agent handles this via
sync_resource_project, which is invoked as follows:
- Polling mode: on every order_processor/membership_sync cycle.
- Event-process (STOMP) mode: on incoming RESOURCE events, and periodically every reconciliation interval. Mastermind also pushes a RESOURCE event immediately when a project moves, so the hierarchy is corrected without waiting for the next cycle.
sync_resource_project is skipped when parent_account is set (flat hierarchy).
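The decision sync_resource_project has to make can be sketched as a small function. This is an illustration, not the agent's implementation: parent_update_target is a hypothetical helper, and it assumes the customer account is named customer_prefix followed by the customer's slug (the exact naming scheme is not specified on this page).

```python
from typing import Mapping, Optional


def parent_update_target(
    settings: Mapping[str, str], current_parent: str, customer_slug: str
) -> Optional[str]:
    """Return the parent a project account should be moved under, or None if no change is needed."""
    if settings.get("parent_account"):
        # Flat hierarchy: sync_resource_project is skipped entirely.
        return None
    # Assumed naming scheme: customer account = customer_prefix + customer slug.
    target = settings["customer_prefix"] + customer_slug
    return target if current_parent != target else None


if __name__ == "__main__":
    print(parent_update_target({"customer_prefix": "c_"}, "c_old_customer", "new_customer"))  # c_new_customer
```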
Validating after upgrade
Run diagnostics
Run waldur_site_diagnostics after upgrading. It calls SlurmBackend.diagnostics(), which
prints the configured prefixes, default_account, and SLURM version, and returns an error
if sinfo is unreachable. A healthy run prints all of these values and reports no error.
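If the diagnostics report that sinfo is unreachable, a quick independent check is to run sinfo --version from the configured slurm_bin_path. The sketch below is a hypothetical helper, not part of the agent, and it only covers the sinfo part of the diagnostics. Note that sinfo --version confirms the binary exists and runs; it does not contact slurmctld (a plain sinfo call does).

```python
import subprocess
from pathlib import Path
from typing import Optional


def sinfo_version(slurm_bin_path: str = "/usr/bin", timeout: float = 10.0) -> Optional[str]:
    """Return the output of `sinfo --version`, or None if sinfo cannot be run."""
    sinfo = Path(slurm_bin_path) / "sinfo"
    try:
        result = subprocess.run(
            [str(sinfo), "--version"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # a non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, permission error, timeout, or non-zero exit status.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = sinfo_version()
    print(version if version else "sinfo is not reachable from slurm_bin_path")
```

A None result usually points at a wrong slurm_bin_path or missing SLURM client tools on the agent host.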
Verify account hierarchy for recently moved projects (STOMP mode)
If any project was moved between customers while the agent was not running or was on an older version, the SLURM account parent may be stale. Trigger a sync by restarting the agent or waiting for one reconciliation interval.
To check a specific account's parent directly, query its association with sacctmgr
and read the ParentName field.
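This check can also be scripted. The sketch shells out to sacctmgr with --noheader and --parsable2 and reads ParentName from the account-level association (the row whose User field is empty). account_parent and parse_parent are hypothetical helper names, and the default sacctmgr path is an assumption to adjust for your installation.

```python
import subprocess
from typing import Optional


def parse_parent(output: str) -> Optional[str]:
    """Extract ParentName from `sacctmgr -nP ... format=Account,User,ParentName` output."""
    for line in output.splitlines():
        fields = line.split("|")
        # The account-level association has an empty User field;
        # user associations under the same account are skipped.
        if len(fields) == 3 and fields[1] == "":
            return fields[2] or None
    return None


def account_parent(account: str, sacctmgr: str = "/usr/bin/sacctmgr") -> Optional[str]:
    """Query SLURM for the parent account of `account` (None if no account association is found)."""
    result = subprocess.run(
        [sacctmgr, "--noheader", "--parsable2", "show", "association",
         "where", f"account={account}", "format=Account,User,ParentName"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_parent(result.stdout)
```

Call account_parent with the SLURM account of a recently moved project and compare the result with the account of its new customer. In a multi-cluster setup you may also want to add cluster=<cluster_name> to the where clause.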
Confirm QoS names exist in SLURM
If you use qos_downscaled or qos_paused, verify that the referenced QoS objects
exist in the cluster, for example by listing them with sacctmgr show qos.
A QoS name in backend_settings that does not exist in SLURM will cause the
downscale or pause action to fail with a BackendError.
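A scripted version of this check compares the QoS names configured in backend_settings with the names SLURM reports. existing_qos and missing_qos are hypothetical helpers, and the default sacctmgr path is an assumption.

```python
import subprocess
from typing import List, Mapping, Set


def existing_qos(sacctmgr: str = "/usr/bin/sacctmgr") -> Set[str]:
    """Return the names of all QoS objects defined in SLURM."""
    result = subprocess.run(
        [sacctmgr, "--noheader", "--parsable2", "show", "qos", "format=Name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def missing_qos(settings: Mapping[str, str], existing: Set[str]) -> List[str]:
    """List configured qos_downscaled / qos_paused names that SLURM does not know about."""
    wanted = [settings[key] for key in ("qos_downscaled", "qos_paused") if settings.get(key)]
    return [name for name in wanted if name not in existing]
```

Use it as missing_qos(backend_settings, existing_qos()); a non-empty list names the QoS objects to create before a downscale or pause is attempted.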
Filesystem quotas
If you use homedir_quota or project_directory (Lustre/CephFS/XFS quotas), no
migration steps are required — quota configuration is read fresh on each resource
creation or user add. After upgrading, run diagnostics and create a test resource to
confirm quota-setting commands succeed.
See SLURM Storage Quotas for full configuration reference.