Marketplace SLURM Partitions and Software Catalogs
This guide covers SLURM partition configuration and its integration with software catalogs in Waldur's marketplace.
Overview
SLURM partitions represent compute partitions in a cluster that can be associated with marketplace offerings. They define resource limits, scheduling policies, access controls, and optionally link to software catalogs for partition-specific software availability.
OfferingPartition records are exposed via the marketplace API for tools like Open OnDemand and are informational by default. The Waldur Site Agent can optionally enforce them as access restrictions on the SLURM cluster (sacctmgr add user … Partitions=…). This enables per-partition pricing, with one offering per partition and each offering carrying its own price components, while reusing the same underlying SLURM account hierarchy. Enforcement is opt-in: existing deployments that populated partitions only for documentation purposes continue to behave exactly as before.
SLURM partition assignment by the Site Agent
Enforcement is enabled per-cluster via the enforce_offering_partitions setting in the agent's backend_settings. The default is false, meaning partition records are not threaded into SLURM. When set to true, and when an offering has OfferingPartition records, the agent constructs an association command of the form sacctmgr add user … Partitions=… that includes the offering's partition list.
Behavior summary (when enforcement is enabled):
- The offering's partition list is read at user-association time. Partition names are sorted alphabetically and joined with commas into a single Partitions= argument.
- The agent does not reconcile partition associations after creation. Changes to an offering's partition list affect only newly-added users; users who already have associations keep their existing partition restrictions until they are explicitly removed and re-added.
- The agent does not emit a per-user DefaultPartition=. Real sacctmgr does not accept that attribute on add user (no parser in user_functions.c or sacctmgr_set_assoc_rec) and rejects the call with Unknown option. The default partition for unparameterized jobs comes from the cluster-wide Default=YES line in slurm.conf.
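The behavior summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the Site Agent's actual code: the function names, the --immediate flag, and the user and account arguments are assumptions made for the example. Only the sorted, comma-joined Partitions= argument and the absence of DefaultPartition= come from the text.

```python
def partitions_argument(partition_names):
    """Return the Partitions= argument for the given names, or None if empty."""
    names = sorted(partition_names)  # alphabetical order, as described above
    if not names:
        return None
    return "Partitions=" + ",".join(names)


def build_add_user_command(user, account, partition_names):
    """Build an argv list for an assumed 'sacctmgr add user' invocation."""
    # The user/account arguments are illustrative; no DefaultPartition= is emitted.
    command = ["sacctmgr", "--immediate", "add", "user", user, f"account={account}"]
    argument = partitions_argument(partition_names)
    if argument is not None:
        command.append(argument)
    return command


example_command = build_add_user_command("alice", "project1", ["gpu", "cpu"])
```

Returning an argv list rather than a shell string keeps partition names from being interpreted by a shell if the command is later passed to subprocess.run.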
Precedence
The agent resolves partitions in this order:
- Offering partitions: when enforce_offering_partitions is true and the offering has OfferingPartition records, those names become the Partitions= value.
- Global default_partition: when the offering has no partitions (or enforcement is disabled), the agent's default_partition setting (single partition string) is used as a fallback. This preserves the pre-existing single-partition behavior for sites that haven't migrated to per-offering partitions.
- Unrestricted: when neither is configured, no Partitions= flag is emitted. The user falls back to SLURM's cluster-wide default partition behavior.
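The precedence order can be expressed as a small resolver. The sketch below assumes backend_settings is available as a plain dictionary and that the offering's partition names arrive as a list. The function name and return convention (None for "unrestricted") are hypothetical.

```python
def resolve_partitions(backend_settings, offering_partitions):
    """Return the partition names to restrict a user to, or None if unrestricted."""
    enforce = backend_settings.get("enforce_offering_partitions", False)

    # 1. Offering partitions win when enforcement is on and records exist.
    if enforce and offering_partitions:
        return sorted(offering_partitions)

    # 2. Otherwise fall back to the legacy single-partition setting.
    default_partition = backend_settings.get("default_partition")
    if default_partition:
        return [default_partition]

    # 3. Nothing configured: no Partitions= flag should be emitted.
    return None
```

For example, with enforcement disabled and default_partition set to "batch", the resolver returns ["batch"] regardless of the offering's partition list.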
Site-agent configuration
Two relevant settings under backend_settings:
- enforce_offering_partitions switches on the partition-aware path. Leave unset (or set to false) to keep OfferingPartition records purely informational, the historical behavior.
- default_partition is the legacy single-partition fallback used when the offering has no partitions or when enforcement is disabled.
Scope and non-goals
- Partition restrictions are applied at user-level SLURM associations. SLURM's accounting model does not support partition restrictions at account scope.
- The agent does not modify existing user associations when an offering's partition list changes. To rebalance, an operator must remove and re-add the user, or terminate the resource and re-provision it on the desired offering.
- Other partition attributes (max_cpus_per_node, max_time, QoS, etc.) remain informational: they are exposed via the API but are not pushed into SLURM by the agent.
SLURM Partition Model
The OfferingPartition model maps closely to SLURM's partition_info_t struct and includes comprehensive configuration options for HPC environments.
Partition Parameters
Architecture
- cpu_arch: CPU architecture of the partition (e.g., x86_64/amd/zen3)
- gpu_arch: GPU architecture of the partition (e.g., nvidia/cc90, amd/gfx90a)
CPU Configuration
- cpu_bind: Default task binding policy (SLURM cpu_bind)
- def_cpu_per_gpu: Default CPUs allocated per GPU
- max_cpus_per_node: Maximum allocated CPUs per node
- max_cpus_per_socket: Maximum allocated CPUs per socket
Memory Configuration (in MB)
- def_mem_per_cpu: Default memory per CPU
- def_mem_per_gpu: Default memory per GPU
- def_mem_per_node: Default memory per node
- max_mem_per_cpu: Maximum memory per CPU
- max_mem_per_node: Maximum memory per node
Time Limits
- default_time: Default time limit in minutes
- max_time: Maximum time limit in minutes
- grace_time: Preemption grace time in seconds
Node Configuration
- max_nodes: Maximum nodes per job
- min_nodes: Minimum nodes per job
- exclusive_topo: Exclusive topology access required
- exclusive_user: Exclusive user access required
Scheduling Configuration
- priority_tier: Priority tier for scheduling and preemption
- qos: Quality of Service (QOS) name
- req_resv: Require reservation for job allocation
Partition Management API
Available Endpoints
Partition management is handled through offering actions, similar to software catalog management:
- add_partition: Add a new partition to an offering
- update_partition: Update partition configuration
- remove_partition: Remove a partition from an offering
Add Partition to Offering
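As a rough sketch, a request for the add_partition action could be built as follows. Several details here are assumptions rather than documented facts: the host name, the marketplace-provider-offerings endpoint path, the token header scheme, the POST method, and the partition_name key. The other payload fields are taken from the partition parameters listed earlier. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumed base URL; replace with your Waldur deployment's API root.
BASE_URL = "https://waldur.example.com/api"


def build_add_partition_request(offering_uuid, token, partition):
    """Build (but do not send) a request for the add_partition offering action."""
    # Endpoint path is an assumption; check your deployment's API schema.
    url = f"{BASE_URL}/marketplace-provider-offerings/{offering_uuid}/add_partition/"
    return urllib.request.Request(
        url,
        data=json.dumps(partition).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


gpu_partition = {
    "partition_name": "gpu",  # assumed key name
    "gpu_arch": "nvidia/cc90",
    "def_cpu_per_gpu": 8,
    "max_time": 2880,  # minutes
    "qos": "gpu",
}
add_request = build_add_partition_request("OFFERING_UUID", "API_TOKEN", gpu_partition)
# Sending is left to the caller, e.g. urllib.request.urlopen(add_request).
```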
Update Partition Configuration
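An update typically carries only the fields that change. The sketch below builds such a payload and rejects field names that are not among the partition parameters documented above. The partition_uuid key and the helper name are assumptions for illustration; consult the API schema for the actual request shape.

```python
import json

# Field names documented under "Partition Parameters".
PARTITION_FIELDS = {
    "cpu_arch", "gpu_arch",
    "cpu_bind", "def_cpu_per_gpu", "max_cpus_per_node", "max_cpus_per_socket",
    "def_mem_per_cpu", "def_mem_per_gpu", "def_mem_per_node",
    "max_mem_per_cpu", "max_mem_per_node",
    "default_time", "max_time", "grace_time",
    "max_nodes", "min_nodes", "exclusive_topo", "exclusive_user",
    "priority_tier", "qos", "req_resv",
}


def partition_update_payload(partition_uuid, **changes):
    """Return a JSON body for update_partition containing only changed fields."""
    unknown = set(changes) - PARTITION_FIELDS
    if unknown:
        raise ValueError(f"Unknown partition fields: {sorted(unknown)}")
    # "partition_uuid" is an assumed key identifying the partition to update.
    return json.dumps({"partition_uuid": partition_uuid, **changes})


update_body = partition_update_payload("PARTITION_UUID", max_time=4320, qos="gpu-long")
```

Validating field names on the client side catches typos such as max_tim before the request reaches the server.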
Remove Partition from Offering
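Removal only needs to identify the partition. A minimal sketch, assuming the same hypothetical endpoint layout and token scheme as for the other offering actions and an assumed partition_uuid key:

```python
import json
import urllib.request


def build_remove_partition_request(api_url, offering_uuid, partition_uuid, token):
    """Build (but do not send) a request for the remove_partition offering action."""
    # Endpoint path, method and payload key are assumptions, not documented facts.
    url = f"{api_url}/marketplace-provider-offerings/{offering_uuid}/remove_partition/"
    body = json.dumps({"partition_uuid": partition_uuid}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="POST",
    )


remove_request = build_remove_partition_request(
    "https://waldur.example.com/api", "OFFERING_UUID", "PARTITION_UUID", "API_TOKEN"
)
```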
Partition Software Catalog Associations
Software catalogs can be optionally associated with specific partitions through the partition field in OfferingSoftwareCatalog. This enables partition-specific software availability, allowing different partitions to expose different software sets.
Associating Software Catalogs with Partitions
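Because the partition field is optional, an association payload can either name a partition or leave it out for offering-wide availability. The sketch below only builds the payload. The catalog key and the helper name are hypothetical, and the offering action that accepts this payload is not specified here.

```python
def catalog_association_payload(catalog_uuid, partition_uuid=None):
    """Return a payload linking a software catalog to an offering or a partition."""
    payload = {"catalog": catalog_uuid}  # assumed key name
    if partition_uuid is not None:
        # The partition field scopes the catalog to one partition.
        payload["partition"] = partition_uuid
    return payload


offering_wide = catalog_association_payload("CATALOG_UUID")
gpu_only = catalog_association_payload("CATALOG_UUID", partition_uuid="GPU_PARTITION_UUID")
```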
Use Cases for Partition-Specific Software
- Architecture-Specific Partitions: GPU partitions with CUDA libraries, ARM partitions with ARM-optimized software
- License Management: Commercial software available only on specific partitions
- Performance Optimization: Different optimized builds for different hardware configurations
- Access Control: Research groups with access to specialized software on designated partitions
Example Workflow
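Setting up a GPU partition with specialized software takes two steps: add the partition to the offering with add_partition, setting gpu_arch and the relevant limits, then associate a software catalog with that partition through the partition field in OfferingSoftwareCatalog.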
Partition Architecture Filtering
Partitions can be filtered by their CPU and GPU architecture fields, enabling users to find partitions matching specific hardware requirements.
Available Filters
| Filter | Type | Description |
|---|---|---|
| cpu_arch | string (icontains) | Filter by CPU architecture substring (e.g., zen3, x86_64) |
| gpu_arch | string (icontains) | Filter by GPU architecture substring (e.g., nvidia, cc90) |
| has_gpu | boolean | Filter partitions with (true) or without (false) GPU architecture |
Examples
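The filters in the table map directly onto query-string parameters. The helper below is a sketch that builds only the query string. The function name is hypothetical, and the listing endpoint to append it to depends on your deployment and is not assumed here.

```python
from urllib.parse import urlencode


def partition_filter_query(cpu_arch=None, gpu_arch=None, has_gpu=None):
    """Build a query string from the documented partition filters."""
    params = {}
    if cpu_arch is not None:
        params["cpu_arch"] = cpu_arch  # case-insensitive substring match
    if gpu_arch is not None:
        params["gpu_arch"] = gpu_arch  # case-insensitive substring match
    if has_gpu is not None:
        params["has_gpu"] = "true" if has_gpu else "false"
    return urlencode(params)


zen3_with_gpu = partition_filter_query(cpu_arch="zen3", has_gpu=True)
nvidia_only = partition_filter_query(gpu_arch="nvidia")
cpu_only = partition_filter_query(has_gpu=False)
```

Note that urlencode percent-encodes the slash in values such as nvidia/cc90, which is why the examples use plain substrings.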
Connecting Software to Partitions
The gpu_arch field on partitions and the gpu_architectures field on software targets enable matching software to compatible hardware. For example, to find which partitions can run software requiring nvidia/cc90, filter partitions on gpu_arch with that value.
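The same matching can be emulated locally once partitions have been fetched. This sketch assumes partitions are dictionaries with gpu_arch and a hypothetical partition_name key, and that a partition is compatible when its gpu_arch contains one of the software target's gpu_architectures values, ignoring case (mirroring the icontains filter). The server-side matching rules may differ.

```python
def compatible_partitions(partitions, software_gpu_architectures):
    """Return names of partitions whose gpu_arch matches any required architecture."""
    required = [arch.lower() for arch in software_gpu_architectures]
    matches = []
    for partition in partitions:
        gpu_arch = (partition.get("gpu_arch") or "").lower()
        # Partitions without a GPU architecture never match.
        if gpu_arch and any(arch in gpu_arch for arch in required):
            matches.append(partition["partition_name"])
    return matches


partitions = [
    {"partition_name": "gpu-a", "gpu_arch": "nvidia/cc90"},
    {"partition_name": "gpu-b", "gpu_arch": "amd/gfx90a"},
    {"partition_name": "cpu", "gpu_arch": ""},
]
runnable_on = compatible_partitions(partitions, ["nvidia/cc90"])
```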
Integration Considerations
SLURM Configuration Mapping
When configuring OfferingPartition models, ensure the parameters align with your actual SLURM cluster configuration:
- Resource Limits: Set realistic limits that match hardware capabilities
- QOS Integration: Ensure QOS names match those defined in SLURM
- Time Limits: Align with cluster policies and user expectations
- Architecture Targeting: Match CPU families/microarchitectures with actual hardware
Software Catalog Strategy
Consider these approaches when associating software catalogs with partitions:
- Global Catalog: Single catalog available across all partitions
- Partition-Specific: Different catalogs for different partition types
- Hybrid Approach: Base catalog globally + specialized catalogs per partition
Permissions
Partition Management (Offering Managers)
- OfferingPartition: Offering managers can create/modify SLURM partition configurations through offering actions
- Requires UPDATE_OFFERING permission on the offering
Software Catalog Association (Offering Managers)
- OfferingSoftwareCatalog: Offering managers can associate catalogs with partitions through offering actions
- Must have UPDATE_OFFERING permission on the offering
Related Documentation
- Marketplace Software Catalogs - Main software catalog documentation