Plugin Development Guide
This guide covers everything needed to build a custom backend plugin for Waldur Site Agent. It is written for both human developers and LLM-based code generators.
Waldur Mastermind concepts
Before implementing a plugin, understand how Waldur Mastermind concepts map to plugin operations.
| Waldur concept | Description | Plugin relevance |
|---|---|---|
| Offering | Service catalog entry | Config block per offering; picks backend plugin |
| Resource | Allocation from an offering | CRUD via BaseBackend; keyed by backend_id |
| Order | Create/update/terminate request | Triggers order_process mode |
| Component | Measurable dimension (CPU, RAM) | Defined in backend_components config |
| OfferingUser | User linked to an offering | Username backend generates usernames |
| billing_type | usage or limit | Metered vs quota accounting |
| backend_id | Resource ID on the backend | Generated by _get_resource_backend_id |
Architecture overview
A resource-management plugin consists of two main classes:
- Backend (inherits BaseBackend): Orchestrates high-level operations (create resource, collect usage, manage users).
- Client (inherits BaseClient): Handles low-level communication with the external system (CLI commands, API calls).
A separate plugin family covers username management
(AbstractUsernameManagementBackend) — see the dedicated section below.
A single distribution may register both, but the entry point groups are
distinct.
graph TB
WM[Waldur Mastermind<br/>REST API] <-->|Orders, Resources,<br/>Usage, Keys| SA[Site Agent Core<br/>Processor]
SA -->|"user_context<br/>(ssh_keys, plan_quotas)"| BE[YourBackend<br/>BaseBackend]
BE --> CL[YourClient<br/>BaseClient]
CL --> EXT[External System<br/>CLI / API]
BE -.->|backend_metadata| SA
classDef waldur fill:#1E3A8A,stroke:#3B82F6,stroke-width:2px,color:#FFFFFF
classDef core fill:#065F46,stroke:#10B981,stroke-width:2px,color:#FFFFFF
classDef plugin fill:#581C87,stroke:#8B5CF6,stroke-width:2px,color:#FFFFFF
classDef external fill:#92400E,stroke:#F59E0B,stroke-width:2px,color:#FFFFFF
class WM waldur
class SA core
class BE,CL plugin
class EXT external
BaseBackend method reference
Abstract methods (must implement)
ping(raise_exception: bool = False) -> bool
- Mode: All (health check)
- Purpose: Verify backend connectivity.
- No-op: Return False.
diagnostics() -> bool
- Mode: Diagnostics CLI
- Purpose: Log diagnostic info and return health status.
- No-op: Log a message, return True.
list_components() -> list[str]
- Mode: Diagnostics
- Purpose: Return component types available on the backend.
- No-op: Return [].
_get_usage_report(resource_backend_ids: list[str]) -> dict
- Mode: report, membership_sync
- Purpose: Collect usage data for resources.
- Return format: a dict with one entry per resource; see "Usage report format specification" below.
- Key rules:
  - Component keys must match backend_components config keys.
  - Values must be in Waldur units (after unit_factor conversion).
  - TOTAL_ACCOUNT_USAGE is required and must equal the sum of per-user values.
- No-op: Return {}.
_collect_resource_limits(waldur_resource) -> tuple[dict, dict]
- Mode: order_process (resource creation)
- Purpose: Convert Waldur limits to backend limits and back.
- Returns: (backend_limits, waldur_limits) where backend_limits has values multiplied by unit_factor.
- No-op: Return ({}, {}).
_pre_create_resource(waldur_resource, user_context=None) -> None
- Mode: order_process (resource creation)
- Purpose: Set up prerequisites before resource creation (e.g., parent accounts).
- user_context contains pre-resolved data: ssh_keys (UUID → public key), plan_quotas (component → value), team, offering_users.
- No-op: Use pass.
downscale_resource(resource_backend_id: str) -> bool
- Mode: membership_sync
- Purpose: Restrict resource capabilities (e.g., set restrictive QoS).
- No-op: Return True.
pause_resource(resource_backend_id: str) -> bool
- Mode: membership_sync
- Purpose: Prevent all usage of the resource.
- No-op: Return True.
restore_resource(resource_backend_id: str) -> bool
- Mode: membership_sync
- Purpose: Restore resource to normal operation.
- No-op: Return True.
get_resource_metadata(resource_backend_id: str) -> dict
- Mode: membership_sync
- Purpose: Return backend-specific metadata for Waldur.
- No-op: Return {}.
Hook methods (override as needed)
These have default implementations in BaseBackend. Override only when your
backend needs custom behavior.
| Method | Default | When to override |
|---|---|---|
| post_create_resource | No-op | Post-creation setup; set resource.backend_metadata to push data to Waldur |
| _pre_delete_resource | No-op | Pre-deletion cleanup (cancel jobs) |
| post_delete_resource | No-op | Post-deletion cleanup (e.g., remove child offerings linked to the resource) |
| _pre_delete_user_actions | No-op | Per-user cleanup before removal |
| process_existing_users | No-op | Process existing users (homedirs) |
| check_pending_order | Returns True | Non-blocking order creation (see below) |
| evaluate_pending_order | Returns ACCEPT | Custom approval logic for pending orders (see below) |
| setup_target_event_subscriptions | Returns [] | STOMP subscriptions to target systems |
| get_usage_report_for_period | Returns {} | Historical usage queries for past billing periods |
| has_prepaid_components | Returns False | Enable duration-aware limit calculation for prepaid billing |
| sync_resource_end_date | No-op | Synchronise end_date between source and target Waldur instances |
| sync_resource_effective_id | No-op | Reflect downstream backend_id as effective_id on the source resource |
| sync_resource_project | No-op | Push project metadata to backends that manage their own projects |
| update_user_attributes | No-op | Forward OFFERING_USER attribute updates to the backend |
| sync_offering_user_usernames | Returns False | Pull backend-assigned usernames into Waldur (federation) |
| create_user_homedirs | Provided | Override only to customise homedir quota or path logic |
Non-blocking order creation (optional)
Backends that create resources via remote APIs can use non-blocking order
creation. Instead of blocking until the remote operation completes, the backend
returns immediately with a pending_order_id in BackendResourceInfo.
To opt in, set the supports_async_orders class attribute to True. The
processor only inspects order.backend_id for async tracking when this flag
is enabled, which prevents conflicts with external systems (e.g. SharePoint)
that may set order.backend_id for unrelated purposes.
When enabled, the core processor:
- Sets the source order's backend_id to the pending_order_id
- Keeps the order in EXECUTING state
- On subsequent polling cycles, calls check_pending_order(backend_id) to check completion
- When check_pending_order() returns True, marks the source order as DONE
Backends that opt in will typically also override handled_resource_states
to include ResourceState.CREATING so that user/limit sync runs while the
remote order is still in flight.
check_pending_order(order_backend_id: str) -> bool
- Default: Returns True (no async orders, always "complete")
- Override when: Your backend uses non-blocking resource creation
- Returns: True if the remote order completed, False if still pending
- Raises: BackendError if the remote order failed or was cancelled
Example (Waldur federation plugin):
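The following is a minimal sketch of the check_pending_order contract described above, not the federation plugin's actual code. BackendError is a local stand-in for the exception in waldur_site_agent.backend.exceptions, and RemoteOrderState and the in-memory state table are hypothetical substitutes for a call to the remote system.

```python
from enum import Enum


class BackendError(Exception):
    """Local stand-in for waldur_site_agent.backend.exceptions.BackendError."""


class RemoteOrderState(Enum):
    """Hypothetical order states reported by the remote system."""

    EXECUTING = "executing"
    DONE = "done"
    ERRED = "erred"
    CANCELED = "canceled"


class AsyncBackendSketch:
    """Illustrative backend; a real plugin would inherit BaseBackend."""

    supports_async_orders = True  # opt in to pending_order_id tracking

    def __init__(self, remote_states: dict) -> None:
        # Hypothetical lookup table standing in for a remote API query.
        self._remote_states = remote_states

    def check_pending_order(self, order_backend_id: str) -> bool:
        state = self._remote_states[order_backend_id]
        if state is RemoteOrderState.DONE:
            return True  # the processor can mark the source order as DONE
        if state in (RemoteOrderState.ERRED, RemoteOrderState.CANCELED):
            raise BackendError(
                f"Remote order {order_backend_id} ended in state {state.value}"
            )
        return False  # still pending; checked again on the next polling cycle
```

Raising on failure rather than returning False keeps a failed remote order from being polled indefinitely.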
setup_target_event_subscriptions(source_offering, user_agent, global_proxy) -> list
- Default: Returns [] (no target subscriptions)
- Override when: Your backend supports STOMP events from a target system
- Returns: List of StompConsumer tuples for lifecycle management
- Called by: event_process mode during STOMP setup
Pending order evaluation (optional)
When an order arrives in PENDING_PROVIDER state, the agent calls
evaluate_pending_order on the backend before taking any action. The
default implementation returns ACCEPT, which preserves the existing
auto-approve behaviour. Override this method to implement custom
approval logic.
evaluate_pending_order(order, waldur_rest_client) -> PendingOrderDecision
- Default: Returns PendingOrderDecision.ACCEPT
- Override when: You need to inspect or gate orders before approval
- Parameters:
  - order (OrderDetails): full order data including project_uuid, customer_uuid, created_by_*, attributes, and consumer_message/provider_message fields.
  - waldur_rest_client (AuthenticatedClient): authenticated client for fetching additional data from the Waldur API (e.g., project members and roles).
- Returns one of:
  - PendingOrderDecision.ACCEPT: approve the order
  - PendingOrderDecision.REJECT: reject the order
  - PendingOrderDecision.PENDING: keep waiting; the order will be re-evaluated on the next polling cycle
Note: This is the only hook that receives waldur_rest_client. Other backend methods receive Waldur data via user_context instead.
Use cases
| Scenario | Approach |
|---|---|
| Wait for a PI | Query project members, return PENDING until a PI role exists |
| Reject unprocessable orders | Inspect order.attributes, return REJECT |
| Require a signed agreement | Set provider_message, return PENDING until consumer_message is set |
Example: wait for a PI before approving
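One way to express this gate is sketched below. PendingOrderDecision and OrderSketch are local stand-ins for the real classes. The fetch_project_roles callable is a hypothetical helper that wraps whatever Waldur API query returns the role names for a project. The role name "PI" is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class PendingOrderDecision(Enum):
    """Stand-in for the enum in waldur_site_agent/backend/backends.py."""

    ACCEPT = "accept"
    REJECT = "reject"
    PENDING = "pending"


@dataclass
class OrderSketch:
    """Hypothetical stand-in for OrderDetails with only the fields used here."""

    project_uuid: str
    attributes: dict = field(default_factory=dict)


class PiGateBackendSketch:
    """Illustrative backend; a real plugin would inherit BaseBackend."""

    def __init__(self, fetch_project_roles: Callable) -> None:
        # Hypothetical helper: (client, project_uuid) -> list of role names.
        self._fetch_project_roles = fetch_project_roles

    def evaluate_pending_order(self, order, waldur_rest_client):
        roles = self._fetch_project_roles(waldur_rest_client, order.project_uuid)
        if "PI" in roles:  # assumed role name
            return PendingOrderDecision.ACCEPT
        # No PI yet: keep the order pending until the next polling cycle.
        return PendingOrderDecision.PENDING
```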
Example: reject orders that lack a required attribute
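A minimal sketch of such a check follows, assuming order.attributes behaves like a dict. PendingOrderDecision and OrderSketch are local stand-ins, and the attribute name project_code is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class PendingOrderDecision(Enum):
    """Stand-in for the enum in waldur_site_agent/backend/backends.py."""

    ACCEPT = "accept"
    REJECT = "reject"
    PENDING = "pending"


@dataclass
class OrderSketch:
    """Hypothetical stand-in for OrderDetails with only the fields used here."""

    project_uuid: str
    attributes: dict = field(default_factory=dict)


REQUIRED_ATTRIBUTE = "project_code"  # hypothetical attribute name


def evaluate_pending_order(order, waldur_rest_client=None):
    """Reject orders whose attributes lack a non-empty required value."""
    attributes = getattr(order, "attributes", None) or {}
    if not attributes.get(REQUIRED_ATTRIBUTE):
        return PendingOrderDecision.REJECT
    return PendingOrderDecision.ACCEPT
```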
Username management backends
Username management is handled by a separate plugin family inheriting
from AbstractUsernameManagementBackend (defined in
waldur_site_agent/backend/backends.py). These backends generate or look
up local-IDP usernames for OfferingUser records and are wired in the
config via username_management_backend.
Required methods
| Method | Purpose |
|---|---|
| generate_username(offering_user) -> str | Create a new local username. Return "" if generation is not supported. |
| get_username(offering_user) -> Optional[str] | Look up the existing local username for the user, or None. |
get_or_create_username is provided by the base class and calls
get_username first, then generate_username only if no username is
found.
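The lookup-then-generate order can be sketched as follows. UsernameBackendSketch is a local stand-in that mirrors the contract described above, not the real base class. PrefixUsernameBackend, its in-memory registry, and the dict-shaped offering_user are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class UsernameBackendSketch(ABC):
    """Stand-in mirroring the described AbstractUsernameManagementBackend."""

    @abstractmethod
    def generate_username(self, offering_user) -> str:
        """Create a new local username, or return "" if unsupported."""

    @abstractmethod
    def get_username(self, offering_user) -> Optional[str]:
        """Return the existing local username, or None."""

    def get_or_create_username(self, offering_user) -> str:
        # Look up first; generate only when no username is found.
        username = self.get_username(offering_user)
        if username:
            return username
        return self.generate_username(offering_user)


class PrefixUsernameBackend(UsernameBackendSketch):
    """Hypothetical backend that keeps usernames in an in-memory registry."""

    def __init__(self) -> None:
        self.known = {}

    def get_username(self, offering_user) -> Optional[str]:
        return self.known.get(offering_user["uuid"])

    def generate_username(self, offering_user) -> str:
        username = f"u_{offering_user['uuid'][:8]}"
        self.known[offering_user["uuid"]] = username
        return username
```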
Optional hook methods
| Method | Default | When to override |
|---|---|---|
| sync_user_profiles(offering_users) | No-op | Push user profiles to the IDP before membership sync runs |
| deactivate_users(usernames) | No-op | Remove departed users from the external system |
Error signalling for user-action gates
When username generation requires the user to take an action (e.g. link
an existing IdP account, complete a validation form), raise one of these
exceptions from waldur_site_agent.backend.exceptions:
- OfferingUserAccountLinkingRequiredError(comment, comment_url=None): user must link an existing account.
- OfferingUserAdditionalValidationRequiredError(comment, comment_url=None): additional validation is required.
The processor moves the offering user into a PENDING_ACCOUNT_LINKING /
PENDING_ADDITIONAL_VALIDATION state and surfaces the comment (and URL)
to the operator. See docs/offering-users.md for the full state machine.
Entry point group
Register username management plugins under a different entry point group than resource backends: "waldur_site_agent.username_management_backends" rather than "waldur_site_agent.backends".
The fallback entry point name is base, provided by the
waldur-site-agent-basic-username-management plugin.
BaseClient method reference
All methods below are abstract and must be implemented.
| Method | Signature | Purpose |
|---|---|---|
| list_resources | () -> list[ClientResource] | List all resources on backend |
| get_resource | (resource_id) -> ClientResource or None | Get single resource or None |
| create_resource | (name, description, organization, parent_name=None) -> str | Create resource |
| delete_resource | (name) -> str | Delete resource |
| set_resource_limits | (resource_id, limits_dict) -> str or None | Set limits (backend units) |
| get_resource_limits | (resource_id) -> dict[str, int] | Get limits (backend units) |
| get_resource_user_limits | (resource_id) -> dict[str, dict[str, int]] | Per-user limits |
| set_resource_user_limits | (resource_id, username, limits_dict) -> str | Set per-user limits |
| get_association | (user, resource_id) -> Association or None | Check user-resource link |
| create_association | (username, resource_id, default_account=None) -> str | Create user-resource link |
| delete_association | (username, resource_id) -> str | Remove user-resource link |
| get_usage_report | (resource_ids, timezone=None) -> list | Raw usage data from backend |
| list_resource_users | (resource_id) -> list[str] | List usernames for resource |
Important: BaseClient also provides:
- execute_command(command, silent=False) for running CLI commands with error handling; use it for CLI-based backends.
- create_linux_user_homedir(username, umask="") which shells out to /sbin/mkhomedir_helper. Override only if your backend creates home directories some other way; otherwise it works as-is for SLURM-style Linux deployments.
Agent mode method matrix
This table shows which BaseBackend methods are called by each agent mode.
| Method | order_process | report | membership_sync | event_process |
|---|---|---|---|---|
| ping | startup | startup | startup | startup |
| create_resource / create_resource_with_id | CREATE order | - | - | CREATE event |
| _pre_create_resource | CREATE order | - | - | CREATE event |
| post_create_resource | CREATE order | - | - | CREATE event |
| _collect_resource_limits | CREATE order | - | - | CREATE event |
| check_pending_order | CREATE order (async) | - | - | CREATE event (async) |
| evaluate_pending_order | pending-provider orders | - | - | - |
| set_resource_limits | UPDATE order | - | - | UPDATE event |
| delete_resource | TERMINATE order | - | - | TERMINATE event |
| _pre_delete_resource | TERMINATE order | - | - | TERMINATE event |
| pull_resource / pull_resources | CREATE order | usage pull | sync cycle | various events |
| _get_usage_report | - | usage pull | sync cycle | - |
| add_users_to_resource | post-create | - | user sync | role events |
| remove_users_from_resource | - | - | user sync | role events |
| add_user / remove_user | - | - | role changes | role events |
| downscale_resource | - | - | status sync | - |
| pause_resource | - | - | status sync | - |
| restore_resource | - | - | status sync | - |
| get_resource_metadata | - | - | status sync | - |
| setup_target_event_subscriptions | - | - | - | STOMP setup |
| list_resources | - | import | - | import event |
| get_resource_limits | - | import | - | import event |
| get_resource_user_limits | - | - | limits sync | - |
| set_resource_user_limits | - | - | limits sync | - |
| process_existing_users | - | - | user sync | - |
Usage report format specification
The _get_usage_report method must return data in this exact structure:
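As a minimal sketch of that structure, inferred from the rules in this section, the helper below wraps per-user usage into a report. It assumes the outer keys are resource backend IDs and that each resource entry holds TOTAL_ACCOUNT_USAGE plus one entry per username. The function name and the sample IDs in the usage note are hypothetical.

```python
def build_usage_report(per_user_usage: dict, component_keys: list) -> dict:
    """Wrap per-user usage (already in Waldur units) into the report layout.

    per_user_usage maps resource backend ID -> username -> component -> value.
    """
    report = {}
    for resource_id, users in per_user_usage.items():
        # Start from zero so a resource without usage still reports every component.
        totals = {key: 0 for key in component_keys}
        for usage in users.values():
            for key in component_keys:
                totals[key] += usage.get(key, 0)
        report[resource_id] = {"TOTAL_ACCOUNT_USAGE": totals, **users}
    return report
```

Calling it with `{"alloc_1": {"user1": {"cpu": 2, "mem": 1}, "user2": {"cpu": 3, "mem": 4}}}` and `["cpu", "mem"]` yields a TOTAL_ACCOUNT_USAGE of `{"cpu": 5, "mem": 5}` for alloc_1, alongside the two per-user entries.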
Rules
- Component keys must exactly match those in the backend_components YAML config.
- Values must be integers in Waldur units (i.e., divide raw backend values by unit_factor).
- TOTAL_ACCOUNT_USAGE is a required key and must equal the sum of all per-user values for each component.
- If a resource has no usage, return {"TOTAL_ACCOUNT_USAGE": {"cpu": 0, "mem": 0, ...}}.
- If usage reporting is not supported, return {} (empty dict).
Example: SLURM CPU and memory
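Given a config in which the cpu component has unit_factor 60000 and the mem component has unit_factor 61440, suppose SLURM reports 120000 cpu-minutes and 122880 MB-minutes for user1. The report then contains cpu = 2 and mem = 2 for user1, and the same values under TOTAL_ACCOUNT_USAGE if user1 is the only user.
Calculation: 120000 / 60000 = 2, 122880 / 61440 = 2.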
Capability flags and class attributes
BaseBackend exposes three class-level capability flags that change how
the core processor treats your backend.
supports_decreasing_usage: bool = False
Set to True if usage values can decrease between reports (e.g., a
storage backend reporting current disk usage rather than accumulated
compute time).
When False (default), the reporting processor skips updates where the
new usage value is lower than the previously reported value, treating it
as a data anomaly.
supports_cycle_preflight: bool = False
Set to True for backends that call a remote API during order processing.
The order processor runs run_preflight() once per offering per cycle
before listing orders. The default implementation calls ping() and raises
BackendNotReadyError on failure so orders stay pending instead of ERRED.
Override run_preflight() to probe specific endpoints. Plugins such as Waldur federation opt in, and other HTTP backends can enable it when needed.
supports_async_orders: bool = False
Set to True for backends that complete order creation asynchronously
on a remote system and report progress via pending_order_id. See the
"Non-blocking order creation" section above for the full flow.
handled_resource_states: list = [ResourceState.OK, ResourceState.ERRED]
Controls which resource states the membership processor fetches and
processes. Override when your backend needs to manage users on resources
that are still being provisioned (e.g., async backends that include
CREATING).
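The class attributes in this section can be set as in the sketch below. BaseBackendSketch and ResourceState are local stand-ins that carry only the defaults listed here. The enum values and the two subclass names are hypothetical.

```python
from enum import Enum


class ResourceState(Enum):
    """Stand-in for the agent's ResourceState enum; values are placeholders."""

    CREATING = "creating"
    OK = "ok"
    ERRED = "erred"


class BaseBackendSketch:
    """Stand-in carrying the defaults listed in this section."""

    supports_decreasing_usage: bool = False
    supports_cycle_preflight: bool = False
    supports_async_orders: bool = False
    handled_resource_states: list = [ResourceState.OK, ResourceState.ERRED]


class StorageBackendSketch(BaseBackendSketch):
    # Current disk usage can go down between reports.
    supports_decreasing_usage = True


class RemoteApiBackendSketch(BaseBackendSketch):
    # Probe the remote API once per cycle and track remote orders asynchronously.
    supports_cycle_preflight = True
    supports_async_orders = True
    # Also sync users and limits while the remote order is still in flight.
    handled_resource_states = [
        ResourceState.CREATING,
        ResourceState.OK,
        ResourceState.ERRED,
    ]
```

Overriding at class level leaves the base defaults untouched for other backends.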
Decision matrix for no-op implementations
If your backend does not support a certain operation, use these return values:
| Method | No-op return | Meaning |
|---|---|---|
| ping | False | Backend has no health check |
| diagnostics | True | Diagnostics not implemented but OK |
| list_components | [] | No component discovery |
| _get_usage_report | {} | No usage reporting |
| _collect_resource_limits | ({}, {}) | No limits support |
| _pre_create_resource | pass | No pre-creation setup |
| downscale_resource | True | No downscaling concept |
| pause_resource | True | No pausing concept |
| restore_resource | True | No restore concept |
| get_resource_metadata | {} | No metadata |
Annotated YAML configuration
unit_factor explained
The unit_factor converts between Waldur display units and backend-native units:
- backend_value = waldur_value * unit_factor
- waldur_value = backend_value / unit_factor
Examples:
- CPU k-Hours to SLURM cpu-minutes: unit_factor = 60000 (60 min x 1000)
- GB-Hours to SLURM MB-minutes: unit_factor = 61440 (60 min x 1024 MB)
- GB to GB (no conversion): unit_factor = 1
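Both directions of the conversion can be sketched as two small helpers. The helper names are hypothetical. The layout of backend_components, with each component key mapping to a dict that has a unit_factor entry, is an assumption about the parsed config. The use of floor division to keep values integral is a choice made for this sketch.

```python
def waldur_to_backend(values: dict, backend_components: dict) -> dict:
    """Multiply Waldur-unit values by unit_factor (e.g. limits sent to the backend)."""
    return {
        key: value * backend_components[key]["unit_factor"]
        for key, value in values.items()
    }


def backend_to_waldur(values: dict, backend_components: dict) -> dict:
    """Divide backend-unit values by unit_factor (e.g. usage reported to Waldur)."""
    return {
        key: value // backend_components[key]["unit_factor"]  # floor to an integer
        for key, value in values.items()
    }


# Assumed shape of the parsed backend_components config.
components = {"cpu": {"unit_factor": 60000}, "mem": {"unit_factor": 61440}}

backend_limits = waldur_to_backend({"cpu": 2, "mem": 2}, components)
waldur_usage = backend_to_waldur({"cpu": 120000, "mem": 122880}, components)
```

With these factors, a Waldur limit of 2 becomes 120000 cpu-minutes and 122880 MB-minutes on the backend, and converting those values back gives 2 again.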
Entry point registration
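Register your plugin in pyproject.toml by adding an entry under the matching entry-point group (for example, "waldur_site_agent.backends" for resource backends) that maps the entry point name to your backend class.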
The entry point name (e.g., mycustom) is what users put in
backend_type, order_processing_backend, or
username_management_backend in the config YAML. The four entry-point
groups are independent — a single distribution may register some or all
of them.
Processor-plugin data flow
Plugins generally do not have direct access to the Waldur API client.
The core processor pre-resolves any Waldur data the plugin might need and
passes it via user_context. Plugins return metadata to Waldur by setting
resource.backend_metadata.
Exception: evaluate_pending_order receives waldur_rest_client directly, because the order has not been approved yet and no resource context exists at that point.
Pre-resolved data in user_context
The processor enriches the user_context dict before calling backend
methods. Plugins read from it without making API calls:
| Key | Type | Contents |
|---|---|---|
| team | list[dict] | Team members with usernames |
| offering_users | list[dict] | Offering users |
| ssh_keys | dict[str, str] | Mapping of SSH key UUID → public key text |
| plan_quotas | dict[str, int] | Plan component quotas (component key → value) |
Returning metadata via backend_metadata
To push metadata back to Waldur (e.g., access credentials, connection
endpoints), set resource.backend_metadata in post_create_resource:
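A minimal sketch of this hook follows, using the post_create_resource(resource, waldur_resource, user_context) signature shown in the data flow for this section. BackendResourceInfoSketch is a local stand-in with two of the BackendResourceInfo fields named in this guide. The metadata keys and example.invalid hosts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BackendResourceInfoSketch:
    """Stand-in carrying two of the BackendResourceInfo fields."""

    backend_id: str = ""
    backend_metadata: dict = field(default_factory=dict)


class MetadataBackendSketch:
    """Illustrative backend; a real plugin would inherit BaseBackend."""

    def post_create_resource(self, resource, waldur_resource, user_context=None):
        # Hypothetical values; a real backend would read them from the
        # external system after the resource has been created.
        resource.backend_metadata = {
            "endpoint": f"https://example.invalid/{resource.backend_id}",
            "login_node": "login.example.invalid",
        }
```

The hook mutates the resource object in place and returns nothing. The processor reads backend_metadata from that object and pushes it to Waldur.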
Data flow
sequenceDiagram
participant P as Processor
participant B as YourBackend
participant W as Waldur API
participant E as External System
P->>P: Fetch service provider
P->>B: Set service_provider_uuid
Note over P,B: Resource creation order arrives
P->>W: Resolve SSH keys, plan quotas
W-->>P: Pre-resolved data
P->>B: _pre_create_resource(resource, user_context)
B->>B: Read ssh_keys, plan_quotas from user_context
B->>E: Create resource with resolved data
E-->>B: Resource created
P->>B: post_create_resource(resource, waldur_resource, user_context)
B->>B: Set resource.backend_metadata
B-->>P: Return
P->>W: Push backend_metadata to Waldur
Example: resolving an SSH key UUID from user_context
When a resource attribute contains a UUID reference (e.g., an SSH key UUID
from the order form), look it up in the pre-resolved ssh_keys dict:
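The lookup can be sketched as a small helper that tolerates a missing or partial user_context. The helper name and the attribute name ssh_public_key are hypothetical. The ssh_keys key and its UUID-to-key mapping come from the table in this section.

```python
from typing import Optional


def resolve_ssh_public_key(attributes: dict, user_context: Optional[dict]) -> Optional[str]:
    """Return the public key text for the UUID stored in the resource attributes."""
    context = user_context or {}  # user_context may be None in unit tests
    ssh_keys = context.get("ssh_keys") or {}
    key_uuid = attributes.get("ssh_public_key")  # hypothetical attribute name
    if not key_uuid:
        return None
    # Unknown UUIDs resolve to None rather than raising.
    return ssh_keys.get(key_uuid)
```

Returning None for every missing piece lets the caller decide whether an absent key is an error for its backend.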
Design principles
- Plugins should avoid importing waldur_api_client for runtime API calls. All Waldur data should come via user_context or BaseBackend attributes. The exception is evaluate_pending_order, which receives waldur_rest_client for querying project or order data before approval.
- service_provider_uuid is still set on BaseBackend by the processor and can be read by plugins for constructing backend-side identifiers.
- Handle missing context gracefully: user_context may be None or missing keys in unit tests. Always default to {} or empty values.
Common pitfalls
1. Unit factor direction
The unit_factor converts from Waldur units to backend units by multiplication.
When reporting usage back, you must divide by unit_factor. With
unit_factor = 60000, skipping the conversion sets limits at 1/60000th of the
intended value or reports usage 60000x too high; applying it in the wrong
direction makes the error larger still.
2. Missing TOTAL_ACCOUNT_USAGE
The _get_usage_report return dict must include a "TOTAL_ACCOUNT_USAGE" key
for each resource. If missing, the core will substitute zeros, and reported
usage will appear as zero in Waldur.
3. Entry point not discovered
Common causes:
- Package not installed (uv sync --all-packages)
- Entry point group name misspelled. Resource backends use "waldur_site_agent.backends"; username management uses "waldur_site_agent.username_management_backends" (note the plural and the suffix). Validation schemas use "waldur_site_agent.component_schemas" / "waldur_site_agent.backend_settings_schemas".
- Entry point value points to wrong class or module
Debug with:
4. Forgetting super().__init__()
Your backend __init__ must call super().__init__(backend_settings, backend_components).
This sets up self.backend_settings, self.backend_components, and
self.client. Then assign your own client:
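The pattern can be sketched as follows. BaseBackendSketch is a local stand-in that only records the three attributes named above. MyClient and the api_url setting are hypothetical. The entry point name mycustom is reused from the registration section.

```python
class BaseBackendSketch:
    """Stand-in for BaseBackend that sets the attributes named above."""

    def __init__(self, backend_settings: dict, backend_components: dict) -> None:
        self.backend_settings = backend_settings
        self.backend_components = backend_components
        self.client = None  # placeholder until the subclass assigns its client


class MyClient:
    """Hypothetical client; a real one would inherit BaseClient."""

    def __init__(self, api_url: str) -> None:
        self.api_url = api_url


class MyBackend(BaseBackendSketch):
    def __init__(self, backend_settings: dict, backend_components: dict) -> None:
        # Call the base initialiser first so settings and components are set.
        super().__init__(backend_settings, backend_components)
        self.backend_type = "mycustom"
        # Then replace the placeholder with your own client.
        self.client = MyClient(backend_settings.get("api_url", ""))
```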
5. Returning wrong types from client methods
- get_resource must return None (not raise) when resource is absent.
- get_association must return None (not raise) when no association exists.
- list_resources must return list[ClientResource], not raw dicts.
6. Component key mismatch
Component keys in _get_usage_report must exactly match the keys in
backend_components config. If config has "cpu" but you report "CPU",
the usage will be silently ignored.
Testing guidance
What to test per mode
| Mode | Test focus |
|---|---|
| order_process | create_resource, delete_resource, limit conversion |
| report | _get_usage_report format, unit conversion math |
| membership_sync | add_user, remove_user, pause/restore |
| All | ping, error handling, edge cases |
Mock patterns
Mock the client to avoid needing a real backend:
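A minimal sketch with unittest.mock follows. PingBackendSketch is a hypothetical backend whose ping delegates to a client call, and a real test would construct your own backend class instead.

```python
from unittest.mock import MagicMock


class PingBackendSketch:
    """Illustrative backend whose ping delegates to the client."""

    def __init__(self, client) -> None:
        self.client = client

    def ping(self, raise_exception: bool = False) -> bool:
        try:
            self.client.list_resources()
        except Exception:
            if raise_exception:
                raise
            return False
        return True


# Replace the real client with a mock so no external system is contacted.
mock_client = MagicMock()
mock_client.list_resources.return_value = []
backend = PingBackendSketch(mock_client)
```

Setting `mock_client.list_resources.side_effect = RuntimeError("down")` simulates an unreachable backend, so both ping outcomes can be tested without a live system.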
Fixtures
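Shared test inputs can be kept in small factory functions, sketched below with only the standard library. In a pytest suite these would typically be decorated with @pytest.fixture. The setting names and the shape of backend_components are assumptions.

```python
from unittest.mock import MagicMock


def make_backend_settings() -> dict:
    """Hypothetical settings block passed to the backend constructor."""
    return {"api_url": "https://example.invalid", "api_token": "test-token"}


def make_backend_components() -> dict:
    """Assumed shape of the parsed backend_components config."""
    return {"cpu": {"unit_factor": 60000}, "mem": {"unit_factor": 61440}}


def make_mock_client() -> MagicMock:
    """Client double with benign defaults for the most common calls."""
    client = MagicMock()
    client.list_resources.return_value = []
    client.get_resource.return_value = None  # absent resource returns None
    return client
```

Returning fresh objects from each factory keeps one test's mutations from leaking into the next.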
Key assertions
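The usage report rules in this guide translate into assertions such as those in the sketch below. The helper name is hypothetical, and the report layout (resource ID, then TOTAL_ACCOUNT_USAGE and per-user entries) is assumed from the usage report specification.

```python
def assert_valid_usage_report(report: dict, component_keys: set) -> None:
    """Check the usage report rules for every resource in the report."""
    for resource_id, entries in report.items():
        assert "TOTAL_ACCOUNT_USAGE" in entries, f"{resource_id}: missing total"
        totals = entries["TOTAL_ACCOUNT_USAGE"]
        # Component keys must match the configured keys exactly (case-sensitive).
        assert set(totals) <= component_keys, f"{resource_id}: unknown component"
        for key, total in totals.items():
            assert isinstance(total, int), f"{resource_id}: {key} is not an int"
            per_user_sum = sum(
                usage.get(key, 0)
                for name, usage in entries.items()
                if name != "TOTAL_ACCOUNT_USAGE"
            )
            assert total == per_user_sum, f"{resource_id}: {key} total mismatch"
```

A report that uses "CPU" where the config has "cpu", or whose totals do not match the per-user sums, fails with a message naming the resource.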
LLM implementation checklist
When implementing a new backend plugin with an LLM, follow these steps in order:
- Read existing plugins: Study plugins/slurm/ and plugins/mup/ for patterns.
- Copy the template: Start from docs/plugin-template/ and rename.
- Implement __init__: Call super().__init__(), set backend_type, create client.
- Implement BaseClient methods: Start with get_resource, create_resource, delete_resource, list_resources.
- Implement BaseBackend abstract methods: Start with ping, then _pre_create_resource, then _collect_resource_limits, then _get_usage_report.
- Handle unit conversion: Verify unit_factor math in both directions.
- Write tests: Mock the client, test each abstract method.
- Register entry points: Add to pyproject.toml.
- Test integration: Install with uv sync --all-packages and run waldur_site_diagnostics.
- Verify: Run uv run pytest and uvx prek run --all-files.
Files to study
- waldur_site_agent/backend/backends.py: BaseBackend (resource backends) and AbstractUsernameManagementBackend (username plugins), plus PendingOrderDecision enum.
- waldur_site_agent/backend/clients.py: Base client class.
- waldur_site_agent/backend/structures.py: Data structures (ClientResource, Association, BackendResourceInfo with fields backend_id, parent_id, effective_id, users, usage, limits, pending_order_id, backend_metadata).
- waldur_site_agent/backend/exceptions.py: BackendError, DuplicateResourceError, and the OfferingUser*RequiredError exceptions used by username backends.
- waldur_site_agent/common/plugin_schemas.py: PluginComponentSchema and PluginBackendSettingsSchema base classes for optional config validation entry points.
- plugins/slurm/waldur_site_agent_slurm/backend.py: Reference implementation (CLI-based).
- plugins/mup/waldur_site_agent_mup/backend.py: Reference implementation (API-based).
- plugins/waldur/waldur_site_agent_waldur/backend.py: Reference for supports_async_orders, handled_resource_states, and the sync_resource_* hooks.
- plugins/basic_username_management/: Minimal reference for AbstractUsernameManagementBackend.
Common mistakes to avoid
- Do not forget super().__init__(backend_settings, backend_components).
- Do not return raw dicts from list_resources; return ClientResource objects.
- Do not raise exceptions from get_resource when resource is absent; return None.
- Do not forget the "TOTAL_ACCOUNT_USAGE" key in usage reports.
- Do not confuse Waldur units with backend units in _collect_resource_limits.
- Do not hardcode component keys; read them from self.backend_components.