Waldur Site Agent - Rancher Plugin
This plugin enables integration between Waldur Site Agent and Rancher for Kubernetes project management with optional Keycloak user group integration.
Features
- Rancher Project Management: Creates and manages Rancher projects with resource-specific naming
- OIDC Group Integration: Creates hierarchical Keycloak groups that map to Rancher project roles via OIDC
- Automatic User Management: Adds/removes users from Keycloak groups based on Waldur project membership
- Resource Quotas: Sets CPU and memory limits as Rancher project quotas
- Usage Reporting: Reports actual allocated resources (CPU, memory, storage) from Kubernetes
- Complete Lifecycle: Creates groups, binds to projects, manages users, cleans up empty groups
- Enhanced Descriptions: Project descriptions include customer and project names for clarity
Architecture
The plugin follows the Waldur Site Agent plugin architecture and consists of:
- RancherBackend: Main backend implementation that orchestrates project and user management
- RancherClient: Handles Rancher API operations for project management
- KeycloakClient: Manages Keycloak groups and user memberships
Key Architecture Features
- Resource-Specific Naming: Rancher projects named after resource slugs for better identification
- OIDC-Based Access: No direct user-to-Rancher assignments; all access via Keycloak groups
- Enhanced Backend Interface: Full `WaldurResource` context available to all backend methods
- Automatic Cleanup: Empty groups and role bindings automatically removed
- Real-World Validated: Tested with actual Rancher and Keycloak instances
Installation
- Install the plugin using uv:
- The plugin will be automatically discovered via Python entry points.
Setup Requirements
Rancher Server Setup
Required Rancher Credentials
- Rancher Server: Accessible Rancher instance
- API Access: Unscoped API token with cluster access
- Cluster ID: Target cluster ID (format: `c-xxxxx`, not `c-xxxxx:p-xxxxx`)
Creating Rancher API Tokens
- Log in to the Rancher UI
- Navigate to: User Profile → API & Keys
- Create Token:
  - Name: `waldur-site-agent`
  - Scope: `No Scope` (unscoped for full access)
  - Expires: Set appropriate expiration
- Save: Access Key and Secret Key
- Find Cluster ID: In the Rancher UI, the cluster URL shows the cluster ID (e.g., `c-j8276`)
Keycloak Setup (Optional)
Required for OIDC Group Integration
- Keycloak Server: Accessible Keycloak instance
- Target Realm: Where user accounts and groups will be managed
- Service User: User with group management permissions
Creating Keycloak Service User
- Log in to the Keycloak Admin Console
- Select Target Realm: (e.g., `your-realm`)
- Create User:
  - Username: `waldur-site-agent-rancher`
  - Email Verified: Yes
  - Enabled: Yes
- Set Password: In Credentials tab (temporary: No)
- Assign Roles: In Role Mappings tab
  - Client Roles → `realm-management`
  - Add: `manage-users` (sufficient for group operations)
Waldur Marketplace Setup
Required Waldur Configuration
- Marketplace Offering: Created in Waldur
- Components: Configured via `waldur_site_load_components`
- Offering State: Must be `Active` for order processing
Setting Up Offering Components
- Create configuration file with component definitions
- Run component loader:
- Activate offering in Waldur Admin UI (change from Draft to Active)
Complete Setup Example
Step 1: Create Configuration File
Step 2: Load Components
Step 3: Activate Offering
- Log in to the Waldur Admin UI
- Navigate to: Marketplace → Provider Offerings
- Find your offering and change its state from `Draft` to `Active`
Step 4: Start Order Processing
Step 5: Verify Setup
Configuration
Basic Configuration (Rancher only)
Full Configuration (with Keycloak)
Configuration Reference
Rancher Settings (matching waldur-mastermind format)
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `backend_url` | string | Yes | - | Rancher server URL (e.g., https://rancher.example.com) |
| `username` | string | Yes | - | Rancher access key (called username in waldur-mastermind) |
| `password` | string | Yes | - | Rancher secret key |
| `cluster_id` | string | Yes | - | Rancher cluster ID (e.g., c-m-1234abcd, not c-m-1234abcd:p-xxxxx) |
| `verify_cert` | boolean | No | true | Whether to verify SSL certificates |
| `project_prefix` | string | No | "waldur-" | Prefix for created Rancher project names |
| `default_role` | string | No | "workloads-manage" | Default role assigned to users in Rancher |
| `keycloak_use_user_id` | boolean | No | true | Use Keycloak user ID for lookup (false = use username) |
| `namespace_labels` | map | No | {} | Extra labels applied to namespaces (e.g., gpu-pool: h100-2x) |
Keycloak Settings (optional, matching waldur-mastermind format)
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `keycloak_enabled` | boolean | No | false | Enable Keycloak integration |
| `keycloak.keycloak_url` | string | Conditional | - | Keycloak server URL |
| `keycloak.keycloak_realm` | string | Conditional | "waldur" | Keycloak realm name |
| `keycloak.keycloak_user_realm` | string | Conditional | "master" | Keycloak user realm for auth |
| `keycloak.keycloak_username` | string | Conditional | - | Keycloak admin username |
| `keycloak.keycloak_password` | string | Conditional | - | Keycloak admin password |
| `keycloak.keycloak_ssl_verify` | boolean | No | true | Whether to verify SSL certificates |
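The "Required" and "Conditional" columns above can be read as a small validation rule: the four Rancher settings without defaults are always needed, and the Keycloak settings without defaults are needed only when `keycloak_enabled` is true. The following is a minimal sketch of that rule, not the plugin's own validation code. It assumes the settings arrive as a plain dictionary with a nested `keycloak` mapping (inferred from the dotted parameter names); the function name `missing_settings` is hypothetical.

```python
from __future__ import annotations

# Settings that have no default in the reference tables above.
REQUIRED_RANCHER_KEYS = ("backend_url", "username", "password", "cluster_id")
REQUIRED_KEYCLOAK_KEYS = ("keycloak_url", "keycloak_username", "keycloak_password")


def missing_settings(settings: dict) -> list[str]:
    """Return the names of required settings that are absent or empty.

    Hypothetical helper: the dictionary layout (flat Rancher keys plus a
    nested "keycloak" mapping) is an assumption, not the plugin's schema.
    """
    missing = [key for key in REQUIRED_RANCHER_KEYS if not settings.get(key)]
    # Keycloak settings only become mandatory once the integration is enabled.
    if settings.get("keycloak_enabled", False):
        keycloak = settings.get("keycloak") or {}
        missing += [
            f"keycloak.{key}"
            for key in REQUIRED_KEYCLOAK_KEYS
            if not keycloak.get(key)
        ]
    return missing
```

An empty result means every mandatory value is present; the agent's own diagnostics remain the authoritative check.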
Usage
Running the Agent
Start the agent with your configuration file:
Diagnostics
Run diagnostics to check connectivity:
Supported Agent Modes
- order_process: Creates and manages Rancher projects based on Waldur resource orders
- membership_sync: Synchronizes user memberships between Waldur and Rancher/Keycloak
- report: Reports resource usage from Rancher projects to Waldur
Project Management
Project Creation
When a Waldur resource (representing project access) is created:
- A Rancher project is created with the name `{project_prefix}{waldur_resource_slug}`
- If Keycloak is enabled, hierarchical groups are created:
  - Parent Group: `c_{cluster_uuid_hex}` (cluster-level access)
  - Child Group: `project_{project_uuid_hex}_{role_name}` (project + role access)
- Resource quotas are applied to the Rancher project
- OIDC binds the Keycloak groups to Rancher project roles
User Management
When users are added to a Waldur resource:
- User is added to the Rancher project with the configured role
- If Keycloak is enabled, user is added to the child group (`project_{project_uuid_hex}_{role_name}`)
- OIDC automatically grants the user access to the Rancher project based on group membership
When users are removed:
- User is removed from the Rancher project
- If Keycloak is enabled, user is removed from the project role group
Naming Convention
The plugin follows the waldur-mastermind Rancher plugin naming patterns:
- Rancher Project Name: `{project_prefix}{waldur_resource_slug}` (configurable prefix)
- Keycloak Parent Group: `c_{cluster_uuid_hex}` (cluster access)
- Keycloak Child Group: `project_{project_uuid_hex}_{role_name}` (project + role access)

Where:
- `{project_prefix}` is configurable (default: `waldur-`)
- `{waldur_resource_slug}` is the Waldur resource slug (more specific than project slug)
- `{cluster_uuid_hex}` is the cluster UUID in hex format
- `{project_uuid_hex}` is the Waldur project UUID in hex format (for permissions)
- `{role_name}` is configurable (default: `workloads-manage`)
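As an illustration of the naming patterns above, the sketch below builds the three names from their inputs. The function names are hypothetical and the plugin's actual helpers may differ; "hex format" is assumed here to mean the 32-character form produced by Python's `uuid.UUID.hex`.

```python
import uuid


def rancher_project_name(resource_slug: str, project_prefix: str = "waldur-") -> str:
    """Build the Rancher project name: {project_prefix}{waldur_resource_slug}."""
    return f"{project_prefix}{resource_slug}"


def keycloak_group_names(
    cluster_uuid: uuid.UUID,
    project_uuid: uuid.UUID,
    role_name: str = "workloads-manage",
) -> tuple:
    """Return (parent_group, child_group) following the documented patterns."""
    # Assumption: "hex format" is the UUID without dashes.
    parent = f"c_{cluster_uuid.hex}"
    child = f"project_{project_uuid.hex}_{role_name}"
    return parent, child
```

For example, `rancher_project_name("gpu-lab")` yields `waldur-gpu-lab` with the default prefix.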
Supported Components and Accounting Model
The plugin supports the following resource components (all with billing_type: "limit"):
- CPU: Measured in cores
- Memory: Measured in GB
- Storage: Measured in GB
- GPU: Measured in units (optional, see GPU Scheduling)
Accounting Model
Project Limits (Quotas):
- Only CPU and memory limits are set as Rancher project quotas
- Storage is not enforced as quotas (reported only)
Usage Reporting (for all components):
All components report actual allocated resources:
- CPU: Sum of all container CPU requests in the project
- Memory: Sum of all container memory requests in the project
- Storage: Sum of all persistent volume claims in the project
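To make the reporting rule concrete, here is a minimal sketch of how container requests and persistent volume claim sizes could be summed into the three reported components. It is not the plugin's implementation. It handles only the most common Kubernetes quantity suffixes, it assumes "GB" means 2^30 bytes, and the input shapes and the name `summarize_usage` are hypothetical.

```python
from __future__ import annotations

GIB = 2**30
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_bytes(quantity: str) -> float:
    """Convert a memory/storage quantity such as "512Mi" or "1G" to bytes."""
    # Binary suffixes are checked first so "Mi" is not mistaken for "M".
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # plain number of bytes


def summarize_usage(container_requests: list[dict], pvc_sizes: list[str]) -> dict:
    """Sum requests across containers and PVCs (hypothetical input shapes)."""
    cpu = sum(parse_cpu(r["cpu"]) for r in container_requests if "cpu" in r)
    memory = sum(parse_bytes(r["memory"]) for r in container_requests if "memory" in r)
    storage = sum(parse_bytes(size) for size in pvc_sizes)
    return {"cpu": cpu, "memory": memory / GIB, "storage": storage / GIB}
```

Containers that declare no request for a component simply contribute nothing to that sum.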
Accounting Flow
- Project Creation: CPU and memory limits → Rancher project quotas
- Usage Reporting: All components → actual allocated resources from Kubernetes
GPU Scheduling
The plugin supports GPU quota management via Kubernetes ResourceQuotas combined with
namespace-label-based scheduling. This approach uses a single generic nvidia.com/gpu
resource for quotas, and a Kyverno policy on the cluster to steer GPU workloads
to the correct node pool based on a namespace label.
How It Works
- Site agent creates a namespace with a `gpu-pool` label (configured via `namespace_labels`)
- Site agent applies a ResourceQuota with `requests.nvidia.com/gpu: N` (generic GPU count)
- Kyverno policy on the cluster reads the namespace's `gpu-pool` label and injects a matching `nodeSelector` into every pod, ensuring GPU workloads land on the correct nodes
- GPU nodes are labeled with `gpu-pool=<type>` (e.g., `gpu-pool=h100-2x`)
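The first two steps can be pictured as two Kubernetes objects: a labeled Namespace and a ResourceQuota limiting `requests.nvidia.com/gpu`. The sketch below builds both as plain dictionaries in the standard manifest shape. It is illustrative only: the quota name `waldur-gpu-quota` and the function names are assumptions, and the objects the site agent actually creates may carry additional fields.

```python
from __future__ import annotations


def namespace_manifest(name: str, namespace_labels: dict) -> dict:
    """Namespace carrying the labels from the namespace_labels setting."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": dict(namespace_labels)},
    }


def gpu_quota_manifest(namespace: str, gpu_count: int, name: str = "waldur-gpu-quota") -> dict:
    """ResourceQuota capping the generic GPU resource at gpu_count.

    The default quota name is hypothetical.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"hard": {"requests.nvidia.com/gpu": str(gpu_count)}},
    }
```

With `namespace_labels` set to `{"gpu-pool": "h100-2x"}`, the namespace label is what the Kyverno policy later reads to choose the node pool.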
Offering Setup
Create a separate Waldur offering per GPU pool type. Each offering's backend config specifies the GPU pool label:
Required Kubernetes Setup
1. Label GPU Nodes
Each GPU node must be labeled with its pool type matching the namespace_labels values:
2. Deploy Kyverno Policy
A Kyverno ClusterPolicy must be deployed on the cluster to inject nodeSelector
into pods based on the namespace's gpu-pool label. This is the minimal policy:
This policy:
- Matches all Pods in every namespace
- Fetches the namespace's `gpu-pool` label via Kubernetes API
- Precondition: only fires if the label is non-empty (CPU-only namespaces are unaffected)
- Mutates: injects `nodeSelector: {gpu-pool: "<value>"}` into the Pod spec
- Kyverno auto-generates equivalent rules for Deployments, StatefulSets, Jobs, CronJobs, etc.
3. Optional: Default No-GPU Quota Policy
To prevent accidental GPU usage in namespaces without a GPU pool label, deploy a generate policy that creates a zero-GPU ResourceQuota:
Complete Workflow
The plugin provides end-to-end automation for Rancher project and user management:
Order Processing
- Order Detection: Monitors Waldur for new resource orders
- Project Creation: Creates Rancher project named `{prefix}{resource_slug}`
- Enhanced Descriptions: Includes customer and project context
- Quota Management: Sets CPU and memory limits if specified
- OIDC Setup: Creates and binds Keycloak groups to project roles
Membership Sync
- User Detection: Monitors Waldur for user membership changes
- Group Management: Creates missing Keycloak groups if needed
- User Addition: Adds users to appropriate Keycloak groups
- User Removal: Removes users when removed from Waldur projects
- Cleanup: Removes empty groups and their Rancher role bindings
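The membership sync steps amount to reconciling two sets: the users Waldur says should have access and the users currently in the Keycloak group. A minimal sketch of that reconciliation follows; the function name and the use of plain username strings are assumptions for illustration, not the plugin's API.

```python
from __future__ import annotations


def plan_membership_changes(
    waldur_users: set[str], group_members: set[str]
) -> tuple[list[str], list[str], bool]:
    """Return (users_to_add, users_to_remove, group_becomes_empty)."""
    to_add = sorted(waldur_users - group_members)
    to_remove = sorted(group_members - waldur_users)
    # After the sync the group holds exactly waldur_users, so it is empty
    # (and eligible for cleanup) only when Waldur lists no users at all.
    group_becomes_empty = not waldur_users
    return to_add, to_remove, group_becomes_empty
```

When the third value is true, the group and its Rancher role binding are candidates for the cleanup step.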
OIDC Integration Flow
- Keycloak Groups: `c_{cluster_uuid_hex}` (parent) → `project_{project_uuid_hex}_{role_name}` (child)
- Group Binding: `keycloakoidc_group://{group_name}` bound to Rancher project role
- User Management: Users added to Keycloak groups only (not directly to Rancher)
- Automatic Access: OIDC grants Rancher project access based on group membership
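The group binding step relies on a principal identifier of the form `keycloakoidc_group://{group_name}`. A tiny sketch of building that identifier is shown here; the function name is hypothetical, and how the plugin submits the binding to Rancher is not covered.

```python
def group_principal_id(group_name: str) -> str:
    """Build the OIDC group principal string used for the Rancher role binding."""
    if not group_name:
        raise ValueError("group_name must not be empty")
    return f"keycloakoidc_group://{group_name}"
```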
Error Handling
- Rancher connectivity issues will be logged and retried
- Keycloak failures will be logged but won't stop Rancher operations
- Invalid configurations will be detected during diagnostics
- Missing users in Keycloak will be logged as warnings
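The second rule, that Keycloak failures are logged without stopping Rancher operations, can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the plugin's code: `create_project` and `setup_groups` are hypothetical callables standing in for the Rancher and Keycloak steps.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger(__name__)


def provision_resource(
    create_project: Callable[[], str],
    setup_groups: Callable[[str], None],
) -> Tuple[str, bool]:
    """Create the Rancher project, then attempt the Keycloak group setup.

    Returns (project_id, keycloak_ok). Errors from create_project propagate
    to the caller; errors from setup_groups are logged and swallowed.
    """
    project_id = create_project()
    try:
        setup_groups(project_id)
    except Exception:
        # Keycloak is treated as optional here: record the failure and continue.
        logger.exception("Keycloak group setup failed for project %s", project_id)
        return project_id, False
    return project_id, True
```

The boolean lets a caller decide whether to retry the group setup on a later sync cycle.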
Development
Running Tests
Code Quality
Troubleshooting
Common Issues
1. Order Processing Disabled
Solution: Add backend configuration to your offering:
2. Rancher Authentication Fails (401 Unauthorized)
Solutions:
- Verify access key and secret key are correct
- Ensure token is unscoped (not cluster-specific)
- Check token hasn't expired
- Verify API URL format: `https://your-rancher.com` (without `/v3`)
3. Keycloak Connection Fails (404)
Solutions:
- Verify Keycloak URL (try with/without `/auth/` suffix)
- Check realm name is correct
- Ensure user exists in the specified realm
4. Keycloak Group Creation Fails (403 Forbidden)
Solution: Grant user manage-users role:
- Realm: Select target realm
- Users → Your service user
- Role Mappings → Client Roles → `realm-management`
- Add: `manage-users`
5. Cluster ID Format Error
Solution: Use correct format:
- ✅ Correct: `c-j8276` (cluster ID only)
- ❌ Incorrect: `c-j8276:p-xxxxx` (project reference)
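A quick pre-flight check can catch this mistake before the agent starts. The sketch below is a hypothetical helper, not part of the plugin: it only tests the two properties stated above (a `c-` prefix and no `:p-...` project suffix) and does not verify that the cluster exists.

```python
def looks_like_cluster_id(value: str) -> bool:
    """Return True for a bare cluster ID such as "c-j8276" or "c-m-1234abcd"."""
    # A colon indicates a project reference of the form <cluster>:<project>.
    return value.startswith("c-") and len(value) > 2 and ":" not in value
```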
6. Component Loading Fails
Solution: Use correct component format:
Logging
Enable debug logging to see detailed operation logs:
Diagnostic Commands
Run comprehensive diagnostics:
This will test:
- Rancher API connectivity and authentication
- Keycloak connectivity and permissions (if enabled)
- Project listing capabilities
- Backend discovery and initialization
- Component configuration validity
Verification Commands
Test individual components: