Integrating Open OnDemand with Waldur and SLURM
Components used
Open OnDemand is open source software that gives students, researchers, and industry professionals remote web access to supercomputers.
Waldur is an open source platform for running HPC and Cloud self service.
SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
Keycloak is an open source software product to allow single sign-on with identity and access management aimed at modern applications and services.
MyAccessID (optional) is an Identity and Access Management Service provided by GEANT that offers a common identity layer for Infrastructure Service Domains (ISDs).
Integration overview
OOD requirements
- Shared user directory storage available on both the SLURM and OOD VMs
- Dedicated hostname for the OOD machine (such as ood.example.com)
- Open ports TCP/80 and TCP/443, plus the ability to connect to LDAP on the SLURM management machine
- Linux server with at least 4 GB of RAM and 10 GB of disk storage
Open OnDemand (OOD) installation and configuration
Follow the guide at https://github.com/OSC/ood-ansible to automatically install OOD on the Linux server.
Preparation guide:
- Set up a TLS certificate to use later in the Ansible configuration.
- In Keycloak, create a client with authentication enabled for Open OnDemand.
- Populate the Ansible inventory.yaml.
- Apply the playbook with ansible-playbook against that inventory.
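As a rough illustration of the inventory step, the sketch below shows what an inventory.yaml could look like. It is not the original example: the host names, certificate paths, Keycloak realm URL, and client credentials are all hypothetical placeholders, and the variable names are assumed to mirror the keys of OOD's ood_portal.yml. Check the ood-ansible role defaults before relying on any of them.

```yaml
# inventory.yaml (sketch; every value below is a placeholder)
all:
  hosts:
    ood.example.com:
      ansible_user: root
  vars:
    # Dedicated hostname of the OOD machine
    servername: ood.example.com
    # Certificate prepared in the first preparation step (paths are assumptions)
    ssl:
      - 'SSLCertificateFile "/etc/pki/tls/certs/ood.example.com.crt"'
      - 'SSLCertificateKeyFile "/etc/pki/tls/private/ood.example.com.key"'
    # Keycloak client created for Open OnDemand (realm and client names are assumptions)
    oidc_uri: "/oidc"
    oidc_provider_metadata_url: "https://keycloak.example.com/realms/myrealm/.well-known/openid-configuration"
    oidc_client_id: "ood-client"
    oidc_client_secret: "CHANGE_ME"
    # Claim that carries the Linux username
    oidc_remote_user_claim: "preferred_username"
    oidc_scope: "openid profile email"
```

Keeping the client secret in an Ansible vault rather than in plain text is usually preferable for anything beyond a test setup.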
User authentication flow
- OOD gets the Linux username from the preferred_username claim issued by Keycloak
- OOD launches a "Per User Nginx" (PUN) environment after a successful user login
- OOD connects to the SLURM cluster as the user named in preferred_username
Keycloak configuration
Keycloak acts as a central Identity server for Waldur and Open OnDemand.
Steps to configure Keycloak:
- Configure the Waldur and Open OnDemand clients
- Configure identity federation or user self-registration. If identity federation is used, a common task is to configure username mapping as described in https://puhuri.neic.no/idp_integration/use-cases/keycloak-integration/
- Install waldur-username-mapper for matching Keycloak or federated users with their respective Linux usernames: https://docs.waldur.com/integrations/waldur-keycloak-mapper/
Open OnDemand cluster configuration
To connect Open OnDemand to SLURM, create the cluster configuration file manually at /etc/ood/config/clusters.d/my_cluster.yml.
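A minimal sketch of such a cluster file is shown below, following the OOD v2 cluster configuration layout. The cluster title, the login host name, and the SLURM binary and configuration paths are assumptions and need to match the actual installation.

```yaml
# /etc/ood/config/clusters.d/my_cluster.yml (sketch; host names and paths are placeholders)
---
v2:
  metadata:
    title: "My Cluster"
  login:
    # SLURM login node that OOD opens shell sessions on
    host: "slurm-login.example.com"
  job:
    adapter: "slurm"
    # Directory containing sbatch, squeue, and scancel
    bin: "/usr/bin"
    conf: "/etc/slurm/slurm.conf"
    # Submit jobs over SSH from the OOD VM instead of running SLURM clients locally
    submit_host: "slurm-login.example.com"
```

The submit_host setting is only needed when the SLURM client tools are not installed on the OOD machine itself; it relies on SSH access from the OOD VM to the login node.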
waldur-slurm-agent configuration
Waldur-slurm-agent is a microservice that pulls allocations from Waldur and pushes allocation usage statistics back to Waldur.
The microservice supports two modes of operation:
- docker-compose - testing only, requires SLURM running in the same docker compose
- native - production
Follow https://docs.waldur.com/integrations/waldur-slurm-service/ for the installation guide. Make sure to enable the ENABLE_USER_HOMEDIR_ACCOUNT_CREATION flag, because Open OnDemand does not work unless the user's home directory exists.
Host-based SSH authentication configuration
One way to allow OOD to connect to the SLURM login node is to set up host-based “trust”, also known as “SSH host-based authentication”, between the OOD VM and the SLURM login node.
Use https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Host-based_Authentication as a guide.
Troubleshooting
- OOD login errors and errors fetching preferred_username are logged in /var/log/httpd/error.log or similar
- Per-user application or SLURM configuration errors are logged in /var/log/ondemand-nginx/USERNAME/
- By default, OOD does not tolerate arbitrary URL path prefixes well; prefer using https://ood.example.com instead of https://ood.example.com/
- Most common SSSD / LDAP configuration errors include:
    - Wrong LDAP filter
    - SSSD is not able to reach LDAP server (network error)
    - SSSD installed without sssd-ldap plugin
- Home directory of the user has not been created
- Make sure to specify the correct SLURM account in the OOD job configuration.
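For the last point, one way to set the account is in the app's submit.yml.erb. The fragment below is a sketch that assumes an interactive (batch connect) app using the "basic" template; the account name is a hypothetical placeholder and must match a SLURM account the user actually belongs to.

```yaml
# submit.yml.erb of an OOD interactive app (sketch; account name is a placeholder)
---
batch_connect:
  template: "basic"
script:
  # Handed to SLURM as the job's account (sbatch -A)
  accounting_id: "my_allocation_account"
```

If jobs are rejected with an invalid account error, comparing this value with the accounts listed for the user in SLURM is a reasonable first check.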