RabbitMQ Cluster Operator (Production)
For production deployments, it is strongly recommended to use the official RabbitMQ Cluster Kubernetes Operator instead of the Bitnami Helm chart. The operator provides better lifecycle management, high availability, and production-grade features.
Overview
The RabbitMQ Cluster Operator automates:
- Provisioning and management of RabbitMQ clusters
- Scaling and automated rolling upgrades
- Monitoring integration with Prometheus and Grafana
- Backup and recovery operations
- Network policy and security configurations
Prerequisites
- Kubernetes cluster version 1.19 or above
- Configured kubectl access
- Appropriate RBAC permissions
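If in doubt about the last point, the API server can be asked directly. This is a minimal sketch; the rabbitmq.com resource check is only meaningful once the operator's CRDs are installed:

```bash
# Can the current identity install cluster-scoped resources such as the operator's CRDs?
kubectl auth can-i create customresourcedefinitions

# Can it create RabbitmqCluster resources in the target namespace?
kubectl auth can-i create rabbitmqclusters.rabbitmq.com -n default
```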
Installation
1. Install the RabbitMQ Cluster Operator
```bash
kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"
```

Verify the operator is running:

```bash
kubectl get pods -n rabbitmq-system
```
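Optionally, confirm that the RabbitmqCluster custom resource definition was registered (the CRD name below is the one shipped by the operator manifest):

```bash
kubectl get crd rabbitmqclusters.rabbitmq.com
```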
2. Create a Production RabbitMQ Cluster
Create a production-ready RabbitMQ cluster configuration and save it as waldur-rabbitmq-cluster.yaml:
```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: waldur-rabbitmq
  namespace: default
spec:
  # Production recommendation: use odd numbers (3, 5, 7)
  replicas: 3
  # Resource configuration
  resources:
    requests:
      cpu: 1000m   # 1 CPU core
      memory: 2Gi  # Keep requests and limits equal for stability
    limits:
      cpu: 2000m   # 2 CPU cores for peak loads
      memory: 2Gi
  # Persistence configuration
  persistence:
    storageClassName: "fast-ssd"  # Use appropriate storage class
    storage: 20Gi                 # Adjust based on expected message volume
  # RabbitMQ configuration
  rabbitmq:
    additionalConfig: |
      # Memory threshold (80% of available memory)
      vm_memory_high_watermark.relative = 0.8
      # Disk threshold (2GB free space)
      disk_free_limit.absolute = 2GB
      # Clustering settings
      cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      cluster_formation.node_cleanup.interval = 30
      cluster_formation.node_cleanup.only_log_warning = true
      # Management plugin
      management.tcp.port = 15672
      # Enable additional protocols if needed
      listeners.tcp.default = 5672
      # Logging
      log.console = true
      log.console.level = info
      # Queue leader placement policy (RabbitMQ 3.10+)
      queue_leader_locator = balanced
    # Additional plugins
    additionalPlugins:
      - rabbitmq_management
      - rabbitmq_prometheus
      - rabbitmq_auth_backend_ldap  # If LDAP auth is needed
      - rabbitmq_mqtt               # If MQTT protocol is needed
      - rabbitmq_stomp              # If STOMP protocol is needed
  # Service configuration
  service:
    type: ClusterIP
    annotations:
      # Only takes effect if the service type is changed to LoadBalancer on AWS
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
  # Monitoring annotations on the pods
  override:
    statefulSet:
      spec:
        template:
          metadata:
            annotations:
              prometheus.io/scrape: "true"
              prometheus.io/port: "15692"
              prometheus.io/path: "/metrics"
  # Spread replicas across nodes
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - waldur-rabbitmq  # The operator labels pods with the cluster name
            topologyKey: kubernetes.io/hostname
```
Apply the configuration:
```bash
kubectl apply -f waldur-rabbitmq-cluster.yaml
```
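The operator creates a StatefulSet, a Service, and a default-user secret, all named after the cluster. To watch the rollout (the label selector matches how the operator labels the pods, i.e. with the cluster name):

```bash
# Custom resource status and its child pods
kubectl get rabbitmqcluster waldur-rabbitmq
kubectl get pods -l app.kubernetes.io/name=waldur-rabbitmq -w
```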
Configuration for Waldur
1. Retrieve RabbitMQ Credentials
Get the auto-generated credentials:
```bash
# Get username
kubectl get secret waldur-rabbitmq-default-user -o jsonpath='{.data.username}' | base64 --decode

# Get password
kubectl get secret waldur-rabbitmq-default-user -o jsonpath='{.data.password}' | base64 --decode
```
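To avoid copying the values by hand, the same secret can be read into shell variables (a minimal sketch, assuming the cluster was created in the default namespace as above):

```bash
RABBITMQ_USER=$(kubectl get secret waldur-rabbitmq-default-user -o jsonpath='{.data.username}' | base64 --decode)
RABBITMQ_PASS=$(kubectl get secret waldur-rabbitmq-default-user -o jsonpath='{.data.password}' | base64 --decode)
echo "RabbitMQ user: ${RABBITMQ_USER}"
```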
2. Update the Waldur Configuration
Update your Waldur values.yaml:
```yaml
# Disable the bitnami rabbitmq chart
rabbitmq:
  enabled: false

# Configure external RabbitMQ connection
global:
  waldur:
    rabbitmq:
      host: "waldur-rabbitmq.default.svc.cluster.local"
      port: 5672
      auth:
        username: "default_user_xxx"  # From the secret above
        password: "xxx"               # From the secret above
      vhost: "/"
```
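Then roll the change out with Helm. The release name and chart reference below are placeholders; use the ones from your existing Waldur installation:

```bash
helm upgrade <release-name> <waldur-chart> --namespace <namespace> -f values.yaml
```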
High Availability Configuration
For production high availability, consider these additional configurations:
Pod Disruption Budget
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: waldur-rabbitmq-pdb
spec:
  minAvailable: 2  # Ensure at least 2 pods are always available
  selector:
    matchLabels:
      app.kubernetes.io/name: waldur-rabbitmq
```
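Apply the budget (the file name is whatever you saved the manifest as) and confirm it is tracking the RabbitMQ pods:

```bash
kubectl apply -f waldur-rabbitmq-pdb.yaml
kubectl get pdb waldur-rabbitmq-pdb
```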
Network Policy (Optional)
Restrict network access to RabbitMQ:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: waldur-rabbitmq-netpol
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: waldur-rabbitmq
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: waldur
      ports:
        - protocol: TCP
          port: 5672
    - from:  # Allow management interface access
        - podSelector:
            matchLabels:
              app: monitoring
      ports:
        - protocol: TCP
          port: 15672
        - protocol: TCP
          port: 15692  # Prometheus metrics
```
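Note that a plain podSelector in an ingress rule only matches pods in the same namespace as the policy, and the selectors above assume Waldur pods are labelled `app.kubernetes.io/name: waldur` and monitoring pods `app: monitoring`; adjust them if your labels differ. A quick sanity check of what those selectors match:

```bash
kubectl get pods -l app.kubernetes.io/name=waldur
kubectl get pods -l app=monitoring
```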
Monitoring
The operator automatically enables Prometheus metrics. To access them:
- Prometheus Metrics Endpoint: http://waldur-rabbitmq:15692/metrics
- Management UI Access:

    ```bash
    kubectl port-forward service/waldur-rabbitmq 15672:15672
    ```

    Access at: http://localhost:15672

- Grafana Dashboard: Import RabbitMQ dashboard ID 10991 or similar
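If you run the Prometheus Operator, a ServiceMonitor can scrape the metrics endpoint instead of relying on pod annotations. This is a sketch; it assumes the cluster's Service exposes the metrics port under the name `prometheus`, which is the name the cluster operator uses when the rabbitmq_prometheus plugin is enabled:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: waldur-rabbitmq
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: waldur-rabbitmq
  endpoints:
    - port: prometheus  # 15692, exposed by the rabbitmq_prometheus plugin
      interval: 30s
```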
Backup and Recovery
Automated Backup Configuration
The cluster operator itself does not provide a dedicated backup resource. A common approach is to export the broker definitions (vhosts, users, permissions, policies, and queue metadata) and back up the persistent volumes separately; note that definitions do not include message payloads, which live on the volumes. Definitions can be exported from any running node:

```bash
# Export definitions to a local file
kubectl exec waldur-rabbitmq-server-0 -- rabbitmqctl export_definitions /tmp/definitions.json
kubectl cp waldur-rabbitmq-server-0:/tmp/definitions.json ./definitions.json
```
For production, implement external backup strategies using tools like Velero or cloud-native backup solutions.
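For example, with Velero installed and a volume snapshot or file-system backup provider configured, a daily schedule scoped to the RabbitMQ pods and their volumes might look like the sketch below; the namespace, schedule, and retention are assumptions to adapt:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: waldur-rabbitmq-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # daily at 02:00
  template:
    includedNamespaces:
      - default
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: waldur-rabbitmq
    ttl: 168h  # keep backups for 7 days
```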
Scaling
Scale the cluster:
```bash
kubectl patch rabbitmqcluster waldur-rabbitmq --type='merge' -p='{"spec":{"replicas":5}}'
```
Important: Always use odd numbers for replicas (1, 3, 5, 7) to avoid split-brain scenarios.
Troubleshooting
Check Cluster Status
```bash
# Check pods
kubectl get pods -l app.kubernetes.io/name=waldur-rabbitmq

# Check cluster status
kubectl exec waldur-rabbitmq-server-0 -- rabbitmq-diagnostics cluster_status

# Check node health
kubectl exec waldur-rabbitmq-server-0 -- rabbitmq-diagnostics check_running
```
View Logs
```bash
# View operator logs
kubectl logs -n rabbitmq-system deployment/rabbitmq-cluster-operator

# View RabbitMQ logs
kubectl logs waldur-rabbitmq-server-0
```
Migration from Bitnami Chart
If migrating from the Bitnami chart:
- Backup existing data using RabbitMQ management tools
- Deploy the operator and create a new cluster
- Export/import virtual hosts, users, and permissions (see the sketch after this list)
- Update Waldur configuration to point to the new cluster
- Test thoroughly before decommissioning the old setup
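A minimal sketch of the export/import step, assuming a definitions file exported from the old broker (see the backup section above) is available locally:

```bash
# Copy the exported definitions into the new cluster and import them
kubectl cp ./definitions.json waldur-rabbitmq-server-0:/tmp/definitions.json
kubectl exec waldur-rabbitmq-server-0 -- rabbitmqctl import_definitions /tmp/definitions.json
```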
Security Considerations
- TLS Configuration: Enable TLS for production (see the secret-creation example after this list):

    ```yaml
    spec:
      tls:
        secretName: waldur-rabbitmq-tls
    ```

- Authentication: Consider integrating with LDAP or other authentication backends
- Network Policies: Implement network policies to restrict access
- RBAC: Ensure appropriate Kubernetes RBAC policies are in place
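Creating the referenced TLS secret is a standard kubectl operation; the certificate and key paths below are placeholders, and a cert-manager Certificate resource targeting the same secret name works equally well:

```bash
kubectl create secret tls waldur-rabbitmq-tls \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key
```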
Performance Tuning
For high-throughput scenarios:
- Adjust memory limits based on message volume
- Configure disk I/O with appropriate storage classes
- Tune RabbitMQ parameters in additionalConfig (see the sketch after this list)
- Monitor resource usage and scale accordingly
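A sketch of what such tuning might look like in the RabbitmqCluster spec; the values are illustrative starting points rather than recommendations, and the keys shown are standard rabbitmq.conf settings:

```yaml
rabbitmq:
  additionalConfig: |
    # Cap per-connection channels to protect the broker from misbehaving clients
    channel_max = 1024
    # Detect dead client connections faster
    heartbeat = 30
    # Collect management statistics less often to reduce overhead under load
    collect_statistics_interval = 10000
```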
Support and Documentation
- Official Documentation: https://www.rabbitmq.com/kubernetes/operator/
- GitHub Repository: https://github.com/rabbitmq/cluster-operator
- Examples: https://github.com/rabbitmq/cluster-operator/tree/main/docs/examples
- Community Support: RabbitMQ Discussions on GitHub