
Helm: rabbitmq-ha deployment fails to discover k8s nodes

What happened: When deploying the stable/rabbitmq-ha chart via Helm, the pod fails during startup with the following error:

2019-09-11 09:21:15.580 [info] <0.226.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@rabbit-rabbitmq-ha-0.rabbit-rabbitmq-ha-discovery.default.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2019-09-11 09:21:15.580 [info] <0.226.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2019-09-11 09:21:15.581 [info] <0.226.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2019-09-11 09:21:15.581 [info] <0.226.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-09-11 09:21:15.581 [info] <0.226.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-09-11 09:21:15.632 [info] <0.226.0> Failed to get nodes from k8s - 404
2019-09-11 09:21:15.633 [error] <0.225.0> CRASH REPORT Process <0.225.0> with 0 neighbours exited with reason: no case clause matching {error,"404"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 138
2019-09-11 09:21:15.633 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"404"} in rabbit_mnesia:init_from_config/0 line 164
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"404\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,910}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"404"}},[{rabbit_mnesia,init_from_config,0,[{file

It seems like the k8s node discovery is not working for some reason.
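As far as I can tell, the k8s peer discovery backend looks up the endpoints object of the discovery service on the API server, so a first sanity check is whether that object exists at all (service name and namespace below are taken from the node name in the log above):

# check that the discovery service and its endpoints object exist
kubectl get svc rabbit-rabbitmq-ha-discovery -n default
kubectl get endpoints rabbit-rabbitmq-ha-discovery -n default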

What you expected to happen: RabbitMQ deployment started up correctly.

How to reproduce it (as minimally and precisely as possible):

kind create cluster
export KUBECONFIG="$(kind get kubeconfig-path)"
kubectl apply -f rbac-config.yaml
helm init --upgrade --wait --service-account tiller
helm install stable/rabbitmq-ha --name rabbit
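A couple of sanity checks between the steps above can help narrow this down (a sketch; resource names assume the release name rabbit and the default namespace):

# wait for Tiller to be ready before installing the chart
kubectl -n kube-system rollout status deployment/tiller-deploy
# after the install, confirm the chart created its service account and discovery service
kubectl get serviceaccount,svc -n default | grep rabbitmq-ha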

rbac-config.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
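Note that the RBAC above only applies to Tiller. The RabbitMQ pods use their own service account to query the Kubernetes API for peer discovery, and the chart should create a Role/RoleBinding for it when rbac.create is enabled. One way to tell an RBAC problem (403) apart from a missing resource (404) is to ask the API server what that service account is allowed to do; the service account name below is a guess based on the release name, so verify it first:

# list service accounts to find the one created for the RabbitMQ pods (name below is assumed)
kubectl get serviceaccount -n default
# check whether it may read endpoints; an RBAC problem would show up as "no" here
kubectl auth can-i get endpoints -n default --as=system:serviceaccount:default:rabbit-rabbitmq-ha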

Anything else we need to know?:

Environment:

  • kind version: (use kind version): v0.5.1
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-20T18:57:36Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 82
 Server Version: 19.03.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.2.11-200.fc30.x86_64
 Operating System: Fedora 30 (Thirty)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.56GiB
 ID: XSSP:U7WK:W2J4:LPU5:5CEV:7HQF:DEBG:WML2:MKZY:XC33:GOBX:SYDR
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
  • OS (e.g. from /etc/os-release):
NAME=Fedora
VERSION="30 (Thirty)"
ID=fedora
VERSION_ID=30
VERSION_CODENAME=""
PLATFORM_ID="platform:f30"
PRETTY_NAME="Fedora 30 (Thirty)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:30"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f30/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=30
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=30
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"

Answer from fhaifler:

@BenTheElder I see. It seems, though, that they get a different error status code (403 rather than 404), but it could still be related. Unfortunately, for me this problem is not temporary and persists no matter how long I leave the kind cluster running.
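To see the raw status code directly, one option (assuming curl is available in the image and the container stays up long enough to exec into, which may not be the case) is to issue the same request the discovery backend appears to make (a GET on /api/v1/namespaces/<namespace>/endpoints/<service>) from inside the pod using its mounted service account token:

# reproduce the discovery request from inside the pod and print only the HTTP status code
kubectl exec rabbit-rabbitmq-ha-0 -- sh -c \
  'curl -sk -o /dev/null -w "%{http_code}\n" \
   -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
   https://kubernetes.default.svc:443/api/v1/namespaces/default/endpoints/rabbit-rabbitmq-ha-discovery'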

@aojea Thank you for trying it out. Maybe it's some weirdness specific to my system; I couldn't identify the culprit, though.

The exported kind logs are below: kind-export.zip
