create HA cluster is flaky

What happened: I started seeing odd failures in the kind-master and kind-1.14 kubeadm jobs: https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kind-master https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kind-1.14

The failures appeared after switching to this HA config:

# a cluster with 3 control-planes and 3 workers
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker

Joining the additional control-plane nodes then fails intermittently; the relevant output with --loglevel=debug:

I0604 19:15:09.075770     760 join.go:480] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0604 19:15:10.310249     760 round_trippers.go:438] GET https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 401 Unauthorized in 1233 milliseconds
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized 
 ✗ Joining more control-plane nodes 🎮
DEBU[22:15:10] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.kind.cluster"}} --filter label=io.k8s.sigs.kind.cluster=kind] 
$KUBECONFIG is still set to use /home/lubo-it/.kube/kind-config-kind even though that file has been deleted, remember to unset it
DEBU[22:15:10] Running: /usr/bin/docker [docker rm -f -v kind-control-plane2 kind-control-plane kind-control-plane3 kind-worker kind-worker3 kind-worker2 kind-external-load-balancer] 
⠈⠁ Joining more control-plane nodes 🎮 Error: failed to create cluster: failed to join a control plane node with kubeadm: exit status 1
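
The 401 suggests the bootstrap credentials used by the joining node are being rejected. A rough diagnostic sketch for when the failed nodes are kept around (e.g. created with kind create cluster --retain), using the default node name from the logs above:

# check whether the bootstrap token the joining node relies on is still valid
docker exec kind-control-plane kubeadm token list
# fetch the kubeadm-config ConfigMap directly from the first control plane
docker exec kind-control-plane kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system get cm kubeadm-config -o yaml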

What you expected to happen: No errors; the cluster should be created successfully.

How to reproduce it (as minimally and precisely as possible):

cd kind-src-path
GO111MODULE=on go build
# install the kind binary to PATH
cd kubernetes-src-path
kind build node-image --kube-root=$(pwd)
kind create cluster --config=<path-to-above-ha-config> --image kindest/node:latest
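
Since the failure is intermittent, looping create/delete usually reproduces it within a few iterations; a sketch (ha-config.yaml is just a placeholder name for the config above):

for i in $(seq 1 10); do
  kind create cluster --loglevel=debug --config=ha-config.yaml --image kindest/node:latest || break
  kind delete cluster
done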

Anything else we need to know?:

  • I cannot reproduce the bug without --loglevel=debug.
  • Sometimes it fails while joining the extra control-plane nodes, sometimes while joining the workers (the sketch below shows how I collect logs when it does).
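
When it does fail, retaining the nodes and exporting their logs narrows down which join step broke; a sketch using the same placeholder config name:

# keep the node containers around even if create fails
kind create cluster --retain --loglevel=debug --config=ha-config.yaml --image kindest/node:latest
# collect kubelet/kubeadm/container logs from all retained nodes
kind export logs ./kind-logs
kind delete cluster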

Environment:

  • kind version: (use kind version): master at 43bf0e2594db
  • Kubernetes version: master at 1409ff38e5828f55
  • Docker version: (use docker info):
Containers: 10
 Running: 7
 Paused: 0
 Stopped: 3
Images: 128
Server Version: 18.06.3-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-41-generic
Operating System: Ubuntu 17.10
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.66GiB
Name: luboitvbox
ID: K2H6:2I6N:FSBZ:S77V:R5CQ:X22B:VYTF:WZ4R:UIKC:HGOT:UCHD:GCR2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="17.10 (Artful Aardvark)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.10"
VERSION_ID="17.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=artful
UBUNTU_CODENAME=artful

/kind bug
/priority important-soon (?)

Answer from BenTheElder:

$ docker logs kind-external-load-balancer 
[WARNING] 174/214223 (1) : config : missing timeouts for frontend 'controlPlane'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 174/214223 (1) : New worker #1 (6) forked
[WARNING] 174/214225 (1) : Reexecuting Master process
[ALERT] 174/214225 (1) : sendmsg()/writev() failed in logger #1: No such file or directory (errno=2)
[WARNING] 174/214225 (6) : Stopping frontend controlPlane in 0 ms.
[WARNING] 174/214225 (6) : Stopping backend kube-apiservers in 0 ms.
[WARNING] 174/214225 (6) : Stopping frontend GLOBAL in 0 ms.
[WARNING] 174/214225 (6) : Proxy controlPlane stopped (FE: 0 conns, BE: 0 conns).
[NOTICE] 174/214225 (1) : New worker #1 (22) forked
[WARNING] 174/214225 (6) : Proxy kube-apiservers stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 174/214225 (6) : Proxy GLOBAL stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 174/214225 (22) : Server kube-apiservers/kind-control-plane is DOWN, reason: Layer4 connection problem, info: "SSL handshake failure (Connection refused)", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 174/214225 (1) : Former worker #1 (6) exited with code 0 (Exit)
[WARNING] 174/214226 (22) : Server kube-apiservers/kind-control-plane2 is DOWN, reason: Layer4 connection problem, info: "SSL handshake failure (Connection refused)", check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 174/214226 (22) : Server kube-apiservers/kind-control-plane3 is DOWN, reason: Layer4 connection problem, info: "SSL handshake failure (Connection refused)", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 174/214226 (22) : backend 'kube-apiservers' has no server available!
[WARNING] 174/214250 (22) : Server kube-apiservers/kind-control-plane is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 174/214336 (22) : Server kube-apiservers/kind-control-plane is DOWN, reason: Layer7 timeout, check duration: 2001ms. 0 active and 0 backup servers left. 19 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 174/214336 (22) : backend 'kube-apiservers' has no server available!
[WARNING] 174/214341 (22) : Server kube-apiservers/kind-control-plane is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 3ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 174/214344 (22) : Server kube-apiservers/kind-control-plane2 is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 2ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 174/214413 (22) : Server kube-apiservers/kind-control-plane3 is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 4ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
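
The missing-timeouts warning and the backends flapping between DOWN and UP point at the generated load-balancer config; dumping it from the container shows what kind actually wrote (config path assumed from the upstream haproxy image layout):

# inspect the haproxy config kind generated for the external load balancer
docker exec kind-external-load-balancer cat /usr/local/etc/haproxy/haproxy.cfg

Per the warning itself, the config should end up with non-zero 'client', 'connect', and 'server' timeouts set.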