
Rolling-update fails due to calico-node with 1.12.0-beta.2

1. What kops version are you running? Version 1.12.0-beta.2 (git-d1453d22a)

2. What Kubernetes version are you running? 1.12.7

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

  • Create a brand new cluster
  • Change any configuration
  • Run a rolling-update:
kops rolling-update cluster --yes
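
For completeness, a minimal reproduction sequence (assuming kubectl access and KOPS_STATE_STORE pointing at the cluster's state store; the actual spec change doesn't seem to matter):

kops edit cluster                  # make any change to the cluster spec
kops update cluster --yes          # apply the changed spec
kops rolling-update cluster --yes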

5. What happened after the commands executed? Rolling update fails at the first node (master instance), due to calico-node not becoming ready.

6. What did you expect to happen? The rolling-update to complete without errors.

7. Please provide your cluster manifest.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  name: ************
spec:
  additionalPolicies:
    master: "[\n      {\n        \"Effect\": \"Allow\",\n        \"Action\": [ \"sts:AssumeRole\"
      ],\n        \"Resource\": [\"*\"]\n      },\n     {\n         \"Effect\": \"Allow\",\n
      \        \"Action\": [\n             \"ec2:DescribeInstanceStatus\"\n         ],\n
      \        \"Resource\": \"*\"\n     }\n  ]\n  \n"
    node: "[\n      {\n        \"Effect\": \"Allow\",\n        \"Action\": [ \"sts:AssumeRole\"
      ],\n        \"Resource\": [\"*\"]\n      },\n     {\n         \"Effect\": \"Allow\",\n
      \        \"Action\": [\n             \"ec2:DescribeInstanceStatus\"\n         ],\n
      \        \"Resource\": \"*\"\n     }\n  ]\n  \n"
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://************************
  dnsZone: ************
  etcdClusters:
  - cpuRequest: 200m
    enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-us-east-1a-1
      name: a-1
    - instanceGroup: master-us-east-1c-1
      name: c-1
    - instanceGroup: master-us-east-1b-1
      name: b-1
    memoryRequest: 100Mi
    name: main
    version: 3.2.24
  - cpuRequest: 100m
    enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-us-east-1a-1
      name: a-1
    - instanceGroup: master-us-east-1c-1
      name: c-1
    - instanceGroup: master-us-east-1b-1
      name: b-1
    memoryRequest: 100Mi
    name: events
    version: 3.2.24
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    admissionControl:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - PersistentVolumeLabel
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - ResourceQuota
    - NodeRestriction
    - Priority
    oidcClientID: kubernetes
    oidcGroupsClaim: groups
    oidcIssuerURL: https://dex.************
    oidcUsernameClaim: email
  kubeDNS:
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    imageGCHighThresholdPercent: 75
    imageGCLowThresholdPercent: 60
    kubeletCgroups: /systemd/system.slice
    runtimeCgroups: /systemd/system.slice
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.12.7
  masterKubelet:
    kubeletCgroups: /systemd/system.slice
    runtimeCgroups: /systemd/system.slice
  masterPublicName: api.************
  networkCIDR: 10.21.0.0/16
  networkID: vpc-xxxxxxxxxxxxxxxxx
  networking:
    calico:
      majorVersion: v3
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.x.x.x/21
    id: subnet-xxxxxxxxxxxxxxxxx
    name: node-us-east-1a
    type: Private
    zone: us-east-1a
  - cidr: 10.x.x.x/21
    id: subnet-xxxxxxxxxxxxxxxxx
    name: node-us-east-1c
    type: Private
    zone: us-east-1c
  - cidr: 10.x.x.x/21
    id: subnet-xxxxxxxxxxxxxxxxx
    name: node-us-east-1b
    type: Private
    zone: us-east-1b
  - cidr: 10.x.x.x/23
    id: subnet-xxxxxxxxxxxxxxxxx
    name: utility-us-east-1a
    type: Utility
    zone: us-east-1a
  - cidr: 10.x.x.x/23
    id: subnet-xxxxxxxxxxxxxxxxx
    name: utility-us-east-1c
    type: Utility
    zone: us-east-1c
  - cidr: 10.x.x.0/23
    id: subnet-xxxxxxxxxxxxxxxxx
    name: utility-us-east-1b
    type: Utility
    zone: us-east-1b
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

The issue at a glance:

  • Fresh setup with 1.12 branch, no problems.
  • After updating the cluster spec with authentication parameters (kubeAPIServer.admissionControl), the rolling-update fails right at the first master being updated.
[...]
I0416 15:51:31.284819   29901 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "15m0s" expires: kube-system pod "calico-node-5w2lp" is not ready (calico-node).
I0416 15:52:00.575052   29901 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "15m0s" expires: kube-system pod "calico-node-5w2lp" is not ready (calico-node).
I0416 15:52:30.055031   29901 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "15m0s" expires: kube-system pod "calico-node-5w2lp" is not ready (calico-node).
E0416 15:52:58.178054   29901 instancegroups.go:214] Cluster did not validate within 15m0s

master not healthy after update, stopping rolling-update: "error validating cluster after removing a node: cluster did not validate within a duration of \"15m0s\""
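
For reference, the same validation can be re-run by hand while the rolling-update is stuck, which makes it easier to watch the pod during the timeout window (assuming the default k8s-app=calico-node label used by the kops Calico manifest):

kops validate cluster
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide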

The new master joins the cluster, but calico-node never gets ready. The readiness check says:

Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 10.x.x.x,10.x.x.x,10.x.x.x

That said, the log doesn't show any ERROR messages, only INFO:

2019-04-16 19:00:42.229 [INFO][42] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
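
A few standard checks help narrow down why BIRD reports no established BGP sessions (the pod name is the one from the validation log above; running calicoctl on the master host is an assumption, it isn't installed by kops):

kubectl -n kube-system describe pod calico-node-5w2lp          # readiness probe failures and events
kubectl -n kube-system logs calico-node-5w2lp -c calico-node
sudo calicoctl node status                                     # on the affected master: BGP peer state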

Deleting the pod and letting it be recreated seems to solve the problem.
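
Concretely, the workaround amounts to something like this; the DaemonSet recreates the pod on its own:

kubectl -n kube-system delete pod calico-node-5w2lp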

Any ideas?


Answer from edsonmarquezani:

Someone on Calico's GitHub suggested switching to Calico v3.6 to solve this. Would that be possible?
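
For context, the calico/node image version is pinned by the kops-managed addon manifest rather than by the cluster spec (spec.networking.calico.majorVersion only selects the v3 manifest). Assuming the DaemonSet is named calico-node, as in the default kops manifest, the currently deployed version can be checked with:

kubectl -n kube-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[*].image}'

Moving to v3.6 would then presumably require kops itself to ship the newer manifest, or a manual patch of the DaemonSet image that a later kops update might revert.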

