etcd3 and kube-apiserver fail on terraform apply after terraform destroy w/ kops-generated config

1. What kops version are you running? The command kops version will display this information.

Version 1.12.3 (git-e55205471)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.10", GitCommit:"e3c134023df5dea457638b614ee17ef234dc34a6", GitTreeState:"clean", BuildDate:"2019-07-08T03:50:59Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Unable to connect to the server: EOF

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

  • kops edit cluster to configure.
  • kops update cluster --out=./ --target=terraform to generate the Terraform config
  • terraform apply to apply the config.
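
For reference, that sequence spelled out with the state store taken from the configBase in the manifest below (these are the standard kops CLI flags; the exact values may differ slightly from what I actually typed) is roughly:

  # edit the cluster spec, regenerate the Terraform config, and apply it
  kops edit cluster example.net --state s3://kops.example.internal
  kops update cluster example.net --state s3://kops.example.internal --target=terraform --out=./
  terraform init && terraform apply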

5. What happened after the commands executed?

  • The terraform apply completes successfully and all instances are created. However, I cannot connect to the cluster even after 30 minutes and all nodes are marked as OutOfService by the load balancer.
  • Running a kops validate cluster results in: unexpected error during validation: error listing nodes: Get https://api.example.net/api/v1/nodes: EOF
  • kubectl get cluster results in: Unable to connect to the server: EOF
  • If I ssh into the master nodes, I find the following in kube-apiserver.log (see the quick check sketched after this list):
0815 14:23:15.697247       1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 /registry [https://127.0.0.1:4001] /etc/kubernetes/pki/kube-apiserver/etcd-client.key /etc/kubernetes/pki/kube-apiserver/etcd-client.crt /etc/kubernetes/pki/kube-apiserver/etcd-ca.crt true true 1000 0xc420b3ddd0 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:4001: connect: connection refused)
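
The error above is kube-apiserver failing to reach etcd on 127.0.0.1:4001. A minimal way to confirm from a master node that etcd never came up, assuming the default kops ports (4001/2380 for the main cluster, 4002/2381 for events) and that etcd-manager runs as a Docker container on this image:

  # is anything listening on the etcd client/peer ports?
  sudo ss -ltnp | grep -E ':4001|:4002|:2380|:2381'
  # did the etcd-manager containers start at all?
  sudo docker ps | grep etcd
  # is the etcd data volume mounted, and what does etcd-manager log?
  mount | grep -i etcd
  sudo tail -n 50 /var/log/etcd.log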

6. What did you expect to happen?

I expected the cluster to be up and ready to go, and to be able to connect to it.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2019-07-31T15:59:08Z
  name: example.net
spec:
  additionalPolicies:
    node: |
      [   
        {
          "Effect": "Allow",
          "Action": [
            "sts:AssumeRole"
          ],
          "Resource": [
            "arn:aws:iam:::role/k8s-*"
          ]
        }
      ]   
  api:
    loadBalancer:
      sslCertificate: arn:aws:acm:us-east-1:secretsecretsecret
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws 
  configBase: s3://kops.example.internal/example.net
  dnsZone: example.net
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1d
      name: d
    - instanceGroup: master-us-east-1e
      name: e
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1d
      name: d
    - instanceGroup: master-us-east-1e
      name: e
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
  - office_cidr
  - another_cidr
  kubernetesVersion: 1.12.10
  masterInternalName: api.internal.example.net
  masterPublicName: api.example.net
  networkCIDR: another_cidr
  networkID: vpc-abcdefg
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: another_cidr
  sshAccess:
  - another_cidr
  - another_cidr
  subnets:
  - cidr: another_cidr
    egress: nat-0e0e0e0e0e0e0
    id: subnet-abcdefg
    name: us-east-1b
    type: Private
    zone: us-east-1b
  - cidr: another_cidr
    egress: nat-0e0e0e0e0e0e0
    id: subnet-abcdefg
    name: us-east-1d
    type: Private
    zone: us-east-1d
  - cidr: another_cidr
    egress: nat-0e0e0e0e0e0e0
    id: subnet-abcdefg
    name: us-east-1e
    type: Private
    zone: us-east-1e
  - cidr: another_cidr
    id: subnet-abcdefg
    name: utility-us-east-1b
    type: Utility
    zone: us-east-1b
  - cidr: another_cidr
    id: subnet-abcdefg
    name: utility-us-east-1d
    type: Utility
    zone: us-east-1d
  - cidr: another_cidr
    id: subnet-abcdefg
    name: utility-us-east-1e
    type: Utility
    zone: us-east-1e
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-08-02T02:43:49Z
  labels:
    kops.k8s.io/cluster: example.net
  name: master-us-east-1b
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1b
  role: Master
  subnets:
  - us-east-1b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-08-02T02:43:49Z
  labels:
    kops.k8s.io/cluster: example.net
  name: master-us-east-1d
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1d
  role: Master
  subnets:
  - us-east-1d

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-08-02T02:43:50Z
  labels:
    kops.k8s.io/cluster: example.net
  name: master-us-east-1e
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1e
  role: Master
  subnets:
  - us-east-1e

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-08-02T02:43:50Z
  labels:
    kops.k8s.io/cluster: example.net
  name: nodes
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21
  machineType: m4.xlarge
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-1b
  - us-east-1d
  - us-east-1e


8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

I don't think the verbose logs from the kops or terraform commands are relevant here; the more relevant logs are the ones posted below.

9. Anything else do we need to know?

  • All of this works perfectly with version 1.11 of k8s and kops. We're moving forward with development on that version for now, but we are concerned about the future.
  • If I ssh into the master nodes, the last message in kube-apiserver.log is:
0815 14:23:15.697247       1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 /registry [https://127.0.0.1:4001] /etc/kubernetes/pki/kube-apiserver/etcd-client.key /etc/kubernetes/pki/kube-apiserver/etcd-client.crt /etc/kubernetes/pki/kube-apiserver/etcd-ca.crt true true 1000 0xc420b3ddd0 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:4001: connect: connection refused)
  • All three masters show this same message.
  • Although it looks like the volume eventually attaches at a different device, an error that might be relevant in /var/log/etcd.log is the following (a way to verify the attachment is sketched after this list):
W0815 13:46:56.016362    3495 mounter.go:293] Error attaching volume "vol-0e7dd951edc73df76": Error attaching EBS volume "vol-0e7dd951edc73df76": InvalidParameterValue: Invalid value '/dev/xvdu' for unixDevice. Attachment point /dev/xvdu is already in use
        status code: 400, request id: 1cbf8ec1-49e1-4be0-8f20-487d79228007
  • The cluster is running etcd version 3.2.24.
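
The volume ID and device name below are taken from the log line above; checking their attachment state with the standard AWS CLI shows whether something else is still holding /dev/xvdu (the instance ID placeholder is hypothetical):

  # who is this volume attached to, and at which device?
  aws ec2 describe-volumes --volume-ids vol-0e7dd951edc73df76 \
    --query 'Volumes[].Attachments[].[InstanceId,Device,State]' --output table
  # from the instance side: which volume currently occupies /dev/xvdu?
  aws ec2 describe-instances --instance-ids <master-instance-id> \
    --query 'Reservations[].Instances[].BlockDeviceMappings[].[DeviceName,Ebs.VolumeId]' --output table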
Answer from mccare (kubernetes/kops)

Your error message suggests that etcd-manager cannot mount the persistent EBS volume where the etcd data lives (the attachment point is 'already in use'). If you set up the cluster from scratch, make sure there are no EBS volumes left over from your previous cluster (e.g. named a.etcd-main.clustername or tagged with your cluster name), and no duplicate EBS volumes with the same name.
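
A quick way to spot leftover or duplicate etcd volumes, assuming they still carry the default kops KubernetesCluster tag (tagging details can vary between kops releases):

  # list every EBS volume tagged for this cluster, with its state and Name tag
  aws ec2 describe-volumes \
    --filters Name=tag:KubernetesCluster,Values=example.net \
    --query 'Volumes[].[VolumeId,State,Tags[?Key==`Name`].Value|[0]]' --output table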

In Terraform you should see two EBS volumes created per master (e.g. aws_ebs_volume.a-etcd-events-...). I had trouble in the same area, with the EBS volumes created by Terraform, and upgrading to kops 1.13 helped me.
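
To confirm those volumes actually exist in the Terraform state and to get their real volume IDs, terraform state list/show work; the resource name in the second command is only an illustration, since the names kops generates depend on the cluster name:

  terraform state list | grep aws_ebs_volume
  # inspect one of them to see its volume ID and tags
  terraform state show 'aws_ebs_volume.b-etcd-main-example-net'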
