
DNS resolution has high error rate with freshly created cluster

In our production environment we are experiencing many DNS timeouts.

I have created a debugging application which tries to resolve a DNS name repeatedly and waits x ms between requests: https://github.com/thomaschaaf/dns-tester
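The actual tester is in the linked repository; purely as an illustration, a minimal Node.js sketch of that kind of loop could look like the following (the hostname, delay, and timeout values are placeholders rather than the real tool's settings, and the timeout here is a plain Promise.race instead of the await-timeout package seen in the stack traces below):

// Minimal sketch of a DNS stress loop (illustrative, not the real dns-tester).
const dns = require('dns').promises;

const HOSTNAME = 'kubernetes.default.svc.cluster.local'; // placeholder target
const DELAY_MS = 25;                                     // the "x ms" pause between requests
const TIMEOUT_MS = 2000;                                 // lookups slower than this count as failed

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolve the hostname, but reject with "DNS Timeout!" if it takes too long.
async function lookupWithTimeout(hostname, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('DNS Timeout!')), ms);
  });
  try {
    return await Promise.race([dns.resolve4(hostname), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

(async () => {
  const start = Date.now();
  let failures = 0;
  for (let i = 1; ; i++) {
    try {
      await lookupWithTimeout(HOSTNAME, TIMEOUT_MS);
    } catch (err) {
      failures += 1;
      console.error(`Lookup ${i} failed`, err);
    }
    // Print a summary every 100 lookups: total, failures, elapsed time, rate.
    if (i % 100 === 0) {
      const elapsed = (Date.now() - start) / 1000;
      console.log(i, failures, `${Math.round(elapsed)}s`, `${Math.round(i / elapsed)}lookups / s`);
    }
    await sleep(DELAY_MS);
  }
})();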

We had these problems with Weave and then switched our cluster to Calico, but that did not resolve them either.

A typical output looks like this:

9500 1 '244s' '39lookups / s'
Lookup 9535 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
Lookup 9537 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
Lookup 9544 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
Lookup 9545 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
Lookup 9546 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
Lookup 9550 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
Lookup 9551 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
9600 8 '249s' '39lookups / s'
9700 8 '252s' '38lookups / s'
9800 8 '254s' '39lookups / s'
9900 8 '257s' '39lookups / s'
10000 8 '260s' '38lookups / s'
10100 8 '262s' '39lookups / s'
Lookup 10112 failed Error: DNS Timeout!
    at Timeout.fn [as _onTimeout] (/root/service/node_modules/await-timeout/dist/index.umd.js:112:25)
    at ontimeout (timers.js:427:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5)
    at Timer.processTimers (timers.js:212:10)
10200 9 '265s' '38lookups / s'
10300 9 '268s' '38lookups / s'

Basically DNS works fine most of the time. Each summary line shows the total number of lookups, the failure count so far, the elapsed time, and the lookup rate; in this case there is 1 error after 244 seconds of the program running, and then over the next ~20 seconds we see 8 more DNS timeouts.

A ping between two pods on our cluster shows no packet loss. To rule out our existing setup, we created a fresh cluster and are seeing the same problems there as well.

We're using kops 1.9.1 with Kubernetes 1.9.8 and the kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11 image.

kubernetes/kops

Answer from alex88:

Is it possible to reopen this? There's still no solution for this.

