Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Description

I just want to change the roles of the nodes in an existing swarm, like this:

worker2 -> promote to manager
manager1 -> demote to worker

This is for a planned maintenance that includes an IP change on manager1, which should proceed like this:

manager1 -> demote to worker -> drain mode -> leave swarm -> change ip -> join swarm -> promote to manager 
worker2 -> demote to worker again
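
The sequence above can be written out as a per-host checklist. This is only a sketch under assumptions: the helper names `step` and `print_plan` are made up for illustration, `<token>` and `<worker2-address>` are placeholders, and the `docker node rm` step (removing the stale entry for manager1 after it has left) is an addition not mentioned in the report. The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Prints the planned maintenance steps, one per line, in the same
# "host:~# command" style used in this report. Nothing is executed.

step() {
    # $1 = host the command is meant to run on, remaining args = the command
    host="$1"; shift
    printf '%s:~# %s\n' "$host" "$*"
}

print_plan() {
    step manager1 docker node promote worker2
    step worker2  docker node demote manager1
    step worker2  docker node update --availability drain manager1
    step manager1 docker swarm leave
    # Assumption: remove the old, now-down entry so the name stays unambiguous.
    step worker2  docker node rm manager1
    # ... change the IP address of manager1 here ...
    step worker2  docker swarm join-token worker
    step manager1 docker swarm join --token '<token>' '<worker2-address>:2377'
    step worker2  docker node promote manager1
    step manager1 docker node demote worker2
}
```

Calling `print_plan` gives a list that can be worked through by hand, checking `docker node ls` between steps.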

Steps to reproduce the issue:

manager1:~# docker node promote worker2
Node worker2 promoted to a manager in the swarm.
worker2:~# docker node ls                                                                           
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS            
mzqms0uiq2f6t9lqvhghiuqmg     manager1            Ready               Active              Leader                    
vp4dbt8xefe14rqzej5gpdi2u     worker1             Ready               Active                                        
20vbax32k3rc5dla7p86kfgku *   worker2             Ready               Active              Reachable   
worker2:~# docker node demote manager1 # or just
worker2:~# docker node update --availability drain manager1

Describe the results you received: Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Describe the results you expected: Manager manager1 demoted in the swarm.

Additional information you deem important (e.g. issue happens only occasionally): Swarm has been running for half a year.

Output of docker version:

# docker version
Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:17:40 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:16:13 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

# docker info
Containers: 22
 Running: 0
 Paused: 0
 Stopped: 22
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: mzqms0uiq2f6t9lqvhghiuqmg
 Is Manager: false
 Node Address: 10.47.0.2
 Manager Addresses:
  10.47.0.4:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-36-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 15.67GiB
Name: manager1
ID: HFFB:LBVB:4TSL:DRVP:JXMR:WZXI:QEDA:N3WP:Z7QL:WAPG:OPVZ:BZLQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Ubuntu-Machines running on VMWare.


manager1:~# cat /etc/issue
Ubuntu 16.04.4 LTS \n \l

manager1:~# uname -a
Linux manager1 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
moby/moby

Answer from leojonathanoh

Just wanted to say that since I've been running Swarm on 18.06.1-ce, I've been seeing this error every day, all the time. I never understood what it meant, but my general feeling is that it's related to network issues (e.g. swarm ports not reachable, node is offline).

One thing I do know for certain is that this error happens when one of the Swarm nodes is on a Wi-Fi connection on a Type-1 hypervisor (namely Hyper-V): any change of state of the Swarm (e.g. docker stack deploy, docker node rm) then causes the entire Swarm to 'hang' for about 1 minute. See here for an explanation. My issue might be related.

Just take a look at my logs from yesterday, when I was running docker stack rm and docker stack deploy (not even docker node rm or docker swarm join):

$ cat /var/log/syslog | grep 'context deadline exceeded'
Apr 30 00:57:08 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:08.469936110+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:16 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:16.470048208+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:24 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:24.470005513+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:32 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:32.469811627+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:39 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:39.470093866+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:48 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:48.470102369+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:57:57 dockermanager1 dockerd[1666]: time="2019-04-30T00:57:57.470082781+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:05 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:05.469967521+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:15 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:15.470036328+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:22 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:22.469912401+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:29 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:29.469905377+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:37 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:37.470047939+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.469931679+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.470549865+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:02 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:02.470009052+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:09 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:09.469991053+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:17 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:17.470073345+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:59:24 dockermanager1 dockerd[1666]: time="2019-04-30T00:59:24.470134555+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

And here are more details from the last minute:

$ cat /var/log/syslog | grep 'Apr 30 00:59'
Apr 30 00:58:38 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:38.238909507+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.0.211:2377: connect: no route to host\""
Apr 30 00:58:38 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:38.469913224+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:39 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:39.469913607+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:40 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:40.469880591+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:41 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:41.469798076+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:42 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:42.469865259+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:43 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:43.469886743+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:44 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:44.469770628+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:44 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:44.523885240+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, CONNECTING" module=grpc
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.469931679+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.582513564+08:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {10.0.0.1:2377 0  <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\". Reconnecting..." module=grpc
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.582759561+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, TRANSIENT_FAILURE" module=grpc
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.582911860+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:47 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:47.583184257+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:48 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:48.469930163+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:49 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:49.469957047+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:50 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:50.469823733+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:51 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:51.469861617+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:51 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:51.587378557+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, CONNECTING" module=grpc
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.470549865+08:00" level=error msg="error sending message to peer" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.653933912+08:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {10.0.0.1:2377 0  <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\". Reconnecting..." module=grpc
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.654009711+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42390e150, TRANSIENT_FAILURE" module=grpc
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.654046711+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:54 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:54.654083010+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:55 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:55.469919456+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:56 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:56.469938640+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:57 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:57.469761127+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:58 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:58.469787412+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
Apr 30 00:58:59 dockermanager1 dockerd[1666]: time="2019-04-30T00:58:59.469860997+08:00" level=error msg="error sending message to peer" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.1:2377: connect: no route to host\""
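
To see at a glance which peers the daemon could not reach, log lines like the ones above can be tallied per address. A minimal sketch, assuming the log format shown here; the function name `summarize_peer_errors` is made up for illustration, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Reads dockerd log lines on stdin and counts, per peer address,
# how many "error sending message to peer" lines report a failed dial.
summarize_peer_errors() {
    grep 'error sending message to peer' \
        | grep -o 'dial tcp [0-9.]*:[0-9]*' \
        | sort | uniq -c | sort -rn
}
```

For example: grep dockerd /var/log/syslog | summarize_peer_errors. Lines that only say "context deadline exceeded" carry no address and are therefore not counted.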

10.0.0.1 here is an actual manager node that was offline on a 3 node Swarm. All 3 nodes were managers.
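
As background, and not as a diagnosis of either case: swarm managers use Raft, which needs a majority of managers to be reachable before it can commit a change. The arithmetic can be sketched as follows; the function names `quorum` and `tolerated_failures` are made up for illustration.

```shell
#!/bin/sh
# Majority arithmetic for a swarm with N managers (N passed as $1).

quorum() {
    # Smallest majority of N managers: floor(N/2) + 1
    echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
    # Managers that can be lost while a majority remains: floor((N-1)/2)
    echo $(( ($1 - 1) / 2 ))
}
```

With three managers the majority is two, so one offline manager is tolerated. With two managers (for instance manager1 plus a freshly promoted worker2) the majority is also two, so both must be reachable for a role change to go through.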

Otherwise my Swarm runs well. That is, until the next time I do a docker stack rm; then the frustration returns.
