Ask questionsServices cannot be created on a swarm during registry outages, even if the necessary images have already been pulled


I have been creating services on a swarm all day for development work I'm doing and it suddenly stopped working - docker service create would hang and eventually exit without creating the service.

After much digging, netstat revealed connections to which I discovered was down at the time. However, it was surprising to me that the registry was even being contacted since all nodes already have the image in question pulled.

Steps to reproduce the issue:

  1. Ensure a test image is pulled locally to all nodes in a swarm
  2. Offline the registry from which that image is pulled
  3. Attempt to docker service create (or, I'm told, docker service update a running service)

Describe the results you received:

$ docker service create --replicas 1 --name helloworld alpine:3.5 ping
... ~5 minutes elapses...
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded

The docker daemon logs only:

Feb 28 18:38:48 localhost docker[2613]: time="2017-02-28T18:38:48.968662738Z" level=warning msg="unable to pin image alpine:3.5 to digest: manifest unknown: manifest unknown"
Feb 28 18:38:48 localhost docker[2613]: time="2017-02-28T18:38:48.968996137Z" level=error msg="Error creating service helloworld: rpc error: code = 4 desc = context deadline exceeded"
Feb 28 18:38:48 localhost docker[2613]: time="2017-02-28T18:38:48.969256177Z" level=error msg="Handler for POST /v1.25/services/create returned error: rpc error: code = 4 desc = context deadline exceeded"

Describe the results you expected:

At a minimum, more helpful log messages indicating the registry connection failure, so that it'd be readily apparently what the problem was.

However, the requirement to connect to the registry on service creation or update is a fairly big limitation. The services I'm creating are transient and we'll potentially be creating hundreds or thousands per day. Because the images are already pulled, the dependency on an external service just to run containers is less than ideal.

Additional information you deem important (e.g. issue happens only occasionally):

After discussion with @programmerq on IRC I learned that service create and service update contact the registry to pin the digest, so this process will fail whenever the registry for the service's image is down.

However, it can be worked around by creating the service using the digest form of the image specifier e.g.:

$ docker service create --replicas 1 --restart-condition none -e FOO=foo $(docker image inspect --format '{{(index .RepoDigests 0)}}' alpine:3.5)

(btw, docker images --format '{{.Digest}}' alpine:3.5 and docker image ls --format '{{.Digest}}' alpine 3.5' both return<none>, but the documentation (for at leastdocker images`) claims this should return the digest, is this a bug?)

Output of docker version:

 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:36:34 2017
 OS/Arch:      linux/amd64

 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:36:34 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 4
 Running: 0
 Paused: 0
 Stopped: 4
Images: 2
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /data/docker/aufs
 Backing Filesystem: extfs
 Dirs: 56
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: gmyub5nxs7v84npk768zcs5au
 Is Manager: true
 ClusterID: pxno36t9783hgb10ok466k91v
 Managers: 1
 Nodes: 2
  Task History Retention Limit: 5
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address:
 Manager Addresses:
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 494.5 MiB
Name: swarm0
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Experimental: false
Insecure Registries:
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

Both nodes in the swarm are Debian 8 KVM guests running under SmartOS.


Answer questions sebastianfelipe

It still happening on version 19.03.3


Related questions

Swarm restarts all containers hot 2
can not successfully install docker-ce on ubuntu 16.04 ? why ,Can you help me? hot 1
OCI runtime exec failed: exec failed: cannot exec a container that has stopped: unknown hot 1
Allow COPY command's --chown to be dynamically populated via ENV or ARG hot 1
windowsRS1 and windowsRS5-process are failing due to "Unable to delete '\gopath\src\\docker\docker" hot 1
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded hot 1
one container in the overlay network not available hot 1
Containers on overlay network cannot reach other containers hot 1
[Windows] windowsfilter folder impossible to delete hot 1
swarm node lost leader status hot 1
New-SmbGlobalMapping don't continued working in Container hot 1
failed to export image: failed to create image: failed to get layer: layer does not exist hot 1
"docker stack deploy">"rpc error: code = 3 desc = name must be valid as a DNS name component" hot 1
runc regression - EPERM running containers from selinux hot 1
Read only filesystem creating services with secrets hot 1
Github User Rank List