profile
viewpoint

Ask questionsServices cannot be created on a swarm during registry outages, even if the necessary images have already been pulled

Description

I have been creating services on a swarm all day for development work I'm doing and it suddenly stopped working - docker service create would hang and eventually exit without creating the service.

After much digging, netstat revealed connections to registry-1.docker.io which I discovered was down at the time. However, it was surprising to me that the registry was even being contacted since all nodes already have the image in question pulled.

Steps to reproduce the issue:

  1. Ensure a test image is pulled locally to all nodes in a swarm
  2. Offline the registry from which that image is pulled
  3. Attempt to docker service create (or, I'm told, docker service update a running service)

Describe the results you received:

$ docker service create --replicas 1 --name helloworld alpine:3.5 ping docker.com
... ~5 minutes elapses...
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded

The docker daemon logs only:

Feb 28 18:38:48 localhost docker[2613]: time="2017-02-28T18:38:48.968662738Z" level=warning msg="unable to pin image alpine:3.5 to digest: manifest unknown: manifest unknown"
Feb 28 18:38:48 localhost docker[2613]: time="2017-02-28T18:38:48.968996137Z" level=error msg="Error creating service helloworld: rpc error: code = 4 desc = context deadline exceeded"
Feb 28 18:38:48 localhost docker[2613]: time="2017-02-28T18:38:48.969256177Z" level=error msg="Handler for POST /v1.25/services/create returned error: rpc error: code = 4 desc = context deadline exceeded"

Describe the results you expected:

At a minimum, more helpful log messages indicating the registry connection failure, so that it'd be readily apparently what the problem was.

However, the requirement to connect to the registry on service creation or update is a fairly big limitation. The services I'm creating are transient and we'll potentially be creating hundreds or thousands per day. Because the images are already pulled, the dependency on an external service just to run containers is less than ideal.

Additional information you deem important (e.g. issue happens only occasionally):

After discussion with @programmerq on IRC I learned that service create and service update contact the registry to pin the digest, so this process will fail whenever the registry for the service's image is down.

However, it can be worked around by creating the service using the digest form of the image specifier e.g.:

$ docker service create --replicas 1 --restart-condition none -e FOO=foo $(docker image inspect --format '{{(index .RepoDigests 0)}}' alpine:3.5)

(btw, docker images --format '{{.Digest}}' alpine:3.5 and docker image ls --format '{{.Digest}}' alpine 3.5' both return<none>, but the documentation (for at leastdocker images`) claims this should return the digest, is this a bug?)

Output of docker version:

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:36:34 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:36:34 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 4
 Running: 0
 Paused: 0
 Stopped: 4
Images: 2
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /data/docker/aufs
 Backing Filesystem: extfs
 Dirs: 56
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: gmyub5nxs7v84npk768zcs5au
 Is Manager: true
 ClusterID: pxno36t9783hgb10ok466k91v
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 172.18.2.200
 Manager Addresses:
  172.18.2.200:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 494.5 MiB
Name: swarm0
ID: UBFQ:KAEY:LQPZ:Q3AJ:A6LW:LSR5:ZQOR:WZFY:RFCT:BPIE:2GBP:UXOC
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

Both nodes in the swarm are Debian 8 KVM guests running under SmartOS.

moby/moby

Answer questions sebastianfelipe

It still happening on version 19.03.3

Related questions

start container failed with "failed to umount /var/lib/docker/containers/.../shm: no such file or directory" hot 47
upgrade docker-18.09.2-ce , shim.sock: bind: address already in use: unknown hot 24
start container failed with "failed to umount /var/lib/docker/containers/.../shm: no such file or directory" hot 17
runc regression - EPERM running containers from selinux hot 11
Windows Server 2019 publish ports in swarm not working hot 11
"docker stack deploy">"rpc error: code = 3 desc = name must be valid as a DNS name component" hot 9
OCI runtime exec failed: exec failed: cannot exec a container that has stopped: unknown hot 8
Swarm restarts all containers hot 8
integration: "error reading the kernel parameter" errors during CI hot 8
hcsshim::PrepareLayer failed in Win32: The parameter is incorrect hot 8
Docker 18.09.1 doesn't work with iptables v1.8.2 hot 7
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded hot 6
Containers on overlay network cannot reach other containers hot 6
write unix /var/run/docker.sock->@: write: broken pipe hot 6
feature request: docker cat command hot 6
Github User Rank List