One container in the overlay network not available

Description

<!-- Briefly describe the problem you are having in a few paragraphs. -->

One of the five containers in a service is not reachable by the others, while the container itself is reported as healthy by Docker.

Steps to reproduce the issue: I couldn't find a reliable way to reproduce this.
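
For context, the affected service is a five-replica swarm service attached to an overlay network. A minimal sketch of that kind of setup (the network, service and image names below are placeholders, not the real stack definition):

# docker network create --driver overlay collector-net
# docker service create --name collector --replicas 5 --network collector-net myorg/collector:latest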

Describe the results you received: One container was removed from the swarm VIP load balancer and was not reachable from other containers.

Describe the results you expected: The container to be reachable, like all the other containers in the service.

Additional information you deem important (e.g. issue happens only occasionally):

The log on a different swarm node contained the following, which might be related:

Mar 14 06:19:29 api01 dockerd[6572]: time="2019-03-14T06:19:29.669166101+01:00" level=info msg="parsed scheme: \"\"" module=grpc
Mar 14 06:19:29 api01 dockerd[6572]: time="2019-03-14T06:19:29.669194476+01:00" level=info msg="scheme \"\" not registered, fallback to default scheme" module=grpc
Mar 14 06:19:29 api01 dockerd[6572]: time="2019-03-14T06:19:29.669264581+01:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{ 0  <nil>}]" module=grpc
Mar 14 06:19:29 api01 dockerd[6572]: time="2019-03-14T06:19:29.669299244+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Mar 14 06:19:29 api01 dockerd[6572]: time="2019-03-14T06:19:29.669350636+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4293861f0, CONNECTING" module=grpc
Mar 14 06:19:29 api01 dockerd[6572]: time="2019-03-14T06:19:29.669489510+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4293861f0, READY" module=grpc
Mar 14 06:20:05 api01 dockerd[6572]: time="2019-03-14T06:20:05.553771163+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4293861f0, TRANSIENT_FAILURE" module=grpc
Mar 14 06:20:05 api01 dockerd[6572]: time="2019-03-14T06:20:05.553833393+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4293861f0, CONNECTING" module=grpc
Mar 14 06:20:05 api01 dockerd[6572]: time="2019-03-14T06:20:05.553816934+01:00" level=warning msg="grpc: addrConn.createTransport failed to connect to { 0  <nil>}. Err :connection error: desc = \"transport: Error while dialing only one connection allowed\". Reconnecting..." module=grpc
Mar 14 06:20:05 api01 dockerd[6572]: time="2019-03-14T06:20:05.553884690+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4293861f0, TRANSIENT_FAILURE" module=grpc
Mar 14 06:20:05 api01 dockerd[6572]: time="2019-03-14T06:20:05.669396784+01:00" level=warning msg="grpc: addrConn.transportMonitor exits due to: context canceled" module=grpc

Connections from the container to the VIP work:

Finding the VIP

# docker service inspect collector-v0_collector | jq .[0].Endpoint.VirtualIPs[0].Addr
"10.0.0.195/24"

Pinging it from the container namespace

# nsenter --target $(pidof collector) -n ping 10.0.0.195
PING 10.0.0.195 (10.0.0.195) 56(84) bytes of data.
64 bytes from 10.0.0.195: icmp_seq=1 ttl=64 time=0.050 ms
64 bytes from 10.0.0.195: icmp_seq=2 ttl=64 time=0.033 ms
64 bytes from 10.0.0.195: icmp_seq=3 ttl=64 time=0.039 ms

Pinging a different container on the same service/network

# nsenter --target $(pidof collector) -n ping 10.0.0.144
PING 10.0.0.144 (10.0.0.144) 56(84) bytes of data.
^C
--- 10.0.0.144 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

Finding the IP of the "broken" container:

# nsenter --target $(pidof collector) -n ip addr | grep inet
    inet 127.0.0.1/8 scope host lo
    inet 10.0.0.147/24 brd 10.0.0.255 scope global eth0
    inet 172.18.0.23/16 brd 172.18.255.255 scope global eth1
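
As a cross-check, the overlay network's own view of the containers attached on this node can be compared against the addresses above (a hedged example: <overlay-network> is a placeholder for the service's network, and jq is assumed to be available, as it already is above):

# docker network inspect <overlay-network> | jq '.[0].Containers[] | {Name, IPv4Address}'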

Connecting from the load-balancer container on a different host:

api01# nsenter -t $(pidof lb) -n curl -o /dev/null http://10.0.0.147:8080/metrics
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to 10.0.0.147:8080; Connection refused

Connecting from the load balancer on the current host:

api02# nsenter -t $(pidof lb) -n curl -o /dev/null http://10.0.0.147:8080/metrics
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  121k    0  121k    0     0  28.0M      0 --:--:-- --:--:-- --:--:-- 29.7M
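
Since the symptom looks like the container being dropped from the service's VIP load balancing, another check worth capturing would be the IPVS state inside the network's hidden load-balancer sandbox. This is a hedged example: it assumes ipvsadm is installed on the host and that the sandbox namespace appears as lb_<short network id> under /var/run/docker/netns, which is how 18.09 names them.

# ls /var/run/docker/netns/
# nsenter --net=/var/run/docker/netns/lb_<short network id> ipvsadm -ln

If 10.0.0.147 is missing from the real-server list for this service while the other replicas are present, that would confirm the container was also removed from the load balancer, in addition to being unreachable directly.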

Output of docker version:

Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:47:25 2019
  OS/Arch:          linux/amd64
  Experimental:     true

Output of docker info:

Containers: 22
 Running: 21
 Paused: 0
 Stopped: 1
Images: 22
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: gelf
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: je9pbt7fhxcgc5fp4t5milsbw
 Is Manager: true
 ClusterID: lbqfpif6v1gowlgutpsuxit3l
 Managers: 6
 Nodes: 6
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 172.16.0.35
 Manager Addresses:
  172.16.0.31:2377
  172.16.0.32:2377
  172.16.0.33:2377
  172.16.0.35:2377
  172.16.0.36:2377
  172.16.0.37:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.5.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 125.6GiB
Name: api02
ID: 5N2S:KZM3:Z6AA:SLMT:BWCK:VOT2:SG7Y:6OJ3:Y5X4:LRIZ:7UXU:7QOR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.):

The cluster runs on physical nodes running CentOS 7.


Answer from wheelybird

We're seeing this (or at least something very similar) too. It's happened twice now, first on Docker CE 18.09.2 and now on Docker CE 18.09.5. This is on Ubuntu 18.04.2.

We're actually seeing multiple containers on the same host (for different services, but within the same overlay network) refusing connections. Both times this happened while we were doing a rolling update of the services.
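
(The rolling updates in question were ordinary image updates to the services; hypothetically something along the lines of the command below, with a placeholder registry and tag, shown only to illustrate the kind of operation that coincided with the breakage.)

# docker service update --image <registry>/fa_api:<new-tag> fa_api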

The syslog shows a fair few warnings and errors regarding network interfaces. Here are some excerpts:

May 16 11:08:10 ip-172-16-122-110 kernel: [4999192.321415] veth55: renamed from veth0dac271
May 16 11:08:10 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:08:10.581760578Z" level=warning msg="peerDbAdd transient condition - Key:10.0.13.5 02:42:0a:00:0d:05 cardinality:2 db state:Set{{77d0548e3a125972a8b6338ca4b1954b2533299323949a75f9fbee0e155509c2 172.16.122.110 24 32 true}, {b1a988c5528da10fa3446db96b85977bb9b6380547e33bc3fad91e3df4e9f9b3 172.16.102.167 24 32 false}}"
...
...
May 16 11:08:10 ip-172-16-122-110 kernel: [4999192.360842] docker_gwbridge: port 9(veth884f989) entered disabled state
May 16 11:08:10 ip-172-16-122-110 kernel: [4999192.360877] br0: port 7(veth54) entered blocking state
May 16 11:08:10 ip-172-16-122-110 kernel: [4999192.360878] br0: port 7(veth54) entered forwarding state
May 16 11:08:10 ip-172-16-122-110 networkd-dispatcher[773]: ERROR:Unknown interface index 2049 seen even after reload
...
...
May 16 11:08:10 ip-172-16-122-110 networkd-dispatcher[773]: WARNING:Unknown index 2053 seen, reloading interface list
May 16 11:08:10 ip-172-16-122-110 networkd-dispatcher[773]: ERROR:Unknown interface index 2053 seen even after reload
May 16 11:08:10 ip-172-16-122-110 containerd[22404]: time="2019-05-16T11:08:10.739802546Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/88d9bcd18e002caecf54007907f989a135a151558a96a6fd4c43f006b6e5f999/shim.sock" debug=false pid=5860
...
...
May 16 11:08:58 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:08:58.378946319Z" level=warning msg="rmServiceBinding 9e969a167b4c3e361860a757392e1d006249546cbe219447ab797288155e9dec possible transient state ok:false entries:0 set:false "
May 16 11:08:58 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:08:58.413575639Z" level=warning msg="rmServiceBinding 7895de12c83c7fe646fe14d7a8865796281980cce0fd5b3335e94eb5f716ce9e possible transient state ok:false entries:0 set:false "
...
...
May 16 11:09:18 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:09:18.082341736Z" level=warning msg="rmServiceBinding c888c203b0f8e0d058c3ffa08a4efdb7d17f99d3cb60b609ee4369fbbee335c2 possible transient state ok:false entries:0 set:false "
May 16 11:09:22 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:09:22.431321317Z" level=warning msg="failed to deactivate service binding for container fa_api.4.yfj5nahbfny1l3kim87744dqk" error="No such container: fa_api.4.yfj5nahbfny1l3kim87744dqk" module=node/agent node.id=lef6w0zatlo7iflu3w445r5su
May 16 11:10:22 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:10:22.598402415Z" level=warning msg="rmServiceBinding 4059f290d6ef5053b568f13f5ee867c70b0f575b273f0ec28310ce47cfdb7881 possible transient state ok:false entries:0 set:false "
May 16 11:10:22 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:10:22.613769755Z" level=warning msg="rmServiceBinding 7e2e18c7d034e6eac040ede2ffa7a96f1c9139ae0c7050eadd2ed4b5ba55d07d possible transient state ok:false entries:0 set:false "
May 16 11:10:30 ip-172-16-122-110 dockerd[25893]: time="2019-05-16T11:10:30.317708884Z" level=warning msg="failed to deactivate service binding for container fa_app.1.6rsiia03eyn8v4dza5luo87t6" error="No such container: fa_app.1.6rsiia03eyn8v4dza5luo87t6" module=node/agent node.id=lef6w0zatlo7iflu3w445r5su

Docker info

Containers: 9
 Running: 6
 Paused: 0
 Stopped: 3
Images: 9
Server Version: 18.09.5
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: lef6w0zatlo7iflu3w445r5su
 Is Manager: false
 Node Address: 172.16.122.110
 Manager Addresses:
  172.16.110.251:2377
  172.16.111.199:2377
  172.16.112.115:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-1032-aws
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 957.8MiB
Name: ip-172-16-122-110
ID: 7ZHL:PTGH:QFBH:PC24:NWLQ:DIWE:SZCG:EUWY:HIWK:I6KX:3L5U:QVIP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 ec2_role=app
 ec2_environment=staging
 ec2_availability_zone=eu-west-2c
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Docker version

Client:
 Version:           18.09.5
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        e8ff056
 Built:             Thu Apr 11 04:43:57 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.5
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       e8ff056
  Built:            Thu Apr 11 04:10:53 2019
  OS/Arch:          linux/amd64
  Experimental:     false

All the nodes in the swarm are on the same version of Ubuntu & Docker.

We had planned to migrate our production environment to Swarm within weeks, but this issue is blocking that, as we can't guarantee a release won't break our site.
