OCI runtime exec failed: exec failed: cannot exec a container that has stopped: unknown

Description: `docker exec xxx ls` fails with: OCI runtime exec failed: exec failed: cannot exec a container that has stopped: unknown

Steps to reproduce the issue: no reliable reproducer; the failure occurs very infrequently.

Describe the results you received:

`docker ps` shows the container as running, but the container has actually exited: `cat /proc/${Container.State.Pid}/stat` returns "No such file or directory", so the PID recorded in the container's state no longer exists.
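A minimal way to check for this mismatch (a sketch; `<container>` is a placeholder for the affected container name or ID):

# Compare the engine's view of the container with the kernel's
pid=$(docker inspect --format '{{.State.Pid}}' <container>)
docker inspect --format '{{.State.Status}}' <container>   # still reports "running"
cat /proc/$pid/stat   # fails with "No such file or directory" when the bug hits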

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

18.03.0-ce

Output of docker info:

Containers: 48
 Running: 11
 Paused: 0
 Stopped: 37
Images: 59
Server Version: 18.03.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.16.6-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.91GiB
...
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

Answer (lixianyang)

uname -a

Linux k8s-worker2 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

docker version

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:20:43 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:28:38 2018
  OS/Arch:          linux/amd64
  Experimental:     false

dockerd log

May 06 18:37:05 k8s-worker2 dockerd[718]: time="2019-05-06T18:37:05.424828185+08:00" level=error msg="Handler for POST /v1.38/containers/af7331ee2a19a156aa8a24e9531908e6009a8df3bd2858e06f478b3baecd0c20/start returned error: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:301: running exec setns process for init caused \\\"exit status 46\\\"\": unknown"
May 06 18:41:28 k8s-worker2 dockerd[718]: time="2019-05-06T18:41:28+08:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/14759e87e23d98186b4e2d48556ea72907d02262117b87a77daabf3d81a9eb36/shim.sock" debug=false pid=34838
May 06 18:41:49 k8s-worker2 dockerd[718]: time="2019-05-06T18:41:49+08:00" level=info msg="shim reaped" id=14759e87e23d98186b4e2d48556ea72907d02262117b87a77daabf3d81a9eb36
May 06 18:42:17 k8s-worker2 dockerd[718]: time="2019-05-06T18:42:17+08:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/ce66964fd0d36153cb56adb27c301aef420d8a96d06bdfa05879c017ea034c12/shim.sock" debug=false pid=41729
May 06 18:42:17 k8s-worker2 dockerd[718]: time="2019-05-06T18:42:17+08:00" level=info msg="shim reaped" id=ce66964fd0d36153cb56adb27c301aef420d8a96d06bdfa05879c017ea034c12
May 06 18:42:17 k8s-worker2 dockerd[718]: time="2019-05-06T18:42:17.370821546+08:00" level=error msg="stream copy error: reading from a closed fifo"
May 06 18:42:17 k8s-worker2 dockerd[718]: time="2019-05-06T18:42:17.370980518+08:00" level=error msg="stream copy error: reading from a closed fifo"
May 06 18:42:17 k8s-worker2 dockerd[718]: time="2019-05-06T18:42:17.388641665+08:00" level=error msg="ce66964fd0d36153cb56adb27c301aef420d8a96d06bdfa05879c017ea034c12 cleanup: failed to delete container from containerd: no such container"
May 06 18:42:17 k8s-worker2 dockerd[718]: time="2019-05-06T18:42:17.388681331+08:00" level=error msg="Handler for POST /v1.38/containers/ce66964fd0d36153cb56adb27c301aef420d8a96d06bdfa05879c017ea034c12/start returned error: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:301: running exec setns process for init caused \\\"exit status 46\\\"\": unknown"
May 06 18:43:12 k8s-worker2 dockerd[718]: time="2019-05-06T18:43:12.210677691+08:00" level=error msg="Error setting up exec command in container ce66964fd0d36153cb5: Container ce66964fd0d36153cb56adb27c301aef420d8a96d06bdfa05879c017ea034c12 is not running"
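The repeated `exit status 46` from runc's setns init process appears consistent with the kernel log below, where the OOM killer terminates runc's init (`runc:[0:PARENT]`) inside the pod's memory cgroup before the container can start. One way to correlate the two on a systemd host (a sketch; the time window is taken from the kernel log that follows):

# Scan kernel messages for OOM kills and SLUB allocation failures
# in the window where container starts were failing
journalctl -k --since "2019-05-06 03:40:00" --until "2019-05-06 04:00:00" | grep -E "oom-killer|Killed process|SLUB"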

kernel log

May  6 03:40:18 k8s-worker2 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
May  6 03:40:18 k8s-worker2 kernel:  cache: sighand_cache(773:6fc0be0722748b9d4937fd28e1b0228d5de516889e61a1ffd8189a5ed52ce88c), object size: 2088, buffer size: 2112, default order: 3, min order: 0
May  6 03:40:18 k8s-worker2 kernel:  node 0: slabs: 0, objs: 0, free: 0
May  6 03:45:20 k8s-worker2 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
May  6 03:45:20 k8s-worker2 kernel:  cache: signal_cache(773:283aab8d6b01a67957f231592e6ee7b7edc14da46f4837e8ce2d13bdd4c5dad8), object size: 1120, buffer size: 1152, default order: 3, min order: 0
May  6 03:45:20 k8s-worker2 kernel:  node 0: slabs: 0, objs: 0, free: 0
May  6 03:50:31 k8s-worker2 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
May  6 03:50:31 k8s-worker2 kernel:  cache: signal_cache(773:7bc08d209f40885a3a278db5d5410c7b4d5f5f5501105049f7588d7eb2974d66), object size: 1120, buffer size: 1152, default order: 3, min order: 0
May  6 03:50:31 k8s-worker2 kernel:  node 0: slabs: 0, objs: 0, free: 0
May  6 03:50:31 k8s-worker2 kernel: runc:[1:CHILD] invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=996
May  6 03:50:31 k8s-worker2 kernel: runc:[1:CHILD] cpuset=7bc08d209f40885a3a278db5d5410c7b4d5f5f5501105049f7588d7eb2974d66 mems_allowed=0
May  6 03:50:31 k8s-worker2 kernel: CPU: 2 PID: 61503 Comm: runc:[1:CHILD] Tainted: G               ------------ T 3.10.0-514.el7.x86_64 #1
May  6 03:50:31 k8s-worker2 kernel: Hardware name: Red Hat KVM, BIOS 1.9.1-5.el7.centos 04/01/2014
May  6 03:50:31 k8s-worker2 kernel: ffff880764738000 00000000d75adf7a ffff8807483fbc98 ffffffff81685fac
May  6 03:50:31 k8s-worker2 kernel: ffff8807483fbd28 ffffffff81680f57 ffff88067020d280 0000000000000001
May  6 03:50:31 k8s-worker2 kernel: 0000000000000000 0000000000000000 0000000000000046 ffffffff81184156
May  6 03:50:31 k8s-worker2 kernel: Call Trace:
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff81685fac>] dump_stack+0x19/0x1b
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff81680f57>] dump_header+0x8e/0x225
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff81184156>] ? find_lock_task_mm+0x56/0xc0
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff810936ce>] ? has_capability_noaudit+0x1e/0x30
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff811f2fd1>] mem_cgroup_oom_synchronize+0x551/0x580
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff811f2420>] ? mem_cgroup_charge_common+0xc0/0xc0
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff81184e94>] pagefault_out_of_memory+0x14/0x90
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff8167ed47>] mm_fault_error+0x68/0x12b
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff81691cd5>] __do_page_fault+0x395/0x450
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff81691e76>] trace_do_page_fault+0x56/0x150
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff8169151b>] do_async_page_fault+0x1b/0xd0
May  6 03:50:31 k8s-worker2 kernel: [<ffffffff8168e0b8>] async_page_fault+0x28/0x30
May  6 03:50:31 k8s-worker2 kernel: Task in /kubepods/burstable/podbc8864ce-67e6-11e9-8229-fae833f99400/7bc08d209f40885a3a278db5d5410c7b4d5f5f5501105049f7588d7eb2974d66 killed as a result of limit of /kubepods/burstable/podbc8864ce-67e6-11e9-8229-fae833f99400
May  6 03:50:31 k8s-worker2 kernel: memory: usage 262144kB, limit 262144kB, failcnt 259804
May  6 03:50:31 k8s-worker2 kernel: memory+swap: usage 262144kB, limit 9007199254740988kB, failcnt 0
May  6 03:50:31 k8s-worker2 kernel: kmem: usage 262052kB, limit 9007199254740988kB, failcnt 0
May  6 03:50:31 k8s-worker2 kernel: Memory cgroup stats for /kubepods/burstable/podbc8864ce-67e6-11e9-8229-fae833f99400: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
May  6 03:50:31 k8s-worker2 kernel: Memory cgroup stats for /kubepods/burstable/podbc8864ce-67e6-11e9-8229-fae833f99400/7aba9fb36a402290256c9368dc519ec2b7b22b32370b4a755fff4dfabbd50ef3: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
May  6 03:50:31 k8s-worker2 kernel: Memory cgroup stats for /kubepods/burstable/podbc8864ce-67e6-11e9-8229-fae833f99400/7bc08d209f40885a3a278db5d5410c7b4d5f5f5501105049f7588d7eb2974d66: cache:0KB rss:52KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:12KB inactive_file:0KB active_file:0KB unevictable:0KB
May  6 03:50:31 k8s-worker2 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
May  6 03:50:31 k8s-worker2 kernel: [128215]     0 128215      254        1       4        0          -998 pause
May  6 03:50:31 k8s-worker2 kernel: [61502]     0 61502     1636       64       7        0           996 runc:[0:PARENT]
May  6 03:50:31 k8s-worker2 kernel: [61503]     0 61503     1636       17       7        0           996 runc:[1:CHILD]
May  6 03:50:31 k8s-worker2 kernel: Memory cgroup out of memory: Kill process 61502 (runc:[0:PARENT]) score 988 or sacrifice child
May  6 03:50:31 k8s-worker2 kernel: Killed process 61502 (runc:[0:PARENT]) total-vm:6544kB, anon-rss:60kB, file-rss:196kB, shmem-rss:0kB
May  6 03:55:44 k8s-worker2 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
May  6 03:55:44 k8s-worker2 kernel:  cache: kmalloc-1024(773:4696afcf2c6273f304878d687cbd20e1dcee009f8130ce3ae5b376a3d50b77ef), object size: 1024, buffer size: 1024, default order: 3, min order: 0
May  6 03:55:44 k8s-worker2 kernel:  node 0: slabs: 0, objs: 0, free: 0
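In the cgroup stats above, kernel memory (kmem: usage 262052kB) has essentially consumed the pod's 262144kB memory limit while rss and cache are near zero, which suggests the limit is being exhausted by kernel-memory (slab) accounting rather than by the workload itself. A minimal sketch to verify this on a cgroup-v1 host such as this one, using the pod cgroup path from the log:

# Compare kernel-memory usage against the pod's memory limit (cgroup v1)
POD_CG=/sys/fs/cgroup/memory/kubepods/burstable/podbc8864ce-67e6-11e9-8229-fae833f99400
cat $POD_CG/memory.kmem.usage_in_bytes   # kernel memory charged to the pod
cat $POD_CG/memory.limit_in_bytes        # the 262144kB (268435456-byte) limit
cat $POD_CG/memory.failcnt               # how often allocations hit the limit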

Deleting the pod stuck in CrashLoopBackOff worked around the issue; the replacement pod was created successfully.
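For reference, the workaround amounts to (placeholders for the actual pod name and namespace):

# Delete the stuck pod; the controller recreates it with a fresh cgroup
kubectl delete pod <pod-name> -n <namespace>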
