Pod is stuck in terminating: runc did not terminate sucessfully: container does not exist

Description

A Kubernetes pod is stuck in the Terminating state:

trace-resolver-intake-1-2-3-6496f5896c-wjrpz                      0/1     Terminating             0          21d
kubectl get po some-trace-resolver-pod-wjrpz -o json | jq .status.containerStatuses
[
  {
    "containerID": "containerd://dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc",
    "image": "trace-resolver:v2",
    "imageID": "trace-resolver@sha256:***",
    "lastState": {},
    "name": "trace-resolver",
    "ready": false,
    "restartCount": 0,
    "state": {
      "running": {
        "startedAt": "2020-01-21T13:58:22Z"
      }
    }
  }
]

On the host:

journalctl -o cat -u containerd

time="2020-02-12T13:12:54.556721052Z" level=info msg="StopPodSandbox for "4498f5a09d782a7fb5c662f20b3bdbba8260e086f32529d4d56097cf8bf4445d""
time="2020-02-12T13:12:54.556777390Z" level=info msg="Container to stop "e77efe7990700916123241ab4af14d7d9f0d262b2467a146aceb83af85274cb3" must be in running or unknown state, current state "CONTAINER_EXITED""
time="2020-02-12T13:12:54.560359783Z" level=info msg="Kill container "dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc""
time="2020-02-12T13:12:54.564680513Z" level=error msg="StopPodSandbox for "4498f5a09d782a7fb5c662f20b3bdbba8260e086f32529d4d56097cf8bf4445d" failed" error="failed to stop container "dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc": failed to kill container "dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc": unknown error after kill: runc did not terminate sucessfully: container "dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc" does not exist
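When the journal holds many of these failures, it can help to pull out the affected container IDs before inspecting each shim. The following is a minimal sketch, not part of the original report: the function name `missing_container_ids` is hypothetical, and the pattern assumes the error text has the same shape as the line above (with the inner quotes either bare or backslash-escaped).

```python
import re

# Matches the runc error as it appears in the containerd journal above.
# The optional backslash covers journals that print the inner quotes escaped.
MISSING = re.compile(r'container \\?"([0-9a-f]{64})\\?" does not exist')


def missing_container_ids(lines):
    """Return the distinct container IDs reported as nonexistent, in first-seen order."""
    seen = []
    for line in lines:
        for cid in MISSING.findall(line):
            if cid not in seen:
                seen.append(cid)
    return seen
```

One way to use it is to feed it the lines of `journalctl -o cat -u containerd` output, for example by iterating over a saved log file.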
crictl pods

POD ID              CREATED             STATE               NAME                                           NAMESPACE                  ATTEMPT
4498f5a09d782       3 weeks ago         Ready               trace-resolver-intake-1-2-3-6496f5896c-wjrpz   apm-firehose               0
sudo crictl ps

CONTAINER ID        IMAGE               CREATED             STATE               NAME                       ATTEMPT             POD ID
dc2ca0615937e       6380989e35b3b       3 weeks ago         Running             trace-resolver             0                   4498f5a09d782

Process tree of the containerd-shim, without any children:

root     29561   776  0 Jan21 ?        00:14:19  \_ containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc -address /run/containerd/containerd.sock -containerd-binary /usr/local/bin/containerd -debug
find  /var/lib/containerd -name dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc -exec ls -Rl {} \;

/var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc:
total 0
prwx------ 1 root root 0 Jan 21 13:58 shim.stderr.log
prwx------ 1 root root 0 Feb 12 13:12 shim.stdout.log
/var/lib/containerd/io.containerd.grpc.v1.cri/containers/dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc:
total 4
-rw------- 1 root root 145 Jan 21 13:58 status
sudo ctr -n k8s.io c ls 

dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc    sha256:6380989e35b3b3cf22e491f5497b3437f713c191ae7e49147ffc19964461030a    io.containerd.runtime.v1.linux
jq . /var/lib/containerd/io.containerd.grpc.v1.cri/containers/dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc/status

{
  "Version": "v1",
  "Pid": 29578,
  "CreatedAt": 1579615102238074600,
  "StartedAt": 1579615102390671000,
  "FinishedAt": 0,
  "ExitCode": 0,
  "Reason": "",
  "Message": ""
}
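The timestamps in this status file look like nanoseconds since the Unix epoch. A small sketch for decoding them follows. The helper names are hypothetical, and reading `FinishedAt: 0` as "no exit was ever recorded" is an assumption about how the CRI plugin derives the Running state, not something confirmed in this report.

```python
import json
from datetime import datetime, timezone


def ns_to_utc(ns):
    """Convert nanoseconds since the Unix epoch to a UTC timestamp (second precision)."""
    seconds = ns // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def summarize_status(raw):
    """Summarize the JSON text of a CRI container status file."""
    status = json.loads(raw)
    finished = status["FinishedAt"]
    return {
        "pid": status["Pid"],
        "started": ns_to_utc(status["StartedAt"]),
        "finished": ns_to_utc(finished) if finished else None,
        # Assumption: a zero FinishedAt means no exit event was recorded.
        "exit_recorded": finished != 0,
    }
```

Applied to the file above, `StartedAt` decodes to the same `2020-01-21T13:58:22Z` that kubectl reports as `startedAt`, and no exit is recorded, which is consistent with the container still being listed as Running.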
kill -USR1 29561 ; journalctl -u containerd -o cat -e

time="2020-02-12T13:12:52Z" level=info msg="=== BEGIN goroutine stack dump ===
goroutine 29 [running]:
main.dumpStacks(0xc00006a5a0)
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:259 +0x88
main.executeShim.func1(0xc000076780, 0xc00006a5a0)
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:179 +0x3d
created by main.executeShim
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:177 +0x5e6
goroutine 1 [select]:
main.handleSignals(0xc00006a5a0, 0xc000076720, 0xc00005f0e0, 0xc00009a0c0, 0xc000153e80, 0x0)
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:222 +0xf4
main.executeShim(0x7fc21c31c240, 0xc0000764e0)
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:182 +0x61e
main.main()
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:111 +0x1e3
goroutine 18 [syscall]:
os/signal.signal_recv(0x765e20)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/os/signal/signal_unix.go:29 +0x41
goroutine 19 [chan receive]:
main.main.func1()
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:82 +0x7f
created by main.main
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:81 +0x54
goroutine 5 [IO wait]:
internal/poll.runtime_pollWait(0x7fc21c317c90, 0x72, 0xc00009eaa8)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc0000bc498, 0x72, 0xffffffffffffff00, 0x765260, 0x9691c0)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc0000bc498, 0xc00015f000, 0x1000, 0x1000)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc0000bc480, 0xc00015f000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc0000bc480, 0xc00015f000, 0x1000, 0x1000, 0x60238f16053c53d, 0xc00009ebc0, 0x40338d)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000074090, 0xc00015f000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/net/net.go:177 +0x68
bufio.(*Reader).Read(0xc000050240, 0xc00004c0a0, 0xa, 0xa, 0x43a95d, 0xc000012178, 0x1)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/bufio/bufio.go:216 +0x22f
io.ReadAtLeast(0x764ae0, 0xc000050240, 0xc00004c0a0, 0xa, 0xa, 0xa, 0x2, 0x2, 0xc00009ed60)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/io/io.go:310 +0x88
io.ReadFull(0x764ae0, 0xc000050240, 0xc00004c0a0, 0xa, 0xa, 0x8, 0xc00009ef68, 0xc000012120)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/io/io.go:329 +0x58
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.readMessageHeader(0xc00004c0a0, 0xa, 0xa, 0x764ae0, 0xc000050240, 0xc00009eec8, 0x2, 0x2, 0xc00009ee48)
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/channel.go:54 +0x60
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*channel).recv(0xc00004c080, 0x7683a0, 0xc00004c0c0, 0x2, 0xc000083600, 0x0, 0x0, 0x7a, 0x0, 0x0)
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/channel.go:102 +0x6d
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*serverConn).run.func1(0xc000012060, 0xc00006a640, 0xc000012120, 0xc00004c080, 0x7683a0, 0xc00004c0c0, 0xc0000120c0, 0xc0000502a0)
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:344 +0x1af
created by github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*serverConn).run
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:314 +0x24a
goroutine 25 [chan receive]:
github.com/containerd/containerd/runtime/v1/shim.(*Service).processExits(0xc00009a0c0)
	/home/travis/gopath/src/github.com/containerd/containerd/runtime/v1/shim/service.go:491 +0xd6
created by github.com/containerd/containerd/runtime/v1/shim.NewService
	/home/travis/gopath/src/github.com/containerd/containerd/runtime/v1/shim/service.go:91 +0x3bf
goroutine 26 [syscall, 202 minutes]:
syscall.Syscall6(0xe8, 0x9, 0xc00009c9b8, 0x80, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x4)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/syscall/asm_linux_amd64.s:44 +0x5
github.com/containerd/containerd/vendor/golang.org/x/sys/unix.EpollWait(0x9, 0xc00009c9b8, 0x80, 0x80, 0xffffffffffffffff, 0xffffffffffffffff, 0x765260, 0xc000100c00)
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux_amd64.go:1689 +0x72
github.com/containerd/containerd/vendor/github.com/containerd/console.(*Epoller).Wait(0xc000066740, 0xc0000c5a90, 0x8)
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/console/console_linux.go:110 +0x7a
created by github.com/containerd/containerd/runtime/v1/shim.(*Service).initPlatform
	/home/travis/gopath/src/github.com/containerd/containerd/runtime/v1/shim/service_linux.go:111 +0xbb
goroutine 27 [chan receive, 2376 minutes]:
github.com/containerd/containerd/runtime/v1/shim.(*Service).forward(0xc00009a0c0, 0x764e40, 0xc00005cbd0)
	/home/travis/gopath/src/github.com/containerd/containerd/runtime/v1/shim/service.go:575 +0x62
created by github.com/containerd/containerd/runtime/v1/shim.NewService
	/home/travis/gopath/src/github.com/containerd/containerd/runtime/v1/shim/service.go:95 +0x471
goroutine 28 [IO wait, 31633 minutes]:
internal/poll.runtime_pollWait(0x7fc21c317d60, 0x72, 0x0)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc0000bc398, 0x72, 0xc000082000, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc0000bc398, 0xffffffffffffff00, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc0000bc380, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc0000bc380, 0xc00005f0f0, 0xc00005f0b0, 0xc00006a640)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/net/fd_unix.go:238 +0x42
net.(*UnixListener).accept(0xc00005f2f0, 0xc0000a0e98, 0xc0000a0ea0, 0x18)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/net/unixsock_posix.go:162 +0x32
net.(*UnixListener).Accept(0xc00005f2f0, 0x731a28, 0xc00006a640, 0x7683e0, 0xc000070010)
	/home/travis/.gimme/versions/go1.11.11.linux.amd64/src/net/unixsock.go:257 +0x47
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*Server).Serve(0xc00005f0e0, 0x7683e0, 0xc000070010, 0x767f20, 0xc00005f2f0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:84 +0x10a
main.serve.func1(0x767f20, 0xc00005f2f0, 0xc00005f0e0, 0x7683e0, 0xc000070010)
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:207 +0x8b
created by main.serve
	/home/travis/gopath/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:205 +0x1d8
goroutine 30 [select]:
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*serverConn).run(0xc00006a640, 0x7683e0, 0xc000070010)
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:413 +0x3b8
created by github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*Server).Serve
	/home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:124 +0x253
=== END goroutine stack dump ===" namespace=k8s.io path="/run/containerd/io.containerd.runtime.v1.linux/k8s.io/dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc" pid=29561
runc  --root /run/containerd/runc/k8s.io/ ps dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc 

container "dc2ca0615937ecc648dd5a6c64aff5c0993c3b0f31dbccd55fe2849371204bdc" does not exist
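Since runc no longer knows the container and the shim has no children, a natural cross-check is whether the `Pid` recorded in the status file (29578) still exists on the host. The sketch below is one way to test that on Linux. The function name `pid_alive` is hypothetical, and it assumes the recorded PID is a host PID visible from where the script runs.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID currently exists (Linux/POSIX only)."""
    try:
        # Signal 0 sends nothing; the kernel only performs the existence
        # and permission checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True
```

If `pid_alive(29578)` returns False while the status file still has `FinishedAt` set to 0, the exit of the container's init process was never recorded. Note that a True result is weaker evidence, because PIDs can be reused by unrelated processes.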

Output of containerd --version:

containerd --version

containerd github.com/containerd/containerd v1.2.7 85f6aa58b8a3170aec9824568f7a31832878b603
runc --version

runc version 1.0.0-rc8
spec: 1.0.1-dev
kubelet --version

Kubernetes v1.10.13

Any other relevant information:

Sending SIGKILL to the containerd-shim unblocks this situation.

Interesting Prometheus metric:

grpc_server_handled_total{grpc_code="Unknown",grpc_method="StopPodSandbox",grpc_service="runtime.v1alpha2.RuntimeService",grpc_type="unary"} 145649
containerd/containerd

Answer from gimlichael

My Windows worker nodes are being flooded with these:

E0524 17:24:38.357894    4028 remote_runtime.go:495] ListContainerStats with filter &ContainerStatsFilter{Id:,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem f76dd003c4f827c36bb10d24d08e9f14f6e68be728b727126741fd77c6185ac3: A virtual machine or container with the specified identifier does not exist.
E0524 17:24:38.357894    4028 eviction_manager.go:255] eviction manager: failed to get summary stats: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem f76dd003c4f827c36bb10d24d08e9f14f6e68be728b727126741fd77c6185ac3: A virtual machine or container with the specified identifier does not exist.

Any relation?

Thanks.
