
pull request commentcri-o/cri-o

Allow server to start without config

This is a logical step over #3123 :+1:

saschagrunert

comment created time in 2 hours

pull request commentcri-o/cri-o

Add support for crio drop-in configuration files

I think we should align with how systemd does some of this: https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html.

One place where this implementation differs from systemd is the priority of crio.conf. I think the files in conf.d should still be allowed to override crio.conf.

Thanks for putting this together. This has been on my list for so long :)
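
A minimal sketch of the precedence suggested above, assuming systemd-style semantics: the main file is loaded first, then drop-in files are applied in lexical filename order, so later files override earlier values. The `mergeConfigs` helper and the flat key/value maps are hypothetical and are not CRI-O's actual config loader.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeConfigs is a hypothetical helper. It starts from the base config
// (standing in for crio.conf) and applies drop-in files (standing in for
// conf.d entries) in lexical filename order, so later files win.
func mergeConfigs(base map[string]string, dropIns map[string]map[string]string) map[string]string {
	merged := make(map[string]string, len(base))
	for k, v := range base {
		merged[k] = v
	}

	names := make([]string, 0, len(dropIns))
	for name := range dropIns {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		for k, v := range dropIns[name] {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	base := map[string]string{"cgroup_manager": "cgroupfs", "log_level": "error"}
	dropIns := map[string]map[string]string{
		"10-base.conf":  {"log_level": "info"},
		"20-debug.conf": {"log_level": "debug"},
	}
	merged := mergeConfigs(base, dropIns)
	fmt.Println(merged["cgroup_manager"], merged["log_level"])
}
```

Keys that no drop-in mentions keep the value from the main file, which is the behavior the comment asks to preserve.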

saschagrunert

comment created time in 2 hours

pull request commentcri-o/cri-o

Fail to start when stream server port already allocated

/retest

saschagrunert

comment created time in 2 hours

pull request commentcri-o/cri-o

Sandbox: Don't use an infra container in some cases

This doesn't work without the following patch. Without it, we always fall back to the zero value for the namespace mode, which is pod, so the infra container is always created for the pod.

diff --git a/server/sandbox_run_linux.go b/server/sandbox_run_linux.go
index a17ad221d..d9f989778 100644
--- a/server/sandbox_run_linux.go
+++ b/server/sandbox_run_linux.go
@@ -514,6 +514,9 @@ func (s *Server) runPodSandbox(ctx context.Context, req *pb.RunPodSandboxRequest
 
        var container *oci.Container
        // Only create the container in the runtime if we need it
+
+       sb.SetNamespaceOptions(securityContext.GetNamespaceOptions())
+
        if sb.NeedsInfra(s.config.ManageNSLifecycle) {
                container, err = oci.NewContainer(id, containerName, podContainer.RunDir, logPath, labels, g.Config.Annotations, kubeAnnotations, "", "", "", nil, id, false, false, false, sb.Privileged(), sb.RuntimeHandler(), podContainer.Dir, created, podContainer.Config.Config.StopSignal)
                if err != nil {
@@ -554,7 +557,6 @@ func (s *Server) runPodSandbox(ctx context.Context, req *pb.RunPodSandboxRequest
                g.AddAnnotation(fmt.Sprintf("%s.%d", annotations.IP, idx), ip)
        }
        sb.AddIPs(ips)
-       sb.SetNamespaceOptions(securityContext.GetNamespaceOptions())
 
        spp := securityContext.GetSeccompProfilePath()
        g.AddAnnotation(annotations.SeccompProfilePath, spp)
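
The ordering problem described above can be sketched as follows. This is an illustration under assumptions: the types, the `Mode` field, and the simplified `needsInfra` rule are hypothetical stand-ins, not CRI-O's real code. It only shows why reading the namespace mode before the options are set yields the zero value (pod).

```go
package main

import "fmt"

// NamespaceMode mirrors the shape of an enum whose zero value is the
// pod-level mode. The names here are illustrative.
type NamespaceMode int32

const (
	ModePod NamespaceMode = iota // zero value
	ModeContainer
	ModeNode
)

// namespaceOptions and sandbox are hypothetical stand-ins.
type namespaceOptions struct {
	Mode NamespaceMode
}

type sandbox struct {
	nsOpts *namespaceOptions
}

// mode returns the zero value (ModePod) when the options were never set.
func (s *sandbox) mode() NamespaceMode {
	if s.nsOpts == nil {
		return ModePod
	}
	return s.nsOpts.Mode
}

// needsInfra is a simplified rule assumed for this sketch: an infra
// container is needed unless namespaces are managed and the mode is not pod.
func (s *sandbox) needsInfra(manageNSLifecycle bool) bool {
	return !manageNSLifecycle || s.mode() == ModePod
}

func main() {
	sb := &sandbox{}
	// Options not set yet: the check sees the zero value and asks for infra.
	fmt.Println(sb.needsInfra(true))

	// Options set before the check, as the patch does.
	sb.nsOpts = &namespaceOptions{Mode: ModeNode}
	fmt.Println(sb.needsInfra(true))
}
```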

haircommander

comment created time in 4 hours

pull request commentcri-o/cri-o

Fail to start when already listening on socket

/test e2e_features_fedora

saschagrunert

comment created time in 6 hours

push eventopencontainers/runtime-tools

dbenoit

commit sha 2affd4568575a46b567ed7c795141bac61d5a450

Add missing clone rule for s390x. Signed-off-by: dbenoit <dbenoit@redhat.com>

Mrunal Patel

commit sha d1bf3e66ff0aa45840608d28eae6ce15f2cc9a26

Merge pull request #700 from dbenoit17/fixes-clone-issue-699 Add missing clone rule for s390x. Fixes #699

push time in 6 hours

PR merged opencontainers/runtime-tools

Add missing clone rule for s390x. Fixes #699

Fixes https://github.com/opencontainers/runtime-tools/issues/699

This PR adds the missing clone masked_eq seccomp rule. Without this rule, clone syscalls fail on s390x.

I have only tested this patch directly in cri-o so far. Please consider this PR a work in progress while I become familiar with this project and its processes such as writing and running unit tests, etc.

Additionally, since this patch only affects s390x, please let me know if there is anything extra I should do to test the architecture-specific functionality.

Thanks, DB

+14 -0

2 comments

1 changed file

dbenoit17

pr closed time in 6 hours

issue closedopencontainers/runtime-tools

Generated Seccomp profile rejects clone syscall on s390x.

The s390x architecture requires an extra seccomp masked_eq rule to allow clone syscalls. This rule is present in cri-o's default configuration, but is missing from the configuration generated by opencontainers/runtime-tools. As a result, clone syscalls are rejected by seccomp in cri-o subsystems which use this generated seccomp profile.

This issue affects the ose-pod container in openshift. When cri-o tries to instantiate /usr/bin/pod on s390x during the openshift bootstrap, there is a clone syscall in the golang runtime startup which fails with a Permission Denied error. This prevents containers from being instantiated by the bootstrap kubelet, causing the openshift bootstrap to fail.
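
A minimal sketch of why an architecture-specific rule is needed, assuming the usual masked-equality semantics `(arg & mask) == value` and that on s390x the clone flags are the second raw syscall argument (index 1) rather than the first. The helper names and the choice of mask are illustrative; this is not the rule text from the PR.

```go
package main

import "fmt"

// Namespace-related clone flags from the Linux UAPI headers.
const (
	cloneNewNS   uint64 = 0x00020000
	cloneNewUTS  uint64 = 0x04000000
	cloneNewIPC  uint64 = 0x08000000
	cloneNewUser uint64 = 0x10000000
	cloneNewPID  uint64 = 0x20000000
	cloneNewNet  uint64 = 0x40000000

	// namespaceMask is the set of flags the sketch refuses to allow.
	namespaceMask = cloneNewNS | cloneNewUTS | cloneNewIPC |
		cloneNewUser | cloneNewPID | cloneNewNet
)

// cloneFlagsIndex returns which raw syscall argument carries the flags.
func cloneFlagsIndex(arch string) int {
	if arch == "s390" || arch == "s390x" {
		return 1
	}
	return 0
}

// maskedEq implements the masked-equality comparison.
func maskedEq(arg, mask, value uint64) bool {
	return arg&mask == value
}

// cloneAllowed permits clone only when no namespace flag is set in the
// argument that holds the flags on the given architecture.
func cloneAllowed(arch string, args [6]uint64) bool {
	return maskedEq(args[cloneFlagsIndex(arch)], namespaceMask, 0)
}

func main() {
	// An s390x-style call: args[0] is a stack pointer, args[1] holds the flags.
	var args [6]uint64
	args[0] = 0x3ff7c020000 // pointer value that happens to overlap the mask
	args[1] = 0x11          // SIGCHLD only, no namespace flags

	fmt.Println(cloneAllowed("s390x", args)) // checks args[1]
	fmt.Println(cloneAllowed("amd64", args)) // checks args[0], the wrong slot for this call
}
```

A profile that only checks index 0 ends up comparing a pointer against the flag mask on s390x, which is one way a plain clone can be rejected.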

closed time in 6 hours

dbenoit17

pull request commentopencontainers/runtime-tools

Add missing clone rule for s390x. Fixes #699

LGTM

dbenoit17

comment created time in 6 hours

pull request commentopenshift/machine-config-operator

Switch to managing network ns lifecycle

@runcom ptal

haircommander

comment created time in 7 hours

pull request commentcri-o/cri-o

Switch default cgroup manager to systemd

/test integration_rhel

saschagrunert

comment created time in 7 hours

pull request commentcri-o/cri-o

[1.17] Revert net cleanup and ctx check

@runcom These are the only two changes which are somewhat related though I am not 100% convinced. We can merge this and see if it helps or explore other possibilities.

mrunalp

comment created time in 8 hours

pull request commentcri-o/cri-o

Add documentation about stream_port="0"

/lgtm

saschagrunert

comment created time in 8 hours

pull request commentcri-o/cri-o

WIP [1.16] fail on network stop

/retest

haircommander

comment created time in 9 hours

pull request commentcri-o/cri-o

Fail to start when already listening on socket

/lgtm

saschagrunert

comment created time in 11 hours

pull request commentcri-o/cri-o

Fail to start when stream server port already allocated

/lgtm

saschagrunert

comment created time in 11 hours

pull request commentcri-o/cri-o

Switch default cgroup manager to systemd

/lgtm

saschagrunert

comment created time in 11 hours

pull request commentopenshift/origin

test: fix the operator images test to skip succeeded containers

@rphillips ptal

mfojtik

comment created time in 11 hours

pull request commentopenshift/origin

test: fix the operator images test to skip succeeded containers

Looks good :+1:

mfojtik

comment created time in 11 hours

PR opened cri-o/cri-o

[1.17] Revert net cleanup

We are seeing possible regressions from these fixes so reverting them.

+3 -12

0 comment

3 changed files

pr created time in 11 hours

create branch mrunalp/cri-o

branch : revert_net_cleanup

created branch time in 11 hours

pull request commentcri-o/cri-o

[1.16] fail on network stop

/test integration_rhel

haircommander

comment created time in a day

PR opened cri-o/cri-o

[1.16] Update test grid

Test PR to check what's up with e2e-aws.

Signed-off-by: Mrunal Patel mrunalp@gmail.com

+1 -0

0 comment

1 changed file

pr created time in a day

create branch mrunalp/cri-o

branch : test_1.16

created branch time in a day

pull request commentcri-o/cri-o

[1.16] fail on network stop

/test e2e-aws

haircommander

comment created time in a day

pull request commentcri-o/cri-o

[1.16] fail on network stop

circle ci is failing on iptables failures from stop pod.

haircommander

comment created time in a day

pull request commentopenshift/machine-config-operator

Switch to managing network ns lifecycle

/lgtm

haircommander

comment created time in a day

pull request commentcri-o/cri-o

[1.16] fail on network stop

/retest

haircommander

comment created time in a day

pull request commentopenshift/machine-config-operator

Switch to managing network ns lifecycle

/test unit

haircommander

comment created time in a day

pull request commentcri-o/cri-o

[1.16] fail on network stop

/test e2e-aws

haircommander

comment created time in a day

Pull request review commentopenshift/machine-config-operator

Bug 1793144: kubelet: add log level environment variable

 contents: |
   Type=notify
   ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
   ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
+  Environment="KUBELET_LOG_LEVEL=3"

Did we test that we can override this in one of the environment files?

rphillips

comment created time in a day

Pull request review commentcri-o/cri-o

[1.16] fail on network stop

 func (s *Sandbox) Stopped() bool {
 	return s.stopped
 }
 
+// NetworkStopped returns whether the network has been stopped
+func (s *Sandbox) NetworkStopped() bool {
+	return s.networkStopped
+}
+
+// SetNetworkStopped sets the sandbox network state as stopped
+// This should be set after a network stop operation succeeds,
+// so we don't double stop the network
+// if createFile is true, it creates a "network-stopped" file
+// in the infra container's persistent dir
+// this is used to track the network is stopped over reboots
+// returns an error if an error occurred when creating the network-stopped file
+func (s *Sandbox) SetNetworkStopped(createFile bool) error {
+	if s.networkStopped {
+		return nil
+	}
+	s.networkStopped = true
+	if createFile {
+		if err := s.createFileInInfraDir(sbNetworkStoppedFilename); err != nil {
+			return fmt.Errorf("failed to create state file in container directory. Restores may fail: %v", err)
+		}
+	}
+	return nil
+}
+
+func (s *Sandbox) createFileInInfraDir(filename string) error {
+	infra := s.InfraContainer()
+	_, err := os.Create(filepath.Join(infra.Dir(), filename))
+	return err
+}
+
+func (s *Sandbox) RestoreStopped() {
+	if s.fileExistsInInfraDir(sbStoppedFilename) {
+		s.stopped = true
+	}
+	if s.fileExistsInInfraDir(sbNetworkStoppedFilename) {
+		s.networkStopped = true
+	}
+}
+
+func (s *Sandbox) fileExistsInInfraDir(filename string) bool {
+	infra := s.InfraContainer()
+	_, err := os.Stat(filepath.Join(infra.Dir(), filename))

we should log if it is some other error.
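
One way to read this suggestion, as a sketch rather than the code that was merged: treat "file does not exist" as the expected negative answer and log anything else. The standalone `fileExistsInDir` helper is hypothetical; it takes the directory as a parameter instead of reading it from the sandbox.

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// fileExistsInDir reports whether filename exists in dir. A "not exist"
// error is the normal negative case; any other error (for example a
// permission problem) is logged so it does not go unnoticed.
func fileExistsInDir(dir, filename string) bool {
	_, err := os.Stat(filepath.Join(dir, filename))
	if err == nil {
		return true
	}
	if !os.IsNotExist(err) {
		log.Printf("unexpected error checking for %s in %s: %v", filename, dir, err)
	}
	return false
}

func main() {
	dir, err := os.MkdirTemp("", "infra")
	if err != nil {
		log.Fatal(err)
	}

	log.Println(fileExistsInDir(dir, "network-stopped")) // not created yet

	f, err := os.Create(filepath.Join(dir, "network-stopped"))
	if err != nil {
		log.Fatal(err)
	}
	f.Close()
	log.Println(fileExistsInDir(dir, "network-stopped")) // now present

	os.RemoveAll(dir)
}
```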

haircommander

comment created time in a day

pull request commentcri-o/cri-o

[1.16] Add image decryption capabilities

My preference is that this goes into 1.17 if possible.

lumjjb

comment created time in 4 days

Pull request review commentcri-o/cri-o

WIP: fail on network stop

 func (s *Server) networkStop(ctx context.Context, sb *sandbox.Sandbox) {
 	podNetwork, err := s.newPodNetwork(sb)
 	if err != nil {
 		log.Warnf(ctx, err.Error())
-		return
+		return nil

We need to return this error as the next call will fail if this doesn't succeed.

haircommander

comment created time in 4 days

Pull request review commentopencontainers/runc

rootfs: do not permit /proc mounts to non-directories

 func mountToRootfs(m *configs.Mount, rootfs, mountLabel string, enableCgroupns b
 
 	switch m.Device {
 	case "proc", "sysfs":
+		// If the destination already exists and is not a directory, we remove
+		// it. This is to avoid mounting through a symlink or similar -- which
+		// has been a "fun" attack scenario in the past.
+		// TODO: This won't be necessary once we switch to libpathrs and we can
+		//       stop all of these symlink-exchange attacks.
+		if fi, err := os.Lstat(dest); err != nil {
+			if !os.IsNotExist(err) {
+				return err
+			}
+		} else if fi.Mode()&os.ModeDir == 0 {
+			if err := os.Remove(dest); err != nil {

Instead of removing, should we just bail out and fail?
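
The alternative raised in this question could look roughly like the sketch below: keep the same `os.Lstat` check, but return an error when the destination exists and is not a directory instead of removing it. `checkMountDest` is a hypothetical standalone helper, not runc's function.

```go
package main

import (
	"fmt"
	"os"
)

// checkMountDest refuses a mount destination that exists but is not a
// directory. Lstat does not follow symlinks, so a symlink at dest is
// reported as a non-directory and rejected.
func checkMountDest(dest string) error {
	fi, err := os.Lstat(dest)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing there yet; the caller can create the directory
		}
		return err
	}
	if !fi.IsDir() {
		return fmt.Errorf("mount destination %q exists and is not a directory", dest)
	}
	return nil
}

func main() {
	f, err := os.CreateTemp("", "not-a-dir")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.Remove(f.Name())
	f.Close()

	fmt.Println(checkMountDest(os.TempDir())) // a directory
	fmt.Println(checkMountDest(f.Name()))     // a regular file
}
```

Failing early leaves the container's filesystem untouched, at the cost of rejecting images that ship a file or symlink at that path.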

cyphar

comment created time in 5 days

pull request commentcri-o/cri-o

Take total_inactive_file into consideration for memory usage

We should backport this as well. Thanks!

saschagrunert

comment created time in 5 days

pull request commentcri-o/cri-o

Take total_inactive_file into consideration for memory usage

/lgtm

saschagrunert

comment created time in 5 days

create branch mrunalp/runc

branch : collect_mode

created branch time in 5 days

pull request commentcri-o/cri-o

Check for context errors before returning from longer requests

Thinking about the scenario of reusing an older pod: if we add a boolean that is set right before we return a success response, we can use that boolean to decide whether to reuse the previously created pod or to clean it up and create a new one. Here are some scenarios:

Scenario 1:

  1. kubelet requests name pod attempt 1
  2. crio goes through all the steps, including checking ctx for errors. There are no errors.
  3. crio sets pod succeeded to true for this pod name pod_1
  4. kubelet times out
  5. crio returns response.
  6. kubelet sends new request for name pod attempt 1
  7. crio checks that pod_1 is already present and succeeded: true. crio also updates status of the pod to make sure it is running.
  8. crio returns this pod id before kubelet times out this time.

Scenario 2:

  1. kubelet requests name pod attempt 1
  2. crio goes through all the steps, including checking ctx for errors. There are no errors.
  3. crio sets pod succeeded to true for this pod name pod_1
  4. kubelet times out.
  5. kubelet sends new request for name pod attempt 1
  6. crio sees another request for the same pod, but it hasn't yet set the boolean for the first request, so it cleans up the current pod
  7. crio returns response to first request but that's ignored and there is no cleanup since it is returning success.
  8. crio services the new request by creating a new pod.

In Scenario 2, we tear down a pod which could have been reused, but the race window is very narrow.

In a similar scenario, if we haven't set the boolean to true for the first request by the time the second one shows up, we clean up the first pod.

If we keep timing out in a Scenario 2-like situation, we have to be careful not to introduce more races.

@giuseppe @mtrmac wdyt?
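
The two outcomes above can be sketched with a small in-memory registry. This is a hypothetical illustration of the idea (a per-pod flag set just before success is returned), not CRI-O code; `podRegistry` and its methods are invented names, and dropping the map entry stands in for a full pod cleanup.

```go
package main

import (
	"fmt"
	"sync"
)

// podEntry and podRegistry are hypothetical types for illustration.
type podEntry struct {
	id        string
	succeeded bool // set just before the success response is returned
}

type podRegistry struct {
	mu   sync.Mutex
	pods map[string]*podEntry
}

func newPodRegistry() *podRegistry {
	return &podRegistry{pods: make(map[string]*podEntry)}
}

// reserve records that a pod with this name is being created.
func (r *podRegistry) reserve(name, id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pods[name] = &podEntry{id: id}
}

// markSucceeded flips the flag right before success is returned.
func (r *podRegistry) markSucceeded(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if e, ok := r.pods[name]; ok {
		e.succeeded = true
	}
}

// onRetry handles a repeated request for the same name. If the earlier
// attempt already succeeded, its ID is reused. Otherwise the stale entry
// is dropped and the caller creates a new pod.
func (r *podRegistry) onRetry(name string) (id string, reuse bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.pods[name]
	if !ok {
		return "", false
	}
	if e.succeeded {
		return e.id, true
	}
	delete(r.pods, name)
	return "", false
}

func main() {
	reg := newPodRegistry()

	// Flag set before the retry arrives: the pod is reused.
	reg.reserve("pod_1", "id-a")
	reg.markSucceeded("pod_1")
	fmt.Println(reg.onRetry("pod_1"))

	// Retry arrives before the flag is set: the pod is cleaned up.
	reg.reserve("pod_2", "id-b")
	fmt.Println(reg.onRetry("pod_2"))
}
```

The mutex makes the flag check and the cleanup decision atomic with respect to each other, which is the part the comment warns is easy to get wrong.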

mrunalp

comment created time in 7 days

Pull request review commentcri-o/cri-o

Check for context errors before returning from longer requests

 func (s *Server) CreateContainer(ctx context.Context, req *pb.CreateContainerReq
 		log.Errorf(ctx, "%v", err)
 	}
 
+	if ctx.Err() == context.Canceled || ctx.Err() == context.DeadlineExceeded {
+		return nil, ctx.Err()
+	}
+

@giuseppe actually, after rereading this comment, you are right: we still have that race. See my response to @mtrmac for thoughts on fixing this by bumping up the attempt number or, on the crio side, adding a random string to the name and cleaning up older leaked resources.

mrunalp

comment created time in 7 days

pull request commentcri-o/cri-o

Check for context errors before returning from longer requests

@mitr you are absolutely right.

If so, I can’t see how this PR truly fixes the problem; it does narrow the failure timing significantly (assuming the CRI-O context timeout and the Kubelet timeouts are very similar), but it doesn’t eliminate it — if the timeout value is the same for the Kubelet and CRI-O, it starts for the Kubelet at the latest when the request is sent, but for CRI-O at the earliest when it is received, so the Kubelet is still going to timeout first.

Yes, the timeout value is the same, so the kubelet should be timing out first.

AFAICS, in the above scenario, fundamentally Kubelet must be able to detect that “the operation has succeeded previously” (i.e. notice that the name is reused and that implies this step can now be skipped) or start a separate operation with no shared state with the one that was considered a failure (i.e. allocate an entirely new, different, pod name, and eventually somehow clean up the orphaned resources that were successfully allocated but the Kubelet doesn’t actually know about). (I haven’t looked into the details, that may well involve extending the CRI.)

There is a field called attempt number in the pod request. If the kubelet bumps that up on timeouts, then we wouldn't have these clashes. We would still have to clean up any resources created by previous attempts if they escaped because of the race where we returned success but the kubelet timed out before receiving it. The current PR doesn't fix that race, but it is an attempt to reduce the number of leaks compared to the status quo.

I also thought about reusing the previously created pod as an option, but I worry that it will be harder to get right than having the kubelet send a different attempt number, or having us append a random string to the end of the name and delete an old pod when we see one requested with the same parameters.
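
A small sketch of why bumping the attempt number avoids the clash. It assumes the reserved name has the `k8s_<name>_<namespace>_<uid>_<attempt>` shape seen in kubelet events; the exact format, the `nameIndex` type, and the shortened UID are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

var errNameReserved = errors.New("name is reserved")

// sandboxName builds a name that includes the attempt number.
// The exact format is an assumption of this sketch.
func sandboxName(name, namespace, uid string, attempt uint32) string {
	return fmt.Sprintf("k8s_%s_%s_%s_%d", name, namespace, uid, attempt)
}

// nameIndex is a hypothetical stand-in for the runtime's reservation table.
type nameIndex map[string]string

// reserve fails if the name is already held by another ID.
func (n nameIndex) reserve(name, id string) error {
	if _, taken := n[name]; taken {
		return errNameReserved
	}
	n[name] = id
	return nil
}

func main() {
	names := nameIndex{}

	// The first request succeeds in the runtime, but the kubelet times out.
	first := sandboxName("kube-controller-manager-ubu", "kube-system", "ff3dda92", 6)
	fmt.Println(names.reserve(first, "id-1"))

	// A retry with the same attempt number clashes with the leaked name.
	fmt.Println(names.reserve(first, "id-2"))

	// A retry with a bumped attempt number gets a fresh name.
	bumped := sandboxName("kube-controller-manager-ubu", "kube-system", "ff3dda92", 7)
	fmt.Println(names.reserve(bumped, "id-3"))
}
```

The entry for the earlier attempt is still in the table after the retry succeeds, which is the leftover resource that would need a separate cleanup.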

mrunalp

comment created time in 7 days

pull request commentopencontainers/runc

Separate systemd dbus connection initialization from running check

@opencontainers/runc-maintainers ptal

mrunalp

comment created time in 7 days

push eventmrunalp/runc

Akihiro Suda

commit sha 5c20ea1472dbeeebdb1bcef31a09888890a25b3a

fix merging #2177 and #2169 A new method was added to the cgroup interface when #2177 was merged. After #2177 got merged, #2169 was merged without rebase (sorry!) and compilation was failing: libcontainer/cgroups/fs2/fs2.go:208:22: container.Cgroup undefined (type *configs.Config has no field or method Cgroup) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Akihiro Suda

commit sha 55f8c254beb00f916c115a7034f7eee0cfd657a1

temporarily disable CRIU tests Ubuntu kernel is temporarily broken: https://github.com/opencontainers/runc/pull/2198#issuecomment-571124087 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Qiang Huang

commit sha 709377ca558df88ea538852c9310b700f140fc9b

Merge pull request #2198 from AkihiroSuda/criu-master temporarily disable CRIU tests

Mrunal Patel

commit sha 36ac9a37d9f0f4e571c6c4678692036074fd9e64

systemd: Export function IsSystemRunning We export this function so it could be used outside the package. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

Mrunal Patel

commit sha 33771c78a5cfd12f3e06d14e2eb569eb13549683

Refactor systemd connection initialization 1. Use the simpler IsRunningSystemd for checking if systemd is running. 2. Lazy initialization the systemd dbus connection in the systemd package. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

push time in 7 days

push eventmrunalp/runc

Akihiro Suda

commit sha 88e8350de28e2bdb79a051a7b8504b21e810b341

cgroup2: split fs2 from fs split fs2 package from fs, as mixing up fs and fs2 is very likely to result in unmaintainable code. Inspired by containerd/cgroups#109 Fix #2157 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Akihiro Suda

commit sha ec49f98d72aa7d3fc61eb0d9f4dd7915abcad114

fs2: support legacy device spec (to pass CI) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Mrunal Patel

commit sha 5cc0deaf7a089a91a5ce4b81f835b64fcc4778d6

Merge pull request #2169 from AkihiroSuda/split-fs cgroup2: split fs2 from fs

Mrunal Patel

commit sha b66df411de4f94d3c9152e7363c12587eb098c55

systemd: Export function IsSystemRunning We export this function so it could be used outside the package. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

Mrunal Patel

commit sha 610df4672ea7da02061fe88472a3412455315aba

Refactor systemd connection initialization 1. Use the simpler IsRunningSystemd for checking if systemd is running. 2. Lazy initialization the systemd dbus connection in the systemd package. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

push time in 8 days

push eventopencontainers/runc

Akihiro Suda

commit sha 88e8350de28e2bdb79a051a7b8504b21e810b341

cgroup2: split fs2 from fs split fs2 package from fs, as mixing up fs and fs2 is very likely to result in unmaintainable code. Inspired by containerd/cgroups#109 Fix #2157 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Akihiro Suda

commit sha ec49f98d72aa7d3fc61eb0d9f4dd7915abcad114

fs2: support legacy device spec (to pass CI) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Mrunal Patel

commit sha 5cc0deaf7a089a91a5ce4b81f835b64fcc4778d6

Merge pull request #2169 from AkihiroSuda/split-fs cgroup2: split fs2 from fs

push time in 8 days

PR merged opencontainers/runc

cgroup2: split fs2 from fs

split fs2 package from fs, as mixing up fs and fs2 is very likely to result in unmaintainable code.

Inspired by containerd/cgroups#109

Fix #2157

Signed-off-by: Akihiro Suda akihiro.suda.cz@hco.ntt.co.jp


This PR is large, but almost all changes are in libcontainer/cgroups/fs2 and libcontainer/cgroups/systemd/unified_hierarchy.go. Other changes are just for deduplicating files across cgroups/fs and cgroups/fs2.

+963 -901

8 comments

43 changed files

AkihiroSuda

pr closed time in 8 days

issue closedopencontainers/runc

cgroup2: v2 subsystems should not call WriteCgroupProc()

https://github.com/opencontainers/runc/blob/c4d8e1688c816a8cef632a3b44a38611511b7140/libcontainer/cgroups/fs/cpu_v2.go#L32-L42

v2 subsystems should not call WriteCgroupProc(), because it should be managed by Manager rather than each of subsystems for unified mode

https://github.com/opencontainers/runc/pull/2148#issuecomment-545762055

The subsystem interface should be modified like this:

diff --git a/libcontainer/cgroups/fs/apply_raw.go b/libcontainer/cgroups/fs/apply_raw.go
index 512fd700..66ab43ec 100644
--- a/libcontainer/cgroups/fs/apply_raw.go
+++ b/libcontainer/cgroups/fs/apply_raw.go
@@ -58,20 +58,24 @@ func (s subsystemSet) Get(name string) (subsystem, error) {
 }
 
 type subsystem interface {
        // Name returns the name of the subsystem.
        Name() string
        // Returns the stats, as 'stats', corresponding to the cgroup under 'path'.
        GetStats(path string, stats *cgroups.Stats) error
+       // Set the cgroup represented by cgroup.
+       Set(path string, cgroup *configs.Cgroup) error
+}
+
+type subsystemV1 interface {
+       subsystem
        // Removes the cgroup represented by 'cgroupData'.
        Remove(*cgroupData) error
        // Creates and joins the cgroup represented by 'cgroupData'.
        Apply(*cgroupData) error
-       // Set the cgroup represented by cgroup.
-       Set(path string, cgroup *configs.Cgroup) error
 }
 
 type Manager struct {
        mu       sync.Mutex
        Cgroups  *configs.Cgroup
        Rootless bool // ignore permission-related errors
        Paths    map[string]string
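
The interface split proposed in the diff can be sketched in isolation as below. The method signatures are simplified placeholders (plain strings and ints instead of runc's `*cgroupData` and `*configs.Cgroup`), and `cpuV1`, `cpuV2`, and `joinsPerSubsystem` are hypothetical names.

```go
package main

import "fmt"

// subsystem is the part shared by both cgroup versions.
type subsystem interface {
	Name() string
	Set(path string, value string) error
}

// subsystemV1 adds the per-subsystem join and remove steps used by v1.
type subsystemV1 interface {
	subsystem
	Apply(pid int) error
	Remove() error
}

// cpuV2 only implements the shared part; joining the cgroup is left to
// the manager in unified mode.
type cpuV2 struct{}

func (cpuV2) Name() string                 { return "cpu" }
func (cpuV2) Set(path, value string) error { return nil }

// cpuV1 also joins and removes the cgroup itself.
type cpuV1 struct{ pids []int }

func (*cpuV1) Name() string                 { return "cpu" }
func (*cpuV1) Set(path, value string) error { return nil }
func (c *cpuV1) Apply(pid int) error        { c.pids = append(c.pids, pid); return nil }
func (c *cpuV1) Remove() error              { c.pids = nil; return nil }

// joinsPerSubsystem reports whether a subsystem attaches processes itself.
func joinsPerSubsystem(s subsystem) bool {
	_, ok := s.(subsystemV1)
	return ok
}

func main() {
	fmt.Println(joinsPerSubsystem(cpuV2{}), joinsPerSubsystem(&cpuV1{}))
}
```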

closed time in 8 days

AkihiroSuda

pull request commentopencontainers/runc

cgroup2: split fs2 from fs

LGTM

AkihiroSuda

comment created time in 8 days

pull request commentopenshift/release

Add config and job for cri-o 1.17

@stevekuznetsov ptal.

umohnani8

comment created time in 8 days

pull request commentopenshift/release

Add config and job for cri-o 1.17

/lgtm

umohnani8

comment created time in 8 days

pull request commentcri-o/cri-o

[1.14] Destroy the pod's network when it can't be restored

/lgtm

umohnani8

comment created time in 8 days

pull request commentcri-o/cri-o

Bump version to 1.17.0-rc1

/lgtm

umohnani8

comment created time in 8 days

PR opened cri-o/cri-o

[1.17] Check for context erroring before returning from longer requests

Signed-off-by: Mrunal Patel mrunalp@gmail.com

+8 -0

0 comment

2 changed files

pr created time in 8 days

create branch mrunalp/cri-o

branch : check_ctx_1.17

created branch time in 8 days

Pull request review commentcri-o/cri-o

Check for context errors before returning from longer requests

 func (s *Server) CreateContainer(ctx context.Context, req *pb.CreateContainerReq
 		log.Errorf(ctx, "%v", err)
 	}
 
+	if ctx.Err() == context.Canceled || ctx.Err() == context.DeadlineExceeded {
+		return nil, ctx.Err()
+	}
+

Yeah, that is possible, but that should be okay since we end up cleaning up the pod in CRI-O, which is the main goal of this PR. The issue earlier was that we weren't cleaning up at all, so the kubelet and crio were out of sync w.r.t. the pod state. Also, I think we need more of these checks sprinkled through the function as a follow-up.

mrunalp

comment created time in 8 days

pull request commentcri-o/cri-o

WIP: Check for context errors before returning from longer requests

Awesome, thanks for testing!

On Jan 11, 2020, at 6:12 PM, Daphne Maddox notifications@github.com wrote:

Seems to fix the issue as I mentioned experiencing it. The name is reserved error was pretty frequent (though I won't bore you with the log lines)... now it's gone.

(before this PR)

Normal Created 52m kubelet, ubu Created container kube-controller-manager
Normal Started 52m kubelet, ubu Started container kube-controller-manager
Warning FailedCreatePodSandBox 44m kubelet, ubu Failed to create pod sandbox: rpc error: code = Unknown desc = error reserving pod name k8s_kube-controller-manager-ubu_kube-system_ff3dda92fdafad8918c8b8c81e70d5fc_6 for id f37693177f3520cee653f9276c6560f481a3616fb493b76a4c7917d1046db31d: name is reserved
Normal SandboxChanged 44m (x2 over 44m) kubelet, ubu Pod sandbox changed, it will be killed and re-created.
Normal Pulled 44m kubelet, ubu Container image "k8s.gcr.io/kube-controller-manager:v1.17.0" already present on machine
Normal Created 44m kubelet, ubu Created container kube-controller-manager
Normal Started 44m kubelet, ubu Started container kube-controller-manager

deployed this PR:

Normal SandboxChanged 12m kubelet, ubu Pod sandbox changed, it will be killed and re-created.
Normal Pulled 12m kubelet, ubu Container image "k8s.gcr.io/kube-controller-manager:v1.17.0" already present on machine
Normal Created 12m kubelet, ubu Created container kube-controller-manager
Normal Started 12m kubelet, ubu Started container kube-controller-manager
Normal Pulled 7m41s kubelet, ubu Container image "k8s.gcr.io/kube-controller-manager:v1.17.0" already present on machine
Normal Created 7m40s kubelet, ubu Created container kube-controller-manager
Normal Started 7m40s kubelet, ubu Started container kube-controller-manager
Normal SandboxChanged 2m19s kubelet, ubu Pod sandbox changed, it will be killed and re-created.
Normal Pulled 2m12s kubelet, ubu Container image "k8s.gcr.io/kube-controller-manager:v1.17.0" already present on machine
Normal Created 2m10s kubelet, ubu Created container kube-controller-manager
Normal Started 2m10s kubelet, ubu Started container kube-controller-manager

mrunalp

comment created time in 10 days

pull request commentcri-o/cri-o

WIP: Check for context errors before returning from longer requests

/launch e2e-aws

mrunalp

comment created time in 10 days

pull request commentcri-o/cri-o

doc: improve setup.md

You need to sign your commits :)

dougsland

comment created time in 10 days

pull request commentcri-o/cri-o

doc: improve setup.md

/retest

dougsland

comment created time in 10 days

pull request commentcri-o/cri-o

WIP: Check for context erroring before returning from longer requests

/retest

mrunalp

comment created time in 10 days

PR opened cri-o/cri-o

WIP: Check for context erroring before returning from longer requests

This is a possible fix for #2984

The issue seems to be that the kubelet times out on the create request but crio doesn't error out when kubelet client times out. So, we end up with the pod being created and then kubelet keeps on trying to create the pod again and it can't since the name is reserved.

With the change, we error out on timeouts so the pod creation fails and the name is unreserved so later attempts by the kubelet to create a pod or container may succeed.

We may want to add more of these checks around long running operations rather than just at the end in follow-ups.
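
A minimal sketch of the behavior described above, under assumptions: `createPod`, `nameRegistry`, and `retryAfterTimeout` are hypothetical, and a `time.Sleep` stands in for the long-running creation steps. It shows the pattern of checking the context before returning success and releasing the reserved name on failure.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// nameRegistry is a hypothetical stand-in for the runtime's reserved names.
type nameRegistry map[string]bool

// createPod reserves the name, does some slow work, and checks the context
// before reporting success. On any error the deferred cleanup releases the
// name so a later attempt can reserve it again.
func createPod(ctx context.Context, names nameRegistry, name string, work time.Duration) (err error) {
	if names[name] {
		return errors.New("name is reserved")
	}
	names[name] = true
	defer func() {
		if err != nil {
			delete(names, name)
		}
	}()

	time.Sleep(work) // stand-in for the long-running creation steps

	// If the client already gave up, fail instead of returning success.
	if ctxErr := ctx.Err(); ctxErr != nil {
		return ctxErr
	}
	return nil
}

// retryAfterTimeout runs a request that outlives its deadline, then a retry.
func retryAfterTimeout() (first, second error) {
	names := nameRegistry{}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	first = createPod(ctx, names, "pod_1", 50*time.Millisecond)

	second = createPod(context.Background(), names, "pod_1", 0)
	return first, second
}

func main() {
	first, second := retryAfterTimeout()
	fmt.Println("first attempt:", first)
	fmt.Println("retry:", second)
}
```

Without the context check, the first call would return nil, the name would stay reserved, and the retry would fail with "name is reserved".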

Signed-off-by: Mrunal Patel mrunalp@gmail.com

+8 -0

0 comment

2 changed files

pr created time in 10 days

create branch mrunalp/cri-o

branch : check_ctx

created branch time in 10 days

PR opened opencontainers/runc

Separate systemd dbus connection initialization from running check

This allows us to catch and report any failures in initializing the systemd dbus connection.
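
One common way to get this behavior in Go is lazy initialization with `sync.Once` that stores the error alongside the connection. The sketch below is an assumption about the general shape, not the PR's code: `conn`, `lazyConn`, and the injected `dial` function are placeholders for the real dbus types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// conn is a placeholder for a systemd dbus connection.
type conn struct{}

var errNoBus = errors.New("cannot connect to system bus")

// lazyConn opens the connection on first use and remembers the outcome,
// including any error, so callers can report it.
type lazyConn struct {
	once sync.Once
	dial func() (*conn, error)
	c    *conn
	err  error
}

// get dials at most once and returns the cached connection or error.
func (l *lazyConn) get() (*conn, error) {
	l.once.Do(func() {
		l.c, l.err = l.dial()
	})
	return l.c, l.err
}

func main() {
	calls := 0
	ok := &lazyConn{dial: func() (*conn, error) {
		calls++
		return &conn{}, nil
	}}
	ok.get()
	ok.get()
	fmt.Println("dial calls:", calls)

	broken := &lazyConn{dial: func() (*conn, error) { return nil, errNoBus }}
	_, err := broken.get()
	fmt.Println("error:", err)
}
```

Keeping the "is systemd running" check separate from `get` means a cheap check cannot hide a failed connection attempt.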

+39 -29

0 comment

4 changed files

pr created time in 11 days

create branch mrunalp/runc

branch : systemd_conn_cleanup

created branch time in 11 days

push eventcri-o/cri-o

Urvashi Mohnani

commit sha 8fbed3819825dc8d267e41a6ded9b68498badcd6

Destroy the pod's network when it can't be restored If a pod cannot be restored after restart, destroy the pod network and release the IPs for future use Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>

Mrunal Patel

commit sha 9e3db6606f7b6a172692bc79d695ac5e39044ead

Merge pull request #3100 from openshift-cherrypick-robot/cherry-pick-3096-to-release-1.16 [release-1.16] Destroy the pod's network when it can't be restored

push time in 11 days

PR merged cri-o/cri-o

[release-1.16] Destroy the pod's network when it can't be restored approved dco-signoff: yes lgtm ok-to-test size/XS

This is an automated cherry-pick of #3096

/assign umohnani8

+4 -3

11 comments

1 changed file

openshift-cherrypick-robot

pr closed time in 11 days

pull request commentcri-o/cri-o

[1.16] Don't check for closed namespaces

Kata is broken. Feel free to force merge.

On Jan 10, 2020, at 5:09 PM, Daniel J Walsh notifications@github.com wrote:

/test kata-containers

sysrich

comment created time in 11 days

pull request commentcri-o/cri-o

server: create cgroupns when running on cgroup v2

/lgtm

giuseppe

comment created time in 11 days

pull request commentcri-o/cri-o

[1.16] Don't check for closed namespaces

/lgtm

sysrich

comment created time in 11 days

pull request commentcri-o/cri-o

[1.16] Don't check for closed namespaces

/retest

sysrich

comment created time in 11 days

pull request commentcri-o/cri-o

Destroy the pod's network when it can't be restored

/approve

umohnani8

comment created time in 11 days

pull request commentcri-o/cri-o

[1.17] Add logic for running openshift e2e-aws tests

/lgtm

umohnani8

comment created time in 13 days

push eventmrunalp/cri-o

Mrunal Patel

commit sha a0cb8161d6fd47f29fb824c985950acb1cabdf6f

Update to conmon v2.0.9 Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

push time in 14 days

PR opened cri-o/cri-o

Update to conmon v2.0.9

Signed-off-by: Mrunal Patel mrunalp@gmail.com

@haircommander @umohnani8 @rhatdan @saschagrunert ptal

+10 -5

0 comment

7 changed files

pr created time in 14 days

issue closedcri-o/cri-o

CRI-O fails to start container due to missing log directory


Description

Creating a container following the sandbox example for crictl from https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md fails to start the container. Checking the logs, I found that journalctl was reporting a "no such file or directory" error. If I update the pod and container configs to not specify the log directory, the containers start in the pods successfully. This has been impacting the ability to start Kubernetes with the CRI-O runtime.

Steps to reproduce the issue:

  1. Bare-minimum install of CentOS 7.7.1908
  2. Add the CRI-O repo from cbs.centos.org/repos/paas7-crio-115-release
  3. yum install cri-o
  4. Add the Kubernetes repo from packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
  5. yum install cri-tools
  6. Disable swap
  7. Start the cri-o service
  8. Create the pod: crictl runp pod-config.json
  9. Create the container: crictl create $POD_ID container-config.json pod-config.json

Describe the results you received:

The container does not start, with the following error:

FATA[0000] Creating container failed: rpc error: code = Unknown desc = exit status 1: write child: broken pipe

Describe the results you expected:

Have a running container

Additional information you deem important (e.g. issue happens only occasionally):

  • Edited pod config to not specify /tmp as log_directory
  • Edited container config to use log_path as busybox-0.log instead of busybox/0.log

Output of crio --version:

crio version 1.15.1-2.el7

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on a bare-metal install on an Intel NUC5CPYH

closed time in 14 days

rancid-racer

issue commentcri-o/cri-o

CRI-O fails to start container due to missing log directory

@rancid-racer Thanks!

rancid-racer

comment created time in 14 days

pull request commentcri-o/cri-o

oci: Handle timeouts correctly for probes

Yeah, I am going to open a PR to update conmon for master. We already did that for 1.16.

mrunalp

comment created time in 14 days

pull request commentcri-o/cri-o

update createSandboxContainer to parse hugepages limits from CRI message

Looks fine to me. @saschagrunert you want to check the dependency update here?

bg-chun

comment created time in 14 days

pull request commentcri-o/cri-o

[1.15] test: add changes required for fedora 30 AMI

/lgtm

haircommander

comment created time in 14 days

pull request commentcri-o/cri-o

[1.14] skip test failing on mount eperm

/lgtm

haircommander

comment created time in 14 days

pull request commentcri-o/cri-o

Fix possible segmentation fault in error handling

/lgtm

saschagrunert

comment created time in 14 days

push eventcri-o/cri-o

Mrunal Patel

commit sha 4a7fd40130bbb9174596a17dc7c3692792d58531

Update conmon to v2.0.9

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


Mrunal Patel

commit sha 486dedf7736c467aa68ae1b6149c4cb6a2d0ab88

oci: Handle timeouts correctly for probes

We shouldn't return a custom error when an exec sync command times out. Instead, we should return a non-zero exit code. When the prober code in kubernetes sees a custom error it goes into Unknown status and doesn't restart containers on timed out probes as expected.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


Mrunal Patel

commit sha 639f39b214f1752b9c724c00160470300e62ebbc

Merge pull request #3065 from mrunalp/fix_probe_timeouts

[1.16] oci: Handle timeouts correctly for probes


push time in 15 days

PR merged cri-o/cri-o

[1.16] oci: Handle timeouts correctly for probes approved dco-signoff: yes lgtm size/M

We shouldn't return a custom error when an exec sync command times out. Instead, we should return a non-zero exit code.

When the prober code in kubernetes sees a custom error it goes into Unknown status and doesn't restart containers on timed out probes as expected.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

+24 -6

11 comments

8 changed files

mrunalp

pr closed time in 15 days

pull request commentcri-o/cri-o

[1.16] oci: Handle timeouts correctly for probes

/retest

mrunalp

comment created time in 15 days

pull request commentcri-o/cri-o

[1.16] oci: Handle timeouts correctly for probes

/retest

mrunalp

comment created time in 15 days

pull request commentcri-o/cri-o

[1.16] oci: Handle timeouts correctly for probes

/test launch-aws

mrunalp

comment created time in 15 days

pull request commentcri-o/cri-o

[1.16] oci: Handle timeouts correctly for probes

PR updated with conmon bump.

mrunalp

comment created time in 15 days

push eventmrunalp/cri-o

Mrunal Patel

commit sha 486dedf7736c467aa68ae1b6149c4cb6a2d0ab88

oci: Handle timeouts correctly for probes

We shouldn't return a custom error when an exec sync command times out. Instead, we should return a non-zero exit code. When the prober code in kubernetes sees a custom error it goes into Unknown status and doesn't restart containers on timed out probes as expected.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


push time in 15 days

push eventmrunalp/cri-o

Mrunal Patel

commit sha f73f12e508d10de2f91b05f09ea27d92a7ce1883

test: Update to golang 1.13

kubernetes requires golang 1.13

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


Mrunal Patel

commit sha ac7fb946daeb64023e9509c9f5ebe03bb9ee41d5

test: Remove godoc since it was dropped from go 1.13

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


Mrunal Patel

commit sha cccde399af72b337c99d67098e23d28b3bb50634

Merge pull request #3068 from mrunalp/fix_ci_golang_1.13

[1.16] Use golang 1.13


Mrunal Patel

commit sha 4a7fd40130bbb9174596a17dc7c3692792d58531

Update conmon to v2.0.9

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


Mrunal Patel

commit sha daf4a16866521eff6af84d4d51ae92896bfffc8d

oci: Handle timeouts correctly for probes

We shouldn't return a custom error when an exec sync command times out. Instead, we should return a non-zero exit code. When the prober code in kubernetes sees a custom error it goes into Unknown status and doesn't restart containers on timed out probes as expected.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


push time in 15 days

pull request commentcontainers/conmon

Bump to 2.0.9

LGTM

haircommander

comment created time in 15 days

push eventcri-o/cri-o

Mrunal Patel

commit sha f73f12e508d10de2f91b05f09ea27d92a7ce1883

test: Update to golang 1.13

kubernetes requires golang 1.13

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


Mrunal Patel

commit sha ac7fb946daeb64023e9509c9f5ebe03bb9ee41d5

test: Remove godoc since it was dropped from go 1.13

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>


Mrunal Patel

commit sha cccde399af72b337c99d67098e23d28b3bb50634

Merge pull request #3068 from mrunalp/fix_ci_golang_1.13

[1.16] Use golang 1.13


push time in 16 days

PR merged cri-o/cri-o

[1.16] Use golang 1.13 approved dco-signoff: yes lgtm size/XS

kubernetes requires go 1.13

+1 -2

5 comments

2 changed files

mrunalp

pr closed time in 16 days

pull request commentcri-o/cri-o

[1.16] Use golang 1.13

Merging since kata tests shouldn't be affected by this change.

mrunalp

comment created time in 16 days

pull request commentcri-o/cri-o

[1.16] Use golang 1.13

/retest

mrunalp

comment created time in 16 days

pull request commentcontainers/conmon

Add TimedOutMessage to config to share with go code

@rhatdan @haircommander @giuseppe ptal

mrunalp

comment created time in 16 days

PR opened cri-o/cri-o

[1.16] Use golang 1.13

kubernetes requires go 1.13

+1 -2

0 comments

2 changed files

pr created time in 16 days

create branchmrunalp/cri-o

branch : fix_ci_golang_1.13

created branch time in 16 days
