Aleksa Sarai (cyphar)
@SUSE Linux GmbH · Oceania (circa 1984)
https://www.cyphar.com

Backdoors are courtesy of the Australian Government: a "designated communications provider" under §317C(6) of the Telecommunications Act 1997.

cyphar/devgibson 53

Hackers in yo' kernel!

cyphar/filepath-securejoin 33

Proposed filepath.SecureJoin implementation

cyphar/dotfiles 22

... everybody needs one, right?

cyphar/cyphar.com 13

My personal website.

cyphar/initrs 13

[OUTDATED] Please switch to https://github.com/openSUSE/catatonit

cyphar/ascii-snake 10

A remake of the old Nokia Snake game.

cyphar/keplerk2-halo 3

A research project to see if photometry can be done on halo contamination of K2 postage stamps.

cyphar/comp2129 2

COMP2129 2017

cyphar/comp3520 2

COMP3520 2019

cyphar/epyc 2

Embedded PYthon Code (a templating engine).

Pull request review comment: opencontainers/tob

Draft of a "getting started" document for the OCI

# Getting Started

While the [OCI Charter](./CHARTER.md) is our official governing document, many people have asked for a gentler introduction to the Open Container Initiative as well as general guidance on how to interact with the projects and specifications found under the OCI umbrella. This document attempts to be that starting point for those who may be new to the OCI, interested in participation, or who want to understand if a project is a fit for inclusion or contribution to the OCI.

## What is the OCI?

As our [website](https://opencontainers.org) states, chiefly the OCI is "an open governance structure for the express purpose of creating open industry standards around container formats and runtime." Created in 2015, the core initial purpose was to align Docker and other ecosystem players around a runtime and image specification that would become an industry baseline for how images are packaged and executed in OCI compliant runtimes.

Since that initial work, finalized in 1.0 specs in the summer of 2017, a distribution specification is now underway, based on the HTTP API of the initial Docker distribution (registry) project. In 2019, an "artifacts" project was approved by the TOB and will be used to define other artifact types (in addition to OCI images) which may be defined and stored in registries conforming to the OCI distribution spec.

## Who uses the OCI?

It makes the most sense to see consumers of OCI as fitting into a few specific categories: contributors/members, implementors, and end users.

**Contributors**, including the project maintainers, and member companies have a vested interest in bringing forward the "state of the art" with respect to the scope of the OCI. Currently this is of course limited to specifications around runtime, image, and distribution of containers. The artifacts repository and project is related to both image and distribution and is a natural expansion of OCI into ongoing experimentation with registries and cloud native tooling. Specifically, [artifacts](https://github.com/opencontainers/artifacts) expands the concept around the **what** when discussing non-OCI image content (Helm charts, Singularity container images, SPDX, source bundles, etc.) and registries.

**Implementors** own projects outside the OCI for which they have determined value in being "OCI compliant"; whether that's a registry, a unique container runtime, or an end-user tool that talks to registries. They take the OCI specifications and implement support for them in their project, and potentially will use the conformance suite within the OCI to validate their compliance to the specification(s).

**End Users** tend to gain value from the OCI specifications in an indirect way: they will expect projects and products that claim OCI compliance to interoperate smoothly with other projects and products which are OCI compliant. They will look to the OCI to continue maturing conformance and specifications to best support the cloud native ecosystem's goal of interoperability for runtimes, images, and distributing images and artifacts across clouds and platforms.

## What Types of Projects Exist within the OCI?

There has been some growth in the nature and use of the OCI since those initial meetings around the image and runtime specification in 2015. The following subsections define project categories which exist in the OCI today.

### Specifications

Clearly the image, runtime, and distribution specifications are the key reason for the existence of the OCI today. These standards are meant to provide a baseline to follow for implementors of runtimes, registries, and tools which interact with container images and registries.
 - [Runtime spec](https://github.com/opencontainers/runtime-spec)
 - [Image spec](https://github.com/opencontainers/image-spec)
 - [Distribution spec](https://github.com/opencontainers/distribution-spec)

### Spec Conformance Test

Conformance tests should provide a clear, end-user runnable test suite for implementors to use to determine and demonstrate that an implementing project or product meets the explicit definitions of the specification.
 - [OCI Conformance](https://github.com/opencontainers/oci-conformance)

The most advanced conformance implementation to date is for the new distribution specification. Additional work on image and runtime conformance is ongoing.

### Libraries

While hosting library code is not a common goal of the OCI, there are a few specific cases where small, non-end user focused, and tightly scoped libraries have been accepted into the OCI. The common denominator for these libraries are that they help implementors properly use features of the specifications:
 - [go-digest](https://github.com/opencontainers/go-digest)
 - [selinux](https://github.com/opencontainers/selinux)

Utilities and end-user UX-oriented code is most likely better targeted at other more broad communities like the [CNCF](https://cncf.io). While there are not explicit rules, a discussion with the TOB is warranted for projects looking to contribute a library to OCI.

### Reference Implementations

While theoretically a specification can have one or more reference implementations, the OCI runtime spec and the program, `runc`, have gone hand in hand simply due to the particulars around the founding of the OCI.

It is not inconceivable that more reference implementations would be contributed and supported within the OCI, but at this point, the only active and viable reference implementation within the OCI is the `runc` implementation of the runtime specification, based around the contributed **libcontainer** codebase contributed by Docker.
 - [runc](https://github.com/opencontainers/runc)

Runc is also unique in that it is an open source reference implementation, but also a core dependent ecosystem component underneath the majority of container engine implementations in use today.

For any future reference implementation to be adopted by the OCI, it would need to be kept in sync with the specification it implements. For a change to be accepted to the spec, the equivalent implementation would need to be authored and accepted as well.

## Should my Project be in the OCI?

The OCI receives proposals suggesting additions to the current suite of project types listed above. We understand that a perfect framework for determining inclusion or rejection of these proposals is an intangible goal. However, we list the following considerations to help guide future potential submissions.

 1. The OCI, unlike the CNCF, is not directly chartered for the advancement, marketing, and support of general cloud native software projects.
 2. Projects consumed via a UI and/or with a significant end user experience component are unlikely to be a good fit for the OCI model.

I think that this boils down to the "dirt" or "boring infrastructure" question -- we should phrase this as effectively saying "only boring projects make sense within the OCI" (though obviously without sounding so pessimistic about the prospect). There actually is text in the OCI Charter which mentions that this was the intention of the OCI from the outset:

The Open Container Initiative does not seek to be a marketing organization, define a full stack or solution requirements, and will strive to avoid standardizing technical areas undergoing innovation and debate.

estesp

comment created time in 4 days

Pull request review comment: opencontainers/tob

Draft of a "getting started" document for the OCI

[Quoted draft as above, ending with a third guideline:]

 3. The OCI has an small but active and vibrant group of participants today; however the specifications and related projects are a small niche of the overall cloud native world, and seeking out the OCI to validate or grow a community around a young project is unlikely to be a viable model. The CNCF is much more suited to that aim given the sandbox and maturity model.

I think this should be merged into (2) as being part of the "boring infrastructure" point. In fact, I think most of the guidelines could be considered to be part of the "boring infrastructure" point -- maybe we should break it down like this:

The OCI receives proposals suggesting additions to the current suite of project types listed above. We understand that a perfect framework for determining inclusion or rejection of these proposals is an intangible goal. However, we list the following considerations to help guide the review of future project submissions.

  • The project should be a piece of "boring core container infrastructure". This is a partially subjective criterion, but the key factors that make something fulfill this requirement are:

    1. The project should be as un-opinionated and extensible as is reasonably possible.
    2. Rather than being a complete solution or framework, the project should be usable as a building-block for larger solutions and frameworks.
    3. (probably more things go here...)
  • It should fit logically into the scope and mission of the OCI -- to provide a home for open standards and tools which underpin the wider container ecosystem. This means that it should not conflict with existing OCI projects, nor should it be completely unrelated to existing OCI projects. The precise scope of the OCI is defined by the OCI Charter, but the TOB may choose to expand the scope if they feel it is within the mission of the OCI.

estesp

comment created time in 4 days

Pull request review comment: opencontainers/tob

Draft of a "getting started" document for the OCI

[Quoted draft as above, through the "Reference Implementations" section, ending at "...container engine implementations in use today."]

To be clear, I do have a conflict of interest when it comes to this section (as one of the proposals which triggered this entire discussion and review is #67 -- my proposal to include umoci as a reference implementation of the image-spec), so bear that in mind.

My main dislike of this section is the constant emphasis that runc is a special case, and the tone indicates (at least to me) that reference implementations are not welcome in the OCI. If this were the final text of this section, then I would be forced to vote against both #67 and #68 -- purely because this section is basically saying "any reference implementation other than runc is probably not meant to fit into the OCI". And I don't think (at least based on our calls) that's the TOB's stance on the issue.

I think a better way of phrasing this section (without special-casing runc) would be something more along the lines of:

The OCI does not require that its specifications have reference implementations, and any project which aims to be adopted by the OCI as a reference implementation has to be judged on its own merits. Reference implementations should be generally usable as a building block for other programs (thus should not be "too opinionated") and should have their features limited to the set of features outlined by the relevant specification. In addition, the popularity of the project (especially in production use-cases) should be taken into consideration -- reference implementations should be battle-tested, after all.

The first (and currently only viable) reference implementation included in the OCI was runc, as a reference implementation of the OCI runtime specification. This project's inclusion is a slight oddity of history: it was a production-ready project which predated the OCI's formation and was included as part of the original draft runtime specification. However, this strange history does not preclude the addition of new reference implementations -- it merely means that the process for adding one is not yet well-established in the OCI.

estesp

comment created time in 4 days

Pull request review comment: opencontainers/tob

Draft of a "getting started" document for the OCI

[Quoted draft as above, through the first "Should my Project be in the OCI?" guideline: "The OCI, unlike the CNCF, is not directly chartered for the advancement, marketing, and support of general cloud native software projects."]

I don't think you need to mention the CNCF here. We could just add it at the end of the list, as a general tip that the CNCF may better fit projects which don't match these criteria.

estesp

comment created time in 4 days

Pull request review comment: opencontainers/tob

Draft of a "getting started" document for the OCI

[Quoted draft as above, through the opening of the "What is the OCI?" mission statement.]

With #77 this mission statement should be updated.

estesp

comment created time in 6 days

Pull request review comment: opencontainers/tob

Draft of a "getting started" document for the OCI

[Quoted draft as above, through the opening reference to the OCI Charter.]

While you do say that the OCI Charter is our governing document, I would include an explicit comment along the lines of "if there are any conflicts between this document and the OCI Charter, the Charter takes precedence". Another question is whether changing this document means changing the guidelines for admission of new projects -- should that require a 2/3 TOB vote (as amending the OCI Charter does)? Note that a 2-LGTM "vote" is less than is required for a project to be admitted.

estesp

comment created time in 6 days

pull request comment: containers/crun

ebpf: replace the existing program on update

Also you mentioned you'd do a BPF_F_REPLACE check at runtime -- did you want to include that in this PR?

giuseppe

comment created time in 4 days

Pull request review comment: containers/crun

ebpf: replace the existing program on update

 bpf_program_complete_dev (struct bpf_program *program, libcrun_error_t *err arg_
   return program;
 }

+static int
+ebpf_attach_program (int fd, int dirfd, libcrun_error_t *err)
+{
+  const int MAX_ATTEMPTS = 20;
+  int attempt;
+
+  for (attempt = 0;; attempt++)
+    {
+      cleanup_close int replacefd = -1;
+      union bpf_attr attr;
+      uint32_t progs[2];
+      int ret;
+
+      memset (&attr, 0, sizeof (attr));
+      attr.query.target_fd = dirfd;
+      attr.query.attach_type = BPF_CGROUP_DEVICE;
+      attr.query.prog_cnt = sizeof (progs) / sizeof (progs[0]);
+      attr.query.prog_ids = (uint64_t) &progs;
+
+      ret = bpf (BPF_PROG_QUERY, &attr, sizeof (attr));
+      if (UNLIKELY (ret < 0))
+        return crun_make_error (err, errno, "bpf query");
+
+      if (attr.query.prog_cnt > 1)
+        return crun_make_error (err, 0, "invalid device cgroup state, more than one program installed");
+
+      if (attr.query.prog_cnt == 1)
+        {
+#ifdef BPF_F_REPLACE
+          memset (&attr, 0, sizeof (attr));
+          attr.prog_id = progs[0];

This logic (replacing whatever program is attached) and the corresponding limitations (not being able to replace a program if there is more than one) seems a bit fishy. I think a much better solution would be to either pin the program, or to store the id and load-time into some state file (which I assume crun has) and then to use that to fetch the original program.

To be fair, it's unlikely anybody else will be attaching BPF_CGROUP_DEVICE programs to a container but this seems a bit fishy to me.

giuseppe

comment created time in 4 days

Pull request review comment: containers/crun

ebpf: replace the existing program on update

[Quoted hunk as above, continuing:]

+          replacefd = bpf (BPF_PROG_GET_FD_BY_ID, &attr, sizeof (attr));
+          if (UNLIKELY (replacefd < 0))
+            {
+              if (errno == ENOENT && attempt < MAX_ATTEMPTS)
+                {
+                  /* Another update might have raced and updated, try again.  */
+                  continue;
+                }
+              return crun_make_error (err, errno, "cannot open existing eBPF program");
+            }
+#else
+          return crun_make_error (err, 0, "eBPF program already configured");

If BPF_F_REPLACE isn't supported -- which is the case for most kernels -- you should insert the new program then remove the old one. It won't be atomic, but it'll be much better than this.

giuseppe

comment created time in 4 days

Pull request review comment: opencontainers/runtime-spec

Add State status constants to spec-go

 package specs

+const (
+	// StateCreating indicates that the container is being created
+	StateCreating = "creating"
+
+	// StateCreated indicates that the runtime has finished the create operation
+	StateCreated = "created"
+
+	// StateRunning indicates that the container process has executed the
+	// user-specified program but has not exited
+	StateRunning = "running"
+
+	// StateStopped indicates that the container process has exited
+	StateStopped = "stopped"
+)

The pause-related ones aren't in the runtime-spec because we don't have a pause operation in the spec (among other missing things, such as the lack of an exec operation). I'd say that we should only define the ones which are actually listed in the spec and in runc we can just add the ones we have.
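To illustrate the split being suggested (a sketch only -- the package and constant names here are assumptions, not actual runc code): the spec bindings would carry only the four spec-defined states, and runc would layer its extra lifecycle states on top in its own package, e.g.:

package libcontainer

// States that exist in runc but have no runtime-spec equivalent (the
// spec has no pause operation) would live on the runc side, alongside
// the four spec-defined constants imported from the specs package.
const (
	// StatePaused indicates the container is paused (a runc extension).
	StatePaused = "paused"
)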

RenaudWasTaken

comment created time in 5 days

delete branch cyphar/runc

delete branch : devices-cgroup-header

delete time in 5 days

push event: opencontainers/runc

Akihiro Suda

commit sha 5b601c66d05635548b3098e6bbfc66c3ec2b5bcc

README.md: fix a dead link

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

view details

Aleksa Sarai

commit sha 21cb2360b61ad5fe971413f046102bd70698b534

merge branch 'pr-2427'

Akihiro Suda (1):
  README.md: fix a dead link

LGTMs: @kolyshkin @cyphar
Closes #2427

view details

push time in 5 days

PR merged opencontainers/runc

README.md: fix a dead link [easy-to-review]
+1 -1

2 comments

1 changed file

AkihiroSuda

pr closed time in 5 days


pull request comment: opencontainers/runc

README.md: fix a dead link

LGTM.

AkihiroSuda

comment created time in 5 days

delete branch opencontainers/runc

delete branch : add-cii-badge

delete time in 5 days

pull request comment: opencontainers/tob

Simplify mission of OCI

Alternatively, we could have a draft branch which we make changes against and then we have a final merge after a 2/3 TOB vote.

caniszczyk

comment created time in 5 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
 		return newSystemErrorWithCausef(err, "changing dir to %q", config.Rootfs)
 	}

+	s := iConfig.SpecState
+	if s != nil {
+		s.Pid = unix.Getpid()

For now, yes. But please open a runtime-spec issue so we don't lose track of it.

RenaudWasTaken

comment created time in 5 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
 		return newSystemErrorWithCausef(err, "changing dir to %q", config.Rootfs)
 	}

+	s := iConfig.SpecState
+	if s != nil {
+		s.Pid = unix.Getpid()

Thinking about it a little more -- I think that's a bug in the spec. Yes, there isn't an exec primitive in the specification, but you can manually share the PID namespace which would mean that the hook run inside the container won't be able to make use of the PID in the state JSON.

RenaudWasTaken

comment created time in 6 days

pull request comment: opencontainers/runtime-spec

seccomp: Add support for SCMP_ACT_KILL_PROCESS

LGTM.

pjbgf

comment created time in 6 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
 		return newSystemErrorWithCausef(err, "changing dir to %q", config.Rootfs)
 	}

+	s := iConfig.SpecState
+	if s != nil {
+		s.Pid = unix.Getpid()

Yeah that does make sense. It wouldn't hurt to mention it in the hook documentation.

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 type Capabilities struct {
 	Ambient []string
 }

+func (hooks *Hooks) RunHooks(name HookName, spec *specs.State) error {
+	hooksMap := map[HookName][]Hook{
+		Prestart:        hooks.Prestart,
+		CreateRuntime:   hooks.CreateRuntime,
+		CreateContainer: hooks.CreateContainer,
+		StartContainer:  hooks.StartContainer,
+		Poststart:       hooks.Poststart,
+		Poststop:        hooks.Poststop,
+	}
+	for i, hook := range hooksMap[name] {

Yup -- let's leave this for a separate PR.

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/tob

Simplify mission of OCI

-# Open Container Initiative Charter v1.1
+# Open Container Initiative Charter v1.2

 | Comment          | Effective Date   |
 | ---------------- |:----------------:|
 | Initial release  | 13 November 2015 |
-| Update           | 6 May 2020       |
+| Update           | 18 May 2020      |

 ##  1. Mission of the Open Container Initiative ("OCI").

-The Open Container Initiative provides an open source technical community within which industry participants may easily contribute to building vendor-neutral, portable and open specifications and runtime that deliver on the promise of containers as a source of application portability backed by a certification program.
+The Open Container Initiative provides an open source technical community within which industry participants may easily contribute to building vendor-neutral, portable and open specifications, reference implementations, and tools that deliver on the promise of containers as a source of application portability backed by a certification program.

This is a good interim change, but we really should have a proper rewrite of some of the more substantial parts of the charter. I'll send a draft PR sometime this week with the sorts of changes I'd propose us having.

caniszczyk

comment created time in 7 days

Pull request review comment: opencontainers/tob

Simplify mission of OCI

-# Open Container Initiative Charter v1.1
+# Open Container Initiative Charter v1.2

 | Comment          | Effective Date   |
 | ---------------- |:----------------:|
 | Initial release  | 13 November 2015 |
-| Update           | 6 May 2020       |
+| Update           | 18 May 2020      |

I think this table is meant to be appended to on each update. Actually, the Update entry from the first update last week probably should've been:

| v1.1 (update OCI scope)           | 6 May 2020        |
| v1.2 (further clarify OCI scope)  | 18 May 2020       |
caniszczyk

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 type Capabilities struct {
 	Ambient []string
 }

+func (hooks *Hooks) RunHooks(name HookName, spec *specs.State) error {

Also, this RunHooks helper is associated with the wrong structure IMHO. You could just make type HookList []Hook and then define a Run method for that. Then you could do

if err := hooks.Prestart.Run(state); err != nil {
    // ...
}

No need for config.HookName everywhere.
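
To make that concrete, here is a minimal sketch of the HookList idea -- the State and Hook types below are simplified stand-ins for the real libcontainer and runtime-spec types, not runc's actual definitions:

package configs

import "fmt"

// State is a stand-in for the runtime-spec state passed to hooks.
type State struct {
	Pid int
}

// Hook is a stand-in for libcontainer's hook interface.
type Hook interface {
	Run(state *State) error
}

// HookList is the list of hooks for a single lifecycle phase.
type HookList []Hook

// Run executes each hook in order, stopping at the first failure.
func (l HookList) Run(state *State) error {
	for i, h := range l {
		if err := h.Run(state); err != nil {
			return fmt.Errorf("error running hook #%d: %w", i, err)
		}
	}
	return nil
}

// Hooks then exposes one HookList per phase, so callers can write
// hooks.Prestart.Run(state) directly.
type Hooks struct {
	Prestart  HookList
	Poststart HookList
	Poststop  HookList
}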

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 type Capabilities struct {
 	Ambient []string
 }

+func (hooks *Hooks) RunHooks(name HookName, spec *specs.State) error {

Argument name should be state not spec.

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 type Capabilities struct {
 	Ambient []string
 }

+func (hooks *Hooks) RunHooks(name HookName, spec *specs.State) error {
+	hooksMap := map[HookName][]Hook{
+		Prestart:        hooks.Prestart,
+		CreateRuntime:   hooks.CreateRuntime,
+		CreateContainer: hooks.CreateContainer,
+		StartContainer:  hooks.StartContainer,
+		Poststart:       hooks.Poststart,
+		Poststop:        hooks.Poststop,
+	}
+	for i, hook := range hooksMap[name] {

Also, this RunHooks helper is unnecessary. You could just make type HookList []Hook and then define a Run method for that. Then you could do

if err := hooks.Prestart.Run(state); err != nil {
    // ...
}
RenaudWasTaken

comment created time in 7 days

pull request comment: opencontainers/runc

Add integration tests for Oci hooks: CreateRuntime, CreateContainer and StartContainer

Yeah, tests should be added to the same PR as the feature (if nothing else, to show that the code works). Reviewing tests is much simpler than code changes because if the tests pass and they look like they actually test the feature, then they're fine.

RenaudWasTaken

comment created time in 7 days

pull request comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

Needs a rebase as well.

/cc @opencontainers/runc-maintainers

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
 		return newSystemErrorWithCausef(err, "changing dir to %q", config.Rootfs)
 	}

+	s := iConfig.SpecState
+	if s != nil {
+		s.Pid = unix.Getpid()
+		s.Status = "creating"

Are these status strings really not defined as constants anywhere?

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 func prepareRootfs(pipe io.ReadWriter, iConfig *initConfig) (err error) {
 		return newSystemErrorWithCausef(err, "changing dir to %q", config.Rootfs)
 	}

+	s := iConfig.SpecState
+	if s != nil {
+		s.Pid = unix.Getpid()

This will always be equal to 1 (unless pid namespaces are disabled, obviously). Shouldn't this be the PID in the parent namespace -- which is already filled in? Then again, maybe the hook wants to know the PID of the container process in the namespace it's being run in? I'm not sure...

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 type Capabilities struct {
 	Ambient []string
 }

+func (hooks *Hooks) RunHooks(name HookName, spec *specs.State) error {
+	hooksMap := map[HookName][]Hook{
+		Prestart:        hooks.Prestart,
+		CreateRuntime:   hooks.CreateRuntime,
+		CreateContainer: hooks.CreateContainer,
+		StartContainer:  hooks.StartContainer,
+		Poststart:       hooks.Poststart,
+		Poststop:        hooks.Poststop,
+	}
+	for i, hook := range hooksMap[name] {

Why not make type Hooks map[HookName][]Hook? This would remove the need for this (and similar) code. Normally we couldn't do this because it would break backwards-compatibility with the on-disk config format, but because we always represented Hooks as a JSON map we can do this without issue. That would also theoretically allow us to get rid of all of this serialisation code.

In fact, if you add the marshal/unmarshal interfaces to the Hook interface you could remove all of the translation code. I also really think that the whole CommandHook/FunctionHook/Hook thing should just be removed, making everything a Hook with the semantics of CommandHook -- we don't support direct usage of libcontainer, and having these function hooks (which are ignored when serialised, meaning they don't work properly) is a bit silly.
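
As a rough sketch of what the map-based type buys us (the types below are simplified stand-ins, with Hook already reduced to the command-hook shape suggested above, not runc's actual definitions):

package main

import (
	"encoding/json"
	"fmt"
)

// Hook is reduced to the command-hook shape.
type Hook struct {
	Path string   `json:"path"`
	Args []string `json:"args,omitempty"`
}

type HookName string

// Hooks is keyed by hook name, matching the existing JSON representation.
type Hooks map[HookName][]Hook

func main() {
	// The on-disk format was already a JSON map, so this round-trips
	// without any custom (un)marshalling code.
	data := []byte(`{"prestart": [{"path": "/usr/bin/true"}]}`)
	var hooks Hooks
	if err := json.Unmarshal(data, &hooks); err != nil {
		panic(err)
	}
	fmt.Println(hooks["prestart"][0].Path) // /usr/bin/true
}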

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 func (p *initProcess) startTime() (uint64, error) {
 }

 func (p *initProcess) sendConfig() error {
+	if p.config.Config.Hooks != nil {

IMO it makes more sense just to always send it.

RenaudWasTaken

comment created time in 7 days

Pull request review comment: opencontainers/runc

Add CreateRuntime, CreateContainer and StartContainer Hooks

 type Config struct {

 type Hooks struct {
 	// Prestart commands are executed after the container namespaces are created,
 	// but before the user supplied command is executed from init.
+	// Note: This hook is now deprecated
+	// Prestart commands are called in the Runtime namespace.
 	Prestart []Hook

+	// CreateRuntime commands MUST be called as part of the create operation after
+	// the runtime environment has been created but before the pivot_root has been executed.
+	// CreateRuntime is called immediately after the deprecated Prestart hook.
+	// CreateRuntime commands are called in the Runtime Namespace.
+	CreateRuntime []Hook
+
+	// CreateContainer commands MUST be called as part of the create operation after
+	// the runtime environment has been created but before the pivot_root has been executed.
+	// CreateContainer commands are called in the Container namespace.
+	CreateContainer []Hook
+
+	// StartContainer commands MUST be called as part of the start operation and before
+	// the container process is started.
+	// StartContainer commands are called in the Container namespace.
+	StartContainer []Hook
+
 	// Poststart commands are executed after the container init process starts.
+	// Poststart commands are called in the Runtime Namespace.
 	Poststart []Hook

 	// Poststop commands are executed after the container init process exits.
+	// Poststop commands are called in the Runtime Namespace.
 	Poststop []Hook
 }

+type HookName int
+
+const (
+	Prestart HookName = iota
+	CreateRuntime
+	CreateContainer
+	StartContainer
+	Poststart
+	Poststop
+)
+
+var HookToName = map[HookName]string{
+	Prestart:        "prestart",
+	CreateRuntime:   "createRuntime",
+	CreateContainer: "createContainer",
+	StartContainer:  "startContainer",
+	Poststart:       "poststart",
+	Poststop:        "poststop",
+}

It'd be simpler to just make type HookName string and define the constants as strings. That way you wouldn't need HookToName and if you ever do %v on a HookName it's obvious what the hook is.
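
Something like the following sketch (illustrative of the suggestion, not the final patch):

package main

import "fmt"

type HookName string

const (
	Prestart        HookName = "prestart"
	CreateRuntime   HookName = "createRuntime"
	CreateContainer HookName = "createContainer"
	StartContainer  HookName = "startContainer"
	Poststart       HookName = "poststart"
	Poststop        HookName = "poststop"
)

func main() {
	// No HookToName table needed; the value is its own name.
	fmt.Printf("%v\n", CreateRuntime) // prints "createRuntime"
}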

RenaudWasTaken

comment created time in 7 days


pull request comment: opencontainers/runtime-spec

MAINTAINERS: Add @cyphar as maintainer

For maintainer additions, you need a 2/3 vote. Namely, we need 4 of the following folks to LGTM it:

  • [ ] @crosbymichael
  • [x] @mrunalp
  • [ ] @vbatts
  • [ ] @dqminh
  • [ ] @tianon
  • [x] @hqhq
giuseppe

comment created time in 8 days

pull request comment: opencontainers/runc

cgroups: add copyright header to devices.Emulator implementation

Looks good, but why not add the header to all the files

It's unclear who the copyright holders of the other files are. And unless we want to do the Copyright runc authors thing (which I'm not sure is entirely correct from a legal perspective -- copyright is held by individuals or companies, not loose collections of people) we'd need to dig through git and sort it out.

cyphar

comment created time in 9 days

push event: cyphar/dotfiles

Aleksa Sarai

commit sha f48cc22418c857e814a96360b4053becb9660b3d

nvim: add <C-a> and <C-x> wrappers

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

view details

push time in 10 days

fork cyphar/cgroups

cgroups package for Go

https://containerd.io

fork in 11 days

pull request comment: opencontainers/runc

Release rc11

@crosbymichael I guess it's time for me to port the devices.Emulator changes to that. :wink:

mrunalp

comment created time in 11 days

create branch cyphar/runc

branch : devices-cgroup-header

created branch time in 11 days

PR opened opencontainers/runc

cgroups: add copyright header to devices.Emulator implementation

I forgot to include this in the original patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>

+36 -0

0 comments

2 changed files

pr created time in 11 days

Pull request review comment: opencontainers/runc

fix two swap limit regressions in cgroup v2

 func ConvertCPUQuotaCPUPeriodToCgroupV2Value(quota int64, period uint64) string

 // for use by cgroup v2 drivers. A conversion is needed since Resources.MemorySwap
 // is defined as memory+swap combined, while in cgroup v2 swap is a separate value.
 func ConvertMemorySwapToCgroupV2Value(memorySwap, memory int64) (int64, error) {
+	// If the memory update is set to -1 we should also
+	// set swap to -1, it means unlimited memory.
+	if memory == -1 {

The issue is that the behaviour of the two cgroup versions is different -- but the meaning of MemorySwap in the runtime-spec is actually the original cgroupv1 meaning of memory.memsw.limit_in_bytes (which is the total memory + swap, as opposed to memory.swap.max in cgroupv2 which is just swap usage). If we'd done runtime-spec a little differently this wouldn't be such a headache. :wink:

However, the problem is that requesting {Memory: -1, MemorySwap: N} is an illogical request -- by the definition of the runtime-spec you're asking for unlimited memory but limited memory+swap. In cgroupv1 there wasn't a way to just limit swap usage, so for some reason the runc implementation ignores that this request is invalid and sets both options to be unlimited. I personally think giving an error is much more reasonable (and that we should see if changing cgroupv1 to give an error will cause issues with Docker or podman).

I agree with @lifubang that we should have the same behaviour (as much as we can) when we are converting cgroupv1 options to cgroupv2 settings, because not doing that will cause a lot more headaches in the future.
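
To spell out the conversion semantics being argued for, here is a sketch only -- the error cases follow the reasoning above rather than what runc or crun currently ship, and the function name is illustrative:

package main

import (
	"errors"
	"fmt"
)

// convertMemorySwapToCgroupV2Value maps the runtime-spec's MemorySwap
// (cgroupv1 memsw semantics: memory+swap combined) onto cgroupv2's
// memory.swap.max, which counts swap alone.
func convertMemorySwapToCgroupV2Value(memorySwap, memory int64) (int64, error) {
	switch {
	case memorySwap == 0:
		return 0, nil // unset, leave the default
	case memorySwap == -1:
		return -1, nil // unlimited memory+swap means unlimited swap
	case memory == -1:
		// Unlimited memory with a finite memory+swap limit is contradictory.
		return 0, errors.New("unlimited memory but limited memory+swap requested")
	case memorySwap < memory:
		return 0, errors.New("memory+swap limit cannot be lower than the memory limit")
	default:
		return memorySwap - memory, nil
	}
}

func main() {
	v, err := convertMemorySwapToCgroupV2Value(512<<20, 256<<20)
	fmt.Println(v, err) // 268435456 <nil>
}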

lifubang

comment created time in 11 days

Pull request review commentopencontainers/runc

fix two swap limit regressions in cgroup v2

 func ConvertCPUQuotaCPUPeriodToCgroupV2Value(quota int64, period uint64) string

 // for use by cgroup v2 drivers. A conversion is needed since Resources.MemorySwap
 // is defined as memory+swap combined, while in cgroup v2 swap is a separate value.
 func ConvertMemorySwapToCgroupV2Value(memorySwap, memory int64) (int64, error) {
+	// If the memory update is set to -1 we should also
+	// set swap to -1, it means unlimited memory.
+	if memory == -1 {

The behaviour of the v1 memory controller is different from v2: in v1, "MemorySwap" means memory+swap combined, while in v2 MemorySwap actually refers to just the amount of swap used. It therefore makes perfect sense that the configuration behaviour will differ -- in fact, in v1 it's not possible to block a cgroup from using swap at all (unless you set swappiness=0 in the cgroup, and even then I'm not sure).

I think @kolyshkin is right.

lifubang

comment created time in 11 days

pull request commentopencontainers/runc

Release rc11

@giuseppe @derekwaynecarr @mrunalp

Regarding my comments above about libcontainer, I do get that libcontainer/cgroups (and libcontainer/user to a lesser extent) are "special" in that folks really do use them and we should support them better. But I think splitting these out into separate libraries really would help us make it clear that libcontainer is something entirely internal to runc and shouldn't be used by anyone who actually wants proper support from us. It would also allow us to version them properly.

In my view, the scope of SemVer on runc is purely the command-line API, because that is how the project is meant to be used. I would prefer if we could actually make that clear without freaking out everyone who depends on some of our internal components (including Kubernetes). However, right now our hands are full with TOB work cleaning up the OCI Charter, and we should be pushing to 1.0 with as few distractions as possible. So we can table this discussion until we get past 1.0, IMHO.

mrunalp

comment created time in 12 days

pull request commentopencontainers/runc

Release rc11

The issue is that we have (loosely speaking) an actual OCI Charter requirement that runc 1.0 be a correct reference implementation of the runtime-spec. Obviously bugs may exist (and in fact, I found one in #2391 and fixed it), but releasing a post-1.0 runc which we know does not implement the spec is acting in bad faith when it comes to our responsibilities within the OCI. Believe me, if that weren't a requirement we would've released 1.0 about 3 years ago, and we'd be on runc 3.54 or something by now.

Note that the hooks issue has been the only blocker for 1.0 for the past 2 years -- we've known about it since #1811. It's just that it took us this long to actually get to a PR being ready. If the hooks issue hadn't come up, we would've released 1.0 back then. And as for any additional features, I actually don't care at all whether cgroupv2 works in runc 1.0. I will call a vote for the next runc release the minute we merge #2229.

And yes, I know that releasing 1.0.1-rc.1 technically means we never released a 1.0.0, but I'm not sure I buy that argument completely.

mrunalp

comment created time in 12 days

Pull request review commentopencontainers/tob

Update charter to remove scope table

-# Open Container Initiative Charter v1.0
+# Open Container Initiative Charter v1.1

 Effective 13 November 2015
+Updated 6 May 2020

 ##  1. Mission of the Open Container Initiative (“OCI”).

-The Open Container Initiative provides an open source technical community within which industry participants may easily contribute to building a vendor-neutral, portable and open specification and runtime that deliver on the promise of containers as a source of application portability backed by a certification program.
+The Open Container Initiative provides an open source technical community within which industry participants may easily contribute to building vendor-neutral, portable and open specifications and runtime that deliver on the promise of containers as a source of application portability backed by a certification program.

I believe that (as usual) the "and runtime" was probably just a carve-out for runc. As discussed in the call, I think we need to rethink how we describe the OCI fairly fundamentally -- I like the idea of describing it as a place for "boring white-label infrastructure" that empowers the container ecosystem.

caniszczyk

comment created time in 12 days

issue commentopencontainers/runc

`runc exec [tty consolesize]` fails locally (but passing on CI)

I would guess that the reason we hit this with bats in particular is the large amount of awful shell nesting done inside bats -- which is what the original reproducer did as well.

AkihiroSuda

comment created time in 12 days

delete branch cyphar/runc

delete branch : devices-cgroup

delete time in 12 days

pull request commentopencontainers/runc

Release rc11

At the moment, I'm not sure I'm willing to consider libcontainer as something people should be using outside of runc. So much of it is incredibly internal stuff, and in order to use it you'd need to reimplement most of the code we have in the main runc package.

mrunalp

comment created time in 12 days

pull request commentopencontainers/runc

cgroupv2: don't enable threaded mode by default

LGTM.

lifubang

comment created time in 13 days

Pull request review commentopencontainers/runc

cgroupv2: don't enable threaded mode by default

 func CreateCgroupPath(path string, c *configs.Cgroup) (Err error) {
 					}
 				}()
 			}
-			// Write cgroup.type explicitly.
-			// Otherwise ENOTSUP may happen.
-			cgType := filepath.Join(current, "cgroup.type")
-			_ = ioutil.WriteFile(cgType, []byte("threaded"), 0644)
+			// There are 4 types: 'domain', 'threaded', 'domain threaded', and 'domain invalid'.
+			// If got 'domain invalid', we should check whether the current config contains domain controller or not.
+			// If it does not contain domain controller, we can write "threaded" without returning error.
+			cgTypeFile := filepath.Join(current, "cgroup.type")
+			cgType, _ := ioutil.ReadFile(cgTypeFile)
+			if strings.TrimSpace(string(cgType)) == "domain invalid" {

Ah okay, thanks for testing it.

lifubang

comment created time in 13 days

Pull request review commentopencontainers/runc

cgroupv2: don't enable threaded mode by default

 func CreateCgroupPath(path string, c *configs.Cgroup) (Err error) {
 					}
 				}()
 			}
-			// Write cgroup.type explicitly.
-			// Otherwise ENOTSUP may happen.
-			cgType := filepath.Join(current, "cgroup.type")
-			_ = ioutil.WriteFile(cgType, []byte("threaded"), 0644)
+			// There are 4 types: 'domain', 'threaded', 'domain threaded', and 'domain invalid'.
+			// If got 'domain invalid', we should check whether the current config contains domain controller or not.
+			// If it does not contain domain controller, we can write "threaded" without returning error.
+			cgTypeFile := filepath.Join(current, "cgroup.type")
+			cgType, _ := ioutil.ReadFile(cgTypeFile)
+			if strings.TrimSpace(string(cgType)) == "domain invalid" {

I think we can safely use domain controllers in a domain threaded cgroup because it's the cgroup above a threaded cgroup. But honestly I'm not too sure -- I need to set up an Ubuntu 20.04 VM to test this stuff...

lifubang

comment created time in 13 days

pull request commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

I've rebased and cleaned up the commit message for the DeviceAllow commit to better clarify that it was a security issue, but one that didn't warrant a proper security advisory since no released version of runc included it.

cyphar

comment created time in 13 days

push eventcyphar/runc

Aleksa Sarai

commit sha b810da149008f1d7d07f481970d40c0d035958af

cgroups: systemd: make use of Device*= properties It seems we missed that systemd added support for the devices cgroup, as a result systemd would actually *write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad and was a clear security bug. Luckily the commits which introduced this were never in an actual runc release. So we simply generate the cgroupv1-style rules (which is what systemd's DeviceAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]. It will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules because the DBus API is limited (you can only clear or append to the DeviceAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: afe83489d424 ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Fixes: 1d4ccc8e0caa ("fix data inconsistent when runc update in systemd driven cgroup v1") Fixes: 7682a2b2a575 ("fix data inconsistent when runc update in systemd driven cgroup v2") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 4438eaa5e45aa8d7f1bc109ae5d43760153776ea

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha ba6eb282294dab76d1015c6f90ff3e3849256e49

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 13 days

push eventcyphar/runc

Kir Kolyshkin

commit sha f0daf65100dae51ff31760c0f1f184ff2adcb33e

Vagrantfile: use criu from stable repo CRIU 3.14 has made its way to the F32 stable repo, let's use it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

view details

Pradyumna Agrawal

commit sha 4aa9101477fb539abfb30d3385e9fc79e8f879e0

Honor spec.Process.NoNewPrivileges in specconv.CreateLibcontainerConfig The change ensures that the passed in value of NoNewPrivileges under spec.Process is reflected in the container config generated by specconv.CreateLibcontainerConfig Closes #2397 Signed-off-by: Pradyumna Agrawal <pradyumnaa@vmware.com>

view details

Akihiro Suda

commit sha 2b9a36ee8c82b580f09ed853dc72388aadca9ea9

Merge pull request #2398 from pkagrawal/master Honor spec.Process.NoNewPrivileges in specconv.CreateLibcontainerConfig

view details

Kir Kolyshkin

commit sha 17aee8c432269cf477d274a2e0ad2f178e0f453e

Dockerfile: bump bats to 1.2.0 Release notes: https://github.com/bats-core/bats-core/releases/tag/v1.2.0 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

view details

Akihiro Suda

commit sha 58bf08350087a19fce5b58e69acbc9605ac9ee84

Merge pull request #2400 from kolyshkin/bats-1.2.0 Dockerfile: bump bats to 1.2.0

view details

Mrunal Patel

commit sha df3d7f673aff8344c2b9d0747501f1002af136bb

Merge pull request #2393 from kolyshkin/criu-pi Vagrantfile: use criu from stable repo

view details

Aleksa Sarai

commit sha a79fa7caa08cffc7e01bbad75f273b643206c0f1

contrib: recvtty: add --no-stdin flag This is mostly just useful for testing with the "single" mode, since it allows you to run recvtty in the background without the console being closed. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 859a780d6f326dd3751893620b3198dd55af4ac1

cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha b2bec9806f999f55c0d0f9ce1b1064ffbc62f4a9

cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black- list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 60e21ec26e15945259d4b1e790e8fd119ee86467

specconv: remove default /dev/console access /dev/console is a host resource which gives a bunch of permissions that we really shouldn't be giving to containers, not to mention that /dev/console in containers is actually /dev/pts/$n. Drop this since arguably this is a fairly scary thing to allow... Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 24388be71e1aef7facd0d78dda22e696c1694272

configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 2353ffec2bb670a200009dc7a54a56b93145f141

cgroups: implement a devices cgroupv1 emulator Okay, this requires a bit of explanation. The reason for this emulation is to allow us to have seamless updates of the devices cgroup for running containers. This was triggered by several users having issues where our initial writing of a deny-all rule (in all cases) results in spurious errors. The obvious solution would be to just remove the deny-all rule, right? Well, it turns out that runc doesn't actually control the deny-all rule because all users of runc have explicitly specified their own deny-all rule for many years. This appears to have been done to work around a bug in runc (which this series has fixed in [1]) where we would actually act as a black-list despite this being a violation of the OCI spec. This means that not adding our own deny-all rule in the case of updates won't solve the issue. However, it will also not solve the issue in several other cases (the most notable being where a container is being switched between default-permission modes). So in order to handle all of these cases, a way of tracking the relevant internal cgroup state (given a certain state of "cgroups.list" and a set of rules to apply) is necessary. That is the purpose of DevicesEmulator. Reading "devices.list" is quite important because that's the only way we can tell if it's safe to skip the troublesome deny-all rules without making potentially-dangerous assumptions about the container. We also are currently bug-compatible with the devices cgroup (namely, removing rules that don't exist or having superfluous rules all works as with the in-kernel implementation). The only exception to this is that we give an error if a user requests to revoke part of a wildcard exception, because allowing such configurations could result in security holes (cgroupv1 silently ignores such rules, meaning in white-list mode that the access is still permitted). [1]: b2bec9806f99 ("cgroup: devices: eradicate the Allow/Deny lists") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha afe83489d4243800cb68b5c766831a4ceaff4578

cgroupv1: devices: use minimal transition rules with devices.Emulator Now that all of the infrastructure for devices.Emulator is in place, we can finally implement minimal transition rules for devices cgroups. This allows for minimal disruption to running containers if a rule update is requested. Only in very rare circumstances (black-list cgroups and mode switching) will a clear-all rule be written. As a result, containers should no longer see spurious errors. A similar issue affects the cgroupv2 devices setup, but that is a topic for another time (as the solution is drastically different). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha b1e7a036d0c9e4f8df09f1ec1953c4ee787c5d48

cgroups: systemd: make use of Device*= properties It seems we missed that systemd added support for the devices cgroup, as a result systemd would actually *write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad and was a clear security bug. Luckily the commits which introduced this were never in an actual runc release. So we simply generate the cgroupv1-style rules (which is what systemd's DevicesAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]. It will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules because the DBus API is limited (you can only clear or append to the DevicesAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: afe83489d424 ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Fixes: 1d4ccc8e0caa ("fix data inconsistent when runc update in systemd driven cgroup v1") Fixes: 7682a2b2a575 ("fix data inconsistent when runc update in systemd driven cgroup v2") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 965e77ea81a4728b311009b9c754b6e66872cd74

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 560f446c63289f4b2a74c7b21317879182accd0a

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 13 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

My vote is v1.0.0-really-sorry-but-still-an-rc.11. In all seriousness, I'll bring this up at the OCI call tomorrow.

thaJeztah

comment created time in 13 days

pull request commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

/cc @kolyshkin

cyphar

comment created time in 13 days

push eventcyphar/runc

Aleksa Sarai

commit sha 0d35c9306c3de2c583454fcd49d68c828631b9bb

configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha eaeba27d2b16185cc5c3f308da388453ebe2c719

cgroups: implement a devices cgroupv1 emulator Okay, this requires a bit of explanation. The reason for this emulation is to allow us to have seamless updates of the devices cgroup for running containers. This was triggered by several users having issues where our initial writing of a deny-all rule (in all cases) results in spurious errors. The obvious solution would be to just remove the deny-all rule, right? Well, it turns out that runc doesn't actually control the deny-all rule because all users of runc have explicitly specified their own deny-all rule for many years. This appears to have been done to work around a bug in runc (which this series has fixed in [1]) where we would actually act as a black-list despite this being a violation of the OCI spec. This means that not adding our own deny-all rule in the case of updates won't solve the issue. However, it will also not solve the issue in several other cases (the most notable being where a container is being switched between default-permission modes). So in order to handle all of these cases, a way of tracking the relevant internal cgroup state (given a certain state of "cgroups.list" and a set of rules to apply) is necessary. That is the purpose of DevicesEmulator. Reading "devices.list" is quite important because that's the only way we can tell if it's safe to skip the troublesome deny-all rules without making potentially-dangerous assumptions about the container. We also are currently bug-compatible with the devices cgroup (namely, removing rules that don't exist or having superfluous rules all works as with the in-kernel implementation). The only exception to this is that we give an error if a user requests to revoke part of a wildcard exception, because allowing such configurations could result in security holes (cgroupv1 silently ignores such rules, meaning in white-list mode that the access is still permitted). [1]: 4d363ccbd9cc ("cgroup: devices: eradicate the Allow/Deny lists") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha c89ffa8fb94383dbb8c21561731164a1fb820f3a

cgroupv1: devices: use minimal transition rules with devices.Emulator Now that all of the infrastructure for devices.Emulator is in place, we can finally implement minimal transition rules for devices cgroups. This allows for minimal disruption to running containers if a rule update is requested. Only in very rare circumstances (black-list cgroups and mode switching) will a clear-all rule be written. As a result, containers should no longer see spurious errors. A similar issue affects the cgroupv2 devices setup, but that is a topic for another time (as the solution is drastically different). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha b9ef9da5e6002bca2a83e42613797479ca3dd7cd

cgroups: systemd: make use of Device*= properties It seems we missed that systemd added support for the devices cgroup, as a result systemd would actually *write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad. So we simply generate the cgroupv1-style rules (which is what systemd's DevicesAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]. It will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules because the DBus API is limited (you can only clear or append to the DevicesAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: b70aacc28f43 ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 2c59d29d9bdab9a9c37cc9db840371cc30d4d7bb

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 998482935a2c952755cb0d03c4b3218abd880b5b

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 13 days

create branch cyphar/runc

branch : bsc1168481-runc

created branch time in 13 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

I mean, let's just admit it and call it 1.0.0-runc.11 ;)

thaJeztah

comment created time in 13 days

Pull request review commentopencontainers/tob

Draft of a "getting started" document for the OCI

+These standards are meant to
+provide a baseline to follow for implementors of runtimes, registries, and
+tools which interact with container images and registries.
+ - [Runtime spec](https://github.com/opencontainers/runtime-spec)
+ - [Image spec](https://github.com/opencontainers/image-spec)
+ - [Distribution spec](https://github.com/opencontainers/distribution-spec)
+
+### Spec Conformance Test
+
+Conformance tests should provide a clear, end-user runnable test suite for
+implementors to use to determine and demonstrate that an implementing project
+or product meets the explicit definitions of the specification.
+ - [OCI Conformance](https://github.com/opencontainers/oci-conformance)
+
+The most advanced conformance implementation to date is for the new distribution specification. Additional work on image and runtime conformance is ongoing.
+
+### Libraries
+
+While hosting library code is not a common goal of the OCI, there are a few
+specific cases where small, non-end user focused, and tightly scoped libraries
+have been accepted into the OCI. The common denominator for these libraries is
+that they help implementors properly use features of the specifications:
+ - [go-digest](https://github.com/opencontainers/go-digest)
+ - [selinux](https://github.com/opencontainers/selinux)
+
+Utilities and end-user UX-oriented code is most likely better targeted at other
+more broad communities like the [CNCF](https://cncf.io). While there are not
+explicit rules, a discussion with the TOB is warranted for projects looking to
+contribute a library to OCI.
+
+### Reference Implementations
+
+While theoretically a specification can have one or more reference
+implementations, the OCI runtime spec and the program, `runc`, have gone
+hand in hand simply due to the particulars around the founding of the OCI.
+
+It is not inconceivable that more reference implementations would be
+contributed and supported within the OCI, but at this point, the only
+active and viable reference implementation within the OCI is the `runc`
+implementation of the runtime specification, based around the
+**libcontainer** codebase contributed by Docker.
+ - [runc](https://github.com/opencontainers/runc)
+
+Runc is also unique in that it is an open source reference implementation,
+but also a core dependent ecosystem component underneath the majority of
+container engine implementations in use today.
+
+For any future reference implementation to be adopted by the OCI, it would
+need to be kept in sync with the specification it implements. For a change
+to be accepted to the spec, the equivalent implementation would need to be

Maybe, but there is a necessary balance here.

It would be a mistake to accept changes into a specification if they haven't been tested "in the wild" (or, at the very least, implemented at least once). The image-spec had this exact problem -- having no reference implementation (at least until umoci) meant we had to do some pretty large reworks during the release-candidate phase of 1.0.

That being said, I think we made too much of a concession to runc in the runtime-spec (though I would argue this is also historically driven -- runc was ready and in wide use quite a while before the spec was). There are many things in runc which really should be in the spec, and I am constantly frustrated to see that many implementers of the runtime-spec spend more of their time reading the runc source code than reading the spec.

Maybe we should make it a rule that no release of a project may have any major not-in-the-spec features, which would allow for experimentation and development without risking people becoming dependent on such a feature. runc's behaviour doesn't really break this rule (only because four years of 1.0 release candidates have broken the entire concept of releases) -- but I think it's fair to say that this is something no other OCI project will or should repeat.

estesp

comment created time in 13 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

Or if we can release the v1.0.0 in a month, I think we can just close this issue and call it a day.

The issue is that our existing release (rc10) is already causing this problem. At the very least we should have an rc90 alias purely to stop people from downgrading their runc go modules.

thaJeztah

comment created time in 13 days

issue commentopencontainers/runc

[proposal/rfc] Use github-native review process, say goodbye to pullapprove

Oh, in that case I'm 100% on board with this. I'm not sure we need formal LGTMs for something like this; I could go enable it now.

kolyshkin

comment created time in 13 days

Pull request review commentopencontainers/runtime-spec

seccomp: fix go-specs for errnoRet

 type LinuxSeccompArg struct {

 type LinuxSyscall struct {
 	Names    []string           `json:"names"`
 	Action   LinuxSeccompAction `json:"action"`
-	ErrnoRet uint               `json:"errno"`
+	ErrnoRet *uint              `json:"errnoRet,omitempty"`

All good, it's definitely not the intended use for seccomp. ;)

giuseppe

comment created time in 13 days

pull request commentopencontainers/tob

Update charter to remove scope table

LGTM.

caniszczyk

comment created time in 13 days

pull request commentopencontainers/runtime-spec

seccomp: fix go-specs for errnoRet

LGTM.

giuseppe

comment created time in 13 days

Pull request review commentopencontainers/runtime-spec

seccomp: fix go-specs for errnoRet

 type LinuxSeccompArg struct {

 type LinuxSyscall struct {
 	Names    []string           `json:"names"`
 	Action   LinuxSeccompAction `json:"action"`
-	ErrnoRet uint               `json:"errno"`
+	ErrnoRet *uint              `json:"errnoRet,omitempty"`

Yeah, it's critical that it be clear whether errnoRet was unset (-EPERM) or 0 (a "success" but the syscall isn't run). As an aside, this trick is used for the relatively-new seccomp syscall emulation.
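As a tiny illustration of why the pointer type matters (the struct is the one from the diff above; the action value is the usual SCMP_ACT_ERRNO string from the spec, and the rule itself is hypothetical):

zero := uint(0)
rule := LinuxSyscall{
	Names:  []string{"socket"},
	Action: "SCMP_ACT_ERRNO",
	// ErrnoRet == nil would mean "not set", so the runtime falls back to
	// the default errno (EPERM). Setting it to &zero makes the syscall
	// "succeed" (return 0) without ever being executed.
	ErrnoRet: &zero,
}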

giuseppe

comment created time in 13 days

Pull request review commentopencontainers/runc

cgroupv2: don't enable threaded mode by default

 func CreateCgroupPath(path string, c *configs.Cgroup) (Err error) {
 					}
 				}()
 			}
-			// Write cgroup.type explicitly.
-			// Otherwise ENOTSUP may happen.
-			cgType := filepath.Join(current, "cgroup.type")
-			_ = ioutil.WriteFile(cgType, []byte("threaded"), 0644)
+			// There are 4 types: 'domain', 'threaded', 'domain threaded', and 'domain invalid'.
+			// If got 'domain invalid', we should check whether the current config contains domain controller or not.
+			// If it does not contain domain controller, we can write "threaded" without returning error.
+			cgTypeFile := filepath.Join(current, "cgroup.type")
+			cgType, _ := ioutil.ReadFile(cgTypeFile)
+			if strings.TrimSpace(string(cgType)) == "domain invalid" {

You can do it right here, maybe something more like this (I've also cleaned up the error messages and comments):

cgTypeFile := filepath.Join(current, "cgroup.type")
cgType, _ := ioutil.ReadFile(cgTypeFile)
switch strings.TrimSpace(string(cgType)) {
// If the cgroup is in an invalid mode (usually this means there's an internal
// process in the cgroup tree, because we created a cgroup under an
// already-populated-by-other-processes cgroup), then we have to error out if
// the user requested controllers which are not thread-aware. However, if all
// the controllers requested are thread-aware we can simply put the cgroup into
// threaded mode.
case "domain invalid":
	if containsDomainController(c) {
		return fmt.Errorf("cannot enter cgroupv2 %q with domain controllers -- it is in an invalid state", current)
	} else {
		// Not entirely correct (in theory we'd always want to be a domain --
		// since that means we're a properly delegated cgroup subtree) but in
		// this case there's not much we can do and it's better than giving an
		// error.
		_ = ioutil.WriteFile(cgTypeFile, []byte("threaded"), 0644)
	}
// If the cgroup is in threaded mode, we can only use thread-aware controllers
// (and you cannot usually take a cgroup out of threaded mode).
case "threaded":
	if containsDomainController(c) {
		return fmt.Errorf("cannot enter cgroupv2 %q with domain controllers -- it is in threaded mode", current)
	}
}
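(For completeness, a rough sketch of the containsDomainController helper the snippet above assumes -- simplified here to take controller names rather than the full *configs.Cgroup. In cgroupv2, only the cpu, cpuset, perf_event, and pids controllers are thread-aware; everything else is domain-only.)

var threadedControllers = map[string]bool{
	"cpu":        true,
	"cpuset":     true,
	"perf_event": true,
	"pids":       true,
}

// containsDomainController reports whether the config requests any
// controller that cannot be enabled inside a threaded cgroup.
func containsDomainController(controllers []string) bool {
	for _, ctrl := range controllers {
		if !threadedControllers[ctrl] {
			return true
		}
	}
	return false
}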
lifubang

comment created time in 13 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

@opencontainers/runc-maintainers Are there any objections to the above plan? I've already got an -rc90 tag ready and waiting.

thaJeztah

comment created time in 13 days

delete tag opencontainers/runc

delete tag : v1.0.0-rc90

delete time in 14 days

created tag opencontainers/runc

tag v1.0.0-rc90

CLI tool for spawning and running containers according to the OCI specification

created time in 14 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

That should work; assuming we're confident that we don't reach -rc100 :grimacing: as that would be lower than -rc2 again

That's when we roll out rc990. But I agree, let's hope it never comes to that.

thaJeztah

comment created time in 14 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

That sounds scary (and would break anyone that's already using one of these tags).

Ah, right. Well then there's always the nuclear option -- tag 1.0.0-rc90 right now as being the same as 1.0.0-rc10 and then make rc11 be rc91. That way we don't need to skip over 1.0.0 and we don't touch the old releases -- it's just that the rc10 release becomes defunct as everyone uses the rc90 release instead.

thaJeztah

comment created time in 14 days

push eventcyphar/runc

Aleksa Sarai

commit sha 21958f532932feb42ef680bb022a5845fec11efd

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 14 days

pull request commentopencontainers/runc

Release rc11

@thaJeztah Don't worry, we're definitely not going to match releases -- for the reasons you've outlined as well as many others (one being that we would be stuck in rc releases for long periods because runtime-spec releases are few and far between -- this was actually the original reason why we stalled 1.0.0 ~3 years ago).

mrunalp

comment created time in 14 days

Pull request review commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

 function fail() {

 # support it, the test is skipped with a message.
 function requires() {
 	for var in "$@"; do
+		skip=
 		case $var in
 		criu)
 			if [ ! -e "$CRIU" ]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		root)
 			if [ "$ROOTLESS" -ne 0 ]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless)
 			if [ "$ROOTLESS" -eq 0 ]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless_idmap)
 			if [[ "$ROOTLESS_FEATURES" != *"idmap"* ]]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless_cgroup)
 			if [[ "$ROOTLESS_FEATURES" != *"cgroup"* ]]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless_no_cgroup)
 			if [[ "$ROOTLESS_FEATURES" == *"cgroup"* ]]; then
-				skip "test requires ${var}"
+				skip=1
+			fi
+			;;
+		cgroups_freezer)
+			init_cgroup_paths
+			if [[ "$CGROUP_SUBSYSTEMS" != *"freezer"* ]]; then
+				skip=1
 			fi
 			;;
 		cgroups_kmem)
 			init_cgroup_paths
 			if [ ! -e "${CGROUP_MEMORY_BASE_PATH}/memory.kmem.limit_in_bytes" ]; then
-				skip "Test requires ${var}"
+				skip=1
 			fi
 			;;
 		cgroups_rt)
 			init_cgroup_paths
 			if [ ! -e "${CGROUP_CPU_BASE_PATH}/cpu.rt_period_us" ]; then
-				skip "Test requires ${var}"
+				skip=1
 			fi
 			;;
 		cgroups_v1)
 			init_cgroup_paths
 			if [ "$CGROUP_UNIFIED" != "no" ]; then
-				skip "Test requires cgroups v1"
+				skip=1
 			fi
 			;;
 		cgroups_v2)
 			init_cgroup_paths
 			if [ "$CGROUP_UNIFIED" != "yes" ]; then
-				skip "Test requires cgroups v2 (unified)"
+				skip=1
 			fi
 			;;
 		systemd)
 			if [ -z "${RUNC_USE_SYSTEMD}" ]; then
-				skip "Test requires systemd"
+				skip=1
 			fi
 			;;
 		no_systemd)
 			if [ -n "${RUNC_USE_SYSTEMD}" ]; then
-				skip "Test requires no systemd"
+				skip=1
 			fi
 			;;
 		*)
-			fail "BUG: Invalid requires ${var}."
+			fail "BUG: Invalid requires $var."
 			;;
 		esac
+		if [ -n "$skip" ]; then
+			skip "test requires $var"

Done.

cyphar

comment created time in 14 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

My suggestion would be to just rename all of the releases -- since this only really affects go modules we just need to change the names of the tags (distributions don't care because they use sort -V order).

I'm not sure if it's a silly technicality at this point; projects getting an older version than expected looks like an actual problem to me.

I wasn't trying to say it wasn't a problem -- just that, compared to all of the other reasons we've had to delay 1.0.0, releasing a post-1.0.0 release over this does make it look like a "silly technicality" in comparison. I agree we should fix it.

thaJeztah

comment created time in 14 days

push eventcyphar/runc

Aleksa Sarai

commit sha e4a128237e6997029f60274fabb4daf78e3fe288

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 14 days

Pull request review commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

 function fail() {

 # support it, the test is skipped with a message.
 function requires() {
 	for var in "$@"; do
+		skip=
 		case $var in
 		criu)
 			if [ ! -e "$CRIU" ]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		root)
 			if [ "$ROOTLESS" -ne 0 ]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless)
 			if [ "$ROOTLESS" -eq 0 ]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless_idmap)
 			if [[ "$ROOTLESS_FEATURES" != *"idmap"* ]]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless_cgroup)
 			if [[ "$ROOTLESS_FEATURES" != *"cgroup"* ]]; then
-				skip "test requires ${var}"
+				skip=1
 			fi
 			;;
 		rootless_no_cgroup)
 			if [[ "$ROOTLESS_FEATURES" == *"cgroup"* ]]; then
-				skip "test requires ${var}"
+				skip=1
+			fi
+			;;
+		cgroups_freezer)
+			init_cgroup_paths
+			if [[ "$CGROUP_SUBSYSTEMS" != *"freezer"* ]]; then
+				skip=1
 			fi
 			;;
 		cgroups_kmem)
 			init_cgroup_paths
 			if [ ! -e "${CGROUP_MEMORY_BASE_PATH}/memory.kmem.limit_in_bytes" ]; then
-				skip "Test requires ${var}"
+				skip=1
 			fi
 			;;
 		cgroups_rt)
 			init_cgroup_paths
 			if [ ! -e "${CGROUP_CPU_BASE_PATH}/cpu.rt_period_us" ]; then
-				skip "Test requires ${var}"
+				skip=1
 			fi
 			;;
 		cgroups_v1)
 			init_cgroup_paths
 			if [ "$CGROUP_UNIFIED" != "no" ]; then
-				skip "Test requires cgroups v1"
+				skip=1
 			fi
 			;;
 		cgroups_v2)
 			init_cgroup_paths
 			if [ "$CGROUP_UNIFIED" != "yes" ]; then
-				skip "Test requires cgroups v2 (unified)"
+				skip=1
 			fi
 			;;
 		systemd)
 			if [ -z "${RUNC_USE_SYSTEMD}" ]; then
-				skip "Test requires systemd"
+				skip=1
 			fi
 			;;
 		no_systemd)
 			if [ -n "${RUNC_USE_SYSTEMD}" ]; then
-				skip "Test requires no systemd"
+				skip=1
 			fi
 			;;
 		*)
-			fail "BUG: Invalid requires ${var}."
+			fail "BUG: Invalid requires $var."
 			;;
 		esac
+		if [ -n "$skip" ]; then
+			skip "test requires $var"

Sure. The reason they don't conflict is that shell keeps variables and functions in separate namespaces: skip= assigns the variable, while skip in command position still calls the bats skip function.

cyphar

comment created time in 14 days

pull request commentopencontainers/runc

Release rc11

@h-vetinari I would put incredibly strong emphasis on the word "former" in "former plan". As soon as we release 1.0.0 GA, we're going to completely deviate from the existing versioning and release scheme because it's been an absolute nightmare.

mrunalp

comment created time in 14 days

issue commentopencontainers/runc

Tag upcoming release as v1.0.1-rc.1 to fix go modules / SemVer comparison

While this is a correct implementation of SemVer, and it does match the behaviour described in the spec (though it is quite strange since I've never seen 1.x.y-rc.z in any project, despite many projects using SemVer -- and sort -V does work properly):

Precedence for two pre-release versions with the same major, minor, and patch version MUST be determined by comparing each dot separated identifier from left to right until a difference is found as follows: identifiers consisting of only digits are compared numerically and identifiers with letters or hyphens are compared lexically in ASCII sort order. Numeric identifiers always have lower precedence than non-numeric identifiers. A larger set of pre-release fields has a higher precedence than a smaller set, if all of the preceding identifiers are equal. Example: 1.0.0-alpha < 1.0.0-alpha.1 < 1.0.0-alpha.beta < 1.0.0-beta < 1.0.0-beta.2 < 1.0.0-beta.11 < 1.0.0-rc.1 < 1.0.0.

I don't think doing a 1.0.1-rc.1 release is reasonable. Honestly, I'd be happier to go back and rename all of the releases (or make rc10 rc90) than to skip over 1.0.0 over such a silly technicality...
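To see the quoted precedence rules in action, here's a small illustrative program using golang.org/x/mod/semver (the same comparison logic the go command applies to module versions):

package main

import (
	"fmt"

	"golang.org/x/mod/semver"
)

func main() {
	// "rc10" is a single alphanumeric identifier, so it compares lexically
	// and sorts *before* "rc2" -- this is exactly the go modules problem.
	fmt.Println(semver.Compare("v1.0.0-rc10", "v1.0.0-rc2")) // -1
	// A dotted "rc.10" splits into "rc" plus the numeric identifier "10",
	// which compares numerically and sorts after "rc.2" as intended.
	fmt.Println(semver.Compare("v1.0.0-rc.10", "v1.0.0-rc.2")) // 1
}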

thaJeztah

comment created time in 14 days

pull request commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

Rebased, now that the GetPaths changes have been merged.

/cc @opencontainers/runc-maintainers

cyphar

comment created time in 14 days

push eventcyphar/runc

Kir Kolyshkin

commit sha 714c91e9f73a1512808476eb532b4aa36bbb7530

Simplify cgroup path handling in v2 via unified API This unties the Gordian Knot of using GetPaths in cgroupv2 code. The problem is, the current code uses GetPaths for three kinds of things: 1. Get all the paths to cgroup v1 controllers to save its state (see (*linuxContainer).currentState(), (*LinuxFactory).loadState() methods). 2. Get all the paths to cgroup v1 controllers to have the setns process enter the proper cgroups in `(*setnsProcess).start()`. 3. Get the path to a specific controller (for example, `m.GetPaths()["devices"]`). Now, for cgroup v2 instead of a set of per-controller paths, we have only one single unified path, and a dedicated function `GetUnifiedPath()` to get it. This discrepancy between v1 and v2 cgroupManager API leads to the following problems with the code: - multiple if/else code blocks that have to treat v1 and v2 separately; - backward-compatible GetPaths() methods in v2 controllers; - repeated writing of the PID into the same cgroup for v2; Overall, it's hard to write the right code with all this, and the code that is written is kinda hard to follow. The solution is to slightly change the API to do the 3 things outlined above in the same manner for v1 and v2: 1. Use `GetPaths()` for state saving and setns process cgroups entering. 2. Introduce and use Path(subsys string) to obtain a path to a subsystem. For v2, the argument is ignored and the unified path is returned. This commit converts all the controllers to the new API, and modifies all the users to use it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

view details

Mrunal Patel

commit sha 867c9f5bc417a85ca0e5a80412b55961a4f20352

Merge pull request #2386 from kolyshkin/gordian-knot Simplify cgroup paths handling in v2 via unified v1/v2 API

view details

Aleksa Sarai

commit sha 19a3e9bf8053e74738b80a30f3ab1c5b088f8f7e

contrib: recvtty: add --no-stdin flag This is mostly just useful for testing with the "single" mode, since it allows you to run recvtty in the background without the console being closed. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 11736767d2c5b1dc468dd8d54802517b78db50d0

cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 4d363ccbd9cc7583ae21d029d6ea900eb960cf60

cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black- list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 5baa34c9134bd29a2dc92f471453421eadbd9a41

specconv: remove default /dev/console access /dev/console is a host resource which gives a bunch of permissions that we really shouldn't be giving to containers, not to mention that /dev/console in containers is actually /dev/pts/$n. Drop this since arguably this is a fairly scary thing to allow... Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 8ed0c4c0364122bcee55fb1c4ca1048dca3b5c71

configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 2ef5f417d9235e3e791b527ad2cd0da821ff33fd

cgroups: implement a devices cgroupv1 emulator Okay, this requires a bit of explanation. The reason for this emulation is to allow us to have seamless updates of the devices cgroup for running containers. This was triggered by several users having issues where our initial writing of a deny-all rule (in all cases) results in spurious errors. The obvious solution would be to just remove the deny-all rule, right? Well, it turns out that runc doesn't actually control the deny-all rule because all users of runc have explicitly specified their own deny-all rule for many years. This appears to have been done to work around a bug in runc (which this series has fixed in [1]) where we would actually act as a black-list despite this being a violation of the OCI spec. This means that not adding our own deny-all rule in the case of updates won't solve the issue. However, it will also not solve the issue in several other cases (the most notable being where a container is being switched between default-permission modes). So in order to handle all of these cases, a way of tracking the relevant internal cgroup state (given a certain state of "cgroups.list" and a set of rules to apply) is necessary. That is the purpose of DevicesEmulator. Reading "devices.list" is quite important because that's the only way we can tell if it's safe to skip the troublesome deny-all rules without making potentially-dangerous assumptions about the container. We also are currently bug-compatible with the devices cgroup (namely, removing rules that don't exist or having superfluous rules all works as with the in-kernel implementation). The only exception to this is that we give an error if a user requests to revoke part of a wildcard exception, because allowing such configurations could result in security holes (cgroupv1 silently ignores such rules, meaning in white-list mode that the access is still permitted). [1]: 4d363ccbd9cc ("cgroup: devices: eradicate the Allow/Deny lists") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha b70aacc28f4309d123e8d4e8d079795fbb9d668d

cgroupv1: devices: use minimal transition rules with devices.Emulator Now that all of the infrastructure for devices.Emulator is in place, we can finally implement minimal transition rules for devices cgroups. This allows for minimal disruption to running containers if a rule update is requested. Only in very rare circumstances (black-list cgroups and mode switching) will a clear-all rule be written. As a result, containers should no longer see spurious errors. A similar issue affects the cgroupv2 devices setup, but that is a topic for another time (as the solution is drastically different). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 410aef3ef8586983d696d6fc880779f59b258eb3

cgroups: systemd: make use of Device*= properties It seems we missed that systemd added support for the devices cgroup, as a result systemd would actually *write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad. So we simply generate the cgroupv1-style rules (which is what systemd's DevicesAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]. It will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules because the DBus API is limited (you can only clear or append to the DevicesAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: b70aacc28f43 ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 505319dd805204802778cf2d3ece6abccbb47753

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 3e1dd0e84a518a399cedbc788f91042a5ac0cda9

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 14 days

Pull request review commentopencontainers/runc

cgroupv2: don't enable threaded mode by default

 function enable_cgroup() {
 		chmod g+rwx "$CGROUP_MOUNT/$CGROUP_PATH"
 		chmod g+rw "$CGROUP_MOUNT/$CGROUP_PATH/cgroup.subtree_control" "$CGROUP_MOUNT/$CGROUP_PATH/cgroup.procs" "$CGROUP_MOUNT/cgroup.procs"
 		# Fix up cgroup.type.
-		echo threaded > "$CGROUP_MOUNT/$CGROUP_PATH/cgroup.type"
+		if grep -qw invalid "$CGROUP_MOUNT/$CGROUP_PATH/cgroup.type"; then
+			echo threaded > "$CGROUP_MOUNT/$CGROUP_PATH/cgroup.type"
+		fi
 		# Make sure cgroup.type doesn't contain "invalid". Otherwise write ops will fail with ENOTSUP.

Is the echo threaded > still needed? It seems a bit odd to run the entire test suite in threaded mode.

lifubang

comment created time in 14 days

Pull request review commentopencontainers/runc

cgroupv2: don't enable threaded mode by default

 func CreateCgroupPath(path string, c *configs.Cgroup) (Err error) {
 				}
 			}()
 		}
-		// Write cgroup.type explicitly.
-		// Otherwise ENOTSUP may happen.
-		cgType := filepath.Join(current, "cgroup.type")
-		_ = ioutil.WriteFile(cgType, []byte("threaded"), 0644)
+		// There are 4 types: 'domain', 'threaded', 'domain threaded', and 'domain invalid'.
+		// If we get 'domain invalid', we should check whether the current config contains a domain controller.
+		// If it does not, we can write "threaded" without returning an error.
+		cgTypeFile := filepath.Join(current, "cgroup.type")
+		cgType, _ := ioutil.ReadFile(cgTypeFile)
+		if strings.TrimSpace(string(cgType)) == "domain invalid" {

One other thing we should consider is to give an error early if we are trying to join a cgroup in threaded mode but have domain controllers enabled in the configuration. That way we can avoid weird error messages and help users understand why cgroup setup is failing.

lifubang

comment created time in 14 days
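A hedged sketch of the early check suggested in the comment above. The split of controllers into domain-only and threaded-capable is illustrative, not authoritative; consult the kernel's cgroup-v2 documentation for the real list:

package main

import (
	"errors"
	"fmt"
)

// Illustrative classification of cgroupv2 controllers.
var isDomainController = map[string]bool{
	"memory": true,
	"io":     true,
	"cpu":    false, // cpu is threaded-capable
	"pids":   false, // so is pids
}

// validateThreaded errors out early if the config asks for a threaded
// cgroup while enabling a domain controller, instead of letting the kernel
// fail later with a confusing ENOTSUP.
func validateThreaded(threaded bool, controllers []string) error {
	if !threaded {
		return nil
	}
	for _, c := range controllers {
		if isDomainController[c] {
			return errors.New("cannot use domain controller " + c + " in a threaded cgroup")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateThreaded(true, []string{"cpu", "memory"}))
}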

pull request commentopencontainers/runc

Honor spec.Process.NoNewPrivileges in specconv.CreateLibcontainerConfig

LGTM, and this is yet another argument for us to completely purge specconv from the codebase.

pkagrawal

comment created time in 14 days

pull request commentopencontainers/runtime-spec

seccomp: allow to override errno return code

Yeah, LGTM as well.

giuseppe

comment created time in 14 days

pull request commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

Alright, this is ready for review. Tests are implemented for the changes, and I enabled the runc pause tests for rootless containers, which were missed some time ago.

cyphar

comment created time in 14 days

push eventcyphar/runc

Aleksa Sarai

commit sha 423d0f73717eef2719caa04889f09f5892f139d6

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha cb929d5b4289823df0b9fe003e0194c788a501a6

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 14 days

push eventcyphar/runc

Aleksa Sarai

commit sha 673964e206b6ab0ff66b6f68066293c17c01bbff

cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details
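A minimal sketch of what such a GetFreezerState() helper might look like for cgroupv1. The names, path, and return type are illustrative; the real helper hangs off the cgroup Manager, and would also need a policy for the transient FREEZING state, which this sketch simply folds into "frozen":

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Illustrative equivalents of runc's freezer states; not the exact API.
type freezerState string

const (
	undefined freezerState = ""
	frozen    freezerState = "FROZEN"
	thawed    freezerState = "THAWED"
)

// getFreezerState reads the cgroupv1 freezer.state file and maps its
// contents onto our enum.
func getFreezerState(cgroupPath string) (freezerState, error) {
	data, err := os.ReadFile(filepath.Join(cgroupPath, "freezer.state"))
	if err != nil {
		return undefined, err
	}
	switch strings.TrimSpace(string(data)) {
	case "FROZEN", "FREEZING":
		return frozen, nil
	case "THAWED":
		return thawed, nil
	}
	return undefined, nil
}

func main() {
	state, err := getFreezerState("/sys/fs/cgroup/freezer/mycontainer") // assumed path
	fmt.Println(state, err)
}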

Aleksa Sarai

commit sha 2a1f17277c3807617cb9139cd8e8258d6abeae1c

cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black-list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details
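For reference, the always-write-deny-all behaviour the commit above describes amounts to something like the following sketch (path and rules are illustrative; writing "a" to devices.deny revokes access to all devices before the allow rules are replayed):

package main

import (
	"os"
	"path/filepath"
)

// applyWhitelist sketches the policy: always write the deny-all rule first
// ("a" covers all device types and numbers), then replay the allow rules,
// so the cgroup can never silently act as a black-list.
func applyWhitelist(cgroupPath string, allowRules []string) error {
	deny := filepath.Join(cgroupPath, "devices.deny")
	if err := os.WriteFile(deny, []byte("a"), 0o644); err != nil {
		return err
	}
	allow := filepath.Join(cgroupPath, "devices.allow")
	for _, rule := range allowRules {
		if err := os.WriteFile(allow, []byte(rule), 0o644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Allow only /dev/null (c 1:3) and /dev/zero (c 1:5), for example.
	_ = applyWhitelist("/sys/fs/cgroup/devices/mycontainer", []string{"c 1:3 rwm", "c 1:5 rwm"})
}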

Aleksa Sarai

commit sha 459be065e25f14ba7d8efee466d6af4f179b25e1

specconv: remove default /dev/console access /dev/console is a host resource which gives a bunch of permissions that we really shouldn't be giving to containers, not to mention that /dev/console in containers is actually /dev/pts/$n. Drop this, since arguably this is a fairly scary thing to allow... Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 039fc12f0d3f8b4a4c5f34b350fc7384f4cb8d94

configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 04e00cc3a139d2677c28625464cd5c5576d5dd1a

cgroups: implement a devices cgroupv1 emulator Okay, this requires a bit of explanation. The reason for this emulation is to allow us to have seamless updates of the devices cgroup for running containers. This was triggered by several users having issues where our initial writing of a deny-all rule (in all cases) results in spurious errors. The obvious solution would be to just remove the deny-all rule, right? Well, it turns out that runc doesn't actually control the deny-all rule because all users of runc have explicitly specified their own deny-all rule for many years. This appears to have been done to work around a bug in runc (which this series has fixed in [1]) where we would actually act as a black-list despite this being a violation of the OCI spec. This means that not adding our own deny-all rule in the case of updates won't solve the issue. However, it will also not solve the issue in several other cases (the most notable being where a container is being switched between default-permission modes). So in order to handle all of these cases, a way of tracking the relevant internal cgroup state (given a certain state of "devices.list" and a set of rules to apply) is necessary. That is the purpose of DevicesEmulator. Reading "devices.list" is quite important because that's the only way we can tell if it's safe to skip the troublesome deny-all rules without making potentially-dangerous assumptions about the container. We are also currently bug-compatible with the devices cgroup (namely, removing rules that don't exist or having superfluous rules all works as with the in-kernel implementation). The only exception to this is that we give an error if a user requests to revoke part of a wildcard exception, because allowing such configurations could result in security holes (cgroupv1 silently ignores such rules, meaning in white-list mode that the access is still permitted). [1]: 2a1f17277c38 ("cgroup: devices: eradicate the Allow/Deny lists") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details
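The "reading devices.list" step the commit describes can be sketched like this: a hedged, simplified parser that distinguishes black-list mode (the kernel reports a single "a *:* rwm" line) from a white-list of explicit entries. The real emulator parses each entry into type, major:minor, and permissions rather than keeping raw strings:

package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDevicesList recovers the cgroup's effective state from devices.list:
// "a *:* rwm" alone means the cgroup is in black-list (allow-all) mode;
// anything else is a white-list of individual entries.
func parseDevicesList(contents string) (blacklist bool, entries []string) {
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		if line == "a *:* rwm" {
			return true, nil
		}
		entries = append(entries, line)
	}
	return false, entries
}

func main() {
	blacklist, entries := parseDevicesList("c 1:3 rwm\nc 1:5 rwm\n")
	fmt.Println(blacklist, entries) // false [c 1:3 rwm c 1:5 rwm]
}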

Aleksa Sarai

commit sha e45d19b1e476002e39ba50073ebc0ddcbc1609d0

cgroupv1: devices: use minimal transition rules with devices.Emulator Now that all of the infrastructure for devices.Emulator is in place, we can finally implement minimal transition rules for devices cgroups. This allows for minimal disruption to running containers if a rule update is requested. Only in very rare circumstances (black-list cgroups and mode switching) will a clear-all rule be written. As a result, containers should no longer see spurious errors. A similar issue affects the cgroupv2 devices setup, but that is a topic for another time (as the solution is drastically different). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha e2f7d419ff8cc102c13662523dd6ff95c52b5d1a

cgroups: systemd: make use of Device*= properties It seems we missed that systemd added support for the devices cgroup; as a result, systemd would actually *write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad. So we simply generate the cgroupv1-style rules (which is what systemd's DevicesAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]: it will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules, because the DBus API is limited (you can only clear or append to the DevicesAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: e45d19b1e476 ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 10c3d1d7705b329be4c7e485724a95053f1c3d40

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha d6652729bb9eb6c3808db43579288888ff67c63e

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 14 days

issue commentopencontainers/runc

cgroup2: how can we support nested containers with domain controllers?

It looks like this change requires modifying the state file for containers and switching to trusting the pid1 of the container. My comment was more generally about why it can be a little hairy to trust pid1 in that manner, and that we should work on improving that at some point. I wasn't saying that there's a particular issue in this case.

AkihiroSuda

comment created time in 15 days

push eventcyphar/runc

Aleksa Sarai

commit sha 4a1b6ce879b83fe765766d5cca47c4586a5d7699

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 15 days

pull request commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

Hang on, I'm still working on fixing the tests -- the newest one is breaking (the auto-detection for the freezer cgroup isn't working). Should only take a moment to fix...

cyphar

comment created time in 15 days

push eventcyphar/runc

Aleksa Sarai

commit sha cb6394f58bc3e5892ed782d7f19b9dac10fa0251

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 15 days

push eventcyphar/runc

Aleksa Sarai

commit sha c957bbcf3b7825aa88bbff602fda7a11537e2c0c

cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 43d898a4d4a99fe95845c8fc3d779758326681f9

cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black-list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha fa96aa27d226dce9ca9c60eb46c4c9495212c08d

specconv: remove default /dev/console access /dev/console is a host resource which gives a bunch of permissions that we really shouldn't be giving to containers, not to mention that /dev/console in containers is actually /dev/pts/$n. Drop this, since arguably this is a fairly scary thing to allow... Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha ceea15e0e6df3e96ebd2792f40003ea0fa75c4cb

configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 12bb559a357cef0e6777bd547a5b5e160b2ed218

cgroups: implement a devices cgroupv1 emulator Okay, this requires a bit of explanation. The reason for this emulation is to allow us to have seamless updates of the devices cgroup for running containers. This was triggered by several users having issues where our initial writing of a deny-all rule (in all cases) results in spurious errors. The obvious solution would be to just remove the deny-all rule, right? Well, it turns out that runc doesn't actually control the deny-all rule because all users of runc have explicitly specified their own deny-all rule for many years. This appears to have been done to work around a bug in runc (which this series has fixed in [1]) where we would actually act as a black-list despite this being a violation of the OCI spec. This means that not adding our own deny-all rule in the case of updates won't solve the issue. However, it will also not solve the issue in several other cases (the most notable being where a container is being switched between default-permission modes). So in order to handle all of these cases, a way of tracking the relevant internal cgroup state (given a certain state of "devices.list" and a set of rules to apply) is necessary. That is the purpose of DevicesEmulator. Reading "devices.list" is quite important because that's the only way we can tell if it's safe to skip the troublesome deny-all rules without making potentially-dangerous assumptions about the container. We are also currently bug-compatible with the devices cgroup (namely, removing rules that don't exist or having superfluous rules all works as with the in-kernel implementation). The only exception to this is that we give an error if a user requests to revoke part of a wildcard exception, because allowing such configurations could result in security holes (cgroupv1 silently ignores such rules, meaning in white-list mode that the access is still permitted). [1]: 43d898a4d4a9 ("cgroup: devices: eradicate the Allow/Deny lists") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 6ba398c50c72de465d204f5b200d3cda85788a69

cgroupv1: devices: use minimal transition rules with devices.Emulator Now that all of the infrastructure for devices.Emulator is in place, we can finally implement minimal transition rules for devices cgroups. This allows for minimal disruption to running containers if a rule update is requested. Only in very rare circumstances (black-list cgroups and mode switching) will a clear-all rule be written. As a result, containers should no longer see spurious errors. A similar issue affects the cgroupv2 devices setup, but that is a topic for another time (as the solution is drastically different). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 6c3210c89252a36abe1c0626547b16eda2f677ef

cgroups: systemd: make use of Device*= properties It seems we missed that systemd added support for the devices cgroup; as a result, systemd would actually *write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad. So we simply generate the cgroupv1-style rules (which is what systemd's DevicesAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]: it will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules, because the DBus API is limited (you can only clear or append to the DevicesAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: 6ba398c50c72 ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 6b30d3fb06a3f8ef3bd9cd56dd7836fbf52002c2

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha abcac2e15db03f21e0b360cb8c04f79db648b848

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 15 days

push eventcyphar/runc

Aleksa Sarai

commit sha 2c2a459d58c999f407faf5f883d58d473ca13475

cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 0eca0d9e37b9d4a0828ab3079856a42742d4769d

cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black-list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 7c09f38cd02afff8d715ab7d80270267a978d635

specconv: remove default /dev/console access /dev/console is a host resource which gives a bunch of permissions that we really shouldn't be giving to containers, not to mention that /dev/console in containers is actually /dev/pts/$n. Drop this, since arguably this is a fairly scary thing to allow... Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha ecf9647923836a65ba61dc9dd07f2d197efc3eca

configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 496315cfbd0c34dcb10f1fbde65bebea105b7000

cgroups: implement a devices cgroupv1 emulator Okay, this requires a bit of explanation. The reason for this emulation is to allow us to have seamless updates of the devices cgroup for running containers. This was triggered by several users having issues where our initial writing of a deny-all rule (in all cases) results in spurious errors. The obvious solution would be to just remove the deny-all rule, right? Well, it turns out that runc doesn't actually control the deny-all rule because all users of runc have explicitly specified their own deny-all rule for many years. This appears to have been done to work around a bug in runc (which this series has fixed in [1]) where we would actually act as a black-list despite this being a violation of the OCI spec. This means that not adding our own deny-all rule in the case of updates won't solve the issue. However, it will also not solve the issue in several other cases (the most notable being where a container is being switched between default-permission modes). So in order to handle all of these cases, a way of tracking the relevant internal cgroup state (given a certain state of "devices.list" and a set of rules to apply) is necessary. That is the purpose of DevicesEmulator. Reading "devices.list" is quite important because that's the only way we can tell if it's safe to skip the troublesome deny-all rules without making potentially-dangerous assumptions about the container. We are also currently bug-compatible with the devices cgroup (namely, removing rules that don't exist or having superfluous rules all works as with the in-kernel implementation). The only exception to this is that we give an error if a user requests to revoke part of a wildcard exception, because allowing such configurations could result in security holes (cgroupv1 silently ignores such rules, meaning in white-list mode that the access is still permitted). [1]: 0eca0d9e37b9 ("cgroup: devices: eradicate the Allow/Deny lists") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha c67a6be7c784e6a9ef5f0faf67be9954e2edf7c5

cgroupv1: devices: use minimal transition rules with devices.Emulator Now that all of the infrastructure for DevicesEmulator is in place, we can finally implement minimal transition rules for devices cgroups. This allows for minimal disruption to running containers if a rule update is requested. Only in very rare circumstances (black-list cgroups and mode switching) will a clear-all rule be written. As a result, containers should no longer see spurious errors. A similar issue affects the cgroupv2 devices setup, but that is a topic for another time (as the solution is drastically different). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 680ec6f43808c749469a4c9253158c9b7457aaff

cgroups: systemd: make use of Device*= properties It seems we missed that systemd added support for the devices cgroup; as a result, systemd would actually *write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad. So we simply generate the cgroupv1-style rules (which is what systemd's DevicesAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]: it will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules, because the DBus API is limited (you can only clear or append to the DevicesAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: c67a6be7c784 ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 571816220acd39a611090ce8bda26c46a73c56bd

tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

Aleksa Sarai

commit sha 67f5c1d1714f1e4d2a82db029a08aec32bf34aa2

tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>

view details

push time in 15 days

Pull request review commentopencontainers/runc

cgroup: devices: major cleanups and minimal transition rules

 func (m *legacyManager) Set(container *configs.Config) error {
 		return err
 	}
 
+	// Unfreeze the cgroup.

I wanted to make sure we unfroze the cgroup before we ran the fsManager (to limit the length of time the container remains frozen, as well as to avoid stepping on our toes with the fsManager also writing to the freezer cgroup).

However, thinking about this more, the defer is incorrect (as is using configs.Thawed directly), because if you're dealing with a user-frozen container (or the user requested we create a frozen container) then we won't do what they're expecting. I'll fix that now.

cyphar

comment created time in 15 days
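The fix described in the comment above (restore the pre-update freezer state rather than thawing unconditionally) looks roughly like this sketch, with stand-in types rather than runc's actual Manager API:

package main

import "fmt"

// Stand-in types; not runc's API.
type freezerState string

const (
	frozen freezerState = "FROZEN"
	thawed freezerState = "THAWED"
)

type manager struct{ state freezerState }

func (m *manager) GetFreezerState() (freezerState, error) { return m.state, nil }
func (m *manager) Freeze(s freezerState) error            { m.state = s; return nil }

// setWithFreeze records the freezer state before the update and restores
// exactly that state afterwards, so a container the user froze (or asked to
// be created frozen) stays frozen instead of being thawed unconditionally.
func setWithFreeze(m *manager, update func() error) error {
	prev, err := m.GetFreezerState()
	if err != nil {
		return err
	}
	if err := m.Freeze(frozen); err != nil {
		return err
	}
	defer m.Freeze(prev) // restore the prior state, not thawed blindly
	return update()
}

func main() {
	m := &manager{state: frozen} // a user-frozen container
	_ = setWithFreeze(m, func() error { return nil })
	fmt.Println(m.state) // still FROZEN after the update
}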

PR merged openSUSE/catatonit

Remove duplicated SIGSEGV specification lgtm/need 1
+1 -1

3 comments

1 changed file

t-nelis

pr closed time in 15 days
