Yawei Chen chenyw1990 Huawei Technologies Inc

chenyw1990/dopamine 0

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

chenyw1990/kubernetes 0

Production-Grade Container Scheduling and Management

chenyw1990/NetEase-MusicBox 0

Command-line version of NetEase Cloud Music: charts, search, curated playlists, login, DJ programs, quick track switching, and locally saved playlists

chenyw1990/ohmyzsh 0

🙃 A delightful community-driven (with 1700+ contributors) framework for managing your zsh configuration. Includes 200+ optional plugins (rails, git, OSX, hub, capistrano, brew, ant, php, python, etc), over 140 themes to spice up your morning, and an auto-update tool that makes it easy to keep up with the latest updates from the community.

chenyw1990/website 0

Kubernetes website and documentation repo:

chenyw1990/zuul 0

Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more.

pull request comment kubernetes/enhancements

[WIP] Kube proxy args reconciliation

/retitle Kube proxy args reconciliation

jayunit100

comment created time in an hour

pull request comment kubernetes/enhancements

[WIP] Kube proxy args reconciliation

@jayunit100: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | 68fa4b0290d848fb5edc5382e9b354f8ca056f96 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

<details>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. </details> <!-- test report -->

jayunit100

comment created time in 2 hours

started ftexchange/ftx

started time in 4 hours

Pull request review comment kubernetes/enhancements

[DO-NOT-MERGE] [WIP] 20201010 kube-proxy v2: reworking the proxy's architecture

+# WORK IN PROGRESS+<!--+**Note:** When your KEP is complete, all of these comment blocks should be removed.++To get started with this template:++- [ ] **Pick a hosting SIG.**+  Make sure that the problem space is something the SIG is interested in taking+  up. KEPs should not be checked in without a sponsoring SIG.+- [ ] **Create an issue in kubernetes/enhancements**+  When filing an enhancement tracking issue, please make sure to complete all+  fields in that template. One of the fields asks for a link to the KEP. You+  can leave that blank until this KEP is filed, and then go back to the+  enhancement and add the link.+- [ ] **Make a copy of this template directory.**+  Copy this template into the owning SIG's directory and name it+  `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no+  leading-zero padding) assigned to your enhancement above.+- [ ] **Fill out as much of the kep.yaml file as you can.**+  At minimum, you should fill in the "Title", "Authors", "Owning-sig",+  "Status", and date-related fields.+- [ ] **Fill out this file as best you can.**+  At minimum, you should fill in the "Summary" and "Motivation" sections.+  These should be easy if you've preflighted the idea of the KEP with the+  appropriate SIG(s).+- [ ] **Create a PR for this KEP.**+  Assign it to people in the SIG who are sponsoring this process.+- [ ] **Merge early and iterate.**+  Avoid getting hung up on specific details and instead aim to get the goals of+  the KEP clarified and merged quickly. The best way to do this is to just+  start with the high-level sections and fill out details incrementally in+  subsequent PRs.++Just because a KEP is merged does not mean it is complete or approved. Any KEP+marked as `provisional` is a working document and subject to change. You can+denote sections that are under active debate as follows:++```+<<[UNRESOLVED optional short context or usernames ]>>+Stuff that is being argued.+<<[/UNRESOLVED]>>+```++When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions+focused. If you disagree with what is already in a document, open a new PR+with suggested changes.++One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.+You do not need a new KEP to move from beta to GA, for example. If+new details emerge that belong in the KEP, edit the KEP. 
Once a feature has become+"implemented", major changes should get new KEPs.++The canonical place for the latest set of instructions (and the likely source+of this file) is [here](/keps/NNNN-kep-template/README.md).++**Note:** Any PRs to move a KEP to `implementable`, or significant changes once+it is marked `implementable`, must be approved by each of the KEP approvers.+If none of those approvers are still appropriate, then changes to that list+should be approved by the remaining approvers and/or the owning SIG (or+SIG Architecture for cross-cutting KEPs).+-->+# KEP-20201010: rework kube-proxy architecture++<!--+A table of contents is helpful for quickly jumping to sections of a KEP and for+highlighting any additional information provided beyond the standard KEP+template.++Ensure the TOC is wrapped with+  <code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>+tags, and then generate with `hack/update-toc.sh`.+-->++<!-- toc -->+- [Release Signoff Checklist](#release-signoff-checklist)+- [Summary](#summary)+- [Motivation](#motivation)+  - [Goals](#goals)+  - [Non-Goals](#non-goals)+- [Proposal](#proposal)+  - [User Stories (Optional)](#user-stories-optional)+    - [Story 1](#story-1)+    - [Story 2](#story-2)+  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)+  - [Risks and Mitigations](#risks-and-mitigations)+- [Design Details](#design-details)+  - [Test Plan](#test-plan)+  - [Graduation Criteria](#graduation-criteria)+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)+  - [Version Skew Strategy](#version-skew-strategy)+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)+  - [Monitoring Requirements](#monitoring-requirements)+  - [Dependencies](#dependencies)+  - [Scalability](#scalability)+  - [Troubleshooting](#troubleshooting)+- [Implementation History](#implementation-history)+- [Drawbacks](#drawbacks)+- [Alternatives](#alternatives)+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)+<!-- /toc -->++## Release Signoff Checklist++<!--+**ACTION REQUIRED:** In order to merge code into a release, there must be an+issue in [kubernetes/enhancements] referencing this KEP and targeting a release+milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)+of the targeted release**.++For enhancements that make changes to code or processes/procedures in core+Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release+Signoff checklist to be completed.++Check these off as they are completed for the Release Team to track. 
These+checklist items _must_ be updated for the enhancement to be released.+-->++Items marked with (R) are required *prior to targeting to a milestone / release*.++- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)+- [ ] (R) KEP approvers have approved the KEP status as `implementable`+- [ ] (R) Design details are appropriately documented+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input+- [ ] (R) Graduation criteria is in place+- [ ] (R) Production readiness review completed+- [ ] Production readiness review approved+- [ ] "Implementation History" section is up-to-date for milestone+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes++<!--+**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.+-->++[kubernetes.io]: https://kubernetes.io/+[kubernetes/enhancements]: https://git.k8s.io/enhancements+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes+[kubernetes/website]: https://git.k8s.io/website++## Summary++At the beginning, `kube-proxy` was designed to handle the translation of Service objects to OS-level+resources.  Implementations have been userland, then iptables, and now ipvs. With the growth of the+Kubernetes project, more implementations came to life, for instance with eBPF, and often in relation+to other goals (Calico to manage the network overlay, Cilium to manage app-level security, metallb+to provide an external LB for bare-metal clusters, etc).++Along this cambrian explosion of third-party software, the Service object itself received new+concepts to improve the abstraction, for instance to express topology. This, third-party+implementation are expected to update and become more complex over time, even if their core doesn't+change (ie, the eBPF translation layer is not affected).++This KEP is born from the conviction that more decoupling of the Service object and the actual+implementations is required, by introducing an intermediate, node-level abstraction provider. This+abstraction is expected to be the result of applying Kubernetes' `Service` semantics and business+logic to a simpler, more stable API.++## Motivation++<!--+This section is for explicitly listing the motivation, goals and non-goals of+this KEP.  Describe why the change is important and the benefits to users. The+motivation section can optionally provide links to [experience reports] to+demonstrate the interest in a KEP within the wider Kubernetes community.++[experience reports]: https://github.com/golang/go/wiki/ExperienceReports+-->++### Goals++- provide a node-level abstraction of the cluster-wide `Service` semantics through an API+- allow easier, more stable proxy implementations that don't need updates when `Service` business+  logic changes+- provide a client library with minimal dependencies+- include equivalent implementations of in-project ones (userland, iptables and ipvs)+- (optional) help proxy implementations using the same subsystem (ie iptables) to cooperate more+  easily++<!--+List the specific goals of the KEP. What is it trying to achieve? 
How will we+know that this has succeeded?+-->++### Non-Goals++- provide equivalent implementations of third-party ones++<!--+What is out of scope for this KEP? Listing non-goals helps to focus discussion+and make progress.+-->++## Proposal++<!--+This is where we get down to the specifics of what the proposal actually is.+This should have enough detail that reviewers can understand exactly what+you're proposing, but should not include things like API designs or+implementation. The "Design Details" section below is for the real+nitty-gritty.+-->++Rewrite the kube-proxy to be a "localhost" gRPC API provider that will be accessible as usual via+TCP (`127.0.0.1:12345`) and/or via a socket (`unix:///path/to/proxy.sock`).++- it will connect to the API server and watch resources, like the current proxy;+- then, it will process them, applying Kubernetes specific business logic like topology computation+  relative to the local host;+- finally, provide the result of this computation to client via a gRPC watchable API.++This decoupling allows kube-proxy and implementation to evolve in their own timeframes. For+instance, introducing optimizations like EndpointSlice or new business semantics like Topology+does not trigger a rebuild/release of any proxy implementation.++The idea is to send the full state to the client, so implementations don't have to do+diff-processing and maintain any internal state. This should provide simple implementations,+reliable results and still be quite optimal, since many kernel network-level objects are+updated via atomic replace APIs. It's also a protection from slow readers, since no stream has to+be buffered.++Since the node-local state computed by the new proxy will be simpler and node-specific, it will+only change when the result for the current node is actually changed. Since there's less data in+the local state, change frequency is reduced compared to cluster state. Testing on actual clusters+showed a frequency reduction of change events by 2 orders of magnitude.++### User Stories (Optional)++<!--+Detail the things that people will be able to do if this KEP is implemented.+Include as much detail as possible so that people can understand the "how" of+the system. The goal here is to make this feel real for users without getting+bogged down.+-->++#### Story 1++TBD (Calico eBPF)++#### Story 2++TBD (node-local cluster DNS provider)++### Notes/Constraints/Caveats (Optional)++<!--+What are the caveats to the proposal?+What are some important details that didn't come across above?+Go in to as much detail as necessary here.+This might be a good place to talk about core concepts and how they relate.+-->++- sending the full-state could be resource consuming on big clusters, but it should still be O(1) to+  the actual kernel definitions (the complexity of what the node has to handle cannot be reduced+  without losing functionality or correctness).++### Risks and Mitigations++<!--+What are the risks of this proposal, and how do we mitigate? Think broadly.+For example, consider both security and how this will impact the larger+Kubernetes ecosystem.++How will security be reviewed, and by whom?++How will UX be reviewed, and by whom?++Consider including folks who also work outside the SIG or subproject.+-->++## Design Details++<!--+This section should contain enough information that the specifics of your+change are understandable. This may include API specs (though not always+required) or even code snippets. 
If there's any ambiguity about HOW your+proposal will be implemented, this is the place to discuss them.+-->++A [draft implementation] exists and some [performance testing] has been done.++[draft implementation]: https://github.com/mcluseau/kube-proxy2/+[performance testing]: https://github.com/mcluseau/kube-proxy2/blob/master/doc/proposal.md++The watchable API will be a long polling, taking a "last known state info" and returning a stream of+objects. Proposed definition:++```proto+service Endpoints {+    // Returns all the endpoints for this node.+    rpc Next (NextFilter) returns (stream NextItem);+}++message NextFilter {+    // Unique instance ID to manage proxy restarts+    uint64 InstanceID = 1;+    // The latest revision we're aware of (0 at first)+    uint64 Rev = 2;+}++message NextItem {+    // Filter to use to get the next notification (first item in stream)+    NextFilter Next = 1;+    // A service endpoints item (any item after the first)+    ServiceEndpoints Endpoints = 2;+}+```++When the proxy starts, it will generate a random InstanceID, and have Rev at 0. So, a client+(re)connecting will get the new state either after a proxy restart or when an actual change occurs.+The proxy will never send a partial state, only full states. This means it waits to have all its+Kubernetes watchers sync'ed before going to Rev 1.++The first NextItem in the stream will be the state info required for the next polling call, and any+subsequent item will be an actual state object. The stream is closed when the full state has been+sent.++The client library abstracts those details away and provides the full state after each change, and+includes a default Run function, setting up default flags, parsing them and running the client,+allowing very simple clients like this:++```golang+package main++import (+	"fmt"+	"os"+	"time"++	"github.com/mcluseau/kube-proxy2/pkg/api/localnetv1"+	"github.com/mcluseau/kube-proxy2/pkg/client"+)++func main() {+	client.Run(printState)+}++func printState(items []*localnetv1.ServiceEndpoints) {+	fmt.Fprintln(os.Stdout, "#", time.Now())+	for _, item := range items {+		fmt.Fprintln(os.Stdout, item)+	}+}+```++The currently proposed interface for the lower-level client is as follows:++```godoc+package client // import "github.com/mcluseau/kube-proxy2/pkg/client"++type EndpointsClient struct {+	// Target is the gRPC dial target+	Target string++	// InstanceID and Rev are the latest known state (used to resume a watch)+	InstanceID uint64+	Rev        uint64++	// ErrorDelay is the delay before retrying after an error.+	ErrorDelay time.Duration++	// Has unexported fields.+}+    EndpointsClient is a simple client to kube-proxy's Endpoints API.++func New(flags FlagSet) (epc *EndpointsClient)+    New returns a new EndpointsClient with values bound to the given flag-set+    for command-line tools. 
Other needs can use `&EndpointsClient{...}`+    directly.++func (epc *EndpointsClient) Cancel()+    Cancel will cancel this client, quickly closing any call to Next.++func (epc *EndpointsClient) CancelOn(signals ...os.Signal)+    CancelOn make the given signals to cancel this client.++func (epc *EndpointsClient) CancelOnSignals()+    CancelOnSignals make the default termination signals to cancel this client.++func (epc *EndpointsClient) DefaultFlags(flags FlagSet)+    DefaultFlags registers this client's values to the standard flags.++func (epc *EndpointsClient) Next() (items []*localnetv1.ServiceEndpoints, canceled bool)+    Next returns the next set of ServiceEndpoints, waiting for a new revision as+    needed. It's designed to never fail and will always return latest items,+    unless canceled.+```++A good example of how to use this low level API is the `client.Run` itself:++```golang+// Run the client with the standard options+func Run(handlers ...HandlerFunc) {+	once := flag.Bool("once", false, "only one fetch loop")++	epc := New(flag.CommandLine)++	flag.Parse()++	epc.CancelOnSignals()++	for {+		items, canceled := epc.Next()++		if canceled {+			return+		}++		for _, handler := range handlers {+			handler(items)+		}++		if *once {+			klog.Infof("to resume this watch, use --instance-id %d --rev %d", epc.InstanceID, epc.Rev)+			return+		}+	}+}+```++- use the docker syntax to express binding, allowing sockets with+  `unix:///run/kubernetes/proxy.sock`+- may economize some syscalls for internal implementations by using+  `google.golang.org/grpc/test/bufconn`, but that sounds like premature optimization++### Test Plan++<!--+**Note:** *Not required until targeted at a release.*++Consider the following in developing a test plan for this enhancement:+- Will there be e2e and integration tests, in addition to unit tests?+- How will it be tested in isolation vs with other components?++No need to outline all of the test cases, just the general strategy. Anything+that would count as tricky in the implementation, and anything particularly+challenging to test, should be called out.++All code is expected to have adequate tests (eventually with coverage+expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]+when drafting this test plan.++[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md+-->++### Graduation Criteria++<!--+**Note:** *Not required until targeted at a release.*++Define graduation milestones.++These may be defined in terms of API maturity, or as something else. 
The KEP+should keep this high-level with a focus on what signals will be looked at to+determine graduation.++Consider the following in developing the graduation criteria for this enhancement:+- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]+- [Deprecation policy][deprecation-policy]++Clearly define what graduation means by either linking to the [API doc+definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning)+or by redefining what graduation means.++In general we try to use the same stages (alpha, beta, GA), regardless of how the+functionality is accessed.++[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions+[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/++Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].++#### Alpha -> Beta Graduation++- Gather feedback from developers and surveys+- Complete features A, B, C+- Tests are in Testgrid and linked in KEP++#### Beta -> GA Graduation++- N examples of real-world usage+- N installs+- More rigorous forms of testing—e.g., downgrade tests and scalability tests+- Allowing time for feedback++**Note:** Generally we also wait at least two releases between beta and+GA/stable, because there's no opportunity for user feedback, or even bug reports,+in back-to-back releases.++#### Removing a Deprecated Flag++- Announce deprecation and support policy of the existing flag+- Two versions passed since introducing the functionality that deprecates the flag (to address version skew)+- Address feedback on usage/changed behavior, provided on GitHub issues+- Deprecate the flag++**For non-optional features moving to GA, the graduation criteria must include+[conformance tests].**++[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md+-->++### Upgrade / Downgrade Strategy

this section needs to include how deployers and the ecosystem would migrate users from kube-proxy v1 -> v2. existing settings in user KubeProxyConfiguration v1alpha1 -> X would be in scope too.

mcluseau

comment created time in 5 hours

pull request comment kubernetes/enhancements

KEP: Support scaling HPA to/from zero pods for object/external metrics

Sorry for the lack of updates, I pushed a refined proposal around handling disabling and updating from 0 to 1.

Let me know what you think.
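
A minimal sketch of the scale-to/from-zero decision this KEP discusses for object/external metrics, assuming simplified ratio math and made-up names; it is illustrative only, not the HPA controller's actual algorithm:

```golang
package main

import "fmt"

// desiredReplicas approximates how an HPA that allows minReplicas=0 might pick
// a replica count for an object/external metric. The real controller also
// handles tolerances, stabilization windows, multiple metrics, and rounds up;
// this sketch keeps only the to/from-zero branches the KEP talks about.
func desiredReplicas(currentReplicas int32, metricValue, targetValue float64, minReplicas, maxReplicas int32) int32 {
	// Scale to zero only when explicitly allowed and there is no demand.
	if minReplicas == 0 && metricValue == 0 {
		return 0
	}
	// Waking up from zero: go to 1 first, then let the usual ratio math take over.
	if currentReplicas == 0 {
		return 1
	}
	desired := int32(float64(currentReplicas) * (metricValue / targetValue))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	fmt.Println(desiredReplicas(0, 12, 10, 0, 5)) // 1: wake up from zero
	fmt.Println(desiredReplicas(3, 0, 10, 0, 5))  // 0: no demand, scaling to zero allowed
}
```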

johanneswuerbach

comment created time in 6 hours

pull request comment kubernetes/enhancements

KEP: Support scaling HPA to/from zero pods for object/external metrics

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: <a href="https://github.com/kubernetes/enhancements/pull/2022#" title="Author self-approved">johanneswuerbach</a>. To complete the pull request process, please ask for approval from gjtempleton after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

<details open> Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment. </details> <!-- META={"approvers":["gjtempleton"]} -->

johanneswuerbach

comment created time in 6 hours

pull request comment kubernetes/enhancements

[DO-NOT-MERGE] [WIP] 20201010 kube-proxy v2: reworking the proxy's architecture

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

mcluseau

comment created time in 7 hours

issue comment kubernetes/enhancements

DaemonSets should support MaxSurge to improve workload availability

@fejta-bot: Closing this issue.

<details>

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. </details>

smarterclayton

comment created time in 8 hours

issue closed kubernetes/enhancements

DaemonSets should support MaxSurge to improve workload availability

Enhancement Description

  • One-line enhancement description: Many infrastructure components (CNI, CSI) require DaemonSets to place pods on each node, but the current update strategies limit end users' ability to minimize disruption during updates. It should be possible to surge DaemonSets, when a user requests it, to allow handoff from one running pod to another, as Deployments do (see the sketch after this list).
  • Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/1591-daemonset-surge
  • Primary contact (assignee): @smarterclayton
  • Responsible SIGs: @kubernetes/sig-apps-feature-requests @kubernetes/sig-node-feature-requests
  • Enhancement target (which target equals to which milestone):
    • Alpha release target 1.20
    • Beta release target 1.21
    • Stable release target 1.23
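
A hedged sketch of how a maxSurge value could resolve to an absolute surge budget during a DaemonSet rollout, assuming the usual int-or-percent convention with round-up behavior; the function and rounding rule here are illustrative, not the controller's implementation:

```golang
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveMaxSurge turns a maxSurge value ("25%" or "2") into an absolute number
// of extra pods allowed during a DaemonSet update. Illustrative sketch only.
func resolveMaxSurge(maxSurge string, desiredNodes int) (int, error) {
	if strings.HasSuffix(maxSurge, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(maxSurge, "%"))
		if err != nil {
			return 0, err
		}
		// Round up so that a non-zero percentage always allows at least one surge pod.
		return (desiredNodes*pct + 99) / 100, nil
	}
	return strconv.Atoi(maxSurge)
}

func main() {
	n, _ := resolveMaxSurge("25%", 10)
	fmt.Println(n) // 3: up to 3 nodes may temporarily run two pods during the rollout
}
```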

closed time in 8 hours

smarterclayton

issue comment kubernetes/enhancements

DaemonSets should support MaxSurge to improve workload availability

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

smarterclayton

comment created time in 8 hours

pull request comment kubernetes/enhancements

SIG Auth add KEP: Generate and Mount x509 Certificate for ServiceAccount

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

answer1991

comment created time in 8 hours

issue comment kubernetes/enhancements

Support of mixed protocols in Services with type=LoadBalancer

/remove-lifecycle stale

janosi

comment created time in 11 hours

issue comment kubernetes/enhancements

Support of mixed protocols in Services with type=LoadBalancer

/remove-lifecycle-stale

janosi

comment created time in 11 hours

issue comment kubernetes/enhancements

Support of mixed protocols in Services with type=LoadBalancer

/remove-lifecycle sta

janosi

comment created time in 11 hours

issue comment kubernetes/enhancements

Generic node resource scheduler plugin

@fejta-bot: Closing this issue.

<details>

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. </details>

houz42

comment created time in 13 hours

issue closed kubernetes/enhancements

Generic node resource scheduler plugin

Enhancement Description

  • Generic node resource scheduler plugin
  • Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/pull/2144
  • Discussion Link: https://github.com/kubernetes-sigs/scheduler-plugins/issues/95
  • Primary contact (assignee):
  • Responsible SIGs: sig-scheduling
  • Enhancement target (which target equals to which milestone):
    • Alpha release target (x.y):
    • Beta release target (x.y):
    • Stable release target (x.y):
  • [ ] Alpha
    • [ ] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/2144
    • [ ] Code (k/k) update PR(s):
    • [ ] Docs (k/website) update PR(s):

<!-- Uncomment these as you prepare the enhancement for the next stage

  • [ ] Beta
    • [ ] KEP (k/enhancements) update PR(s):
    • [ ] Code (k/k) update PR(s):
    • [ ] Docs (k/website) update(s):
  • [ ] Stable
    • [ ] KEP (k/enhancements) update PR(s):
    • [ ] Code (k/k) update PR(s):
    • [ ] Docs (k/website) update(s): -->

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

closed time in 13 hours

houz42

issue comment kubernetes/enhancements

Generic node resource scheduler plugin

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community. /close

houz42

comment created time in 13 hours

issue comment kubernetes/enhancements

Support of mixed protocols in Services with type=LoadBalancer

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

janosi

comment created time in 13 hours

pull request comment kubernetes/enhancements

[wip] Introduce minReadySeconds in StatefulSets

@ravisantoshgudimetla: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | 6f1aedb6b66a9ce1e2ecf52ecf474fb07c755c0b | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

<details>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. </details> <!-- test report -->

ravisantoshgudimetla

comment created time in a day

pull request comment kubernetes/enhancements

[wip] Introduce minReadySeconds in StatefulSets

@ravisantoshgudimetla: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | 7824993ac9eda1b307c442be164cc045e2006e20 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

<details>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. </details> <!-- test report -->

ravisantoshgudimetla

comment created time in a day

started dogecoin/dogecoin

started time in a day

Pull request review comment kubernetes/enhancements

PSP Replacement KEP

+# KEP-2579: PSP Replacement Policy (placeholder)++<<[UNRESOLVED]>>++The name of this feature / policy is open to discussion. Options considered:+- Pod Isolation Policy+- Pod Policy Check+- Pod Security Standards+- ~~Namsepace Security Policy~~ (there is a lot more to namespace security that is out-of-scope for this policy)+- ~~Pod Configuration Constraints~~ (not all of pod configuration is in-scope)+- Pod Security Constraints+- Pod Isolation Constraints+- Pod Isolation Requirements+- Namespace Security Labels+- ~~Pod Security Defaults~~ (policy is non-mutating)+- Pod Security Policy v2++<<[/UNRESOLVED]>>++<!-- toc -->+- [Summary](#summary)+- [Motivation](#motivation)+  - [Goals](#goals)+    - [Requirements](#requirements)+  - [Non-Goals](#non-goals)+- [Proposal](#proposal)+  - [API](#api)+  - [Validation](#validation)+  - [Versioning](#versioning)+  - [PodTemplate Resources](#podtemplate-resources)+  - [Namespace policy update warnings](#namespace-policy-update-warnings)+  - [Admission Configuration](#admission-configuration)+    - [Defaulting](#defaulting)+    - [Exemptions](#exemptions)+  - [Risks and Mitigations](#risks-and-mitigations)+- [Design Details](#design-details)+  - [Updates](#updates)+    - [Ephemeral Containers](#ephemeral-containers)+    - [Other Pod Subresources](#other-pod-subresources)+  - [Pod Security Standards](#pod-security-standards)+  - [Flexible Extension Support](#flexible-extension-support)+  - [Test Plan](#test-plan)+  - [Monitoring](#monitoring)+  - [Graduation Criteria](#graduation-criteria)+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)+  - [Version Skew Strategy](#version-skew-strategy)+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)+  - [Monitoring Requirements](#monitoring-requirements)+  - [Dependencies](#dependencies)+  - [Scalability](#scalability)+  - [Troubleshooting](#troubleshooting)+- [Optional Future Extensions](#optional-future-extensions)+  - [Rollout of baseline-by-default for unlabeled namespaces](#rollout-of-baseline-by-default-for-unlabeled-namespaces)+  - [PodSecurityPolicy Migration](#podsecuritypolicy-migration)+  - [Custom Profiles](#custom-profiles)+  - [Custom Warning Messages](#custom-warning-messages)+  - [Windows Support](#windows-support)+  - [Offline Policy Checking](#offline-policy-checking)+  - [Event recording](#event-recording)+  - [Conformance](#conformance)+- [Implementation History](#implementation-history)+- [Drawbacks](#drawbacks)+- [Alternatives](#alternatives)+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)+<!-- /toc -->++## Summary++Replace PodSecurityPolicy with a new built-in admission controller that enforces the+[Pod Security Standards].++- Policy enforcement is controlled at the namespace level through labels+- Policies can be applied in 3 modes. Multiple modes can apply to a single namespace.+    - Enforcing: policy violations cause the pod to be rejected+    - Audit: policy violations trigger an audit annotation, but are otherwise allowed+    - Warning: policy violations trigger a user-facing warning, but are otherwise allowed+- An optional per-mode version label can be used to pin the policy to the version that shipped+  with a given Kubernetes minor version (e.g. 
`v1.18`)+- Dry-run of namespace updates is supported to test enforcing policy changes against existing pods.+- Policy exemptions can be statically configured based on (requesting) user, RuntimeClass, or+  namespace. A request meeting exemption criteria is ignored by the admission plugin.++[Pod Security Standards]: https://kubernetes.io/docs/concepts/security/pod-security-standards/++## Motivation++Pod Security Policy is deprecated as of Kubernetes v1.21. There were numerous problems with+PSP that lead to the decision to deprecate it, rather than promote it to GA, including:++1. Policy authorization model - Policies are bound to the requesting user OR the pod’s service+   account+     - Granting permission to the user is intuitive, but breaks controllers+     - Dual model weakens security+2. Rollout challenges - PSP fails closed in the absence of policy+     - The feature can never be enabled by default+     - Need 100% coverage before rolling out (and no dry-run / audit mode)+     - Leads to insufficient test coverage+3. Inconsistent & unbounded API - Partly an aesthetic issue, but highlights lack of flexibility+     - API has grown organically and has many internal inconsistencies (usability challenge)+     - Unclear how to decide what should be part of PSP (e.g. fine-grained volume restrictions)+     - Doesn’t compose well+     - Mutation priority can be unexpected++However, we still feel that Kubernetes should include a mechanism to prevent privilege escalation+through the create-pod permission.++### Goals++Replace PodSecurityPolicy without compromising the ability for Kubernetes to limit privilege+escalation out of the box. Specifically, there should be a built-in way to limit create/update pod+permissions so they are not equivalent to root-on-node (or cluster).++#### Requirements++1. Validating only (i.e. no changing pods to make them comply with policy)+2. Safe to enable in new AND upgraded clusters+    - Dryrun policy changes and/or Audit-only mode+3. Built-in in-tree controller+4. Capable of supporting Windows in the future, if not in the initial release+    - Don’t automatically break windows pods+5. Must be responsive to Pod API evolution across versions+6. (fuzzy) Easy to use, don’t need to be a kubernetes/security/linux expert to meet the basic objective+7. Extensible: should work with custom policy implementations without whole-sale replacement+    - Enable custom policies keyed off of RuntimeClassNames++Nice to have:++1. Exceptions or policy bindings by requesting user+2. (fuzzy) Windows support in the initial release+3. Admission controller is enabled by default in beta or GA phase+4. Provide an easy migration path from PodSecurityPolicy+5. Enforcement on pod-controller resources (i.e. things embedding a PodTemplate)++### Non-Goals++1. Provide a configurable policy mechanism that meets the needs of all use-cases+2. Limit privilege escalation and other attacks beyond the pod to host boundary+   (e.g. policies on services, secrets, etc.)+++## Proposal++The three profile levels (privileged, baseline, restricted) of the [Pod Security Standards] will+be hardcoded into the new admission plugin. Changes to the standards will be tied to the Kubernetes+version that they were introduced in.++### API++<<[UNRESOLVED]>>++The annotation prefix is a placeholder, and will depend on the final feature name. The proposed+prefix structure is `<featurename>.kubernetes.io`.++<<[/UNRESOLVED]>>++Policy application is controlled based on labels on the namespace. 
The following labels are supported:+```+<prefix>/allow: <policy level>+<prefix>/allow-version: <policy version>+<prefix>/audit: <policy level>+<prefix>/audit-version: <policy version>+<prefix>/warn: <policy level>+<prefix>/warn-version: <policy version>+```++**Allow:** Pods meeting the requirements of the allowed level are allowed. Violations are rejected+in admission.++**Audit:** Pods and [templated pods] meeting the requirements of the audit policy level are ignored.+Violations are recorded in a `<prefix>/audit: <violation>` annotation on the+audit event for the request. Audit annotations will **not** be applied to the pod objects+themselves, as doing so would violate the non-mutating requirement.++**Warn:** Pods and [templated pods] meeting the requirements of the warn level are ignored.+Violations are returned in a user-facing warning message. Warn & audit modes are independent; if the+functionality of both is desired, then both labels must be set.++[templated pods]: #podtemplate-resources++<<[UNRESOLVED]>>++The mode key names are open to discussion. In particular, I'm not satisfied that `allow` and+`audit/warn` carry slightly different meanings:++- `allow` means "allow pods meeting this profile level & below" (i.e. deny above)+- `audit` and `warn` mean "audit/warn on pods exceeding this profile level" (i.e. audit/warn above)++~~Using `deny` for the enforcing level would be more consistent with audit/warn (deny above), but is+a counter-intuitive user experience. One would expect `deny = privileged` to deny privileged pods,+but it really means allow privileged pods (hence using `allow` for the key).~~ (Rejected)++Another option is to change the `audit` and `warn` keys to something meaning "don't audit/warn at this+level and bellow", but we haven't thought of a good name. The best alternative we've come up with is+`audit-above` and `warn-above`, but that is cumbersome.++A final option is to change `allow` to `enforce`, which is more ambiguous. The ambiguity makes the+inconsistency with audit & warn less obvious, but doesn't help the user experience.++<<[/UNRESOLVED]>>++There are several reasons for controlling the policy directly through namespace labels, rather than+through a separate object:++- Using labels enables various workflows around policy management through kubectl, for example+  issuing queries like `kubectl get namespaces -l+  <prefix>/allow-version!=v1.22` to find namespaces where the enforcing+  policy isn't pinned to the most recent version.+- Keeping the options on namespaces allows atomic create-and-set-policy, as opposed to creating a+  namespace and then creating a second object inside the namespace.+- Policies settings are per-namespace singletons, and singleton objects are not well supported in+  Kubernetes.+- Labels are lighter-weight than namespace fields, making it clear that this is a policy layer on+  top of namespaces, not inherent to namespaces itself.++### Validation++The following restrictions are placed (by the admission plugin) on the policy namespace labels:++1. Unknown labels with the pod security prefix (TBD) are rejected, e.g. `<prefix>/foo-bar`+2. Policy level must be one of: `privileged`, `baseline`, `restricted`+3. Version values must be match `(latest|v[0-9]+\.[0-9]+`. That is, one of:+    1. `latest`+    2. `vMAJOR.MINOR` (e.g. `v1.21`)++Enforcement is best effort, and invalid labels that pre-existed the admission controller enablement+are ignored. 
Updates to a previously invalid label are only allowed if the new value is valid.++### Versioning++A specific version can be supplied for each enforcement mode. The version pins the policy to the+version that was defined at that kubernetes version. The default version is `latest`, which can be+provided to explicitly use the latest definition of the policy. There is some nuance to how versions+other than latest are applied:++- If the constrained pod field has not changed since the pinned version, the policy is applied as+  originally specified.+- The privileged profile always means fully unconstrained and is effectively unversioned (specifying+  a version is allowed but ignored).+- Specifying a version more recent than the current Kubernetes version is allowed (for rollback &+  version skew reasons), but is treated as `latest`.++<<[UNRESOLVED]>>++Under the webhook implementation, policy versions are tied to the webhook version, not the cluster+version. This means that it is recommended for the webhook to updated prior to updating the cluster.++~~if the webhook version is newer than the cluster version, how+should policy versions past the cluster version be handled? Should `latest` mean the latest version+supported by the webhook, or the policy version matching the cluster version? _Leaning towards+supporting all policy versions supported by the webhook, even if they are newer than the cluster+version._~~++Note that policies are not guaranteed to be backwards compatible, and a newer restricted policy+could require setting a field that doesn't exist in the current API version.++<<[/UNRESOLVED]>>++- Under an older version X of a policy:+    - Allow pods that were allowed under policy version X running on cluster version X+    - Allow pods that set new fields that the policy level has no opinion about (e.g. pod overhead+      fields)+    - Allow pods that set new fields that the policy level has an opinion about if the value is the+      default (explicit or implicit) value OR the value is allowed by newer versions of the policy+      level (e.g. a less privileged value of a new field)++For example, the restricted policy level now requires `allowPrivilegeEscalation=false`, but this+field wasn't added until Kubernetes v1.8, and all containers prior to v1.8 implicitly ran as+`allowPrivilegeEscalation=true`. Under the **restricted v1.7** profile, the following+`allowPrivilegeEscalation` configurations would be allowed on a v1.8 cluster:+- `null` (allowed during the v1.7 release)+- `true` (equal in privilege to a v1.7 pod that didn't set the field)+- `false` (strictly less privileged than other allowed values)++### PodTemplate Resources++Audit and Warn modes are also checked on resource types that embed a PodTemplate (enumerated below),+but Allow mode only applies to actual pod resources.++Since users do not create pods directly in the typical deployment model, the warning mechanism is+only effective if it can also warn on templated pod resources. Similarly, for audit it is useful to+tie the audited violation back to the requesting user, so audit will also apply to templated pod+resources. In the interest of supporting mutating admission controllers, policies will only+be enforced on actual pods.++Templated pod resources include:++- v1 ReplicationController+- v1 PodTemplate+- apps/v1 ReplicaSet+- apps/v1 Deployment+- apps/v1 StatefulSet+- apps/v1 DaemonSet+- batch/v1 CronJob+- batch/v1 Job++PodTemplate warnings & audit will only be applied to built-in types. 
CRDs that wish to take+advantage of this functionality can use an object reference to a v1/PodTemplate resource rather than+inlining a PodTemplate. We will publish a guide (documentation and/or examples) that demonstrate+this pattern. Alternatively, the functionality can be implemented in a 3rd party admission plugin+leveraging the library implementation.++### Namespace policy update warnings++When an allow policy (or version) label is added or changed, the admission plugin will test each pod+in the namespace against the new policy. Violations are returned to the user as warnings. These+checks have a timeout of XX seconds and a limit of YY pods, and will return a warning in the event+that not every pod was checked. User exemptions are ignored by these checks, but runtime class+exemptions still apply. Namespace exemptions are also ignored, but an additional warning will be+returned when updating the policy on an exempt namespace. These checks only consider actual Pod+resources, not [templated pods].++These checks are also performed when making a dry-run request, which can be an effective way of+checking for breakages before updating a policy, for example:++```+kubectl label --dry-run=server --overwrite ns --all <prefix>/allow=baseline+```++<<[UNRESOLVED]>>++- What should the timout be for pod update warnings?+- What should the pod limit be set to?++<<[/UNRESOLVED]>>+++### Admission Configuration++A number of options can be statically configured through the [Admission Configuration file][]:++```+apiVersion: apiserver.config.k8s.io/v1+kind: AdmissionConfiguration+plugins:+- name: PSPReplacement # Placeholder+  configuration:+    defaults:  # Defaults applied when a mode label is not set.+      allow:         <default allow policy level>+      allow-version: <default allow policy version>+      audit:         <default audit policy level>+      audit-version: <default audit policy version>+      warn:          <default warn policy level>+      warn-version:  <default warn policy version>+    exemptions:+      usernames:         [ <array of authenticated usernames to exempt> ]+      runtimeClassNames: [ <array of runtime class names to exempt> ]+      namespaces:        [ <array of namespaces to exempt> ]+...+```++[Admission Configuration file]: https://github.com/kubernetes/kubernetes/blob/3d6026499b674020b4f8eec11f0b8a860a330d8a/staging/src/k8s.io/apiserver/pkg/apis/apiserver/v1/types.go#L27++#### Defaulting++The default policy level and version for each mode (when no label is present) can be statically+configured. The default for the static configuration is `privileged` and `latest`.++#### Exemptions++Policy exemptions can be statically configured. Exemptions must be explicitly enumerated, and don’t+support indirection such as label or group selectors. Requests meeting criteria are completely by+the admission controller (allow, audit and warn). Exemption dimensions include:++- Usernames: requests from users with an exempt authenticated (or impersonated) username are ignored.+- RuntimeClassNames: pods and [templated pods] with specifying an exempt runtime class name are ignored.+- Namespaces: pods and [templated pods] in an exempt namespace are ignored.++<<[UNRESOLVED]>>++Should the `kube-system` namespace default to being exempt? 
Alternatively, we can just label it as+privileged out of the box.++<<[/UNRESOLVED]>>++### Risks and Mitigations++**Future proofing:** The policy versioning aspects of this proposal are designed to anticipate+breaking changes to policies, either in light of new threats or new knobs added to pods. However, if+there is a new feature that needs to be restricted but doesn't have a sensible hardcoded+requirement, we would be put in a hard place. Hopefully the adoption of this proposal will+discourage such fields from being added.++**Scope creep:** How do we decide which fields are in-scope for policy restrictions? There are a+number of fields that are relevant to local-DoS prevention that are somewhat ambiguous.++**Ecosystem suppression:** In the discussions of PodSecurityPolicy replacements, there was a concern+that whatever we picked would become not only an ecosystem standard best-practice, but practically a+requirement (e.g. for compliance), and that in doing so we would prevent 3rd party policy+controllers from innovating and getting meaningful adoption. To mitigate these concerns, we have+tightly scoped this proposal, and hopefully struck an effective balance between configurability and+usefulness that still leaves plenty of room for 3rd party extensions. We are also providing a+library implementation and spec to encourage custom controller development.++**Exemptions creep:** We have already gotten requests to trigger exemptions on various different+fields. The more exemption knobs we add, the harder it becomes to comprehend the current state of+the policy, and the more room there is for error. To prevent this, we should be very conservative+about adding new exemption dimensions. The existing knobs were carefully chosen with specific+extensibility usecases in mind.++<!--+What are the risks of this proposal, and how do we mitigate? 
Think broadly.+For example, consider both security and how this will impact the larger+Kubernetes ecosystem.++How will security be reviewed, and by whom?++How will UX be reviewed, and by whom?++Consider including folks who also work outside the SIG or subproject.+-->++## Design Details++### Updates++Updates to the following pod fields are exempt from policy checks, meaning that if a pod update+request only changes these fields it will not be denied even if the pod is in violation of the+current policy level:++- Any metadata updates _EXCEPT_ changes to the seccomp or apparmor annotations:+    - `seccomp.security.alpha.kubernetes.io/pod` (deprecated)+    - `container.seccomp.security.alpha.kubernetes.io/*` (deprecated)+    - `container.apparmor.security.beta.kubernetes.io/*`+- Valid updates to `.spec.activeDeadlineSeconds`+- Valid updates to `.spec.tolerations`+- Valid updates to [Pod resources](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources)++Note that updates to container images WILL require a policy reevaluation.++Pod status & nodeName updates are handled by `status` and `binding` subresource requests+respectively, and are not checked against policies.++Update requests to Pods and [PodTemplate resources](#podtemplate-resources) will reevaluate the full object+against audit & warn policies, independent of which fields are being modified.++#### Ephemeral Containers++In the initial implementation, ephemeral containers will be subject to the same policy restrictions,+and adding or updating ephemeral containers will require a full policy check.++<<[UNRESOLVED]>>++Once ephemeral containers allow [custom security contexts], it may be desireable to run an ephemeral+container with higher privileges for debugging purposes. For example, CAP_SYS_PTRACE is forbidden by+the baseline policy but can be useful in debugging. We could introduce yet-another-mode-label that+only applies enforcement to ephemeral containers (defaults to the allow policy).++[custom security contexts]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/277-ephemeral-containers#configurable-security-policy++One way this could be handled under the current model is:+1. Exempt a special username (not one that can be authenticated directly) from policy enforcement,+   e.g. `ops:privileged-debugger`+2. Grant the special user permission to ONLY operate on the ephemeral containers subresource (it is+   critical that they cannot create or update pods directly).+3. Grant (real) users that should have privileged debug capability the ability to impersonate the+   exempt user.++We could consider ways to streamline the user experience of this, for instance adding a special RBAC+binding that exempts users when operating on the ephemeral containers subresource (e.g. an+`escalate-privilege` verb on the ephemeral containers subresource).++<<[/UNRESOLVED]>>++#### Other Pod Subresources++Aside from ephemeral containers, the policy is not checked for any other Pod subresources (status,+bind, logs, exec, attach, port-forward).++Although annotations can be updated through the status subresource, the apparmor annotations are+immutable and the seccomp annotations are deprecated and slated for removal in v1.23.++### Pod Security Standards++Policy level definitions are hardcoded and unconfigurable out of the box. 
However, the [Pod Security+Standards] leave open ended guidance on a few items, so we must make a static decision on how to+handle these elements:++**HostPorts** - (baseline) HostPorts will be forbidden. This is a more niche feature, and violates+the container-host boundary.

@thockin this solves the port holding problem, right?

tallclair

comment created time in a day

pull request comment kubernetes/enhancements

KEP-2568: Run control-plane as non-root in kubeadm.

@neolit123 - If the changes look good to you, please approve so that I can merge this PR with the KEP in provisional state. Thanks!

vinayakankugoyal

comment created time in a day

pull request comment kubernetes/enhancements

[wip] Introduce minReadySeconds in StatefulSets

@ravisantoshgudimetla: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | 8964397314cf0984fb1baffae335bc461cd3b771 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

<details>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. </details> <!-- test report -->

ravisantoshgudimetla

comment created time in a day

pull request comment kubernetes/enhancements

[wip] Introduce minReadySeconds in StatefulSets

@ravisantoshgudimetla: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | c8d2056904c87cb727663ba9aaec2669d3576f02 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

<details>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. </details> <!-- test report -->

ravisantoshgudimetla

comment created time in a day

pull request comment kubernetes/enhancements

KEP-1664: Better Support for Dual-Stack Node Addresses

@danwinship do we still need the KEP?

so, of the Goals in the KEP:

  1. Assign dual-stack Pod.Status.PodIPs to host-network Pods on nodes that have both IPv4 and IPv6 IPs

This was implemented in https://github.com/kubernetes/kubernetes/pull/95239

  1. Make the necessary changes to kubelet to allow bare-metal clusters to have dual-stack node IPs (either auto-detected or specified on the command line) rather than limiting them to a single node IP.

95239 allows manually specifying dual-stack node IPs for bare metal clusters on the command line. It does not do dual-stack autodetection.

  1. Define how cloud providers should handle IPv4 and IPv6 node IPs in different cluster types (single-stack IPv4, single-stack IPv6, dual-stack) so as to enable IPv6/dual-stack functionality in clusters that want it without accidentally breaking old IPv4-only clusters.

This is not done, but also, it seems like https://github.com/kubernetes/kubernetes/pull/86918 (dual-stack IPs on AWS) may merge now, and people are just leaning toward "well if your cluster has pods that will get confused by seeing dual-stack node addresses then you shouldn't be running on nodes with dual-stack IPs".

> 4. Make built-in cloud providers and external cloud providers behave the same way with respect to detecting and overriding the Node IP(s). Allow administrators to override both IPv4 and IPv6 Node IPs in dual-stack clusters.

This is not done but as discussed in https://github.com/kubernetes/enhancements/pull/1665#issuecomment-706146957 it's possibly not important, unless we care about letting people install single-stack IPv6 clusters on dual-stack cloud nodes. Arguably we should not care about that, since clouds that don't allow provisioning single-stack IPv6 nodes are probably not going to be able to bring up single-stack IPv6 clusters anyway (eg, due to IPv4-only DNS or other things like that).

> 5. Find a home for the node-address-handling code which is shared between kubelet and external cloud providers.

Not done


There was also discussion in the PR about introducing alternative --node-ip behavior (https://github.com/kubernetes/enhancements/pull/1665#discussion_r448461895, https://github.com/kubernetes/enhancements/pull/1665#issuecomment-702410028) which maybe would be a good idea. In particular, the current kubelet bare-metal node-ip autodetection code is basically useless if you have multiple IPs, and in OCP we run a separate program to detect the node IP and then pass it to kubelet explicitly.

Also I think at some point there was discussion about the fact that we never pluralized pod.Status.HostIP, and maybe we should (but also maybe we don't actually need to).


It would be cool to have some kubelet option for better node IP autodetection. eg, the default should be something like "the first non-secondary non-deprecated IP on the lowest-ifindexed NIC that contains a default route which is not out-prioritized by any other default route", and there could maybe also be additional modes ("use dns", "use interface ethX", "use the interface that has the route that would be used to reach IP W.X.Y.Z")
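Purely as a hypothetical sketch of what such modes could look like if they were ever exposed as kubelet configuration (no such field exists today; every name below is invented for illustration):

```yaml
# Hypothetical sketch only: neither nodeIPAutodetection nor any of these modes
# exists in the kubelet today; the names are invented to illustrate the idea.
nodeIPAutodetection:
  mode: default          # heuristic described above (best IP on the preferred default-route NIC)
  # mode: dns            # resolve the node's hostname and use the result
  # mode: interface      # use the first usable IP on a named interface
  # interface: eth1
  # mode: route-to       # use the interface whose route would be used to reach a given IP
  # target: 192.0.2.1
```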

Combining also with some of the discussion in https://github.com/kubernetes/kubernetes/pull/95768, and also kind of https://github.com/kubernetes/kubernetes/issues/96981, I'd say maybe there's an argument for having a mode where Node.Status.Addresses always contains exactly one NodeInternalIP (and nothing else) on single-stack, and exactly two NodeInternalIPs (and nothing else) on dual-stack.
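Roughly, that mode would have a dual-stack Node report something like the fragment below: exactly one InternalIP per family and no other address types (addresses are examples):

```yaml
# Sketch of the suggested reporting mode for a dual-stack node's status.
status:
  addresses:
  - type: InternalIP
    address: 192.0.2.10
  - type: InternalIP
    address: 2001:db8::10
```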

danwinship

comment created time in a day

pull request comment kubernetes/enhancements

[wip] Introduce minReadySeconds in StatefulSets

@ravisantoshgudimetla: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | 2282284ebc5b0b9290b25905db86c6cab3d7a338 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

ravisantoshgudimetla

comment created time in a day

issue comment kubernetes/enhancements

Allow the configuration of the interval of container runtime healthcheck

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

tharun208

comment created time in a day

Pull request review comment kubernetes/enhancements

sig-release: Add release cadence KEP

title: Defining the Kubernetes Release Cadence
kep-number: 2572
authors:
  - "@justaugustus"
owning-sig: sig-release
participating-sigs:
  - sig-architecture
  - sig-testing
status: implementable
creation-date: 2021-01-21
reviewers:
  - "@BenTheElder"
  - "@derekwaynecarr"
  - "@dims"
  - "@ehashman"
  - "@hasheddan"
  - "@jeremyrickard"
  - "@johnbelamaric"
  - "@spiffxp"
  - "@stevekuznetsov"
approvers:
  - "@LappleApple"
  - "@saschagrunert"

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.22"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.22"
  beta: "v1.23"
  stable: "v1.25"

A similar KEP, in that it is also about a process change: https://github.com/kubernetes/enhancements/tree/master/keps/sig-docs/1326-third-party-content-in-docs

justaugustus

comment created time in 2 days

Pull request review comment kubernetes/enhancements

[WIP] Add a KEP for supporting AllPorts services.

# KEP-2610: AllPorts Services

## Summary

Today, a Kubernetes Service accepts a list of ports to be exposed by it. It is possible to specify any number of ports, including the entire port range, by enlisting them in the service spec. This can be tedious if the service needs a large number of ports. This KEP proposes to add a new field to the ServicePort spec to allow exposing the entire port range (1 to 65535).

## Motivation

There are several applications, like SIP apps or RTP, which need a lot of ports to run multiple calls or media streams. Currently the only option is to specify every port in the Service spec. A request for port ranges in Services has been open in https://github.com/kubernetes/kubernetes/issues/23864. Implementing port ranges is challenging since iptables/ipvs do not support port ranges. Hence the proposal to set a single field in order to expose the entire port range and implement the service clients and endpoints accordingly.

### Goals

* Allow users to optionally expose the entire port range via a Service (of Type LoadBalancer or ClusterIP).

### Non-Goals

* Supporting port ranges in a Service.
* Changing the default behavior of Service ports.

## Proposal

The proposal here is to introduce an "allPorts" boolean field to the [ServicePort API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#serviceport-v1-core). This field is introduced within ServicePort rather than ServiceSpec, because a ServicePort entry is needed to infer the protocol.

This field will be supported for Service Type=ClusterIP and Type=LoadBalancer. It is not applicable to ExternalName services. NodePort services are not supported: a NodePort service accepts traffic on a given port on the node, redirecting it to the specified targetPort of the service endpoints. Supporting allPorts for a NodePort service would mean that traffic to any port on the node is forwarded to its endpoints on the same port. This could potentially break networking on the node if traffic for, say, port 22 got forwarded to <endpoint IP>:22.

In order to expose the entire port range on a supported service, a user needs to:

1) Create a single ServicePort entry in the ServiceSpec and set "allPorts" to true.
2) Populate the protocol field.
3) Populate a valid port in the port field. This is required to make sure we do not break other Service API clients (like DNS) that might always expect a non-zero port value. The [API server validation](https://github.com/kubernetes/kubernetes/blob/b9ce4ac212d150212485fa29d62a2fbd783a57b0/pkg/apis/core/validation/validation.go#L4373) could be relaxed, but we might break other clients.

By default, a NodePort will be assigned for the single valid port in the service if it is of Type=LoadBalancer. Traffic forwarding will work as usual for that NodePort. NodePort allocation can be disabled by setting "allocateLoadBalancerNodePorts: false" on the service spec.

### Risks and Mitigations

A user could accidentally expose the entire port range on their cluster nodes by enabling it for a Service. This is mostly applicable to LoadBalancer services, which are typically accessible from outside the cluster. The behavior depends on the firewall implementation in the environment. Users should make sure that traffic to NodeIP:<servicePort> is disallowed by the right firewall rules. If such firewalling does not exist, it is possible that ports on the nodes are exposed via Services even today, but with allPorts it becomes trivial to accidentally expose the entire port range.

Currently, kube-proxy adds the right rules to only allow traffic to the ServiceIP/port combination specified in the service. When using allPorts, kube-proxy will allow all traffic for the given protocol and given ServiceIP. Kube-proxy alone cannot drop traffic to NodeIP:<servicePort>.

## Design Details

Changes are required to API server validation, kube-proxy and controllers that use the ServicePort field.

1) New validation checks:
   * allPorts can be set to true only for ClusterIP and LoadBalancer services.
   * Only one ServicePort object per protocol can be specified in the ServiceSpec if allPorts is set to true.
2) Kube-proxy should configure iptables/ipvs rules by skipping the port filter if allPorts is true.
3) LoadBalancer controllers should create LoadBalancer resources with the appropriate port values.
4) TODO - Verify if the Endpoints and EndpointSlices controllers can continue to fill in the single port number as the Endpoint port.

DNS SRV records and environment variables will continue to be created for the single valid port value specified.

### Test Plan

### Graduation Criteria

### Upgrade / Downgrade Strategy

### Version Skew Strategy

## Production Readiness Review Questionnaire

## Implementation History

## Drawbacks

## Alternatives

I think this is the way to implement all-ports services: still with type: LoadBalancer, but with no ports.
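For illustration, a sketch of what such a Service might look like with the field proposed in this KEP. Note that allPorts does not exist in any released Kubernetes API; the name, selector and port value below are examples:

```yaml
# Sketch only: allPorts is the field *proposed* by this KEP and is not part of
# the current Service API. Other values are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: sip-media
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: false   # optionally skip NodePort allocation, as noted in the proposal
  selector:
    app: sip-media
  ports:
  - protocol: UDP
    port: 5060          # a single valid port is still required so clients like DNS keep working
    allPorts: true      # proposed: expose the entire 1-65535 range for this protocol
```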

prameshj

comment created time in 2 days