Alena Varkockova alenkacz Prague http://twitter.com/alenkacz developer, traveller, foodie

alenkacz/gradle-scalafmt 18

Gradle plugin for scalafmt

alenkacz/gradle-json-validator 8

Gradle plugin for json validation

alenkacz/gradle-marathon-deployer 8

Gradle plugin that can deploy your application to Marathon (https://mesosphere.github.io/marathon/)

alenkacz/ci-signal-report 5

generating report for CI signal release team

alenkacz/Expanze 3

XNA game

alenkacz/Colorific 2

HTML5 game

alenkacz/Bookfan 1

Bookfan Android client

alenkacz/akka 0

Build highly concurrent, distributed, and resilient message-driven applications on the JVM

alenkacz/blozinek.cz 0

Github pages for blozinek.cz

alenkacz/bookshelf 0

Distributed library of the friends of Rubyslava

push event kubernetes/enhancements

Aldo Culquicondor

commit sha bdde5d17cca31879f1b44ca41111061c4b39eaba

PRR as approver in scheduler component config KEP According to KEP template


Kubernetes Prow Robot

commit sha c258a34ae2154db7c75dbfa4c21fa1171ddf4cd1

Merge pull request #1808 from alculquicondor/patch-2 PRR as approver in scheduler component config KEP


push time in 8 minutes

PR merged kubernetes/enhancements

PRR as approver in scheduler component config KEP

Labels: approved, cncf-cla: yes, kind/kep, lgtm, sig/architecture, sig/scheduling, size/XS

According to KEP template

/assign @ahg-g cc @wojtek-t

+1 -1

2 comments

1 changed file

alculquicondor

pr closed time in 8 minutes

pull request comment kubernetes/enhancements

PRR as approver in scheduler component config KEP

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g (https://github.com/kubernetes/enhancements/pull/1808#issuecomment-633703380), alculquicondor (author self-approved)

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

alculquicondor

comment created time in 9 minutes

pull request comment kubernetes/enhancements

PRR as approver in scheduler component config KEP

/lgtm /approve

alculquicondor

comment created time in 10 minutes

pull request comment kubernetes/enhancements

PRR as approver in scheduler component config KEP

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alculquicondor (author self-approved)
To complete the pull request process, please assign ahg-g
You can assign the PR to them by writing /assign @ahg-g in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

alculquicondor

comment created time in 11 minutes

PR opened kubernetes/enhancements

PRR as approver in scheduler component config KEP

According to KEP template

/assign @ahg-g cc @wojtek-t

+1 -1

0 comments

1 changed file

pr created time in 11 minutes

Pull request review comment kubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 approvers:
   - "@wojtekt"
   - "@brancz"
 creation-date: 2019-01-31
-last-updated: 2020-05-01
+last-updated: 2020-05-19

Please add me as a PRR approver: https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/kep.yaml#L16

chelseychen

comment created time in 17 minutes

Pull request review comment kubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 List Events from the NamespaceSystem with field selector `reportingController =
 List all Event with field selector `regarding.name = podName, regarding.namespace = podNamespace`, and `related.name = podName, related.namespace = podNamespace`. You need to join results outside of the kubernetes API.

+## Production Readiness Review Questionnaire
+
+### Feature enablement and rollback
+
+_This section must be completed when targeting alpha to a release._
+
+* **How can this feature be enabled / disabled in a live cluster?**
+  - [ ] Feature gate (also fill in values in `kep.yaml`)
+    - Feature gate name:
+    - Components depending on the feature gate:
+  - [x] Other
+    - Describe the mechanism:
+
+      (1) The API itself can be enabled / disabled at kube-apiserver level
+      by using `--runtime-config` flag;
+
+      (2) For the use of API, we have a fallback mechanism instead of using
+      a feature gate. That is, we simply fallback to the old Event libraries
+      if the API is diabled.
+
+      Currently this fallback is implemented purely in scheduler but we're
+      planning to move it into the library itself.
+
+    - Will enabling / disabling the feature require downtime of the control
+      plane?
+
+      (1) Yes, enabling API requires to restart apiserver.
+
+      (2) No, enabling the use of the API doesn't require that.
+
+    - Will enabling / disabling the feature require downtime or reprovisioning
+      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+
+      No.
+
+* **Does enabling the feature change any default behavior?**
+  Any change of default behavior may be surprising to users or break existing
+  automations, so be extremely careful here.
+
+  While the graduation of the API itself doesn't change default behavior,
+  migration of individual components does, as the events will be reported
+  differently.
+
+* **Can the feature be disabled once it has been enabled (i.e. can we rollback
+  the enablement)?**
+  Also set `rollback-supported` to `true` or `false` in `kep.yaml`.
+  Describe the consequences on existing workloads (e.g. if this is runtime
+  feature, can it break the existing applications?).
+
+  Yes. If the new Event API is disabled, it will fallback to the original one
+  (The new events are roundtrippable with the old `corev1.Events`).
+
+  If individual components don't implement it, rollback of client-library use
+  may not be possible (i.e. they only fallback to the old API if the new API
+  is disabled, if there is bug in the client-library, there is no way to
+  fallback as of now).
+
+* **What happens if we reenable the feature if it was previously rolled back?**
+
+  New types of Events will be generated instead of the old one.
+
+* **Are there any tests for feature enablement/disablement?**
+  The e2e framework does not currently support enabling and disabling feature
+  gates. However, unit tests in each component dealing with managing data created
+  with and without the feature are necessary. At the very least, think about
+  conversion tests if API types are being modified.
+
+  Manual tests will be performed to ensure things work when either enabling
+  or disabling the new Event API.
+
+  More information in [Test Plan](#test-plan) section.
+
+### Rollout, Upgrade and Rollback Planning
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How can a rollout fail? Can it impact already running workloads?**
+  Try to be as paranoid as possible - e.g. what if some components will restart
+  in the middle of rollout?
+
+  A rollout could fail if some components restart in the middle of the rollout.
+  Then those components will continue using the old Event API.
+
+* **What specific metrics should inform a rollback?**
+
+* **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
+  Describe manual testing that was done and the outcomes.
+  Longer term, we may want to require automated upgrade/rollback tests, but we
+  are missing a bunch of machinery and tooling and do that now.
+
+  Not yet. It could be done by enabling / disabling new Event API.
+
+* **Is the rollout accompanied by any deprecations and/or removals of features,
+  APIs, fields of API types, flags, etc.?**
+  Even if applying deprecation policies, they may still surprise some users.
+
+  State field of EventSeries will be removed from corev1.Event API.
+
+### Monitoring requirements
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How can an operator determine if the feature is in use by workloads?**
+  Ideally, this should be a metrics. Operations against Kubernetes API (e.g.
+  checking if there are objects with field X set) may be last resort. Avoid
+  logs or events for this purpose.
+
+  The API, as a feature that workloads may in theory use,
+  can be determined by looking into the apiserver_requests_total metric.
+
+* **What are the SLIs (Service Level Indicators) an operator can use to
+  determine the health of the service?**
+  - [x] Metrics
+    - Metric name: apiserver_requests_total
+    - Components exposing the metric: kube-apiserver
+  - [ ] Other (treat as last resort)
+    - Details:
+
+* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+  At the high-level this usually will be in the form of "high percentile of SLI
+  per day <= X". It's impossible to provide a comprehensive guidance, but at the very
+  high level (they needs more precise definitions) those may be things like:
+  - per-day percentage of API calls finishing with 5XX errors <= 1%
+  - 99% percentile over day of absolute value from (job creation time minus expected
+    job creation time) for cron job <= 10%
+  - 99,9% of /health requests per day finish with 200 code
+
+  Events have always been "best-effort".
+  We're sticking to that with the new API too, so no SLO will be introduced.
+
+* **Are there any missing metrics that would be useful to have to improve
+  observability if this feature?**
+  Describe the metrics themselves and the reason they weren't added (e.g. cost,
+  implementation difficulties, etc.).
+
+  No.
+
+### Dependencies
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **Does this feature depend on any specific services running in the cluster?**
+  Think about both cluster-level services (e.g. metrics-server) as well
+  as node-level agents (e.g. specific version of CRI). Focus on external or
+  optional services that are needed. For example, if this feature depends on
+  a cloud provider API, or upon an external software-defined storage or network
+  control plane.
+
+  For each of the fill in the following, thinking both about running user workloads
+  and creating new ones, as well as about cluster-level services (e.g. DNS):
+
+  N/A
+
+
+### Scalability
+
+_For alpha, this section is encouraged: reviewers should consider these questions
+and attempt to answer them._
+
+_For beta, this section is required: reviewers must answer these questions._
+
+_For GA, this section is required: approvers should be able to confirms the
+previous answers based on experience in the field._
+
+* **Will enabling / using this feature result in any new API calls?**
+  Describe them, providing:
+
+  In the new EventRecorder, every 30 minutes a "heartbeat" call will be performed
+  to update Event status and prevent garbage collection in etcd. This heartbeat
+  is happening for events that are happening all the time (If an event didn't
+  happen for 6 minutes, it will be GC-ed).
+
+* **Will enabling / using this feature result in introducing new API types?**
+
+  Yes, a new API type "eventsv1.Event" is being introduced.
+  The migration of Event API will cause creation of new types of Event objects.
+  The number of Event objects depends on cluster state, which theoretically
+  won't be too large due to deduplication logic and reasonable-cardinality
+  of objects in the system.

This sentence is super misleading. Suggest:

The number of Event objects depends on the cluster state and its churn. Event deduplication and reasonable cardinality of the fields should keep their number within reasonable boundaries (obviously dependent on cluster size).

chelseychen

comment created time in 23 minutes

Pull request review comment kubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

+* **Will enabling / using this feature result in any new calls to cloud
+  provider?**
+
+  No.
+
+* **Will enabling / using this feature result in increasing size or count
+  of the existing API objects?**
+  Describe them providing:
+
+  The difference in size of the Event object comes from new Action and Related
+  fields. We can safely estimate the increase to be smaller than 30%. We'll

Let's change the "We'll ..." sentence to the following:

... 30%.
However, more events may be emitted. As an example, a new Event will be emitted for Pod creation done by standard controllers (e.g. ReplicaSet), as they are currently deduplicated across all 'owner' objects. However, given that there are at least 5 other events being emitted during pod startup, the impact of it can be bounded by 20%.
chelseychen

comment created time in 20 minutes

Pull request review comment kubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

+* **Will enabling / using this feature result in increasing size or count
+  of the existing API objects?**
+  Describe them providing:
+
+  The difference in size of the Event object comes from new Action and Related
+  fields. We can safely estimate the increase to be smaller than 30%. We'll
+  also emit additional Event per Pod creation, as currently Events for that
+  are being deduplicated. There are currently at least 6 Events emitted when
+  Pod is started, so impact of this change can be bounded by 20%. This means
+  that in the worst case the increase in Event size can be bounded by 56%.

I've never really understood where this 56 is coming from. Especially, given the above is just an example.

So I suggest rephrasing this sentence to:

In total, we estimate that the increase in total size of all Events can be conservatively bounded by ~50%, but the practical bound should be much smaller.
chelseychen

comment created time in 18 minutes
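
One plausible reading of the 56% figure, which neither the diff nor the review comment spells out, is that it compounds the two bounds quoted in the hunk: a per-Event size increase below 30% and up to 20% more Events. The sketch below works through that arithmetic. The function name `combined_increase_percent` is hypothetical, and the interpretation itself is an assumption rather than something stated in the PR.

```python
# Hypothetical helper, not taken from the KEP or the PR: it compounds two
# percentage bounds under the assumption that they multiply.
def combined_increase_percent(size_increase_percent: int, count_increase_percent: int) -> int:
    """Return the combined percentage increase when both bounds apply at once.

    Integer arithmetic keeps the result exact when the product of the two
    scaled factors is divisible by 100, as it is for 30 and 20; otherwise
    the result is rounded down.
    """
    scaled = (100 + size_increase_percent) * (100 + count_increase_percent)
    return scaled // 100 - 100


# 30% larger Events, 20% more Events (the two bounds quoted in the diff).
print(combined_increase_percent(30, 20))
```

Under this reading, 1.30 times 1.20 gives 1.56, which matches the "56%" in the hunk. It also shows why the bound is only as good as the two example figures it is built from.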

Pull request review comment kubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 List Events from the NamespaceSystem with field selector `reportingController =  List all Event with field selector `regarding.name = podName, regarding.namespace = podNamespace`, and `related.name = podName, related.namespace = podNamespace`. You need to join results outside of the kubernetes API. +## Production Readiness Review Questionnaire++### Feature enablement and rollback++_This section must be completed when targeting alpha to a release._++* **How can this feature be enabled / disabled in a live cluster?**+  - [ ] Feature gate (also fill in values in `kep.yaml`)+    - Feature gate name:+    - Components depending on the feature gate:+  - [x] Other+    - Describe the mechanism:++      (1) The API itself can be enabled / disabled at kube-apiserver level+      by using `--runtime-config` flag;++      (2) For the use of API, we have a fallback mechanism instead of using+      a feature gate. That is, we simply fallback to the old Event libraries+      if the API is diabled.++      Currently this fallback is implemented purely in scheduler but we're+      planning to move it into the library itself.++    - Will enabling / disabling the feature require downtime of the control+      plane?++      (1) Yes, enabling API requires to restart apiserver.++      (2) No, enabling the use of the API doesn't require that.++    - Will enabling / disabling the feature require downtime or reprovisioning+      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).++      No.++* **Does enabling the feature change any default behavior?**+  Any change of default behavior may be surprising to users or break existing+  automations, so be extremely careful here.++  While the graduation of the API itself doesn't change default behavior,+  migration of individual components does, as the events will be reported+  differently.++* **Can the feature be disabled once it has been enabled (i.e. 
can we rollback+  the enablement)?**+  Also set `rollback-supported` to `true` or `false` in `kep.yaml`.+  Describe the consequences on existing workloads (e.g. if this is runtime+  feature, can it break the existing applications?).++  Yes. If the new Event API is disabled, it will fallback to the original one +  (The new events are roundtrippable with the old `corev1.Events`).++  If individual components don't implement it, rollback of client-library use+  may not be possible (i.e. they only fallback to the old API if the new API+  is disabled, if there is bug in the client-library, there is no way to+  fallback as of now).++* **What happens if we reenable the feature if it was previously rolled back?**++  New types of Events will be generated instead of the old one.++* **Are there any tests for feature enablement/disablement?**+  The e2e framework does not currently support enabling and disabling feature+  gates. However, unit tests in each component dealing with managing data created+  with and without the feature are necessary. At the very least, think about+  conversion tests if API types are being modified.++  Manual tests will be performed to ensure things work when either enabling+  or disabling the new Event API.++  More information in [Test Plan](#test-plan) section.++### Rollout, Upgrade and Rollback Planning++_This section must be completed when targeting beta graduation to a release._++* **How can a rollout fail? Can it impact already running workloads?**+  Try to be as paranoid as possible - e.g. what if some components will restart+  in the middle of rollout?++  A rollout could fail if some components restart in the middle of the rollout.+  Then those components will continue using the old Event API.++* **What specific metrics should inform a rollback?**++* **Were upgrade and rollback tested? 
Was upgrade->downgrade->upgrade path tested?**+  Describe manual testing that was done and the outcomes.+  Longer term, we may want to require automated upgrade/rollback tests, but we+  are missing a bunch of machinery and tooling and do that now.++  Not yet. It could be done by enabling / disabling new Event API.++* **Is the rollout accompanied by any deprecations and/or removals of features,+  APIs, fields of API types, flags, etc.?**+  Even if applying deprecation policies, they may still surprise some users.++  State field of EventSeries will be removed from corev1.Event API.++### Monitoring requirements++_This section must be completed when targeting beta graduation to a release._++* **How can an operator determine if the feature is in use by workloads?**+  Ideally, this should be a metrics. Operations against Kubernetes API (e.g.+  checking if there are objects with field X set) may be last resort. Avoid+  logs or events for this purpose.++  The API, as a feature that workloads may in theory use,+  can be determined by looking into the apiserver_requests_total metric.++* **What are the SLIs (Service Level Indicators) an operator can use to+  determine the health of the service?**+  - [x] Metrics+    - Metric name: apiserver_requests_total+    - Components exposing the metric: kube-apiserver+  - [ ] Other (treat as last resort)+    - Details:++* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**+  At the high-level this usually will be in the form of "high percentile of SLI+  per day <= X". 
It's impossible to provide a comprehensive guidance, but at the very+  high level (they needs more precise definitions) those may be things like:+  - per-day percentage of API calls finishing with 5XX errors <= 1%+  - 99% percentile over day of absolute value from (job creation time minus expected+    job creation time) for cron job <= 10%+  - 99,9% of /health requests per day finish with 200 code++  Events have always been "best-effort".+  We're sticking to that with the new API too, so no SLO will be introduced.++* **Are there any missing metrics that would be useful to have to improve+  observability if this feature?**+  Describe the metrics themselves and the reason they weren't added (e.g. cost,+  implementation difficulties, etc.).++  No.++### Dependencies++_This section must be completed when targeting beta graduation to a release._++* **Does this feature depend on any specific services running in the cluster?**+  Think about both cluster-level services (e.g. metrics-server) as well+  as node-level agents (e.g. specific version of CRI). Focus on external or+  optional services that are needed. For example, if this feature depends on+  a cloud provider API, or upon an external software-defined storage or network+  control plane.++  For each of the fill in the following, thinking both about running user workloads+  and creating new ones, as well as about cluster-level services (e.g. 
DNS):+  +  N/A+++### Scalability++_For alpha, this section is encouraged: reviewers should consider these questions+and attempt to answer them._++_For beta, this section is required: reviewers must answer these questions._++_For GA, this section is required: approvers should be able to confirms the+previous answers based on experience in the field._++* **Will enabling / using this feature result in any new API calls?**+  Describe them, providing:++  In the new EventRecorder, every 30 minutes a "heartbeat" call will be performed+  to update Event status and prevent garbage collection in etcd. This heartbeat+  is happening for events that are happening all the time (If an event didn't+  happen for 6 minutes, it will be GC-ed).++* **Will enabling / using this feature result in introducing new API types?**++  Yes, a new API type "eventsv1.Event" is being introduced.+  The migration of Event API will cause creation of new types of Event objects.+  The number of Event objects depends on cluster state, which theoretically+  won't be too large due to deduplication logic and reasonable-cardinality+  of objects in the system.++* **Will enabling / using this feature result in any new calls to cloud+  provider?**++  No.++* **Will enabling / using this feature result in increasing size or count+  of the existing API objects?**+  Describe them providing:+  +  The difference in size of the Event object comes from new Action and Related+  fields. We can safely estimate the increase to be smaller than 30%. We'll+  also emit additional Event per Pod creation, as currently Events for that+  are being deduplicated. There are currently at least 6 Events emitted when+  Pod is started, so impact of this change can be bounded by 20%. 
This means+  that in the worst case the increase in Event size can be bounded by 56%.++* **Will enabling / using this feature result in increasing time taken by any+  operations covered by [existing SLIs/SLOs][]?**+  +  No++* **Will enabling / using this feature result in non-negligible increase of+  resource usage (CPU, RAM, disk, IO, ...) in any components?**+  +  The potential increase of Event size might cause non-negligible storage+  increase in Etcd.

Which also means:

  • network bandwidth to send them
  • CPU to process them

[neither should dominate what we already have, but both should be mentioned]
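
For reference, the 56% worst-case figure in the quoted KEP text appears to come from compounding the two bounds it states: objects up to 30% larger, and up to 20% more objects. A minimal sketch of that arithmetic follows; the helper name is hypothetical and not part of the KEP.

```python
def combined_increase_percent(size_pct: int, count_pct: int) -> float:
    """Compound a per-object size increase with an object-count increase."""
    return (100 + size_pct) * (100 + count_pct) / 100 - 100


# One extra Event on top of at least 6 per Pod start is at most 1/6 more
# objects (about 16.7%), which the KEP text rounds up to a 20% bound.
extra_events_pct = 100 / 6

print(combined_increase_percent(30, 20))  # prints 56.0
```

Under the same assumptions, this combined factor would also be a rough upper bound on the extra bytes sent over the network and processed by the CPU.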

chelseychen

comment created time in 18 minutes

Pull request review commentkubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 List Events from the NamespaceSystem with field selector `reportingController =  List all Event with field selector `regarding.name = podName, regarding.namespace = podNamespace`, and `related.name = podName, related.namespace = podNamespace`. You need to join results outside of the kubernetes API. +## Production Readiness Review Questionnaire++### Feature enablement and rollback++_This section must be completed when targeting alpha to a release._++* **How can this feature be enabled / disabled in a live cluster?**+  - [ ] Feature gate (also fill in values in `kep.yaml`)+    - Feature gate name:+    - Components depending on the feature gate:+  - [x] Other+    - Describe the mechanism:++      (1) The API itself can be enabled / disabled at kube-apiserver level+      by using `--runtime-config` flag;++      (2) For the use of API, we have a fallback mechanism instead of using+      a feature gate. That is, we simply fallback to the old Event libraries+      if the API is diabled.++      Currently this fallback is implemented purely in scheduler but we're+      planning to move it into the library itself.++    - Will enabling / disabling the feature require downtime of the control+      plane?++      (1) Yes, enabling API requires to restart apiserver.++      (2) No, enabling the use of the API doesn't require that.

This one is a bit misleading, and it also isn't fully true imho.

Basically, the currently implemented fallback happens only at component initialization: whatever state the API was in at that point is what the component keeps using, and we never recheck it later. So I would say that if you enable/disable the API, you also need to restart the components using it.
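
A minimal sketch of the behavior described in this comment, assuming a recorder that decides which API to use exactly once, at construction. The class, the `cluster` dict, and the string labels are all hypothetical illustrations, not the actual client library.

```python
class EventRecorder:
    """Hypothetical recorder that picks its API once, at construction time."""

    def __init__(self, new_api_enabled) -> None:
        # The availability check runs only here; it is never repeated later.
        self.api = "new-events-api" if new_api_enabled() else "old-events-api"

    def record(self, message: str) -> str:
        return f"{self.api}: {message}"


cluster = {"new_api": False}
recorder = EventRecorder(lambda: cluster["new_api"])

cluster["new_api"] = True             # API enabled after the component started
stale = recorder.record("Scheduled")  # still goes through the old API

restarted = EventRecorder(lambda: cluster["new_api"])  # simulated restart
fresh = restarted.record("Scheduled")                  # now uses the new API
```

In this sketch the already-running recorder keeps using the old path until it is re-created, which is why a component restart would be needed after toggling the API.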

chelseychen

comment created time in 32 minutes

Pull request review commentkubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 List Events from the NamespaceSystem with field selector `reportingController =  List all Event with field selector `regarding.name = podName, regarding.namespace = podNamespace`, and `related.name = podName, related.namespace = podNamespace`. You need to join results outside of the kubernetes API. +## Production Readiness Review Questionnaire++### Feature enablement and rollback++_This section must be completed when targeting alpha to a release._++* **How can this feature be enabled / disabled in a live cluster?**+  - [ ] Feature gate (also fill in values in `kep.yaml`)+    - Feature gate name:+    - Components depending on the feature gate:+  - [x] Other+    - Describe the mechanism:++      (1) The API itself can be enabled / disabled at kube-apiserver level+      by using `--runtime-config` flag;++      (2) For the use of API, we have a fallback mechanism instead of using+      a feature gate. That is, we simply fallback to the old Event libraries+      if the API is diabled.++      Currently this fallback is implemented purely in scheduler but we're+      planning to move it into the library itself.++    - Will enabling / disabling the feature require downtime of the control+      plane?++      (1) Yes, enabling API requires to restart apiserver.++      (2) No, enabling the use of the API doesn't require that.++    - Will enabling / disabling the feature require downtime or reprovisioning+      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).++      No.++* **Does enabling the feature change any default behavior?**+  Any change of default behavior may be surprising to users or break existing+  automations, so be extremely careful here.++  While the graduation of the API itself doesn't change default behavior,+  migration of individual components does, as the events will be reported+  differently.++* **Can the feature be disabled once it has been enabled (i.e. 
can we rollback+  the enablement)?**+  Also set `rollback-supported` to `true` or `false` in `kep.yaml`.+  Describe the consequences on existing workloads (e.g. if this is runtime+  feature, can it break the existing applications?).++  Yes. If the new Event API is disabled, it will fallback to the original one +  (The new events are roundtrippable with the old `corev1.Events`).++  If individual components don't implement it, rollback of client-library use+  may not be possible (i.e. they only fallback to the old API if the new API+  is disabled, if there is bug in the client-library, there is no way to+  fallback as of now).++* **What happens if we reenable the feature if it was previously rolled back?**++  New types of Events will be generated instead of the old one.++* **Are there any tests for feature enablement/disablement?**+  The e2e framework does not currently support enabling and disabling feature+  gates. However, unit tests in each component dealing with managing data created+  with and without the feature are necessary. At the very least, think about+  conversion tests if API types are being modified.++  Manual tests will be performed to ensure things work when either enabling+  or disabling the new Event API.++  More information in [Test Plan](#test-plan) section.++### Rollout, Upgrade and Rollback Planning++_This section must be completed when targeting beta graduation to a release._++* **How can a rollout fail? Can it impact already running workloads?**+  Try to be as paranoid as possible - e.g. what if some components will restart+  in the middle of rollout?++  A rollout could fail if some components restart in the middle of the rollout.

I don't really understand this one - can you clarify?

chelseychen

comment created time in 30 minutes

Pull request review commentkubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 List Events from the NamespaceSystem with field selector `reportingController =  List all Event with field selector `regarding.name = podName, regarding.namespace = podNamespace`, and `related.name = podName, related.namespace = podNamespace`. You need to join results outside of the kubernetes API. +## Production Readiness Review Questionnaire++### Feature enablement and rollback++_This section must be completed when targeting alpha to a release._++* **How can this feature be enabled / disabled in a live cluster?**+  - [ ] Feature gate (also fill in values in `kep.yaml`)+    - Feature gate name:+    - Components depending on the feature gate:+  - [x] Other+    - Describe the mechanism:++      (1) The API itself can be enabled / disabled at kube-apiserver level+      by using `--runtime-config` flag;++      (2) For the use of API, we have a fallback mechanism instead of using+      a feature gate. That is, we simply fallback to the old Event libraries+      if the API is diabled.++      Currently this fallback is implemented purely in scheduler but we're+      planning to move it into the library itself.++    - Will enabling / disabling the feature require downtime of the control+      plane?++      (1) Yes, enabling API requires to restart apiserver.++      (2) No, enabling the use of the API doesn't require that.++    - Will enabling / disabling the feature require downtime or reprovisioning+      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).++      No.++* **Does enabling the feature change any default behavior?**+  Any change of default behavior may be surprising to users or break existing+  automations, so be extremely careful here.++  While the graduation of the API itself doesn't change default behavior,+  migration of individual components does, as the events will be reported+  differently.++* **Can the feature be disabled once it has been enabled (i.e. 
can we rollback+  the enablement)?**+  Also set `rollback-supported` to `true` or `false` in `kep.yaml`.+  Describe the consequences on existing workloads (e.g. if this is runtime+  feature, can it break the existing applications?).++  Yes. If the new Event API is disabled, it will fallback to the original one +  (The new events are roundtrippable with the old `corev1.Events`).++  If individual components don't implement it, rollback of client-library use+  may not be possible (i.e. they only fallback to the old API if the new API+  is disabled, if there is bug in the client-library, there is no way to+  fallback as of now).++* **What happens if we reenable the feature if it was previously rolled back?**++  New types of Events will be generated instead of the old one.++* **Are there any tests for feature enablement/disablement?**+  The e2e framework does not currently support enabling and disabling feature+  gates. However, unit tests in each component dealing with managing data created+  with and without the feature are necessary. At the very least, think about+  conversion tests if API types are being modified.++  Manual tests will be performed to ensure things work when either enabling+  or disabling the new Event API.++  More information in [Test Plan](#test-plan) section.++### Rollout, Upgrade and Rollback Planning++_This section must be completed when targeting beta graduation to a release._++* **How can a rollout fail? Can it impact already running workloads?**+  Try to be as paranoid as possible - e.g. what if some components will restart+  in the middle of rollout?++  A rollout could fail if some components restart in the middle of the rollout.+  Then those components will continue using the old Event API.++* **What specific metrics should inform a rollback?**++* **Were upgrade and rollback tested? 
Was upgrade->downgrade->upgrade path tested?**+  Describe manual testing that was done and the outcomes.+  Longer term, we may want to require automated upgrade/rollback tests, but we+  are missing a bunch of machinery and tooling and do that now.++  Not yet. It could be done by enabling / disabling new Event API.++* **Is the rollout accompanied by any deprecations and/or removals of features,+  APIs, fields of API types, flags, etc.?**+  Even if applying deprecation policies, they may still surprise some users.++  State field of EventSeries will be removed from corev1.Event API.++### Monitoring requirements++_This section must be completed when targeting beta graduation to a release._++* **How can an operator determine if the feature is in use by workloads?**+  Ideally, this should be a metrics. Operations against Kubernetes API (e.g.+  checking if there are objects with field X set) may be last resort. Avoid+  logs or events for this purpose.++  The API, as a feature that workloads may in theory use,+  can be determined by looking into the apiserver_requests_total metric.++* **What are the SLIs (Service Level Indicators) an operator can use to+  determine the health of the service?**+  - [x] Metrics+    - Metric name: apiserver_requests_total+    - Components exposing the metric: kube-apiserver+  - [ ] Other (treat as last resort)+    - Details:++* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**+  At the high-level this usually will be in the form of "high percentile of SLI+  per day <= X". 
It's impossible to provide a comprehensive guidance, but at the very+  high level (they needs more precise definitions) those may be things like:+  - per-day percentage of API calls finishing with 5XX errors <= 1%+  - 99% percentile over day of absolute value from (job creation time minus expected+    job creation time) for cron job <= 10%+  - 99,9% of /health requests per day finish with 200 code++  Events have always been "best-effort".+  We're sticking to that with the new API too, so no SLO will be introduced.++* **Are there any missing metrics that would be useful to have to improve+  observability if this feature?**+  Describe the metrics themselves and the reason they weren't added (e.g. cost,+  implementation difficulties, etc.).++  No.++### Dependencies++_This section must be completed when targeting beta graduation to a release._++* **Does this feature depend on any specific services running in the cluster?**+  Think about both cluster-level services (e.g. metrics-server) as well+  as node-level agents (e.g. specific version of CRI). Focus on external or+  optional services that are needed. For example, if this feature depends on+  a cloud provider API, or upon an external software-defined storage or network+  control plane.++  For each of the fill in the following, thinking both about running user workloads+  and creating new ones, as well as about cluster-level services (e.g. 
DNS):+  +  N/A+++### Scalability++_For alpha, this section is encouraged: reviewers should consider these questions+and attempt to answer them._++_For beta, this section is required: reviewers must answer these questions._++_For GA, this section is required: approvers should be able to confirms the+previous answers based on experience in the field._++* **Will enabling / using this feature result in any new API calls?**+  Describe them, providing:++  In the new EventRecorder, every 30 minutes a "heartbeat" call will be performed+  to update Event status and prevent garbage collection in etcd. This heartbeat+  is happening for events that are happening all the time (If an event didn't+  happen for 6 minutes, it will be GC-ed).++* **Will enabling / using this feature result in introducing new API types?**++  Yes, a new API type "eventsv1.Event" is being introduced.+  The migration of Event API will cause creation of new types of Event objects.

Let's remove this sentence. Given they have common representation in etcd (they are roundtrippable), it's not fully true.

chelseychen

comment created time in 26 minutes

Pull request review commentkubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 List Events from the NamespaceSystem with field selector `reportingController =  List all Event with field selector `regarding.name = podName, regarding.namespace = podNamespace`, and `related.name = podName, related.namespace = podNamespace`. You need to join results outside of the kubernetes API. +## Production Readiness Review Questionnaire++### Feature enablement and rollback++_This section must be completed when targeting alpha to a release._++* **How can this feature be enabled / disabled in a live cluster?**+  - [ ] Feature gate (also fill in values in `kep.yaml`)+    - Feature gate name:+    - Components depending on the feature gate:+  - [x] Other+    - Describe the mechanism:++      (1) The API itself can be enabled / disabled at kube-apiserver level+      by using `--runtime-config` flag;++      (2) For the use of API, we have a fallback mechanism instead of using+      a feature gate. That is, we simply fallback to the old Event libraries+      if the API is diabled.++      Currently this fallback is implemented purely in scheduler but we're+      planning to move it into the library itself.++    - Will enabling / disabling the feature require downtime of the control+      plane?++      (1) Yes, enabling API requires to restart apiserver.++      (2) No, enabling the use of the API doesn't require that.++    - Will enabling / disabling the feature require downtime or reprovisioning+      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).++      No.++* **Does enabling the feature change any default behavior?**+  Any change of default behavior may be surprising to users or break existing+  automations, so be extremely careful here.++  While the graduation of the API itself doesn't change default behavior,+  migration of individual components does, as the events will be reported+  differently.++* **Can the feature be disabled once it has been enabled (i.e. 
can we rollback+  the enablement)?**+  Also set `rollback-supported` to `true` or `false` in `kep.yaml`.+  Describe the consequences on existing workloads (e.g. if this is runtime+  feature, can it break the existing applications?).++  Yes. If the new Event API is disabled, it will fallback to the original one +  (The new events are roundtrippable with the old `corev1.Events`).++  If individual components don't implement it, rollback of client-library use+  may not be possible (i.e. they only fallback to the old API if the new API+  is disabled, if there is bug in the client-library, there is no way to+  fallback as of now).++* **What happens if we reenable the feature if it was previously rolled back?**++  New types of Events will be generated instead of the old one.++* **Are there any tests for feature enablement/disablement?**+  The e2e framework does not currently support enabling and disabling feature+  gates. However, unit tests in each component dealing with managing data created+  with and without the feature are necessary. At the very least, think about+  conversion tests if API types are being modified.++  Manual tests will be performed to ensure things work when either enabling+  or disabling the new Event API.++  More information in [Test Plan](#test-plan) section.++### Rollout, Upgrade and Rollback Planning++_This section must be completed when targeting beta graduation to a release._++* **How can a rollout fail? Can it impact already running workloads?**+  Try to be as paranoid as possible - e.g. what if some components will restart+  in the middle of rollout?++  A rollout could fail if some components restart in the middle of the rollout.+  Then those components will continue using the old Event API.++* **What specific metrics should inform a rollback?**

Too high/too low (vs what is expected) apiserver_request_total: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L66

[that may suggest a bug in the library]
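
A rough sketch of the "too high/too low vs. what is expected" check, assuming the operator already has an observed and an expected request rate for the Event resource derived from `apiserver_request_total`. The function name and the 50% tolerance are arbitrary assumptions for illustration, and how the rates are queried is not shown.

```python
def should_investigate(observed_rate: float, expected_rate: float,
                       tolerance: float = 0.5) -> bool:
    """Return True when observed_rate falls outside expected_rate * (1 ± tolerance)."""
    low = expected_rate * (1 - tolerance)
    high = expected_rate * (1 + tolerance)
    return not (low <= observed_rate <= high)


print(should_investigate(300.0, 100.0))  # far above the expected rate
print(should_investigate(120.0, 100.0))  # within the tolerated band
```

A deviation in either direction would only be a signal to look closer, not proof of a library bug.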

chelseychen

comment created time in 28 minutes

Pull request review commentkubernetes/enhancements

Add PRR questionnaire section to New Event API KEP

 List Events from the NamespaceSystem with field selector `reportingController =  List all Event with field selector `regarding.name = podName, regarding.namespace = podNamespace`, and `related.name = podName, related.namespace = podNamespace`. You need to join results outside of the kubernetes API. +## Production Readiness Review Questionnaire++### Feature enablement and rollback++_This section must be completed when targeting alpha to a release._++* **How can this feature be enabled / disabled in a live cluster?**+  - [ ] Feature gate (also fill in values in `kep.yaml`)+    - Feature gate name:+    - Components depending on the feature gate:+  - [x] Other+    - Describe the mechanism:++      (1) The API itself can be enabled / disabled at kube-apiserver level+      by using `--runtime-config` flag;++      (2) For the use of API, we have a fallback mechanism instead of using+      a feature gate. That is, we simply fallback to the old Event libraries+      if the API is diabled.++      Currently this fallback is implemented purely in scheduler but we're+      planning to move it into the library itself.++    - Will enabling / disabling the feature require downtime of the control+      plane?++      (1) Yes, enabling API requires to restart apiserver.++      (2) No, enabling the use of the API doesn't require that.++    - Will enabling / disabling the feature require downtime or reprovisioning+      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).++      No.++* **Does enabling the feature change any default behavior?**+  Any change of default behavior may be surprising to users or break existing+  automations, so be extremely careful here.++  While the graduation of the API itself doesn't change default behavior,+  migration of individual components does, as the events will be reported+  differently.++* **Can the feature be disabled once it has been enabled (i.e. 
can we rollback+  the enablement)?**+  Also set `rollback-supported` to `true` or `false` in `kep.yaml`.+  Describe the consequences on existing workloads (e.g. if this is runtime+  feature, can it break the existing applications?).++  Yes. If the new Event API is disabled, it will fallback to the original one +  (The new events are roundtrippable with the old `corev1.Events`).++  If individual components don't implement it, rollback of client-library use+  may not be possible (i.e. they only fallback to the old API if the new API+  is disabled, if there is bug in the client-library, there is no way to+  fallback as of now).++* **What happens if we reenable the feature if it was previously rolled back?**++  New types of Events will be generated instead of the old one.++* **Are there any tests for feature enablement/disablement?**+  The e2e framework does not currently support enabling and disabling feature+  gates. However, unit tests in each component dealing with managing data created+  with and without the feature are necessary. At the very least, think about+  conversion tests if API types are being modified.++  Manual tests will be performed to ensure things work when either enabling+  or disabling the new Event API.++  More information in [Test Plan](#test-plan) section.++### Rollout, Upgrade and Rollback Planning++_This section must be completed when targeting beta graduation to a release._++* **How can a rollout fail? Can it impact already running workloads?**+  Try to be as paranoid as possible - e.g. what if some components will restart+  in the middle of rollout?++  A rollout could fail if some components restart in the middle of the rollout.+  Then those components will continue using the old Event API.++* **What specific metrics should inform a rollback?**++* **Were upgrade and rollback tested? 
Was upgrade->downgrade->upgrade path tested?**+  Describe manual testing that was done and the outcomes.+  Longer term, we may want to require automated upgrade/rollback tests, but we+  are missing a bunch of machinery and tooling and do that now.++  Not yet. It could be done by enabling / disabling new Event API.++* **Is the rollout accompanied by any deprecations and/or removals of features,+  APIs, fields of API types, flags, etc.?**+  Even if applying deprecation policies, they may still surprise some users.++  State field of EventSeries will be removed from corev1.Event API.++### Monitoring requirements++_This section must be completed when targeting beta graduation to a release._++* **How can an operator determine if the feature is in use by workloads?**+  Ideally, this should be a metrics. Operations against Kubernetes API (e.g.+  checking if there are objects with field X set) may be last resort. Avoid+  logs or events for this purpose.++  The API, as a feature that workloads may in theory use,+  can be determined by looking into the apiserver_requests_total metric.++* **What are the SLIs (Service Level Indicators) an operator can use to+  determine the health of the service?**+  - [x] Metrics+    - Metric name: apiserver_requests_total+    - Components exposing the metric: kube-apiserver+  - [ ] Other (treat as last resort)+    - Details:++* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**+  At the high-level this usually will be in the form of "high percentile of SLI+  per day <= X". 
It's impossible to provide a comprehensive guidance, but at the very+  high level (they needs more precise definitions) those may be things like:+  - per-day percentage of API calls finishing with 5XX errors <= 1%+  - 99% percentile over day of absolute value from (job creation time minus expected+    job creation time) for cron job <= 10%+  - 99,9% of /health requests per day finish with 200 code++  Events have always been "best-effort".+  We're sticking to that with the new API too, so no SLO will be introduced.++* **Are there any missing metrics that would be useful to have to improve+  observability if this feature?**+  Describe the metrics themselves and the reason they weren't added (e.g. cost,+  implementation difficulties, etc.).++  No.++### Dependencies++_This section must be completed when targeting beta graduation to a release._++* **Does this feature depend on any specific services running in the cluster?**+  Think about both cluster-level services (e.g. metrics-server) as well+  as node-level agents (e.g. specific version of CRI). Focus on external or+  optional services that are needed. For example, if this feature depends on+  a cloud provider API, or upon an external software-defined storage or network+  control plane.++  For each of the fill in the following, thinking both about running user workloads+  and creating new ones, as well as about cluster-level services (e.g. 
DNS):+  +  N/A+++### Scalability++_For alpha, this section is encouraged: reviewers should consider these questions+and attempt to answer them._++_For beta, this section is required: reviewers must answer these questions._++_For GA, this section is required: approvers should be able to confirms the+previous answers based on experience in the field._++* **Will enabling / using this feature result in any new API calls?**+  Describe them, providing:++  In the new EventRecorder, every 30 minutes a "heartbeat" call will be performed+  to update Event status and prevent garbage collection in etcd. This heartbeat+  is happening for events that are happening all the time (If an event didn't

nit: s/all the time/periodically/ ?

chelseychen

comment created time in 27 minutes

issue opened avast/gradle-docker-compose-plugin

Please list those properties: "... few other properties."

Configuration of the nested settings defaults to the main dockerCompose settings (declared before the nested settings), except projectName, startedServices, and few other properties.

created time in 27 minutes

pull request comment linkerd/linkerd2

cli: rename cluster cli command to multicluster

@zaharidichev solved the conflicts.

psinghal20

comment created time in 34 minutes

issue comment kubernetes/enhancements

Immutable Secrets and ConfigMaps

@mikejoh - opened https://github.com/kubernetes/website/pull/21189

wojtek-t

comment created time in an hour

pull request comment kubernetes/enhancements

An initial version of the External TLS certificate authenticator KEP

Based on the feedback from @enj and @awly, we have come up with a draft of a protocol for communication between kubectl/client-go and the external signer, using gRPC over a Unix socket. Please find it below.

syntax = "proto3";

package v1alpha1;

// This service defines the public APIs for external signer plugin.
service ExternalSignerService {
    // Version returns the version of the external signer plugin.
    rpc Version(VersionRequest) returns (VersionResponse) {}
    // Get certificate from the external signer.
    rpc GetCertificate(CertificateRequest) returns (stream CertificateResponse) {}
    // Execute signing operation in the external signer plugin.
    rpc Sign(SignatureRequest) returns (stream SignatureResponse) {}
}
message VersionRequest {
    // Version of the external signer plugin API.
    string version = 1;
}
message VersionResponse {
    // Version of the external signer plugin API.
    string version = 1;
}
message CertificateRequest {
    // Version of the external signer plugin API.
    string version = 1;
    // Name of the Kubernetes cluster.
    string clusterName = 2;
    // Configuration of the external signer plugin. This configuration is specific to the external signer, but stored in KUBECONFIG for the user's convenience to allow multiplexing a single external signer for several K8s users.
    map<string, string> configuration = 3;
}
message CertificateResponse {
    oneof content {
        // Client certificate.
        bytes certificate = 1;
        // User prompt.
        string userPrompt = 2;
    }
}
message SignatureRequest {
    // Version of the external signer plugin API.
    string version = 1;
    // Name of the Kubernetes cluster.
    string clusterName = 2;
    // Configuration of the external signer plugin (HSM protocol specific).
    map<string, string> configuration = 3;
    // Digest to be signed.
    bytes digest = 4;
    // Enumeration of supported signer types.
    enum SignerType {
        RSAPSS = 0;
    }
    // Type of signer.
    SignerType signerType = 5;
    // Definition of options for creating the PSS signature.
    message RSAPSSOptions {
        // Length of the salt for creating the PSS signature.
        int32 saltLength = 1;
        // Hash function for creating the PSS signature.
        uint32 hash = 2;
    }
    // Options for creating the PSS signature (used when signerType is set to RSAPSS).
    RSAPSSOptions signerOptsRSAPSS = 6;
}
message SignatureResponse {
    oneof content {
        // Signature.
        bytes signature = 1;
        // User prompt.
        string userPrompt = 2;
    }
}
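
One way to read the streamed `oneof` responses in the draft above: the signer may emit any number of user prompts before the final payload. The Go sketch below illustrates that reading without gRPC. It is an assumption about the intended flow, not part of the proposal; `signatureResponse` and `collectSignature` are hypothetical stand-ins for the generated types and client code.

```go
package main

import (
	"errors"
	"fmt"
)

// signatureResponse mirrors the oneof in SignatureResponse:
// either signature or userPrompt is set (hypothetical stand-in type).
type signatureResponse struct {
	signature  []byte
	userPrompt string
}

// collectSignature drains the stream, forwards each prompt to the callback,
// and returns the first signature it sees. It reports an error if the
// stream ends before any signature arrives.
func collectSignature(stream <-chan signatureResponse, prompt func(string)) ([]byte, error) {
	for resp := range stream {
		if resp.signature != nil {
			return resp.signature, nil
		}
		prompt(resp.userPrompt)
	}
	return nil, errors.New("stream ended without a signature")
}

func main() {
	// A channel stands in for the gRPC response stream.
	stream := make(chan signatureResponse, 2)
	stream <- signatureResponse{userPrompt: "Touch your security key"}
	stream <- signatureResponse{signature: []byte{0xde, 0xad}}
	close(stream)

	sig, err := collectSignature(stream, func(p string) { fmt.Println("prompt:", p) })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("signature: %x\n", sig)
}
```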
jakubkrzywda

comment created time in an hour

issue comment kudobuilder/kudo

Cluster-scoped resources are not being deleted after operator instance is removed

See https://github.com/kudobuilder/kudo.dev/pull/247

alembiewski

comment created time in an hour

pull request comment kubernetes/enhancements

Update release-notes KEP to reflect the current state

@saschagrunert thanks for the PR. I think your changes capture most of the current direction I understand the project to be taking. There are three different areas we could address when thinking about the KEP:

  1. Updating general design issues to where they are now and where they are heading. I think you already mentioned most of them.
  2. Addressing those areas where the original ideas from @jeefy 's KEP have already been implemented.
  3. Future plans and direction

Here is what I think about those three items:

  1. As I said before, I think your changes reflect the current design well, including those areas that have shifted since the KEP was written a year ago.
  2. Regarding the actual implementation, I like your checklist because it shows what has been implemented without altering the KEP much. If we are trying to reflect the current progress of the implementation in the KEP itself, there are other areas we ought to note as well. For example, the fact that the website is already up, with its own domain and out of the personal repo.
  3. Finally, there are the plans that lie ahead of us (as of 2020). I think we are due for a good talk on the focus of the tools and how they are used. Mostly derived from the current status of the code but also from the human/organizational side of things.

But perhaps this last point should be left out of the KEP. After all, the original intent of the KEP was this:

this KEP would graduate once we have a dedicated release notes website that is automatically updated with minimal human interaction.

And we are on the brink of that. In fact, if we were to leave the bucket issue out of the scope of the KEP, we could say that the original mission of the KEP has already been fulfilled, as it only takes one command to go from nothing to the PR that updates the website.

What do you think?

saschagrunert

comment created time in an hour

PR opened mesosphere/kubernetes-base-addons

Move to KUDO based istio operator

What type of PR is this? Feature

What this PR does / why we need it: Istio 1.5.x is provided by the KUDO-based Istio operator. The operator repo is mesosphere/kudo-istio

Which issue(s) this PR fixes: no issue

Special notes for your reviewer:

Does this PR introduce a user-facing change?: NONE

Checklist

  • [ ] The commit message explains the changes and why they are needed.
  • [ ] The code builds and passes lint/style checks locally.
  • [ ] The relevant subset of integration tests pass locally.
  • [ ] The core changes are covered by tests.
  • [ ] The documentation is updated where needed.
+39 -0

0 comment

1 changed file

pr created time in an hour

pull request comment kubernetes/enhancements

optionally disable node ports for Service Type=LoadBalancer

I think a flag is fine, but not a boolean. If possible, coordinate some --lb-default-mode with #1392. If I have got it right, some state must be added to the Service object, like "bind-always" or "bind-never"; could "disable-nodeport" also be an option here? The --lb-default-mode would then be the default for new services.
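
To illustrate the difference between a boolean and a mode-valued flag, here is a minimal Go sketch using the standard `flag` package. The flag name and the three mode strings come from the comment above. Everything else is an assumption for illustration: the `lbMode` type, the `parseLBMode` helper, and the `bind-always` default are hypothetical and not part of any Kubernetes component.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// lbMode is a hypothetical string-valued setting that implements flag.Value.
type lbMode string

func (m *lbMode) String() string {
	if m == nil {
		return ""
	}
	return string(*m)
}

// Set accepts only the modes named in the discussion and rejects anything else,
// including boolean-style values such as "true".
func (m *lbMode) Set(s string) error {
	switch s {
	case "bind-always", "bind-never", "disable-nodeport":
		*m = lbMode(s)
		return nil
	}
	return fmt.Errorf("unknown mode %q", s)
}

// parseLBMode parses args and returns the selected mode.
// The "bind-always" default is an assumption made for this sketch.
func parseLBMode(args []string) (lbMode, error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the example
	mode := lbMode("bind-always")
	fs.Var(&mode, "lb-default-mode", "default node-port behaviour for new LoadBalancer services")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return mode, nil
}

func main() {
	for _, args := range [][]string{
		{"--lb-default-mode=disable-nodeport"},
		{"--lb-default-mode=true"},
	} {
		mode, err := parseLBMode(args)
		fmt.Printf("%v -> mode=%q err=%v\n", args, mode, err)
	}
}
```

A string-valued flag leaves room for further modes later, which a boolean cannot do without adding a second flag.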

andrewsykim

comment created time in 2 hours

create branch mesosphere/kubernetes-base-addons

branch : deepak/istio

created branch time in 2 hours

issue comment containerd/containerd

Pod is stuck in terminating: runc did not terminate sucessfully: container does not exist

My Windows worker nodes are being flooded with these:

E0524 17:24:38.357894 4028 remote_runtime.go:495] ListContainerStats with filter &ContainerStatsFilter{Id:,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem f76dd003c4f827c36bb10d24d08e9f14f6e68be728b727126741fd77c6185ac3: A virtual machine or container with the specified identifier does not exist.
E0524 17:24:38.357894 4028 eviction_manager.go:255] eviction manager: failed to get summary stats: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem f76dd003c4f827c36bb10d24d08e9f14f6e68be728b727126741fd77c6185ac3: A virtual machine or container with the specified identifier does not exist.

Any relation?

Thanks.

JulienBalestra

comment created time in 2 hours

issue comment kubernetes/enhancements

Seccomp

@palnabarun here are the current ones:
https://github.com/kubernetes/kubernetes/pull/91381
https://github.com/kubernetes/kubernetes/pull/91408
https://github.com/kubernetes/kubernetes/pull/91182

I also created an umbrella issue that contains all of them.

pweil-

comment created time in 2 hours

issue comment linkerd/linkerd2

Header based routing

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

bnookala

comment created time in 3 hours

issue comment kubernetes/enhancements

Seccomp

@pjbgf -- Can you please link to all the implementation PRs here - k/k or otherwise? :slightly_smiling_face:


The current release schedule is:

  • ~Monday, April 13: Week 1 - Release cycle begins~
  • ~Tuesday, May 19: Week 6 - Enhancements Freeze~
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released
pweil-

comment created time in 3 hours

issue comment kubernetes/enhancements

Allow users to set a pod’s hostname to its Fully Qualified Domain Name (FQDN)

@javidiaz -- Can you please link to all the implementation PRs here - k/k or otherwise? :slightly_smiling_face:


The current release schedule is:

  • ~Monday, April 13: Week 1 - Release cycle begins~
  • ~Tuesday, May 19: Week 6 - Enhancements Freeze~
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released
javidiaz

comment created time in 3 hours
