
dcos-labs/professional-services 9

Public resources from the Mesosphere Professional Services team

dcos-labs/charts 6

Mesosphere Kubernetes-a-a-S Helm charts repository

dcos-labs/marathon_exporter 2

A Prometheus metrics exporter for the Marathon Mesos framework

gerred/Anypic 1

An open source mobile and web app that lets users share photos similar to Instagram

gerred/busybox-ssl 1

A busybox container with SSL support

started aquasecurity/trivy

started time in 15 days

started defold/defold

started time in 17 days

issue opened BetterThanTomorrow/calva

Prevent extra closing brackets in strict mode

Currently, if you open a form, the editor gives you the closing bracket as expected:

()

However, in either mode, you can currently add more closing brackets after this. These brackets are then not deleted (even though there is no form to balance) unless you perform a force delete:

()))))]])} ; backspacing or deleting over these will not remove the unbalanced trailing brackets!

Should this automatically work in strict mode? I go back and forth on this, as force delete gives me a mechanism to still do what I want.
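For illustration only (this is my own sketch, not Calva's actual implementation): a minimal balance check an editor could run to detect the unbalanced trailing closers described above.

```python
def first_unmatched_closer(text):
    """Return the index of the first closing bracket that has no
    matching opener, or None if every closer is balanced."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for i, ch in enumerate(text):
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if stack and stack[-1] == pairs[ch]:
                stack.pop()
            else:
                return i  # this closer has nothing left to balance
    return None

print(first_unmatched_closer("()"))          # None: balanced form
print(first_unmatched_closer("()))))]])}"))  # 2: trailing closers start here
```

Strict mode could refuse to insert (or eagerly delete) anything from that index onward, which is roughly what force delete does manually today.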

created time in 19 days

started sekey/sekey

started time in 25 days

started rapid-sensemaking-framework/noflo-rsf

started time in a month

pull request comment kudobuilder/kuttl

ENV VAR shell expansion to KUTTL commands

Discussed in standup: this only affects TestStep, so this looks good.

kensipe

comment created time in a month

pull request comment kudobuilder/kuttl

ENV VAR shell expansion to KUTTL commands

Is there a way to escape this to still use literals? This will be important for pod commands that take env vars from a ConfigMap, Secret, or literal.

It might be worth considering changing the syntax over to something like ${MYENVVAR} so we can leave Deployment specs alone, since that's equally hard to change in Kubernetes itself, and might cause issues with kuttl trying to expand something that should be a string literal, yeah?
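To illustrate the suggested `${MYENVVAR}` syntax (a sketch of the semantics, not kuttl's implementation): Python's `string.Template` happens to implement exactly these rules, including `$$` as an escape so string literals survive expansion untouched.

```python
from string import Template

# Hypothetical env map; NAMESPACE is expanded, everything else is left alone.
env = {"NAMESPACE": "test-ns"}

cmd = Template("kubectl get pods -n ${NAMESPACE}").substitute(env)
# A pod command that reads "$MY_SECRET" from a Secret at runtime can be
# escaped so the test harness leaves it as a literal:
literal = Template("echo $$MY_SECRET").substitute(env)

print(cmd)      # kubectl get pods -n test-ns
print(literal)  # echo $MY_SECRET
```

With rules like these, `${...}` only expands what the harness knows about, and the escape gives Deployment-style specs a way to opt out.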

kensipe

comment created time in a month

issue comment BetterThanTomorrow/calva

Calva commands not found

This works for me. Thanks team!

AFrancoB

comment created time in a month

issue comment BetterThanTomorrow/calva

Calva commands not found

Absolutely @PEZ! Glad we could bring a close to this issue, I hadn't actually run into it until today when I went to add some new global aliases. :D

AFrancoB

comment created time in a month

issue comment BetterThanTomorrow/calva

Calva commands not found

I figured it out.

I had opened ~/.clojure/deps.edn by itself as a file, rather than opening the folder containing it.

AFrancoB

comment created time in a month

issue comment BetterThanTomorrow/calva

Calva commands not found

My ExtensionHost logs:

[2020-05-04 11:48:02.494] [exthost] [info] extension host started
[2020-05-04 11:48:02.506] [exthost] [info] ExtensionService#_doActivateExtension borkdude.clj-kondo {"startup":false,"extensionId":{"value":"borkdude.clj-kondo","_lower":"borkdude.clj-kondo"},"activationEvent":"onLanguage:clojure"}
[2020-05-04 11:48:02.506] [exthost] [info] ExtensionService#loadCommonJSModule file:///Users/gerred/.vscode/extensions/borkdude.clj-kondo-2020.5.2/out/extension.js
[2020-05-04 11:48:02.548] [exthost] [info] ExtensionService#_doActivateExtension vscode.debug-auto-launch {"startup":true,"extensionId":{"value":"vscode.debug-auto-launch","_lower":"vscode.debug-auto-launch"},"activationEvent":"*"}
[2020-05-04 11:48:02.548] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/debug-auto-launch/dist/extension
[2020-05-04 11:48:02.551] [exthost] [info] ExtensionService#_doActivateExtension vscode.emmet {"startup":true,"extensionId":{"value":"vscode.emmet","_lower":"vscode.emmet"},"activationEvent":"*"}
[2020-05-04 11:48:02.551] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/emmet/dist/extension
[2020-05-04 11:48:02.560] [exthost] [info] ExtensionService#_doActivateExtension vscode.git {"startup":true,"extensionId":{"value":"vscode.git","_lower":"vscode.git"},"activationEvent":"*"}
[2020-05-04 11:48:02.560] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/main
[2020-05-04 11:48:02.575] [exthost] [info] ExtensionService#_doActivateExtension vscode.github-authentication {"startup":true,"extensionId":{"value":"vscode.github-authentication","_lower":"vscode.github-authentication"},"activationEvent":"*"}
[2020-05-04 11:48:02.575] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/github-authentication/dist/extension.js
[2020-05-04 11:48:02.577] [exthost] [info] ExtensionService#_doActivateExtension vscode.merge-conflict {"startup":true,"extensionId":{"value":"vscode.merge-conflict","_lower":"vscode.merge-conflict"},"activationEvent":"*"}
[2020-05-04 11:48:02.577] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/merge-conflict/dist/extension
[2020-05-04 11:48:02.579] [exthost] [info] ExtensionService#_doActivateExtension vscode.search-result {"startup":true,"extensionId":{"value":"vscode.search-result","_lower":"vscode.search-result"},"activationEvent":"*"}
[2020-05-04 11:48:02.580] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/search-result/dist/extension.js
[2020-05-04 11:48:02.581] [exthost] [info] ExtensionService#_doActivateExtension vscode.vscode-account {"startup":true,"extensionId":{"value":"vscode.vscode-account","_lower":"vscode.vscode-account"},"activationEvent":"*"}
[2020-05-04 11:48:02.581] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/vscode-account/dist/extension.js
[2020-05-04 11:48:02.583] [exthost] [info] eager extensions activated
[2020-05-04 11:48:02.631] [exthost] [info] ExtensionService#_doActivateExtension betterthantomorrow.calva {"startup":false,"extensionId":{"value":"betterthantomorrow.calva","_lower":"betterthantomorrow.calva"},"activationEvent":"onLanguage:clojure"}
[2020-05-04 11:48:02.631] [exthost] [info] ExtensionService#loadCommonJSModule file:///Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension
[2020-05-04 11:48:02.867] [exthost] [error] Activating extension betterthantomorrow.calva failed due to an error:
[2020-05-04 11:48:02.873] [exthost] [error] TypeError: Cannot read property '0' of undefined
	at b (/Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension.js:1:7163)
	at Object.e.resolvePath (/Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension.js:1:8259)
	at Object.e.content (/Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension.js:135:58578)
	at /Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension.js:9:257345
	at Object.e.getConfig (/Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension.js:9:257533)
	at Object.e.activate (/Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension.js:135:86665)
	at e.activate (/Users/gerred/.vscode/extensions/betterthantomorrow.calva-2.0.97/out/extension.js:135:31656)
	at Function._callActivateOptional (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:788:244)
	at Function._callActivate (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:787:909)
	at /Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:786:960
	at async Promise.all (index 0)
[2020-05-04 11:48:06.977] [exthost] [info] ExtensionService#_doActivateExtension vscode.configuration-editing {"startup":false,"extensionId":{"value":"vscode.configuration-editing","_lower":"vscode.configuration-editing"},"activationEvent":"onLanguage:json"}
[2020-05-04 11:48:06.977] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/configuration-editing/dist/extension
[2020-05-04 11:48:06.983] [exthost] [info] ExtensionService#_doActivateExtension vscode.extension-editing {"startup":false,"extensionId":{"value":"vscode.extension-editing","_lower":"vscode.extension-editing"},"activationEvent":"onLanguage:json"}
[2020-05-04 11:48:06.983] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/extension-editing/dist/extension
[2020-05-04 11:48:06.995] [exthost] [info] ExtensionService#_doActivateExtension vscode.json-language-features {"startup":false,"extensionId":{"value":"vscode.json-language-features","_lower":"vscode.json-language-features"},"activationEvent":"onLanguage:json"}
[2020-05-04 11:48:06.995] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/json-language-features/client/dist/jsonMain
[2020-05-04 11:48:07.022] [exthost] [info] ExtensionService#_doActivateExtension vscode.npm {"startup":false,"extensionId":{"value":"vscode.npm","_lower":"vscode.npm"},"activationEvent":"onLanguage:json"}
[2020-05-04 11:48:07.022] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/npm/dist/main
[2020-05-04 11:48:07.031] [exthost] [info] ExtensionService#_doActivateExtension vscode.typescript-language-features {"startup":false,"extensionId":{"value":"vscode.typescript-language-features","_lower":"vscode.typescript-language-features"},"activationEvent":"onLanguage:javascript"}
[2020-05-04 11:48:07.031] [exthost] [info] ExtensionService#loadCommonJSModule file:///Applications/Visual Studio Code.app/Contents/Resources/app/extensions/typescript-language-features/dist/extension
[2020-05-04 11:48:10.602] [exthost] [info] ExtensionService#_doActivateExtension ms-vscode.Go {"startup":false,"extensionId":{"value":"ms-vscode.Go","_lower":"ms-vscode.go"},"activationEvent":"workspaceContains:**/*.go"}
[2020-05-04 11:48:10.602] [exthost] [info] ExtensionService#loadCommonJSModule file:///Users/gerred/.vscode/extensions/ms-vscode.go-0.14.1/out/src/goMain

Console output looks the same, with:

Activating extension 'betterthantomorrow.calva' failed: Cannot read property '0' of undefined.

leading up to the stack trace.

This is on macOS Catalina, VSCode 1.44.2, Calva 2.0.97.

AFrancoB

comment created time in a month

started lvh/caesium

started time in a month

created repository gerred/kubecap

A Kubernetes controller aimed at brokering CAPABILITIES between applications that have REQUIREMENTS and platforms that have FULFILLMENTS

created time in a month

started google/go-cmp

started time in a month

pull request comment kudobuilder/kudo

KEP-27: Detailed pod restart control by dependencies hash

@ANeumann82 I'll get this reviewed, sorry for the delay!

ANeumann82

comment created time in a month

issue comment kudobuilder/kudo

re-schedule sts pods after node deletion/failure

One thought that springs to mind as something we'll have to solve:

This likely needs to be able to run a plan, especially in instances where the replaced STS node is treated as a "new" node from a UUID perspective.

Something like https://github.com/kudobuilder/kudo/issues/1338?

zmalik

comment created time in a month

started alexedwards/argon2id

started time in a month

created repository gerred/mtg

Mind the Gap

created time in a month

PR closed kudobuilder/kudo

Providing Repo Include Feature (labels: release/highlight)

This provides the ability for one repo index to include another (or multiple). The code change is backward compatible... if the new field "includes:" is missing, it works fine.

The new index file might look like:

apiVersion: v1
entries:
  flink:
    - appVersion: 0.7.0
      name: flink
      operatorVersion: 0.3.0
      urls:
        - http://kudo.dev/flink
includes:
  - https://kudo-repository.storage.googleapis.com/

The "includes" field is a list of urls... for testing, file locations are also possible. It is designed and tested so that duplicates are ignored and the root / parent repo entries take precedence. The includes are recursive, so repoA could include repoB which includes repoC.

The value of this model is that today a private repo user is required to maintain all the versions of all the entries from the community repo in their private repo, which is a burden. This will allow a private repo to have one entry for their operator and reference the community repo (with all its updates and changes).

For the user searching or installing there is no perceived difference.

There are two things left to do (marked with TODO in the code):

  1. Error handling... I am hoping that we will agree that errors from includes will be ignored / logged... if a connection is down we don't want the whole operation to fail... although this creates odd edge cases.
  2. I want to pass a map to the recursive function and ignore (with log messages) duplicate entries or urls which have already been processed... this is mainly to prevent infinite loops... but it will also be more efficient.
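The loop-prevention idea in the second todo can be sketched as follows. This is a hypothetical illustration (function and field names are mine, not the PR's) showing parent-entry precedence and a visited set that breaks include cycles.

```python
def resolve_index(url, fetch, seen=None):
    """Merge a repo index with its recursive includes.

    fetch(url) returns a parsed index: a dict with "entries"
    (operator name -> versions) and an optional "includes" url list.
    Root/parent entries take precedence over included ones, and the
    'seen' set of visited urls prevents infinite include loops.
    """
    if seen is None:
        seen = set()
    if url in seen:  # already processed: skip, breaking any cycle
        return {}
    seen.add(url)
    index = fetch(url)
    merged = dict(index.get("entries", {}))
    for inc in index.get("includes", []):
        for name, versions in resolve_index(inc, fetch, seen).items():
            merged.setdefault(name, versions)  # parent wins on duplicates
    return merged

# Two indexes that include each other (a cycle) still resolve:
indexes = {
    "private": {"entries": {"flink": ["0.3.0"]}, "includes": ["community"]},
    "community": {"entries": {"flink": ["0.1.0"], "kafka": ["1.2.0"]},
                  "includes": ["private"]},
}
print(resolve_index("private", indexes.__getitem__))
# {'flink': ['0.3.0'], 'kafka': ['1.2.0']}
```

Note the private repo's `flink` entry shadows the community one, matching the precedence rule described above.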
+233 -26

9 comments

9 changed files

kensipe

pr closed time in a month

pull request comment kudobuilder/kudo

Providing Repo Include Feature

As discussed and recommended to me, I'm closing this PR and am happy to re-open it once the contributing guidelines are met. I'm really excited for this capability to land so any user can benefit from the whole KUDO library.

kensipe

comment created time in a month

pull request comment kudobuilder/kudo

Providing Repo Include Feature

Awesome, that makes sense. Any public API changes have traditionally gone through a KEP, and this is a public API change. We should still consider the alternatives here that don't necessarily include that. Right now, this flips the relationship I've come to expect from installing packages in other environments (I install "my thing" from "a repo that resolves 'that thing'" given "multiple repos"). That approach is of course fraught with errors, and I think I'm onboard, but we should still be consistent and considerate of the possible ways we can do this before committing to a change that forces breaking changes in the future (not just to behavior, but API as well). I don't think it should take much time.

kensipe

comment created time in a month

pull request comment kudobuilder/kudo

Providing Repo Include Feature

Implementation looks good, what's the motivation for this particular implementation rather than a multi-repo approach (see: Debian, etc)? One is a public API change (adding in includes - it's backwards compatible now, but not backwards compatible to new implementations), the other is an internal feature enabling multiple repositories to be used at once.

Do we have a KEP or another conversation to point toward in adding this where we've considered that alternative?

kensipe

comment created time in a month

issue comment kudobuilder/kuttl

Contribs Doc

https://github.com/kudobuilder/kuttl/blob/master/CONTRIBUTING.md - can we make this the canonical source?

kensipe

comment created time in a month

pull request comment kudobuilder/kuttl

Test Clean up

lgtm!

kensipe

comment created time in a month

issue comment aws/containers-roadmap

[ECR]: Allow for alternate mediaTypes

Shipping kudo.dev bundles as artifacts is also on our roadmap.

jdolitsky

comment created time in a month

started tailscale/tailscale

started time in 2 months

issue comment kudobuilder/kudo.dev

Create sections for KUDO concepts that have features. Also provides YAML and examples.

That's weird, we'll need to put a redirect in. Thanks @nrchakradhar - how'd you get linked to here?

https://kudo.dev/docs/what-is-kudo.html#main-concepts is the new page for that.

gerred

comment created time in 2 months

Pull request review comment kudobuilder/kudo

KEP-29: Operator dependencies

---
kep-number: 29
title: Operator Dependencies
short-desc: Introducing operators depending on other operators
authors:
  - "@zen-dog"
  - "@porridge"
owners:
  - "@zen-dog"
editor: @zen-dog
creation-date: 2020-03-30
last-updated: 2020-04-16
status: provisional
---

# Operator Dependencies

## Table of Contents

* [Summary](#summary)
* [Motivation](#motivation)
  * [Goals](#goals)
  * [Non-Goals](#non-goals)
* [Proposal](#proposal)
  * [Implementation Details](#implementation-details)
    * [Operator Task](#operator-task)
    * [Deployment](#deployment)
      * [Client-Side](#client-side)
      * [Server-Side](#server-side)
      * [Parameterization](#dependencies-parametrization)
    * [Update](#update)
    * [Upgrade](#upgrade)
    * [Uninstalling](#uninstalling)
    * [Other Plans](#other-plans)
  * [Risks and Mitigation](#risks-and-mitigation)
* [Alternatives](#alternatives)

## Summary

This KEP aims to improve operator user and developer experience by introducing operator dependencies.

## Motivation

Recent operator development has shown that complex operators often depend on other operators to function. One workaround commonly seen can be found in the [Flink Demo](https://github.com/kudobuilder/operators/tree/master/repository/flink/docs/demo/financial-fraud) where an `Instance` of Flink needs an `Instance` of Kafka, which in turn needs an `Instance` of Zookeeper. There are two parts to the workaround:
1. the operator user needs to first **manually** install the `kafka` and `zookeeper` operators while skipping the instance creation (using the `kubectl kudo install ... --skip-instance` CLI option).
2. then, the user runs a regular `kubectl kudo install flink`, whose `deploy` plan includes `Instance` resources for Kafka and Zookeeper, which are bundled along other Flink operator templates in YAML format (rather than being created on-the-fly from an operator package).

This KEP is aiming at streamlining this experience for users and developers.

### Goals

Dependencies can be a complex topic. This KEP is not trying to boil the dependency ocean but rather limits itself to installation dependencies only, i.e. a set of `Operators/Instances` being installed together and removed together as a unit.

### Non-Goals

Dependency on an already running `Instance` is a non-goal. It is easy to imagine a situation when a new operator (e.g Kafka) may want to depend on the existing Zookeeper instance. However, such life-cycle dependency presents major challenges e.g. what happens when Zookeeper is removed? What happens when Zookeeper is upgraded, and the new version is incompatible with the current Kafka `Instance`? How can we ensure the compatibility? This KEP deliberately ignores this area and instead focuses on installation dependencies. Additionally, this KEP does not address output variables or referencing `Instance` resources.

## Proposal

KUDO operators **already have** a mechanism to deal with installation dependencies called [plans, phases, and steps](https://kudo.dev/docs/developing-operators/plans.html#overview) with serial or parallel execution strategy. This mechanism is already powerful enough to express any dependency hierarchy including transitive dependencies (see more about this in [implementation details](#implementation-details)). The core of this proposal is to reuse this mechanism and extend it with the ability to install operators.

### Implementation details

#### Operator Task

A new task kind `Operator` is introduced which extends `operator.yaml` with the ability to install dependencies. Let's take a look at the Kafka+Zookeeper example:

```yaml
apiVersion: kudo.dev/v1beta1
name: "kafka"
operatorVersion: "1.3.1"
kudoVersion: 0.12.0
kubernetesVersion: 1.14.8
appVersion: 2.5.0
url: https://kafka.apache.org/
tasks:
  - name: zookeeper-operator
    kind: Operator
    spec:
        package: zookeeper

  ...

plans:
  deploy:
    strategy: serial
    phases:
      - name: deploy-zookeeper
        strategy: serial
        steps:
          - name: zookeeper
            tasks:
              - zookeeper-operator
      - name: deploy-kafka
        strategy: serial
        steps:
          ...
```

The `zookeeper-operator` task specification is equivalent to the `kudo install zookeeper` CLI command, which installs the Zookeeper package from the official repo. Here is a complete `Operator` task specification:

```yaml
tasks:
- name: demo
  kind: Operator
```

We should talk about supporting other Operator tools - but add it as a non-goal to this KEP at this time. The KOI KEP still needs to be made.

zen-dog

comment created time in 2 months

Pull request review comment kudobuilder/kudo

KEP-29: Operator dependencies

+---+kep-number: 29+title: Operator Dependencies+short-desc: Introducing operators depending on other operators+authors:+  - "@zen-dog"+  - "@porridge"+owners:+  - "@zen-dog"+editor: @zen-dog+creation-date: 2020-03-30+last-updated: 2020-04-16+status: provisional+---++# Operator Dependencies++## Table of Contents++* [Summary](#summary)+* [Motivation](#motivation)+  * [Goals](#goals)+  * [Non-Goals](#non-goals)+* [Proposal](#proposal)+  * [Implementation Details](#implementation-details)+    * [Operator Task](#operator-task)+    * [Deployment](#deployment)+      * [Client-Side](#client-side)+      * [Server-Side](#server-side)+      * [Parameterization](#dependencies-parametrization)+    * [Update](#update)+    * [Upgrade](#upgrade)+    * [Uninstalling](#uninstalling)+    * [Other Plans](#other-plans)+  * [Risks and Mitigation](#risks-and-mitigation)+* [Alternatives](#alternatives)++## Summary++This KEP aims to improve operator user and developer experience by introducing operator dependencies.++## Motivation++Recent operator development has shown that complex operators often depend on other operators to function. One workaround commonly seen can be found in the [Flink Demo](https://github.com/kudobuilder/operators/tree/master/repository/flink/docs/demo/financial-fraud) where an `Instance` of Flink needs an `Instance` of Kafka, which in turn needs an `Instance` of Zookeeper. There are two parts to the workaround:+1. the operator user needs to first **manually** install the `kafka` and `zookeeper` operators while skipping the instance creation (using the `kubectl kudo install ... --skip-instance` CLI option).+2. 
then, the user runs a regular `kubectl kudo install flink`, whose `deploy` plan includes `Instance` resources for Kafka and Zookeeper, which are bundled along other Flink operator templates in YAML format (rather than being created on-the-fly from an operator package).++This KEP is aiming at streamlining this experience for users and developers.++### Goals++Dependencies can be a complex topic. This KEP is not trying to boil the dependency ocean but rather limits itself to installation dependencies only, i.e. a set of `Operators/Instances` being installed together and removed together as a unit.++### Non-Goals++Dependency on an already running `Instance` is a non-goal. It is easy to imagine a situation when a new operator (e.g Kafka) may want to depend on the existing Zookeeper instance. However, such life-cycle dependency presents major challenges e.g. what happens when Zookeeper is removed? What happens when Zookeeper is upgraded, and the new version is incompatible with the current Kafka `Instance`? How can we ensure the compatibility? This KEP deliberately ignores this area and instead focuses on installation dependencies. Additionally, this KEP does not address output variables or referencing `Instance` resources.++## Proposal++KUDO operators **already have** a mechanism to deal with installation dependencies called [plans, phases, and steps](https://kudo.dev/docs/developing-operators/plans.html#overview) with serial or parallel execution strategy. This mechanism is already powerful enough to express any dependency hierarchy including transitive dependencies (see more about this in [implementation details](#implementation-details)). The core of this proposal is to reuse this mechanism and extend it with the ability to install operators.++### Implementation details++#### Operator Task++A new task kind `Operator` is introduced which extends `operator.yaml` with the ability to install dependencies. 
Let's take a look at the Kafka+Zookeeper example:++```yaml+apiVersion: kudo.dev/v1beta1+name: "kafka"+operatorVersion: "1.3.1"+kudoVersion: 0.12.0+kubernetesVersion: 1.14.8+appVersion: 2.5.0+url: https://kafka.apache.org/+tasks:+  - name: zookeeper-operator+    kind: Operator+    spec:+        package: zookeeper++  ...++plans:+  deploy:+    strategy: serial+    phases:+      - name: deploy-zookeeper+        strategy: serial+        steps:+          - name: zookeeper+            tasks:+              - zookeeper-operator+      - name: deploy-kafka+        strategy: serial+        steps:+          ...+```++The `zookeeper-operator` task specification is equivalent to `kudo install zookeeper` CLI command which installs the Zookeeper package from the official repo. Here is a complete `Operator` task specification:++```yaml+tasks:+- name: demo+  kind: Operator+  spec:+    package: # required, either repo package name, local package folder or an URL to package tarball+    repo: # optional, name of local repository configuration to use+    appVersion: # optional, a specific app version in the official repo, defaults to the most recent one+    operatorVersion: # optional, a specific operator version in the official repo, defaults to the most recent one+    instanceName: # optional, the instance name+```++As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because at the end the latter will be executed to install the operator. We omit `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parametrization).++#### Deployment++##### Client-Side++Upon execution of `kudo install ...` command with the above operator definition, CLI will:++1. 
Collect all operator dependencies by analyzing the `deploy` plan of the top-level operator and compose a list of all operator dependencies (tasks with the kind `Operator`) including **transitive dependencies**+2. Install all collected packages **skipping the instances** (same as `kudo install ... --skip-instance`). This step creates `Operator` and `OperatorVersion` resources. Note that since we do not create instances here we can install them in any order+3. Proceed with the installation of the top-level operator as usual (create `Operator`, `OperatorVersion` and `Instance` resources)++Since we do this step on the client-side we have access to the full functionality of the `install` command including installing operators from the file system. This will come very handy during the development and debugging which arguably becomes more complex with dependencies.++##### Server-Side++Upon receiving a new operator Instance with dependencies KUDO mangers workflow engine will:++1. Build a [dependency graph](https://en.wikipedia.org/wiki/Dependency_graph) by transitively expanding top-level `deploy` plan using operator-tasks as vertices, and their execution order (`a` needs `b` to be installed first) as edges+2. Perform cycle detection and fail if circular dependencies found. We could additionally run this check on the client-side as part of the `kudo package verify` command to improve the UX+3. If we haven't found any cycles, start executing the top-level `deploy` plan. When encountering an operator-task, apply the corresponding `Instance` resource. Here it is the same as for any other resource that we create: we check if it is healthy and if not, end current plan execution and "wait" for it to become healthy. KUDO manager already has a [health check](https://github.com/kudobuilder/kudo/blob/master/pkg/engine/health/health.go#L78) for `Instance` resources implemented.++Let's take a look at an example. 
Here is a simplified operator `AA` with a few dependencies:++```text+AA+├── BB+│   ├── EE+│   │   ├── H+│   │   └── I+│   ├── F+│   └── GG+│       ├── J+│       └── K+├── CC+│   ├── L+│   └── M+└── D++Legend:+- Operators and operator-tasks are marked with double letters e.g. 'AA' or `BB`+- other tasks are marked with single letters e.g. 'D'+- direct children of an operator are the 'deploy' plan steps e.g. for 'AA' deploy steps are 'BB', 'CC' and 'D'+```++In the first step we build a dependency graph. A set of all graph vertices (which are task-operators) `S` is defined as `S = {AA, BB, CC, EE, GG}`. A transitive relationship `R` between the vertices is defined as `(a, b) ∈ S` meaning _`a` needs `b` deployed first_. The transitive relationship for the above example is: `R = { (AA,BB), (AA,CC), (BB,EE), (BB,GG) }`. The resulting topological order `O` is therefor `O = (EE, GG, BB, CC, AA)` which has no cycles.++The instance controller (IC) then starts with the execution of the top-level `deploy` plan of the operator `AA`. The first task is the `BB` operator-task. When executing it, IC creates the `Instance-BB` resource and ends current reconciliation. Next, IC notices new `Instance-BB` resource, starts new reconciliation, and executes the  `deploy` plan of the operator `BB` which then creates `Instance-EE` resource. This way we are basically performing the depth-first search for the dependency graph, executing each vertex in the right order e.g. `EE` has to be healthy before `BB` deploy plan can continue with the next step `F`.++We would additionally add the higher-level `Instance` reference (e.g. `AA`) to the `ownerReferences` list of its direct children `Instance`s (e.g. `BB` and `CC`). This would help with determining which `Instance` belongs to which operator and additionally help us with [operator uninstalling](#uninstalling).++The status of the execution can be seen as usual as part of the `Instance.Status`. 
We could additionally forward the status of a dependency `Instance` to the top-level `Instance.Status` to simplify the overview.++Note that in the above example if e.g. `EE` and `CC` task-operators reference the same operator package they must use distinct instance names `spec.instanceName` so that two separate `Instance`s are deployed. Otherwise, the dependency graph will have a cycle.++##### Dependencies Parametrization++We want to encourage operator composition by providing a way of operator encapsulation. In other words, operator users should not be allowed to arbitrarily modify the parameters of embedded operator instances. The higher-level operator should define all parameters that its **direct** dependency  operators need. Let's demonstrate this on an example of a simple operator `AA` that has operator `BB` as a dependency. ++```yaml+AA+└── BB+```+Operator `BB` has a required and empty parameter `PASSWORD`. To provide a way for the `AA` operator user to set the password we extend the operator-task with a new field `parameterFile`:++```yaml+tasks:+- name: deploy-bb+  kind: Operator+  spec:+    parameterFile: bb-params.yaml # optional, defines the parameter that will be set on the bb-instance  +```++The contents of the `bb-params.yaml` and the top-level `AA` `params.yaml`:++```yaml+# operator-aa/templates/bb-params.yaml+# Note that I placed it under templates mostly because it also uses templating+PASSWORD: {{ .Params.BB_PASSWORD }}+```++The `PASSWORD` value is computed on the server-side when IC executes the `deploy-bb` task. The `BB_PASSWORD` parameter is defined as usual in the top-level `params.yaml` file. ++```yaml+# operator-aa/params.yaml+apiVersion: kudo.dev/v1beta1+parameters:+  - name: BB_PASSWORD+    displayName: "BB password"+    description: "password for the underlying instance of BB"+    required: true+```++This is where we see the encapsulation is action. 
Every operator that incorporates other operators has to define all necessary parameters at the top level. When installing the operator `AA`, the user then has to define the `BB_PASSWORD` as usual:

```bash
$ kubectl kudo install AA -p BB_PASSWORD=secret
```

which will create the `OperatorVersion-AA`

```yaml
# /apis/kudo.dev/v1beta1/namespaces/default/operatorversions/aa-0.1.0
spec:
  operator:
    kind: Operator
    name: dummy
  parameters:
  - name: BB_PASSWORD
    displayName: "BB password"
    description: "password for the underlying instance of BB"
    required: true
  tasks:
  - name: deploy-bb
    kind: Operator
    spec:
      parameterFile: bb-params.yaml
  templates:
    bb-params.yaml: |
      PASSWORD: {{ .Params.BB_PASSWORD }}
  plans:
    deploy:
      ...
```

and `Instance-AA` resource

```yaml
# /apis/kudo.dev/v1beta1/namespaces/default/instances/instance-aa
spec:
  parameters:
    BB_PASSWORD: secret
```

During the execution of the `deploy-bb` task, the `bb-params.yaml` is expanded the same way we expand templates during the apply-task execution. The `deploy-bb` operator-task then creates the `Instance-BB` resource and saves the expanded parameter `PASSWORD: secret` in it.

What happens if we have a more deeply nested operator-task tree, e.g.:

```text
AA
└── BB
    └── CC
        └── DD
            └── EE
```

and it is the low-level `EE` operator that needs the password? It is like dependency injection through constructor parameters: every higher-level operator has to encapsulate the password parameter, so that `AA` has the `BB_PASSWORD`, `BB` the `CC_PASSWORD` and so on.

#### Update

Updating parameters works the same way as deploying the operator. In most cases, the `deploy` plan is executed. Since all dependencies already exist, the KUDO manager will traverse the dependency graph, updating `Instance` parameters. This will then trigger a corresponding `deploy` plan on each affected `Instance`.
If the `Instance` hasn't changed, no plan will be triggered.

#### Upgrade

While an out-of-band upgrade of the individual dependency operators is possible (and practically impossible to prohibit until KUDO learns drift detection), operators, in general, should be upgraded as a whole to preserve compatibility between all dependencies. An `upgrade` plan execution is very similar to the `deploy` plan: the CLI creates new `OperatorVersion` resources for all new dependency operator versions, and the KUDO manager builds a dependency graph by traversing the `upgrade` plans of the operators and executes them in a similar fashion.

#### Uninstalling

The current `kudo uninstall` CLI command only removes instances (with the required `--instance` option) using [Background deletion propagation](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/util/kudo/kudo.go#L281). Remember that we've added the top-level `Instance` reference to the dependency operators' `ownerReferences` list during [deployment](#deployment). Now we can simply delete the top-level `Instance` and let the GC delete all the others.

#### Other Plans

It can be meaningful to allow [operator-tasks](#operator-task) outside of the `deploy`, `update` and `upgrade` plans. A `monitoring` plan might install a monitoring operator package. We could even allow installation from a local disk by performing the same client-side steps for the `monitoring` plan when it is triggered. While the foundation provided by this KEP would make it easy, this KEP focuses on installation dependencies, so we would probably forbid operator-tasks outside of `deploy`, `update` and `upgrade` in the beginning.

### Risks and Mitigation

The biggest risk is the increased complexity of the instance controller and the workflow engine.
With the above approach, we can reuse much of the code and UX we have currently: plans and phases for flow control, local operators and custom operator repositories for easier development and deployment, and the usual status reporting for debugging. The API footprint remains small as the only new API element is the [operator-task](#operator-task). Dependency graph building and traversal will require a graph library, and there are a [few](https://github.com/yourbasic/graph) [out](https://godoc.org/github.com/twmb/algoimpl/go/graph) [there](https://godoc.org/gonum.org/v1/gonum/graph), so this will help mitigate some complexity.

## Alternatives

One alternative is to use Terraform and the existing [KUDO terraform provider](https://kudo.dev/blog/blog-2020-02-07-kudo-terraform-provider-1.html#current-process) to outsource the burden of dealing with the dependency graphs. On the upside, we would avoid the additional implementation complexity in KUDO _itself_ (though the complexity of the terraform provider is not going anywhere) and get [output values](https://www.terraform.io/docs/configuration/outputs.html) and [resource referencing](https://www.terraform.io/docs/configuration/resources.html#referring-to-instances) on top. On the downside, Terraform is a heavy dependency which would completely replace the KUDO UI. It is hard to quantify the pros and cons of both approaches, so it is left up for discussion.
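Coming back to the parameter expansion described in the parametrization section: under the hood it is ordinary Go templating. A self-contained sketch of expanding a `bb-params.yaml`-style file (the `params` type and `expand` helper are illustrative, not KUDO's actual engine):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// params mirrors the data a template sees; Params holds the top-level
// Instance parameters (hypothetical values for the sketch).
type params struct {
	Params map[string]string
}

// expand renders a parameter-file template, the way the KEP describes
// bb-params.yaml being expanded server-side.
func expand(tmpl string, p params) (string, error) {
	t, err := template.New("params").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, p); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	bbParams := "PASSWORD: {{ .Params.BB_PASSWORD }}"
	out, err := expand(bbParams, params{Params: map[string]string{"BB_PASSWORD": "secret"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // PASSWORD: secret
}
```

The expanded string is what the operator-task would then store on the child `Instance` as its `PASSWORD` parameter.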

I would lean against using Terraform for this. Semantically, KUDO operators are their own "concept". The KUDO Terraform Provider is still very valuable even with a built-in dependency engine, as it allows KUDO to be a first class concept if a user is powering their IaC with Terraform.

However, there are others doing IaC with tools such as Pulumi, Cluster API, Crossplane, and others. Treating KUDO semantics at a separate level allows those semantics to be exposed as a single unit to any prevailing tool rather than tying a critical, core feature to TF specifics.

zen-dog

comment created time in 2 months

Pull request review comment kudobuilder/kudo

KEP-29: Operator dependencies

---
kep-number: 29
title: Operator Dependencies
short-desc: Introducing operators depending on other operators
authors:
  - "@zen-dog"
  - "@porridge"
owners:
  - "@zen-dog"
editor: "@zen-dog"
creation-date: 2020-03-30
last-updated: 2020-04-16
status: provisional
---

# Operator Dependencies

## Table of Contents

* [Summary](#summary)
* [Motivation](#motivation)
  * [Goals](#goals)
  * [Non-Goals](#non-goals)
* [Proposal](#proposal)
  * [Implementation Details](#implementation-details)
    * [Operator Task](#operator-task)
    * [Deployment](#deployment)
      * [Client-Side](#client-side)
      * [Server-Side](#server-side)
      * [Parameterization](#dependencies-parametrization)
    * [Update](#update)
    * [Upgrade](#upgrade)
    * [Uninstalling](#uninstalling)
    * [Other Plans](#other-plans)
  * [Risks and Mitigation](#risks-and-mitigation)
* [Alternatives](#alternatives)

## Summary

This KEP aims to improve operator user and developer experience by introducing operator dependencies.

## Motivation

Recent operator development has shown that complex operators often depend on other operators to function. One workaround commonly seen can be found in the [Flink Demo](https://github.com/kudobuilder/operators/tree/master/repository/flink/docs/demo/financial-fraud) where an `Instance` of Flink needs an `Instance` of Kafka, which in turn needs an `Instance` of Zookeeper. There are two parts to the workaround:

1. the operator user needs to first **manually** install the `kafka` and `zookeeper` operators while skipping the instance creation (using the `kubectl kudo install ... --skip-instance` CLI option).
2. then, the user runs a regular `kubectl kudo install flink`, whose `deploy` plan includes `Instance` resources for Kafka and Zookeeper, which are bundled alongside other Flink operator templates in YAML format (rather than being created on-the-fly from an operator package).

This KEP aims to streamline this experience for users and developers.

### Goals

Dependencies can be a complex topic. This KEP is not trying to boil the dependency ocean but rather limits itself to installation dependencies only, i.e. a set of `Operators/Instances` being installed together and removed together as a unit.

### Non-Goals

Dependency on an already running `Instance` is a non-goal. It is easy to imagine a situation when a new operator (e.g. Kafka) may want to depend on an existing Zookeeper instance. However, such a life-cycle dependency presents major challenges, e.g. what happens when Zookeeper is removed? What happens when Zookeeper is upgraded, and the new version is incompatible with the current Kafka `Instance`? How can we ensure compatibility? This KEP deliberately ignores this area and instead focuses on installation dependencies. Additionally, this KEP does not address output variables or referencing `Instance` resources.

## Proposal

KUDO operators **already have** a mechanism to deal with installation dependencies called [plans, phases, and steps](https://kudo.dev/docs/developing-operators/plans.html#overview) with a serial or parallel execution strategy. This mechanism is already powerful enough to express any dependency hierarchy including transitive dependencies (see more about this in [implementation details](#implementation-details)). The core of this proposal is to reuse this mechanism and extend it with the ability to install operators.

### Implementation details

#### Operator Task

A new task kind `Operator` is introduced which extends `operator.yaml` with the ability to install dependencies.
Let's take a look at the Kafka + Zookeeper example:

```yaml
apiVersion: kudo.dev/v1beta1
name: "kafka"
operatorVersion: "1.3.1"
kudoVersion: 0.12.0
kubernetesVersion: 1.14.8
appVersion: 2.5.0
url: https://kafka.apache.org/
tasks:
  - name: zookeeper-operator
    kind: Operator
    spec:
      package: zookeeper

  ...

plans:
  deploy:
    strategy: serial
    phases:
      - name: deploy-zookeeper
        strategy: serial
        steps:
          - name: zookeeper
            tasks:
              - zookeeper-operator
      - name: deploy-kafka
        strategy: serial
        steps:
          ...
```

The `zookeeper-operator` task specification is equivalent to the `kudo install zookeeper` CLI command, which installs the Zookeeper package from the official repo. Here is a complete `Operator` task specification:

```yaml
tasks:
- name: demo
  kind: Operator
  spec:
    package: # required, either a repo package name, a local package folder or a URL to a package tarball
    repo: # optional, name of the local repository configuration to use
    appVersion: # optional, a specific app version in the official repo, defaults to the most recent one
    operatorVersion: # optional, a specific operator version in the official repo, defaults to the most recent one
    instanceName: # optional, the instance name
```

As you can see, this closely mimics the `kudo install` CLI command [options](https://github.com/kudobuilder/kudo/blob/master/pkg/kudoctl/cmd/install.go#L56) because in the end the latter will be executed to install the operator. We omit the `parameters` and `parameterFile` options at this point as they are discussed in detail [below](#dependencies-parametrization).

#### Deployment

##### Client-Side

Upon execution of the `kudo install ...` command with the above operator definition, the CLI will:
1. Collect all operator dependencies (tasks with the kind `Operator`), including **transitive dependencies**, by analyzing the `deploy` plan of the top-level operator
2. Install all collected packages, **skipping the instances** (same as `kudo install ... --skip-instance`). This step creates the `Operator` and `OperatorVersion` resources. Note that since we do not create instances here, we can install them in any order
3. Proceed with the installation of the top-level operator as usual (create the `Operator`, `OperatorVersion` and `Instance` resources)

Since we do this step on the client side, we have access to the full functionality of the `install` command, including installing operators from the file system. This will come in very handy during development and debugging, which arguably become more complex with dependencies.

##### Server-Side

Upon receiving a new operator `Instance` with dependencies, the KUDO manager's workflow engine will:

1. Build a [dependency graph](https://en.wikipedia.org/wiki/Dependency_graph) by transitively expanding the top-level `deploy` plan, using operator-tasks as vertices and their execution order (`a` needs `b` to be installed first) as edges
2. Perform cycle detection and fail if circular dependencies are found. We could additionally run this check on the client side as part of the `kudo package verify` command to improve the UX
3. If we haven't found any cycles, start executing the top-level `deploy` plan. When encountering an operator-task, apply the corresponding `Instance` resource. Here it is the same as for any other resource that we create: we check if it is healthy and if not, end the current plan execution and "wait" for it to become healthy. The KUDO manager already has a [health check](https://github.com/kudobuilder/kudo/blob/master/pkg/engine/health/health.go#L78) for `Instance` resources implemented.
We could additionally forward the status of a dependency `Instance` to the top-level `Instance.Status` to simplify the overview.

Note that in the above example if e.g. `EE` and `CC` operator-tasks reference the same operator package, they must use distinct instance names (`spec.instanceName`) so that two separate `Instance`s are deployed. Otherwise, the dependency graph will have a cycle.

##### Dependencies Parametrization
##### Dependencies Parameterization
zen-dog

comment created time in 2 months

Pull request review comment kudobuilder/kudo

KEP-29: Operator dependencies


One thing that Terraform does that I really like is add a root node to that dependency graph as the "permanent" start point. This is handy so that you can have multiple "roots" (still non-cyclical!) that you can execute in parallel. Do we want to include that concept here?

zen-dog

comment created time in 2 months

issue comment cncf/sig-app-delivery

Logo Ideas

I'm with @resouer. Joey is my favorite.

AloisReitbauer

comment created time in 2 months

push event kudobuilder/generic-application-operator

Gerred Dillon

commit sha f2e750f70a8dbaa4126100d8c8625f8332d15184

Lower ingress equality check for robustness, make ENV optional, update version to 0.1.3

view details

push time in 2 months

push event kudobuilder/generic-application-operator

Gerred Dillon

commit sha 6fd6f20a4fbcdd467b88d4d0fd771256dd81562f

Make optional params optional

view details

push time in 2 months

issue commentcncf/sig-app-delivery

KUDO Sandbox

Thanks @AloisReitbauer. I've been out sick, but am back and can jump on those next steps ASAP.

gerred

comment created time in 2 months

issue comment kudobuilder/kudo

Maps are marshalled to Go representation instead of YAML when using parameter file

ahh got it. thank you!

gerred

comment created time in 2 months

issue opened kudobuilder/kudo

Move to Scratch/Distroless image

I thought we were on a `FROM scratch` image, but when I looked at our Dockerfile I was a bit surprised we weren't. No need for a distro here. We should incorporate this into the next cycle, but no need to re-build old images.

created time in 2 months

push event kudobuilder/generic-application-operator

Gerred Dillon

commit sha cd964f9c737511a20983dffb1614655e70f445e7

Update example-addon.yaml Signed-off-by: Gerred Dillon <hello@gerred.org>

view details

push time in 2 months

push event kudobuilder/generic-application-operator

Gerred Dillon

commit sha 633fde84e5986b050b7eef41903f6d1d56175438

add example addon

view details

push time in 2 months

push event kudobuilder/generic-application-operator

Gerred Dillon

commit sha d4220620145a83e51970ab33454141d4d7a91338

add cmd and args

view details

push time in 2 months

create branch kudobuilder/generic-application-operator

branch: master

created branch time in 2 months

created repository kudobuilder/generic-application-operator

Explorations in a generic operator

created time in 2 months

Pull request review comment kudobuilder/kudo

Waiting For a Plan to Finish

```diff
 func Status(options *Options, settings *env.Settings) error {
 }

 func status(kc *kudo.Client, options *Options, ns string) error {
-	tree := treeprint.New()
-	instance, err := kc.GetInstance(options.Instance, ns)
-	if err != nil {
-		return err
-	}
-	if instance == nil {
-		return fmt.Errorf("Instance %s/%s does not exist", ns, options.Instance)
-	}
+	firstPass := true
+	start := time.Now()

-	ov, err := kc.GetOperatorVersion(instance.Spec.OperatorVersion.Name, ns)
-	if err != nil {
-		return err
-	}
-	if ov == nil {
-		return fmt.Errorf("OperatorVersion %s from instance %s/%s does not exist", instance.Spec.OperatorVersion.Name, ns, options.Instance)
-	}
+	// for loop breaks if Wait==false, or when active plan completes (or when user exits process)
+	for {
```

A lot of this logic is really complex, and there are some great Go constructs for doing this nicely. I'd prefer we do this with channels and `select`, which were well-designed for this and can be a lot clearer. Check this out:

https://www.sohamkamani.com/golang/2018-06-17-golang-using-context-cancellation/

Put the logic into a goroutine that takes a channel, and then `select` over that and a `time.After(options.WaitTime * time.Second)`. The goroutine can use a `for` loop with `time.Sleep` (and then breaks), but there's also `time.Ticker` depending on what you want to do.

kensipe

comment created time in 2 months

pull request comment kudobuilder/kudo

Man's Search for Operators

This is nice. Something to consider for the future:

We should use https://blevesearch.com/ down the road, and store the search index DB alongside our operators (if it's not too large), and potentially cache it locally. Why, you ask?!

It'd be super nice to add some of the Bleve facets into this search (show me all 1.x operators for Kafka), and we can also facet on other interesting things that live in the documentation and elsewhere when we actually generate the index, but don't quite exist in the index.yaml. Tons of possibilities here to enhance this feature to the moon!

I've used Bleve a lot for this purpose in the past years - small, portable indexes that are a single file (same with BoltDB in Go) - and it's amazing what you can do with it without hosting any infrastructure.

kensipe

comment created time in 2 months

issue comment kudobuilder/kudo

Maps are marshalled to Go representation instead of YAML when using parameter file

Nevermind:

F0413 15:12:43.033417   44308 deepcopy.go:750] DeepCopy of "interface{}" is unsupported. Instead, use named interfaces with DeepCopy<named-interface> as one of the methods.

(╯°□°)╯︵ ┻━┻

gerred

comment created time in 2 months

issue comment kudobuilder/kudo

Maps are marshalled to Go representation instead of YAML when using parameter file

Awesome, this may now be fixable (whereas it wasn't before): https://github.com/kubernetes-sigs/kubebuilder/issues/528

gerred

comment created time in 2 months

issue comment kudobuilder/kudo

Maps are marshalled to Go representation instead of YAML when using parameter file

This is a pretty serious bug that will actually be pretty hard to fix, and is further complicated by client-gen not allowing us to change `Parameters` from `map[string]string` to `map[string]interface{}`.

gerred

comment created time in 2 months

issue opened kudobuilder/kudo

Maps are marshalled to Go representation instead of YAML when using parameter file

What happened:

An instance was marshalled with:

map[bam:baz foo:bar]

when installed using:

kubectl kudo install operator/ -P params-file.yaml

when the params file contains:

ENV:
  foo: "bar"
  bam: "baz"

What you expected to happen:

Instance map and array values are marshalled correctly as YAML.

How to reproduce it (as minimally and precisely as possible):

Install an operator with a map type using the -P flag

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Kudo version (use kubectl kudo version): 1.11.1
  • Operator:
  • operatorVersion:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

created time in 2 months

pull request comment kudobuilder/kudo

Replace `next-release` with `post-vx.y.z`.

this is great, thank you!

porridge

comment created time in 2 months

Pull request review comment kudobuilder/kudo

KEP-22: Add initial design on diagnostics bundles

---
kep-number: 22
title: "KEP-22: Diagnostics Bundle"
short-desc: Automatic collection of diagnostics data for KUDO operators
authors:
  - "@mpereira"
owners:
  - "@mpereira"
  - "@gerred"
  - "@zen-dog"
creation-date: 2020-01-24
last-updated: 2020-01-24
status: provisional
---

# [KEP-22: Diagnostics Bundle](https://github.com/kudobuilder/kudo/issues/1152)

## Table of contents

- [Summary](#summary)
- [Prior art, inspiration, resources](#prior-art-inspiration-resources)
  - [Diagnostics](#diagnostics)
- [Concepts](#concepts)
  - [Fault](#fault)
  - [Failure](#failure)
  - [Operator](#operator)
  - [Operator instance](#operator-instance)
  - [Application](#application)
  - [Operator developer](#operator-developer)
  - [Operator user](#operator-user)
  - [Diagnostic artifact](#diagnostic-artifact)
- [Goals](#goals)
  - [Functional](#functional)
  - [Non-functional](#non-functional)
- [Non-goals](#non-goals)
- [Requirements](#requirements)
- [Proposal](#proposal)
  - [Operator user experience](#operator-user-experience)
  - [Operator developer experience](#operator-developer-experience)
- [Resources](#resources)
- [Implementation history](#implementation-history)

## Summary

Software will malfunction. When it does, data is needed so that the problem can be diagnosed, dealt with in the short term, and fixed for the long term. This KEP is about creating programs that will automatically collect such data and store it in an easily distributable format.

These programs must be easy to use, given that they will potentially be used in times of stress where faults or failures have already occurred.
Secondarily, but+still importantly, these programs should be easily extensible so that the+collection of data related to new fault types can be quickly implemented and+released.++Applications managed by KUDO operators are very high in the stack (simplified+below):++| Layer               | Concepts                                                                                   |+| ------------------- | ------------------------------------------------------------------------------------------ |+| Application         | (Cassandra's `nodetool status`, Kafka's consumer lag, Elasticsearch's cluster state, etc.) |+| Operator instance   | (KUDO plans, KUDO tasks, etc.)                                                             |+| KUDO                | (controller-manager, k8s events, logs, objects in kudo-system, etc.)                       |+| Kubernetes workload | (Pods, controllers, services, secrets, etc.)                                               |+| Kubernetes          | (Docker, kubelet, scheduler, etcd, cloud networking/storage, Prometheus metrics, etc.)     |+| Operating system    | (Linux, networking, file system, etc.)                                                     |+| Hardware            |                                                                                            |++These layers aren't completely disjoint. This KEP will mostly focus on:++- Application+- Operator instance+- KUDO+- Kubernetes workload++## Prior art, inspiration, resources++### Diagnostics++1.  [replicatedhq/troubleshoot](https://github.com/replicatedhq/troubleshoot)++    Does preflight checks, diagnostics collection, and diagnostics analysis for+    Kubernetes applications.++2.  
[mesosphere/dcos-sdk-service-diagnostics](https://github.com/mesosphere/dcos-sdk-service-diagnostics/tree/master/python)++    Does diagnostics collection for+    [DC/OS SDK services](https://github.com/mesosphere/dcos-commons).++    Diagnostics artifacts collected:++    - Mesos-related (Mesos state)+    - SDK-related (Pod status, plans statuses, offers matching, service+      configurations)+    - Application-related (e.g., Apache Cassandra's+      [=nodetool](http://cassandra.apache.org/doc/latest/tools/nodetool/nodetool.html)=+      commands, Elasticsearch's+      [HTTP API](https://www.elastic.co/guide/en/elasticsearch/reference/current/rest-apis.html)+      responses, etc.)++3.  [dcos/dcos-diagnostics](https://github.com/dcos/dcos-diagnostics)++    Does diagnostics collection for [DC/OS](https://dcos.io/) clusters.++4.  [mesosphere/bun](https://github.com/mesosphere/bun)++    Does diagnostics analysis for archives created with `dcos/dcos-diagnostics`.++    It is also important to notice that some applications have existing tooling+    for application-level diagnostics collection, either built by the supporting+    organizations behind the applications or the community. A few examples:++    - [Elasticsearch's support-diagnostics](https://github.com/elastic/support-diagnostics)+    - [Apache Kafka's System Tools](https://cwiki.apache.org/confluence/display/KAFKA/System+Tools)++## Concepts++### Fault++One component of the system deviating from its specification.++### Failure++The system as a whole stops providing the required service to the user.++### Operator++A KUDO-based+[Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/),+e.g. 
[kudo-cassandra](https://github.com/mesosphere/kudo-cassandra-operator),+[kudo-kafka](https://github.com/mesosphere/kudo-kafka-operator).++### Operator instance++An operator instance.++### Application++Underlying software that is managed by an operator instance, e.g., Apache+Cassandra, Apache Kafka, Elasticsearch, etc.++### Operator developer++Someone who builds and maintains operators.++### Operator user++Someone who installs and maintains operator instances.++### Diagnostic artifact++A file, network response, or command output that contains information that is+potentially helpful for operator users to diagnose faults with their operator+instances, and for operator users to provide to operator developers and/or+people who support operators.++## Goals++### Functional++- Collect "Kubernetes workload"-specific diagnostics artifacts related to an+  operator instance+- Collect KUDO-specific diagnostics artifacts related to an operator instance+- Collect application-specific diagnostics artifacts related to an operator+  instance+- Bundle all diagnostics artifacts into an archive++### Non-functional++1.  Provide an **easy** experience for operator users to collect diagnostic+    artifact archives+2.  Be resilient to faults and failures. Collect as much diagnostics artifacts+    as possible and allow failed collections to be retried (idempotency) and+    incremented to (like `wget`'s `--continue` flag), in a way that collection+    is _resumable_+3.  
Incorporate standard tools that are already provided by either organizations+    or the community behind applications as much as possible++## Non-goals++- Extensive collection of Kubernetes-related diagnostics artifacts+- At least not initially: collection of metrics from monitoring services (e.g.,+  Prometheus, Statsd, etc.).+- Automatic fixing of faults+- Preflight checks+- Analysis of collected artifacts+- Extending diagnostics bundle with custom artifact collectors++## Requirements++- MUST create an archive with diagnostics artifacts related specifically to an+  operator instance+- MUST include application-related diagnostics artifacts in the archive+- MUST include instance-related diagnostics artifacts in the archive+- MUST include KUDO-related diagnostics artifacts in the archive+- MUST include "Kubernetes workload"-related diagnostics artifacts in the+  archive+- MUST accept parameters and work without interactive prompts+- SHOULD work in airgapped environments+- SHOULD report the versions of every component and tool, in the archive (e.g.,+  the version of the collector, the application version, the operator version,+  the KUDO version, the Kubernetes version, etc.)+- SHOULD follow Kubernetes' ecosystem conventions and best practices+- MUST be published as a static binary+- SHOULD make it possible to publish archive to cloud object storage (AWS S3,+  etc.)+- MUST follow SemVer++## Proposal++### Operator user experience++The output from diagnostics collection is an archive containing all+diagnostics artifacts for the provided operator instance.++```bash+kubectl kudo diagnostics collect --instance=%instance% --namespace=%namespace%+```++### Operator developer experience++To configure diagnostics globally, this KEP introduces an optional top-level+`diagnostics` key in operator.yaml.++#### Diagnostics collection++The following diagnostics will be implicitly collected without any configuration+from the operator developer:++- Logs for deployed pods related to the 
KUDO Instance+- YAML for created resources, including both spec and status, related+  to the KUDO instance+- Output of `kubectl describe` for all deployed resources related to the KUDO+  Instance+- Current plan status, if one exists, or the KUDO Instance+- Information about the KUDO instance's Operator and OperatorVersion+- Logs for the KUDO controller manager+- Describe for the KUDO controller manager resources+- RBAC resources that are applicable to the KUDO controller manager+- Current settings and version information for KUDO+- Status of last preflight check run.+- k8s events (can we filter them for resources that the instance owns?)++Operator developer experience, then, focuses on customizing diagnostics+information to gather information about the running application. The following+forms are available, subject to change over time:++- **Copy**: Copy a file out of a running pod. This is useful for non-stdout+  logs, configuration files, and other artifacts generated by an application.+  Higher level resources can also be used, which will copy the file on all pods+  selected by that resource.+- **Command**: Run a command on a running pod and copy the stdout. 
Higher level+  resources can also be used, which will run the command on all pods selected by+  that resource.+- **Task**: Run a KUDO task and copy the stdout and other arbitrary files.+- **HTTP**: Make an HTTP request from the KUDO controller manager to a named+  service and port and copy the result of the request.++While some of these are redundant (HTTP can be a command or job), the intent+is to provide a high level experience where possible so that operator developers+don't necessarily need to maintain a `curl` container as part of their+application stack.++Operator-defined diagnostics collection is defined in a new `diagnostics.bundle.resources`+key in `operator.yaml`:++```yaml+diagnostics:+  bundle:+    resources:+      - name: Zookeeper Configuration File+        key: "zookeeper-configuration"+        kind: Copy+        spec:+          path: /opt/zookeeper/server.properties+          objectRef:+            kind: StatefulSet # Runs on ALL pods in the statefulset+            name: "{{ .InstanceName }}-zookeeper"+      - name: DNS information for running pod+        key: "dns-information"+        kind: Command+        spec:+          command: # Can be string or array+            - nslookup+            - google.com+          objectRef:+            kind: Pod+            name: "{{ .InstanceName }}-zookeeper-0"+    filters:+      - name: Authentication information+        spec:+          regex: "^host: %w+$"+```++This key is **OPTIONAL**. Default diagnostics collection will happen regardless+of the `diagnostics.bundle` key's presence. Note, moving to a graph-based engine+for KUDO will make selecting of resources much easier, rather than having to+use magical strings with templates. Future iterations of this will reduce the+complexity of selecting resources to run commands and files on.++Steps in a bundle run serially. To prevent the KUDO controller manager from+crashing, the collector process runs in another pod as a job. 
**TODO**:+Bundle collection, CRD, do we take a Velero-style approach? Where are files+stored? Might be time to introduce a KUDO-specific Minio instance.++Filtering is an important part of diagnostics collection. It enables diagnostics+to be portably sent to third parties that should not have sensitive information+that logs and files can contain.++By default, KUDO filters all resources (and custom resources) of values+contained within the KUDO Instance's secrets. This is configurable with the+`diagnostics.filterSecrets` key.++There may be other fields that need to be filtered. To solve for this, KUDO+introduces the `diagnostics.bundle.filters` key in `operator.yaml`, which+contains a list of filters that files pass through before writing to disk.+Custom filters use either a regular expression or an object reference and+JSONPath to derive values to filter.++All filtered values appear as `**FILTERED**` in relevant logs and files.++### More Notes++- Do we need to introduce a notion of the collector or controller manager+  signing and/or encrypting bundles? TBD.++#### Preflight Checks++## Resources++### bundle.resources++An individual bundle resource is represented as a list inside of the+`diagnostics.bundle.resources` key. Resources ALWAYS have the following keys:++- **name**: The human-readable name of the file.+- **key**: The machine-readable name of the file. This is used for both+  references (if needed in the future) and filenames. Extension is OPTIONAL,+  but may be useful for inferring mime types.+- **kind**: The kind of bundle item.+- **spec**: The attributes of a particular kind. This is different for every+  kind.++Also, specs may include an `objectRef`. It ALWAYS has the following keys:++- **kind**: The Kubernetes Kind referenced. For example, this may be a+  Deployment, Pod, StatefulSet, or other resource.+- **name**: The name of the object. 
This is a templated field, and has the same+  template environment as operator templates.++### bundle.resources.Copy++- **path**: Absolute path inside of the referenced pods.+- **objectRef**++### bundle.resources.Command++- **command**: Command to run. May be a string or an array.+- **objectRef**++### bundle.resources.Task++- **taskRef**: Name of the task to run. **NOTE**: We MAY need a Pause and Resume+  task to be able to copy files and run commands during the running of a task.+  Otherwise, we may want to make this an arbitrary job.++### bundle.resources.HTTP++- **serviceRef**: Object containing references to a Kubernetes service. This is+  scoped to KUDO-only services.+- **serviceRef.name**: Name of the service.+- **serviceRef.port**: Name of the service port. MUST be a named port, not an+  integer value.++### bundle.filters++Filters are a list of filters. They contain the following keys:++- **name**: The human readable name of the filter.+- **regex** (optional): Regular expression, not encased in slashes, to use.

Yeah, the original intent here was that after all collection occurred, everything passed through the filters / redaction. @vemelin-epm is correct on the second point.

mpereira

comment created time in 2 months

Pull request review comment kudobuilder/kudo

KEP-22: Add initial design on diagnostics bundles

## Prior art, inspiration, resources

### Diagnostics

I like sonobuoy a lot and am open to using it. I've only used it for running conformance tests in the past. If the implementation makes more sense as a series of Sonobuoy plugins that are then just part of someone's spec, that's great. We should just balance that against adding an external dependency that becomes another service we need to manage.

mpereira

comment created time in 2 months

Pull request review comment kudobuilder/kudo

KEP-22: Add initial design on diagnostics bundles

### bundle.resources.HTTP

- **serviceRef**: Object containing references to a Kubernetes service. This is
  scoped to KUDO-only services.
- **serviceRef.name**: Name of the service.
- **serviceRef.port**: Name of the service port. MUST be a named port, not an
  integer value.

ha you're right, we definitely need a path here.

mpereira

comment created time in 2 months
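Following the thread above, a possible shape for an `HTTP` resource once a request path is added might look like the sketch below. The `path` field and the example names (`cluster-health`, `-coordinator`) are hypothetical, pending a KEP revision:

```yaml
- name: Cluster Health
  key: "cluster-health"
  kind: HTTP
  spec:
    path: /_cluster/health # hypothetical field; not yet in the KEP
    serviceRef:
      name: "{{ .InstanceName }}-coordinator" # illustrative service name
      port: http # MUST be a named port
```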

Pull request review commentkudobuilder/kudo

KEP-22: Add initial design on diagnostics bundles

---
kep-number: 22
title: KEP-22: Diagnostics Bundle
short-desc: Automatic collection of diagnostics data for KUDO operators
authors:
  - "@mpereira"
owners:
  - "@mpereira"
  - "@gerred"
  - "@zen-dog"
creation-date: 2020-01-24
last-updated: 2020-01-24
status: provisional
---

# [KEP-22: Diagnostics Bundle](https://github.com/kudobuilder/kudo/issues/1152)

## Table of contents

- [Summary](#summary)
- [Prior art, inspiration, resources](#prior-art-inspiration-resources)
  - [Diagnostics](#diagnostics)
- [Concepts](#concepts)
  - [Fault](#fault)
  - [Failure](#failure)
  - [Operator](#operator)
  - [Operator instance](#operator-instance)
  - [Application](#application)
  - [Operator developer](#operator-developer)
  - [Operator user](#operator-user)
  - [Diagnostic artifact](#diagnostic-artifact)
- [Goals](#goals)
  - [Functional](#functional)
  - [Non-functional](#non-functional)
- [Non-goals](#non-goals)
- [Requirements](#requirements)
- [Proposal](#proposal)
  - [Operator user experience](#operator-user-experience)
  - [Operator developer experience](#operator-developer-experience)
- [Resources](#resources)
- [Implementation history](#implementation-history)

## Summary

Software will malfunction. When it does, data is needed so that it can be diagnosed, dealt with in the short term, and fixed for the long term. This KEP is about creating programs that will automatically collect data and store it in an easily distributable format.

These programs must be easy to use, given that they will potentially be used in times of stress where faults or failures have already occurred. Secondarily, but still importantly, these programs should be easily extensible so that the collection of data related to new fault types can be quickly implemented and released.

Applications managed by KUDO operators are very high in the stack (simplified below):

| Layer               | Concepts                                                                                   |
| ------------------- | ------------------------------------------------------------------------------------------ |
| Application         | (Cassandra's `nodetool status`, Kafka's consumer lag, Elasticsearch's cluster state, etc.) |
| Operator instance   | (KUDO plans, KUDO tasks, etc.)                                                             |
| KUDO                | (controller-manager, k8s events, logs, objects in kudo-system, etc.)                       |
| Kubernetes workload | (Pods, controllers, services, secrets, etc.)                                               |
| Kubernetes          | (Docker, kubelet, scheduler, etcd, cloud networking/storage, Prometheus metrics, etc.)     |
| Operating system    | (Linux, networking, file system, etc.)                                                     |
| Hardware            |                                                                                            |

These layers aren't completely disjoint. This KEP will mostly focus on:

- Application
- Operator instance
- KUDO
- Kubernetes workload

## Prior art, inspiration, resources

### Diagnostics

1.  [replicatedhq/troubleshoot](https://github.com/replicatedhq/troubleshoot)

    Does preflight checks, diagnostics collection, and diagnostics analysis for Kubernetes applications.

2.  [mesosphere/dcos-sdk-service-diagnostics](https://github.com/mesosphere/dcos-sdk-service-diagnostics/tree/master/python)

    Does diagnostics collection for [DC/OS SDK services](https://github.com/mesosphere/dcos-commons).

    Diagnostics artifacts collected:

    - Mesos-related (Mesos state)
    - SDK-related (pod statuses, plan statuses, offers matching, service configurations)
    - Application-related (e.g., Apache Cassandra's [`nodetool`](http://cassandra.apache.org/doc/latest/tools/nodetool/nodetool.html) commands, Elasticsearch's [HTTP API](https://www.elastic.co/guide/en/elasticsearch/reference/current/rest-apis.html) responses, etc.)

3.  [dcos/dcos-diagnostics](https://github.com/dcos/dcos-diagnostics)

    Does diagnostics collection for [DC/OS](https://dcos.io/) clusters.

4.  [mesosphere/bun](https://github.com/mesosphere/bun)

    Does diagnostics analysis for archives created with `dcos/dcos-diagnostics`.

It is also important to notice that some applications have existing tooling for application-level diagnostics collection, built either by the supporting organizations behind the applications or by the community. A few examples:

- [Elasticsearch's support-diagnostics](https://github.com/elastic/support-diagnostics)
- [Apache Kafka's System Tools](https://cwiki.apache.org/confluence/display/KAFKA/System+Tools)

## Concepts

### Fault

One component of the system deviating from its specification.

### Failure

The system as a whole stops providing the required service to the user.

### Operator

A KUDO-based [Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), e.g. [kudo-cassandra](https://github.com/mesosphere/kudo-cassandra-operator), [kudo-kafka](https://github.com/mesosphere/kudo-kafka-operator).

### Operator instance

A deployed instance of an operator.

### Application

Underlying software that is managed by an operator instance, e.g., Apache Cassandra, Apache Kafka, Elasticsearch, etc.

### Operator developer

Someone who builds and maintains operators.

### Operator user

Someone who installs and maintains operator instances.

### Diagnostic artifact

A file, network response, or command output that contains information that is potentially helpful for operator users to diagnose faults with their operator instances, and for operator users to provide to operator developers and/or people who support operators.

## Goals

### Functional

- Collect "Kubernetes workload"-specific diagnostics artifacts related to an operator instance
- Collect KUDO-specific diagnostics artifacts related to an operator instance
- Collect application-specific diagnostics artifacts related to an operator instance
- Bundle all diagnostics artifacts into an archive

### Non-functional

1.  Provide an **easy** experience for operator users to collect diagnostic artifact archives
2.  Be resilient to faults and failures. Collect as many diagnostics artifacts as possible, and allow failed collections to be retried (idempotency) and added to incrementally (like `wget`'s `--continue` flag), so that collection is _resumable_
3.  Incorporate standard tools that are already provided by the organizations or communities behind applications as much as possible

## Non-goals

- Extensive collection of Kubernetes-related diagnostics artifacts
- At least not initially: collection of metrics from monitoring services (e.g., Prometheus, Statsd, etc.)
- Automatic fixing of faults
- Preflight checks
- Analysis of collected artifacts
- Extending the diagnostics bundle with custom artifact collectors

## Requirements

- MUST create an archive with diagnostics artifacts related specifically to an operator instance
- MUST include application-related diagnostics artifacts in the archive
- MUST include instance-related diagnostics artifacts in the archive
- MUST include KUDO-related diagnostics artifacts in the archive
- MUST include "Kubernetes workload"-related diagnostics artifacts in the archive
- MUST accept parameters and work without interactive prompts
- SHOULD work in airgapped environments
- SHOULD report the versions of every component and tool in the archive (e.g., the version of the collector, the application version, the operator version, the KUDO version, the Kubernetes version, etc.)
- SHOULD follow Kubernetes ecosystem conventions and best practices
- MUST be published as a static binary
- SHOULD make it possible to publish the archive to cloud object storage (AWS S3, etc.)
- MUST follow SemVer

## Proposal

### Operator user experience

The output from diagnostics collection is an archive containing all diagnostics artifacts for the provided operator instance.

```bash
kubectl kudo diagnostics collect --instance=%instance% --namespace=%namespace%
```

### Operator developer experience

To configure diagnostics globally, this KEP introduces an optional top-level `diagnostics` key in `operator.yaml`.

#### Diagnostics collection

The following diagnostics will be implicitly collected without any configuration from the operator developer:

- Logs for deployed pods related to the KUDO Instance
- YAML for created resources, including both spec and status, related to the KUDO Instance
- Output of `kubectl describe` for all deployed resources related to the KUDO Instance
- Current plan status, if one exists, for the KUDO Instance
- Information about the KUDO Instance's Operator and OperatorVersion
- Logs for the KUDO controller manager
- Describe output for the KUDO controller manager resources
- RBAC resources that are applicable to the KUDO controller manager
- Current settings and version information for KUDO
- Status of the last preflight check run
- k8s events (can we filter them for resources that the instance owns?)

The operator developer experience, then, focuses on customizing diagnostics collection to gather information about the running application. The following forms are available, subject to change over time:

- **Copy**: Copy a file out of a running pod. This is useful for non-stdout logs, configuration files, and other artifacts generated by an application. Higher-level resources can also be used, which will copy the file on all pods selected by that resource.
- **Command**: Run a command on a running pod and copy the stdout. Higher-level resources can also be used, which will run the command on all pods selected by that resource.
- **Task**: Run a KUDO task and copy the stdout and other arbitrary files.
- **HTTP**: Make an HTTP request from the KUDO controller manager to a named service and port and copy the result of the request.

While some of these are redundant (HTTP can be a command or job), the intent is to provide a high-level experience where possible so that operator developers don't necessarily need to maintain a `curl` container as part of their application stack.

Operator-defined diagnostics collection is defined in a new `diagnostics.bundle.resources` key in `operator.yaml`:

```yaml
diagnostics:
  bundle:
    resources:
      - name: Zookeeper Configuration File
        key: "zookeeper-configuration"
        kind: Copy
        spec:
          path: /opt/zookeeper/server.properties
          objectRef:
            kind: StatefulSet # Runs on ALL pods in the statefulset
            name: "{{ .InstanceName }}-zookeeper"
      - name: DNS information for running pod
        key: "dns-information"
        kind: Command
        spec:
          command: # Can be string or array
            - nslookup
            - google.com
          objectRef:
            kind: Pod
            name: "{{ .InstanceName }}-zookeeper-0"
    filters:
      - name: Authentication information
        spec:
          regex: '^host: \w+$'
```

This key is **OPTIONAL**. Default diagnostics collection will happen regardless of the `diagnostics.bundle` key's presence. Note that moving to a graph-based engine for KUDO will make selecting resources much easier, rather than having to use magical strings with templates. Future iterations of this will reduce the complexity of selecting the resources to run commands on and copy files from.

Steps in a bundle run serially. To prevent the KUDO controller manager from crashing, the collector process runs in another pod as a job. **TODO**: Bundle collection, CRD, do we take a Velero-style approach? Where are files stored? Might be time to introduce a KUDO-specific Minio instance.

Filtering is an important part of diagnostics collection. It enables diagnostics to be portably sent to third parties that should not have the sensitive information that logs and files can contain.

By default, KUDO filters, out of all resources (and custom resources), the values contained within the KUDO Instance's secrets. This is configurable with the `diagnostics.filterSecrets` key.

There may be other fields that need to be filtered. To solve for this, KUDO introduces the `diagnostics.bundle.filters` key in `operator.yaml`, which contains a list of filters that files pass through before being written to disk. Custom filters use either a regular expression or an object reference and JSONPath to derive values to filter.

All filtered values appear as `**FILTERED**` in relevant logs and files.

### More Notes

- Do we need to introduce a notion of the collector or controller manager signing and/or encrypting bundles? TBD.

#### Preflight Checks

## Resources

### bundle.resources

An individual bundle resource is represented as an item in the list under the `diagnostics.bundle.resources` key. Resources ALWAYS have the following keys:

- **name**: The human-readable name of the file.
- **key**: The machine-readable name of the file. This is used for both references (if needed in the future) and filenames. An extension is OPTIONAL, but may be useful for inferring MIME types.
- **kind**: The kind of bundle item.
- **spec**: The attributes of a particular kind. This is different for every kind.

Also, specs may include an `objectRef`. It ALWAYS has the following keys:

- **kind**: The Kubernetes Kind referenced. For example, this may be a Deployment, Pod, StatefulSet, or other resource.
- **name**: The name of the object. This is a templated field, and has the same template environment as operator templates.

### bundle.resources.Copy

- **path**: Absolute path inside of the referenced pods.
- **objectRef**

### bundle.resources.Command

- **command**: Command to run. May be a string or an array.
- **objectRef**

### bundle.resources.Task

- **taskRef**: Name of the task to run. **NOTE**: We MAY need a Pause and Resume task to be able to copy files and run commands during the running of a task. Otherwise, we may want to make this an arbitrary job.

### bundle.resources.HTTP

- **serviceRef**: Object containing references to a Kubernetes service. This is scoped to KUDO-only services.

We're just assuming that the only Kubernetes services we can reach were made by KUDO.

mpereira

comment created time in 2 months
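The filtering semantics described in the KEP excerpt above (regex filters plus redaction of secret values, with matches replaced by `**FILTERED**`) can be illustrated with a minimal sketch. This is not KUDO's implementation, just the described behavior expressed in Python; the function name and signature are hypothetical.

```python
import re

FILTERED = "**FILTERED**"


def apply_filters(text, regex_filters=(), secret_values=()):
    """Redact sensitive content before it is written into a bundle.

    - Each regex filter blanks out every match it finds.
    - Each literal secret value is blanked out wherever it appears.
    """
    for pattern in regex_filters:
        text = re.sub(pattern, FILTERED, text)
    for value in secret_values:
        text = text.replace(value, FILTERED)
    return text


log = "host: broker0\npassword=s3cr3t\n"
clean = apply_filters(
    log,
    regex_filters=[r"(?m)^host: \w+$"],  # like the KEP's custom regex filter
    secret_values=["s3cr3t"],            # like the default secret-value filtering
)
```

After filtering, `clean` contains `**FILTERED**` in place of both the matched host line and the secret value.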

Pull request review comment kudobuilder/kudo

KEP-22: Add initial design on diagnostics bundles

Steps in a bundle run serially. To prevent the KUDO controller manager from crashing, the collector process runs in another pod as a job.

:+1:

mpereira

comment created time in 2 months
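The KEP's resumability requirement (failed collections can be retried idempotently and added to incrementally, like `wget`'s `--continue` flag) boils down to a collector that skips artifacts it already has, so a failed run can simply be re-run. A minimal sketch, with a dict standing in for the bundle directory and `fetch` standing in for the real per-artifact collector (both hypothetical):

```python
def collect_bundle(artifacts, store, fetch):
    """Collect each named artifact into `store`, skipping ones that are
    already present so that a failed run can be resumed by re-running."""
    collected = []
    for key in artifacts:
        if key in store:  # already collected on a previous run
            continue
        store[key] = fetch(key)
        collected.append(key)
    return collected
```

Re-running `collect_bundle` with the same `store` only fetches the artifacts that are still missing, which gives both idempotency and incremental resume.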

pull request comment kudobuilder/kudo

Moving to KUTTL

I agree with using CachedDiscoveryInterface here and not in kuttl.

kensipe

comment created time in 2 months

delete branch gerred/toc

delete branch: patch-4

delete time in 2 months

pull request comment cncf/toc

Update CONTRIBUTORS.md

no worries @caniszczyk everything's crazy busy right now. :) I just wasn't sure if there was another step.

gerred

comment created time in 2 months

pull request comment cncf/toc

Update CONTRIBUTORS.md

This has sat for a few days. @caniszczyk or @amye is there anything I can do to move this forward?

gerred

comment created time in 2 months

issue comment cncf/sig-contributor-strategy

[Discovery Umbrella] Inventory and Things

@parispittman I will (it's Friday night, so not thinking of new things) own meeting with Matt and Michelle, and put together the list of dev mailing lists.

parispittman

comment created time in 2 months

started cncf/sig-contributor-strategy

started time in 3 months

pull request comment cncf/sig-contributor-strategy

sig-contributor-strategy charter

Hi! I'm also interested in stepping forward as a chair for this. Have the time, and really interested in the mission of this SIG and helped with some of the charter. :)

parispittman

comment created time in 3 months

pull request comment kudobuilder/kuttl

Support Background Processes

lgtm!

kensipe

comment created time in 3 months

issue comment kudobuilder/kudo

Create extension interface for Helm

Given some other things going on, I really don't see a use for this. I talked with @michelleN and she had an idea I really like, and I want to champion it. I want to explore actually embedding KUDO plans into Helm charts in the future, so that we can have a really strong story around progressively enhancing Helm charts.

In retrospect, I think trying to tangle together both Helm and KUDO's templating engines in the same codebase is a mistake. Yes, we have a branch for it - but we add a web of dependency there that's at the wrong level, by choosing to go for a code-level dependency rather than strive for data and building an interoperable ecosystem on that.

I get everyone wants to import a Helm chart, but let's please try to use KUDO's templating engine for KUDO-y things and make that also as good as possible.

gerred

comment created time in 3 months

issue comment kudobuilder/kudo

isNamespaced check in Enhancer

made a 0.11.1 milestone for this, feel free to add any other issues to 0.11.1

ANeumann82

comment created time in 3 months

issue comment kudobuilder/kudo

isNamespaced check in Enhancer

Assigning this to critical based on feedback. All work stops until we ship 0.11.1 with a fix for this in place.

ANeumann82

comment created time in 3 months

issue comment kudobuilder/kudo.dev

KUDO Comparison for Operator Developers

We have this now on the site.


On Mar 13, 2020, at 10:57 AM, Ken Sipe notifications@github.com wrote:

@tbaums is this something you or other SE/SA could provide some support on?


gerred

comment created time in 3 months

pull request comment kudobuilder/kuttl

Add CI using Dispatch

i like that. we might not have time to integrate that tomorrow but i'll throw CA in there

jbarrick-mesosphere

comment created time in 3 months

pull request comment kudobuilder/kuttl

Add CI using Dispatch

could we make a service account so that a dispatch job could spin those up as preemptible to keep costs low?

jbarrick-mesosphere

comment created time in 3 months

pull request comment kudobuilder/kuttl

Add CI using Dispatch

What do you need req/limit wise on a GKE node?

jbarrick-mesosphere

comment created time in 3 months

pull request comment kudobuilder/kuttl

Add CI using Dispatch

@runyontr and I are working on infrastructure tomorrow under the kudobuilder/infrastructure repo and this will be divested infrastructure for use by the KUDO project -- so yes.

jbarrick-mesosphere

comment created time in 3 months

pull request comment kudobuilder/kuttl

Add CI using Dispatch

@jbarrick-mesosphere Is this a public instance of Dispatch, of which the license is donated for use by the KUDO and kuttl project? If that is true, can you additionally add this to the README to make it clear?

jbarrick-mesosphere

comment created time in 3 months

pull request comment kubernetes-sigs/controller-runtime

✨ Add helpers to configure logger options via pflags

@hasbro17 Yeah, this ended up being a little lax.

@shawn-hurley maybe we reverse the relationship in the future as well. As an approver I'm typically looking for those squashed commits when I approve, but I didn't do the due diligence to look back and see that it had already been approved before I /lgtmed. I'll do so in the future, but we should work together to make sure the reviewer is looking at the changes in and of themselves and the approver is doing the same work, but also making sure it meets the criteria for being merged. I think it's nbd but we can always stand to improve project hygiene!

bharathi-tenneti

comment created time in 3 months

Pull request review comment cncf/toc

Adding Initial Charter for Contributor Strategy

# CNCF SIG Contributor Strategy Charter

Primary Authors: Paris Pittman, Josh Berkus

Not approved // needs a TOC vote

Reviewed and/or contributed to by:

- Matt Klein
- Matt Farina
- Carolyn Van Slyck
- April Nassi
- Matt Jarvis
- Gerred Dillon
- Ken Owens
- Cheryl Hung
- Amye Scavarda Perrin
- Ihor Dvoretskyi

## Introduction

This charter describes the operations of the CNCF Special Interest Group (SIG) Contributor Strategy. This SIG is responsible for contributor experience, sustainability, governance, and openness guidance to help CNCF community groups and projects with their own contributor strategies for a healthy project.

Our initial three stakeholders:

1. CNCF projects and their contributors/maintainers
2. End Users in the broader community and member companies
3. TOC

## Mission

Consistent with the CNCF SIG definition, the mission of CNCF SIG Contributor Strategy is to collaborate on strategies related to building, scaling, and retaining contributor communities, including (people) governance, communications, operations, and tools. To do that we will:

- Create intentional space. Form a "Maintainers Circle" (name may change) comprised of those interested in growing their projects and joining fellow maintainers in related cross-project discussions.
- Listen and Advise. Create informational and training resources including guides, tutorials, and templates of best practices, trade-offs, and strategies for building and participating in scalable contributor communities.
- Evaluate and Foster. Help the TOC with assessments and due diligence of prospective new projects by developing community graduation criteria checkpoints for rolling feedback and guidance.
- Educate and Engage. Provide guidance to end users on how to engage with contributors and vice versa.

#### In scope

The following non-exhaustive, bootstrap list of activities and deliverables is in scope for the SIG:

- Definition of a contributor. This is helpful across projects for metrics and for establishing guidelines, programs, and workflows.
- "Contributing health checks"/"community health checks" (name TBD) for project evaluations at graduation time.
- Webinars, meetings, and other events to engage with the end user community on upstream contributing trainings and engagement programs.
- Development of guidelines and documents for project governance, recruiting and retaining contributor communities, mentorship, and project maturity.
- Collection of the current state of contributor strategies and governance models via surveys, GB reps, and the Maintainers Circle (example: what is the project doing now, challenges, gaps).

#### Out of scope

- The day-to-day operations of CNCF SIGs, Kubernetes SIGs, or any community group of CNCF or its respective projects of any graduation level.
- The creation and approval of CNCF SIGs or other community groups; we will offer advice, but the responsibility lies on the TOC for those matters.
- CNCF operations and marketing initiatives such as product review/demo webinars, KubeCon event planning, branding, stickers, swag, etc.
- Licensing and legal matters
- Testing

## Roadmap

### 1) SIG formation

Role creation. Stakeholder reps recruited and identified.

### 2) Discovery

Work with CNCF SIGs, projects, and the end user community on contributor best practices, gaps/needs, current operations/programs, what hasn't worked, etc.

### 3) Establish working groups

Create them from discovery or from already-known gaps while #2 is ongoing. Examples include:

1. Maintainers Circle
2. Contributor growth and outreach: docs, diversity, recruitment, retention
   - includes modern mentoring, succession planning, and staffing contributor role strategies
3. "Community/contributor health check"
   - evaluation criteria
   - check-in/review/consulting process
4. Open governance guidelines and governance operations best practices
   - the why, how, and where your contributors make decisions
   - contributor diversity

_Possible future roadmap projects_ (if you see something here that interests you, join us and start it):

- Training: leadership, code of conduct, code reviewing, etc.
- Contributor metrics and definitions
- Automation and self-service for contributors, community GitOps

## Governance

This SIG's topic requires cross-collaboration between end users, CNCF SIGs, and CNCF projects of all graduation levels.

This SIG should be populated and governed by reps from CNCF projects that want

:+1:

parispittman

comment created time in 3 months

startedcncf/toc

started time in 3 months

pull request commentcncf/toc

Update CONTRIBUTORS.md

@caniszczyk

gerred

comment created time in 3 months

PR opened cncf/toc

Update CONTRIBUTORS.md

I'm active in the TOC, SIG App Delivery, SIG Contributor Strategy, and working to donate KUDO to the CNCF Sandbox. I look forward to working more actively on further TOC activities. Thank you!

Signed-off-by: Gerred Dillon hello@gerred.org

+1 -0

0 comment

1 changed file

pr created time in 3 months

push eventgerred/toc

Gerred Dillon

commit sha 1eebfdc883a58a64b360408d35b2e03db2c83b6f

Update CONTRIBUTORS.md I'm active in the TOC, SIG App Delivery, SIG Contributor Strategy, and working to donate KUDO to the CNCF Sandbox. I look forward to working more actively on further TOC activities. Thank you! Signed-off-by: Gerred Dillon <hello@gerred.org>

view details

push time in 3 months

push eventgerred/toc

Gerred Dillon

commit sha cee9eb754edce695117f5a4bbdc9ee6a7f6a4db4

Update CONTRIBUTORS.md I'm active in the TOC, SIG App Delivery, SIG Contributor Strategy, and working to donate KUDO to the CNCF Sandbox. I look forward to working more actively on further TOC activities. Thank you!

view details

push time in 3 months

push eventgerred/toc

Gerred Dillon

commit sha 022ef0b1256b658f78d2e1513ff84f5d44b7f3b8

Update CONTRIBUTORS.md I'm active in the TOC, SIG App Delivery, SIG Contributor Strategy, and working to donate KUDO to the CNCF Sandbox. I look forward to working more actively on further TOC activities. Thank you!

view details

push time in 3 months

startedbottlerocket-os/bottlerocket

started time in 3 months

pull request commentkudobuilder/kudo

Add test cases for apis.

@kensipe of course you don't, but it's an easy way to just add it in real quick. :)

harryge00

comment created time in 3 months

pull request commentkudobuilder/kudo

Add test cases for apis.

@harryge00 please at some point confirm you can meet the https://developercertificate.org/ requirements by posting your sign-off in this PR; otherwise we will need to re-write your work. Thank you!

harryge00

comment created time in 3 months
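For context on the sign-off being requested above: the DCO `Signed-off-by` trailer can be added with git's `-s` flag. A minimal sketch in a throwaway repo (the name, email, and commit message below are placeholders, not the actual PR's):

```shell
# Create a scratch repo so the example is self-contained.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo hello > file.txt
git add file.txt

# -s appends a DCO trailer built from the committer identity above:
#   Signed-off-by: Example Author <author@example.com>
git commit -q -s -m "Add test cases for apis"

# The trailer can also be added to an existing commit after the fact:
#   git commit --amend -s --no-edit
git log -1 --format=%B
```

Running this prints the commit message with its `Signed-off-by:` trailer, which is the line DCO checks look for.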

pull request commentkudobuilder/kudo

Add test cases for apis.

understood. squash and merge with Signed-off-by - this can be done with the Chrome extension. yay for being an admin!

harryge00

comment created time in 3 months

issue closedkudobuilder/kudo

Document Plans API

closed time in 3 months

gerred

issue commentkudobuilder/kudo

Document Plans API

@kensipe We have it now as of the docathon.

gerred

comment created time in 3 months

issue closedkudobuilder/kudo

Application Operator Guide

closed time in 3 months

gerred

issue commentkudobuilder/kudo

Application Operator Guide

I think we can close; this is pushed out to operator docs now.

gerred

comment created time in 3 months

issue commentkudobuilder/kudo

Document Instance API

@kensipe will you assign any pings to me? :) I'll look through assigned issues tomorrow

gerred

comment created time in 3 months

created repositorykudobuilder/community

KUDO community

created time in 3 months

created repositorykudobuilder/infrastructure

KUDO Infrastructure as Code

created time in 3 months

issue commentkudobuilder/kudo

Proposal: Switch to Conventional Commits

I do like that second option too! Could we combine these two things, and have that block used for our longer form release notes?

gerred

comment created time in 3 months

issue openedkudobuilder/kudo

Proposal: Switch to Conventional Commits

We've been looking at a few ideas and tools around shoring up our commits, making breaking changes more clear, and generally making our release, versioning, and documentation process more automatable.

Introducing: https://www.conventionalcommits.org/en/v1.0.0/

We can use that with tools like [commitlint](https://github.com/conventional-changelog/commitlint) and other Conventional Changelog tooling. We'll still want to write separate notes, but this could help make things a LOT more clear.

I'd like to get the core team's thoughts on this (@alenkacz, @kensipe, @zen-dog especially), and if so, I will PR updating the CONTRIBUTING process and add commitlint as a check and a hook.

created time in 3 months
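For reference, the header format Conventional Commits prescribes is `type(optional scope)(optional !): description`. A rough sketch of the shape (the regex is an illustrative approximation rather than commitlint's actual rule set, the type list is just the common default, and the example messages are hypothetical):

```shell
# Approximate check of a Conventional Commits header; real enforcement
# would come from commitlint and @commitlint/config-conventional.
is_conventional() {
  echo "$1" | grep -Eq '^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([A-Za-z0-9_-]+\))?!?: .+'
}

is_conventional "feat(cli): add repo index command" && echo "ok"
is_conventional "fix!: breaking change to plan status" && echo "ok"
is_conventional "updated some files" || echo "not conventional"
```

The optional `!` before the colon is how the spec flags a breaking change in the header, which is what would make breaking changes greppable in the log.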

Pull request review commentcncf/toc

Adding Initial Charter for Contributor Strategy

(diff context omitted: duplicate of the charter draft quoted earlier in this feed)

I just want to make sure that this SIG is focused on empowering CNCF projects, which is what I understood from the charter. Not to be kingmakers - but part of this SIG would be to help projects and maintainers on their journey (this may look different depending on the project):

  • entering the CNCF at the sandbox stage
  • moving to incubating
  • moving to graduated

and to have resources along the way. While I think anyone should be able to join, we need to keep our focus clear.

parispittman

comment created time in 3 months

Pull request review commentcncf/toc

Adding Initial Charter for Contributor Strategy

(diff context omitted: duplicate of the charter draft quoted earlier in this feed)

That said, I do agree with @ultrasaurus that leadership shouldn't be dependent on existing CNCF project leadership.

parispittman

comment created time in 3 months

Pull request review commentcncf/toc

Adding Initial Charter for Contributor Strategy

(diff context omitted: duplicate of the charter draft quoted earlier in this feed)

I think from a scope and TOC perspective, we may still want SIG Contributor Strategy to focus on CNCF projects. The TOC's mission is still oriented around those projects, and the SIG is a body of the TOC. I'd defer to the TOC on that scope, but while we shouldn't be kingmakers, we should still assist the TOC in their mission -- and SIG Contributor Strategy's part in that is working with projects that are part of the CNCF and aiding the TOC and GB in supporting them.

parispittman

comment created time in 3 months

more