jacobstr/confer 97

Configuration management with extra protein.

jacobstr/django-dynamic-formset 3

Git mirror of django-dynamic-formset: http://code.google.com/p/django-dynamic-formset (updates every hour)

derekdowling/bursa 2

Go/React Webstack for doing Bitcoin Wallet Provisioning

jacobstr/connect-mongo 2

MongoDB session store for Connect.

jacobstr/dumpling 2

Very simple PHP object dumper.

jacobstr/crusher 1

Slack bot for campsite reservations.

jacobstr/django-filter 1

A generic system for filtering Django QuerySets based on user selections

jacobstr/django-haystack 1

Modular search for Django. Currently v1.1.0-alpha.

issue comment planetlabs/draino

Missing role permissions needed in example manifest

Yep! Open to an MR!

cmagorian

comment created time in 8 days

push event jacobstr/crusher

Jacob Straszynski

commit sha b796c8587bef1823930cffaf8e633a1e0ec6193b

run 2to3 script

view details

push time in 9 days

push event jacobstr/crusher

Jacob Straszynski

commit sha f879ff003f293c6c3cc0b20993636e61bc0b4b20

update dependenicies, use pipfiles

view details

push time in 9 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 9da0d4d2acdfb0ff9b33ed8156bd5912c8da1b6a

hmac comparison changes

view details

push time in 9 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 6cd6560c3fc9a4ce60fbc9a9a3aa0d96025394e9

hmac comparison changes

view details

push time in 9 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 578cf4ceff175706af1362a64a62c9111ca3bec1

hmac comparison changes

view details

push time in 9 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 83110f6d8ca5c48f584bb4bdd8ea41ef73fd91dd

hmac comparison changes

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha a6b7cb164049660226fee109e818dc1a7a03bda8

hmac comparison changes

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha bc5ccd3e19d1750c0a2c2a38a412ad85a5cb0cdb

hmac comparison changes

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 6e1ca2a508f605660a7455c7f6220f8f10d6594c

another python3 fix

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 0884595f6fbe5d11dbeb974d53b2c62f81e348de

that's a derp

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 1f7cd87edd26aff73b76b04061fe2a9f699cecf7

cat the kustomization in ci

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 8fc5d07b2c27db80dfc0c840de64df7bf6653ea8

throwing crap at the wall to make kustomize work in ci It generates these erroneous image tags in ci: ``` The command "kustomize edit set image "koobz/crusher-worker=koobz/crusher-worker:${WORKER_IMAGE}"" exited with 0. 0.03s $ kustomize edit set image "koobz/crusher-server=koobz/crusher-server:${SERVER_IMAGE}" The command "kustomize edit set image "koobz/crusher-server=koobz/crusher-server:${SERVER_IMAGE}"" exited with 0. 0.09s $ kustomize build . | grep image image: koobz/crusher-server:koobz/crusher-server:f30bdc01 imagePullPolicy: Always image: koobz/crusher-worker:koobz/crusher-worker:f30bdc01 imagePullPolicy: Always ``` A quick reading of https://github.com/kubernetes-sigs/kustomize/blob/master/kustomize/internal/commands/edit/set/setimage.go shows no obvious way for this to happen.

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 765b0fa83beb8ae320f15221f4292df946025e89

set kustomization version

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 0caa6d1ad1f038cbb641f0d78bdaad810a489a5d

set the encoding for bytes conversion in hmac

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha f30bdc01378424e421a439228f12de7f2c36da65

fix hmac for python 3.7

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 0e37ac751ee1e544d99de44dd9032933d3bf15b9

bump kustomize again

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 34ca09cc62bb7ec4f61c19b18a8a3f845b7972e0

downgrade slack client

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 1f2b856d564954938e0b3d45a05a5f0542d4bdce

log the image in ci

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha e5bf2559deef95f17d5f95b638f194252011a55b

update kustomize

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha d09dbd439d73ba0f46904d489f1a13f6895f2d8b

fix CI even more!

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha e87fd777a5cfa1e8630736fba7496f65276d0539

notify the channel if a request fails

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha e9d1f6d63b6d17d909398ef9add4361c9410a565

fix ci autodeploy

view details

push time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha ea0c5e57ac230555383db7e012d2f8cc79870bbb

python 3.7, heartbeat checks... possibly more

view details

push time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 func UnprotectedPodFilter(annotations ...string) PodFilterFunc {  // NewPodFilters returns a FilterFunc that returns true if all of the supplied // FilterFuncs return true.+

Spurious whitespace.

dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 func main() { 		leaderElectionRetryPeriod   = app.Flag("leader-election-retry-period", "Leader election retry period.").Default(DefaultLeaderElectionRetryPeriod.String()).Duration() 		leaderElectionTokenName     = app.Flag("leader-election-token-name", "Leader election token name.").Default(kubernetes.Component).String() -		skipDrain             = app.Flag("skip-drain", "Whether to skip draining nodes after cordoning.").Default("false").Bool()-		evictDaemonSetPods    = app.Flag("evict-daemonset-pods", "Evict pods that were created by an extant DaemonSet.").Bool()-		evictStatefulSetPods  = app.Flag("evict-statefulset-pods", "Evict pods that were created by an extant StatefulSet.").Bool()

I've asked folks to preserve these flags in the past. It looks like it could be done here - we've got one in the case of --node-label already.

Given we no longer publish draino:latest, I'm slightly less concerned about breaking compatibility at the moment.

This does largely make me feel like a changelog, semantic versioning scheme, and deprecation policy for the project would be helpful.

dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 import ( 	meta "k8s.io/apimachinery/pkg/apis/meta/v1" 	"k8s.io/apimachinery/pkg/runtime" 	"k8s.io/apimachinery/pkg/runtime/schema"+	"k8s.io/client-go/dynamic"+	dynamicfake "k8s.io/client-go/dynamic/fake" 	"k8s.io/client-go/kubernetes" 	"k8s.io/client-go/kubernetes/fake"++	//"k8s.io/client-go/kubernetes/fake"

Spurious commented-out line - looks like it duplicates the import above.

dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 Keep the following in mind before deploying Draino:   See annotation `"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"` in   [cluster-autoscaler documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node) +

spurious whitespace change

dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 An example of `--node-label-expr`: (metadata.labels.region == 'us-west-1' && metadata.labels.app == 'nginx') || (metadata.labels.region == 'us-west-2' && metadata.labels.app == 'nginx') ``` +### Ignore pod controlled by ...+It is possible to prevent eviction of pods that are under control of:+- daemonset+- statefulset+- Custom Resource+- ...++or not even on control of anything. For this, use the flag `do-not-evict-pod-controlled-by`; it can be repeated. An empty value means that we block eviction on pods that are uncontrolled.+The value can be a `kind` or a `kind.group` or a `kind.version.group` to designate the owner resource type. If the `version` or/and the `group` are omitted it acts as a wildcard (any version, any group). It is case-sensitive and must match the API Resource definition.

A link to https://godoc.org/k8s.io/apimachinery/pkg/runtime/schema#ParseKindArg might be helpful here as well to provide a reference for the `kind.version.group` syntax.
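For reference, a minimal sketch of how `ParseKindArg` treats the three forms (the `StatefulSet` examples here are just illustrative, not something the PR uses):

```
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// Three segments: a fully qualified kind.version.group.
	gvk, _ := schema.ParseKindArg("StatefulSet.v1.apps")
	fmt.Printf("group=%q version=%q kind=%q\n", gvk.Group, gvk.Version, gvk.Kind)
	// group="apps" version="v1" kind="StatefulSet"

	// Two segments: only the kind.group interpretation is available, so the
	// *GroupVersionKind return value is nil (any version).
	gvk2, gk := schema.ParseKindArg("StatefulSet.apps")
	fmt.Println(gvk2 == nil, gk.Kind, gk.Group) // true StatefulSet apps

	// One segment: kind only, matching any group and version.
	_, gk = schema.ParseKindArg("StatefulSet")
	fmt.Printf("kind=%q group=%q\n", gk.Kind, gk.Group) // kind="StatefulSet" group=""
}
```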

dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 func main() { 	if !*evictLocalStoragePods { 		pf = append(pf, kubernetes.LocalStoragePodFilter) 	}-	if !*evictUnreplicatedPods {-		pf = append(pf, kubernetes.UnreplicatedPodFilter)-	}-	if !*evictDaemonSetPods {-		pf = append(pf, kubernetes.NewDaemonSetPodFilter(cs))++	apiResources, err := kubernetes.GetAPIResourcesForGVK(cs, *doNotEvictPodControlledBy)+	if err != nil {+		kingpin.FatalIfError(err, "can't get resources for controlby filtering")

Can't get resources for controlled-by filtering

dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 An example of `--node-label-expr`: (metadata.labels.region == 'us-west-1' && metadata.labels.app == 'nginx') || (metadata.labels.region == 'us-west-2' && metadata.labels.app == 'nginx') ``` +### Ignore pod controlled by ...+It is possible to prevent eviction of pods that are under control of:+- daemonset+- statefulset+- Custom Resource+- ...++or not even on control of anything. For this, use the flag `do-not-evict-pod-controlled-by`; it can be repeated. An empty value means that we block eviction on pods that are uncontrolled.
  • s/or not even on/or not even under the control of any controller
dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 An example of `--node-label-expr`: (metadata.labels.region == 'us-west-1' && metadata.labels.app == 'nginx') || (metadata.labels.region == 'us-west-2' && metadata.labels.app == 'nginx') ``` +### Ignore pod controlled by ...+It is possible to prevent eviction of pods that are under control of:+- daemonset

Can we use the CamelCase form of the resource names here (e.g. DaemonSet, StatefulSet)?

dbenque

comment created time in 10 days

Pull request review comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

 func main() { 	if !*evictLocalStoragePods { 		pf = append(pf, kubernetes.LocalStoragePodFilter) 	}-	if !*evictUnreplicatedPods {-		pf = append(pf, kubernetes.UnreplicatedPodFilter)-	}-	if !*evictDaemonSetPods {-		pf = append(pf, kubernetes.NewDaemonSetPodFilter(cs))++	apiResources, err := kubernetes.GetAPIResourcesForGVK(cs, *doNotEvictPodControlledBy)+	if err != nil {+		kingpin.FatalIfError(err, "can't get resources for controlby filtering") 	}-	if !*evictStatefulSetPods {-		pf = append(pf, kubernetes.NewStatefulSetPodFilter(cs))+	if len(apiResources) > 0 {+		for _, apiResource := range apiResources {+			if apiResource == nil {+				log.Info("Filtering pod the are uncontrolled")

Wording is a bit terse here - Maybe:

Pod filtering is unconstrained by controller
dbenque

comment created time in 10 days

push event jacobstr/crusher

Jacob Straszynski

commit sha 4c9e96f9c531e19fd8d6413ae043dd94d37546c8

make date format compliant with updated API

view details

push time in 15 days

pull request comment planetlabs/draino

filter out pod in eviction using pod ownerreference kind

@dbenque there seem to be no permissions related to view access outside of making you a collaborator - which I can't do unilaterally. Perhaps there's some workaround, e.g. setting it up for your fork?

dbenque

comment created time in 22 days

pull request comment planetlabs/draino

cordon limiter

@dbenque pardon the delay looking this over. I had a storm of merges a few weeks ago and haven't given draino the ❤️ since then. This looks like a worthy contribution and I'm happy to give it a look over and mainline it. Y'all are probably running a lot of draino forks over at DataDog :)

dbenque

comment created time in 24 days

started mbucc/shmig

started time in 25 days

push event planetlabs/draino

david.benque

commit sha 231ab8c4ad1ba421edd3652756218365c2091284

fix label key expression formating

view details

Jacob Straszynski

commit sha 450a85319ef708540617d2079688872168af881d

Merge pull request #97 from DataDog/david.benque/label-filter-fix fix label key expression formating

view details

push time in a month

PR merged planetlabs/draino

fix label key expression formating

Fix #96

+13 -2

2 comments

2 changed files

dbenque

pr closed time in a month

issue closed planetlabs/draino

Parsing of parameter `nodeLabels` broken in case of domain in the labelkey

Following PR #75 we can no longer use labels with keys that have a domain name. (cc @cpoole )

The unit tests have, for example:

		{
			name: "SingleMatchingLabel",
			obj: &core.Node{
				ObjectMeta: meta.ObjectMeta{
					Name:   nodeName,
					Labels: map[string]string{"cool": "very"},
				},
			},
			labels:       map[string]string{"cool": "very"},
			passesFilter: true,
		},

but the following would fail:

		{
			name: "SingleMatchingLabel",
			obj: &core.Node{
				ObjectMeta: meta.ObjectMeta{
					Name:   nodeName,
					Labels: map[string]string{"planetlabs.com/cool": "very"},
				},
			},
			labels:       map[string]string{"planetlabs.com/cool": "very"},
			passesFilter: true,
		},

This results in a panic when launching draino with flags that have a domain in the label key.

closed time in a month

dbenque

pull request comment grafana/grafana

Cloud Logging support

We've been hoping for something like this, actually. The stackdriver logging UI can be a bit finicky, and it would be lovely to build a dashboard with multiple parameterized stackdriver log queries that react to whatever dashboard filters have been applied.

E.g. suppose I filter by a kubernetes pod and the panels respond by showing the kubernetes events (stackdriver logging), pod logs (stackdriver logging), and pod resource utilization (in-cluster thanos) for that specific pod. In particular, constructing these queries tends to be tedious (was it jsonPayload.labels or jsonPayload.metadata.labels?), so the stackdriver experience becomes creating multiple saved searches, opening multiple tabs, substituting in values...

This rings true:

Cloud Logging is core feature of GCP observability. AWS CloudWatch datasource already support CloudWatch Logs. I think Cloud Monitoring should support Cloud Logging.

Pardon the "me too" - just wanted to lend some context, because I came in thinking this was already part of the Grafana datasource and found it was not.

mtanda

comment created time in a month

push event planetlabs/draino

Cezar Sa Espinola

commit sha 0f4b2e12f148aee91899b6537d334dd7caf76b28

Add support for uncordoning nodes

view details

Jacob Straszynski

commit sha d92f02ba8f801760b0fa2388be3dd630129d21ea

Merge pull request #84 from cezarsa/uncordon Add support for uncordoning nodes

view details

push time in a month

PR merged planetlabs/draino

Add support for uncordoning nodes

Following discussions in https://github.com/planetlabs/draino/issues/27#issuecomment-458327126 it can be useful for draino to detect that a condition that triggered a node being cordoned is no longer present.

This PR introduces the ability for draino to track which conditions triggered the cordon+drain process in an annotation named draino.planet.com/conditions. Whenever this annotation is present, draino will check whether the conditions are still present on the node; if they are not, draino will try to uncordon the node and possibly skip draining it if the drain wasn't scheduled yet.

~I'm marking this as a draft because I'm still going to write a few unit tests, but the functionality is mostly ready and I've been able to test it on a real cluster.~

One question, would the maintainers like for me to put this feature behind a flag (eg: --allow-uncordon)? I don't think it's dangerous to allow uncordoning but it can be unexpected for users upgrading draino.

Fixes #27
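A rough sketch (not the PR's actual code) of the uncordon check described above; the annotation name comes from the description, while the comma-separated value format and the helper name are assumptions for illustration:

```
package kubernetes

import (
	"strings"

	core "k8s.io/api/core/v1"
)

const conditionsAnnotation = "draino.planet.com/conditions"

// shouldUncordon returns true when none of the conditions recorded at cordon
// time are still reported as True on the node.
func shouldUncordon(n *core.Node) bool {
	recorded, ok := n.Annotations[conditionsAnnotation]
	if !ok {
		return false // draino did not cordon this node
	}
	still := map[string]bool{}
	for _, c := range n.Status.Conditions {
		if c.Status == core.ConditionTrue {
			still[string(c.Type)] = true
		}
	}
	for _, t := range strings.Split(recorded, ",") {
		if still[strings.TrimSpace(t)] {
			return false // at least one triggering condition persists
		}
	}
	return true
}
```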

+616 -218

2 comments

7 changed files

cezarsa

pr closed time in a month

issue closed planetlabs/draino

Support for uncordon?

Hi, we're trying out draino and it seems to work well when testing. Our use case looks something like this:

  • We have an asset management system that tracks the status of our bare metal assets. We can move a server manually through various states, ex. unallocated, maintenance, allocated.
  • We want to be able to place a server (kubelet) in maintenance mode, and use NPD with a custom plugin to check this status, and then use draino to cordon and drain kubelets.
  • When we move the server (kubelet) back to allocated, NPD will update the condition, but draino doesn't do anything about it.

Is the expectation that we use some other system to uncordon kubelets and make them available again to the cluster?

closed time in a month

gtorre

Pull request review comment planetlabs/draino

Add support for uncordoning nodes

 func TestCordon(t *testing.T) { 	cases := []struct { 		name      string 		node      *core.Node+		mutators  []nodeMutatorFn+		expected  *core.Node 		reactions []reactor 	}{ 		{ 			name: "CordonSchedulableNode", 			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},-			reactions: []reactor{-				reactor{-					verb:     "get",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: false},-					},-				},-				reactor{-					verb:     "update",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: true},-					},-				},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true}, 			}, 		}, 		{ 			name: "CordonUnschedulableNode",-			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},-			reactions: []reactor{-				reactor{-					verb:     "get",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: true},-					},-				},+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true}, 			}, 		}, 		{ 			name: "CordonNonExistentNode", 			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}}, 			reactions: []reactor{-				reactor{verb: "get", resource: "nodes", err: errors.New("nope")},+				{verb: "get", resource: "nodes", err: errors.New("nope")}, 			}, 		}, 		{ 			name: "ErrorCordoningSchedulableNode", 			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}}, 			reactions: []reactor{-				reactor{-					verb:     "get",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: false},-					},-				},-				reactor{verb: "update", resource: "nodes", err: errors.New("nope")},+				{verb: "update", resource: "nodes", err: errors.New("nope")},+			},+		},+		{+			name: "CordonSchedulableNodeWithMutator",+			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			mutators: []nodeMutatorFn{func(n *core.Node) {+				n.Annotations = map[string]string{"foo": "1"}+			}},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName, Annotations: map[string]string{"foo": "1"}},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+		},+		{+			name: "CordonUnschedulableNodeWithMutator",+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			mutators: []nodeMutatorFn{func(n *core.Node) {+				n.Annotations = map[string]string{"foo": "1"}+			}},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true}, 			}, 		}, 	}  	for _, tc := range cases { 		t.Run(tc.name, func(t *testing.T) {-			c := &fake.Clientset{}+			c := fake.NewSimpleClientset(tc.node) 			for _, r := range tc.reactions {-				c.AddReactor(r.verb, r.resource, r.Fn())+				c.PrependReactor(r.verb, r.resource, r.Fn()) 			}- 			d := NewAPICordonDrainer(c)-			if err := d.Cordon(tc.node); err != nil {+			if err := d.Cordon(tc.node, tc.mutators...); err != nil { 				for _, r := range tc.reactions { 					if errors.Cause(err) == r.err { 						return 					} 				} 				
t.Errorf("d.Cordon(%v): %v", tc.node.Name, err) 			}+			{+				n, err := c.CoreV1().Nodes().Get(tc.node.GetName(), meta.GetOptions{})+				if err != nil {+					t.Errorf("node.Get(%v): %v", tc.node.Name, err)+				}+				if !reflect.DeepEqual(tc.expected, n) {+					t.Errorf("node.Get(%v): want %#v, got %#v", tc.node.Name, tc.expected, n)+				}+			}+		})+	}+}++func TestUncordon(t *testing.T) {+	cases := []struct {+		name      string+		node      *core.Node+		mutators  []nodeMutatorFn+		expected  *core.Node+		reactions []reactor+	}{+		{+			name:     "UncordonSchedulableNode",+			node:     &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			expected: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+		},+		{+			name: "UncordonUnschedulableNode",+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			expected: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+		},+		{+			name: "UncordonNonExistentNode",+			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			reactions: []reactor{+				{verb: "get", resource: "nodes", err: errors.New("nope")},+			},+		},+		{+			name: "ErrorUncordoningUnschedulableNode",+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			reactions: []reactor{+				{verb: "update", resource: "nodes", err: errors.New("nope")},+			},+		},+		{+			name: "UncordonSchedulableNodeWithMutator",+			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			mutators: []nodeMutatorFn{func(n *core.Node) {+				n.Annotations = map[string]string{"foo": "1"}+			}},+			expected: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+		},+		{+			name: "UncordonUnschedulableNodeWithMutator",

...And I found it here: https://github.com/planetlabs/draino/pull/84/files#diff-45cd5412f9ec9294054951666bce4620R159

cezarsa

comment created time in a month

Pull request review comment planetlabs/draino

Add support for uncordoning nodes

 func TestCordon(t *testing.T) { 	cases := []struct { 		name      string 		node      *core.Node+		mutators  []nodeMutatorFn+		expected  *core.Node 		reactions []reactor 	}{ 		{ 			name: "CordonSchedulableNode", 			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},-			reactions: []reactor{-				reactor{-					verb:     "get",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: false},-					},-				},-				reactor{-					verb:     "update",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: true},-					},-				},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true}, 			}, 		}, 		{ 			name: "CordonUnschedulableNode",-			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},-			reactions: []reactor{-				reactor{-					verb:     "get",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: true},-					},-				},+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true}, 			}, 		}, 		{ 			name: "CordonNonExistentNode", 			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}}, 			reactions: []reactor{-				reactor{verb: "get", resource: "nodes", err: errors.New("nope")},+				{verb: "get", resource: "nodes", err: errors.New("nope")}, 			}, 		}, 		{ 			name: "ErrorCordoningSchedulableNode", 			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}}, 			reactions: []reactor{-				reactor{-					verb:     "get",-					resource: "nodes",-					ret: &core.Node{-						ObjectMeta: meta.ObjectMeta{Name: nodeName},-						Spec:       core.NodeSpec{Unschedulable: false},-					},-				},-				reactor{verb: "update", resource: "nodes", err: errors.New("nope")},+				{verb: "update", resource: "nodes", err: errors.New("nope")},+			},+		},+		{+			name: "CordonSchedulableNodeWithMutator",+			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			mutators: []nodeMutatorFn{func(n *core.Node) {+				n.Annotations = map[string]string{"foo": "1"}+			}},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName, Annotations: map[string]string{"foo": "1"}},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+		},+		{+			name: "CordonUnschedulableNodeWithMutator",+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			mutators: []nodeMutatorFn{func(n *core.Node) {+				n.Annotations = map[string]string{"foo": "1"}+			}},+			expected: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true}, 			}, 		}, 	}  	for _, tc := range cases { 		t.Run(tc.name, func(t *testing.T) {-			c := &fake.Clientset{}+			c := fake.NewSimpleClientset(tc.node) 			for _, r := range tc.reactions {-				c.AddReactor(r.verb, r.resource, r.Fn())+				c.PrependReactor(r.verb, r.resource, r.Fn()) 			}- 			d := NewAPICordonDrainer(c)-			if err := d.Cordon(tc.node); err != nil {+			if err := d.Cordon(tc.node, tc.mutators...); err != nil { 				for _, r := range tc.reactions { 					if errors.Cause(err) == r.err { 						return 					} 				} 				
t.Errorf("d.Cordon(%v): %v", tc.node.Name, err) 			}+			{+				n, err := c.CoreV1().Nodes().Get(tc.node.GetName(), meta.GetOptions{})+				if err != nil {+					t.Errorf("node.Get(%v): %v", tc.node.Name, err)+				}+				if !reflect.DeepEqual(tc.expected, n) {+					t.Errorf("node.Get(%v): want %#v, got %#v", tc.node.Name, tc.expected, n)+				}+			}+		})+	}+}++func TestUncordon(t *testing.T) {+	cases := []struct {+		name      string+		node      *core.Node+		mutators  []nodeMutatorFn+		expected  *core.Node+		reactions []reactor+	}{+		{+			name:     "UncordonSchedulableNode",+			node:     &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			expected: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+		},+		{+			name: "UncordonUnschedulableNode",+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			expected: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+		},+		{+			name: "UncordonNonExistentNode",+			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			reactions: []reactor{+				{verb: "get", resource: "nodes", err: errors.New("nope")},+			},+		},+		{+			name: "ErrorUncordoningUnschedulableNode",+			node: &core.Node{+				ObjectMeta: meta.ObjectMeta{Name: nodeName},+				Spec:       core.NodeSpec{Unschedulable: true},+			},+			reactions: []reactor{+				{verb: "update", resource: "nodes", err: errors.New("nope")},+			},+		},+		{+			name: "UncordonSchedulableNodeWithMutator",+			node: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+			mutators: []nodeMutatorFn{func(n *core.Node) {+				n.Annotations = map[string]string{"foo": "1"}+			}},+			expected: &core.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}},+		},+		{+			name: "UncordonUnschedulableNodeWithMutator",

I was looking for a test case where a node remains unschedulable even if draino has no reason to uncordon it.

cezarsa

comment created time in a month

PR merged planetlabs/draino

use apps/v1 apigroup for DaemonSets

Kubernetes stopped serving (removed) the extensions apigroup for the DaemonSet resource in the v1.16.0 release, which was almost a year ago.

DaemonSets have been available under the apps/v1 group since the v1.9.0 release, so this should work on both older clusters and on newer clusters without the extensions/v1 group.

@jacobstr

+3 -3

1 comment

3 changed files

ravilr

pr closed time in a month

push event planetlabs/draino

ravilr

commit sha d6d56ff8618d771e12fb5e00277a5dd709a26fb4

use apps/v1 apigroup for DaemonSets

view details

ravilr

commit sha ffb6cb4b1fd1f69ef44562525b7863efed19ebb6

update rbac role in helm templates

view details

Jacob Straszynski

commit sha 46799e2ba926b5b25e21bbd319027b14cf1e5925

Merge pull request #91 from ravilr/apigroup_update use apps/v1 apigroup for DaemonSets

view details

push time in a month

push event planetlabs/draino

Jacob Straszynski

commit sha 2c214f9865c2a9823203ad2e9af73c9121940493

add a blurb to README.md regarding node-label-expr

view details

push time in a month

push event planetlabs/draino

Jacob Straszynski

commit sha 2fb2e9b86b91a6f2101d0a72b797c9ce4924e311

logging and fmting nits

view details

Jacob Straszynski

commit sha a90c520b10adfa9d4e0697cbf9f4b38b17f62a8b

fix tagging derp

view details

push time in a month

PR merged planetlabs/draino

Allow node label to be a flexible expression

This PR addresses a new feature:

I do not see a way to provide a list of labels that can be OR'd together to select a node. Right now if you provide multiple labels it must match ALL of them.

This PR makes node labels into a user specifiable expression:

(node_labels['region'] == 'us-west-2' && node_labels['app'] == 'nginx') || (node_labels['region'] == 'us-west-2' && node_labels['foo'] == 'bar') || (node_labels['type'] == 'toolbox')

This is achieved through the use of https://github.com/antonmedv/expr

~This PR adds the ability to flip draino into OR mode.~

~While doing this I realized when multiple labels are supplied that have the same key but different values only the last provided is used in the app: eg. --node-label=foo=bar --node-label=foo=elephant results in the *nodeLabels map only including {foo: elephant}~

~In order to fix this I parse them as an array of independent map[string]string and then let the Filter functions determine how to handle the deduping (required in the AND case to maintain backwards compatible behavior)~

+334 -23

22 comments

9 changed files

cpoole

pr closed time in a month

push event planetlabs/draino

Connor Poole

commit sha 689c180ae86e9b74221e52952fde6ec2f25055ae

Add label expression for complex node selection

view details

Jacob Straszynski

commit sha 6d1581d746f90a3b5907b8bdd2fa2e2f6dcba656

Merge pull request #75 from cpoole/master Allow node label to be a flexible expression

view details

push time in a month

push event planetlabs/draino

Jacob Straszynski

commit sha dcabfb1937e19ea7b194a304aedebfaf5d182522

remove draino:latest builds Using `:latest` is often an anti-pattern so lets nip it now. There are a few MR's that have been in the bike-shed due to concerns of regressions. Given what `draino` does, a bug that makes it through review, testing, and lands on an unsuspecting cluster, would be painful.

view details

Jacob Straszynski

commit sha 800bc71263716abd2572fc9d9c30f0647f1d5ac4

Merge pull request #93 from planetlabs/koobz/remove-latest-builds remove draino:latest builds

view details

push time in a month

PR merged planetlabs/draino

remove draino:latest builds

Using :latest is often an anti-pattern, so let's nip it now. There are a few MRs that have been in the bike-shed due to concerns about regressions. Given what draino does, a bug that makes it through review and testing and lands on an unsuspecting cluster would be painful.

+7 -3

1 comment

3 changed files

jacobstr

pr closed time in a month

PR opened planetlabs/draino

remove draino:latest builds

Using :latest is often an anti-pattern, so let's nip it now. There are a few MRs that have been in the bike-shed due to concerns about regressions. Given what draino does, a bug that makes it through review and testing and lands on an unsuspecting cluster would be painful.

+7 -3

0 comment

3 changed files

pr created time in a month

create branch planetlabs/draino

branch : koobz/remove-latest-builds

created branch time in a month

issue opened GoogleCloudPlatform/k8s-config-connector

Output Module: emit metadata from Config Connector resources into a secret.

Describe the feature or the resource that you want.

Suppose I create a SQLInstance, and subsequently I want to set a database connection string for my newly created database as an environment variable in my application. I might do that using a secretKeyRef + secret, but I currently have to manually construct that secret by introspecting the resulting SQLInstance status field to get the IP address.

Generally, I think I might want to emit something from the status field of other resources into a secret. The dependent deployment will fail until the secret is created.

created time in 2 months

issue comment planetlabs/draino

draino and kured

Can you walk this out a bit for me? Maybe kured could set a NodeCondition for us and we could sit back and relax in the bliss of generality that NodeConditions provide :)
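For context on what that could look like, a sketch of publishing a custom NodeCondition with client-go (assuming a recent, context-aware client-go; `RebootRequired` is a hypothetical condition type, and a real implementation would update an existing condition in place rather than always appending):

```
package conditions

import (
	"context"

	core "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// setRebootRequired marks a node with a hypothetical RebootRequired condition,
// which draino could then be configured to react to.
func setRebootRequired(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, meta.GetOptions{})
	if err != nil {
		return err
	}
	now := meta.Now()
	node.Status.Conditions = append(node.Status.Conditions, core.NodeCondition{
		Type:               core.NodeConditionType("RebootRequired"),
		Status:             core.ConditionTrue,
		Reason:             "KuredRebootPending",
		Message:            "node has a pending reboot",
		LastHeartbeatTime:  now,
		LastTransitionTime: now,
	})
	_, err = cs.CoreV1().Nodes().UpdateStatus(ctx, node, meta.UpdateOptions{})
	return err
}
```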

limberger

comment created time in 2 months

push event planetlabs/draino

Jacob Straszynski

commit sha 6c0e7488bf177f75b43458b743f0e6a6f778b9f9

tune golang-ci lint for travis * Increase GC frequency: https://golangci-lint.run/usage/performance/#memory-usage * Halve the concurrency from its default of 8.

view details

Jacob Straszynski

commit sha cbce4e387ea3e01424da353a39d3e9f20b68ec0d

Merge pull request #88 from planetlabs/koobz/tune-golangci-lint tune golang-ci lint for travis

view details

push time in 2 months

PR merged planetlabs/draino

tune golang-ci lint for travis
  • Increase GC frequency: https://golangci-lint.run/usage/performance/#memory-usage
  • Halve the concurrency from its default of 8.
+1 -1

1 comment

1 changed file

jacobstr

pr closed time in 2 months

pull request comment planetlabs/draino

Allow node label to be a flexible expression

I'll merge this in a moment assuming tests pass; a subsequent rebase on your branch should hopefully resolve it.

cpoole

comment created time in 2 months

PR opened planetlabs/draino

tune golang-ci lint for travis
  • Increase GC frequency: https://golangci-lint.run/usage/performance/#memory-usage
  • Halve the concurrency from its default of 8.
+1 -1

0 comment

1 changed file

pr created time in 2 months

create branch planetlabs/draino

branch : koobz/tune-golangci-lint

created branch time in 2 months

push event planetlabs/draino

david.benque

commit sha 97e808354ab9b489e2d0611c065e617313a5b481

leaderElection token name as parameter

view details

Jacob Straszynski

commit sha fca82190f042103582e51347aeb8e0905c63bceb

Merge pull request #87 from DataDog/david.benque/leader-election-token-name-as-param leaderElection token name as parameter

view details

push time in 2 months

PR merged planetlabs/draino

leaderElection token name as parameter

In some cases we may have to run multiple draino instances because we need different configurations (see #62). If the different instances run in the same namespace they may conflict on the leader election token.

This PR allows specifying a name for the leaderElection lock resource via a program parameter.

+3 -1

0 comment

2 changed files

dbenque

pr closed time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 and limitations under the License. package kubernetes  import (+	"fmt"+	"log" 	"strings" 	"time"  	core "k8s.io/api/core/v1" 	"k8s.io/apimachinery/pkg/types"++	"github.com/antonmedv/expr" ) -// NewNodeLabelFilter returns a filter that returns true if the supplied object-// is a node with all of the supplied labels.-func NewNodeLabelFilter(labels map[string]string) func(o interface{}) bool {+// NewNodeLabelFilter returns a filter that returns true if the supplied node satisfies the boolean expression+func NewNodeLabelFilter(expressionStr *string) (func(o interface{}) bool, error) {+ 	return func(o interface{}) bool {+		//This feels wrong but this is how the previous behavior worked so I'm only keeping it to maintain compatibility.+		if *expressionStr == "" {+			return true+		}+ 		n, ok := o.(*core.Node) 		if !ok { 			return false 		}-		for k, v := range labels {-			if value, ok := n.GetLabels()[k]; value != v || !ok {-				return false-			}++		nodeLabels := n.GetLabels()++		parameters := map[string]interface{}{+			"metadata": map[string]map[string]string{+				"labels": nodeLabels,+			}, 		}-		return true-	}++		expression, err := expr.Compile(*expressionStr, expr.Env(parameters))+		if err != nil {+			log.Fatalf("Could not compile the node label expression: %v", err)

Re: the last point, saw you have a test for that.

cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 and limitations under the License. package kubernetes  import (+	"fmt"+	"log" 	"strings" 	"time"  	core "k8s.io/api/core/v1" 	"k8s.io/apimachinery/pkg/types"++	"github.com/antonmedv/expr" ) -// NewNodeLabelFilter returns a filter that returns true if the supplied object-// is a node with all of the supplied labels.-func NewNodeLabelFilter(labels map[string]string) func(o interface{}) bool {+// NewNodeLabelFilter returns a filter that returns true if the supplied node satisfies the boolean expression+func NewNodeLabelFilter(expressionStr *string) (func(o interface{}) bool, error) {+ 	return func(o interface{}) bool {+		//This feels wrong but this is how the previous behavior worked so I'm only keeping it to maintain compatibility.+		if *expressionStr == "" {+			return true+		}+ 		n, ok := o.(*core.Node) 		if !ok { 			return false 		}-		for k, v := range labels {-			if value, ok := n.GetLabels()[k]; value != v || !ok {-				return false-			}++		nodeLabels := n.GetLabels()++		parameters := map[string]interface{}{+			"metadata": map[string]map[string]string{+				"labels": nodeLabels,+			}, 		}-		return true-	}++		expression, err := expr.Compile(*expressionStr, expr.Env(parameters))+		if err != nil {+			log.Fatalf("Could not compile the node label expression: %v", err)
  • Looks like we could compile the program with no env outside of the callback function body and fail early there.
  • Saw your test case with a field that does not exist.
  • This may be useful to ensure the Expr provided evaluates to a boolean: https://godoc.org/github.com/antonmedv/expr#AsBool
cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 import ( 	"testing" 	"time" +	"gotest.tools/assert" 	core "k8s.io/api/core/v1" 	meta "k8s.io/apimachinery/pkg/apis/meta/v1" )  func TestNodeLabelFilter(t *testing.T) {

Would prefer the original test cases intact since we still support the old syntax.

cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 and limitations under the License. package kubernetes  import (+	"fmt"+	"log" 	"strings" 	"time"  	core "k8s.io/api/core/v1" 	"k8s.io/apimachinery/pkg/types"++	"github.com/antonmedv/expr" ) -// NewNodeLabelFilter returns a filter that returns true if the supplied object-// is a node with all of the supplied labels.-func NewNodeLabelFilter(labels map[string]string) func(o interface{}) bool {+// NewNodeLabelFilter returns a filter that returns true if the supplied node satisfies the boolean expression+func NewNodeLabelFilter(expressionStr *string) (func(o interface{}) bool, error) {+ 	return func(o interface{}) bool {+		//This feels wrong but this is how the previous behavior worked so I'm only keeping it to maintain compatibility.+		if *expressionStr == "" {+			return true+		}+ 		n, ok := o.(*core.Node) 		if !ok { 			return false 		}-		for k, v := range labels {-			if value, ok := n.GetLabels()[k]; value != v || !ok {-				return false-			}++		nodeLabels := n.GetLabels()++		parameters := map[string]interface{}{+			"metadata": map[string]map[string]string{+				"labels": nodeLabels,+			}, 		}-		return true-	}++		expression, err := expr.Compile(*expressionStr, expr.Env(parameters))+		if err != nil {+			log.Fatalf("Could not compile the node label expression: %v", err)

log.Fatalf here is a bit unfortunate - it'll cause the program to exit but it's kind of buried in a function that shouldn't have this kind of side effect.

Can we error up front if the expression is invalid? It looks like we don't get the parameters until we've obtained a particular node.

Curious what happens if the metadata.labels.<key> does not exist for a specific expression as well.
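To make the suggestion concrete, a sketch (not the PR's code) of compiling once up front so an invalid expression fails at startup; it assumes the `metadata.labels` environment shape discussed elsewhere in this review, and `expr.AllowUndefinedVariables()` is one way to tolerate keys that aren't present:

```
package kubernetes

import (
	"fmt"

	"github.com/antonmedv/expr"
	"github.com/antonmedv/expr/vm"
	core "k8s.io/api/core/v1"
)

// NewNodeLabelFilterCompiled compiles the expression a single time; the
// returned closure only runs the precompiled program per node.
func NewNodeLabelFilterCompiled(expression string) (func(o interface{}) bool, error) {
	var program *vm.Program
	if expression != "" {
		var err error
		program, err = expr.Compile(expression, expr.AsBool(), expr.AllowUndefinedVariables())
		if err != nil {
			return nil, fmt.Errorf("could not compile node label expression: %w", err)
		}
	}
	return func(o interface{}) bool {
		if program == nil {
			return true // empty expression keeps the old "match everything" behavior
		}
		n, ok := o.(*core.Node)
		if !ok {
			return false
		}
		env := map[string]interface{}{
			"metadata": map[string]interface{}{"labels": n.GetLabels()},
		}
		out, err := expr.Run(program, env)
		if err != nil {
			return false // e.g. a missing key used in a way that errors at runtime
		}
		b, ok := out.(bool)
		return ok && b
	}, nil
}
```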

cpoole

comment created time in 2 months

pull request comment planetlabs/draino

Allow node label to be a flexible expression

@cpoole I'm going to check in around 9:30-3:00 PST tomorrow to drive this to a conclusion. Happy to get this merged vs continuing to linger in the bike shed.

cpoole

comment created time in 2 months

push event planetlabs/draino

Cezar Sa Espinola

commit sha 8a2492000a547a9182754e5b53e62bbb3c110c76

Add test case to expose race conditions in drain scheduler The new test helps exposing a current race condition while also helping preventing similar races in calls `HasSchedule` interacting with in-progress draining operations. The race condition also happens during normal draino operation because the event handler may call `HasSchedule` multiple times for an already scheduled node. Data race sample: ``` $ go test -race Now: 2020-08-20T16:22:10-03:00 ================== WARNING: DATA RACE Write at 0x00c0000c8098 by goroutine 16: github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).newSchedule.func1() /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:125 +0x235 Previous read at 0x00c0000c8098 by goroutine 12: github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).HasSchedule() /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:59 +0x142 github.com/planetlabs/draino/internal/kubernetes.TestDrainSchedules_HasSchedule_Polling() /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule_test.go:108 +0x54a testing.tRunner() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb Goroutine 16 (running) created at: time.goFunc() /usr/local/Cellar/go/1.14.5/libexec/src/time/sleep.go:168 +0x51 Goroutine 12 (running) created at: testing.(*T).Run() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1042 +0x660 testing.runTests.func1() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1284 +0xa6 testing.tRunner() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb testing.runTests() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1282 +0x527 testing.(*M).Run() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1199 +0x2ff main.main() _testmain.go:66 +0x223 ================== ================== WARNING: DATA RACE Write at 0x00c0000c8099 by goroutine 16: github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).newSchedule.func1() /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:131 +0x65b Previous read at 0x00c0000c8099 by goroutine 12: github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).HasSchedule() /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:59 +0x15f github.com/planetlabs/draino/internal/kubernetes.TestDrainSchedules_HasSchedule_Polling() /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule_test.go:108 +0x54a testing.tRunner() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb Goroutine 16 (running) created at: time.goFunc() /usr/local/Cellar/go/1.14.5/libexec/src/time/sleep.go:168 +0x51 Goroutine 12 (running) created at: testing.(*T).Run() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1042 +0x660 testing.runTests.func1() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1284 +0xa6 testing.tRunner() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb testing.runTests() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1282 +0x527 testing.(*M).Run() /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1199 +0x2ff main.main() _testmain.go:66 +0x223 ================== --- FAIL: TestDrainSchedules_HasSchedule_Polling (11.04s) testing.go:906: race detected during execution of test FAIL exit status 1 FAIL github.com/planetlabs/draino/internal/kubernetes 13.829s ```

view details

Cezar Sa Espinola

commit sha 49918a3b24e4feb73107fbe2dd7ec615f3408f00

Fix data race in drain scheduler This commit fixes a data race when calling HasSchedule. This happens because the `time.AfterFunc` function is called by a new goroutine and it updates fields which may be read in parallel by `HasSchedule`. The `running` return value from `HasSchedule` was removed to simplify the code as it wasn't used anywhere.

view details

Jacob Straszynski

commit sha ead5f1c5ec5142e84a2b7d93bece3736e60fbd58

Merge pull request #86 from cezarsa/fixrace Fix data race in drain scheduler

view details

push time in 2 months

PR merged planetlabs/draino

Fix data race in drain scheduler

This commit fixes a data race when calling HasSchedule. This happens because the time.AfterFunc function is called by a new goroutine and it updates fields which may be read in parallel by HasSchedule.

A new test was added to more easily trigger the data race condition however it also happens during normal draino operation because the event handler may call HasSchedule multiple times for an already scheduled node.

The running return value from HasSchedule was removed to simplify the code as it wasn't used anywhere.
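A minimal sketch of the shape of the fix (field and function names are illustrative, not the actual drainSchedule.go code): the timer callback runs on its own goroutine, so the failure flag it sets has to be synchronized with readers such as HasSchedule, e.g. via sync/atomic:

```
package kubernetes

import (
	"sync/atomic"
	"time"
)

type schedule struct {
	failed int32 // written by the timer goroutine, read by HasSchedule
}

func (s *schedule) setFailed()      { atomic.StoreInt32(&s.failed, 1) }
func (s *schedule) hasFailed() bool { return atomic.LoadInt32(&s.failed) == 1 }

// newSchedule arms the drain timer; the callback runs on a separate goroutine
// started by the runtime, which is why plain field writes raced with readers.
func newSchedule(delay time.Duration, drain func() error) *schedule {
	s := &schedule{}
	time.AfterFunc(delay, func() {
		if err := drain(); err != nil {
			s.setFailed()
		}
	})
	return s
}
```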

Data race sample:

$ go test -race
Now: 2020-08-20T11:00:04-03:00
==================
WARNING: DATA RACE
Write at 0x00c00026c098 by goroutine 15:
  github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).newSchedule.func1()
      /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:125 +0x235

Previous read at 0x00c00026c098 by goroutine 12:
  github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).HasSchedule()
      /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:59 +0x142
  github.com/planetlabs/draino/internal/kubernetes.TestDrainSchedules_Schedule_Polling()
      /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule_test.go:106 +0x54a
  testing.tRunner()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb

Goroutine 15 (running) created at:
  time.goFunc()
      /usr/local/Cellar/go/1.14.5/libexec/src/time/sleep.go:168 +0x51

Goroutine 12 (running) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1042 +0x660
  testing.runTests.func1()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1284 +0xa6
  testing.tRunner()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb
  testing.runTests()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1282 +0x527
  testing.(*M).Run()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1199 +0x2ff
  main.main()
      _testmain.go:66 +0x223
==================
==================
WARNING: DATA RACE
Write at 0x00c00026c099 by goroutine 15:
  github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).newSchedule.func1()
      /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:131 +0x65b

Previous read at 0x00c00026c099 by goroutine 12:
  github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).HasSchedule()
      /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule.go:59 +0x15f
  github.com/planetlabs/draino/internal/kubernetes.TestDrainSchedules_Schedule_Polling()
      /Users/cezarsa/code/draino/internal/kubernetes/drainSchedule_test.go:106 +0x54a
  testing.tRunner()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb

Goroutine 15 (running) created at:
  time.goFunc()
      /usr/local/Cellar/go/1.14.5/libexec/src/time/sleep.go:168 +0x51

Goroutine 12 (running) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1042 +0x660
  testing.runTests.func1()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1284 +0xa6
  testing.tRunner()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:991 +0x1eb
  testing.runTests()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1282 +0x527
  testing.(*M).Run()
      /usr/local/Cellar/go/1.14.5/libexec/src/testing/testing.go:1199 +0x2ff
  main.main()
      _testmain.go:66 +0x223
==================
--- FAIL: TestDrainSchedules_Schedule_Polling (11.04s)
    testing.go:906: race detected during execution of test
FAIL
exit status 1
FAIL	github.com/planetlabs/draino/internal/kubernetes	13.921s
+66 -17

4 comments

3 changed files

cezarsa

pr closed time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 func main() { 	}  	cf := cache.FilteringResourceEventHandler{FilterFunc: kubernetes.NewNodeConditionFilter(*conditions), Handler: h}-	lf := cache.FilteringResourceEventHandler{FilterFunc: kubernetes.NewNodeLabelFilter(*nodeLabels), Handler: cf}-	nodes := kubernetes.NewNodeWatch(cs, lf)++	var nodeLabelFilter cache.ResourceEventHandler++	log.Sugar().Infof("label expression: %v", nodeLabelsExpr)+	nodeLabelFilterFunc, err := kubernetes.NewNodeLabelFilter(nodeLabelsExpr)

It feels like we could retain nodeLabels by e.g. using the expression builder to convert the old-style syntax to the new. I hacked together a little demo: https://play.golang.org/p/rgS2rPm1SEP
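In that spirit, a small sketch (names illustrative) of converting the legacy --node-label KEY=VALUE pairs into an equivalent expression, so both flags could share the compiled-expression path; note it does not escape quotes in keys or values:

```
package kubernetes

import (
	"fmt"
	"sort"
	"strings"
)

// labelsToExpression ANDs the labels together, matching the old semantics
// where a node had to carry every supplied label.
func labelsToExpression(labels map[string]string) string {
	if len(labels) == 0 {
		return ""
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output for logging and tests
	terms := make([]string, 0, len(keys))
	for _, k := range keys {
		terms = append(terms, fmt.Sprintf("metadata.labels['%s'] == '%s'", k, labels[k]))
	}
	return strings.Join(terms, " && ")
}
```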

cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 and limitations under the License. package kubernetes  import (+	"log" 	"strings" 	"time"  	core "k8s.io/api/core/v1" 	"k8s.io/apimachinery/pkg/types"++	"github.com/antonmedv/expr" ) -// NewNodeLabelFilter returns a filter that returns true if the supplied object-// is a node with all of the supplied labels.-func NewNodeLabelFilter(labels map[string]string) func(o interface{}) bool {+// NewNodeLabelFilter returns a filter that returns true if the supplied node satisfies the boolean expression+func NewNodeLabelFilter(expressionStr *string) (func(o interface{}) bool, error) {+ 	return func(o interface{}) bool {+		//This feels wrong but this is how the previous behavior worked so I'm only keeping it to maintain compatibility.+		if *expressionStr == "" {+			return true+		}+ 		n, ok := o.(*core.Node) 		if !ok { 			return false 		}-		for k, v := range labels {-			if value, ok := n.GetLabels()[k]; value != v || !ok {-				return false-			}++		nodeLabels := n.GetLabels()++		parameters := make(map[string]interface{}, 8)++		parameters["node_labels"] = nodeLabels

I could see someone coming along with a use-case where anything in metadata is fair game. Heck, why not any part of the spec? :trollface: So I don't think that needs to be implemented up front, but to keep it forwards compatible, would it be possible to have:

parameters["metadata"]["labels"] = nodeLabels

I threw this up in a go playground: https://play.golang.org/p/jrrQfyOH5P7

The goal would be that as use-cases come up for okaylisting parts of the node spec, there'd be an intuitive mapping between the shape of kubectl get node <node> -o yaml and our syntax.
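A tiny sketch of that forward-compatible environment shape; exposing annotations is a hypothetical future addition, not something the PR does:

```
package kubernetes

import core "k8s.io/api/core/v1"

// expressionEnv mirrors the shape of `kubectl get node <node> -o yaml`, so new
// fields can be exposed later without changing the expression syntax.
func expressionEnv(n *core.Node) map[string]interface{} {
	return map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels":      n.GetLabels(),
			"annotations": n.GetAnnotations(), // hypothetical future addition
		},
	}
}
```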

cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 Flags:       --max-grace-period=8m0s    Maximum time evicted pods will be given to terminate gracefully.       --eviction-headroom=30s    Additional time to wait after a pod's termination grace period for it to have been deleted.       --drain-buffer=10m0s       Minimum time between starting each drain. Nodes are always cordoned immediately.-      --node-label=KEY=VALUE ...

Can we retain this (and relevant implementation) and update the help text to (Deprecated) Only nodes with this label...

cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 func (d *DrainSchedules) newSchedule(node *v1.Node, when time.Time) *schedule { 			SetConditionRetryPeriod, 			SetConditionTimeout, 		); err != nil {-			log.Error("Failed to place condition following drain success")+			d.eventRecorder.Eventf(nr, core.EventTypeWarning, eventReasonDrainFailed, "Failed to place drain condition: %v", err)+			log.Sugar().Errorf("Faile to place condition following drain success : %v", err)

Generally prefer the structured logging to the sugared logger.

cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

+#!/bin/bash+./draino --kubeconfig ~/.kube/config --node-label=node-role=default --node-label=node-role=default-core --node-label=node-role=default-compute --node-label=node-role=default-memory --node-label-logic=OR --evict-unreplicated-pods --evict-emptydir-pods --evict-daemonset-pods AMIProblem KernelDeadlock ReadonlyFilesystem OutOfDisk 

Appears outdated now.

cpoole

comment created time in 2 months

Pull request review comment planetlabs/draino

Allow node label to be a flexible expression

 func (d *DrainSchedules) newSchedule(node *v1.Node, when time.Time) *schedule { 			SetConditionRetryPeriod, 			SetConditionTimeout, 		); err != nil {-			log.Error("Failed to place condition following drain success")+			d.eventRecorder.Eventf(nr, core.EventTypeWarning, eventReasonDrainFailed, "Failed to place drain condition: %v", err)+			log.Sugar().Errorf("Faile to place condition following drain success : %v", err)

s/Faile/Failed

cpoole

comment created time in 2 months

pull request comment planetlabs/draino

Allow node label to be a flexible expression

I hadn't looked at the push script before and was hoping that we didn't publish the latest tag - might have made me bolder here. I suppose the land shifting and breaking beneath your feet is to be expected for folks relying on latest. Unfortunately, I don't have a mailing list of draino users to even attempt to provide the courtesy of a heads up here.

I'll find some time to review things. I suspect we might have some way to put this new syntax behind a new command line flag and have the old-style flag "converted" to your expression compiler to reduce duplication. That'd let us avoid any migration mess.

cpoole

comment created time in 2 months

pull request comment planetlabs/draino

Fix data race in drain scheduler

To confirm: the data race you saw occurred after adding the test case, but before making the material changes to the implementation of setFailed?

cezarsa

comment created time in 2 months

Pull request review comment planetlabs/draino

Fix data race in drain scheduler

 func TestDrainSchedules_Schedule(t *testing.T) { 			// Deleting schedule 			scheduler.DeleteSchedule(tt.node.Name) 			// Check that node is no more scheduled for drain-			hasSchedule, _, _ = scheduler.HasSchedule(tt.node.Name)+			hasSchedule, _ = scheduler.HasSchedule(tt.node.Name) 			if hasSchedule { 				t.Errorf("Node %v should not been scheduled anymore", tt.node.Name) 			} 		}) 	} }++type failDrainer struct {+	NoopCordonDrainer+}++func (d *failDrainer) Drain(n *v1.Node) error { return errors.New("myerr") }++func TestDrainSchedules_Schedule_Polling(t *testing.T) {+	scheduler := NewDrainSchedules(&failDrainer{}, &record.FakeRecorder{}, 0, zap.NewNop())+	node := &v1.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}}++	when, err := scheduler.Schedule(node)+	if err != nil {+		t.Fatalf("DrainSchedules.Schedule() error = %v", err)+	}++	timeout := time.After(time.Until(when) + time.Minute)+	for {+		hasSchedule, failed := scheduler.HasSchedule(node.Name)+		if !hasSchedule {+			t.Fatalf("Missing schedule record for node %v", node.Name)+		}+		if failed {

I'd suggest just adding a brief comment before the break statement.

cezarsa

comment created time in 2 months

Pull request review comment planetlabs/draino

Fix data race in drain scheduler

 func TestDrainSchedules_Schedule(t *testing.T) { 			// Deleting schedule 			scheduler.DeleteSchedule(tt.node.Name) 			// Check that node is no more scheduled for drain-			hasSchedule, _, _ = scheduler.HasSchedule(tt.node.Name)+			hasSchedule, _ = scheduler.HasSchedule(tt.node.Name) 			if hasSchedule { 				t.Errorf("Node %v should not been scheduled anymore", tt.node.Name) 			} 		}) 	} }++type failDrainer struct {+	NoopCordonDrainer+}++func (d *failDrainer) Drain(n *v1.Node) error { return errors.New("myerr") }++func TestDrainSchedules_Schedule_Polling(t *testing.T) {+	scheduler := NewDrainSchedules(&failDrainer{}, &record.FakeRecorder{}, 0, zap.NewNop())+	node := &v1.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}}++	when, err := scheduler.Schedule(node)+	if err != nil {+		t.Fatalf("DrainSchedules.Schedule() error = %v", err)+	}++	timeout := time.After(time.Until(when) + time.Minute)+	for {+		hasSchedule, failed := scheduler.HasSchedule(node.Name)+		if !hasSchedule {+			t.Fatalf("Missing schedule record for node %v", node.Name)+		}+		if failed {+			break+		}+		select {+		case <-time.After(time.Second):+		case <-timeout:

When do we hit the timeout condition vs. falling through via the initial case statement on 114?

cezarsa

comment created time in 2 months

Pull request review comment planetlabs/draino

Fix data race in drain scheduler

 func TestDrainSchedules_Schedule(t *testing.T) { 			// Deleting schedule 			scheduler.DeleteSchedule(tt.node.Name) 			// Check that node is no more scheduled for drain-			hasSchedule, _, _ = scheduler.HasSchedule(tt.node.Name)+			hasSchedule, _ = scheduler.HasSchedule(tt.node.Name) 			if hasSchedule { 				t.Errorf("Node %v should not been scheduled anymore", tt.node.Name) 			} 		}) 	} }++type failDrainer struct {+	NoopCordonDrainer+}++func (d *failDrainer) Drain(n *v1.Node) error { return errors.New("myerr") }++func TestDrainSchedules_Schedule_Polling(t *testing.T) {+	scheduler := NewDrainSchedules(&failDrainer{}, &record.FakeRecorder{}, 0, zap.NewNop())+	node := &v1.Node{ObjectMeta: meta.ObjectMeta{Name: nodeName}}++	when, err := scheduler.Schedule(node)+	if err != nil {+		t.Fatalf("DrainSchedules.Schedule() error = %v", err)+	}++	timeout := time.After(time.Until(when) + time.Minute)+	for {+		hasSchedule, failed := scheduler.HasSchedule(node.Name)+		if !hasSchedule {+			t.Fatalf("Missing schedule record for node %v", node.Name)+		}+		if failed {

So - we want failed when the test passes (counter-intuitive on the first read) because we're inducing an error with the failDrainer. The reason we're doing this is because we specifically want to exercise the atomic setFailed.

cezarsa

comment created time in 2 months

push event planetlabs/draino

Cezar Sa Espinola

commit sha 4d0cdaaf5a096bb7b171106fba0f7dd7f89da68d

helm: Move some securityContext fields to container's securityContext The `privileged` and `readOnlyRootFilesystem` fields are not valid in the pod's security context and can only be used in the container's security context. Trying to apply the generated resource would cause the error: ``` error validating data: [ValidationError(Deployment.spec.template.spec.securityContext): unknown field "privileged" in io.k8s.api.core.v1.PodSecurityContext, ValidationError(Deployment.spec.template.spec.securityContext): unknown field "readOnlyRootFilesystem" in io.k8s.api.core.v1.PodSecurityContext]; if you choose to ignore these errors, turn validation off with --validate=false ```

view details

Cezar Sa Espinola

commit sha a822269a1949878f5034308b3ba38ff5bf620eba

Refactor CI in parallel jobs and add helm validation check

view details

Jacob Straszynski

commit sha 14fd977c453f0b337b0aa63225bcbb1fe670cada

Merge pull request #83 from cezarsa/fix-security-context Move invalid pod security context fields in helm chart to container security context

view details

push time in 2 months

PR merged planetlabs/draino

Move invalid pod security context fields in helm chart to container security context

The privileged and readOnlyRootFilesystem fields recently added to the Helm chart are not valid in the pod's security context and can only be used in the container's security context. Trying to apply the generated resource would cause the error:

error validating data: [ValidationError(Deployment.spec.template.spec.securityContext): unknown field "privileged" in io.k8s.api.core.v1.PodSecurityContext, ValidationError(Deployment.spec.template.spec.securityContext): unknown field "readOnlyRootFilesystem" in io.k8s.api.core.v1.PodSecurityContext]; if you choose to ignore these errors, turn validation off with --validate=false

This PR creates a new section in the helm values for container specific security context flags. It also includes a new CI job to validate helm generated resources to help prevent this kind of error in the future.
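As a quick illustration of why the fields had to move (a sketch in terms of the Go API types, not the chart itself): Privileged and ReadOnlyRootFilesystem only exist on the container-level SecurityContext, not on PodSecurityContext.

```
package podspec

import core "k8s.io/api/core/v1"

func examplePodSpec() core.PodSpec {
	privileged := false
	readOnlyRoot := true
	runAsNonRoot := true
	return core.PodSpec{
		// Pod-level security context: fields like RunAsNonRoot live here...
		SecurityContext: &core.PodSecurityContext{RunAsNonRoot: &runAsNonRoot},
		Containers: []core.Container{{
			Name:  "draino",
			Image: "planetlabs/draino:example", // illustrative image reference
			// ...while Privileged and ReadOnlyRootFilesystem are only valid here.
			SecurityContext: &core.SecurityContext{
				Privileged:             &privileged,
				ReadOnlyRootFilesystem: &readOnlyRoot,
			},
		}},
	}
}
```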

+21 -3

2 comments

3 changed files

cezarsa

pr closed time in 2 months

pull request comment planetlabs/draino

Move invalid pod security context fields in helm chart to container security context

Thanks for this @cezarsa !

cezarsa

comment created time in 2 months

push event planetlabs/draino

Idan

commit sha 0a7429a95d32d136a87c9f575674d00b3f79e5c6

Remove privileged and make immutable Now that https://github.com/planetlabs/draino/pull/80 is merged, we can harden the deployment even more

view details

Jacob Straszynski

commit sha f32fd0b94a22a41e90d1bdb7b2110ab9a5047b18

Merge pull request #82 from idanlevin/patch-1 Make unprivileged and immutable

view details

push time in 2 months

PR merged planetlabs/draino

Make unprivileged and immutable

Now that https://github.com/planetlabs/draino/pull/80 is merged, we can harden the deployment even more

+2 -0

1 comment

1 changed file

idanlevin

pr closed time in 2 months
