
issue comment aws/aws-node-termination-handler

Dedupe Webhooks for queue-processor

When commenting on #353 I realized I forgot to write down some thoughts on mitigating this:

First, we could dedupe locally within the event store. If an event comes in for a node that has already been successfully drained within a 10-minute window, the event store could immediately mark it as completed. This would only help within a single-replica deployment of NTH.
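
A minimal sketch of that local dedupe idea (illustrative only, not NTH's actual code; the type names and the 10-minute window are made up for the example):

```go
package drainstore

import (
	"sync"
	"time"
)

// drainDeduper remembers nodes that were recently drained so duplicate events
// arriving inside the window can be marked completed without re-processing.
type drainDeduper struct {
	mu      sync.Mutex
	drained map[string]time.Time // node name -> time of successful drain
	window  time.Duration
}

func newDrainDeduper(window time.Duration) *drainDeduper {
	return &drainDeduper{drained: make(map[string]time.Time), window: window}
}

// MarkDrained records a successful cordon and drain for a node.
func (d *drainDeduper) MarkDrained(node string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.drained[node] = time.Now()
}

// IsDuplicate reports whether the node was already drained within the window,
// in which case the event store could immediately mark the new event completed.
func (d *drainDeduper) IsDuplicate(node string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	t, ok := d.drained[node]
	if !ok {
		return false
	}
	if time.Since(t) > d.window {
		delete(d.drained, node) // entry has expired
		return false
	}
	return true
}
```

Since the map lives in one pod's memory, this only dedupes within that pod, which is the single-replica limitation mentioned above.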

Another approach could run the Event Store as a separate deployment. NTH worker pods could be scaled independently and the same dedupe logic discussed above would scale to more workers. I'm not sure the approach is warranted though considering the relatively low traffic and the increased complexity this would add to NTH.

A third approach could be to handle events with more graceful degradation: if the node doesn't exist or has already been cordoned and drained, then don't send the webhook. But since a single NTH pod can now process events in parallel, the cordon-and-drain check could fail to catch a duplicate: one worker initiates a cordon and drain while another worker is still in the middle of its own cordon and drain call, and the webhooks would be duplicated. To get around the concurrency issue, we could lock event processing for an individual node, but this would only be easily done within a single pod, so duplicate webhooks could still fire if two different NTH pods are concurrently processing two events for the same node.
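
A sketch of per-node locking within a single pod (illustrative only; as noted, it doesn't help across pods):

```go
package nodelock

import "sync"

// nodeLocks serializes event processing per node within one pod so two workers
// can't run cordon and drain (and fire webhooks) for the same node at once.
type nodeLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newNodeLocks() *nodeLocks {
	return &nodeLocks{locks: make(map[string]*sync.Mutex)}
}

func (n *nodeLocks) lockFor(node string) *sync.Mutex {
	n.mu.Lock()
	defer n.mu.Unlock()
	l, ok := n.locks[node]
	if !ok {
		l = &sync.Mutex{}
		n.locks[node] = l
	}
	return l
}

// withNodeLock runs fn while holding the lock for the given node.
func (n *nodeLocks) withNodeLock(node string, fn func()) {
	l := n.lockFor(node)
	l.Lock()
	defer l.Unlock()
	fn()
}
```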

Another valid approach is to do nothing and accept duplicate webhooks. In practice, no one has complained about them so far. If you're out there and find them frustrating, let us know here!

bwagner5

comment created time in 8 hours

issue comment aws/aws-node-termination-handler

High Availability for queue processor mode

Multiple replicas should at least be configurable. The reason it is 1 right now (it used to be 2) is https://github.com/aws/aws-node-termination-handler/issues/297. Duplicate webhooks are more difficult to handle across separate replicas. For example, when a Spot ITN is triggered, you'll get the Spot ITN webhook, the ASG termination webhook, and the EC2 instance status change webhook.

The plan to mitigate some of those dupes is to track nodes in-memory and not send hooks on the same node within some time period if they've already drained successfully.

gabegorelick

comment created time in 9 hours

issue comment aws/aws-node-termination-handler

High Availability for queue processor mode

Thinking about this more, for ASG lifecycle hooks I think the lack of HA is mitigated by the fact that the ASG will wait for HeartbeatTimeout seconds before proceeding. So as long as Kubernetes reschedules the pod before then, you should be fine.
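
For context on the timing, the heartbeat timeout is a property of the lifecycle hook itself; a hook along these lines (CloudFormation sketch, resource names and the 300-second value are illustrative) is what gives a rescheduled pod time to catch up:

```yaml
# Illustrative termination lifecycle hook; notification target and role are omitted.
NodeTerminationHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref NodeGroup          # placeholder ASG
    LifecycleHookName: nth-termination-hook
    LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
    HeartbeatTimeout: 300                         # ASG waits this long before proceeding
    DefaultResult: CONTINUE                       # behavior if the hook is never completed
```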

But spot instance termination notifications are more time-sensitive, so perhaps there's more of a need for HA there?

gabegorelick

comment created time in 4 days

issue opened aws/aws-node-termination-handler

High Availability for queue processor mode

The deployment for queue processor mode only has a single replica, and it's not configurable via Helm.

https://github.com/aws/aws-node-termination-handler/blob/23b97f40987f52a28d533bd7bba7a391d46e7e3b/config/helm/aws-node-termination-handler/templates/deployment.yaml#L10

Is running more replicas possible, or will they conflict?
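
For reference, making it configurable would presumably just mean templating that field in the chart, something like the following sketch (replicaCount is a hypothetical values key, not one the chart currently exposes):

```yaml
# Sketch of a templated replica count in the Helm deployment template.
spec:
  replicas: {{ .Values.replicaCount | default 1 }}
```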

created time in 4 days

issue closed aws/aws-node-termination-handler

How can we prevent NTH from receiving events from different clusters?

The context

We deploy NTH v1.10 using the "Queue Processor" flavor. For this purpose we followed the instructions in the README.md under "AWS Node Termination Handler - Queue Processor (requires AWS IAM Permissions)".

Given that we have two clusters (staging and production), we created two queues, StgQueue and ProdQueue, respectively. Additionally, we configured the rule associated with the aws.ec2 events to send them to the appropriate queue depending on the target cluster: the staging rule sends events to StgQueue while the production rule sends them to ProdQueue.

The staging and production clusters were created in the same AWS account, though that can be changed.
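
For reference, the per-cluster rules look roughly like this (CloudFormation sketch; resource names are illustrative, and the production rule is identical apart from the target queue). Note that the spot interruption event pattern itself carries nothing cluster-specific to filter on, which is the root of the problem described below:

```yaml
StgSpotInterruptionRule:
  Type: AWS::Events::Rule
  Properties:
    EventPattern:
      source:
        - aws.ec2
      detail-type:
        - EC2 Spot Instance Interruption Warning
    Targets:
      - Arn: !GetAtt StgQueue.Arn   # assumes StgQueue is an AWS::SQS::Queue in the same template
        Id: stg-nth-queue
```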

Here is the thing:

The NTH in the staging/production cluster keeps receiving aws.ec2 events from the production/staging cluster (note the cross-over between clusters).

The question:

Is there a way to prevent NTH from receiving events from different clusters?

Disclaimer:

We know this is not an NTH bug but a limitation of the Spot Instance interruption notices. We would just like to know if any of you have faced this problem before and how you came up with a solution/workaround.

closed time in 4 days

diegosanchez

issue comment aws/aws-node-termination-handler

How can we prevent NTH from receiving events from different clusters?

@bwagner5 We implemented the suggested solution successfully. Thanks for the advice about the rate limit as well.

diegosanchez

comment created time in 4 days

issue opened aws/aws-node-termination-handler

Move pods to other nodes

In Kubernetes, draining a node can get stuck due to pod disruption budgets. When this happens, the only way to resolve it is to "surge" a deployment, i.e. add extra pods on uncordoned nodes. In an ideal world, Kubernetes would do this for you, but it doesn't due to https://github.com/kubernetes/kubernetes/issues/66811.

Barring Kubernetes implementing this natively, it would be nice if NTH had a solution. A simple implementation could even be to initiate a rolling restart for every deployment with pods on the draining node. A more complicated solution would be to add pods subject to the deployment's maxSurge setting (and related deployment update settings).
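
For illustration, the rolling-restart variant is essentially what `kubectl rollout restart` does: patch a `restartedAt` annotation onto the pod template, after which the deployment controller replaces pods subject to maxSurge/maxUnavailable. A minimal client-go sketch (clientset construction omitted; namespace and name are placeholders):

```go
package surge

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// restartDeployment bumps the kubectl.kubernetes.io/restartedAt annotation on the
// pod template, triggering a rolling update governed by the deployment's
// maxSurge/maxUnavailable settings.
func restartDeployment(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	patch := fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":%q}}}}}`,
		time.Now().Format(time.RFC3339),
	)
	_, err := client.AppsV1().Deployments(namespace).Patch(
		ctx, name, types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{},
	)
	return err
}
```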

I recognize that this may be complicated to implement correctly. This is the kind of thing that probably should be implemented by a dedicated library that anything which drains nodes can leverage, but I'm not aware of any existing solutions (though it's very possible I just haven't found them yet!).

created time in 5 days

issue comment aws/aws-node-termination-handler

Queue process docs don't describe non-Helm config

Are you planning on consuming NTH from the manifest files in the release assets?

Yep.

If so, what transform tool are you using?

It's complicated. 😄 It's a mix of kustomize and custom templating to fill in the gaps.

We used to have instructions for kustomize; maybe it's time to bring that back for this configuration.

Since kubectl apply -k has become somewhat of a standard, I think it makes sense to document that.

Most people are consuming NTH w/ helm.

You may want to consider making Helm the preferred method of installation in the docs (moving it above kubectl apply, maybe adding a sentence about how it's most commonly used), especially since all the config is only documented in the Helm section of the codebase (which is a separate issue).

gabegorelick

comment created time in 5 days

issue comment aws/aws-node-termination-handler

checkASGTagBeforeDraining ignores instances that don't belong to any ASG

I'm not opposed to using the instance tag rather than the ASG tag and changing those parameters. It doesn't change too much on the user's side for configuring it since they can just propagate the ASG tag to the instances.
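
For reference, propagating a tag from the ASG down to its instances is just a matter of setting PropagateAtLaunch on the group's tag (CloudFormation sketch; the tag key is an example, use whatever managedAsgTag is configured to check):

```yaml
NodeGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    # ...other ASG properties elided...
    Tags:
      - Key: aws-node-termination-handler/managed   # example key
        Value: "true"
        PropagateAtLaunch: true
```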

Checking instance tags would also enable NTH to only manage a subset of instances in an ASG. I don't have a good use case for this, but it's certainly more general than only checking the ASG tags.

As far as the detach workflow for spot goes, I think the best approach today is to use the ASG's support for EC2 instance rebalance recommendations.

I haven't used AWS's capacity rebalance feature yet, but my understanding is that it can hang indefinitely if no spot instances can be allocated:

If the new instances fail to launch or they launch but the health check fails, Amazon EC2 Auto Scaling keeps trying to relaunch them. While it is trying to launch new instances, your old ones will eventually be interrupted and forcibly terminated.

Depending on how worried you are about that, I think there's still potentially a use case for an on-demand ASG that you attach and detach spot instances from (basically how AutoSpotting works).

Anyway, running on-demand instances without ASGs is also possible in Kubernetes (albeit not typically recommended today). See the "Instances without Auto Scaling groups" section of https://aws.amazon.com/blogs/containers/amazon-eks-cluster-multi-zone-auto-scaling-groups/ for example.

gabegorelick

comment created time in 5 days

issue comment aws/aws-node-termination-handler

Queue process docs don't describe non-Helm config

That's a good point. Are you planning on consuming NTH from the manifest files in the release assets? If so, what transform tool are you using? We used to have instructions for kustomize; maybe it's time to bring that back for this configuration.

Most people are consuming NTH w/ helm.

gabegorelick

comment created time in 5 days

issue comment aws/aws-node-termination-handler

checkASGTagBeforeDraining ignores instances that don't belong to any ASG

I'm not opposed to using the instance tag rather than the ASG tag and changing those parameters. It doesn't change too much on the user's side for configuring it since they can just propagate the ASG tag to the instances.

As far as the detach workflow for spot goes, I think the best approach today is to use the ASG's support for EC2 instance rebalance recommendations. EC2 will send an EventBridge event when a capacity rebalance is recommended for your spot instance (before the 2-minute interruption warning), and the ASG will automatically launch a new instance to replace it and then terminate the old one. https://docs.aws.amazon.com/autoscaling/ec2/userguide/capacity-rebalance.html

gabegorelick

comment created time in 5 days

push event aws/aws-node-termination-handler

Gabe Gorelick

commit sha 23b97f40987f52a28d533bd7bba7a391d46e7e3b

Clarify podTerminationGracePeriod docs (#351) Fixes #348

view details

push time in 5 days

PR merged aws/aws-node-termination-handler

Clarify podTerminationGracePeriod docs

Issue #, if available: Fixes #348

Description of changes: Clarify that the grace period defaults to -1, not 30, while Kubernetes itself defaults to 30.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. ✅

@bwagner5 Let me know if you want to tweak the language.

+2 -2

2 comments

1 changed file

gabegorelick

pr closed time in 5 days

issue closed aws/aws-node-termination-handler

Docs claim podTerminationGracePeriod default is 30 but it's actually -1

https://github.com/aws/aws-node-termination-handler/tree/e0260bd31fc6267fb04b32a0adf920983c436644/config/helm/aws-node-termination-handler has the following: [screenshot of the values table showing a default of 30]

But looking at the code, the default is actually -1: https://github.com/aws/aws-node-termination-handler/blob/d8460d24b2f06b1b7ce6f8f108e8adc9dbd725a0/pkg/config/config.go#L39

I think the confusion stems from the fact that Kubernetes' default is 30, so transitively NTH's effective default is 30. But if your pod has a custom value, NTH will not override it by default. This could be made clearer in the docs.
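
For illustration, with podTerminationGracePeriod left at its -1 default, NTH honors whatever the pod itself declares (and Kubernetes falls back to 30 seconds when nothing is declared); the manifest below is purely an example:

```yaml
# Example pod: with NTH's -1 default, the 120s declared here is respected rather than overridden.
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-example
spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: app
      image: nginx
```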

closed time in 5 days

gabegorelick

pull request comment aws/aws-node-termination-handler

Clarify podTerminationGracePeriod docs

No worries, looks good!

gabegorelick

comment created time in 5 days

pull request comment aws/aws-node-termination-handler

Clarify podTerminationGracePeriod docs

@bwagner5 Sorry, I rebased right after you merged.

gabegorelick

comment created time in 5 days

PR opened aws/aws-node-termination-handler

Clarify podTerminationGracePeriod docs

Issue #, if available: Fixes #348

Description of changes: Clarify that the grace period defaults to -1, not 30, while Kubernetes itself defaults to 30.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. ✅

@bwagner5 Let me know if you want to tweak the language.

+2 -2

0 comments

1 changed file

pr created time in 5 days

push event aws/aws-node-termination-handler

Gabe Gorelick

commit sha 8b6d84362cd8ecfe39cf2f2b091c39c40c4f9c49

Document that deleteLocalData defaults to true, not false (#350) Fixes #349

view details

push time in 5 days

PR merged aws/aws-node-termination-handler

Document that deleteLocalData defaults to true, not false

Issue #, if available: Fixes #349

Description of changes: Fix incorrect default in documentation for deleteLocalData

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. ✅

+1 -1

0 comments

1 changed file

gabegorelick

pr closed time in 5 days

issue closed aws/aws-node-termination-handler

Docs incorrectly say deleteLocalData defaults to false

Helm documentation says deleteLocalData defaults to false: [screenshot of the values table]

But in the code, it looks like it defaults to true: https://github.com/aws/aws-node-termination-handler/blob/d8460d24b2f06b1b7ce6f8f108e8adc9dbd725a0/pkg/config/config.go#L140

Furthermore, https://github.com/aws/aws-node-termination-handler/issues/20#issuecomment-558309920 says the default should be true.

closed time in 5 days

gabegorelick

issue comment aws/aws-node-termination-handler

Docs incorrectly say deleteLocalData defaults to false

@bwagner5 https://github.com/aws/aws-node-termination-handler/pull/350

gabegorelick

comment created time in 5 days

PR opened aws/aws-node-termination-handler

Document that deleteLocalData defaults to true, not false

Issue #, if available: Fixes #349

Description of changes: Fix incorrect default in documentation for deleteLocalData

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. ✅

+1 -1

0 comments

1 changed file

pr created time in 5 days

issue comment aws/aws-node-termination-handler

Docs claim podTerminationGracePeriod default is 30 but it's actually -1

That's a fair point. Maybe change that column to -1 and then add a note that Kubernetes defaults to 30 seconds?

Would you care to open a PR for that change too?

gabegorelick

comment created time in 5 days

issue comment aws/aws-node-termination-handler

Docs incorrectly say deleteLocalData defaults to false

Ah yes, you are right, the Helm docs are incorrect. Would you like to open a PR to change it to true?

gabegorelick

comment created time in 5 days

issue opened aws/aws-node-termination-handler

Docs incorrectly say deleteLocalData defaults to false

Helm documentation says deleteLocalData defaults to false: [screenshot of the values table]

But in the code, it looks like it defaults to true: https://github.com/aws/aws-node-termination-handler/blob/d8460d24b2f06b1b7ce6f8f108e8adc9dbd725a0/pkg/config/config.go#L140

Furthermore, https://github.com/aws/aws-node-termination-handler/issues/20#issuecomment-558309920 says the default should be true.

created time in 5 days

issue opened aws/aws-node-termination-handler

Docs claim podTerminationGracePeriod default is 30 but it's actually -1

https://github.com/aws/aws-node-termination-handler/tree/e0260bd31fc6267fb04b32a0adf920983c436644/config/helm/aws-node-termination-handler has the following: [screenshot of the values table showing a default of 30]

But looking at the code, the default is actually -1: https://github.com/aws/aws-node-termination-handler/blob/d8460d24b2f06b1b7ce6f8f108e8adc9dbd725a0/pkg/config/config.go#L39

I think the confusion stems from the fact that Kubernetes' default is 30, so transitively NTH's effective default is 30. But if your pod has a custom value, NTH will not override it by default. This could be made clearer in the docs.

created time in 6 days

issue opened aws/aws-node-termination-handler

checkASGTagBeforeDraining ignores instances that don't belong to any ASG

When checkASGTagBeforeDraining is enabled (the default), NTH's queue processor will ignore events for instances that don't belong to an ASG with the configured tag. However, it also appears [1] to ignore events for instances that don't belong to any ASG.

Obviously, ASG lifecycle event handling isn't relevant if an instance doesn't belong to an ASG. But spot termination event handling is. Furthermore, detaching soon-to-be-terminated spot instances from an ASG is actually a pretty common workflow [2].

Relevant code: https://github.com/aws/aws-node-termination-handler/blob/e0260bd31fc6267fb04b32a0adf920983c436644/pkg/monitor/sqsevent/sqs-monitor.go#L207-L222

Solution: check the tags on any instance not in an ASG, although then managedAsgTag arguably becomes misleading. IMHO, it would be cleaner if NTH only checked instance tags (not ASG tags), and renamed the settings to checkInstanceTagBeforeDraining and managedInstanceTag, but that ship may have sailed.
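
A rough sketch of what the proposed instance-tag fallback could look like (not the actual NTH code; aws-sdk-go, with error handling trimmed and the tag key passed in by the caller):

```go
package tagcheck

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
	"github.com/aws/aws-sdk-go/service/ec2/ec2iface"
)

// instanceHasManagedTag checks the instance's own tags, which could serve as a
// fallback when the instance doesn't belong to any ASG.
func instanceHasManagedTag(svc ec2iface.EC2API, instanceID, tagKey string) (bool, error) {
	out, err := svc.DescribeTags(&ec2.DescribeTagsInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("resource-id"), Values: []*string{aws.String(instanceID)}},
			{Name: aws.String("key"), Values: []*string{aws.String(tagKey)}},
		},
	})
	if err != nil {
		return false, err
	}
	return len(out.Tags) > 0, nil
}
```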

[1] I haven't tested this end-to-end on AWS yet, so this is only based on my reading of the code. Apologies in advance if my understanding is incorrect!

[2] Detaching instances isn't supported by NTH (https://github.com/aws/aws-node-termination-handler/issues/141), but it is used by other tools like https://github.com/AutoSpotting/AutoSpotting which can be used in parallel with NTH.

created time in 6 days

issue opened aws/aws-node-termination-handler

Queue process docs don't describe non-Helm config

The docs for setting up NTH in queue processor mode don't make it clear that patches must be applied to the YAML to set the queue URL and enableSqsTerminationDraining. You have to read the Helm section further down to realize that these settings are required, and then figure out where in the manifests those changes should be made.

[screenshot of the relevant docs section]

I think the "You can use kubectl to directly add all of the above resources with the default configuration into your cluster" section was copied from the IMDS docs. But the difference is that IMDS mode doesn't require any config.
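
For example, a minimal kustomize setup that layers the required settings onto the released manifest might look like the sketch below; the manifest filename, deployment name/namespace, and env var names are assumptions, so verify them against the release assets and the Helm chart:

```yaml
# kustomization.yaml (sketch)
resources:
  - all-resources-queue-processor.yaml   # placeholder for the manifest from the release assets
patchesStrategicMerge:
  - nth-config-patch.yaml
---
# nth-config-patch.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aws-node-termination-handler   # assumed name
  namespace: kube-system               # assumed namespace
spec:
  template:
    spec:
      containers:
        - name: aws-node-termination-handler
          env:
            - name: ENABLE_SQS_TERMINATION_DRAINING   # assumed env var name
              value: "true"
            - name: QUEUE_URL                         # assumed env var name
              value: "https://sqs.us-east-1.amazonaws.com/111122223333/nth-queue"
```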

created time in 6 days

issue comment aws/aws-node-termination-handler

Feature Request: Detatch Instance from ASG upon Termination Event

First, the bad news. The current design of Node Termination Handler does not allow us to manipulate AWS resources like ASGs. In order for NTH to do that, we would need a more complicated setup process to get credentials. We've discussed at length in issue #14 why we chose to do this. Basically, we would need to create a Kubernetes controller or operator rather than a DaemonSet, which is what we have now.

Is this still applicable now that queue processor mode is a thing?

jtcressy

comment created time in 6 days

issue comment aws/aws-node-termination-handler

Add Functionality to trigger a drain and cordon by creating a custom resource

Yes, I did consider this approach... but probably the final piece in the puzzle is to have the scheduler check whether all pods are scheduled before terminating the node; if they are not, wait for the cluster autoscaler to scale up a node, and then terminate the instance in the ASG with 'should-decrement-desired-capacity' set so that the ASG is scaled back down.
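
For reference, that final step maps to a single Auto Scaling API call (aws-sdk-go sketch, not something NTH does today):

```go
package detach

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// terminateAndScaleDown terminates the instance and decrements the ASG's desired
// capacity so the group isn't immediately scaled back up to replace it.
func terminateAndScaleDown(instanceID string) error {
	svc := autoscaling.New(session.Must(session.NewSession()))
	_, err := svc.TerminateInstanceInAutoScalingGroup(&autoscaling.TerminateInstanceInAutoScalingGroupInput{
		InstanceId:                     aws.String(instanceID),
		ShouldDecrementDesiredCapacity: aws.Bool(true),
	})
	return err
}
```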

dgr237

comment created time in 7 days
