Non-leader task in taskgroup not being sent interrupt

Nomad version

Nomad v0.9.3 (c5e8b66c3789e4e7f9a83b4e188e9a937eea43ce)

Operating system and Environment details

Amazon Linux 2


When Nomad decides an allocation should be stopped (for example, after a count decrement), the non-leader task in the task group receives no interrupt.

The leader task, to which all the Consul services are registered, correctly honors shutdown_delay and kill_timeout, but the secondary task is not sent an interrupt until the leader task dies; only then does it receive one, because the leader is gone.

This is a problem because the leader task is the sidecar proxy that provides the network namespace for the secondary task (the app itself). Since the app receives no interrupt, it cannot perform any graceful shutdown actions until its interrupt finally arrives, by which point the network is already gone because it was attached to the leader task.

Reproduction steps

  1. Create a TaskGroup with two tasks, both using the docker driver
  2. Set one task as leader
  3. Set "network_mode": "container:<task_name>-${NOMAD_ALLOC_ID}" for the secondary
  4. Register the healthchecks on the leader
  5. Submit the job
  6. Reduce the count of the job
  7. Verify the leader task gets an interrupt, verify the secondary task gets no interrupt
  8. After the shutdown_delay and leader terminates, verify the secondary task detects leader is gone
  9. Verify the secondary task is sent its interrupt
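The steps above can be sketched as a jobspec. This is a hypothetical, unvalidated reproduction (job, task, and image names are placeholders; the exact shutdown_delay/kill_timeout values are arbitrary):

```hcl
# Hypothetical reproduction jobspec (names and images are placeholders).
job "repro" {
  datacenters = ["dc1"]

  group "app" {
    count = 2

    # Leader task: the sidecar proxy. Services/health checks live here.
    task "proxy" {
      driver = "docker"
      leader = true

      config {
        image = "envoyproxy/envoy:latest"
      }

      service {
        name = "app-proxy"
        port = "http"
      }

      shutdown_delay = "10s"
      kill_timeout   = "30s"

      resources {
        network {
          port "http" {}
        }
      }
    }

    # Secondary task: the app, attached to the leader's network namespace.
    task "app" {
      driver = "docker"

      config {
        image = "my-app:latest"
        # Share the network namespace of the leader's container.
        network_mode = "container:proxy-${NOMAD_ALLOC_ID}"
      }
    }
  }
}
```

After submitting, decrement count (e.g. to 1) and resubmit; observe that only the proxy task receives the interrupt immediately.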

Leader task/proxy (screenshot: Screen Shot 2019-09-18 at 11 33 13 PM)

Secondary task/app (screenshot: Screen Shot 2019-09-18 at 11 33 27 PM)

I believe the expected behavior is that every task in a task group should receive an interrupt at the same time.


Comment from jf

I am seeing this as well, with the latest Nomad version (Nomad v0.12.4 (8efaee4ba5e9727ab323aaba2ac91c2d7b572d84)). This issue should not have been closed by the bot. Is anybody looking at this?
