profile
Drew Erny (dperny) · @Mirantis · Tuscaloosa, AL · http://www.dperny.net
FLOSS Machine. Outside of my official work obligations, I am happy to help anyone contribute to the various projects I work on in any way I can! Get in touch!

docker/classicswarm 5833

Swarm Classic: a container clustering system

docker/swarmkit 2382

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

docker/leadership 140

Distributed Leader Election using docker/libkv

docker/swarm-library-image 29

Docker Official Image packaging for Swarm

dperny/cs403-assignments 2

Programming assignments from CS 403 at The University of Alabama with Dr. John Lusth

dperny/100pUtils 0

A collection of short programs, scripts, and libraries for common 100p operations.

dperny/binarytree 0

A Python3 class for a binary tree

dperny/caboose-cms 0

Ruby on Rails content management system

PR opened docker/swarmkit

[Volumes] Add protos for cluster volumes

- What I did

Add protos for cluster volume support. This pull request is against a new feature branch, feature-volumes, which will be merged into master at a later date.

- How I did it

Largely follows the proposal in moby/moby#39624, with one key exception: instead of directly importing CSI protos, it copies the needed definitions.

Because the Go code for the CSI protos is generated by the default Go protobuf generator, it has subtle incompatibilities with the swarmkit protos, which are generated with gogo. This is mainly due to the absence of gogo methods like Size and Marshal on the CSI-generated code. While I'm certain there is a workaround, for now the most straightforward solution is simply to copy the definitions in.
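
For illustration, here is a minimal, self-contained Go sketch (hypothetical names, not swarmkit or CSI code) of the method-set difference being described: gogo-generated messages carry Size and Marshal directly, which swarmkit's marshalling paths expect, while the default generator's output does not.

```go
package main

import "fmt"

// gogoMessage captures the methods that gogo-generated protobuf code
// emits on every message and that swarmkit's wire handling relies on.
// The default golang/protobuf generator does not emit these, which is
// the incompatibility described above.
type gogoMessage interface {
	Size() int
	Marshal() ([]byte, error)
}

// fakeVolumeSpec is a hypothetical stand-in for a CSI message after it
// has been copied into swarmkit's proto tree and regenerated with gogo.
type fakeVolumeSpec struct{ payload []byte }

func (m *fakeVolumeSpec) Size() int                { return len(m.payload) }
func (m *fakeVolumeSpec) Marshal() ([]byte, error) { return m.payload, nil }

func main() {
	var msg gogoMessage = &fakeVolumeSpec{payload: []byte("spec")}
	fmt.Println(msg.Size()) // 4
}
```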

- How to test it

Because these are just proto definitions and the associated generated code, the fact that it compiles is evidence enough of its correctness.

- Description for the changelog

+10695 -4304

0 comments

11 changed files

pr created time in 7 days

push event dperny/swarmkit-1

Drew Erny

commit sha 1ab176cdf8e886ac2a16fe6a623f27a589ff8f8d

Add protos for cluster volumes Adds the protocol buffer definitions for cluster volumes and CSI support. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 7 days

push event dperny/swarmkit-1

Drew Erny

commit sha 5d0c77e4f8791b7bd3964c9fb03b3db67c59e8d2

Add protos for cluster volumes Adds the protocol buffer definitions for cluster volumes and CSI support. Also updates some vendoring as needed. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 7 days

create branch docker/swarmkit

branch: feature-volumes

created branch time in 7 days

delete branch docker/swarmkit

delete branch: feature-jobs

delete time in 7 days

create branch dperny/swarmkit-1

branch: swarm-volume-protos

created branch time in 10 days

create branch dperny/spec

branch: add-doc-dot-go

created branch time in 11 days

fork dperny/spec

Container Storage Interface (CSI) Specification.

fork in 11 days

pull request comment docker/cli

Add jobs support to CLI

Yes. I fixed docker service scale to work with replicated jobs, and I reworded some docs to address general comments (although I can't now remember exactly what I reworded. I think it had to do with restart-condition).

dperny

comment created time in 24 days

push event dperny/cli

Sebastiaan van Stijn

commit sha 1fefbdc29c2f4468318bb81675cb55d80c3e12b8

Dockerfile.e2e: don't show progress, force TLS, and follow redirects Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha aad9d2c958b24639ea1d6dd46e73b2fdb36f4b62

Fix builder prune -a/--all flag description Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 26d71f9e0224916034a4de5f1037e0dee2311e22

Bump version to 20.03.0-dev Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 1edb10fe3088a60f2dd962c93d30fcc9dda3802c

vendor: bump golang.org/x/crypto bac4c82f6975 (CVE-2020-9283) full diff: https://github.com/golang/crypto/compare/1d94cc7ab1c630336ab82ccb9c9cda72a875c382...bac4c82f69751a6dd76e702d54b3ceb88adab236 Version v0.0.0-20200220183623-bac4c82f6975 of golang.org/x/crypto fixes a vulnerability in the golang.org/x/crypto/ssh package which allowed peers to cause a panic in SSH servers that accept public keys and in any SSH client. An attacker can craft an ssh-ed25519 or sk-ssh-ed25519@openssh.com public key, such that the library will panic when trying to verify a signature with it. Clients can deliver such a public key and signature to any golang.org/x/crypto/ssh server with a PublicKeyCallback, and servers can deliver them to any golang.org/x/crypto/ssh client. This issue was discovered and reported by Alex Gaynor, Fish in a Barrel, and is tracked as CVE-2020-9283. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 11869fa42a630dd8dc540388669631f33b76fb20

fix panic on single-character volumes Before this change, this would cause a panic: docker run -it --rm -v 1:/1 alpine panic: runtime error: index out of range goroutine 1 [running]: github.com/docker/cli/cli/compose/loader.isFilePath(0xc42027e058, 0x1, 0x557dcb978c20) ... After this change, a correct error is returned: docker run -it --rm -v 1:/1 alpine docker: Error response from daemon: create 1: volume name is too short, names should be at least two alphanumeric characters. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Silvin Lubecki

commit sha 155b459e6d0a60c7fd7dae7590dfd328a8f65bf1

Merge pull request #2354 from thaJeztah/bump_crypto_security vendor: bump golang.org/x/crypto bac4c82f6975 (CVE-2020-9283)

Silvin Lubecki

commit sha 59eee680159b0e265e7dd02c863d18d86b96acda

Merge pull request #2355 from thaJeztah/fix_bind_panic fix panic on single-character volumes

Silvin Lubecki

commit sha edcbfc8c12d8c6215254fcd452983f8bbffdf068

Merge pull request #2353 from thaJeztah/bump_version Bump version to 20.03.0-dev

Sebastiaan van Stijn

commit sha 4cd4305b3122911843dd416656b08db241326c74

docs: add redirect for old location of daemon reference Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 740919cc7fc02e2aebfdf2f2d128bebc8bc9d668

Merge pull request #2351 from thaJeztah/add_daemon_redirect docs: add redirect for old location of daemon reference

Sebastiaan van Stijn

commit sha 5ef0fa10de2bbb23eef9fe87a9530caf3a4718d8

gofmt compose loader test Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 2c0e93063bc61dae2e96455186e99fe1c56c85f5

bump gotest.tools v3.0.1 for compatibility with Go 1.14 full diff: https://github.com/gotestyourself/gotest.tools/compare/v2.3.0...v3.0.1 Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 4313c8b3c6f3702acd81aa780d87b2b6c72e9b95

Update Golang 1.13.8 Also pinning the e2e image to the "buster" variant, which is what's currently used, but making it explicit. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Silvin Lubecki

commit sha e67b4b4be2cf605e53f5388d16d26efb363847f5

Merge pull request #2335 from thaJeztah/silence_curl_and_follow_redirects Dockerfile.e2e: don't show progress, force TLS, and follow redirects

Silvin Lubecki

commit sha 00264b952c7bf10f1b6ad3a3743a658de239b299

Merge pull request #2358 from thaJeztah/gotest_v3 bump gotest.tools v3.0.1 for compatibility with Go 1.14

Silvin Lubecki

commit sha 30d6ee997bc174a2f2e3378ac82bec00b0284d69

Merge pull request #2357 from thaJeztah/gofmt_test gofmt compose loader test

Silvin Lubecki

commit sha 2e4dcacddb1f20dd04deec48e25abda8e3f46019

Merge pull request #2360 from thaJeztah/bump_golang_1.13 Update Golang 1.13.8

Silvin Lubecki

commit sha c3b48c5c9cb269a09c21a39c98d428d9d2feb623

Merge pull request #2343 from thaJeztah/fix_prune_flag_description Fix builder prune -a/--all flag description

Silvin Lubecki

commit sha 40aa02053486902edfff444922006c5bc37e212c

Add an exe extension to windows binary during cross build. Signed-off-by: Silvin Lubecki <silvin.lubecki@docker.com>

Arko Dasgupta

commit sha 67ebcd6dcf82c00f133b886a9a343b67124e58ec

Skip IPAddr validation for "host-gateway" string Relates to - moby/moby 40007 The above PR added support in moby, that detects if a special string "host-gateway" is added to the IP section of --add-host, and if true, replaces it with a special IP value (value of --host-gateway-ip Daemon flag which defaults to the IP of the default bridge). This PR is needed to skip the validation for the above feature Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

push time in a month

pull request comment docker/cli

Add jobs support to CLI

I'm opposed to the alias of --mode=job for --mode=replicated-job primarily because it makes the documentation unwieldy.

dperny

comment created time in a month

pull request comment docker/cli

Add jobs support to CLI

  1. The problem where completed jobs are showing 0/4 (4/4 Completed) is actually a bit of a bug in Swarmkit. In the ServiceStatus, Swarmkit should not be setting the denominator to MaxReplicas, but should instead be setting it to the lesser of MaxReplicas and TotalCompletions - CompletedTasks (a short sketch of the corrected math follows this list). It's an easy fix, but it's not in this code.
  2. docker service scale should be usable with jobs, and the fact that it's not is a consequence of me overlooking it.
  3. Compose support for jobs isn't in this PR. I was going to open a second PR with compose support. I can add it to this PR if desired.
  4. It is expected that, if a task fails, a new task is spawned until the desired number of completions is reached. The exception is when --restart-condition=none is set.
  5. RestartOnAny is treated the same as RestartOnFailure if the service is a Job. This needs to be added both to the documentation here and to the swagger docs in the main repo, actually. This behavior isn't accidental; it was a deliberate decision (IIRC, it was part of the jobs design spec).
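
As a rough illustration of the corrected math in point 1, here is a hypothetical helper (not the actual Swarmkit fix) that picks the lesser of MaxReplicas and the completions still outstanding:

```go
package main

import "fmt"

// jobDenominator sketches the denominator the ServiceStatus should
// report for a job: the lesser of MaxReplicas and the completions that
// are still outstanding (TotalCompletions - CompletedTasks).
func jobDenominator(maxReplicas, totalCompletions, completedTasks uint64) uint64 {
	var remaining uint64
	if completedTasks < totalCompletions {
		remaining = totalCompletions - completedTasks
	}
	if maxReplicas < remaining {
		return maxReplicas
	}
	return remaining
}

func main() {
	// A finished job with 4 total completions and MaxReplicas=4 should
	// read 0/0 (4/4 completed) rather than 0/4 (4/4 completed).
	fmt.Println(jobDenominator(4, 4, 4)) // 0
}
```
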
dperny

comment created time in a month

Pull request review comment docker/cli

Add jobs support to CLI

 On a manager node:
 
 ```bash
 $ docker service ls
-ID            NAME      MODE        REPLICAS    IMAGE
-c8wgl7q4ndfd  frontend  replicated  5/5         nginx:alpine
-dmu1ept4cxcf  redis     replicated  3/3         redis:3.0.6
-iwe3278osahj  mongo     global      7/7         mongo:3.3
+ID            NAME      MODE            REPLICAS             IMAGE
+c8wgl7q4ndfd  frontend  replicated      5/5                  nginx:alpine
+dmu1ept4cxcf  redis     replicated      3/3                  redis:3.0.6
+iwe3278osahj  mongo     global          7/7                  mongo:3.3
+hh08h9uu8uwr  job       replicated-job  1/1 (3/5 completed)  nginx:latest

Yes. It means that 1 task is still running, 3 tasks have completed, and 5 completions are desired. This implies the job runs its 5 iterations one after another.

dperny

comment created time in a month

issue comment docker/swarmkit

Is there a roadmap for docker swarm?

linked in an above comment: https://github.com/moby/moby/issues/39624

bitsofinfo

comment created time in a month

issue comment docker/swarmkit

Is there a roadmap for docker swarm?

Swarm Jobs is in the Docker Engine's master branch, waiting on a CLI PR to be merged.

Swarm Cluster Volumes with CSI support is still on deck. It will make it into the open source project. It is not on the Docker roadmap because it is being worked on by an external contributor (me).

bitsofinfo

comment created time in a month

pull request comment docker/swarmkit

bump google/certificate-transparency-go v1.0.21

@s0j I'm not sure how familiar you are with Go's disastrously bad dependency management, but essentially, the vendored dependencies are completely separate sets for each package. In moby/moby, there is only one certificate-transparency version, which is used by every package that depends on it, including swarmkit. In the swarmkit repo, there is only one certificate-transparency, which is used to build swarmkit by itself, a situation only useful for CI purposes.

So, you bump the package here to make sure it works, and you bump the package there to actually use it. Nothing vendored in this repository is present in the moby/moby repository.

I'm sure that even if that answers your question, it has created a whole host of new ones.

thaJeztah

comment created time in 2 months

pull request comment docker/swarmkit

[19.03 backport] Fix leaking tasks.db

@thaJeztah merged.

thaJeztah

comment created time in 2 months

push event docker/swarmkit

Drew Erny

commit sha 875d50307610e093bb67513696abdef7f4d7414c

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com> (cherry picked from commit 585521df07e18f341b4e4b1fcb1be55f42a0ebad) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Drew Erny

commit sha 0b8364e7d08aa0e972241eb59ae981a67a587a0e

Merge pull request #2940 from thaJeztah/19.03_backport_fix_leaking_task_db [19.03 backport] Fix leaking tasks.db

push time in 2 months

PR merged docker/swarmkit

[19.03 backport] Fix leaking tasks.db

backport of https://github.com/docker/swarmkit/pull/2938

For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization.

When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task.

Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database.

I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system.

Signed-off-by: Drew Erny derny@mirantis.com (cherry picked from commit 585521df07e18f341b4e4b1fcb1be55f42a0ebad) Signed-off-by: Sebastiaan van Stijn github@gone.nl

- What I did

- How I did it

- How to test it

- Description for the changelog

+31 -15

1 comment

4 changed files

thaJeztah

pr closed time in 2 months

PR opened moby/moby

Bump swarmkit to ebe39a32e3ed4c3a3783a02c11cccf388818694c

- What I did

Bumps swarmkit vendoring.

Includes docker/swarmkit#2938, which fixes tasks.db growing out of control on worker nodes.

- Description for the changelog: Fix a bug where the tasks.db file grows uncontrollably on Swarm worker nodes.

+27 -5

0 comments

3 changed files

pr created time in 2 months

push event dperny/docker

Max Harmathy

commit sha 28e93ed8caad2c15d2b3b704801c71b9584de91e

Allow socket activation PartOf deactivates the socket whenever the service get deactivated. The socket unit however should be active nevertheless. Signed-off-by: Max Harmathy <max.harmathy@web.de>

Jintao Zhang

commit sha 9134130b3924918986125977d567f42f908d5442

Remove `SystemInfo()` error handling. Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

Wei Fu

commit sha 9f73396dabf087a8dd5fa74296c2cd4c188ff889

daemon: add grpc.WithBlock option WithBlock makes sure that the following containerd request is reliable. In one edge case with high load pressure, kernel kills dockerd, containerd and containerd-shims caused by OOM. When both dockerd and containerd restart, but containerd will take time to recover all the existing containers. Before containerd serving, dockerd will failed with gRPC error. That bad thing is that restore action will still ignore the any non-NotFound errors and returns running state for already stopped container. It is unexpected behavior. And we need to restart dockerd to make sure that anything is OK. It is painful. Add WithBlock can prevent the edge case. And n common case, the containerd will be serving in shortly. It is not harm to add WithBlock for containerd connection. Signed-off-by: Wei Fu <fuweid89@gmail.com>

Sebastiaan van Stijn

commit sha 20e3b5ba2cc17a5cba47b99e4f74fb9b6b6ab5c7

api/types: minor BuilderVersion refactor Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 9709f6b95e4604956125eaddbb8a4a46b4eb9005

api/server: build: use locally scoped variables Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 04203d13fb52ae41a156506831ce44abcc726dd6

api/server: build: refactor for better readability - construct the initial options as a literal - move validation for windows up, and fail early - move all API-version handling together Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Akihiro Suda

commit sha 612343618dd7dad7cf023e6263d693ab37507a92

cgroup2: use shim V2 * Requires containerd binaries from containerd/containerd#3799 . Metrics are unimplemented yet. * Works with crun v0.10.4, but `--security-opt seccomp=unconfined` is needed unless using master version of libseccomp ( containers/crun#156, seccomp/libseccomp#177 ) * Doesn't work with master runc yet * Resource limitations are unimplemented Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Akihiro Suda

commit sha 409bbdc3212a37e7a21a70eeae0b44e96509f54d

cgroup2: enable resource limitation enable resource limitation by disabling cgroup v1 warnings resource limitation still doesn't work with rootless mode (even with systemd mode) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Akihiro Suda

commit sha 19baeaca267d5710907ac1b3c3972d44725fe8ad

cgroup2: enable cgroup namespace by default For cgroup v1, we were unable to change the default because of compatibility issue. For cgroup v2, we should change the default right now because switching to cgroup v2 is already breaking change. See also containers/libpod#4363 containers/libpod#4374 Privileged containers also use cgroupns=private by default. https://github.com/containers/libpod/pull/4374#issuecomment-549776387 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Akihiro Suda

commit sha 491531c12bf60a538eeeb37e22b19729a1a65bb8

cgroup2: mark cpu-rt-{period,runtime} unimplemented Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

Sebastiaan van Stijn

commit sha 7e0afd4934528d89e09bd850490db6477222df07

swagger: move ContainerState to definitions Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 234d5a78fe383881ee952bf4c937eae3c54c7b51

swagger: remove classic swarm "Node" field This field is not part of the Docker API and only used for classic (standalone) swarm. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha b5c22f4fcf387c0a6b532b6440134f36999c528f

TestContainerInspectNode: document test as being for classic swarm Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 78c86927abfd8dfd6cfac98398adbc1dbb27784b

api/types: document classic swarm "Node" field Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 8311d6ba9fe69197912f4bba01eab1a92d92ce08

API: omit classic swarm "SystemStatus" field if empty Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 79c877cfa71f7929facda0160bfec12ebed4e0ef

swagger: restore bind options information This information was added to an older version of the API documentation (through 164ab2cfc9900a5e9a8037d41faea2bfdf3d64f1 and 5213a0a67ec635a45e640364e8aa9bf5f431625e), but only added in the "docs" branch. This patch copies the information to the swagger file. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 1c16572fe6a606c0899eb9307edd006fbfa17e5e

registry: fix goimports Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 9ae7196775dddeb49be5195e00572a55b3c4658a

swagger: add missing container Health docs Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 5dbfae694919b1aff76f360061b4065349bf1900

hack/make: ignore failure to stop apparmor ``` ---> Making bundle: .integration-daemon-stop (in bundles/test-integration) ++++ cat bundles/test-integration/docker.pid +++ kill 13137 +++ /etc/init.d/apparmor stop Leaving: AppArmorNo profiles have been unloaded. Unloading profiles will leave already running processes permanently unconfined, which can lead to unexpected situations. To set a process to complain mode, use the command line tool 'aa-complain'. To really tear down all profiles, run 'aa-teardown'." script returned exit code 255 ``` Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Sebastiaan van Stijn

commit sha 4e3ab9e9fbca682f75eb350c8ad4312282869a03

Dockerfile: switch golang image to "buster" variant, and update btrfs packages The btrfs-tools was a transitional package, and no longer exists: > Package btrfs-tools > stretch (oldstable) (admin): transitional dummy package > 4.7.3-1: amd64 arm64 armel armhf i386 mips mips64el mipsel ppc64el s390x It must be replaced either by `btrfs-progs` or `libbtrfs-dev` (which has just the development headers) > Package: libbtrfs-dev (4.20.1-2) > Checksumming Copy on Write Filesystem utilities (development headers) Note that the `libbtrfs-dev` package is not available on Debian stretch (only in stretch-backports) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

push time in 2 months

push event docker/swarmkit

Drew Erny

commit sha 585521df07e18f341b4e4b1fcb1be55f42a0ebad

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

Drew Erny

commit sha ebe39a32e3ed4c3a3783a02c11cccf388818694c

Merge pull request #2938 from dperny/fix-leaking-task-db Fix leaking tasks.db

push time in 2 months

PR merged docker/swarmkit

Fix leaking tasks.db

- What I did

For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization.

When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task.

- How I did it

Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database.

I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system.

This is sort of like @olljanat's fix in #2917, but instead of running a routine in a loop, we just delete tasks on demand.
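
For illustration, here is the delete-on-de-assignment idea as a minimal, self-contained sketch (using go.etcd.io/bbolt directly with a hypothetical "tasks" bucket, not swarmkit's actual storage helpers): when an assignment is removed, the task record is deleted outright instead of being flagged and left behind.

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("tasks.db", 0o600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	taskID := []byte("task-1234")

	// Simulate an assignment arriving: store the task record.
	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("tasks"))
		if err != nil {
			return err
		}
		return b.Put(taskID, []byte("serialized task"))
	})
	if err != nil {
		log.Fatal(err)
	}

	// The assignment is removed: delete the record so tasks.db cannot
	// accumulate entries for tasks the manager no longer tracks.
	err = db.Update(func(tx *bolt.Tx) error {
		return tx.Bucket([]byte("tasks")).Delete(taskID)
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("task removed from tasks.db")
}
```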

- How to test it

Updates the worker test to reflect the new behavior.

- Description for the changelog: Fix a longstanding issue with tasks.db growing out of control.

+31 -15

10 comments

4 changed files

dperny

pr closed time in 2 months

pull request comment docker/swarmkit

Fix leaking tasks.db

Known flaky. There are two or three like that.

dperny

comment created time in 2 months

Pull request review comment docker/swarmkit

Fix leaking tasks.db

 func reconcileTaskState(ctx context.Context, w *worker, assignments []*api.Assig
 
 	removeTaskAssignment := func(taskID string) error {
 		ctx := log.WithLogger(ctx, log.G(ctx).WithField("task.id", taskID))
-		if err := SetTaskAssignment(tx, taskID, false); err != nil {
-			log.G(ctx).WithError(err).Error("error setting task assignment in database")
+		// if a task is no longer assigned, then we do not have to keep track
+		// of it. a task will only be unassigned when it is deleted on the
+		// manager. insteaad of SetTaskAssginment to true, we'll just remove
+		// the task now.
+		if err := DeleteTask(tx, taskID); err != nil {
+			log.G(ctx).WithError(err).Errorf(
+				"error removing de-assigned task %v", taskID,
+			)
 		}
 		return err

The scope of err is now correct here, but I was so laser-focused on that problem that I overlooked the more familiar

if err != nil {
    return err
}
return nil

pattern.

dperny

comment created time in 2 months

push event dperny/swarmkit-1

Drew Erny

commit sha 585521df07e18f341b4e4b1fcb1be55f42a0ebad

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 2 months

Pull request review comment docker/swarmkit

Fix leaking tasks.db

 func reconcileTaskState(ctx context.Context, w *worker, assignments []*api.Assig
 
 	removeTaskAssignment := func(taskID string) error {
 		ctx := log.WithLogger(ctx, log.G(ctx).WithField("task.id", taskID))
-		if err := SetTaskAssignment(tx, taskID, false); err != nil {
-			log.G(ctx).WithError(err).Error("error setting task assignment in database")
+		// if a task is no longer assigned, then we do not have to keep track
+		// of it. a task will only be unassigned when it is deleted on the
+		// manager. insteaad of SetTaskAssginment to true, we'll just remove
+		// the task now.
+		if err := DeleteTask(tx, taskID); err != nil {
+			log.G(ctx).WithError(err).Errorf(
+				"error removing de-assigned task %v", taskID,
+			)
 		}
 		return err

Yeah, it's fixed, I just did so in a very awkward way. Let me make it the more typical pattern though.

dperny

comment created time in 2 months

Pull request review comment docker/swarmkit

Fix leaking tasks.db

 func (w *worker) newTaskManager(ctx context.Context, tx *bolt.Tx, task *api.Task
 
 // updateTaskStatus reports statuses to listeners, read lock must be held.
 func (w *worker) updateTaskStatus(ctx context.Context, tx *bolt.Tx, taskID string, status *api.TaskStatus) error {
 	if err := PutTaskStatus(tx, taskID, status); err != nil {
+		// we shouldn't fail to put a task status. however, there exists the
+		// possibility of a race in which we try to put a task status after the
+		// task has been deleted. because this whole contraption is a careful
+		// dance of too-tightly-coupled concurrent parts, fixing tht race is
+		// fraught with hazards. instead, we'll recognize that it can occur,
+		// log the error, and then ignore it.
 		log.G(ctx).WithError(err).Error("failed writing status to disk")

Yeah, you're right. I'm gonna log it at Info level, and I added a comment explaining why.
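
A small sketch of what "log it at Info level and ignore it" amounts to (plain logrus with a hypothetical putTaskStatus stand-in, not the swarmkit code):

```go
package main

import (
	"errors"

	"github.com/sirupsen/logrus"
)

// putTaskStatus is a hypothetical stand-in for the real status write;
// it fails here to exercise the logging path.
func putTaskStatus(taskID string) error {
	return errors.New("task not found")
}

func updateTaskStatus(taskID string) error {
	if err := putTaskStatus(taskID); err != nil {
		// The task may already have been deleted by the time the status
		// arrives; that race is benign, so note it at Info and move on.
		logrus.WithError(err).WithField("task.id", taskID).
			Info("failed writing status to disk")
	}
	return nil
}

func main() {
	_ = updateTaskStatus("task-1234")
}
```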

dperny

comment created time in 2 months

push event dperny/swarmkit-1

Drew Erny

commit sha 3b3a2eaef5de697d6202e58332bc59a574174a42

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 2 months

push event dperny/swarmkit-1

Drew Erny

commit sha 8372b961601afb22605a15d933f9147e456eb067

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 2 months

push event dperny/swarmkit-1

Drew Erny

commit sha b4dd5586297d02d9361abc1a102ac667607b8414

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 2 months

Pull request review comment docker/swarmkit

Fix leaking tasks.db

 func reconcileTaskState(ctx context.Context, w *worker, assignments []*api.Assig
 
 	removeTaskAssignment := func(taskID string) error {
 		ctx := log.WithLogger(ctx, log.G(ctx).WithField("task.id", taskID))
-		if err := SetTaskAssignment(tx, taskID, false); err != nil {
-			log.G(ctx).WithError(err).Error("error setting task assignment in database")
+		// if a task is no longer assigned, then we do not have to keep track
+		// of it. a task will only be unassigned when it is deleted on the
+		// manager. insteaad of SetTaskAssginment to true, we'll just remove
+		// the task now.
+		if err := DeleteTask(tx, taskID); err != nil {
+			log.G(ctx).WithError(err).Errorf(
+				"error removing de-assigned task %v", taskID,
+			)
 		}
 		return err

🤔 Yeah, that's not right; fixing that.

dperny

comment created time in 2 months

Pull request review comment docker/swarmkit

Fix leaking tasks.db

 func reconcileTaskState(ctx context.Context, w *worker, assignments []*api.Assig
 
 	removeTaskAssignment := func(taskID string) error {
 		ctx := log.WithLogger(ctx, log.G(ctx).WithField("task.id", taskID))
-		if err := SetTaskAssignment(tx, taskID, false); err != nil {
-			log.G(ctx).WithError(err).Error("error setting task assignment in database")
+		// if a task is no longer assigned, then we do not have to keep track
+		// of it. a task will only be unassigned when it is deleted on the
+		// manager. insteaad of SetTaskAssginment to true, we'll just remove
+		// the task now.
+		if err := DeleteTask(tx, taskID); err != nil {
+			log.G(ctx).WithError(err).Errorf(

I think the only way they fail to be deleted is if they don't exist.

dperny

comment created time in 2 months

pull request comment docker/swarmkit

Fix leaking tasks.db

I found the race condition. I "fixed" it, but I'm not powerful enough to actually fix it the right way without the whole house of cards tumbling down.

dperny

comment created time in 2 months

push event dperny/swarmkit-1

Drew Erny

commit sha 4b3e5a7e4a3237df106cb1381c01e7b09240638c

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 2 months

Pull request review comment docker/swarmkit

Fix leaking tasks.db

 func reconcileTaskState(ctx context.Context, w *worker, assignments []*api.Assig
 
 	removeTaskAssignment := func(taskID string) error {
 		ctx := log.WithLogger(ctx, log.G(ctx).WithField("task.id", taskID))
-		if err := SetTaskAssignment(tx, taskID, false); err != nil {
-			log.G(ctx).WithError(err).Error("error setting task assignment in database")
+		// if a task is no longer assigned, then we do not have to keep track
+		// of it. a task will only be unassigned when it is deleted on the
+		// manager. insteaad of SetTaskAssginment to true, we'll just remove
+		// the task now.
+		if err := DeleteTask(tx, taskID); err != nil {
+			log.G(ctx).WithError(err).Errorf(
+				"error removing de-assigned task %v", taskID,

Yeah, good call, thanks.

dperny

comment created time in 2 months

pull request comment docker/swarmkit

Fix leaking tasks.db

@Thomas131 I'm not Officially Endorsing this approach, but I've been told that Nothing Breaks if you delete tasks.db.

Of course, the Official Advice would be to leave the cluster, delete tasks.db, and then rejoin the cluster as a "new" node.

dperny

comment created time in 2 months

push event dperny/swarmkit-1

Drew Erny

commit sha af58e4d815731d21871bb67fea06e3927d40afa9

Fix leaking tasks.db For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization. When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task. Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database. I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 2 months

pull request comment docker/swarmkit

Fix leaking tasks.db

Just saw that failure. Can't reproduce it locally, so I'm not sure what's going on.

There shouldn't be any difference in a worker promoted to a manager. The databases for each component are completely separate.

dperny

comment created time in 2 months

pull request comment docker/swarmkit

Fix leaking tasks.db

/cc @cpuguy83

dperny

comment created time in 2 months

pull request comment docker/swarmkit

[WIP] Cleanup routine for old tasks.db tasks

PTAL #2938, which is a bit cleaner of a fix.

olljanat

comment created time in 2 months

PR opened docker/swarmkit

Fix leaking tasks.db

- What I did

For a long time, it's been a known fault that tasks.db grows out of control. The reason is that this database is only cleaned up, and old tasks removed, during initialization.

When an assignment to a worker is removed, previously, the assignment was just marked as removed in the tasks database. However, an assignment is only removed from a worker when the task is removed from the manager. The worker does not need to continue to keep track of the task.

- How I did it

Instead of marking a task as no longer assigned, when a task is removed as an assignment, we'll simply delete it from the database.

I'm not 100% sure of what all the task database is responsible for, or why it needs to be persisted, so this change is targeted to have the minimal impact on the system.

This is sort of like @olljanat's fix in #2917, but instead of running a routine in a loop, we just delete tasks on demand.

- How to test it

Updates the worker test to reflect the new behavior.

- Description for the changelog: Fix a longstanding issue with tasks.db growing out of control.

+12 -12

0 comments

2 changed files

pr created time in 2 months

create branch dperny/swarmkit-1

branch: fix-leaking-task-db

created branch time in 2 months

issue comment docker/swarmkit

Proposal: Device Support

It's not on the roadmap, but I haven't forgotten about it. Volume support is at the forefront, and if that proceeds much faster than expected, I'm planning to lobby to do this next.

That said, you could build it, if you wanted to!

dperny

comment created time in 2 months

issue comment moby/moby

insufficient uniqueness of auto-generated service names

I've actually been aware of this problem for some time. I remember someone getting mad at me for calling it a "birthday problem" because those words have a strong negative connotation in the field of security, but that's essentially what it is.

The difficulty with a retry loop is that each retry iteration would involve making a separate API call to Swarm. It's not all that disruptive, but under the current model, the place where names are generated (your snippet) is quite distant from the place where services are created.

For now, I've no intention of spending time working on this problem, so I'd recommend coming up with a naming scheme for your services, instead of leaving them anonymous and having a name generated.
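
For anyone who does want a client-side workaround, here is a hedged sketch of the retry idea (generateName and createService are hypothetical stand-ins; in the real system every attempt would be a separate Swarm API call, which is the cost mentioned above):

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

var errNameConflict = errors.New("name already in use")

// generateName stands in for the docker-style random name generator;
// here it just produces a random numeric suffix.
func generateName() string {
	return fmt.Sprintf("service_%04d", rand.Intn(10000))
}

// createService simulates the service-create call, failing when the
// generated name collides with an existing one.
func createService(name string, taken map[string]bool) error {
	if taken[name] {
		return errNameConflict
	}
	taken[name] = true
	return nil
}

// createWithRetry regenerates the name and retries on collision, up to
// a bounded number of attempts.
func createWithRetry(taken map[string]bool, attempts int) (string, error) {
	for i := 0; i < attempts; i++ {
		name := generateName()
		err := createService(name, taken)
		if err == nil {
			return name, nil
		}
		if !errors.Is(err, errNameConflict) {
			return "", err
		}
	}
	return "", errors.New("could not find an unused name")
}

func main() {
	taken := map[string]bool{"service_0042": true}
	name, err := createWithRetry(taken, 5)
	fmt.Println(name, err)
}
```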

mlin

comment created time in 3 months

pull request comment docker/swarmkit

Allow to predefine labels on joining nodes

This contravenes the security model for Swarm, which is not to trust the worker. With this change, a worker node could join and label itself something like com.your.company.sensitiveworkloads=true.

There are actually two sets of labels in play for nodes. These labels, the Node labels, are not supposed to be reported by the Worker. However, there are also Engine labels, which are labels set on the Docker Engine itself. These are reported by the Worker itself.

Now, as I understand it, the UI for Engine labels leaves something to be desired, so perhaps a different approach would be to improve the UI for setting Engine labels. This would involve work in the engine (moby/moby) and in the CLI (docker/cli), but not in swarmkit.

I would not merge this PR.

jlevesy

comment created time in 3 months
