Drew Erny (dperny), @Mirantis, Tuscaloosa, AL, http://www.dperny.net. FLOSS Machine. Outside of my official work obligations, I am happy to help anyone contribute to the various projects I work on in any way I can! Get in touch!

docker/classicswarm 5820

Swarm Classic: a container clustering system. Not to be confused with Docker Swarm, which is at https://github.com/docker/swarmkit

docker/swarmkit 2457

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

docker/leadership 142

Distributed Leader Election using docker/libkv

docker/swarm-library-image 30

Official Image packaging for Classic Swarm, now archived

dperny/cs403-assignments 2

Programming assignments from CS 403 at The University of Alabama with Dr. John Lusth

dperny/100pUtils 0

A collection of short programs, scripts, and libraries for common 100p operations.

dperny/binarytree 0

A Python3 class for a binary tree

dperny/caboose-cms 0

Ruby on rails content management system

push event dperny/swarmkit-1

Drew Erny

commit sha 178f735f0719ec4751dbbcce2d00a3c933747b49

Add code for preparing to publish volumes to nodes Publishing volumes is now a two-step process. First, the Scheduler updates the Volume object to PENDING_PUBLISH, which indicates that the volume should be published, but that the call hasn't verifiably succeeded yet. Then, the CSI Manager calls the ControllerPublishVolume RPC, and updates the volume object again to PUBLISHED, indicating that the call has succeeded. This makes sense because the Scheduler has knowledge of when and why a volume is in use. This change includes fairly substantial breaking changes to the protocol buffers, but this is acceptable because this code has not yet been released. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 3 days

push event dperny/swarmkit-1

Drew Erny

commit sha 64ca5ccc670adb215e047fa95e39cc5e25fcd65e

Add scheduler volumes tests Adds ginkgo tests for the integration between volumes and the scheduler. Signed-off-by: Drew Erny <derny@mirantis.com>

Drew Erny

commit sha edf390e6d3a12e0471654ce182d59192db4d3522

Merge pull request #2971 from dperny/swarm-volume-scheduler [feature-volumes] Add volume scheduler integration

Drew Erny

commit sha bc339c63249ce095736dd6dfd45a8a54dd33511b

Add code for preparing to publish volumes to nodes Publishing volumes is now a two-step process. First, the Scheduler updates the Volume object to PENDING_PUBLISH, which indicates that the volume should be published, but that the call hasn't verifiably succeeded yet. Then, the CSI Manager calls the ControllerPublishVolume RPC, and updates the volume object again to PUBLISHED, indicating that the call has succeeded. This makes sense because the Scheduler has knowledge of when and why a volume is in use. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 3 days

pull request comment docker/swarmkit

Verify network allocation on creating network

Additionally, waiting for the network allocation to succeed or fail could possibly take a long time, and in the meantime, the request will be hanging open. For example, if the network uses a third-party IPAM driver, and the driver is slow to respond, then we may be stuck waiting.

xinfengliu

comment created time in 3 days

pull request comment docker/swarmkit

Verify network allocation on creating network

I can't merge this. The design isn't OK.

Creating a Network in swarmkit is like creating a Service or a Secret. The user is providing a desired state, which swarmkit will later attempt to fulfill. This is an operation that cannot fail. Even if the user's desired state is invalid, it's not swarmkit's place to delete that object. This is, admittedly, made difficult by the fact that Networks cannot be updated. However, it is not the case that this operation could never succeed. If the user deletes the first network, then the second network may be valid and pass allocation.

While the behavior here may feel odd, it isn't incorrect. What problems is this behavior causing in practice?

xinfengliu

comment created time in 3 days

push event docker/swarmkit

Drew Erny

commit sha 7f8357de91eb50c6e3b928801c1fd1015f842b34

Add node inventory tracking to CSI manager Adds code to keep track of node ID mappings to the CSI manager and the csi Plugin interface. This will allow us to use the Plugin interface solely in terms of the swarmkit node ID. Signed-off-by: Drew Erny <derny@mirantis.com>
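A minimal sketch of what such a node ID mapping might look like (hypothetical names and structure, not the actual swarmkit code):

package sketch

import "sync"

// nodeInventory translates between swarmkit node IDs and the node IDs a CSI
// plugin reports, so that callers of the Plugin interface can work purely in
// terms of swarmkit node IDs.
type nodeInventory struct {
    mu         sync.RWMutex
    swarmToCSI map[string]string // swarmkit node ID -> CSI node ID
    csiToSwarm map[string]string // CSI node ID -> swarmkit node ID
}

func (ni *nodeInventory) addNode(swarmID, csiID string) {
    ni.mu.Lock()
    defer ni.mu.Unlock()
    ni.swarmToCSI[swarmID] = csiID
    ni.csiToSwarm[csiID] = swarmID
}

func (ni *nodeInventory) csiNodeID(swarmID string) string {
    ni.mu.RLock()
    defer ni.mu.RUnlock()
    return ni.swarmToCSI[swarmID] // empty string if unknown
}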

Drew Erny

commit sha 551cfa36c459c29a471bd834b0aa82d882f0a6d8

Add code for determining if a volume is available Adds code to the csi volume Manager, which determines if a volume is available on a given node or not. Signed-off-by: Drew Erny <derny@mirantis.com>

Drew Erny

commit sha 39cbdb4a36f5e1ed13d96bdbd735ac968ea78183

Add volumeSet to scheduler Adds the volumeSet object to the scheduler. This object keeps track of volumes available on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

Drew Erny

commit sha 64ca5ccc670adb215e047fa95e39cc5e25fcd65e

Add scheduler volumes tests Adds ginkgo tests for the integration between volumes and the scheduler. Signed-off-by: Drew Erny <derny@mirantis.com>

Drew Erny

commit sha edf390e6d3a12e0471654ce182d59192db4d3522

Merge pull request #2971 from dperny/swarm-volume-scheduler [feature-volumes] Add volume scheduler integration

push time in 3 days

PR merged docker/swarmkit

[feature-volumes] Add volume scheduler integration

Adds code to the swarmkit Scheduler to handle volumes.

+2351 -4

2 comments

16 changed files

dperny

pr closed time in 3 days

push event dperny/swarmkit-1

Drew Erny

commit sha 64ca5ccc670adb215e047fa95e39cc5e25fcd65e

Add scheduler volumes tests Adds ginkgo tests for the integration between volumes and the scheduler. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 3 days

push event dperny/swarmkit-1

Drew Erny

commit sha 139c56dae1794528c242b85efb5a843c9c5afec3

Add code for preparing to publish volumes to nodes Publishing volumes is now a two-step process. First, the Scheduler updates the Volume object to PENDING_PUBLISH, which indicates that the volume should be published, but that the call hasn't verifiably succeeded yet. Then, the CSI Manager calls the ControllerPublishVolume RPC, and updates the volume object again to PUBLISHED, indicating that the call has succeeded. This makes sense because the Scheduler has knowledge of when and why a volume is in use. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 5 days

PR opened docker/swarmkit

[feature-volumes] Add code for calling ControllerPublishVolume

NOTE: THIS PR CONTAINS CODE FROM #2971

Creates a two-step process for publishing volumes.

Each Node that a Volume is in use on is associated with a PublishStatus.

When the Scheduler decides a volume should be used on a particular Node, it updates the Volume object with a new PublishStatus for that node, with the state of PENDING_PUBLISH. This indicates that the ControllerPublishVolume RPC should be called.

When the Volume object is updated, the CSI Manager checks whether it has volumes in the PENDING_PUBLISH state. If so, it calls the ControllerPublishVolume RPC. Once that RPC succeeds, it updates the status to PUBLISHED.

This process means that we can never end up calling ControllerPublishVolume and then "forgetting" about it. If the Volume is in PENDING_PUBLISH, then it must go to PUBLISHED at some point. If we lose leadership before the Volume object is updated to PUBLISHED, but after the RPC has succeeded, then there is no problem, because these RPCs are idempotent.

The next thing to do here is to update the Dispatcher to check the status of a Volume before creating a VolumeAssignment.
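A rough sketch of the state machine described above, with simplified field names and types that are guesses rather than swarmkit's actual protobuf-generated API:

package sketch

// PublishState mirrors the two states described in this PR.
type PublishState int

const (
    PendingPublish PublishState = iota // the Scheduler decided the volume is needed on a node
    Published                          // ControllerPublishVolume has succeeded for that node
)

// PublishStatus associates one node with the volume's publish state there.
type PublishStatus struct {
    NodeID string
    State  PublishState
}

type Volume struct {
    ID            string
    PublishStatus []*PublishStatus
}

// The scheduler only records intent.
func markPendingPublish(v *Volume, nodeID string) {
    v.PublishStatus = append(v.PublishStatus, &PublishStatus{NodeID: nodeID, State: PendingPublish})
}

// The CSI manager acts on that intent and records completion. Because
// ControllerPublishVolume is idempotent, repeating this after a leadership
// change simply re-issues a call that already succeeded.
func publishPending(v *Volume, controllerPublish func(volumeID, nodeID string) error) error {
    for _, ps := range v.PublishStatus {
        if ps.State != PendingPublish {
            continue
        }
        if err := controllerPublish(v.ID, ps.NodeID); err != nil {
            return err
        }
        ps.State = Published
    }
    return nil
}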

+4226 -495

0 comments

19 changed files

pr created time in 6 days

push event dperny/swarmkit-1

Drew Erny

commit sha c44dffec63672e382e4c4dd1c6855bff11630c6e

Add code for preparing to publish volumes to nodes Publishing volumes is now a two-step process. First, the Scheduler updates the Volume object to PENDING_PUBLISH, which indicates that the volume should be published, but that the call hasn't verifiably succeeded yet. Then, the CSI Manager calls the ControllerPublishVolume RPC, and updates the volume object again to PUBLISHED, indicating that the call has succeeded. This makes sense because the Scheduler has knowledge of when and why a volume is in use. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 6 days

create branch dperny/swarmkit-1

branch : feature-volumes-controller-publish

created branch time in 10 days

PR opened docker/swarmkit

Add API fields for clean removal.

Removing volumes is a tricky proposition, because it is not sufficient to simply delete the volume in question. The correct removal steps must be followed to cleanly remove the volume.

First, to remove a volume from a node, the manager must know affirmatively that it is no longer in use on that node. If a volume is still in use, then it cannot be unpublished on the controller side. To solve this problem, a repeated string field is added to NodeDescription, reporting all volumes active on that node.

Second, to remove a volume from Swarm, or to update it, the volume must not be active and published anywhere. To facilitate this, volume availability states are added to the VolumeSpec. These states, analogous to NodeAvailability, control the usage of the volume.
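A hedged Go-side sketch of the two additions described here; the real change is to swarmkit's protobufs, and the field and constant names below are guesses modeled on NodeAvailability:

package sketch

// NodeDescription gains a list of volumes the node reports as active, so the
// manager knows affirmatively when a volume is no longer in use on that node.
type NodeDescription struct {
    // ... existing fields ...
    ActiveVolumes []string // IDs of volumes currently in use on this node
}

// VolumeAvailability is analogous to NodeAvailability and controls whether a
// volume may take new users, should shed existing ones, or can be removed.
type VolumeAvailability int

const (
    VolumeAvailabilityActive VolumeAvailability = iota // may be scheduled to new tasks
    VolumeAvailabilityPause                            // no new uses; existing uses remain
    VolumeAvailabilityDrain                            // unpublish everywhere so it can be updated or removed
)

type VolumeSpec struct {
    // ... existing fields ...
    Availability VolumeAvailability
}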

+753 -540

0 comments

5 changed files

pr created time in 14 days

create branch dperny/swarmkit-1

branch : feature-volumes-add-removal-apis

created branch time in 14 days

push event dperny/swarmkit-1

Drew Erny

commit sha 0976f92c122f0e4d4a998a40b0343e6152336dc5

Add scheduler volumes tests Adds ginkgo tests for the integration between volumes and the scheduler. Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

package csi

import (
    "context"
    "fmt"
    "sync"
    "time"

    "github.com/docker/swarmkit/agent/exec"
    "github.com/docker/swarmkit/api"
    "github.com/docker/swarmkit/log"
)

// volumes is a map that keeps all the currently available volumes to the agent
// mapped by volume ID.
type volumes struct {
    mu        sync.RWMutex                     // To sync map "m" and "pluginMap"
    tryMu     sync.RWMutex                     // To sync between tryAddVolume() and tryRemoveVolume()
    m         map[string]*api.VolumeAssignment // Map between VolumeID and VolumeAssignment
    pluginMap map[string]*NodePlugin           // Map between Driver Name and NodePlugin
}

const maxRetries int = 20

Instead of just reducing maxRetries, can you clamp the maximum retry time at something 5 minutes? That seems short, but any longer than that and swarm ought to just reschedule the task.
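One way to implement that suggestion, sketched here with placeholder names rather than the code in this PR, is to bound the whole retry loop with a context deadline instead of only counting attempts:

package sketch

import (
    "context"
    "time"
)

// stageWithDeadline retries the given operation with exponential backoff, but
// gives up once a total budget of 5 minutes has elapsed (or the caller's
// context is cancelled). The operation itself is a placeholder.
func stageWithDeadline(parent context.Context, stage func(context.Context) error) error {
    ctx, cancel := context.WithTimeout(parent, 5*time.Minute)
    defer cancel()

    waitFor := 100 * time.Millisecond
    for {
        if err := stage(ctx); err == nil {
            return nil
        }
        select {
        case <-ctx.Done():
            // either the caller gave up or the 5-minute budget is exhausted
            return ctx.Err()
        case <-time.After(waitFor):
        }
        if waitFor < 30*time.Second {
            waitFor <<= 1 // same bit-shift doubling used in the PR
        }
    }
}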

ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

+package csi++import (+	"context"+	"fmt"+	"sync"+	"time"++	"github.com/docker/swarmkit/agent/exec"+	"github.com/docker/swarmkit/api"+	"github.com/docker/swarmkit/log"+)++// volumes is a map that keeps all the currently available volumes to the agent+// mapped by volume ID.+type volumes struct {+	mu        sync.RWMutex                     // To sync map "m" and "pluginMap"+	tryMu     sync.RWMutex                     // To sync between tryAddVolume() and tryRemoveVolume()+	m         map[string]*api.VolumeAssignment // Map between VolumeID and VolumeAssignment+	pluginMap map[string]*NodePlugin           // Map between Driver Name and NodePlugin+}++const maxRetries int = 20++const initialBackoff = 1 * time.Millisecond++// NewManager returns a place to store volumes.+func NewManager() exec.VolumesManager {+	return &volumes{+		m:         make(map[string]*api.VolumeAssignment),+		pluginMap: make(map[string]*NodePlugin),+	}+}++// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (r *volumes) Get(volumeID string) (string, error) {+	r.mu.RLock()+	defer r.mu.RUnlock()+	if plugin, ok := r.pluginMap[volumeID]; ok {+		return plugin.GetPublishedPath(volumeID)+	}+	return "", nil+}++// Add adds one or more volumes to the volume map.+func (r *volumes) Add(volumes ...api.VolumeAssignment) {+	r.mu.Lock()+	var volumeObjects []*api.VolumeAssignment+	defer r.mu.Unlock()+	ctx := context.Background()+	for _, volume := range volumes {+		v := volume.Copy()+		log.G(ctx).WithField("method", "(*volumes).Add").Debugf("Add Volume:%v", volume.VolumeID)++		r.m[volume.VolumeID] = v+		driverName := v.Driver.Name+		if _, ok := r.pluginMap[driverName]; !ok {+			// TODO - revisit NodePlugin constructor call upon deciding where it needs to initialized.+			// On deciding, will use NodeGetInfo() on plugin object accordingly to get NodeInfo and susbequently, fetch nodeID.+			// Until then, use volume ID as node ID in lazy initialzation.+			r.pluginMap[driverName] = NewNodePlugin(driverName, v.ID)+		}+		volumeObjects = append(volumeObjects, v)+	}+	r.iterateVolumes(volumeObjects, true)+}++func (r *volumes) iterateVolumes(volumeObjects []*api.VolumeAssignment, isAdd bool) {+	ctx := context.Background()+	for _, v := range volumeObjects {+		if isAdd {+			go r.tryAddVolume(ctx, v)+		} else {+			go r.tryRemoveVolume(ctx, v)+		}+	}+}++func (r *volumes) tryAddVolume(ctx context.Context, assignment *api.VolumeAssignment) {+	r.tryMu.Lock()+	defer r.tryMu.Unlock()++	r.mu.RLock()+	plugin, ok := r.pluginMap[assignment.VolumeID]+	r.mu.RUnlock()+	if !ok {+		log.G(ctx).Debugf("plugin not found for VolumeID:%v", assignment.VolumeID)+		return+	}+	if err := plugin.NodeStageVolume(ctx, assignment); err != nil {+		waitFor := initialBackoff+	retryStage:+		for i := 0; i < maxRetries; i++ {+			select {+			case <-ctx.Done():+				// selecting on ctx.Done() allows us to bail out of retrying early+				return+			case <-time.After(waitFor):+				// time.After is better than using time.Sleep, because it blocks+				// on a channel read, rather than suspending the whole+				// goroutine. That lets us do the above check on ctx.Done().+				//+				// time.After is convenient, but it has a key problem: the timer+				// is not garbage collected until the channel fires. 
this+				// shouldn't be a problem, unless the context is canceled, there+				// is a very long timer, and there are a lot of other goroutines+				// in the same situation.+				if err := plugin.NodeStageVolume(ctx, assignment); err == nil {+					break retryStage+				}+			}+			// if the exponential factor is 2, you can avoid using floats by+			// doing bit shifts. each shift left increases the number by a power+			// of 2. we can do this because Duration is ultimately int64.+			waitFor = waitFor << 1+		}+	}++	// Publish+	if err := plugin.NodePublishVolume(ctx, assignment); err != nil {+		waitFor := initialBackoff+	retryPublish:+		for i := 0; i < maxRetries; i++ {+			select {+			case <-ctx.Done():+				return+			case <-time.After(waitFor):+				if err := plugin.NodePublishVolume(ctx, assignment); err == nil {+					break retryPublish+				}+			}+			waitFor = waitFor << 1+		}+	}+}++func (r *volumes) tryRemoveVolume(ctx context.Context, assignment *api.VolumeAssignment) {

Not going to ask to address in this PR, but we need to add some code to cancel existing attempts at adding a volume if we suddenly need to remove it. So, just adding a TODO comment, like this:

// TODO(ameyag): Cancel existing tryAddVolume when we try to remove a volume
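One possible shape for that cancellation, sketched with illustrative names (not the PR's actual types): keep a per-volume cancel function so tryRemoveVolume can abort an in-flight tryAddVolume.

package sketch

import (
    "context"
    "sync"
)

// addCancels tracks a cancel function for each in-flight add attempt, keyed by
// volume ID, so a later remove can cut the retry loop short.
type addCancels struct {
    mu      sync.Mutex
    cancels map[string]context.CancelFunc
}

func (a *addCancels) startAdd(volumeID string, add func(context.Context)) {
    ctx, cancel := context.WithCancel(context.Background())
    a.mu.Lock()
    a.cancels[volumeID] = cancel
    a.mu.Unlock()
    go add(ctx) // the add attempt would select on ctx.Done() between retries
}

func (a *addCancels) cancelAdd(volumeID string) {
    a.mu.Lock()
    defer a.mu.Unlock()
    if cancel, ok := a.cancels[volumeID]; ok {
        cancel()
        delete(a.cancels, volumeID)
    }
}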
ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 func reconcileConfigs(ctx context.Context, w *worker, assignments []*api.Assignm 	return nil } +func reconcileVolumes(ctx context.Context, w *worker, assignments []*api.AssignmentChange, fullSnapshot bool) error {+	var (+		updatedVolumes []api.VolumeAssignment+		removedVolumes []string+	)+	for _, a := range assignments {+		if r := a.Assignment.GetVolume(); r != nil {+			switch a.Action {+			case api.AssignmentChange_AssignmentActionUpdate:+				updatedVolumes = append(updatedVolumes, *r)+			case api.AssignmentChange_AssignmentActionRemove:+				removedVolumes = append(removedVolumes, r.ID)+			}++		}+	}++	volumesProvider, ok := w.executor.(exec.VolumesProvider)+	if !ok {+		if len(updatedVolumes) != 0 || len(removedVolumes) != 0 {+			log.G(ctx).Warn("volumes update ignored; executor does not support volumes")+		}+		return nil+	}++	volumes := volumesProvider.Volumes()++	log.G(ctx).WithFields(logrus.Fields{+		"len(updatedVolumes)": len(updatedVolumes),+		"len(removedVolumes)": len(removedVolumes),+	}).Debug("(*worker).reconcileVolumes")++	// If this was a complete set of volumes, we're going to clear the volumes map and add all of them+	if fullSnapshot {+		volumes.Reset()

Add another TODO comment here, because when we reset volumes, we don't want to unpublish and unstage only to publish and stage again a moment later. That works for secrets and configs because the object itself contains everything you need, but for volumes it would be a mess.

ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

+package csi++import (+	"context"+	"fmt"+	"sync"+	"time"++	"github.com/docker/swarmkit/agent/exec"+	"github.com/docker/swarmkit/api"+	"github.com/docker/swarmkit/log"+)++// volumes is a map that keeps all the currently available volumes to the agent+// mapped by volume ID.+type volumes struct {+	mu        sync.RWMutex                     // To sync map "m" and "pluginMap"+	tryMu     sync.RWMutex                     // To sync between tryAddVolume() and tryRemoveVolume()+	m         map[string]*api.VolumeAssignment // Map between VolumeID and VolumeAssignment+	pluginMap map[string]*NodePlugin           // Map between Driver Name and NodePlugin+}++const maxRetries int = 20++const initialBackoff = 1 * time.Millisecond++// NewManager returns a place to store volumes.+func NewManager() exec.VolumesManager {+	return &volumes{+		m:         make(map[string]*api.VolumeAssignment),+		pluginMap: make(map[string]*NodePlugin),+	}+}++// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (r *volumes) Get(volumeID string) (string, error) {+	r.mu.RLock()+	defer r.mu.RUnlock()+	if plugin, ok := r.pluginMap[volumeID]; ok {

This won't work, because the pluginMap maps the driver name to the plugin, not the volumeID to the plugin. It's probably better to keep track of published paths in the volume manager, instead of in the plugin, so that you can directly look them up without having to resolve which plugin a given volume uses.
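A minimal sketch of that suggestion (illustrative names, not the PR's code): the manager keeps its own volume ID to published path map, so Get never has to resolve a plugin.

package sketch

import "sync"

// volumesManager records published paths directly, keyed by volume ID, instead
// of asking the plugin that owns the volume.
type volumesManager struct {
    mu             sync.RWMutex
    publishedPaths map[string]string // volume ID -> published path
}

// Get returns the published path for a volume ID, or "" if it is not published.
func (r *volumesManager) Get(volumeID string) string {
    r.mu.RLock()
    defer r.mu.RUnlock()
    return r.publishedPaths[volumeID]
}

func (r *volumesManager) setPublished(volumeID, path string) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.publishedPaths[volumeID] = path
}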

ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

+package csi++import (+	"context"+	"fmt"+	"testing"+	"time"++	"github.com/stretchr/testify/require"++	"github.com/docker/swarmkit/agent/exec"+	"github.com/docker/swarmkit/api"+	"github.com/docker/swarmkit/identity"+	"github.com/stretchr/testify/assert"+)++const iterations = 25+const interval = 100 * time.Millisecond++func NewFakeManager() *volumes {+	return &volumes{+		m:         make(map[string]*api.VolumeAssignment),+		pluginMap: make(map[string]*NodePlugin),+	}+}++func TestTaskRestrictedVolumesProvider(t *testing.T) {+	type testCase struct {+		desc          string+		volumeIDs     map[string]struct{}+		volumes       exec.VolumeGetter+		volumeID      string+		taskID        string+		volumeIDToGet string+		value         string+		expected      string+		expectedErr   string+	}++	originalvolumeID := identity.NewID()+	taskID := identity.NewID()+	taskSpecificID := fmt.Sprintf("%s.%s", originalvolumeID, taskID)

taskSpecificID is related to a feature of secrets, and does not need to be considered for volumes. And in either case, it seems like you don't have any code to handle it.

ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 package csi  import ( 	"context"+	"fmt"+	"sync"  	"google.golang.org/grpc"+	"google.golang.org/grpc/codes"+	"google.golang.org/grpc/status"  	"github.com/container-storage-interface/spec/lib/go/csi" 	"github.com/docker/swarmkit/api" )  type NodePluginInterface interface { 	NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error)+	NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error+	NodeUnstageVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodePublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodeUnpublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+}++type volumePublishStatus struct {+	// stagingPath is staging path of volume+	stagingPath string++	// isPublished keeps track if the volume is published.+	isPublished bool++	// publishedPath is published path of volume+	publishedPath string }  // plugin represents an individual CSI node plugin type NodePlugin struct {-	// Name is the name of the plugin, which is also the name used as the+	// name is the name of the plugin, which is also the name used as the 	// Driver.Name field-	Name string+	name string++	// node ID is identifier for the node.+	nodeID string++	// socket is the unix socket to connect to this plugin at.+	socket string -	// Node ID is identifier for the node.-	NodeID string+	// cc is the grpc client connection+	cc *grpc.ClientConn -	// Socket is the unix socket to connect to this plugin at.-	Socket string+	// idClient is the identity service client+	idClient csi.IdentityClient -	// CC is the grpc client connection-	CC *grpc.ClientConn+	// nodeClient is the node service client+	nodeClient csi.NodeClient -	// IDClient is the identity service client-	IDClient csi.IdentityClient+	// volumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	volumeMap map[string]*volumePublishStatus++	// mu for volumeMap+	mu sync.RWMutex+}++const TargetStagePath string = "/var/lib/docker/stage/%s"++const TargetPublishPath string = "/var/lib/docker/publish/%s"++func NewNodePlugin(name string, nodeID string) *NodePlugin {+	return &NodePlugin{+		name:      name,+		nodeID:    nodeID,+		volumeMap: make(map[string]*volumePublishStatus),+	}+} -	// NodeClient is the node service client-	NodeClient csi.NodeClient+// GetPublishedPath returns the path at which the provided volume ID is published.+// Returns an empty string if the volume does not exist.+func (np *NodePlugin) GetPublishedPath(volumeID string) (string, error) {+	if volInfo, ok := np.volumeMap[volumeID]; ok {+		if volInfo.isPublished {+			return volInfo.publishedPath, nil+		}+	}+	return "", nil }  func (np *NodePlugin) NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error) {+	np.mu.RLock()+	defer np.mu.RUnlock()

Still need to explain why this lock is acquired here. There is no access to the volumes map, which is what the lock guards.

ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 package csi  import ( 	"context"+	"fmt"+	"sync"  	"google.golang.org/grpc"+	"google.golang.org/grpc/codes"+	"google.golang.org/grpc/status"  	"github.com/container-storage-interface/spec/lib/go/csi" 	"github.com/docker/swarmkit/api" )  type NodePluginInterface interface { 	NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error)+	NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error+	NodeUnstageVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodePublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodeUnpublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+}++type volumePublishStatus struct {+	// stagingPath is staging path of volume+	stagingPath string++	// isPublished keeps track if the volume is published.+	isPublished bool++	// publishedPath is published path of volume+	publishedPath string }  // plugin represents an individual CSI node plugin type NodePlugin struct {-	// Name is the name of the plugin, which is also the name used as the+	// name is the name of the plugin, which is also the name used as the 	// Driver.Name field-	Name string+	name string++	// node ID is identifier for the node.+	nodeID string++	// socket is the unix socket to connect to this plugin at.+	socket string -	// Node ID is identifier for the node.-	NodeID string+	// cc is the grpc client connection+	cc *grpc.ClientConn -	// Socket is the unix socket to connect to this plugin at.-	Socket string+	// idClient is the identity service client+	idClient csi.IdentityClient -	// CC is the grpc client connection-	CC *grpc.ClientConn+	// nodeClient is the node service client+	nodeClient csi.NodeClient -	// IDClient is the identity service client-	IDClient csi.IdentityClient+	// volumeMap is the map from volume ID to Volume. 
Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	volumeMap map[string]*volumePublishStatus++	// mu for volumeMap+	mu sync.RWMutex+}++const TargetStagePath string = "/var/lib/docker/stage/%s"++const TargetPublishPath string = "/var/lib/docker/publish/%s"++func NewNodePlugin(name string, nodeID string) *NodePlugin {+	return &NodePlugin{+		name:      name,+		nodeID:    nodeID,+		volumeMap: make(map[string]*volumePublishStatus),+	}+} -	// NodeClient is the node service client-	NodeClient csi.NodeClient+// GetPublishedPath returns the path at which the provided volume ID is published.+// Returns an empty string if the volume does not exist.+func (np *NodePlugin) GetPublishedPath(volumeID string) (string, error) {+	if volInfo, ok := np.volumeMap[volumeID]; ok {+		if volInfo.isPublished {+			return volInfo.publishedPath, nil+		}+	}+	return "", nil }  func (np *NodePlugin) NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error) {+	np.mu.RLock()+	defer np.mu.RUnlock() 	resp := &csi.NodeGetInfoResponse{-		NodeId: np.NodeID,+		NodeId: np.nodeID, 	}  	return makeNodeInfo(resp), nil }++func (np *NodePlugin) NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID+	stagingTarget := fmt.Sprintf(TargetStagePath, volID)++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.mu.Lock()+	defer np.mu.Unlock()++	v := &volumePublishStatus{+		stagingPath: stagingTarget,+	}++	np.volumeMap[volID] = v++	return nil+}++func (np *NodePlugin) NodeUnstageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.mu.Lock()+	defer np.mu.Unlock()+	if v, ok := np.volumeMap[volID]; ok {+		if v.isPublished {+			return status.Errorf(codes.InvalidArgument, "VolumeID %s is not unpublished", volID)+		}+		delete(np.volumeMap, volID)+		return nil+	}++	return status.Errorf(codes.FailedPrecondition, "VolumeID %s is not staged", volID)+}++func (np *NodePlugin) NodePublishVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.mu.Lock()+	defer np.mu.Unlock()+	if v, ok := np.volumeMap[volID]; ok {+		v.publishedPath = fmt.Sprintf(TargetPublishPath, volID)+		v.isPublished = true+		return nil+	}++	return status.Errorf(codes.FailedPrecondition, "VolumeID %s is not staged", volID)+}++func (np *NodePlugin) NodeUnpublishVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.mu.Lock()+	defer np.mu.Unlock()+	if v, ok := np.volumeMap[volID]; ok {+		v.publishedPath = ""+		v.isPublished = false+		return nil+	}++	return status.Errorf(codes.FailedPrecondition, "VolumeID %s is not staged", volID)

s/staged/published/

ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 package csi  import ( 	"context"+	"fmt"+	"sync"  	"google.golang.org/grpc"+	"google.golang.org/grpc/codes"+	"google.golang.org/grpc/status"  	"github.com/container-storage-interface/spec/lib/go/csi" 	"github.com/docker/swarmkit/api" )  type NodePluginInterface interface { 	NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error)+	NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error+	NodeUnstageVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodePublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodeUnpublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+}++type volumePublishStatus struct {+	// stagingPath is staging path of volume+	stagingPath string++	// isPublished keeps track if the volume is published.+	isPublished bool++	// publishedPath is published path of volume+	publishedPath string }  // plugin represents an individual CSI node plugin type NodePlugin struct {-	// Name is the name of the plugin, which is also the name used as the+	// name is the name of the plugin, which is also the name used as the 	// Driver.Name field-	Name string+	name string++	// node ID is identifier for the node.+	nodeID string++	// socket is the unix socket to connect to this plugin at.+	socket string -	// Node ID is identifier for the node.-	NodeID string+	// cc is the grpc client connection+	cc *grpc.ClientConn -	// Socket is the unix socket to connect to this plugin at.-	Socket string+	// idClient is the identity service client+	idClient csi.IdentityClient -	// CC is the grpc client connection-	CC *grpc.ClientConn+	// nodeClient is the node service client+	nodeClient csi.NodeClient -	// IDClient is the identity service client-	IDClient csi.IdentityClient+	// volumeMap is the map from volume ID to Volume. 
Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	volumeMap map[string]*volumePublishStatus++	// mu for volumeMap+	mu sync.RWMutex+}++const TargetStagePath string = "/var/lib/docker/stage/%s"++const TargetPublishPath string = "/var/lib/docker/publish/%s"++func NewNodePlugin(name string, nodeID string) *NodePlugin {+	return &NodePlugin{+		name:      name,+		nodeID:    nodeID,+		volumeMap: make(map[string]*volumePublishStatus),+	}+} -	// NodeClient is the node service client-	NodeClient csi.NodeClient+// GetPublishedPath returns the path at which the provided volume ID is published.+// Returns an empty string if the volume does not exist.+func (np *NodePlugin) GetPublishedPath(volumeID string) (string, error) {+	if volInfo, ok := np.volumeMap[volumeID]; ok {+		if volInfo.isPublished {+			return volInfo.publishedPath, nil+		}+	}+	return "", nil }  func (np *NodePlugin) NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error) {+	np.mu.RLock()+	defer np.mu.RUnlock() 	resp := &csi.NodeGetInfoResponse{-		NodeId: np.NodeID,+		NodeId: np.nodeID, 	}  	return makeNodeInfo(resp), nil }++func (np *NodePlugin) NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID+	stagingTarget := fmt.Sprintf(TargetStagePath, volID)++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.mu.Lock()+	defer np.mu.Unlock()++	v := &volumePublishStatus{+		stagingPath: stagingTarget,+	}++	np.volumeMap[volID] = v++	return nil+}++func (np *NodePlugin) NodeUnstageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.mu.Lock()+	defer np.mu.Unlock()+	if v, ok := np.volumeMap[volID]; ok {+		if v.isPublished {+			return status.Errorf(codes.InvalidArgument, "VolumeID %s is not unpublished", volID)

This should also be FailedPrecondition.

ameyag

comment created time in 17 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 package csi  import ( 	"context"+	"fmt"+	"sync"  	"google.golang.org/grpc"+	"google.golang.org/grpc/codes"+	"google.golang.org/grpc/status"  	"github.com/container-storage-interface/spec/lib/go/csi" 	"github.com/docker/swarmkit/api" )  type NodePluginInterface interface { 	NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error)+	NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error+	NodeUnstageVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodePublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+	NodeUnpublishVolume(ctx context.Context, req []*api.VolumeAssignment) error+}++type volumePublishStatus struct {+	// stagingPath is staging path of volume+	stagingPath string++	// isPublished keeps track if the volume is published.+	isPublished bool++	// publishedPath is published path of volume+	publishedPath string }  // plugin represents an individual CSI node plugin type NodePlugin struct {-	// Name is the name of the plugin, which is also the name used as the+	// name is the name of the plugin, which is also the name used as the 	// Driver.Name field-	Name string+	name string++	// node ID is identifier for the node.+	nodeID string++	// socket is the unix socket to connect to this plugin at.+	socket string -	// Node ID is identifier for the node.-	NodeID string+	// cc is the grpc client connection+	cc *grpc.ClientConn -	// Socket is the unix socket to connect to this plugin at.-	Socket string+	// idClient is the identity service client+	idClient csi.IdentityClient -	// CC is the grpc client connection-	CC *grpc.ClientConn+	// nodeClient is the node service client+	nodeClient csi.NodeClient -	// IDClient is the identity service client-	IDClient csi.IdentityClient+	// volumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	volumeMap map[string]*volumePublishStatus++	// mu for volumeMap+	mu sync.RWMutex+}++const TargetStagePath string = "/var/lib/docker/stage/%s"++const TargetPublishPath string = "/var/lib/docker/publish/%s"++func NewNodePlugin(name string, nodeID string) *NodePlugin {+	return &NodePlugin{+		name:      name,+		nodeID:    nodeID,+		volumeMap: make(map[string]*volumePublishStatus),+	}+} -	// NodeClient is the node service client-	NodeClient csi.NodeClient+// GetPublishedPath returns the path at which the provided volume ID is published.+// Returns an empty string if the volume does not exist.+func (np *NodePlugin) GetPublishedPath(volumeID string) (string, error) {

Still returning an error here.

ameyag

comment created time in 17 days


push event dperny/swarmkit-1

Drew Erny

commit sha c590a9cccb87cceedd53942b20efd48bd9865a08

Add scheduler volumes tests Signed-off-by: Drew Erny <derny@mirantis.com>

push time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

+package csi++import (+	"context"+	"fmt"+	"sync"+	"time"++	"github.com/docker/swarmkit/agent/exec"+	"github.com/docker/swarmkit/api"+	"github.com/docker/swarmkit/log"+)++// volumes is a map that keeps all the currently available volumes to the agent+// mapped by volume ID.+type volumes struct {+	mu        sync.RWMutex                     // To sync between Add(), Remove() and Reset() for "m" map+	tryMu     sync.RWMutex                     // To sync between tryAddVolume() and tryRemoveVolume()+	mapMu     sync.RWMutex                     // To sync access to "pluginMap" map+	m         map[string]*api.VolumeAssignment // Map between VolumeID and VolumeAssignment+	pluginMap map[string]*NodePlugin           // Map between Driver Name and NodePlugin+}++const maxRetries int = 25++const initialBackoff = 100 * time.Millisecond++// NewManager returns a place to store volumes.+func NewManager() exec.VolumesManager {+	return &volumes{+		m:         make(map[string]*api.VolumeAssignment),+		pluginMap: make(map[string]*NodePlugin),+	}+}++// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (r *volumes) Get(volumeID string) (string, error) {+	r.mapMu.RLock()+	defer r.mapMu.RUnlock()+	if plugin, ok := r.pluginMap[volumeID]; ok {+		return plugin.GetPublisedPath(volumeID)+	}+	return "", nil+}++// Add adds one or more volumes to the volume map.+func (r *volumes) Add(volumes ...api.VolumeAssignment) {+	r.mu.Lock()+	var volumeObjects []*api.VolumeAssignment+	defer r.mu.Unlock()+	ctx := context.Background()+	for _, volume := range volumes {+		v := volume.Copy()+		log.G(ctx).WithField("method", "(*volumes).Add").Debugf("Add Volume:%v", volume.VolumeID)++		r.m[volume.VolumeID] = v+		driverName := v.Driver.Name+		r.mapMu.Lock()+		if _, ok := r.pluginMap[driverName]; !ok {+			r.pluginMap[driverName] = NewNodePlugin(driverName)+		}+		r.mapMu.Unlock()+		volumeObjects = append(volumeObjects, v)+	}+	go r.iterateVolumes(volumeObjects, true)+}++func (r *volumes) addVolumes(volumeObjects []*api.VolumeAssignment) {+	ctx := context.Background()+	for _, v := range volumeObjects {+		go r.tryAddVolume(ctx, v)+	}+}++func (r *volumes) iterateVolumes(volumeObjects []*api.VolumeAssignment, isAdd bool) {+	ctx := context.Background()+	for _, v := range volumeObjects {+		if isAdd {+			go r.tryAddVolume(ctx, v)+		} else {+			go r.tryRemoveVolume(ctx, v)+		}+	}+}++func (r *volumes) tryAddVolume(ctx context.Context, assignment *api.VolumeAssignment) {+	r.tryMu.Lock()+	defer r.tryMu.Unlock()++	r.mapMu.RLock()+	plugin, ok := r.pluginMap[assignment.VolumeID]+	r.mapMu.RUnlock()+	if !ok {+		log.G(ctx).Debugf("plugin not found for VolumeID:%v", assignment.VolumeID)+		return+	}+	if err := plugin.NodeStageVolume(ctx, assignment); err != nil {+		waitFor := initialBackoff+	retryStage:+		for i := 0; i < maxRetries; i++ {+			select {+			case <-ctx.Done():+				// selecting on ctx.Done() allows us to bail out of retrying early+				return+			case <-time.After(waitFor):+				// time.After is better than using time.Sleep, because it blocks+				// on a channel read, rather than suspending the whole+				// goroutine. That lets us do the above check on ctx.Done().+				//+				// time.After is convenient, but it has a key problem: the timer+				// is not garbage collected until the channel fires. 
this+				// shouldn't be a problem, unless the context is canceled, there+				// is a very long timer, and there are a lot of other goroutines+				// in the same situation.+				if err := plugin.NodeStageVolume(ctx, assignment); err == nil {+					break retryStage+				}+			}+			// if the exponential factor is 2, you can avoid using floats by+			// doing bit shifts. each shift left increases the number by a power+			// of 2. we can do this because Duration is ultimately int64.+			waitFor = waitFor << 1

100ms, when bit shifted 25 times, is a really, really big number. Like, 38 days big. You should cap the maximum time between retries to something like an hour.
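A short sketch of that cap (illustrative only, not the code under review): keep the bit-shift doubling, but never let the interval exceed an hour.

package sketch

import "time"

const maxBackoff = 1 * time.Hour

// nextBackoff doubles the wait interval but clamps it at maxBackoff, so 25
// doublings of 100ms can no longer reach the ~38-day range noted above.
func nextBackoff(waitFor time.Duration) time.Duration {
    next := waitFor << 1 // same bit-shift doubling as the PR
    if next <= 0 || next > maxBackoff {
        return maxBackoff
    }
    return next
}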

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

+package csi++import (+	"context"+	"fmt"+	"sync"+	"time"++	"github.com/docker/swarmkit/agent/exec"+	"github.com/docker/swarmkit/api"+	"github.com/docker/swarmkit/log"+)++// volumes is a map that keeps all the currently available volumes to the agent+// mapped by volume ID.+type volumes struct {+	mu        sync.RWMutex                     // To sync between Add(), Remove() and Reset() for "m" map+	tryMu     sync.RWMutex                     // To sync between tryAddVolume() and tryRemoveVolume()+	mapMu     sync.RWMutex                     // To sync access to "pluginMap" map

There are a lot of locks here, and several methods acquire more than one at the same time. Take extra care to ensure there can't be any deadlocks, or simplify the design and reduce the number of locks used.

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

+package csi++import (+	"context"+	"fmt"+	"sync"+	"time"++	"github.com/docker/swarmkit/agent/exec"+	"github.com/docker/swarmkit/api"+	"github.com/docker/swarmkit/log"+)++// volumes is a map that keeps all the currently available volumes to the agent+// mapped by volume ID.+type volumes struct {+	mu        sync.RWMutex                     // To sync between Add(), Remove() and Reset() for "m" map+	tryMu     sync.RWMutex                     // To sync between tryAddVolume() and tryRemoveVolume()+	mapMu     sync.RWMutex                     // To sync access to "pluginMap" map+	m         map[string]*api.VolumeAssignment // Map between VolumeID and VolumeAssignment+	pluginMap map[string]*NodePlugin           // Map between Driver Name and NodePlugin+}++const maxRetries int = 25++const initialBackoff = 100 * time.Millisecond++// NewManager returns a place to store volumes.+func NewManager() exec.VolumesManager {+	return &volumes{+		m:         make(map[string]*api.VolumeAssignment),+		pluginMap: make(map[string]*NodePlugin),+	}+}++// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (r *volumes) Get(volumeID string) (string, error) {+	r.mapMu.RLock()+	defer r.mapMu.RUnlock()+	if plugin, ok := r.pluginMap[volumeID]; ok {+		return plugin.GetPublisedPath(volumeID)+	}+	return "", nil+}++// Add adds one or more volumes to the volume map.+func (r *volumes) Add(volumes ...api.VolumeAssignment) {+	r.mu.Lock()+	var volumeObjects []*api.VolumeAssignment+	defer r.mu.Unlock()+	ctx := context.Background()+	for _, volume := range volumes {+		v := volume.Copy()+		log.G(ctx).WithField("method", "(*volumes).Add").Debugf("Add Volume:%v", volume.VolumeID)++		r.m[volume.VolumeID] = v+		driverName := v.Driver.Name+		r.mapMu.Lock()+		if _, ok := r.pluginMap[driverName]; !ok {+			r.pluginMap[driverName] = NewNodePlugin(driverName)+		}+		r.mapMu.Unlock()+		volumeObjects = append(volumeObjects, v)+	}+	go r.iterateVolumes(volumeObjects, true)+}++func (r *volumes) addVolumes(volumeObjects []*api.VolumeAssignment) {

This is dead code. It doesn't appear to be called anywhere.

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

+package csi++import (+	"context"+	"fmt"+	"sync"+	"time"++	"github.com/docker/swarmkit/agent/exec"+	"github.com/docker/swarmkit/api"+	"github.com/docker/swarmkit/log"+)++// volumes is a map that keeps all the currently available volumes to the agent+// mapped by volume ID.+type volumes struct {+	mu        sync.RWMutex                     // To sync between Add(), Remove() and Reset() for "m" map+	tryMu     sync.RWMutex                     // To sync between tryAddVolume() and tryRemoveVolume()+	mapMu     sync.RWMutex                     // To sync access to "pluginMap" map+	m         map[string]*api.VolumeAssignment // Map between VolumeID and VolumeAssignment+	pluginMap map[string]*NodePlugin           // Map between Driver Name and NodePlugin+}++const maxRetries int = 25++const initialBackoff = 100 * time.Millisecond++// NewManager returns a place to store volumes.+func NewManager() exec.VolumesManager {+	return &volumes{+		m:         make(map[string]*api.VolumeAssignment),+		pluginMap: make(map[string]*NodePlugin),+	}+}++// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (r *volumes) Get(volumeID string) (string, error) {+	r.mapMu.RLock()+	defer r.mapMu.RUnlock()+	if plugin, ok := r.pluginMap[volumeID]; ok {+		return plugin.GetPublisedPath(volumeID)+	}+	return "", nil+}++// Add adds one or more volumes to the volume map.+func (r *volumes) Add(volumes ...api.VolumeAssignment) {+	r.mu.Lock()+	var volumeObjects []*api.VolumeAssignment+	defer r.mu.Unlock()+	ctx := context.Background()+	for _, volume := range volumes {+		v := volume.Copy()+		log.G(ctx).WithField("method", "(*volumes).Add").Debugf("Add Volume:%v", volume.VolumeID)++		r.m[volume.VolumeID] = v+		driverName := v.Driver.Name+		r.mapMu.Lock()+		if _, ok := r.pluginMap[driverName]; !ok {+			r.pluginMap[driverName] = NewNodePlugin(driverName)+		}+		r.mapMu.Unlock()+		volumeObjects = append(volumeObjects, v)+	}+	go r.iterateVolumes(volumeObjects, true)

This goroutine is unnecessary, because the function it executes just spawns its own goroutines.

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex+}++const TargetPath string = "/var/lib/docker/%s"++func NewNodePlugin(name string) *NodePlugin {+	return &NodePlugin{+		Name:      name,+		VolumeMap: make(map[string]*volumePublishStatus),+	}+}++//// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (np *NodePlugin) GetPublisedPath(volumeID string) (string, error) {+	np.Lock.RLock()+	defer np.Lock.RUnlock()+	if volInfo, ok := np.VolumeMap[volumeID]; ok {+		if volInfo.isPublished {+			return volInfo.targetPath, nil+		}+		return volInfo.targetPath, fmt.Errorf("volume %s is not published", volumeID)+	}+	return "", nil }  func (np *NodePlugin) NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error) {+	np.Lock.RLock()+	defer np.Lock.RUnlock() 	resp := &csi.NodeGetInfoResponse{ 		NodeId: np.NodeID, 	}  	return makeNodeInfo(resp), nil }++func (np *NodePlugin) NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID+	stagingTarget := fmt.Sprintf(TargetPath, volID)++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.Lock.Lock()+	defer np.Lock.Unlock()++	v := &volumePublishStatus{+		targetPath: stagingTarget,+	}++	np.VolumeMap[volID] = v++	return nil+}++func (np *NodePlugin) NodeUnstageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.Lock.Lock()+	defer np.Lock.Unlock()+	if v, ok := np.VolumeMap[volID]; ok {+		if v.isPublished {+			return status.Errorf(codes.InvalidArgument, "VolumeID %s is not unpublished", volID)+		}+		delete(np.VolumeMap, volID)+		return nil+	}++	return status.Errorf(codes.NotFound, "VolumeID %s is not staged", volID)+}++func (np *NodePlugin) NodePublishVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.Lock.Lock()+	defer np.Lock.Unlock()+	if v, ok := np.VolumeMap[volID]; ok {+		if _, err := os.Stat(v.targetPath); os.IsNotExist(err) {+			os.Mkdir(v.targetPath, os.ModeDir)+		}+		v.isPublished = true+		return nil+	}++	return status.Errorf(codes.NotFound, "VolumeID %s is not staged", volID)

Two things:

  1. This error should likely be whatever is returned from the CSI plugin
  2. Since we're not actually calling the CSI plugin yet, this should return the same error one would expect from the CSI plugin. Trying to publish a volume that is not staged returns FAILED_PRECONDITION.
ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex+}++const TargetPath string = "/var/lib/docker/%s"++func NewNodePlugin(name string) *NodePlugin {+	return &NodePlugin{+		Name:      name,+		VolumeMap: make(map[string]*volumePublishStatus),+	}+}++//// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (np *NodePlugin) GetPublisedPath(volumeID string) (string, error) {+	np.Lock.RLock()+	defer np.Lock.RUnlock()+	if volInfo, ok := np.VolumeMap[volumeID]; ok {+		if volInfo.isPublished {+			return volInfo.targetPath, nil+		}+		return volInfo.targetPath, fmt.Errorf("volume %s is not published", volumeID)+	}+	return "", nil }  func (np *NodePlugin) NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error) {+	np.Lock.RLock()+	defer np.Lock.RUnlock() 	resp := &csi.NodeGetInfoResponse{ 		NodeId: np.NodeID, 	}  	return makeNodeInfo(resp), nil }++func (np *NodePlugin) NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID+	stagingTarget := fmt.Sprintf(TargetPath, volID)++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.Lock.Lock()+	defer np.Lock.Unlock()++	v := &volumePublishStatus{+		targetPath: stagingTarget,+	}++	np.VolumeMap[volID] = v++	return nil+}++func (np *NodePlugin) NodeUnstageVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.Lock.Lock()+	defer np.Lock.Unlock()+	if v, ok := np.VolumeMap[volID]; ok {+		if v.isPublished {+			return status.Errorf(codes.InvalidArgument, "VolumeID %s is not unpublished", volID)+		}+		delete(np.VolumeMap, volID)+		return nil+	}++	return status.Errorf(codes.NotFound, "VolumeID %s is not staged", volID)+}++func (np *NodePlugin) NodePublishVolume(ctx context.Context, req *api.VolumeAssignment) error {++	volID := req.VolumeID++	// Check arguments+	if len(volID) == 0 {+		return status.Error(codes.InvalidArgument, "Volume ID missing in request")+	}++	np.Lock.Lock()+	defer np.Lock.Unlock()+	if v, ok := np.VolumeMap[volID]; ok {+		if _, err := os.Stat(v.targetPath); os.IsNotExist(err) {+			os.Mkdir(v.targetPath, os.ModeDir)+		}

I don't believe you can publish to the same directory you stage to, so you'll need a different directory for publishing.

Additionally, you can remove the os import from this file. We don't need it right now, and it just creates side effects if you call this method in testing.

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex

Nit: in swarmkit, the lock for an object is usually named "mu", which helps it not conflict with the method name "Lock".

mu sync.RWMutex
ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex+}++const TargetPath string = "/var/lib/docker/%s"++func NewNodePlugin(name string) *NodePlugin {+	return &NodePlugin{+		Name:      name,+		VolumeMap: make(map[string]*volumePublishStatus),+	}+}++//// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (np *NodePlugin) GetPublisedPath(volumeID string) (string, error) {+	np.Lock.RLock()+	defer np.Lock.RUnlock()+	if volInfo, ok := np.VolumeMap[volumeID]; ok {+		if volInfo.isPublished {+			return volInfo.targetPath, nil+		}+		return volInfo.targetPath, fmt.Errorf("volume %s is not published", volumeID)+	}+	return "", nil }  func (np *NodePlugin) NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error) {+	np.Lock.RLock()+	defer np.Lock.RUnlock()

Why does this lock need to be acquired here?

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex+}++const TargetPath string = "/var/lib/docker/%s"++func NewNodePlugin(name string) *NodePlugin {+	return &NodePlugin{+		Name:      name,+		VolumeMap: make(map[string]*volumePublishStatus),+	}+}++//// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (np *NodePlugin) GetPublisedPath(volumeID string) (string, error) {

Additionally, it's probably better to have this just return empty string if the volume is not published.

func (np *NodePlugin) GetPublishedPath(volumeID string) string
ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex+}

Change all of these fields to be unexported; that is, change them to start with a lowercase letter.

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex+}++const TargetPath string = "/var/lib/docker/%s"++func NewNodePlugin(name string) *NodePlugin {+	return &NodePlugin{+		Name:      name,+		VolumeMap: make(map[string]*volumePublishStatus),+	}+}++//// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.+func (np *NodePlugin) GetPublisedPath(volumeID string) (string, error) {

Spelling error, should be GetPublishedPath.

ameyag

comment created time in 18 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

 type NodePlugin struct {  	// NodeClient is the node service client 	NodeClient csi.NodeClient++	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,+	// remove it from the map for unstage.+	// TODO: Make this map persistent if the swarm node goes down+	VolumeMap map[string]*volumePublishStatus++	// Lock for VolumeMap+	Lock sync.RWMutex+}++const TargetPath string = "/var/lib/docker/%s"++func NewNodePlugin(name string) *NodePlugin {+	return &NodePlugin{+		Name:      name,+		VolumeMap: make(map[string]*volumePublishStatus),+	}+}++//// Get returns a volume published path for the provided volume ID.  If the volume doesn't exist, returns empty string.
  • Comments should start with two slashes
  • Comments immediately preceding function definitions are godoc comments, and should start with the name of the function.
// GetPublishedPath returns the path at which the provided volume ID is published.
// Returns an empty string if the volume does not exist
ameyag

comment created time in 18 days

PullRequestReviewEvent
PullRequestReviewEvent

push event dperny/swarmkit-1

Drew Erny

commit sha 39cbdb4a36f5e1ed13d96bdbd735ac968ea78183

Add volumeSet to scheduler Adds the volumeSet object to the scheduler. This object keeps track of volumes available on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 19 days

push event dperny/swarmkit-1

Drew Erny

commit sha f32ada87fb58d3e06d5e89727ecb2f2bdb1dd4f3

Merge pull request #2969 from dperny/swarm-volume-rename-package [feature-volumes] Rename CSI volumes package

view details

Ameya Gawde

commit sha 3ae3a7a3c9766c9dc7322f7e74d3dc95433bf640

Add implementation for NodeGetInfo Signed-off-by: Ameya Gawde <agawde@mirantis.com>

view details

Ameya Gawde

commit sha a29724224e61e1fdc3da12dd2906c1adc96d243a

Add foundational test code for node agent Signed-off-by: Ameya Gawde <agawde@mirantis.com>

view details

Ameya Gawde

commit sha da5afda1be894f2c000c86095d32db946003243f

Add CSI info test for agent Signed-off-by: Ameya Gawde <agawde@mirantis.com>

view details

Drew Erny

commit sha e90f22ed95a1f86e403305cb6891095c5eeb14c5

Merge pull request #2970 from ameyag/csi-worker-node-info [feature-volumes] CSI Node Info and reporting

view details

Drew Erny

commit sha 0aedafe9f5fac654bab971158403ddb331dfbff5

Add repeated VolumeAttachment to Task object In the initial API commit, I forgot to add the repeated VolumeAttachment field to the Task object. This commit fixes that oversight. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

Drew Erny

commit sha 297c4ff29653f12ea52fbf8e121e40ab37f3bc1d

Merge pull request #2972 from dperny/swarm-volume-attachments [feature-volumes] Add repeated VolumeAttachment to Task object

view details

Drew Erny

commit sha 7f8357de91eb50c6e3b928801c1fd1015f842b34

Add node inventory tracking to CSI manager Adds code to keep track of node ID mappings to the CSI manager and the csi Plugin interface. This will allow us to use the Plugin interface solely in terms of the swarmkit node ID. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

Drew Erny

commit sha 551cfa36c459c29a471bd834b0aa82d882f0a6d8

Add code for determining if a volume is available Adds code to the csi volume Manager, which determines if a volume is available on a given node or not. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

Drew Erny

commit sha 49c14eac6ed76d32af4908c5cace5b399e96a64d

Add volumeSet to scheduler Adds the volumeSet object to the scheduler. This object keeps track of volumes available on the system. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 25 days

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

// Add adds one or more volumes to the volume map.
func (r *volumes) Add(volumes ...api.VolumeAssignment) {
	r.mu.Lock()
	var volumeObjects []*api.VolumeAssignment
	defer r.mu.Unlock()
	ctx := context.Background()
	for _, volume := range volumes {
		v := volume.Copy()
		log.G(ctx).WithField("method", "(*volumes).Add").Debugf("Add Volume:%v", volume.VolumeID)

		r.m[volume.VolumeID] = v
		volumeObjects = append(volumeObjects, v)
	}
	go r.addVolumes(volumeObjects)
}

func (r *volumes) addVolumes(volumeObjects []*api.VolumeAssignment) {

The call to Add is no longer blocking, which is good. However, if one set of volumes is being processed by addVolumes, every subsequent set is going to be blocked waiting on that first set to finish.

You need to set this up so that a failing volume can't hold up other volumes. One solution may be to do something like this:

// tryVolume is called in a goroutine
func (r *volumes) tryVolume(ctx context.Context, assignment *api.VolumeAssignment) {
	// nota bene: the Plugin object needs to contain a mutex. The CSI spec says
	// that the orchestrator is responsible for making sure there is no more
	// than one call in flight at any given time.
	if err := r.plugin.NodeStageVolume(ctx, assignment); err != nil {
		waitFor := initialBackoff
	retryStage:
		for i := 0; i < maxRetries; i++ {
			select {
			case <-ctx.Done():
				// selecting on ctx.Done() allows us to bail out of retrying early
				return
			case <-time.After(waitFor):
				// time.After is better than time.Sleep here because it gives us a
				// channel to select on, which lets us also watch ctx.Done() and
				// bail out of the retry loop early.
				//
				// time.After is convenient, but it has a key problem: the timer
				// is not garbage collected until the channel fires. this
				// shouldn't be a problem, unless the context is canceled, there
				// is a very long timer, and there are a lot of other goroutines
				// in the same situation.
				if err := r.plugin.NodeStageVolume(ctx, assignment); err == nil {
					break retryStage 
				}
			}
			// if the exponential factor is 2, you can avoid using floats by
			// doing bit shifts. each shift left increases the number by a power
			// of 2. we can do this because Duration is ultimately int64.
			waitFor = waitFor << 1
		}
	}
	
	// then, do something similar for publish
}

Each volume will be called in its own goroutine, and will not block any other volume from proceeding.
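The Add side would then launch one goroutine per assignment, roughly like this (sketch; a real implementation would plumb a cancellable context through instead of context.TODO):

// Add adds one or more volumes to the volume map and starts staging and
// publishing each one independently of the others.
func (r *volumes) Add(volumes ...api.VolumeAssignment) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, volume := range volumes {
		v := volume.Copy()
		r.m[v.VolumeID] = v
		// one goroutine per volume, so one failing volume cannot block the rest
		go r.tryVolume(context.TODO(), v)
	}
}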

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

// NewManager returns a place to store volumes.
func NewManager() exec.VolumesManager {
	return &volumes{
		m:      make(map[string]*api.VolumeAssignment),
		plugin: NewNodePlugin(),

This needs to be a collection of some type, probably a map, because the node can have many possible plugins. You need to look at the VolumeAssignment to know which plugin to use.
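For example, something like this (a sketch; the exact name of the driver field on VolumeAssignment is an assumption here):

type volumes struct {
	mu sync.RWMutex
	m  map[string]*api.VolumeAssignment

	// plugins is keyed by the CSI plugin (driver) name
	plugins map[string]*NodePlugin
}

func (r *volumes) pluginFor(assignment *api.VolumeAssignment) (*NodePlugin, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.plugins[assignment.Driver.Name]
	if !ok {
		return nil, fmt.Errorf("no node plugin registered for driver %q", assignment.Driver.Name)
	}
	return p, nil
}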

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

func (r *volumes) addVolumes(volumeObjects []*api.VolumeAssignment) {
	r.lock.Lock()
	defer r.lock.Unlock()
	ctx := context.Background()
	var failedStage []*api.VolumeAssignment
	var failedPublish []*api.VolumeAssignment
	for _, v := range volumeObjects {
		err := r.plugin.NodeStageVolume(ctx, v)
		if err == nil {
			log.G(ctx).WithField("method", "(*volumes).addVolumes").Debugf("Volume:%v staged successfully", v.VolumeID)
			errPub := r.plugin.NodePublishVolume(ctx, v)
			if errPub != nil {
				failedPublish = append(failedPublish, v)
			} else {
				log.G(ctx).WithField("method", "(*volumes).addVolumes").Debugf("Volume:%v published successfully", v.VolumeID)
			}
		} else {
			failedStage = append(failedStage, v)
		}
	}

	if len(failedStage) > 0 {
		r.retryAddVolumes(ctx, failedStage)
	}

	if len(failedPublish) > 0 {
		r.retryPublishVolume(ctx, failedPublish)
	}
}

func (r *volumes) retryAddVolumes(ctx context.Context, volumeObjects []*api.VolumeAssignment) {

There is a lot of duplication between this and addVolumes. They should be condensed into a single method.
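For instance, a single helper that both paths could call (a sketch; it deliberately collapses the separate stage/publish bookkeeping into one list):

// stageAndPublish tries to stage and then publish every assignment, returning
// the ones that failed at either step so the caller can retry them later.
func (r *volumes) stageAndPublish(ctx context.Context, assignments []*api.VolumeAssignment) []*api.VolumeAssignment {
	var failed []*api.VolumeAssignment
	for _, v := range assignments {
		if err := r.plugin.NodeStageVolume(ctx, v); err != nil {
			failed = append(failed, v)
			continue
		}
		if err := r.plugin.NodePublishVolume(ctx, v); err != nil {
			failed = append(failed, v)
		}
	}
	return failed
}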

ameyag

comment created time in a month

PullRequestReviewEvent
PullRequestReviewEvent

PR opened docker/swarmkit

Add basic dispatcher handling of Volumes

Add basic handling of volumes to the dispatcher.

Creates VolumeAssignments and Secret assignments for the volumes.

+253 -7

0 comment

3 changed files

pr created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

// taskRestrictedVolumesProvider restricts the ids to the task.
type taskRestrictedVolumesProvider struct {
	volumes           exec.VolumeGetter
	volumeIDSourceMap map[string]string //map restricted per task

OK, I see, I missed that this Get method is incorrect. It needs to pass through to the Get method on the volume manager, not return the volume source.
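That is, something along these lines (sketch):

// Get returns the published path for volumeID, but only if this task was
// actually assigned the volume.
func (sp *taskRestrictedVolumesProvider) Get(volumeID string) (string, error) {
	if _, ok := sp.volumeIDSourceMap[volumeID]; !ok {
		return "", fmt.Errorf("task not authorized to access volume %s", volumeID)
	}
	// delegate to the volume manager, which knows the real publish path
	return sp.volumes.Get(volumeID)
}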

ameyag

comment created time in a month

PullRequestReviewEvent

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

func (r *volumes) publishVolume(volumeObject *api.VolumeAssignment) {
	ctx := context.Background()
	logctx := log.WithModule(ctx, "CSI Volume Publish")

As mentioned in another spot, "module" means something different.

In other places in Swarmkit, we use a "method" log field to specify the method being called. You can see an example here:

https://github.com/docker/swarmkit/blob/d91813c48f03d585310d22a87e59ab6ba8ebecb5/agent/session.go#L185

So this log might be worth rewriting as log.WithField("method", "(*volumes).publishVolume")

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

	for _, v := range volumeObjects {
		retry := 0
		for retry < MaxRetry {
			err := r.plugin.NodeStageVolume(ctx, v)
			if err == nil {
				log.G(logctx).Debugf("Volume:%v staged successfully", v.VolumeID)
				r.publishVolume(v)
				break //break retry loop
			}
			log.G(logctx).WithError(err).Errorf("Volume stage failed for volID=%v, retrying...", v.VolumeID)
			retry++

Additionally, the CSI spec says that failures should be retried with "exponential backoff".
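That is, the wait between attempts should roughly double each time, e.g. (sketch only):

waitFor := 100 * time.Millisecond
for retry := 0; retry < MaxRetry; retry++ {
	if err := r.plugin.NodeStageVolume(ctx, v); err == nil {
		break
	}
	// in real code, prefer a select on ctx.Done() and a timer over a bare sleep
	time.Sleep(waitFor)
	waitFor *= 2 // 100ms, 200ms, 400ms, ...
}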

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

// taskRestrictedVolumesProvider restricts the ids to the task.
type taskRestrictedVolumesProvider struct {
	volumes           exec.VolumeGetter
	volumeIDSourceMap map[string]string //map restricted per task

Is the value of this map ever used? I see it's written to, but is it ever read?

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

func (r *volumes) addVolumes(volumeObjects []*api.VolumeAssignment) {
	r.lock.Lock()
	defer r.lock.Unlock()
	ctx := context.Background()
	logctx := log.WithModule(ctx, "CSI Volume Add")
	for _, v := range volumeObjects {
		retry := 0
		for retry < MaxRetry {

It's probably better to skip and come back to a volume that we can't publish successfully. Consider the case where several volume assignments for different plugins come in, and one of those plugins is inoperable. We will be stuck retrying that one volume 25 times while other, valid volumes are stuck behind it.

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

// Add adds one or more volumes to the volume map.
func (r *volumes) Add(volumes ...api.VolumeAssignment) {
	r.mu.Lock()
	var volumeObjects []*api.VolumeAssignment
	defer r.mu.Unlock()
	ctx := context.Background()
	logctx := log.WithModule(ctx, "CSI Volume Add")

In this case, your argument to WithModule would be agent/csi. The "module" is roughly speaking the component or subcomponent path.
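For example:

logctx := log.WithModule(ctx, "agent/csi")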

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

	for _, volume := range volumes {
		v := volume.Copy()

Why copy?

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

func (np *NodePlugin) NodeStageVolume(ctx context.Context, req *api.VolumeAssignment) error {

	volID := req.VolumeID
	stagingTarget := fmt.Sprint(TargetPath, volID)

So, if I'm understanding correctly, we're not making the actual RPC call, and instead at this stage just printing information?

ameyag

comment created time in a month

Pull request review comment docker/swarmkit

[feature-volumes] Handle VolumeAssignment from dispatcher

type NodePlugin struct {
	// NodeClient is the node service client
	NodeClient csi.NodeClient

	// VolumeMap is the map from volume ID to Volume. Will place a volume once it is staged,
	// remove it from the map for unstage.
	// TODO: Make this map persistent if the swarm node goes down
	VolumeMap map[string]*VolumePublishStatus

Avoid using exported fields in objects. Use a constructor instead.

ameyag

comment created time in a month

PullRequestReviewEvent
PullRequestReviewEvent

create branch dperny/swarmkit-1

branch : swarm-volume-dispatcher

created branch time in a month

push event docker/swarmkit

Drew Erny

commit sha 0aedafe9f5fac654bab971158403ddb331dfbff5

Add repeated VolumeAttachment to Task object In the initial API commit, I forgot to add the repeated VolumeAttachment field to the Task object. This commit fixes that oversight. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

Drew Erny

commit sha 297c4ff29653f12ea52fbf8e121e40ab37f3bc1d

Merge pull request #2972 from dperny/swarm-volume-attachments [feature-volumes] Add repeated VolumeAttachment to Task object

view details

push time in a month

PR merged docker/swarmkit

[feature-volumes] Add repeated VolumeAttachment to Task object

In the initial API commit, I forgot to add the repeated VolumeAttachment field to the Task object. This commit fixes that oversight.

+197 -111

2 comments

3 changed files

dperny

pr closed time in a month

PR opened docker/swarmkit

[feature-volumes] Add repeated VolumeAttachment to Task object

In the initial API commit, I forgot to add the repeated VolumeAttachment field to the Task object. This commit fixes that oversight.

+197 -111

0 comment

3 changed files

pr created time in a month

create branch dperny/swarmkit-1

branch : swarm-volume-attachments

created branch time in a month

issue closed moby/moby

Swarm Jobs Proposal

This issue is a proposal for the implementation of Swarm Jobs. It has already undergone a closed design review, and I am now bringing it to the community for your input. This proposal has been adapted from that format.

This issue is a formal feature proposal for docker/swarmkit#2852 and #23880.

Problem Statement

Swarm Jobs are a feature allowing the scheduling of different kinds of one-off Tasks, which run to completion and then exit. Jobs are intended for tasks like database migrations or periodic batch operations. Jobs either run on-demand or are scheduled to occur at specific times ("cron-jobs"), augmenting Replicated and Global services.

Background

Swarmkit is the Docker Engine’s native container orchestration platform. It provides a way to easily manage distributed services in a cluster. Swarmkit represents workloads as “Services” which spawn “Tasks”. The user inputs the desired state of the Service as a “Service Spec”, which explains what and how a Service should run. Swarm creates a Service object to represent the actual state of the Service, and then spawns Tasks based on the spec.

Currently, Swarm supports Services in two modes: Replicated and Global. Replicated Services run a certain number of Tasks of a given workload somewhere on the cluster. Global Services run one Task on each node in the cluster (subject to constraints). Both of these Service modes are designed for long-running daemon-type workloads, like web servers or databases.

Swarmkit’s model includes the concept of Task States, which track the progression and lifecycle of a given Task of a Service. When a Task exits, it enters one of several terminal states, depending on what caused it to exit. Regardless of the terminal state entered, with existing Service modes, a new Task is created to replace the Task that exited. With some irrelevant states elided, these terminal states include:

  • COMPLETE: entered when a task exits with no errors (code 0).
  • SHUTDOWN: the task is requested to exit by the orchestrator.
  • FAILED: the task exits with an error (code non-0).
  • REJECTED: the task cannot be executed on the given node.

Notably, the COMPLETE state is considered more-or-less equivalent to the other terminal states by most of the swarmkit code, and will result in a new Task being created. However, this state becomes important to the implementation of Jobs, as it allows us to differentiate between successful and unsuccessful job runs.

Goals

  • Implement support for one-off on-demand jobs
  • Implement support for scheduled, recurring “cron” jobs.
  • Allow specifying total number of desired completed jobs, and total number of simultaneously executing Tasks for a job.
  • Allow specifying a “global” job, to execute on every node.
  • Allow specifying a failure threshold, beyond which job execution will stop being attempted.
  • Allowing re-execution of a completed job.

Non-Goals

  • Provide assistance for coordinating or partitioning work between instances of a Job.
  • Implement any sort of dependency between jobs (“job A only proceeds if job B first succeeds”)
  • Garbage collection of completed jobs.

Functional Elements

Behavior

Jobs

“Jobs” is not a technical term. A Job is a kind of Service, and it follows the rules of all other Swarm Services. The most notable difference between job Services and the existing long-running Services is that the Tasks spawned for Jobs are not rescheduled when they enter the COMPLETE state. Jobs have two modes, like long-running services: Replicated and Global.

Replicated Jobs

A Replicated Job is any job for which a certain number of successful runs is expected. Replicated jobs can execute in parallel, and will be scheduled to any node meeting the placement constraints of the Service. Replicated Jobs have two main parameters: maximum concurrent tasks, and total completions.

Maximum concurrent tasks indicates the maximum number of Tasks to execute at the same time for the job. When a Replicated job is started, Swarmkit will launch Tasks up to the count of maximum concurrent tasks. Like existing Services, if placement constraints do not allow this many tasks to be run simultaneously, then the maximum number of tasks may not be the number actually running.

Total completions is the total number of Tasks desired to enter the COMPLETE state. Jobs will be fulfilled when the count of COMPLETE tasks is equal to the total completions.

When a Task completes, another Task can be scheduled to execute. This will continue until the sum of the number of running Tasks and the number of completed Tasks equals the value of total completions. This means a job will never spawn more Tasks than necessary to fulfill the desired total completions.
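Put differently, the orchestrator only starts enough new Tasks to cover the remaining completions, bounded by the concurrency limit; an illustrative calculation (not actual swarmkit code):

// toStart is how many new Tasks to create right now
remaining := totalCompletions - (running + completed) // completions not yet covered
slots := maxConcurrent - running                      // free concurrency slots
toStart := remaining
if slots < toStart {
	toStart = slots
}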

Global Jobs

Global Jobs are like Global Services; they run on every node matching placement constraints. Global Jobs are much simpler than Replicated Jobs, as they have no special parameters. They simply run on every node.

Running and Re-running Jobs

Creating a job will cause it to be immediately executed. When a job is finished, the Service object will remain until it is manually removed. Users may want to re-execute a job with the same settings many times. To simplify this, users can increment the ForceUpdate field of the job’s Task spec, which will cause the job to be re-executed as if it was new. Additionally, any other update to the job’s Spec will cause it to re-execute.
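With the Docker Go API client, that re-execution could be requested roughly as follows (a sketch; cli, ctx, and serviceID are assumed to exist, and error handling is abbreviated):

// bump ForceUpdate on the job's task template and send the spec back otherwise unchanged
service, _, err := cli.ServiceInspectWithRaw(ctx, serviceID, types.ServiceInspectOptions{})
if err != nil {
	return err
}
service.Spec.TaskTemplate.ForceUpdate++
_, err = cli.ServiceUpdate(ctx, service.ID, service.Version, service.Spec, types.ServiceUpdateOptions{})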

If a job has Tasks running when its execution is requested, those tasks will be immediately terminated, and new Tasks will be started. In the case of cron jobs, if execution of the job’s Tasks takes longer than the scheduling interval, then the Tasks for the job may never complete.

Cron Jobs

Instead of manually executing and re-executing a job, users may want to schedule a job to run periodically. While this could be accomplished with clever use of the cron utility, because of the difficulty of doing so in the distributed cluster environment, this functionality is provided natively within Swarm. Both Replicated and Global jobs can be configured with cron-like specificity to execute at particular times or intervals. Importantly, however, this does not allow for at-like one-off scheduling at a future time.

Importantly, users may try to set up a Cron Job, and then push new image versions, expecting the job to pull the new image each time. Swarmkit pins the image version by using the image digest when a service is created. This means that each execution of a Cron Job will use the exact same image. Even when Swarm is explicitly instructed not to pin images by digest, this leaves selecting the image version up to the engine, which may or may not pull the latest version of the image. Because of these difficulties, this pattern is not supported.

Handling Failure

To specify how failed Tasks are handled in Jobs, we used the TaskSpec’s existing RestartPolicy. The only difference in behavior for Jobs is that the restart condition RestartOnAny will have the same effect as RestartOnFailure. The whole goal of a Job is to run to completion, and restarting when it does so is contrary to that purpose.

Differences from Long-Running Services

Notably, Jobs do not support setting an UpdateConfig. The UpdateConfig is used to specify the parameters of a rolling update. Jobs do not need to be rolled out; updating a job will trigger its re-execution. If a Job is currently running when it is updated, all running tasks will be aborted.

REST API

The Docker REST API will be expanded to encompass the new Jobs functionality.

ServiceMode

The ServiceMode struct indicates the mode of a service. Jobs will introduce two new fields, corresponding to the new modes, to this struct.

// ServiceMode represents the mode of a service.
type ServiceMode struct {
	Replicated    *ReplicatedService `json:",omitempty"`
	Global        *GlobalService     `json:",omitempty"`
	ReplicatedJob *ReplicatedJob     `json:",omitempty"`
	GlobalJob     *GlobalJob         `json:",omitempty"`
}

ReplicatedJob

// ReplicatedJob is a type of one-off job which executes many Tasks in parallel,
// until a certain number of Tasks have succeeded.
type ReplicatedJob struct {
	// MaxConcurrent indicates the maximum number of Tasks that should be
	// executing simultaneously at any given time.
	MaxConcurrent uint64 `json:",omitempty"`
	
	// TotalCompletions sets the total number of Tasks desired to run to
	// completion. This is also the absolute maximum number of jobs that
	// will be executed in parallel. That is, if this number is smaller
	// than MaxConcurrent, only this many replicas will be run in parallel.
	TotalCompletions uint64 `json:",omitempty"`
	
	// Schedule, if non-empty, specifies the times and intervals at which
	// to re-run this job.
	Schedule *Schedule `json:",omitempty"`
}

GlobalJob

// GlobalJob is a type of one-off job which executes one Task on every
// node matching the Service's constraints.
type GlobalJob struct {
	// Schedule, if non-empty, specifies the times and intervals at which
	// to re-run this job.
	Schedule *Schedule `json:",omitempty"`
}

Schedule

The decision of whether to represent the Schedule intervals as a simple string type or as a more structured datatype is a difficult one.

If a cron-like string is used, then it becomes the responsibility of the server to parse that string. Changes to that parser would have to be made very carefully, so as not to break existing strings or change their meaning. On the other hand, new concepts, like non-standard cron characters, could be easily added.

If a structured datatype is used, then there is no concern over parsing, and the cron syntax is not baked into the datatype. The downside is that, to be fully expressive, the data may be represented inefficiently.

// Schedule indicates time intervals. It is a structured datatype which
// represents time intervals as a set of values. Each interval field
// (minutes, hours, day of month, months, day of week) is represented by a
// list of values.
//
// If day of month and day of week are both specified, then only days which
// match *both* values are valid. That is, if the 13th day of the month and
// Friday are both required, then the job will only run on Friday the 13th.
//
// An empty list for any field is invalid (though it may be assigned a valid
// meaning in the future). The equivalent to cron's "*" value is a complete
// list of all possible values for that field.
//
// NOTE: Times specified in the Schedule are _always_ in terms of the
// system timezone. If different nodes have different system timezones
// set, this may lead to odd or unexpected behavior.
type Schedule struct {
	// Minute is the minute of the hour. Valid values are 0-59 inclusive.
	Minute []int
	
	// Hour is the hour of the day. Valid values are 0-23 inclusive.
	Hour []int
	
	// DayOfMonth is the day of the month. Valid values are 1-31 inclusive.
	DayOfMonth []int
	
	// Month is the month of the year. Valid values are 1-12 inclusive.
	Month []int
	
	// DayOfWeek is the day of the week. Valid values are 0-6 inclusive.
	// 0 is Sunday and 6 is Saturday. 7 is not a valid value; you only get
	// one Sunday.
	DayOfWeek []int
	
	// Window represents the window of time in which the job can be executed.
	// Jobs will be executed, by default, as close to the specified time as
	// possible. However, in the case of outages, the Swarm may be unavailable
	// to start execution of the job at precisely the desired time. Window
	// informs Swarmkit up to how long past the desired scheduling time
	// Swarm should attempt to start the job. If empty or 0, Swarm will never
	// attempt to start the job if its desired execution time has been missed.
	Window time.Duration `json:",omitempty"`
}

JobStatus

The Service object will be amended to include a JobStatus field, which contains an object indicating when the job was last executed and, if a cron job, the computed time when the job will be next executed.

// JobStatus indicates the status of the Service, if it is a Job-mode
// service.
type JobStatus struct {
	// JobIteration is the count of how many times this job has been
	// executed, successfully or otherwise.
	JobIteration Version
	
	// LastExecution is the time that the job was last executed. If
	// the JobStatus is present on the Service, LastExecution will
	// always be present.
	LastExecution time.Time
	
	// NextExecution is the time that the job will next be executed,
	// if the job is a cron job. If the job is not a cron job, this
	// field will be empty.
	NextExecution *time.Time `json:",omitempty"`
}

Design Elaboration

The Design Elaboration contains a lower-level overview of specific details of implementation. It is not necessary to review this section in order to understand the feature or its behavior, but it may be useful for providing further detail.

Swarmkit Implementation

Task Life Cycle

The life cycle of a Task in Swarmkit involves the Task passing through several states, and being operated on by specific components concerned with specific states. The life cycle of job Tasks will be largely identical to that of service-type tasks, with minor differences.

Tasks are created by the Orchestrator. Each service mode has a corresponding Orchestrator to handle its special requirements. Jobs will likewise need a new Orchestrator to handle the ReplicatedJob and GlobalJob service modes.

These Orchestrators will share much in common with the existing replicated and global orchestrators, but will be much simpler because of the lack of need to handle the service update cases.

When a job is processed by the orchestrator, it will have a job iteration value set. This value will increase monotonically with each successive iteration. Each Task spawned as part of a job will have the job iteration value present, in order to distinguish which iteration of the Task the job belongs to. The existing SpecVersion field on Tasks will not be repurposed to store job iteration.

Scheduling Cron Jobs

In order to handle cron jobs, a new component will be needed. Both Orchestrators will share a common component, here referred to as the “cron scheduler”, which will notify them of the need to reschedule a job. During initialization, the cron scheduler will compute the last desired start time for each job. Then, it will determine if the job has been run since that time. If it has not, and the Window duration has not elapsed, then the cron scheduler will immediately signal the orchestrator to launch the job. In either case, the cron scheduler then computes the duration until the next desired run time, and creates a timer to expire at that time. When one of the timers expires, the cron scheduler will signal the orchestrator to begin the specified job.
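A minimal sketch of that timer loop, assuming a hypothetical nextRunAfter helper that evaluates a Schedule:

// runCron signals the orchestrator (via the trigger callback) every time the
// schedule fires, until the context is canceled.
func runCron(ctx context.Context, schedule *Schedule, trigger func()) {
	for {
		// nextRunAfter is hypothetical: it computes the next time matching
		// the Schedule that is strictly after the given instant.
		next := nextRunAfter(schedule, time.Now())
		timer := time.NewTimer(time.Until(next))
		select {
		case <-ctx.Done():
			timer.Stop()
			return
		case <-timer.C:
			trigger()
		}
	}
}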

The orchestrator will rely on and trust the cron scheduler to notify it of when to execute a job. It will not double-check whether execution of a cron job is needed. Correctness of the cron scheduler is highly important, as if the cron scheduler signals the orchestrator to execute the job erroneously, the job will be executed without question.

Reliability

Reliability of cron jobs is the most difficult problem in this proposal, as it is the only one that has no analogous solution for the existing service modes. The naive solution for cron jobs, to fire when the scheduling interval has occurred, creates a situation where a system outage during a scheduled cron time can cause the job to not be executed. To ameliorate this problem, the “Window” field is included as part of the Schedule type, and allows the user to specify the window in which execution of the job is valid.

At startup, the cron scheduler will compute the last desired execution time of the job, and compare this to the last known execution time of the job. If no execution has occurred since the last desired interval, and the duration specified by Window has not elapsed since that desired execution time, the cron scheduler will immediately notify the Orchestrator to start the job.
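The startup check then reduces to a comparison like this (sketch; lastDesired comes from the same schedule computation as above):

// shouldRunMissed reports whether a job whose last desired execution time was
// lastDesired, and which last actually ran at lastRun, should be started now.
func shouldRunMissed(lastDesired, lastRun time.Time, window time.Duration, now time.Time) bool {
	if !lastRun.Before(lastDesired) {
		return false // already ran at or after the desired time
	}
	// a zero window means a missed execution is simply skipped
	return window > 0 && now.Sub(lastDesired) <= window
}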

“Time” is a difficult concept in distributed systems, specifically when it comes to ordering events. However, as the whole point of cron jobs is to execute at specific times, we must unfortunately contend with it. To simplify as much as possible, the JobIteration field acts as a Lamport clock, associating each job execution with the wall-clock time at which it was started. This way, each Task in a cron job is associated not with the clock-time of the cron job execution but with the particular job iteration, and the responsibility for dealing with wall clock times lies solely in the cron scheduler.

Scalability

Job functionality, being an extension of existing Service functionality, is subject to the same scalability constraints as existing services. There are a few new scale problems to contend with, however:

  • Running cron jobs in a tight (~1 minute) interval will cause heavy loading on the cluster, as Tasks are constantly churned. There is no obvious workaround or optimization for this case, and so users should be advised to not run cron jobs with small intervals.
  • Cron jobs are scheduled with minute granularity. This means that by default, users will likely choose minute 0 or minute 1 for every job, and heavy users will likely see large cluster loading at the top of the hour, as all cron jobs will be executed simultaneously. In the future, it may be advisable to invest in functionality analogous to Jenkins’ H character, which instructs the scheduler to pick some time during the interval at random, but use the same time every time, which spreads out the start time of workloads automatically.
  • Garbage collection of jobs is excluded from this proposal, which means users will need to ensure they remove jobs that are no longer needed.

closed time in 2 months

dperny

push event dperny/swarmkit-1

Drew Erny

commit sha a39ec68ac7ee26278b484f3070bff57b1dfd4544

Add code for determining if a volume is available Adds code to the csi volume Manager, which determines if a volume is available on a given node or not. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 2 months

PR opened docker/swarmkit

[feature-volumes] Determine if a given mount can be satisfied by a volume on a given node

Adds code to determine if a mount can be satisfied by a volume on the given node. This code will be used as part of the scheduling algorithm.

+1121 -3

0 comment

8 changed files

pr created time in 2 months

create branch dperny/swarmkit-1

branch : swarm-volume-scheduler

created branch time in 2 months

PR closed docker/swarmkit

[feature-volumes] Add foundational Volumes support code

- What I did

This code is a pull request against the feature-volumes branch, and so will not be merged into master until volumes work is completed

Added a bunch of code which will form the foundation of Volumes work. This includes:

  • Added the Volumes protocol buffers (supercedes #2945)
  • Added code to support Volumes in the swarmkit object store.
  • Added Volumes controlapi implementation, which allows for the creation of volumes from the swarmkit API. This does not include deletion, which is more complicated and which will be handled at a later time.
  • Added basic test rigging, which allows for vendoring the CSI go library. This work includes:
    • Updated a substantial amount of vendoring around protocol buffers and gRPC
    • Added mostly-empty test files which serve as a proof of concept for future tests.
    • Includes some cruft-y pre-implementation scaffolding, which should largely be ignored. It will be expanded on in future PRs.

- How to test it

This includes automated tests, which should cover quite well the newly added code.

+102383 -21957

1 comment

338 changed files

dperny

pr closed time in 2 months

push event docker/swarmkit

Ameya Gawde

commit sha 3ae3a7a3c9766c9dc7322f7e74d3dc95433bf640

Add implementation for NodeGetInfo Signed-off-by: Ameya Gawde <agawde@mirantis.com>

view details

Ameya Gawde

commit sha a29724224e61e1fdc3da12dd2906c1adc96d243a

Add foundational test code for node agent Signed-off-by: Ameya Gawde <agawde@mirantis.com>

view details

Ameya Gawde

commit sha da5afda1be894f2c000c86095d32db946003243f

Add CSI info test for agent Signed-off-by: Ameya Gawde <agawde@mirantis.com>

view details

Drew Erny

commit sha e90f22ed95a1f86e403305cb6891095c5eeb14c5

Merge pull request #2970 from ameyag/csi-worker-node-info [feature-volumes] CSI Node Info and reporting

view details

push time in 2 months

PR merged docker/swarmkit

[feature-volumes] CSI Node Info and reporting
  • Adding a node plugin object
  • Implementation for NodeGetInfo and reporting NodeCSIInfo
  • Test using fake data
+211 -1

2 comments

6 changed files

ameyag

pr closed time in 2 months

pull request comment docker/swarmkit

[feature-volumes] Add foundational Volumes support code

superseded by the now-merged #2966

dperny

comment created time in 2 months

Pull request review comment docker/swarmkit

[feature-volumes] CSI Node Info and reporting

// makeGetInfoRequest makes a csi.NodeGetInfoRequest.
// Messsage is emoty in CSI specs -  https://github.com/container-storage-interface/spec/blob/v1.2.0/csi.proto#L1292-L1293
func (np *nodePlugin) makeGetInfoRequest() *csi.NodeGetInfoRequest {
	return &csi.NodeGetInfoRequest{}
}

You can skip making this its own method and just create an object directly.
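
For illustration, a minimal sketch of what that could look like, assuming a NodeGetInfo method on nodePlugin along the lines of the diff above; the api.NodeCSIInfo field names below are assumptions.

// Hypothetical sketch: build the (empty) request inline rather than via a helper.
func (np *nodePlugin) NodeGetInfo(ctx context.Context) (*api.NodeCSIInfo, error) {
	resp, err := np.nodeClient.NodeGetInfo(ctx, &csi.NodeGetInfoRequest{})
	if err != nil {
		return nil, err
	}
	// Field names are assumed; adjust to the actual api.NodeCSIInfo message.
	return &api.NodeCSIInfo{
		PluginName: np.name,
		NodeID:     resp.NodeId,
	}, nil
}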

ameyag

comment created time in 2 months

Pull request review commentdocker/swarmkit

[feature-volumes] CSI Node Info and reporting

 func (a *Agent) Publisher(ctx context.Context, subscriptionID string) (exec.LogP
 func (a *Agent) nodeDescriptionWithHostname(ctx context.Context, tlsInfo *api.NodeTLSInfo) (*api.NodeDescription, error) {
 	desc, err := a.config.Executor.Describe(ctx)
 
-	// Override hostname and TLS info
 	if desc != nil {
+
+		// Override CSI node plugin info
+		ind := 0
+		for _, plugin := range a.CSIPlugins {
+			a := csi.NewNodePlugin(plugin.NodeID, plugin.PluginName)
+			nodeCSIInfo, err := a.NodeGetInfo(ctx)
+			if err != nil {
+				return nil, err
+			}
+			desc.CSIInfo[ind] = nodeCSIInfo

This would almost certainly panic anyway (index out of range), as the slice hasn't been initialized.

ameyag

comment created time in 2 months

Pull request review commentdocker/swarmkit

[feature-volumes] CSI Node Info and reporting

 func (a *Agent) Publisher(ctx context.Context, subscriptionID string) (exec.LogP
 func (a *Agent) nodeDescriptionWithHostname(ctx context.Context, tlsInfo *api.NodeTLSInfo) (*api.NodeDescription, error) {
 	desc, err := a.config.Executor.Describe(ctx)
 
-	// Override hostname and TLS info
 	if desc != nil {
+
+		// Override CSI node plugin info
+		ind := 0
+		for _, plugin := range a.CSIPlugins {
+			a := csi.NewNodePlugin(plugin.NodeID, plugin.PluginName)
+			nodeCSIInfo, err := a.NodeGetInfo(ctx)
+			if err != nil {
+				return nil, err
+			}
+			desc.CSIInfo[ind] = nodeCSIInfo

Instead of keeping track of the index, just append to the slice, like this:

desc.CSIInfo = append(desc.CSIInfo, nodeCSIInfo)
ameyag

comment created time in 2 months

Pull request review commentdocker/swarmkit

[feature-volumes] CSI Node Info and reporting

 type Agent struct {
 	// for this node known to the agent.
 	node *api.Node
 
+	CSIPlugins []*api.NodeCSIInfo

Instead of []*api.NodeCSIInfo, make this []NodePlugin. Then, when you're writing tests, you can just fill in CSIPlugins with already-created node plugins. We'll figure out how the node plugins get initialized later.
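
A rough sketch of that shape; fakeNodePlugin here is a hypothetical test double, not something in the PR.

// On the agent, hold already-constructed plugin handles rather than raw info.
type Agent struct {
	// ... existing fields ...
	CSIPlugins []csi.NodePlugin
}

// In tests, the field can then be filled in directly with fakes, e.g.:
//
//	a := &Agent{CSIPlugins: []csi.NodePlugin{&fakeNodePlugin{}}}
//
// where fakeNodePlugin implements csi.NodePlugin with canned NodeGetInfo data.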

ameyag

comment created time in 2 months

Pull request review commentdocker/swarmkit

[feature-volumes] CSI Node Info and reporting

 func (a *Agent) Publisher(ctx context.Context, subscriptionID string) (exec.LogP
 func (a *Agent) nodeDescriptionWithHostname(ctx context.Context, tlsInfo *api.NodeTLSInfo) (*api.NodeDescription, error) {
 	desc, err := a.config.Executor.Describe(ctx)
 
-	// Override hostname and TLS info
 	if desc != nil {
+
+		// Override CSI node plugin info
+		ind := 0
+		for _, plugin := range a.CSIPlugins {
+			a := csi.NewNodePlugin(plugin.NodeID, plugin.PluginName)
+			nodeCSIInfo, err := a.NodeGetInfo(ctx)
+			if err != nil {
+				return nil, err
+			}
+			desc.CSIInfo[ind] = nodeCSIInfo

Also, this is a contradiction -- in this form, you need to have the node CSI info in order to get the node CSI info.

ameyag

comment created time in 2 months

Pull request review commentdocker/swarmkit

[feature-volumes] CSI Node Info and reporting

 func (a *Agent) Publisher(ctx context.Context, subscriptionID string) (exec.LogP
 func (a *Agent) nodeDescriptionWithHostname(ctx context.Context, tlsInfo *api.NodeTLSInfo) (*api.NodeDescription, error) {
 	desc, err := a.config.Executor.Describe(ctx)
 
-	// Override hostname and TLS info
 	if desc != nil {
+
+		// Override CSI node plugin info
+		ind := 0
+		for _, plugin := range a.CSIPlugins {
+			a := csi.NewNodePlugin(plugin.NodeID, plugin.PluginName)
+			nodeCSIInfo, err := a.NodeGetInfo(ctx)
+			if err != nil {
+				return nil, err
+			}
+			desc.CSIInfo[ind] = nodeCSIInfo

Instead of calling NewNodePlugin every time the node description is updated, store already-initialized node plugins on the agent, and then get the node info.
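
Concretely, the loop could end up looking something like this sketch, assuming the agent holds a CSIPlugins []csi.NodePlugin field as suggested above:

// Sketch: reuse plugin handles stored on the agent and append whatever they report.
for _, plugin := range a.CSIPlugins {
	nodeCSIInfo, err := plugin.NodeGetInfo(ctx)
	if err != nil {
		return nil, err
	}
	desc.CSIInfo = append(desc.CSIInfo, nodeCSIInfo)
}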

ameyag

comment created time in 2 months

create branchdperny/swarmkit-1

branch : swarm-volume-node-inventory

created branch time in 2 months

push eventdocker/swarmkit

Drew Erny

commit sha 58ba323fae9dd865bde9b3e826a1714cbe244a31

Rename CSI volumes package Renames the github.com/docker/swarmkit/manager/volumes package to github.com/docker/swarmkit/manager/csi, which more accurately reflects what the purpose of that package is. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

Drew Erny

commit sha f32ada87fb58d3e06d5e89727ecb2f2bdb1dd4f3

Merge pull request #2969 from dperny/swarm-volume-rename-package [feature-volumes] Rename CSI volumes package

view details

push time in 2 months

PR merged docker/swarmkit

[feature-volumes] Rename CSI volumes package

Renames the github.com/docker/swarmkit/manager/volumes package to github.com/docker/swarmkit/manager/csi, which more accurately reflects what the purpose of that package is.

+30 -30

1 comment

10 changed files

dperny

pr closed time in 2 months

PR opened docker/swarmkit

[feature-volumes] Rename CSI volumes package

Renames the github.com/docker/swarmkit/manager/volumes package to github.com/docker/swarmkit/manager/csi, which more accurately reflects what the purpose of that package is.

+30 -30

0 comment

10 changed files

pr created time in 2 months

create branchdperny/swarmkit-1

branch : swarm-volume-rename-package

created branch time in 2 months

PR merged docker/swarmkit

[feature-volumes] Swarm volume create

Adds code for creating CSI volumes. This includes:

  • The basic Plugin object, which manages the connection to the CSI plugin
  • The basic VolumeManager object, which manages plugins and responds to store events

This also includes lots of tests and test rigging, including fake CSI clients.

+92488 -16721

1 comment

337 changed files

dperny

pr closed time in 2 months

push eventdocker/swarmkit

Drew Erny

commit sha 2c05427332f68c2c1dba58143900e21f23d99de9

Add foundational Volumes code. * Adds the protocol buffer definitions for cluster volumes and CSI support. * Add controlapi and store support for volumes * Add CSI library, and basic test rigging. Test rigging is necessary to ensure that vndr pulls in all of the correct imports to the correct locations. * Make a substantial number of vendoring updates, in order to accomodate a newer version of protobuf required by the CSI library. * Adds a CSIConfig object to the ClusterSpec, which allows a user to specify the available plugins and the location to connect to them. This may or may not be the final API for CSI plugins, but should be adequate for initial testing. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

Drew Erny

commit sha 80c41f0e358886f51651f26969a3549e419d1eb4

Add code for creating volumes Adds code for creating CSI volumes. This includes: * The basic Plugin object, which manages the connection to the CSI plugin * The basic VolumeManager object, which manages plugins and responds to store events This also includes lots of tests and tests rigging, including fake CSI clients. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

Drew Erny

commit sha 6d3749f1794880c3612a21ee8c314637152c7d58

Merge pull request #2966 from dperny/swarm-volume-create [feature-volumes] Swarm volume create

view details

push time in 2 months

push eventdocker/swarmkit

Albin Kerouanton

commit sha ee3aa3ca4cc55ccf6c27abc6230ddfa5bbad69be

Add Ulimits to ContainerSpec Unlike Docker and docker-compose, Swarm never supported ulimits. This change introduces a new Ulimits field on the ContainerSpec type. It can be used by API clients to set desired ulimits. This is related to moby/moby#40639. Signed-off-by: Albin Kerouanton <albin@akerouanton.name>

view details

Drew Erny

commit sha d6592ddefd8a5319aadff74c558b816b1a0b2590

Merge pull request #2967 from akerouanton/ulimits Add Ulimits to ContainerSpec

view details

push time in 2 months

PR merged docker/swarmkit

Add Ulimits to ContainerSpec

- What I did

Unlike Docker and docker-compose, Swarm never supported ulimits.

This change introduces a new Ulimits field on the ContainerSpec type. It can be used by API clients to set desired ulimits.

This is related to moby/moby#40639.

- How to test it

term1$ sudo ./bin/swarmd
term2$ sudo ./bin/swarmctl service create --ulimit=nofile=100 --image debian --command /bin/bash --args="-c" --args="ulimit -n" --name test
term2$ sudo ./bin/swarmctl service logs -f test
test.1@aker❯ 100

- Description for the changelog

Add Ulimits field to the API

+565 -141

3 comments

7 changed files

akerouanton

pr closed time in 2 months

pull request commentdocker/swarmkit

Add Ulimits to ContainerSpec

This looks fine. Swarm is just plumbing this field down, and the hard work is all handled in the Engine.

Nota bene, @akerouanton: the github.com/docker/swarmkit/agent/exec/dockerapi package is only used for testing swarmkit internally, so you'll need to make identical changes to the github.com/docker/docker/daemon/cluster/executor package.
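
For illustration only, the engine-side plumbing could look roughly like the sketch below. The package location, the helper name, and the Name/Soft/Hard field names on the swarm spec's ulimit entries are assumptions mirroring the engine's go-units type, not a quote of the actual change.

package container // hypothetical location in the engine-side executor

import (
	units "github.com/docker/go-units"

	"github.com/docker/swarmkit/api"
)

// ulimitsFromSpec converts swarm-level ulimits into the engine's go-units form.
func ulimitsFromSpec(spec *api.ContainerSpec) []*units.Ulimit {
	var ulimits []*units.Ulimit
	for _, u := range spec.Ulimits {
		ulimits = append(ulimits, &units.Ulimit{
			Name: u.Name,
			Soft: u.Soft,
			Hard: u.Hard,
		})
	}
	return ulimits
}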

akerouanton

comment created time in 2 months

push eventdocker/swarmkit

Albin Kerouanton

commit sha 264af21b2ce659bf9d4f689d3156cb938b2aadfe

Fix bad comment on capability_drop field in protobuf def This is a follow-up of #2965. Signed-off-by: Albin Kerouanton <albin@akerouanton.name>

view details

Drew Erny

commit sha 293aa2e66279a930999044cbf6d0e590baac16ff

Merge pull request #2968 from akerouanton/fix-capability-drop-comment Fix bad comment on capability_drop field in protobuf def

view details

push time in 2 months

PR merged docker/swarmkit

Fix bad comment on capability_drop field in protobuf def

This is a follow-up of #2965.

+2 -2

1 comment

2 changed files

akerouanton

pr closed time in 2 months

Pull request review commentdocker/swarmkit

[feature-volumes] Swarm volume create

+package store
+
+import (
+	"strings"
+
+	"github.com/docker/swarmkit/api"
+	memdb "github.com/hashicorp/go-memdb"
+)
+
+const tableVolume = "volume"
+
+func init() {
+	register(ObjectStoreConfig{
+		Table: &memdb.TableSchema{
+			Name: tableVolume,
+			Indexes: map[string]*memdb.IndexSchema{
+				indexID: {
+					Name:    indexID,
+					Unique:  true,
+					Indexer: api.VolumeIndexerByID{},
+				},
+				indexName: {
+					Name:    indexName,
+					Unique:  true,
+					Indexer: api.VolumeIndexerByName{},
+				},
+				indexCustom: {
+					Name:         indexCustom,
+					Indexer:      api.VolumeCustomIndexer{},
+					AllowMissing: true,
+				},
+				indexVolumeGroup: {
+					Name:    indexVolumeGroup,
+					Indexer: volumeIndexerByGroup{},
+				},
+				indexDriver: {
+					Name:    indexDriver,
+					Indexer: volumeIndexerByDriver{},
+				},
+			},
+		},
+		Save: func(tx ReadTx, snapshot *api.StoreSnapshot) error {
+			var err error
+			snapshot.Volumes, err = FindVolumes(tx, All)
+			return err

Frankly, I haven't the slightest idea. As far as I'm aware, the only errors that can be returned from the FindX functions in the store are those resulting from using invalid By specifiers.
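
For context, a small hypothetical illustration of what "invalid By specifier" means here from a caller's point of view, assuming the volume table keeps the usual store conventions (ErrInvalidFindBy for specifiers the table has no index for):

// Supported specifiers (the table defines indexes for them) return results or an empty slice:
volumes, err := store.FindVolumes(tx, store.ByName("my-volume"))

// A specifier the volume table has no index for (ByReferencedSecretID is just an example)
// would be expected to fail with store.ErrInvalidFindBy rather than a storage-level error:
_, err = store.FindVolumes(tx, store.ByReferencedSecretID("some-secret"))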

dperny

comment created time in 2 months

push eventdperny/swarmkit-1

Drew Erny

commit sha 80c41f0e358886f51651f26969a3549e419d1eb4

Add code for creating volumes Adds code for creating CSI volumes. This includes: * The basic Plugin object, which manages the connection to the CSI plugin * The basic VolumeManager object, which manages plugins and responds to store events This also includes lots of tests and tests rigging, including fake CSI clients. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 2 months

push eventdperny/swarmkit-1

Drew Erny

commit sha c7a395c9bdee254013350b164798790857f3f7d5

Add code for creating volumes Adds code for creating CSI volumes. This includes: * The basic Plugin object, which manages the connection to the CSI plugin * The basic VolumeManager object, which manages plugins and responds to store events This also includes lots of tests and tests rigging, including fake CSI clients. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 2 months

pull request commentdocker/swarmkit

Replace capabilities api

merged

cpuguy83

comment created time in 2 months

push eventdocker/swarmkit

Brian Goff

commit sha 8c7b14d577ac86fc477f596de80c64305704f4e8

Revert "Add capabilities list to container specification" This reverts commit 18d91706dd7b821232140543c214b0429cb6bbc0. This is reverted to replace with a different model since a full cap list specified on clients means the client must specify the base cap spec for nodes it may not know about. Instead we intend to split this into add/drop lists.

view details

Brian Goff

commit sha 06a0d2daa776d8d8e0bcaa7b23ac039d41c53b51

Add support cap add/drop to services. This allows clients to specify capabilities to add or drop form the default capability. Signed-off-by: Brian Goff <cpuguy83@gmail.com>

view details

Drew Erny

commit sha 035d564a3686f5e348d861ec0c074ff26854c498

Merge pull request #2965 from cpuguy83/replace_capabilities_api Replace capabilities api

view details

push time in 2 months

PR merged docker/swarmkit

Replace capabilities api

After extended discussion with the moby/moby maintainers, we feel the existing API (not yet released) for supplying a capabilities list is not very useful, since it requires clients to know the full list of capabilities, which usually isn't known. So instead we fall back to a default, which is already defined on the engine; but because the client needs to specify the capabilities, the client also has its own default list.

With the old method, the full cap list is also encoded in the service spec, which means the default cannot be changed except by the client.

If the client still wants to define the full list themselves, they can use --cap-drop ALL --cap-add CAP_FOO.
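
At the spec level, that maps to add/drop lists on the ContainerSpec. A minimal, hypothetical sketch follows; the Go field names are assumed from the capability_add/capability_drop protobuf fields.

// Hypothetical sketch: drop all default capabilities, then add one back.
spec := api.ServiceSpec{
	Task: api.TaskSpec{
		Runtime: &api.TaskSpec_Container{
			Container: &api.ContainerSpec{
				Image:          "debian",
				CapabilityDrop: []string{"ALL"},
				CapabilityAdd:  []string{"CAP_FOO"},
			},
		},
	},
}
// spec can then be passed to controlapi CreateService as usual.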

  • This reverts commit 18d9170
  • Replaces 18d9170 with add/drop lists which get passed to the agent to deal with.
+336 -207

11 comments

7 changed files

cpuguy83

pr closed time in 2 months

push eventdperny/swarmkit-1

Drew Erny

commit sha b9fa9717aff98982ce4ace2408f65aeb33569ad0

Add code for creating volumes Adds code for creating CSI volumes. This includes: * The basic Plugin object, which manages the connection to the CSI plugin * The basic VolumeManager object, which manages plugins and responds to store events This also includes lots of tests and tests rigging, including fake CSI clients. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 2 months

push eventdperny/swarmkit-1

Drew Erny

commit sha d2f7badcfe9b7734cf065de92d2b580d6af27135

Add code for creating volumes Adds code for creating CSI volumes. This includes: * The basic Plugin object, which manages the connection to the CSI plugin * The basic VolumeManager object, which manages plugins and responds to store events This also includes lots of tests and tests rigging, including fake CSI clients. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 2 months

push eventdperny/swarmkit-1

Drew Erny

commit sha 888f7fdfe33223bcf6e2bf424bd0e47670ff67e7

Add code for creating volumes Adds code for creating CSI volumes. This includes: * The basic Plugin object, which manages the connection to the CSI plugin * The basic VolumeManager object, which manages plugins and responds to store events This also includes lots of tests and tests rigging, including fake CSI clients. Signed-off-by: Drew Erny <derny@mirantis.com>

view details

push time in 2 months

more