Nathan Williams (nathanlws), San Francisco, CA

nuclearsugar/VectorRecursionWorkbench 13

Generate recursively nested polygons as an SVG /// designed with laser cutting in mind

gjvargas/knex 0

A query builder for PostgreSQL, MySQL and SQLite3, designed to be flexible, portable, and fun to use.

Pull request review comment FoundationDB/fdb-kubernetes-operator

Ensure we never remove a coordinator

+/*
+ * foundationdb_status.go
+ *
+ * This source file is part of the FoundationDB open source project
+ *
+ * Copyright 2021 Apple Inc. and the FoundationDB project authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package internal
+
+import fdbtypes "github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta1"
+
+// GetCoordinatorsFromStatus gets the current coordinators from the status
+func GetCoordinatorsFromStatus(status *fdbtypes.FoundationDBStatus) map[string]None {
+	coordinators := make(map[string]None)
+
+	for _, pInfo := range status.Cluster.Processes {
+		for _, roleInfo := range pInfo.Roles {
+			if roleInfo.Role != string(fdbtypes.ProcessRoleCoordinator) {
+				continue
+			}
+
+			coordinators[pInfo.Address] = None{}

That makes sense and simplifies that part.

johscheuer

comment created time in 12 hours

push event FoundationDB/fdb-kubernetes-operator

Johannes Scheuermann

commit sha da5456858629f906534dd0df589c94c8333ba95f

Allow to downgrade from TLS to non-TLS (#752)

* Allow to downgrade from TLS to non-TLS

push time in 13 hours

PR merged FoundationDB/fdb-kubernetes-operator

Allow to downgrade from TLS to non-TLS [bug]

Fixes: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/751 (at least the first part)

+556 -434

1 comment

9 changed files

johscheuer

pr closed time in 13 hours

issue closed FoundationDB/fdb-kubernetes-operator

Downgrade from TLS to non-TLS is broken

Currently we can consistently destroy clusters that run TLS by disabling TLS: the operator will remove coordinators until the cluster is completely unavailable. In checkCoordinatorValidity we check whether the tls flag is set or removed: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/cbb47704674e2e9886c68dde73de109469a23d74/controllers/cluster_controller.go#L1297-L1299. This only works for the upgrade path, not the downgrade path, since the tls flag will always be present: if we provide two addresses (one with TLS and one without), the TLS address is preferred and is set as the primary address. The assumption that the addresses are set in a specific order is in that case not relevant, or actually false (https://github.com/FoundationDB/fdb-kubernetes-operator/blob/cbb47704674e2e9886c68dde73de109469a23d74/api/v1beta1/foundationdbcluster_types.go#L1738-L1758).

There are (at least) 2 things we have to fix:

  1. Check the command_line to see whether both addresses are available, instead of relying on the process address (a sketch of this check follows the example status below).
  2. Ensure that the operator doesn't remove more Pods. The cluster becomes unavailable because the recreated Pods get new IP addresses, while the operator delays the coordinator selection until all processes have a non-TLS address (which will never happen during the transition).

Here is an example process status (only the relevant fields):

"address" : "1.2.3.4:4500:tls",
"class_source" : "command_line",
"class_type" : "stateless",
"command_line" : "/usr/bin/fdbserver ... --public_address=1.2.3.4:4501,1.2.3.4:4500:tls ...",

closed time in 13 hours

johscheuer

pull request comment FoundationDB/fdb-kubernetes-operator

Allow to downgrade from TLS to non-TLS

Resolved the conflict; once all checks are successful, I'll merge it.

johscheuer

comment created time in 13 hours

Pull request review comment FoundationDB/fdb-kubernetes-operator

Allow to define the number of concurrent automatic replacements

 func chooseNewRemovals(cluster *fdbtypes.FoundationDBCluster) bool {
 		return false
 	}
+	// The maximum number of removals will be the defined number in the cluster spec
+	// minus all currently ongoing removals, e.g. process groups marked for removal but
+	// not fully excluded.
+	removalCnt := 0

Sure :)

johscheuer

comment created time in 13 hours

push event FoundationDB/fdb-kubernetes-operator

Johannes Scheuermann

commit sha 6d9cd1d6fcac0e408a28c8fea196ceb2be808913

Print message for removals in analyze (#755)

push time in 15 hours

PR merged FoundationDB/fdb-kubernetes-operator

Print message for removals in analyze

fixes: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/708
fixes: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/711

+6 -3

0 comments

2 changed files

johscheuer

pr closed time in 15 hours

issue closed FoundationDB/fdb-kubernetes-operator

kubectl analyze should not print error message with auto-fix

When we run kubectl fdb analyze --auto-fix .. we shouldn't print the error message saying the cluster has issues, since that's unexpected there; we should only print errors that happen during the auto-fix apply. Additionally, we should print a message when the user confirms the action, to show that the auto-fix action has been triggered.

closed time in 15 hours

johscheuer

issue closed FoundationDB/fdb-kubernetes-operator

kubectl analyze should also report process groups marked for removal

Currently the analyze command skips process groups that are marked for removal. Instead of skipping them, we should print them as a warning; otherwise it's confusing for the user that the cluster is not reconciled while the analyze command doesn't print any further information.

closed time in 15 hours

johscheuer

PR merged FoundationDB/fdb-kubernetes-operator

Initial design for multi dc/kc FDB clusters with the plugin

Initial design to support multi dc/kc FDB clusters with the plugin. See: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/482

+170 -2

0 comments

2 changed files

johscheuer

pr closed time in 15 hours

push event FoundationDB/fdb-kubernetes-operator

Johannes Scheuermann

commit sha 259e4a50bfc769ac962d7bfdc92daf56dac16b9c

Initial design for multi dc/kc FDB clusters with the plugin (#540)

* Initial design for multi dc/kc FDB clusters with the plugin

push time in 15 hours

push event FoundationDB/fdb-kubernetes-operator

Johannes Scheuermann

commit sha ca5bf0b5a51b6c81b7b70b05b0ae69fafc2a99a7

Ensure that the coordinator selection is deterministic (#747)

push time in 15 hours

PR merged FoundationDB/fdb-kubernetes-operator

Ensure that the coordinator selection is deterministic

Fixes: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/718

The sorting makes the coordinator selection more expensive, but I think that's worth the deterministic output, and we only select new coordinators when a coordinator is not available anyway.

+139 -24

0 comments

3 changed files

johscheuer

pr closed time in 15 hours

issue closed FoundationDB/fdb-kubernetes-operator

selectCoordinators should be deterministic

Currently selectCoordinators in https://github.com/FoundationDB/fdb-kubernetes-operator/blob/master/controllers/change_coordinators.go is not deterministic and returns different coordinators across multiple runs (even when the state is unchanged). That is not a problem at the moment, since the operator only selects new coordinators when the old ones are no longer valid. We should still change the code to always return a deterministic set of coordinators.

closed time in 15 hours

johscheuer
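
A standalone sketch of the deterministic approach that was merged above, assuming candidates are sorted by class preference with ties broken by ID; the locality type and rank values are illustrative, not the operator's exact code:

package main

import (
	"fmt"
	"sort"
)

type locality struct {
	ID    string
	Class string
}

// classRank orders process classes by coordinator preference.
var classRank = map[string]int{"storage": 0, "log": 1, "transaction": 2}

func rank(class string) int {
	if r, ok := classRank[class]; ok {
		return r
	}
	return len(classRank) // unknown classes are least preferred
}

// sortLocalities sorts by class rank, then by ID, so equal inputs always
// produce the same candidate order regardless of map iteration order.
func sortLocalities(processes []locality) {
	sort.Slice(processes, func(i, j int) bool {
		ri, rj := rank(processes[i].Class), rank(processes[j].Class)
		if ri != rj {
			return ri < rj
		}
		return processes[i].ID < processes[j].ID
	})
}

func main() {
	procs := []locality{{"c", "log"}, {"a", "storage"}, {"b", "storage"}}
	sortLocalities(procs)
	fmt.Println(procs) // [{a storage} {b storage} {c log}]
}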

Pull request review comment FoundationDB/fdb-kubernetes-operator

Allow to define the number of concurrent automatic replacements

 func chooseNewRemovals(cluster *fdbtypes.FoundationDBCluster) bool {
 		return false
 	}
+	// The maximum number of removals will be the defined number in the cluster spec
+	// minus all currently ongoing removals, e.g. process groups marked for removal but
+	// not fully excluded.
+	removalCnt := 0

Can we change this to removalCount? My brain consistently parses the word cnt as expanding to something else, so I would rather we spell it out 😊

johscheuer

comment created time in 3 days

push event FoundationDB/fdb-document-layer

netop://ウエハ

commit sha 421b2ca3f2140da40f4e89f7ea59d2a347818008

Updates configuration link

404 occurs otherwise

John Brownlee

commit sha c94f55fdc21444d75831a0e39549b08470d09a30

Merge pull request #224 from NetOperatorWibby/patch-1

Updates configuration link

push time in 3 days

Pull request review comment FoundationDB/fdb-kubernetes-operator

Ensure we never remove a coordinator

+/*
+ * foundationdb_status.go
+ *
+ * This source file is part of the FoundationDB open source project
+ *
+ * Copyright 2021 Apple Inc. and the FoundationDB project authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package internal
+
+import fdbtypes "github.com/FoundationDB/fdb-kubernetes-operator/api/v1beta1"
+
+// GetCoordinatorsFromStatus gets the current coordinators from the status
+func GetCoordinatorsFromStatus(status *fdbtypes.FoundationDBStatus) map[string]None {
+	coordinators := make(map[string]None)
+
+	for _, pInfo := range status.Cluster.Processes {
+		for _, roleInfo := range pInfo.Roles {
+			if roleInfo.Role != string(fdbtypes.ProcessRoleCoordinator) {
+				continue
+			}
+
+			coordinators[pInfo.Address] = None{}

Since we've got the full process information here, I wonder if it would be better to identify the coordinators by process group ID rather than by the address.

johscheuer

comment created time in 3 days

release yshavit/whatdid

v0.0.1-alpha.1

released time in 3 days

Pull request review comment FoundationDB/fdb-kubernetes-operator

Ensure that the coordinator selection is deterministic

 type localityInfo struct {
 
 	// The locality map.
 	LocalityData map[string]string
+
+	Class fdbtypes.ProcessClass
+}
+
+// These indexes are used for sorting and since we sort ascending
+func getClassIndex(cls fdbtypes.ProcessClass) int {
+	switch cls {

You're right, since we're sorting by IP we don't have a risk of non-determinism.

johscheuer

comment created time in 4 days

PR opened FoundationDB/fdb-kubernetes-operator

Print message for removals in analyze

fixes: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/708

+6 -3

0 comments

2 changed files

pr created time in 4 days

Pull request review comment FoundationDB/fdb-kubernetes-operator

Allow to define the number of concurrent automatic replacements

 func chooseNewRemovals(cluster *fdbtypes.FoundationDBCluster) bool {
 		return false
 	}
+	// The maximum number of removals will be the defined number in the cluster spec
+	// minus all currently ongoing removals, e.g. process groups marked for removal but
+	// not fully excluded.
+	removalCnt := 0
 	for _, processGroupStatus := range cluster.Status.ProcessGroups {
 		if processGroupStatus.Remove && !processGroupStatus.Excluded {
 			// If we already have a removal in-flight, we should not try
 			// replacing more failed pods.
-			return false
+			removalCnt++
 		}
 	}
+	maxReplacements := cluster.GetMaxConcurrentReplacements() - removalCnt

+	hasReplacement := false
 	for _, processGroupStatus := range cluster.Status.ProcessGroups {
+		if maxReplacements == 0 {

Suggested change:

		if maxReplacements <= 0 {

There are cases where we have a negative number --> Add a test case for that.

johscheuer

comment created time in 4 days
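
A small runnable sketch of the replacement-budget arithmetic in the hunk above, including the suggested <= 0 guard; the function names are illustrative, not the operator's API:

package main

import "fmt"

// replacementBudget is the configured maximum minus removals already in flight
// (process groups marked for removal but not yet fully excluded).
func replacementBudget(maxConcurrent, ongoingRemovals int) int {
	return maxConcurrent - ongoingRemovals
}

// mayReplace must treat a negative budget like zero: in-flight removals can
// exceed the configured maximum, e.g. after the spec value was lowered.
func mayReplace(budget int) bool {
	return budget > 0
}

func main() {
	fmt.Println(mayReplace(replacementBudget(1, 3))) // false: budget is -2
}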

PR opened FoundationDB/fdb-document-layer

Updates configuration link

404 occurs otherwise

+3 -3

0 comments

1 changed file

pr created time in 5 days

PR opened FoundationDB/fdb-kubernetes-operator

Allow to define the number of concurrent automatic replacements

Fixes: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/604
Fixes: https://github.com/FoundationDB/fdb-kubernetes-operator/issues/695

+71 -5

0 comments

6 changed files

pr created time in 5 days

Pull request review comment FoundationDB/fdb-kubernetes-operator

Ensure we never remove a coordinator

 func (clusterStatus *FoundationDBClusterStatus) AddStorageServerPerDisk(serversP
 
 	clusterStatus.StorageServersPerDisk = append(clusterStatus.StorageServersPerDisk, serversPerDisk)
 }
+
+// None represents an empty struct used for creating set where we don't care about the value.
+type None struct{}
+
+// GetCoordinatorSet returns the current coordinator addresses as a set based on the information
+// in the connection string of the FoundationDBClusterStatus
+func (cluster FoundationDBCluster) GetCoordinatorSet() map[string]None {
+	coordinators := make(map[string]None)
+	// Split the connection string to get only the addresses after the @
+	conSplit := strings.Split(cluster.Status.ConnectionString, "@")

That sounds reasonable; my idea was to avoid having too many status calls against the FDB cluster, but I guess that's a valid case.

johscheuer

comment created time in 6 days
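
A self-contained sketch of the idea in the hunk above: deriving the coordinator set from a connection string of the standard description:ID@addr1,addr2,... form. The helper name and the plain struct{} set are illustrative:

package main

import (
	"fmt"
	"strings"
)

func coordinatorSet(connectionString string) map[string]struct{} {
	set := make(map[string]struct{})
	// Everything after the "@" is the comma-separated coordinator list.
	parts := strings.SplitN(connectionString, "@", 2)
	if len(parts) < 2 {
		return set
	}
	for _, addr := range strings.Split(parts[1], ",") {
		set[addr] = struct{}{}
	}
	return set
}

func main() {
	cs := "testCluster:abc123@1.2.3.4:4500,1.2.3.5:4500"
	for addr := range coordinatorSet(cs) {
		fmt.Println(addr)
	}
}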

Pull request review comment FoundationDB/fdb-kubernetes-operator

Ensure that the coordinator selection is deterministic

 func chooseDistributedProcesses(processes []localityInfo, count int, constraint
 		currentLimits[field] = 1
 	}
+	// Sort the processes to ensure a deterministic result
+	sortLocalities(processes)
+
 	for len(chosen) < count {
 		choseAny := false
 
 		for _, process := range processes {
-			if !chosenIDs[process.ID] {

I only reduced the nesting to make it more readable.

johscheuer

comment created time in 6 days

Pull request review comment FoundationDB/fdb-kubernetes-operator

Ensure that the coordinator selection is deterministic

 type localityInfo struct {
 
 	// The locality map.
 	LocalityData map[string]string
+
+	Class fdbtypes.ProcessClass
+}
+
+// These indexes are used for sorting and since we sort ascending
+func getClassIndex(cls fdbtypes.ProcessClass) int {
+	switch cls {

So there are two things we need to keep in mind:

1.) Currently the coordinator selection will try storage -> log -> transaction processes; all other processes are currently ignored, so in theory the default case should never happen.

2.) If the default case does happen, it only means that all processes that are not storage/log/transaction get the least preference (the highest index), and all processes in that "bucket" are still sorted by their ID, so the behaviour is deterministic.

johscheuer

comment created time in 6 days
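
A runnable sketch of how the getClassIndex switch from the hunk above might continue, given the ordering described in the comment; the local ProcessClass type and the concrete return values are illustrative stand-ins for the operator's fdbtypes:

package main

import "fmt"

type ProcessClass string

// getClassIndex ranks classes for ascending sort: storage first, then log,
// then transaction; everything else shares the least-preferred bucket.
func getClassIndex(cls ProcessClass) int {
	switch cls {
	case "storage":
		return 0
	case "log":
		return 1
	case "transaction":
		return 2
	default:
		// Within this bucket processes are still sorted by their ID, so the
		// overall ordering remains deterministic.
		return 3
	}
}

func main() {
	fmt.Println(getClassIndex("storage"), getClassIndex("proxy")) // 0 3
}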

issue comment FoundationDB/fdb-kubernetes-operator

Use resolver in process counts leads to an error

That's a fair point. I'll create a PR with the deprecation message and the warning that the setting doesn't work.

johscheuer

comment created time in 6 days

Pull request review comment FoundationDB/fdb-kubernetes-operator

Ensure we never remove a coordinator

 func (clusterStatus *FoundationDBClusterStatus) AddStorageServerPerDisk(serversP
 
 	clusterStatus.StorageServersPerDisk = append(clusterStatus.StorageServersPerDisk, serversPerDisk)
 }
+
+// None represents an empty struct used for creating set where we don't care about the value.
+type None struct{}
+
+// GetCoordinatorSet returns the current coordinator addresses as a set based on the information
+// in the connection string of the FoundationDBClusterStatus
+func (cluster FoundationDBCluster) GetCoordinatorSet() map[string]None {
+	coordinators := make(map[string]None)
+	// Split the connection string to get only the addresses after the @
+	conSplit := strings.Split(cluster.Status.ConnectionString, "@")

We could also look at the process roles so we can check this by instance ID instead of using the address as an indirection. That may be more future-proof.

johscheuer

comment created time in 6 days