Brennan Saeta (saeta), Google. Tech lead & manager for the Swift for TensorFlow (S4TF) team in Google Brain.

apple/swift 51789

The Swift Programming Language

fastai/fastai_dev 549

fast.ai early development experiments

google/swift-benchmark 478

A Swift library to benchmark code snippets.

fastai/swiftai 393

Swift for TensorFlow's high-level API, modeled after fastai

apple/llvm-project 194

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. This fork is used to manage Apple’s stable releases of Clang as well as support the Swift project.

google-research/swift-tfp 129

Find shape errors before you run your code!

saeta/penguin 58

A suite of libraries for data science & high performance computation in Swift

saeta/play-deb-packaging 25

A source snapshot of a sample Play! app with SBT packaging.

saeta/CS-249A-Reimplementation-In-Scala 10

I am simply reimplementing an 8,000-line C++ project for CS 249A @ Stanford in Scala, to learn about Scala and to experiment with ideas in software engineering.

issue comment deepmind/open_spiel

Breakthrough Swift implementation is slow

Thanks @sbodenstein for noticing and inquiring about this! I've put together https://github.com/deepmind/open_spiel/pull/230 which helps a bit. I suspect there's a fair bit more room to improve Breakthrough's performance.

Please do let me know if you'd be interested in further optimizations!

sbodenstein

comment created time in 4 hours

PR opened deepmind/open_spiel

Improve Swift Breakthrough Performance

This PR defines some benchmarks for Swift OpenSpiel, and modestly improves Swift's Breakthrough performance.
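For context, a benchmark in google/swift-benchmark is declared roughly like this; `playRandomBreakthroughGame` here is a hypothetical stand-in for whatever rollout helper the PR actually uses:

```swift
import Benchmark  // from google/swift-benchmark

// Hypothetical stand-in for a random Breakthrough rollout.
func playRandomBreakthroughGame() {
  var movesRemaining = 64
  while movesRemaining > 0, Int.random(in: 0..<100) != 0 {
    movesRemaining -= 1
  }
}

// Registers a benchmark under the name shown in the result tables below.
benchmark("random game: Breakthrough") {
  playRandomBreakthroughGame()
}

// Runs all registered benchmarks and prints a name/time/std/iterations table.
Benchmark.main()
```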

On my machine, performance improves by ~1.65x based on the benchmarks:

Before:

name                      time        std        iterations
-----------------------------------------------------------
random game: Breakthrough 173380.0 ns ±  24.30 %       7276

After:

name                      time        std        iterations
-----------------------------------------------------------
random game: Breakthrough 104894.5 ns ±  24.74 %      13624

I suspect further optimizations are quite possible. In particular, I suspect flattening State.board from Array<Optional<BreakthroughPlayer>> to Array<FlattenedEnum> might help, but I have not tried that yet.
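The flattening idea can be sketched as follows; `BreakthroughPlayer` and `FlattenedCell` here are illustrative stand-ins rather than the actual OpenSpiel types:

```swift
// Hypothetical sketch; the actual OpenSpiel types may differ.
enum BreakthroughPlayer { case white, black }

// Instead of Array<Optional<BreakthroughPlayer>>, fold the "empty" case
// into a single enum so each cell is one small payload-free value.
enum FlattenedCell {
  case empty
  case white
  case black

  init(_ player: BreakthroughPlayer?) {
    switch player {
    case nil: self = .empty
    case .white?: self = .white
    case .black?: self = .black
    }
  }
}

let board: [FlattenedCell] = [FlattenedCell(nil), FlattenedCell(.white), FlattenedCell(.black)]
```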

Issue: https://github.com/deepmind/open_spiel/issues/228

+110 -0

0 comment

6 changed files

pr created time in 4 hours

create branch saeta/open_spiel

branch : improve-breakout-perf

created branch time in 5 hours

pull request comment saeta/penguin

Remove the `GraphDistanceMeasure` protocol, as it is no longer needed.

Note the base branch! This is a follow-on to https://github.com/saeta/penguin/pull/45 and so for now, I've pointed the base branch to redo-heaps. But once https://github.com/saeta/penguin/pull/45 goes in, I'll merge master into this branch, and switch the PR to base off of master.

saeta

comment created time in 12 hours

PR opened saeta/penguin

Reviewers
Remove the `GraphDistanceMeasure` protocol, as it is no longer needed.

This is a follow-on to https://github.com/saeta/penguin/pull/45.

+103 -59

0 comment

4 changed files

pr created time in 12 hours

create branch saeta/penguin

branch : remove-graphdistancemeasure

created branch time in 12 hours

push event saeta/penguin

Brennan Saeta

commit sha ec71a0162b5eb5c7e5d2d60f4b00b7aff77a169f

Update comments & simplify API for Dijkstra's search.

view details

push time in 14 hours

push event saeta/penguin

Brennan Saeta

commit sha f0a1c6863dd9ac69d923a62559636bc29b29c146

Add tests & fix-up existing tests to work with the new API.

view details

push time in 15 hours

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+/// Returns an array of all positive integers that are co-prime with `n`.
+///
+/// Two numbers are co-prime if their GCD is 1.
+internal func makeCoprimes(upTo n: Int) -> [Int] {
+  var coprimes = [Int]()
+  for i in 1...n {
+    var a = i
+    var b = n
+    // If GCD(a, b) == 1, then a and b are coprimes.
+    while b != 0 {
+      let tmp = a
+      a = b
+      b = tmp % b
+    }
+    if a == 1 { coprimes.append(i) }
+  }
+  return coprimes
+}
+
+/// Reduce `lhs` into `[0, size)`.

Oh, apologies for missing that, and for only seeing your comment now; https://github.com/saeta/penguin/pull/46 for the fix (LMK if I missed something else)!

For context, I often choose to re-type the suggestions out. I find it helps me internalize them and think about them more deeply. I will try and be more careful about not accidentally truncating them in the future!

saeta

comment created time in a day

PR opened saeta/penguin

Reviewers
Add truncated doc comment.

Thanks for catching this @dabrahams!

+6 -3

0 comment

1 changed file

pr created time in a day

create branch saeta/penguin

branch : improve-doc-comment

created branch time in a day

push event saeta/penguin

Brennan Saeta

commit sha 1fe092ac00a5691d8f5892df83f12ac2884393c8

Rewrite the heap and related priority queue structures.

This change moves the heap related algorithms to operate generically on `Collection` (and its various refinements). In doing so, the algorithms become far more general and reusable.

Further, we re-build the PriorityQueue type, solving the indexing TODO's that plagued the previous implementation.

Finally, we update all the rest of the code to use the new API.

view details

push time in a day

PR opened saeta/penguin

Reviewers
Rewrite the heap and related priority queue structures.

This change moves the heap related algorithms to operate generically on Collection (and its various refinements). In doing so, the algorithms become far more general and reusable.

Further, we re-build the PriorityQueue type, solving the indexing TODO's that plagued the previous implementation.

Finally, we update all the rest of the code to use the new API.
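As a rough illustration of what "operating generically on Collection" buys (a sketch under assumed names, not the penguin implementation), a binary-heap sift-down can be written once over any random-access `MutableCollection` and then reused by array-backed queues, slices, and so on:

```swift
extension MutableCollection where Self: RandomAccessCollection {
  /// Restores the max-heap property at `i`, assuming both subtrees of `i`
  /// are already heaps. `areInIncreasingOrder` is a strict-weak-ordering
  /// comparator, as in the standard library's `sort(by:)`.
  mutating func siftDown(
    from i: Index,
    by areInIncreasingOrder: (Element, Element) -> Bool
  ) {
    var i = i
    while true {
      let offset = distance(from: startIndex, to: i)
      let leftOffset = 2 * offset + 1
      guard leftOffset < count else { return }
      var largest = index(startIndex, offsetBy: leftOffset)
      let rightOffset = leftOffset + 1
      if rightOffset < count {
        let right = index(startIndex, offsetBy: rightOffset)
        if areInIncreasingOrder(self[largest], self[right]) { largest = right }
      }
      guard areInIncreasingOrder(self[i], self[largest]) else { return }
      swapAt(i, largest)
      i = largest
    }
  }
}

var xs = [1, 9, 8, 4, 5]
xs.siftDown(from: xs.startIndex, by: <)  // 1 sinks; 9 becomes the root
```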

+447 -266

0 comment

6 changed files

pr created time in a day

create branch saeta/penguin

branch : redo-heaps

created branch time in a day

delete branch google/swift-benchmark

delete branch : remove-warning

delete time in a day

pull request comment google/swift-benchmark

Silence warning.

hehe; no worries! (@dabrahams fixed up one of my warnings a while back, so I can now pay it forward.)

saeta

comment created time in a day

PR opened google/swift-benchmark

Silence warning.

Previously, a warning was emitted when compiling in non-release builds. This change fixes the warning.

+2 -1

0 comment

1 changed file

pr created time in 2 days

create branch google/swift-benchmark

branch : remove-warning

created branch time in 2 days

push event saeta/penguin

Brennan Saeta

commit sha 64d233521e0334c88a59e2aa27c4de1658a1a5a4

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co… (#32)

* ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/confusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

1. Removes the Naive thread pool implementation from `PJoin`.
2. Removes the unnecessary `TypedComputeThreadPool` protocol refinement.
3. Removes the badly implemented extensions that implemented `parallelFor` in terms of `join`. (Note: `NonBlockingThreadPool`'s `parallelFor` is re-implemented in terms of `join` again, but better.)
4. Removes use of `rethrows`, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case. Instead, methods are explicitly overloaded with throwing and non-throwing variants.
5. Adds a vectorized API (which improves performance).

Performance measurements:

After:

name                                                                   time         std                  iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225  127457
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377  31115
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306  15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088 13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468 7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036  1390

Before:

name                                                                   time         std                  iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 728.0 ns     ± 78662.43173968921  115554
NonBlockingThreadPool: join, two levels                                2149.0 ns    ± 144611.11773139169 30425
NonBlockingThreadPool: join, three levels                              5049.0 ns    ± 188450.6773907647  15157
NonBlockingThreadPool: join, four levels, three on thread pool thread  5951.0 ns    ± 229270.51587738466 10255
NonBlockingThreadPool: parallel for, one level                         4919427.5 ns ± 887590.5386061076  302
NonBlockingThreadPool: parallel for, two levels                        4327151.0 ns ± 855302.611386676   313

Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

Dave Abrahams

commit sha e43e285bbbd8e2960b5c2645905c9c92a8a53294

Tuples (#28) Thanks to @saeta for suggestions and fixes. Co-authored-by: Brennan Saeta <brennan.saeta@gmail.com>

view details

Brennan Saeta

commit sha b5ab4b60429b74ac0bd1891ade80ac1755172ed2

Alphabetize test suites. (#38) * Alphabetize test suites. Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

Brennan Saeta

commit sha b450bc7f0d4a513d0a7140ccb17dacfade4285d5

Merge branch 'master' into unmanaged-buffer

view details

push time in 2 days

pull request comment saeta/penguin

Ensure grain size is >= 1 & reduce duplication in `parallelFor`.

https://github.com/saeta/penguin/issues/33

saeta

comment created time in 2 days

pull request comment saeta/penguin

Copies references to data structures to avoid extra ARC traffic.

https://github.com/saeta/penguin/issues/33

saeta

comment created time in 2 days

pull request comment saeta/penguin

Implement `UnmanagedBuffer` and switch `TaskDeque` to use it.

https://github.com/saeta/penguin/issues/33

saeta

comment created time in 2 days

create branch saeta/penguin

branch : reduce-dupes-in-parallel-for

created branch time in 2 days

PR opened saeta/penguin

Copies references to data structures to avoid extra ARC traffic.

Note: this change (by itself) does not reduce ARC traffic, but in concert with UnmanagedBuffer (#41), we see a ~15x performance improvement in the parallelFor benchmark.
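The kind of rewrite this PR describes can be sketched like so (hypothetical types; the actual penguin change is in the thread pool internals): hoist a class reference into a local before a hot loop, so retain/release traffic is paid once rather than on every access through a larger structure.

```swift
// Hypothetical illustration of copying a reference to reduce ARC traffic.
final class SharedState {
  var counter = 0
}

struct Worker {
  let state: SharedState

  func run(iterations: Int) {
    // Accessing `self.state` inside the loop body can incur a
    // retain/release pair per iteration in unoptimized paths;
    // copying the reference once up front avoids that.
    let state = self.state
    for _ in 0..<iterations {
      state.counter += 1
    }
  }
}
```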

+32 -23

0 comment

1 changed file

pr created time in 2 days

create branch saeta/penguin

branch : threadpool-reduce-arc-traffic

created branch time in 2 days

push event saeta/penguin

Brennan Saeta

commit sha 64d233521e0334c88a59e2aa27c4de1658a1a5a4

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co… (#32)

* ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/confusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

1. Removes the Naive thread pool implementation from `PJoin`.
2. Removes the unnecessary `TypedComputeThreadPool` protocol refinement.
3. Removes the badly implemented extensions that implemented `parallelFor` in terms of `join`. (Note: `NonBlockingThreadPool`'s `parallelFor` is re-implemented in terms of `join` again, but better.)
4. Removes use of `rethrows`, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case. Instead, methods are explicitly overloaded with throwing and non-throwing variants.
5. Adds a vectorized API (which improves performance).

Performance measurements:

After:

name                                                                   time         std                  iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225  127457
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377  31115
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306  15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088 13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468 7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036  1390

Before:

name                                                                   time         std                  iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 728.0 ns     ± 78662.43173968921  115554
NonBlockingThreadPool: join, two levels                                2149.0 ns    ± 144611.11773139169 30425
NonBlockingThreadPool: join, three levels                              5049.0 ns    ± 188450.6773907647  15157
NonBlockingThreadPool: join, four levels, three on thread pool thread  5951.0 ns    ± 229270.51587738466 10255
NonBlockingThreadPool: parallel for, one level                         4919427.5 ns ± 887590.5386061076  302
NonBlockingThreadPool: parallel for, two levels                        4327151.0 ns ± 855302.611386676   313

Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

Dave Abrahams

commit sha e43e285bbbd8e2960b5c2645905c9c92a8a53294

Tuples (#28) Thanks to @saeta for suggestions and fixes. Co-authored-by: Brennan Saeta <brennan.saeta@gmail.com>

view details

Brennan Saeta

commit sha b5ab4b60429b74ac0bd1891ade80ac1755172ed2

Alphabetize test suites. (#38) * Alphabetize test suites. Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

Brennan Saeta

commit sha 7324d367f5e3b5ebc0a079720e6b00224fb31701

Merge remote-tracking branch 'origin/master' into nicer-numbers

view details

push time in 2 days

push event saeta/penguin

Brennan Saeta

commit sha 9dbb65c7a7232c92e21cfb9bdd10719c8749063a

Respond to comments.

view details

push time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+import PenguinStructures
+import XCTest
+
+final class NumberOperationsTests: XCTestCase {

Fixed! Including some wonky ones. LMK if you have some other good suggestions!

saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

 public class NonBlockingThreadPool<Environment: ConcurrencyPlatform>: ComputeThr
     let totalThreadCount = threadCount + externalFastPathThreadCount
     self.totalThreadCount = totalThreadCount
     self.externalFastPathThreadCount = externalFastPathThreadCount
-    self.coprimes = positiveCoprimes(totalThreadCount)
+    self.coprimes = Array(totalThreadCount.positiveCoprimes)

Fair enough, and to take it one step better, I've added a doc comment. :-)

saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")
+    self.n = n
+  }
+
+  /// Returns an iterator that incrementally computes co-primes relative to `n`.
+  public func makeIterator() -> Iterator {
+    Iterator(n: n, i: 0)
+  }
+
+  /// Iteratively computes co-primes relative to `n` starting from 1.
+  public struct Iterator: IteratorProtocol {
+    /// The number we are finding co-primes relative to.
+    let n: Number
+    /// A sequence counter representing one less than the next candidate to try.
+    var i: Number
+
+    /// Returns the next co-prime, or nil if all co-primes have been found.
+    mutating public func next() -> Number? {
+      while (i+1) < n {
+        i += 1
+        if greatestCommonDivisor(i, n) == 1 { return i }
+      }
+      return nil
+    }
+  }
+}
+
+/// Returns the greatest common divisor between two numbers.
+///
+/// This implementation uses Euclid's algorithm.
+// TODO: Switch to the Binary GCD algorithm which avoids expensive modulo operations.
+public func greatestCommonDivisor<Number: BinaryInteger>(_ a: Number, _ b: Number) -> Number {

👍

saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")
+    self.n = n
+  }
+
+  /// Returns an iterator that incrementally computes co-primes relative to `n`.
+  public func makeIterator() -> Iterator {
+    Iterator(n: n, i: 0)
+  }
+
+  /// Iteratively computes co-primes relative to `n` starting from 1.
+  public struct Iterator: IteratorProtocol {
+    /// The number we are finding co-primes relative to.
+    let n: Number
+    /// A sequence counter representing one less than the next candidate to try.
+    var i: Number
+
+    /// Returns the next co-prime, or nil if all co-primes have been found.
+    mutating public func next() -> Number? {
+      while (i+1) < n {
+        i += 1
+        if greatestCommonDivisor(i, n) == 1 { return i }
+      }
+      return nil
+    }
+  }
+}
+
+/// Returns the greatest common divisor between two numbers.
+///
+/// This implementation uses Euclid's algorithm.
+// TODO: Switch to the Binary GCD algorithm which avoids expensive modulo operations.

Fixed up by:

  • Good point; updated doc comment.
  • Removed the documented algorithm.
  • Documented the complexity.
  • Appropriately handle negative numbers (because gcd is defined for them).
saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")

Good catch! I think the right answer from a mathematical perspective is to convert to positive, and go from there. (Co-primes are symmetric around 0, so I don't think we need to define the corresponding negative co-primes sequence as a type, but it might be good to define a `.lazy.map { $0 * -1 }` extension... WDYT?)

saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {

Rather than making it conform to `Collection`, I'd instead make it a `Sequence`. Concretely, we know there are infinitely many co-primes, and there's no reason why we should stop at the number itself, so I propose that this actually be an (infinite) sequence instead.

I also dislike the `Collection` protocol here because the size of the collection is unknown until it's been computed, which reduces the value of it being a constant storage abstraction.

I like Domain, but I dislike limit, and think n (or perhaps b) is more appropriate.

Edit: I'm fine with target (as you suggested), and have made the corresponding edits, but I'm still not fully convinced this is the right choice...
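The infinite-sequence variant discussed here could look roughly like the following sketch (illustrative names; not the code that eventually landed in penguin):

```swift
/// All positive integers co-prime with `n`, in increasing order,
/// with no upper bound (an infinite sequence).
struct AllPositiveCoprimes: Sequence, IteratorProtocol {
  let n: Int
  var candidate = 0

  mutating func next() -> Int? {
    repeat { candidate += 1 } while gcd(candidate, n) != 1
    return candidate
  }
}

/// Euclid's algorithm.
func gcd(_ a: Int, _ b: Int) -> Int {
  var (a, b) = (a, b)
  while b != 0 { (a, b) = (b, a % b) }
  return a
}

// First few co-primes of 10: 1, 3, 7, 9, 11, ...
let firstFive = Array(AllPositiveCoprimes(n: 10).prefix(5))
```

Because the sequence never terminates, callers bound it themselves (e.g. with `prefix(_:)`), which sidesteps the unknown-size problem raised above for a `Collection` conformance.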

saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")
+    self.n = n
+  }
+
+  /// Returns an iterator that incrementally computes co-primes relative to `n`.
+  public func makeIterator() -> Iterator {
+    Iterator(n: n, i: 0)
+  }
+
+  /// Iteratively computes co-primes relative to `n` starting from 1.
+  public struct Iterator: IteratorProtocol {
+    /// The number we are finding co-primes relative to.
+    let n: Number
+    /// A sequence counter representing one less than the next candidate to try.
+    var i: Number
+
+    /// Returns the next co-prime, or nil if all co-primes have been found.
+    mutating public func next() -> Number? {
+      while (i+1) < n {
+        i += 1
+        if greatestCommonDivisor(i, n) == 1 { return i }
+      }
+      return nil
+    }
+  }
+}

I like a lot of these suggestions. I've had to tweak them to apply to the new definition of the PositiveCoprimes sequence.

saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }

👍

saeta

comment created time in 2 days

Pull request review comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.

Edited text based on your input and in light of the discussion below about whether this should even be limited to smaller-than-n.

saeta

comment created time in 2 days

push event saeta/penguin

Brennan Saeta

commit sha b5ab4b60429b74ac0bd1891ade80ac1755172ed2

Alphabetize test suites. (#38) * Alphabetize test suites. Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

push time in 2 days

delete branch saeta/penguin

delete branch : reorder-tests

delete time in 2 days

PR merged saeta/penguin

Alphabetize test suites.
+4 -3

0 comment

1 changed file

saeta

pr closed time in 2 days

push event saeta/penguin

Brennan Saeta

commit sha 675e1d6205edba6b637e416bf230f899024c1dae

Update Tests/PenguinStructuresTests/XCTestManifests.swift Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

push time in 2 days

push event saeta/penguin

Brennan Saeta

commit sha 9e31153761351c76b153b9893808e447998e6e8d

Implement `UnmanagedBuffer` and switch `TaskDeque` to use it.

Previously, `TaskDeque` was implemented in terms of `ManagedBuffer`. While `ManagedBuffer` implements the semantics we'd like, it is implemented as a class. This can induce a significant amount of reference counting traffic, especially when stored in `Array`s.

`UnmanagedBuffer` implements a similar interface to `ManagedBuffer`, but instead uses manual pointer allocation and management. This allows us to avoid all reference counting traffic, at the cost of requiring explicit destruction.

Switching `TaskDeque` from `ManagedBuffer` to `UnmanagedBuffer` yields between a 2x and 6x performance improvement for key workloads that stress the `TaskDeque` data structure within the `NonBlockingThreadPool`.

Below are performance numbers across 2 machines & operating systems demonstrating performance improvements. Note: because this change has been extracted from a stack of related performance improvements, if you benchmark this PR itself, you will not see the expected performance improvements. Instead, this PR has been separated out to facilitate easier reviewing.

Benchmark numbers on machine A:
-------------------------------

Before (from previous commit):

```
name                                                                   time         std                  iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225  127457
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377  31115
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306  15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088 13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468 7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036  1390
```

After:

```
name                                                                   time         std                  iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 429.0 ns     ± 59095.36128834737  223050
NonBlockingThreadPool: join, two levels                                1270.0 ns    ± 101587.48601579959 64903
NonBlockingThreadPool: join, three levels                              3098.0 ns    ± 165407.1669656578  28572
NonBlockingThreadPool: join, four levels, three on thread pool thread  3990.5 ns    ± 227217.34017343252 10000
NonBlockingThreadPool: parallel for, one level                         16853.0 ns   ± 260015.39296821563 8660
NonBlockingThreadPool: parallel for, two levels                        563926.0 ns  ± 609298.6358076902  2189
```

Benchmark numbers from machine B:
---------------------------------

Before:

```
name                                                                   time         std                  iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 3022.0 ns    ± 366686.3050127019  21717
NonBlockingThreadPool: join, two levels                                13313.5 ns   ± 550429.476815564   5970
NonBlockingThreadPool: join, three levels                              39009.5 ns   ± 716172.9687807652  3546
NonBlockingThreadPool: join, four levels, three on thread pool thread  341631.0 ns  ± 767483.9227743072  2367
NonBlockingThreadPool: parallel for, one level                         404375.0 ns  ± 590178.6724299589  3123
NonBlockingThreadPool: parallel for, two levels                        1000872.0 ns ± 1592704.2766365155 805
```

After:

```
name                                                                   time         std                  iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 749.0 ns     ± 174096.69284101788 91247
NonBlockingThreadPool: join, two levels                                12046.5 ns   ± 414670.5686344325  5920
NonBlockingThreadPool: join, three levels                              46975.0 ns   ± 543858.2306554643  3561
NonBlockingThreadPool: join, four levels, three on thread pool thread  559837.0 ns  ± 591477.1893574063  2795
NonBlockingThreadPool: parallel for, one level                         66446.0 ns   ± 627245.5098742851  2236
NonBlockingThreadPool: parallel for, two levels                        1668739.0 ns ± 1536323.375783659  765
```

view details

push time in 2 days

PR opened saeta/penguin

Implement `UnmanagedBuffer` and switch `TaskDeque` to use it.

Previously, TaskDeque was implemented in terms of ManagedBuffer. While ManagedBuffer implements the semantics we'd like, it is implemented as a class. This can induce a significant amount of reference counting traffic, especially when stored in Arrays.

UnmanagedBuffer implements a similar interface to ManagedBuffer, but instead uses manual pointer allocation and management. This allows us to avoid all reference counting traffic, at the cost of requiring explicit destruction.

Switching TaskDeque from ManagedBuffer to UnmanagedBuffer yields between a 2x and 6x performance improvement for key workloads that stress the TaskDeque data structure within the NonBlockingThreadPool.
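The tradeoff described above can be illustrated with a minimal sketch (this is an illustration, not penguin's actual `UnmanagedBuffer`; the type name, layout, and API here are assumptions): a header-plus-elements buffer backed by manually allocated memory. Because the struct holds only raw pointers, copying it never touches reference counts, but someone must call `deallocate()` exactly once.

```swift
// Minimal sketch of an unmanaged header + elements buffer, in the spirit of
// the change described above; illustrative only, not the penguin source.
struct UnmanagedBufferSketch<Header, Element> {
  private let header: UnsafeMutablePointer<Header>
  private let elements: UnsafeMutablePointer<Element>
  let capacity: Int

  init(header: Header, capacity: Int, initialValue: Element) {
    self.capacity = capacity
    self.header = UnsafeMutablePointer<Header>.allocate(capacity: 1)
    self.header.initialize(to: header)
    self.elements = UnsafeMutablePointer<Element>.allocate(capacity: capacity)
    self.elements.initialize(repeating: initialValue, count: capacity)
  }

  subscript(i: Int) -> Element {
    get { elements[i] }
    nonmutating set { elements[i] = newValue }
  }

  /// Must be called exactly once; there is no class `deinit` to do it for us.
  func deallocate() {
    elements.deinitialize(count: capacity)
    elements.deallocate()
    header.deinitialize(count: 1)
    header.deallocate()
  }
}
```

Copies of this struct are plain pointer copies (no retain/release), so correctness depends on a single logical owner performing the explicit destruction — exactly the cost the PR description mentions.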

Below are performance numbers across 2 machines & operating systems demonstrating performance improvements. Note: because this change has been extracted from a stack of related performance improvements, if you benchmark this PR itself, you will not see the expected performance improvements. Instead, this PR has been separated out to facilitate easier reviewing.

Benchmark numbers on machine A:

Before (from previous commit):

name                                                                   time         std                   iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225   127457
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377   31115
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306   15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088  13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468  7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036   1390

After:

name                                                                   time         std                   iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 429.0 ns     ± 59095.36128834737   223050
NonBlockingThreadPool: join, two levels                                1270.0 ns    ± 101587.48601579959  64903
NonBlockingThreadPool: join, three levels                              3098.0 ns    ± 165407.1669656578   28572
NonBlockingThreadPool: join, four levels, three on thread pool thread  3990.5 ns    ± 227217.34017343252  10000
NonBlockingThreadPool: parallel for, one level                         16853.0 ns   ± 260015.39296821563  8660
NonBlockingThreadPool: parallel for, two levels                        563926.0 ns  ± 609298.6358076902   2189

Benchmark numbers from machine B:

Before:

name                                                                   time          std                   iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 3022.0 ns     ± 366686.3050127019   21717
NonBlockingThreadPool: join, two levels                                13313.5 ns    ± 550429.476815564    5970
NonBlockingThreadPool: join, three levels                              39009.5 ns    ± 716172.9687807652   3546
NonBlockingThreadPool: join, four levels, three on thread pool thread  341631.0 ns   ± 767483.9227743072   2367
NonBlockingThreadPool: parallel for, one level                         404375.0 ns   ± 590178.6724299589   3123
NonBlockingThreadPool: parallel for, two levels                        1000872.0 ns  ± 1592704.2766365155  805

After:

name                                                                   time          std                   iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 749.0 ns      ± 174096.69284101788  91247
NonBlockingThreadPool: join, two levels                                12046.5 ns    ± 414670.5686344325   5920
NonBlockingThreadPool: join, three levels                              46975.0 ns    ± 543858.2306554643   3561
NonBlockingThreadPool: join, four levels, three on thread pool thread  559837.0 ns   ± 591477.1893574063   2795
NonBlockingThreadPool: parallel for, one level                         66446.0 ns    ± 627245.5098742851   2236
NonBlockingThreadPool: parallel for, two levels                        1668739.0 ns  ± 1536323.375783659   765
+80 -74

0 comment

4 changed files

pr created time in 2 days

create branch saeta/penguin

branch : unmanaged-buffer

created branch time in 2 days

Pull request review commentsaeta/penguin

Tuples

Review context: the `description` property in `extension Tuple: CustomStringConvertible` (Tuple.swift):

      return "Tuple(\(String(reflecting:head )))"

Suggested:

      return "Tuple(\(String(reflecting: head)))"
dabrahams

comment created time in 2 days

Pull request review commentsaeta/penguin

Tuples

Review context: the end of `test_Equatable` in TupleTests.swift:

    // This effectively tests Hashable too; we know the conformance synthesizer
    // is considering all the fields in order.

I'll take your word on this one...

dabrahams

comment created time in 2 days

Pull request review commentsaeta/penguin

Tuples

Review context: the `allTests` registration at the end of TupleTests.swift:

      static var allTests = [
        ("test_mapReduce", test_mapReduce),
      ]

Suggested:

  static var allTests = [
    ("test_mapReduce", test_mapReduce),
    ("test_head", test_head),
    ("test_tail", test_tail),
    ("test_count", test_count),
    ("test_DefaultInitializable", test_DefaultInitializable),
    ("test_Equatable", test_Equatable),
    ("test_Comparable", test_Comparable),
    ("test_CustomStringConvertible", test_CustomStringConvertible),
    ("test_conveniences", test_conveniences),
  ]
dabrahams

comment created time in 2 days

Pull request review commentsaeta/penguin

Tuples

Review context: the convenience typealiases at the end of Tuple.swift:

    // ======== Conveniences ============

    public typealias Tuple0 = Empty

Might we want to add doc comments to these in some form?

dabrahams

comment created time in 2 days
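The `TupleProtocol` design discussed in the review comments above is a classic heterogeneous list (HList): each tuple is a `head` element plus a `tail` tuple, terminated by an empty type, and properties like `count` fall out of the type-level recursion. A minimal standalone sketch (simplified names chosen for illustration; this is not the penguin source):

```swift
// Minimal HList sketch in the style of `TupleProtocol`; illustrative only.
protocol HList {
  /// The number of elements, computed from the type structure.
  static var count: Int { get }
}

/// The empty list, terminating the recursion (penguin's `Empty` plays this role).
struct HNil: HList {
  static var count: Int { 0 }
}

/// A non-empty list: one element plus the rest.
struct HCons<Head, Tail: HList>: HList {
  var head: Head
  var tail: Tail
  static var count: Int { Tail.count + 1 }
}

// A three-element heterogeneous "tuple" of Int, Double, and String:
let triple = HCons(head: 1, tail: HCons(head: 2.5, tail: HCons(head: "foo", tail: HNil())))
```

Because each element position is its own nominal field, algorithms can recurse over `head`/`tail` generically — the property Swift's built-in (non-nominal) tuples lack, as the doc comment in the reviewed file points out.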

push eventsaeta/penguin

Brennan Saeta

commit sha 64d233521e0334c88a59e2aa27c4de1658a1a5a4

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/confusing extensions and implementations. (#32)

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

1. Removes the Naive thread pool implementation from `PJoin`.
2. Removes the unnecessary `TypedComputeThreadPool` protocol refinement.
3. Removes the badly implemented extensions that implemented `parallelFor` in terms of `join`. (Note: `NonBlockingThreadPool`'s `parallelFor` is re-implemented in terms of `join` again, but better.)
4. Removes use of `rethrows`, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case. Instead, methods are explicitly overloaded with throwing and non-throwing variants.
5. Adds a vectorized API (which improves performance).

Before/after performance measurements appear in the merged PR entry below.

Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

push time in 2 days

delete branch saeta/penguin

delete branch : threadpool-cleanup

delete time in 2 days

PR merged saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

…nfusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

  1. Removes the Naive thread pool implementation from PJoin.
  2. Removes the unnecessary TypedComputeThreadPool protocol refinement.
  3. Removes the badly implemented extensions that implemented parallelFor in terms of join.
  4. Removes use of rethrows, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case.
  5. Adds a vectorized API (which improves performance).
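The vectorized shape in item 5 can be sketched as follows (a simplified illustration, not penguin's API; the function names, the `grainSize` parameter, and the serial `join` are assumptions). Instead of invoking a closure once per index, each task receives a contiguous `[start, end)` range, amortizing per-invocation dispatch overhead; a recursive `join` splits the range until chunks are small enough.

```swift
// Simplified sketch: a vectorized parallel-for on top of a fork/join primitive.
// `join` here runs both closures serially; a real pool may run them concurrently.
func join(_ a: () -> Void, _ b: () -> Void) { a(); b() }

/// Calls `body(start, end, total)` over disjoint ranges that cover `0..<n`.
func vectorizedParallelFor(
  n: Int, grainSize: Int = 1024, _ body: (Int, Int, Int) -> Void
) {
  func run(_ start: Int, _ end: Int) {
    if end - start <= grainSize {
      body(start, end, n)  // one invocation covers a whole chunk of indices
    } else {
      let mid = start + (end - start) / 2
      join({ run(start, mid) }, { run(mid, end) })
    }
  }
  if n > 0 { run(0, n) }
}
```

With a per-index API, `n` closure invocations are unavoidable; with the range-based form, the closure runs only `⌈n / grainSize⌉` times, which is where the performance win in the measurements below comes from.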

Performance measurements:

After:

name                                                                   time         std                   iterations  
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225   127457      
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377   31115       
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306   15849       
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088  13763       
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468  7581        
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036   1390        

Before:

name                                                                   time          std                   iterations  
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 728.0 ns      ± 78662.43173968921   115554      
NonBlockingThreadPool: join, two levels                                2149.0 ns     ± 144611.11773139169  30425       
NonBlockingThreadPool: join, three levels                              5049.0 ns     ± 188450.6773907647   15157       
NonBlockingThreadPool: join, four levels, three on thread pool thread  5951.0 ns     ± 229270.51587738466  10255       
NonBlockingThreadPool: parallel for, one level                         4919427.5 ns  ± 887590.5386061076   302         
NonBlockingThreadPool: parallel for, two levels                        4327151.0 ns  ± 855302.611386676    313         
+163 -553

2 comments

9 changed files

saeta

pr closed time in 2 days

issue openedsaeta/penguin

Review Compute Thread Pool API

Context: https://github.com/saeta/penguin/pull/32 (Deferred until after performance issues from https://github.com/saeta/penguin/issues/33 are resolved.)

CC @dabrahams

created time in 3 days

push eventsaeta/penguin

Brennan Saeta

commit sha ac6afb3010aedb7fadeb934a074754abb90e827c

Fix up tests.

view details

push time in 3 days

push eventsaeta/penguin

Brennan Saeta

commit sha ad6eb9b5af391635c924439d9ead18ccaca24735

Remove stale comment. (#29) The functionality referred to in the comment has been implemented in the `PenguinParallelWithFoundation` package, which leverages the `Foundation` dependency to query the number of available processors.

view details

Brennan Saeta

commit sha 9dd365d5e4a37efce149dfad42f339dfaf4d81ed

ComputeThreadPool cleanup (4/n): move NonBlockingSpinningState to its own file. (#34) This change moves `NonBlockingSpinningState` to its own file to reduce the size of the `NonBlockingThreadPool.swift` file.

view details

Brennan Saeta

commit sha 4c41ec8912d9e0691a4b7a0943fc6c8cb13de209

Factor out number operations from NonBlockingThreadPool.swift (#30) The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

view details

Brennan Saeta

commit sha 43f726bc5870426007197a3692190b32a10b9d1c

Merge branch 'master' into threadpool-cleanup

view details

Brennan Saeta

commit sha 590ebd0eee76430ca3d6caf85dd022b5f8d5dba7

Improve the code based on feedback.

view details

Brennan Saeta

commit sha a91c527a81eb8bfa094a1d3686296c962346bb3a

Merge branch 'threadpool-cleanup' of github.com:saeta/penguin into threadpool-cleanup

view details

push time in 3 days

Pull request review commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

 public struct InlineComputeThreadPool: TypedComputeThreadPool {
   /// Dispatch `fn` to be run at some point in the future (immediately).
   ///
   /// Note: this implementation just executes `fn` immediately.
-  public func dispatch(_ fn: (Self) -> Void) {
-    fn(self)
+  public func dispatch(_ fn: () -> Void) {

I didn't think you needed to add a completion handler at this level of abstraction. (Concretely, if you want to run something after fn completes, just wrap fn in a closure yourself, and put your code right there!) You can build up whatever you want on top, with whatever arbitrary synchronization primitives you'd like. (Concretely, I'm intentionally avoiding binding to any particular lock implementation.)
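The point about wrapping can be made concrete with a short sketch (the helper names here are hypothetical, and `dispatch` runs inline for illustration): a completion handler needs no support from the pool itself, because the caller can compose it into the submitted closure.

```swift
// Sketch: building completion behavior on top of a bare `dispatch`, without
// the thread pool knowing about completions. Hypothetical names; the inline
// `dispatch` stands in for a pool's fire-and-forget submission.
func dispatch(_ fn: () -> Void) { fn() }

func dispatchWithCompletion(_ fn: () -> Void, completion: () -> Void) {
  dispatch {
    fn()
    completion()  // runs right after `fn`, inside the same submitted closure
  }
}
```

Because the composition happens at the call site, the pool's API stays minimal and does not bind to any particular synchronization or lock implementation, which matches the rationale above.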

saeta

comment created time in 3 days

Pull request review commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

 public protocol ComputeThreadPool { }

 extension ComputeThreadPool {
-  /// A default implementation of the non-throwing variation in terms of the throwing one.
-  public func join(_ a: () -> Void, _ b: () -> Void) {
-    withoutActuallyEscaping(a) { a in
-      let throwing: () throws -> Void = a
-      try! join(throwing, b)
-    }
-  }
-}
-
-/// Holds a parallel for function; this is used to avoid extra refcount overheads on the function
-/// itself.
-fileprivate struct ParallelForFunctionHolder {
-  var fn: ComputeThreadPool.ParallelForFunc
-}
-
-/// Uses `ComputeThreadPool.join` to execute `fn` in parallel.
-fileprivate func runParallelFor<C: ComputeThreadPool>(
-  pool: C,
-  start: Int,
-  end: Int,
-  total: Int,
-  fn: UnsafePointer<ParallelForFunctionHolder>
-) throws {
-  if start + 1 == end {
-    try fn.pointee.fn(start, total)
-  } else {
-    assert(end > start)
-    let distance = end - start
-    let midpoint = start + (distance / 2)
-    try pool.join(
-      { try runParallelFor(pool: pool, start: start, end: midpoint, total: total, fn: fn) },
-      { try runParallelFor(pool: pool, start: midpoint, end: end, total: total, fn: fn) })
-  }
-}
-
-extension ComputeThreadPool {
-  public func parallelFor(n: Int, _ fn: ParallelForFunc) rethrows {
-    try withoutActuallyEscaping(fn) { fn in
-      var holder = ParallelForFunctionHolder(fn: fn)
-      try withUnsafePointer(to: &holder) { holder in
-        try runParallelFor(pool: self, start: 0, end: n, total: n, fn: holder)
+  /// Convert a non-vectorized operation to a vectorized operation.

Hmmm, I thought that comments on extension methods that are implementations of methods on the protocols themselves don't show up in typical doc-generation, I tried to write something different & more specific here. I can certainly just copy-pasta the doc comment from the protocol method itself if you think that's more appropriate... :-)

That said, I've attempted to refine this a bit (in the same direction, however).

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

 public struct InlineComputeThreadPool: TypedComputeThreadPool {
   /// Dispatch `fn` to be run at some point in the future (immediately).
   ///
   /// Note: this implementation just executes `fn` immediately.
-  public func dispatch(_ fn: (Self) -> Void) {
-    fn(self)
+  public func dispatch(_ fn: () -> Void) {
+    fn()
+  }
+
+  /// Executes `a` and `b` optionally in parallel, and returns when both are complete.
+  ///
+  /// Note: this implementation simply executes them serially.
+  public func join(_ a: () -> Void, _ b: () -> Void) {

For context: I picked join as the typical term-of-art in this space. I'm not fully sold on concurrently yet, because join represents optional concurrency, which is important for performance at scale.

I think that it would be good to go over this API and think hard about naming & how the abstractions compose, but only once we understand the performance limitations & constraints. (Concretely, some of the (internal) abstractions are being re-written due to performance limitations in the current structure of things.)

saeta

comment created time in 3 days
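The "optional concurrency" contract described above can be sketched in isolation. This is a hypothetical illustration (`ForkJoinPool` and `SerialPool` are invented names, not penguin APIs): a conforming pool may run both closures on the calling thread, so callers can rely only on both halves having completed by the time `join` returns.

```swift
// Hypothetical sketch of the fork-join contract; not the penguin API.
protocol ForkJoinPool {
  /// Executes `a` and `b`, optionally in parallel; returns only when both
  /// have completed.
  func join(_ a: () -> Void, _ b: () -> Void)
}

/// A valid (if degenerate) implementation: run both halves serially on the
/// calling thread. Callers must not depend on actual concurrency.
struct SerialPool: ForkJoinPool {
  func join(_ a: () -> Void, _ b: () -> Void) {
    a()
    b()
  }
}

var order = [String]()
SerialPool().join({ order.append("a") }, { order.append("b") })
assert(order.count == 2)  // Both halves ran before join returned.
```

This is why "join" rather than "concurrently" matters: the serial implementation is correct, and a work-stealing pool can degrade to it under load without violating the contract.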

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

 public protocol ComputeThreadPool {
   /// This is the throwing overload
   func join(_ a: () throws -> Void, _ b: () throws -> Void) throws

+  /// A function that can be executed in parallel.
+  ///
+  /// The first argument is the index of the invocation, and the second argument is the total number
+  /// of invocations.
+  typealias ParallelForFunction = (Int, Int) -> Void
+
   /// A function that can be executed in parallel.
   ///
   /// The first argument is the index of the copy, and the second argument is the total number of
   /// copies being executed.
-  typealias ParallelForFunc = (Int, Int) throws -> Void
+  typealias ThrowingParallelForFunction = (Int, Int) throws -> Void
+
+  /// A vectorized function that can be executed in parallel.
+  ///
+  /// The first argument is the start index for the vectorized operation, and the second argument
+  /// corresponds to the end of the range. The third argument contains the total size of the range.
+  typealias VectorizedParallelForFunction = (Int, Int, Int) -> Void
+
+  /// A vectorized function that can be executed in parallel.
+  ///
+  /// The first argument is the start index for the vectorized operation, and the second argument
+  /// corresponds to the end of the range. The third argument contains the total size of the range.
+  typealias ThrowingVectorizedParallelForFunction = (Int, Int, Int) throws -> Void

   /// Returns after executing `fn` `n` times.
   ///
   /// - Parameter n: The total times to execute `fn`.
-  func parallelFor(n: Int, _ fn: ParallelForFunc) rethrows
+  func parallelFor(n: Int, _ fn: ParallelForFunction)
+
+  /// Returns after executing `fn` an unspecified number of times, guaranteeing that `fn` has been
+  /// called with parameters that perfectly cover of the range `0..<n`.
+  ///
+  /// - Parameter n: The range of numbers `0..<n` to cover.
+  func parallelFor(n: Int, _ fn: VectorizedParallelForFunction)
+
+  /// Returns after executing `fn` `n` times.
+  ///
+  /// - Parameter n: The total times to execute `fn`.
+  func parallelFor(n: Int, _ fn: ThrowingParallelForFunction) throws
+
+  /// Returns after executing `fn` an unspecified number of times, guaranteeing that `fn` has been
+  /// called with parameters that perfectly cover of the range `0..<n`.
+  ///
+  /// - Parameter n: The range of numbers `0..<n` to cover.
+  func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws
+
   // TODO: Add this & a default implementation!
   // /// Returns after executing `fn` `n` times.
   // ///
   // /// - Parameter n: The total times to execute `fn`.
   // /// - Parameter blocksPerThread: The minimum block size to subdivide. If unspecified, a good
   // ///   value will be chosen based on the amount of available parallelism.
-  // func parallelFor(blockingUpTo n: Int, blocksPerThread: Int, _ fn: ParallelForFunc)
-  // func parallelFor(blockingUpTo n: Int, _ fn: ParallelForFunc)
+  // func parallelFor(blockingUpTo n: Int, blocksPerThread: Int, _ fn: ParallelForFunction)
+  // func parallelFor(blockingUpTo n: Int, _ fn: ParallelForFunction)

   /// The maximum amount of parallelism possible within this thread pool.
   var parallelism: Int { get }

lol had a similar thought after pondering the doc comment a bit further. 👍

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

 extension ComputeThreadPool {
+  /// Convert a non-vectorized operation to a vectorized operation.
+  public func parallelFor(n: Int, _ fn: ParallelForFunction) {
+    parallelFor(n: n) { start, end, total in
+      for i in start..<end {
+        fn(i, total)
+      }
+    }
+  }
+
+  /// Convert a non-vectorized operation to a vectorized operation.
+  public func parallelFor(n: Int, _ fn: ThrowingParallelForFunction) throws {
+    try parallelFor(n: n) { start, end, total in
+      for i in start..<end {
+        try fn(i, total)
+      }
+    }
+  }
 }

-/// Typed compute threadpools support additional sophisticated operations.
-public protocol TypedComputeThreadPool: ComputeThreadPool {
-  /// Submit a task to be executed on the threadpool.
-  ///
-  /// `pRun` will execute task in parallel on the threadpool and it will complete at a future time.
-  /// `pRun` returns immediately.
-  func dispatch(_ task: (Self) -> Void)
-
-  /// Run two tasks (optionally) in parallel.
-  ///
-  /// Fork-join parallelism allows for efficient work-stealing parallelism. The two non-escaping
-  /// functions will have finished executing before `pJoin` returns. The first function will execute on
-  /// the local thread immediately, and the second function will execute on another thread if resources
-  /// are available, or on the local thread if there are not available other resources.
-  func join(_ a: (Self) -> Void, _ b: (Self) -> Void)
-
-  /// Run two throwing tasks (optionally) in parallel; if one task throws, it is unspecified
-  /// whether the second task is even started.
-  ///
-  /// This is the throwing overloaded variation.
-  func join(_ a: (Self) throws -> Void, _ b: (Self) throws -> Void) throws
-}
-
-extension TypedComputeThreadPool {
-  /// Implement the non-throwing variation in terms of the throwing one.
-  public func join(_ a: (Self) -> Void, _ b: (Self) -> Void) {
-    withoutActuallyEscaping(a) { a in
-      let throwing: (Self) throws -> Void = a
-      // Implement the non-throwing in terms of the throwing implementation.
-      try! join(throwing, b)
-    }
-  }
-}
-
-extension TypedComputeThreadPool {
-  public func dispatch(_ fn: @escaping () -> Void) {
-    dispatch { _ in fn() }
-  }
-
-  public func join(_ a: () -> Void, _ b: () -> Void) {
-    join({ _ in a() }, { _ in b() })
-  }
-
-  public func join(_ a: () throws -> Void, _ b: () throws -> Void) throws {
-    try join({ _ in try a() }, { _ in try b() })
-  }
-}
-
 /// A `ComputeThreadPool` that executes everything immediately on the current thread.
 ///
 /// This threadpool implementation is useful for testing correctness, as well as avoiding context
 /// switches when a computation is designed to be parallelized at a coarser level.
-public struct InlineComputeThreadPool: TypedComputeThreadPool {
+public struct InlineComputeThreadPool: ComputeThreadPool {

In terms of thread-pools, there can be a number of different designs with different properties. In the same way that you can implement a random access collection in terms of a collection (just really inefficiently), I wanted to clearly distinguish what properties the thread-pool has. Concretely, there are I/O-focused thread-pools, where you can do blocking and/or non-blocking I/O. This thread pool abstraction is focused on compute-bound tasks, and is tuned / structured with APIs focused on that domain. Does that make sense?

Happy to ponder the names further... related work also uses ConcurrentWorkQueue.

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

+  /// Returns after executing `fn` an unspecified number of times, guaranteeing that `fn` has been
+  /// called with parameters that perfectly cover of the range `0..<n`.
+  ///
+  /// - Parameter n: The range of numbers `0..<n` to cover.
+  func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws

   /// The maximum amount of parallelism possible within this thread pool.

Took a quick pass, although this can probably be refined further.

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

 public class NonBlockingThreadPool<Environment: ConcurrencyPlatform>: ComputeThr
     if let e = err { throw e }
   }

+  public func parallelFor(n: Int, _ fn: VectorizedParallelForFunction) {
+    let grainSize = n / parallelism  // TODO: Make adaptive!
+
+    func executeParallelFor(_ start: Int, _ end: Int) {
+      if start + grainSize >= end {
+        fn(start, end, n)
+      } else {
+        // Divide into 2 & recurse.
+        let rangeSize = end - start
+        let midPoint = start + (rangeSize / 2)
+        self.join({ executeParallelFor(start, midPoint) }, { executeParallelFor(midPoint, end)})
+      }
+    }
+
+    executeParallelFor(0, n)
+  }
+
+  public func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws {
+    let grainSize = n / parallelism  // TODO: Make adaptive!
+
+    func executeParallelFor(_ start: Int, _ end: Int) throws {
+      if start + grainSize >= end {
+        try fn(start, end, n)
+      } else {
+        // Divide into 2 & recurse.
+        let rangeSize = end - start
+        let midPoint = start + (rangeSize / 2)
+        try self.join({ try executeParallelFor(start, midPoint) }, { try executeParallelFor(midPoint, end) })

Ah, good point. That description is getting ahead of the actual implementation in this patch set. I'll update the description in the PR shortly.

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

 public class NonBlockingThreadPool<Environment: ConcurrencyPlatform>: ComputeThr
     if let e = err { throw e }
   }

+  public func parallelFor(n: Int, _ fn: VectorizedParallelForFunction) {

+1 to doc comment. PTAL?

I believe that this should most often be accessed as an operation on a random access (or likely some form of "splittable") collection. But in any case, that will have to be generic over the thread pool itself, so we don't get away from having this method and coming up with a name for it.

Note: I started going in this direction a while back but I think that direction needs a "reboot". For now, I'd like to focus on getting this low-level API implemented correctly and efficiently, and we can then refactor and/or stack on the further abstractions.

FWIW: I started out by having VectorizedParallelForFunction take a range instead of 2 integers representing the start and end, but that makes type inference not work as well (code then requires annotations, because the range-based API induces an ambiguity between the non-vectorized and vectorized overloads).

saeta

comment created time in 3 days
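The shape being discussed can be sketched serially (hypothetical free functions, mirroring the signatures in the diff rather than penguin's actual pool methods): the vectorized form hands each invocation a half-open chunk of `0..<n`, and the per-index form falls out as a thin wrapper.

```swift
// Sketch only: a serial stand-in for a pool's vectorized parallelFor.
// In a real pool, the range 0..<n would be split across workers via join.
typealias VectorizedBody = (_ start: Int, _ end: Int, _ total: Int) -> Void

func vectorizedParallelFor(n: Int, _ body: VectorizedBody) {
  // Serially, a single chunk covering the whole range satisfies the
  // "perfectly covers 0..<n" guarantee.
  body(0, n, n)
}

/// The per-index variant, derived from the vectorized one.
func parallelFor(n: Int, _ body: (_ i: Int, _ total: Int) -> Void) {
  vectorizedParallelFor(n: n) { start, end, total in
    for i in start..<end { body(i, total) }
  }
}

var sum = 0
parallelFor(n: 5) { i, _ in sum += i }
assert(sum == 10)  // 0 + 1 + 2 + 3 + 4
```

Passing `(start, end, total)` as three `Int`s rather than a `Range<Int>` keeps the vectorized and per-index overloads distinguishable by type, which is the inference issue noted above.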

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

+  public func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws {

+1; done. (Although I suspect that this comment could be improved...)

saeta

comment created time in 3 days

push event saeta/penguin

Brennan Saeta

commit sha 35e0208385c44945174a25be1cf6c92dfe45fd54

Update Sources/PenguinParallel/ThreadPool.swift Co-authored-by: Dave Abrahams <dabrahams@google.com>

view details

push time in 3 days

PR opened saeta/penguin

Alphabetize test suites.
+3 -3

0 comment

1 changed file

pr created time in 3 days

create branch saeta/penguin

branch : reorder-tests

created branch time in 3 days

issue opened saeta/penguin

Polish fast RNG & make available publicly

https://github.com/saeta/penguin/pull/30 pulled the PCGRandomNumberGenerator out from NonBlockingThreadPool.swift, but it (and fastFit) should really move to PenguinStructures (or some similar library) and be made publicly available (and generic where appropriate).

Note: https://github.com/saeta/penguin/pull/36 pulls out the other numerical bits and cleans them up.

created time in 3 days

pull request comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

This relates to umbrella issue: https://github.com/saeta/penguin/issues/33

saeta

comment created time in 3 days

pull request comment saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

This is a follow-up to https://github.com/saeta/penguin/pull/30

saeta

comment created time in 3 days

PR opened saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

Thank you @dabrahams for the suggestions!

+103 -20

0 comment

5 changed files

pr created time in 3 days

create branch saeta/penguin

branch : nicer-numbers

created branch time in 3 days

delete branch saeta/penguin

delete branch : factor-out-numbers

delete time in 3 days

push event saeta/penguin

Brennan Saeta

commit sha 4c41ec8912d9e0691a4b7a0943fc6c8cb13de209

Factor out number operations from NonBlockingThreadPool.swift (#30) The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

view details

push time in 3 days

PR merged saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

+60 -44

0 comment

2 changed files

saeta

pr closed time in 3 days

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+/// Returns an array of all positive integers that are co-prime with `n`.
+///
+/// Two numbers are co-prime if their GCD is 1.
+internal func makeCoprimes(upTo n: Int) -> [Int] {
+  var coprimes = [Int]()
+  for i in 1...n {
+    var a = i
+    var b = n
+    // If GCD(a, b) == 1, then a and b are coprimes.
+    while b != 0 {
+      let tmp = a
+      a = b
+      b = tmp % b
+    }
+    if a == 1 { coprimes.append(i) }
+  }
+  return coprimes
+}
+
+/// Reduce `lhs` into `[0, size)`.
+///
+/// This is a faster variation than computing `x % size`. For additional context, please see:
+///     https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction
+internal func fastFit(_ lhs: Int, into size: Int) -> Int {
+  let l = UInt32(lhs)
+  let r = UInt32(size)
+  return Int(l.multipliedFullWidth(by: r).high)
+}
+
+/// Fast random number generator using [permuted congruential
+/// generators](https://en.wikipedia.org/wiki/Permuted_congruential_generator)

I've definitely looked over that site a fair bit. That said, you're definitely right that this link should go to the proper source and not wikipedia. FWIW, I didn't port an implementation from Wikipedia, but I did skimp on some aspects of the implementation that could be improved. I consider this good-enough for now, as I don't need a high-quality source of random number generation, but when we factor this out into a more reusable spot, we should definitely ensure we have a good implementation.

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+internal struct PCGRandomNumberGenerator {

For some reason, I thought I could only conform if I returned UInt64's. But as it turns out, apparently I don't need to... Thanks for the pointer!

saeta

comment created time in 3 days
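For reference, conforming to `RandomNumberGenerator` only requires `next() -> UInt64`, which a 32-bit PCG can satisfy by gluing two outputs together. A sketch of the PCG-XSH-RR variant follows (not the penguin implementation; the multiplier is the standard PCG32 constant from the published reference):

```swift
// Sketch of a PCG-XSH-RR generator conforming to RandomNumberGenerator.
// Not the penguin implementation.
struct PCG32: RandomNumberGenerator {
  private var state: UInt64
  private let increment: UInt64

  init(seed: UInt64, sequence: UInt64 = 0) {
    state = 0
    increment = (sequence << 1) | 1  // The stream increment must be odd.
    _ = next32()
    state = state &+ seed
    _ = next32()
  }

  private mutating func next32() -> UInt32 {
    let old = state
    // LCG step with the standard PCG32 multiplier.
    state = old &* 6364136223846793005 &+ increment
    // Output permutation: xorshift-high, then a data-dependent rotation.
    let xorshifted = UInt32(truncatingIfNeeded: ((old >> 18) ^ old) >> 27)
    let rot = UInt32(truncatingIfNeeded: old >> 59)
    return (xorshifted >> rot) | (xorshifted << ((32 &- rot) & 31))
  }

  // RandomNumberGenerator's single requirement, built from two 32-bit draws.
  mutating func next() -> UInt64 {
    (UInt64(next32()) << 32) | UInt64(next32())
  }
}

var a = PCG32(seed: 42), b = PCG32(seed: 42)
assert(a.next() == b.next())  // Deterministic for equal seeds.
```

With the conformance in place, the generator plugs into standard library APIs, e.g. `(0..<10).randomElement(using: &a)`.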

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+internal func fastFit(_ lhs: Int, into size: Int) -> Int {

Definitely; will follow-up to pull this out. One question to ponder: what should the name of this be, and what operator should we define for it (if any)? ;-)

saeta

comment created time in 3 days
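For readers following along, the Lemire reduction under discussion maps a uniformly distributed 32-bit value into `0..<size` with a single multiply instead of a division. A standalone sketch (a hypothetical free function over `UInt32`, not penguin's exact internal):

```swift
// Lemire-style alternative to `x % size`: take the high half of the
// full-width 32x32 product. It preserves uniformity of a uniformly random
// input, but is NOT the same value as `x % size` in general.
func fastFit(_ value: UInt32, into size: UInt32) -> UInt32 {
  value.multipliedFullWidth(by: size).high
}

// Every result lands in 0..<size.
let samples: [UInt32] = [0, 1, 12_345, 4_000_000_000]
for v in samples {
  assert(fastFit(v, into: 10) < 10)
}
assert(fastFit(0, into: 10) == 0)
```

The trick relies on `multipliedFullWidth(by:)` computing the 64-bit product of two 32-bit values and returning its high word, which is exactly `(value * size) >> 32`.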

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+/// Fast random number generator using [permuted congruential

Good point!

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+/// Returns an array of all positive integers that are co-prime with `n`.

Excellent. Thank you!

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+/// Reduce `lhs` into `[0, size)`.

Yeah, that's definitely much nicer.

saeta

comment created time in 3 days

Pull request review comment saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

+internal func makeCoprimes(upTo n: Int) -> [Int] {

Yes, I will do this in a follow-up PR. Great suggestion! (I will likely move this over to PenguinStructures as that's probably the best spot to put this for now (unless you have a better suggestion)?

saeta

comment created time in 3 days
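A generic version along the lines discussed can be sketched over any `BinaryInteger` via Euclid's algorithm (the signature is hypothetical; the actual follow-up PR may differ):

```swift
// Sketch: generic co-primes via Euclid's algorithm. Hypothetical signature;
// the real follow-up also aims to avoid the array allocation.
func positiveCoprimes<T: BinaryInteger>(upTo n: T) -> [T] {
  var coprimes = [T]()
  var i: T = 1
  while i <= n {
    var a = i
    var b = n
    while b != 0 { (a, b) = (b, a % b) }  // a becomes gcd(i, n).
    if a == 1 { coprimes.append(i) }
    i += 1
  }
  return coprimes
}

assert(positiveCoprimes(upTo: 10) == [1, 3, 7, 9])
assert(positiveCoprimes(upTo: UInt8(7)) == [1, 2, 3, 4, 5, 6])
```

Constraining to `BinaryInteger` is what makes `%` and the integer literals available, so the same code serves `Int`, `UInt8`, and friends.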

push event saeta/penguin

Dave Abrahams

commit sha 52d266126e30b0e6582c5346b4148c07fffcb5b6

Add ArrayBuffer basics. (#25)

view details

Dave Abrahams

commit sha 45223e965b7ff965f5d1be8d66e344af23b8126f

Add FixedSizeArray. (#24) * Add FixedSizeArray. * Use new accessors * Guaranteed O(1) complexity for subscript. Also some doc comment updates.

view details

Dave Abrahams

commit sha 16fa3976ebd1c38ef8ffa1f9efad1457e534a846

Improve doc comments.

view details

Dave Abrahams

commit sha e2da4d6bfecc921cfd0c60f730e784b09c10add8

Fix indentation

view details

Dave Abrahams

commit sha 81d80b876449e29a33d174bb257a58ec381ac0f9

Value semantics verification for withUnsafeMutableBufferPointer. (#26)

view details

Brennan Saeta

commit sha ad6eb9b5af391635c924439d9ead18ccaca24735

Remove stale comment. (#29) The functionality referred to in the comment has been implemented in the `PenguinParallelWithFoundation` package, which leverages the `Foundation` dependency to query the number of available processors.

view details

Brennan Saeta

commit sha 9dd365d5e4a37efce149dfad42f339dfaf4d81ed

ComputeThreadPool cleanup (4/n): move NonBlockingSpinningState to its own file. (#34) This change moves `NonBlockingSpinningState` to its own file to reduce the size of the `NonBlockingThreadPool.swift` file.

view details

Brennan Saeta

commit sha b1fa87ce7e2c3f94bd137d099f0b8f46cf086cf1

Merge remote-tracking branch 'origin/master' into factor-out-numbers

view details

Brennan Saeta

commit sha 216e4bdaf22e7a30ebc0ab22c0002fbd7b65414b

Improve the code based on reviewer feedback.

view details

push time in 3 days

create branch saeta/penguin

branch : fast-parallel-for

created branch time in 4 days

pull request comment saeta/penguin

Rename `WorkItem` to `JoinWorkItem`.

https://github.com/saeta/penguin/issues/33

saeta

comment created time in 5 days

pull request comment saeta/penguin

ComputeThreadPool cleanup (4/n): NonBlockingSpinningState to new file.

https://github.com/saeta/penguin/issues/33

saeta

comment created time in 5 days

PR opened saeta/penguin

Rename `WorkItem` to `JoinWorkItem`.

This is in preparation for changing the TaskQueues to not hold closures. The key insight is that closures, being reference types, induce a lot of ARC traffic. In order to achieve good performance, we need to move away from that, where possible.

+21 -21

0 comment

1 changed file

pr created time in 5 days

create branch saeta/penguin

branch : rename-workitem

created branch time in 5 days

PR opened saeta/penguin

ComputeThreadPool cleanup (4/n): NonBlockingSpinningState to new file.

This change moves NonBlockingSpinningState to its own file to reduce the size of the NonBlockingThreadPool.swift file.

+79 -62

0 comment

2 changed files

pr created time in 5 days

create branch saeta/penguin

branch : break-out-spinning-state

created branch time in 5 days

issue comment saeta/penguin

Improve the `ComputeThreadPool` abstraction

Experimental PR/branch with some of the key ideas: https://github.com/saeta/penguin/pull/11

saeta

comment created time in 5 days

issue comment saeta/penguin

Improve the `ComputeThreadPool` abstraction

https://github.com/saeta/penguin/pull/29, https://github.com/saeta/penguin/pull/30, https://github.com/saeta/penguin/pull/32

saeta

comment created time in 5 days

issue opened saeta/penguin

Improve the `ComputeThreadPool` abstraction

Creating a tracking issue to group related PRs.

created time in 5 days

pull request comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

Note: I'm trying to break up a very large refactoring I've been working on in #11 (and related branches) into more easily reviewable pieces. Happy to explain how they all fit together out-of-band as appropriate.

saeta

comment created time in 5 days

PR opened saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/confusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

  1. Removes the Naive thread pool implementation from PJoin.
  2. Removes the unnecessary TypedComputeThreadPool protocol refinement.
  3. Removes the badly implemented extensions that implemented parallelFor in terms of join.
  4. Removes use of rethrows, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case.
  5. Adds a vectorized API (which improves performance).
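Point 4 can be illustrated with a serial stand-in (hypothetical names, not the actual pool API): instead of a single `rethrows` entry point, separate non-throwing and throwing overloads let the non-throwing case skip error-propagation machinery entirely:

```swift
// Hypothetical serial pool standing in for a real thread pool.
struct SerialPool {
  // Fast path: no error-handling machinery in the hot loop.
  func parallelFor(n: Int, _ body: (Int) -> Void) {
    for i in 0..<n { body(i) }
  }
  // Throwing overload: pays for `try` only when the body can throw.
  func parallelFor(n: Int, _ body: (Int) throws -> Void) throws {
    for i in 0..<n { try body(i) }
  }
}

let pool = SerialPool()
var sum = 0
pool.parallelFor(n: 4) { sum += $0 }  // resolves to the non-throwing overload
print(sum)  // 6
```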

Performance measurements:

After:

name                                                                   time         std                   iterations  
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225   127457      
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377   31115       
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306   15849       
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088  13763       
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468  7581        
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036   1390        

Before:

name                                                                   time          std                   iterations  
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 728.0 ns      ± 78662.43173968921   115554      
NonBlockingThreadPool: join, two levels                                2149.0 ns     ± 144611.11773139169  30425       
NonBlockingThreadPool: join, three levels                              5049.0 ns     ± 188450.6773907647   15157       
NonBlockingThreadPool: join, four levels, three on thread pool thread  5951.0 ns     ± 229270.51587738466  10255       
NonBlockingThreadPool: parallel for, one level                         4919427.5 ns  ± 887590.5386061076   302         
NonBlockingThreadPool: parallel for, two levels                        4327151.0 ns  ± 855302.611386676    313         

+132 -534

0 comment

6 changed files

pr created time in 5 days

create branch saeta/penguin

branch : threadpool-cleanup

created branch time in 5 days

delete branch saeta/penguin

delete branch : refactor-to-class

delete time in 5 days

PR closed saeta/penguin

Refactor ComputeThreadPool to a class

This was an attempt to refactor ComputeThreadPool into a class. It ended up slowing things down by 2x, so this should not be merged (as-is), but making this PR for posterity.

+454 -223

0 comment

12 changed files

saeta

pr closed time in 5 days

PR opened saeta/penguin

Refactor ComputeThreadPool to a class

This was an attempt to refactor ComputeThreadPool into a class. It ended up slowing things down by 2x, so this should not be merged (as-is), but making this PR for posterity.

+454 -223

0 comment

12 changed files

pr created time in 5 days

create branch saeta/penguin

branch : refactor-to-class

created branch time in 5 days

PR opened saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

+59 -43

0 comment

2 changed files

pr created time in 5 days
