Pinned repositories:

- apple/swift (51789 stars): The Swift Programming Language
- fast.ai early development experiments
- A Swift library to benchmark code snippets.
- fastai/swiftai (393 stars): Swift for TensorFlow's high-level API, modeled after fastai
- The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. This fork is used to manage Apple's stable releases of Clang as well as support the Swift project.
- Find shape errors before you run your code!
- A suite of libraries for data science & high-performance computation in Swift
- A source snapshot of a sample Play! app with SBT packaging.
- saeta/CS-249A-Reimplementation-In-Scala (10 stars): A reimplementation of an 8000-line C++ project for CS 249A @ Stanford in Scala, to learn Scala and to experiment with ideas in software engineering.

issue comment: deepmind/open_spiel

Breakthrough Swift implementation is slow

Thanks @sbodenstein for noticing and inquiring about this! I've put together https://github.com/deepmind/open_spiel/pull/230 which helps a bit. I suspect there's a fair bit more room to improve Breakthrough's performance.

Please do let me know if you'd be interested in further optimizations!

comment created time in 4 hours

PR opened deepmind/open_spiel

This PR defines some benchmarks for Swift OpenSpiel, and modestly improves the performance of Swift's `Breakthrough`.

On my machine, performance improves by ~1.65x based on the benchmarks:

Before:

```
name                       time         std          iterations
--------------------------------------------------------------
random game: Breakthrough  173380.0 ns  ± 24.30 %          7276
```

After:

```
name                       time         std          iterations
--------------------------------------------------------------
random game: Breakthrough  104894.5 ns  ± 24.74 %         13624
```

I suspect future optimizations are quite possible. In particular, flattening out `State.board` from `Array<Optional<BreakthroughPlayer>>` to `Array<FlattenedEnum>` might help, but I have not tried that yet.

Issue: https://github.com/deepmind/open_spiel/issues/228
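As a sketch of the flattening idea: `BreakthroughPlayer` matches the type named in the PR, while `FlattenedCell` is a hypothetical name for the flattened representation. Folding "no piece" into the same enum as the player cases removes a layer of `Optional` unwrapping in hot loops; whether it actually wins depends on Swift's enum layout (the compiler often packs `Optional` payloads into spare bits), so this is a hypothesis to benchmark, not a guaranteed improvement.

```swift
// Existing representation: each board cell wraps the player enum in an Optional.
enum BreakthroughPlayer { case white, black }

// Hypothetical flattened representation: "empty" is a first-class case, so a
// cell is one small enum instead of an Optional around an enum.
enum FlattenedCell: UInt8 { case empty, white, black }

extension FlattenedCell {
  /// Folds `nil` ("no piece") into the same discriminant as the player cases.
  init(_ player: BreakthroughPlayer?) {
    switch player {
    case .none: self = .empty
    case .some(.white): self = .white
    case .some(.black): self = .black
    }
  }
}

let board: [BreakthroughPlayer?] = [nil, .white, .black]
let flattened = board.map { FlattenedCell($0) }
```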

pr created time in 4 hours

pull request comment: saeta/penguin

Remove the `GraphDistanceMeasure` protocol, as it is no longer needed.

Note the base branch! This is a follow-on to https://github.com/saeta/penguin/pull/45, so for now I've pointed the base branch to `redo-heaps`. But once https://github.com/saeta/penguin/pull/45 goes in, I'll merge `master` into this branch, and switch the PR to base off of `master`.

comment created time in 12 hours

PR opened saeta/penguin

This is a follow-on to https://github.com/saeta/penguin/pull/45.

pr created time in 12 hours

push event: saeta/penguin

commit sha ec71a0162b5eb5c7e5d2d60f4b00b7aff77a169f

Update comments & simplify API for Dijkstra's search.

push time in 14 hours

push event: saeta/penguin

commit sha f0a1c6863dd9ac69d923a62559636bc29b29c146

Add tests & fix-up existing tests to work with the new API.

push time in 15 hours

Pull request review comment: saeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+/// Returns an array of all positive integers that are co-prime with `n`.
+///
+/// Two numbers are co-prime if their GCD is 1.
+internal func makeCoprimes(upTo n: Int) -> [Int] {
+  var coprimes = [Int]()
+  for i in 1...n {
+    var a = i
+    var b = n
+    // If GCD(a, b) == 1, then a and b are coprimes.
+    while b != 0 {
+      let tmp = a
+      a = b
+      b = tmp % b
+    }
+    if a == 1 { coprimes.append(i) }
+  }
+  return coprimes
+}
+
+/// Reduce `lhs` into `[0, size)`.
```

Oh, apologies for missing that, and for only seeing your comment now; https://github.com/saeta/penguin/pull/46 for the fix (LMK if I missed something else)!

For context, I often choose to re-type the suggestions out. I find it helps me internalize them and think about them more deeply. I will try and be more careful about not accidentally truncating them in the future!

comment created time in a day

push event: saeta/penguin

commit sha 1fe092ac00a5691d8f5892df83f12ac2884393c8

Rewrite the heap and related priority queue structures. This change moves the heap-related algorithms to operate generically on `Collection` (and its various refinements). In doing so, the algorithms become far more general and reusable. Further, we re-build the PriorityQueue type, solving the indexing TODOs that plagued the previous implementation. Finally, we update all the rest of the code to use the new API.

push time in a day

PR opened saeta/penguin

This change moves the heap-related algorithms to operate generically on `Collection` (and its various refinements). In doing so, the algorithms become far more general and reusable.

Further, we re-build the `PriorityQueue` type, solving the indexing TODOs that plagued the previous implementation.

Finally, we update all the rest of the code to use the new API.
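A minimal sketch of what "heap algorithms generic over `Collection` refinements" can look like (illustrative names, not penguin's actual API): constraining an extension to `MutableCollection & RandomAccessCollection` lets the same sift-down work on arrays, slices, and other random-access storage.

```swift
extension MutableCollection where Self: RandomAccessCollection {
  /// Restores the max-heap property for the subtree rooted at `index`,
  /// assuming both child subtrees already satisfy it.
  mutating func siftDown(from index: Index, by areInIncreasingOrder: (Element, Element) -> Bool) {
    var parent = index
    while true {
      let parentOffset = distance(from: startIndex, to: parent)
      let leftOffset = 2 * parentOffset + 1
      guard leftOffset < count else { return }
      var candidate = self.index(startIndex, offsetBy: leftOffset)
      let right = self.index(after: candidate)
      // Pick the larger of the two children.
      if right < endIndex, areInIncreasingOrder(self[candidate], self[right]) {
        candidate = right
      }
      guard areInIncreasingOrder(self[parent], self[candidate]) else { return }
      swapAt(parent, candidate)
      parent = candidate
    }
  }
}

var values = [3, 9, 2, 7, 1]
// Build a max-heap bottom-up by sifting down each internal node.
for i in stride(from: values.count / 2 - 1, through: 0, by: -1) {
  values.siftDown(from: i, by: <)
}
```

Because the algorithm is expressed against index operations rather than `Array` specifics, the same code serves a heap, a priority queue built on top of it, or an in-place heapsort.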

pr created time in a day

pull request comment: google/swift-benchmark

hehe; no worries! (@dabrahams fixed up one of my warnings a while back, so I can now pay it forward.)

comment created time in a day

PR opened google/swift-benchmark

Previously, a warning was emitted when compiling in non-release builds. This change fixes the warning.

pr created time in 2 days

push event: saeta/penguin

commit sha 64d233521e0334c88a59e2aa27c4de1658a1a5a4

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co… (#32)

* ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/confusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

1. Removes the Naive thread pool implementation from `PJoin`.
2. Removes the unnecessary `TypedComputeThreadPool` protocol refinement.
3. Removes the badly implemented extensions that implemented `parallelFor` in terms of `join`. (Note: `NonBlockingThreadPool`'s `parallelFor` is re-implemented in terms of `join` again, but better.)
4. Removes use of `rethrows`, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case. Instead, methods are explicitly overloaded with throwing and non-throwing variants.
5. Adds a vectorized API (which improves performance).

Performance measurements:

After:

```
name                                                                   time          std                      iterations
------------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns      ± 70289.84998218225      127457
NonBlockingThreadPool: join, two levels                                2107.0 ns     ± 131041.5070696377      31115
NonBlockingThreadPool: join, three levels                              4960.0 ns     ± 178122.9562964306      15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns     ± 224021.47900401088     13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns    ± 203689.69689780468     7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns   ± 642136.0139757036      1390
```

Before:

```
name                                                                   time          std                      iterations
------------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 728.0 ns      ± 78662.43173968921      115554
NonBlockingThreadPool: join, two levels                                2149.0 ns     ± 144611.11773139169     30425
NonBlockingThreadPool: join, three levels                              5049.0 ns     ± 188450.6773907647      15157
NonBlockingThreadPool: join, four levels, three on thread pool thread  5951.0 ns     ± 229270.51587738466     10255
NonBlockingThreadPool: parallel for, one level                         4919427.5 ns  ± 887590.5386061076      302
NonBlockingThreadPool: parallel for, two levels                        4327151.0 ns  ± 855302.611386676       313
```

Co-authored-by: Dave Abrahams <dabrahams@google.com>

commit sha e43e285bbbd8e2960b5c2645905c9c92a8a53294

Tuples (#28) Thanks to @saeta for suggestions and fixes. Co-authored-by: Brennan Saeta <brennan.saeta@gmail.com>

commit sha b5ab4b60429b74ac0bd1891ade80ac1755172ed2

Alphabetize test suites. (#38) * Alphabetize test suites. Co-authored-by: Dave Abrahams <dabrahams@google.com>

commit sha b450bc7f0d4a513d0a7140ccb17dacfade4285d5

Merge branch 'master' into unmanaged-buffer

push time in 2 days

pull request comment: saeta/penguin

Ensure grain size is >= 1 & reduce duplication in `parallelFor`.

https://github.com/saeta/penguin/issues/33

comment created time in 2 days

pull request comment: saeta/penguin

Copies references to data structures to avoid extra ARC traffic.

https://github.com/saeta/penguin/issues/33

comment created time in 2 days

pull request comment: saeta/penguin

Implement `UnmanagedBuffer` and switch `TaskDeque` to use it.

https://github.com/saeta/penguin/issues/33

comment created time in 2 days

PR opened saeta/penguin

Note: this change (by itself) does not reduce ARC traffic, but in concert with `UnmanagedBuffer` (#41), we see a ~15x performance improvement in the `parallelFor` benchmark.
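A minimal sketch of the pattern this PR's title describes ("copies references to data structures to avoid extra ARC traffic"); the types here are illustrative stand-ins, not penguin's actual code. Hoisting a class reference out of repeated `self.`-property accesses lets the compiler retain it once instead of once per iteration.

```swift
final class TaskList {
  var items: [Int] = []
}

final class Worker {
  let tasks = TaskList()

  func enqueueMany(_ count: Int) {
    let tasks = self.tasks  // copy the reference once, up front
    for i in 0..<count {
      tasks.items.append(i)  // no repeated `self.tasks` access inside the hot loop
    }
  }
}

let worker = Worker()
worker.enqueueMany(1000)
```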

pr created time in 2 days

push event: saeta/penguin


commit sha 7324d367f5e3b5ebc0a079720e6b00224fb31701

Merge remote-tracking branch 'origin/master' into nicer-numbers

push time in 2 days

push event: saeta/penguin

commit sha 9dbb65c7a7232c92e21cfb9bdd10719c8749063a

Respond to comments.

push time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+import PenguinStructures
+import XCTest
+
+final class NumberOperationsTests: XCTestCase {
```

Fixed! Including some wonky ones. LMK if you have some other good suggestions!

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
 public class NonBlockingThreadPool<Environment: ConcurrencyPlatform>: ComputeThr
     let totalThreadCount = threadCount + externalFastPathThreadCount
     self.totalThreadCount = totalThreadCount
     self.externalFastPathThreadCount = externalFastPathThreadCount
-    self.coprimes = positiveCoprimes(totalThreadCount)
+    self.coprimes = Array(totalThreadCount.positiveCoprimes)
```

Fair enough, and to take it one step better, I've added a doc comment. :-)

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")
+    self.n = n
+  }
+
+  /// Returns an iterator that incrementally computes co-primes relative to `n`.
+  public func makeIterator() -> Iterator {
+    Iterator(n: n, i: 0)
+  }
+
+  /// Iteratively computes co-primes relative to `n` starting from 1.
+  public struct Iterator: IteratorProtocol {
+    /// The number we are finding co-primes relative to.
+    let n: Number
+    /// A sequence counter representing one less than the next candidate to try.
+    var i: Number
+
+    /// Returns the next co-prime, or nil if all co-primes have been found.
+    mutating public func next() -> Number? {
+      while (i+1) < n {
+        i += 1
+        if greatestCommonDivisor(i, n) == 1 { return i }
+      }
+      return nil
+    }
+  }
+}
+
+/// Returns the greatest common divisor between two numbers.
+///
+/// This implementation uses Euclid's algorithm.
+// TODO: Switch to the Binary GCD algorithm which avoids expensive modulo operations.
+public func greatestCommonDivisor<Number: BinaryInteger>(_ a: Number, _ b: Number) -> Number {
```

👍

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")
+    self.n = n
+  }
+
+  /// Returns an iterator that incrementally computes co-primes relative to `n`.
+  public func makeIterator() -> Iterator {
+    Iterator(n: n, i: 0)
+  }
+
+  /// Iteratively computes co-primes relative to `n` starting from 1.
+  public struct Iterator: IteratorProtocol {
+    /// The number we are finding co-primes relative to.
+    let n: Number
+    /// A sequence counter representing one less than the next candidate to try.
+    var i: Number
+
+    /// Returns the next co-prime, or nil if all co-primes have been found.
+    mutating public func next() -> Number? {
+      while (i+1) < n {
+        i += 1
+        if greatestCommonDivisor(i, n) == 1 { return i }
+      }
+      return nil
+    }
+  }
+}
+
+/// Returns the greatest common divisor between two numbers.
+///
+/// This implementation uses Euclid's algorithm.
+// TODO: Switch to the Binary GCD algorithm which avoids expensive modulo operations.
```

Fixed up by:

- Good point; updated doc comment.
- Removed the documented algorithm.
- Documented the complexity.
- Appropriately handle negative numbers (because gcd is defined for them).
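For reference, the binary GCD mentioned in the TODO under review replaces Euclid's modulo with shifts and subtraction; a minimal sketch (not penguin's implementation, and specialized to `Int` rather than generic `BinaryInteger`):

```swift
// Binary GCD (Stein's algorithm): factors of two are stripped with shifts,
// and the remaining odd values are reduced by subtraction instead of `%`.
func binaryGCD(_ a: Int, _ b: Int) -> Int {
  var (a, b) = (abs(a), abs(b))
  if a == 0 { return b }
  if b == 0 { return a }
  let shift = (a | b).trailingZeroBitCount  // common factors of 2
  a >>= a.trailingZeroBitCount              // make a odd
  repeat {
    b >>= b.trailingZeroBitCount            // make b odd
    if a > b { swap(&a, &b) }
    b -= a                                  // b is now even (odd - odd)
  } while b != 0
  return a << shift                         // restore the shared powers of 2
}
```

Taking `abs` up front also matches the "gcd is defined for negative numbers" point in the list above.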

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")
```

Good catch! I think the right answer from a mathematical perspective is to convert to positive, and go from there. (Coprimes are symmetric around 0, so I don't think we need to define the corresponding negative coprimes sequence as a type, but it might be good to define a `.lazy.map { $0 * -1 }` extension.) WDYT?
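A sketch of that negation idea: derive the negative coprimes lazily from the positive ones rather than defining a separate sequence type. The `gcd` helper and the filter below stand in for penguin's `positiveCoprimes` sequence; they are illustrative, not the library's API.

```swift
// Euclid's algorithm, used to test coprimality.
func gcd(_ a: Int, _ b: Int) -> Int {
  var (a, b) = (a, b)
  while b != 0 { (a, b) = (b, a % b) }
  return a
}

let n = 10
// Stand-in for `n.positiveCoprimes`: positive integers below n with GCD 1.
let positiveCoprimes = (1..<n).filter { gcd($0, n) == 1 }
// The negative coprimes fall out for free, lazily, with no new type.
let negativeCoprimes = positiveCoprimes.lazy.map { $0 * -1 }
```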

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
```

Rather than making it conform to `Collection`, I'd instead make it a `Sequence`. Concretely, we know there are infinitely many primes, and there's no reason why we should stop at the number itself, so I propose that this actually be an (infinite) sequence instead.

I also dislike the collection protocol here because the size of the collection is unknown until it's been computed, which reduces the value of it being a constant storage abstraction.

I like `Domain`, but I dislike `limit`, and think `n` (or perhaps `b`) is more appropriate.

Edit: I'm fine with `target` (as you suggested), and have made the corresponding edits, but I'm still not fully convinced this is the right choice...

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
+public struct PositiveCoprimes<Number: BinaryInteger>: Sequence {
+  /// The number to find co-primes relative to.
+  let n: Number
+
+  /// Constructs a `PositiveCoprimes` sequence of numbers co-prime relative to `n`.
+  internal init(_ n: Number) {
+    precondition(n > 0, "\(n) doees not have defined positive co-primes.")
+    self.n = n
+  }
+
+  /// Returns an iterator that incrementally computes co-primes relative to `n`.
+  public func makeIterator() -> Iterator {
+    Iterator(n: n, i: 0)
+  }
+
+  /// Iteratively computes co-primes relative to `n` starting from 1.
+  public struct Iterator: IteratorProtocol {
+    /// The number we are finding co-primes relative to.
+    let n: Number
+    /// A sequence counter representing one less than the next candidate to try.
+    var i: Number
+
+    /// Returns the next co-prime, or nil if all co-primes have been found.
+    mutating public func next() -> Number? {
+      while (i+1) < n {
+        i += 1
+        if greatestCommonDivisor(i, n) == 1 { return i }
+      }
+      return nil
+    }
+  }
+}
```

I like a lot of these suggestions. I've had to tweak them to apply to the new definition of the PositiveCoprimes sequence.

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
```

👍

comment created time in 2 days

Pull request review comment: saeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+extension BinaryInteger {
+  /// Returns a sequence of the positive integers that are co-prime with `self`.
+  ///
+  /// Definition: Two numbers are co-prime if their GCD is 1.
+  public var positiveCoprimes: PositiveCoprimes<Self> { PositiveCoprimes(self) }
+}
+
+/// A sequence of numbers that are co-prime with `n`, up to `n`.
```

Edited text based on your input and in light of the discussion below about whether this should even be limited to smaller-than-`n`.

comment created time in 2 days

push event: saeta/penguin

commit sha b5ab4b60429b74ac0bd1891ade80ac1755172ed2

Alphabetize test suites. (#38) * Alphabetize test suites. Co-authored-by: Dave Abrahams <dabrahams@google.com>

push time in 2 days

push event: saeta/penguin

commit sha 675e1d6205edba6b637e416bf230f899024c1dae

Update Tests/PenguinStructuresTests/XCTestManifests.swift Co-authored-by: Dave Abrahams <dabrahams@google.com>

push time in 2 days

push event: saeta/penguin

commit sha 9e31153761351c76b153b9893808e447998e6e8d

Implement `UnmanagedBuffer` and switch `TaskDeque` to use it.

Previously, `TaskDeque` was implemented in terms of `ManagedBuffer`. While `ManagedBuffer` implements the semantics we'd like, it is implemented as a class. This can induce a significant amount of reference counting traffic, especially when stored in `Array`s.

`UnmanagedBuffer` implements a similar interface to `ManagedBuffer`, but instead uses manual pointer allocation and management. This allows us to avoid all reference counting traffic, at the cost of requiring explicit destruction.

Switching `TaskDeque` from `ManagedBuffer` to `UnmanagedBuffer` yields between a 2x and 6x performance improvement for key workloads that stress the `TaskDeque` data structure within the `NonBlockingThreadPool`.

Below are performance numbers across 2 machines & operating systems demonstrating performance improvements. Note: because this change has been extracted from a stack of related performance improvements, if you benchmark this PR itself, you will not see the expected performance improvements. Instead, this PR has been separated out to facilitate easier reviewing.

Benchmark numbers on machine A:
-------------------------------

Before (from previous commit):

```
name                                                                   time          std                      iterations
------------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns      ± 70289.84998218225      127457
NonBlockingThreadPool: join, two levels                                2107.0 ns     ± 131041.5070696377      31115
NonBlockingThreadPool: join, three levels                              4960.0 ns     ± 178122.9562964306      15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns     ± 224021.47900401088     13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns    ± 203689.69689780468     7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns   ± 642136.0139757036      1390
```

After:

```
name                                                                   time          std                      iterations
------------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 429.0 ns      ± 59095.36128834737      223050
NonBlockingThreadPool: join, two levels                                1270.0 ns     ± 101587.48601579959     64903
NonBlockingThreadPool: join, three levels                              3098.0 ns     ± 165407.1669656578      28572
NonBlockingThreadPool: join, four levels, three on thread pool thread  3990.5 ns     ± 227217.34017343252     10000
NonBlockingThreadPool: parallel for, one level                         16853.0 ns    ± 260015.39296821563     8660
NonBlockingThreadPool: parallel for, two levels                        563926.0 ns   ± 609298.6358076902      2189
```

Benchmark numbers from machine B:
---------------------------------

Before:

```
name                                                                   time          std                      iterations
------------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 3022.0 ns     ± 366686.3050127019      21717
NonBlockingThreadPool: join, two levels                                13313.5 ns    ± 550429.476815564       5970
NonBlockingThreadPool: join, three levels                              39009.5 ns    ± 716172.9687807652      3546
NonBlockingThreadPool: join, four levels, three on thread pool thread  341631.0 ns   ± 767483.9227743072      2367
NonBlockingThreadPool: parallel for, one level                         404375.0 ns   ± 590178.6724299589      3123
NonBlockingThreadPool: parallel for, two levels                        1000872.0 ns  ± 1592704.2766365155     805
```

After:

```
name                                                                   time          std                      iterations
------------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 749.0 ns      ± 174096.69284101788     91247
NonBlockingThreadPool: join, two levels                                12046.5 ns    ± 414670.5686344325      5920
NonBlockingThreadPool: join, three levels                              46975.0 ns    ± 543858.2306554643      3561
NonBlockingThreadPool: join, four levels, three on thread pool thread  559837.0 ns   ± 591477.1893574063      2795
NonBlockingThreadPool: parallel for, one level                         66446.0 ns    ± 627245.5098742851      2236
NonBlockingThreadPool: parallel for, two levels                        1668739.0 ns  ± 1536323.375783659      765
```

push time in 2 days

PR opened saeta/penguin

Previously, `TaskDeque` was implemented in terms of `ManagedBuffer`. While `ManagedBuffer` implements the semantics we'd like, it is implemented as a class. This can induce a significant amount of reference counting traffic, especially when stored in `Array`s.

`UnmanagedBuffer` implements a similar interface to `ManagedBuffer`, but instead uses manual pointer allocation and management. This allows us to avoid all reference counting traffic, at the cost of requiring explicit destruction.

Switching `TaskDeque` from `ManagedBuffer` to `UnmanagedBuffer` yields between a 2x and 6x performance improvement for key workloads that stress the `TaskDeque` data structure within the `NonBlockingThreadPool`.

Below are performance numbers across two machines & operating systems demonstrating the performance improvements. Note: because this change has been extracted from a stack of related performance improvements, if you benchmark this PR itself, you will not see the expected performance improvements. Instead, this PR has been separated out to facilitate easier reviewing.
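For readers unfamiliar with the pattern, here is a minimal sketch of a manually managed header-plus-elements buffer (simplified and illustrative; not penguin's actual `UnmanagedBuffer`). Because the wrapper is a struct holding raw pointers rather than a class, copying it around (including into `Array`s) involves no retains or releases; the trade-off is that the caller must call `deallocate()` exactly once.

```swift
/// A manually managed allocation holding one `Header` plus `capacity` elements.
/// Unlike `ManagedBuffer` (a class reclaimed by ARC), nothing here is
/// reference counted; ownership is the caller's responsibility.
struct UnmanagedBufferSketch<Header, Element> {
  let header: UnsafeMutablePointer<Header>
  let elements: UnsafeMutablePointer<Element>
  let capacity: Int

  init(header: Header, capacity: Int, initialElement: Element) {
    self.capacity = capacity
    self.header = UnsafeMutablePointer<Header>.allocate(capacity: 1)
    self.header.initialize(to: header)
    self.elements = UnsafeMutablePointer<Element>.allocate(capacity: capacity)
    self.elements.initialize(repeating: initialElement, count: capacity)
  }

  subscript(i: Int) -> Element {
    get { elements[i] }
    nonmutating set { elements[i] = newValue }
  }

  /// Must be called exactly once; there is no deinit to do it for us.
  func deallocate() {
    elements.deinitialize(count: capacity)
    elements.deallocate()
    header.deinitialize(count: 1)
    header.deallocate()
  }
}

let buf = UnmanagedBufferSketch(header: "deque-state", capacity: 4, initialElement: 0)
buf[0] = 42
// ... use the buffer; copying `buf` produces no ARC traffic ...
buf.deallocate()
```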

## Benchmark numbers on machine A:

Before (from previous commit):

```
name                                                                   time          std                      iterations
------------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns      ± 70289.84998218225      127457
NonBlockingThreadPool: join, two levels                                2107.0 ns     ± 131041.5070696377      31115
NonBlockingThreadPool: join, three levels                              4960.0 ns     ± 178122.9562964306      15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns     ± 224021.47900401088     13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns    ± 203689.69689780468     7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns   ± 642136.0139757036      1390
```

After:

```
name time std iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level 429.0 ns ± 59095.36128834737 223050
NonBlockingThreadPool: join, two levels 1270.0 ns ± 101587.48601579959 64903
NonBlockingThreadPool: join, three levels 3098.0 ns ± 165407.1669656578 28572
NonBlockingThreadPool: join, four levels, three on thread pool thread 3990.5 ns ± 227217.34017343252 10000
NonBlockingThreadPool: parallel for, one level 16853.0 ns ± 260015.39296821563 8660
NonBlockingThreadPool: parallel for, two levels 563926.0 ns ± 609298.6358076902 2189
```

## Benchmark numbers from machine B:

Before:

```
name time std iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level 3022.0 ns ± 366686.3050127019 21717
NonBlockingThreadPool: join, two levels 13313.5 ns ± 550429.476815564 5970
NonBlockingThreadPool: join, three levels 39009.5 ns ± 716172.9687807652 3546
NonBlockingThreadPool: join, four levels, three on thread pool thread 341631.0 ns ± 767483.9227743072 2367
NonBlockingThreadPool: parallel for, one level 404375.0 ns ± 590178.6724299589 3123
NonBlockingThreadPool: parallel for, two levels 1000872.0 ns ± 1592704.2766365155 805
```

After:

```
name time std iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level 749.0 ns ± 174096.69284101788 91247
NonBlockingThreadPool: join, two levels 12046.5 ns ± 414670.5686344325 5920
NonBlockingThreadPool: join, three levels 46975.0 ns ± 543858.2306554643 3561
NonBlockingThreadPool: join, four levels, three on thread pool thread 559837.0 ns ± 591477.1893574063 2795
NonBlockingThreadPool: parallel for, one level 66446.0 ns ± 627245.5098742851 2236
NonBlockingThreadPool: parallel for, two levels 1668739.0 ns ± 1536323.375783659 765
```

pr created time in 2 days

Pull request review comment saeta/penguin

Line under review:

```swift
      return "Tuple(\(String(reflecting:head )))"
```

```
return "Tuple(\(String(reflecting: head)))"
```

comment created time in 2 days
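For context on the suggestion above (beyond the spacing fix), `String(reflecting:)` produces the debug description, which is what puts quotes around string elements in the generated `description`:

```swift
// String(reflecting:) uses the debug description, so String values come
// back quoted; String(describing:) uses the plain description.
let quoted = String(reflecting: "foo")   // "\"foo\"" (with quotes)
let plain = String(describing: "foo")    // "foo" (no quotes)
```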

Pull request review comment saeta/penguin

Line under review:

```swift
    // This effectively tests Hashable too; we know the conformance synthesizer
```

I'll take your word on this one...

comment created time in 2 days

Pull request review comment saeta/penguin

Lines under review:

```swift
  static var allTests = [
    ("test_mapReduce", test_mapReduce),
  ]
```

```
static var allTests = [
("test_mapReduce", test_mapReduce),
("test_head", test_head),
("test_tail", test_tail),
("test_count", test_count),
("test_DefaultInitializable", test_DefaultInitializable),
("test_Equatable", test_Equatable),
("test_Comparable", test_Comparable),
("test_CustomStringConvertible", test_CustomStringConvertible),
("test_conveniences", test_conveniences),
]
```

comment created time in 2 days

Pull request review comment saeta/penguin

Line under review:

```swift
public typealias Tuple0 = Empty
```

Might we want to add doc comments to these in some form?

comment created time in 2 days
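For readers following along, the recursive shape of the types under review can be sketched independently of the PR (simplified here; penguin's real `Tuple` carries more conformances and variadic-style convenience initializers):

```swift
// Minimal HList-style algebraic product mirroring the reviewed design:
// a Tuple is a head element plus a (possibly Empty) tail.
protocol TupleProtocol {
  static var count: Int { get }
}

struct Empty: TupleProtocol {
  static var count: Int { 0 }
}

struct Tuple<Head, Tail: TupleProtocol>: TupleProtocol {
  var head: Head
  var tail: Tail
  // The element count is computed recursively at the type level.
  static var count: Int { Tail.count + 1 }
}

// Tuple("foo", 0.0) in the PR desugars to nested pairs ending in Empty:
let pair = Tuple(head: "foo", tail: Tuple(head: 0.0, tail: Empty()))
```

Because `count` recurses over types rather than values, it is a compile-time property of each `Tuple` instantiation.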

push event saeta/penguin

commit sha 64d233521e0334c88a59e2aa27c4de1658a1a5a4

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co… (#32)

* ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/confusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

1. Removes the Naive thread pool implementation from `PJoin`.
2. Removes the unnecessary `TypedComputeThreadPool` protocol refinement.
3. Removes the badly implemented extensions that implemented `parallelFor` in terms of `join`. (Note: `NonBlockingThreadPool`'s `parallelFor` is re-implemented in terms of `join` again, but better.)
4. Removes use of `rethrows`, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case. Instead, methods are explicitly overloaded with throwing and non-throwing variants.
5. Adds a vectorized API (which improves performance).

Performance measurements:

After:

```
name                                                                   time         std                   iterations
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225   127457
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377   31115
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306   15849
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088  13763
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468  7581
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036   1390
```

Before:

```
name                                                                   time          std                   iterations
NonBlockingThreadPool: join, one level                                 728.0 ns      ± 78662.43173968921   115554
NonBlockingThreadPool: join, two levels                                2149.0 ns     ± 144611.11773139169  30425
NonBlockingThreadPool: join, three levels                              5049.0 ns     ± 188450.6773907647   15157
NonBlockingThreadPool: join, four levels, three on thread pool thread  5951.0 ns     ± 229270.51587738466  10255
NonBlockingThreadPool: parallel for, one level                         4919427.5 ns  ± 887590.5386061076   302
NonBlockingThreadPool: parallel for, two levels                        4327151.0 ns  ± 855302.611386676    313
```

Co-authored-by: Dave Abrahams <dabrahams@google.com>

push time in 2 days

PR merged saeta/penguin

…nfusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

- Removes the Naive thread pool implementation from `PJoin`.
- Removes the unnecessary `TypedComputeThreadPool` protocol refinement.
- Removes the badly implemented extensions that implemented `parallelFor` in terms of `join`.
- Removes use of `rethrows`, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case.
- Adds a vectorized API (which improves performance).

Performance measurements:

After:

```
name time std iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level 700.0 ns ± 70289.84998218225 127457
NonBlockingThreadPool: join, two levels 2107.0 ns ± 131041.5070696377 31115
NonBlockingThreadPool: join, three levels 4960.0 ns ± 178122.9562964306 15849
NonBlockingThreadPool: join, four levels, three on thread pool thread 5893.0 ns ± 224021.47900401088 13763
NonBlockingThreadPool: parallel for, one level 22420.0 ns ± 203689.69689780468 7581
NonBlockingThreadPool: parallel for, two levels 500985.5 ns ± 642136.0139757036 1390
```

Before:

```
name time std iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level 728.0 ns ± 78662.43173968921 115554
NonBlockingThreadPool: join, two levels 2149.0 ns ± 144611.11773139169 30425
NonBlockingThreadPool: join, three levels 5049.0 ns ± 188450.6773907647 15157
NonBlockingThreadPool: join, four levels, three on thread pool thread 5951.0 ns ± 229270.51587738466 10255
NonBlockingThreadPool: parallel for, one level 4919427.5 ns ± 887590.5386061076 302
NonBlockingThreadPool: parallel for, two levels 4327151.0 ns ± 855302.611386676 313
```

pr closed time in 2 days

issue opened saeta/penguin

Review Compute Thread Pool API

Context: https://github.com/saeta/penguin/pull/32 (Deferred until after performance issues from https://github.com/saeta/penguin/issues/33 are resolved.)

CC @dabrahams

created time in 3 days

push event saeta/penguin

commit sha ac6afb3010aedb7fadeb934a074754abb90e827c

Fix up tests.

push time in 3 days

push event saeta/penguin

commit sha ad6eb9b5af391635c924439d9ead18ccaca24735

Remove stale comment. (#29)

The functionality referred to in the comment has been implemented in the `PenguinParallelWithFoundation` package, which leverages the `Foundation` dependency to query the number of available processors.

commit sha 9dd365d5e4a37efce149dfad42f339dfaf4d81ed

ComputeThreadPool cleanup (4/n): move NonBlockingSpinningState to its own file. (#34)

This change moves `NonBlockingSpinningState` to its own file to reduce the size of the `NonBlockingThreadPool.swift` file.

commit sha 4c41ec8912d9e0691a4b7a0943fc6c8cb13de209

Factor out number operations from NonBlockingThreadPool.swift (#30)

The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

commit sha 43f726bc5870426007197a3692190b32a10b9d1c

Merge branch 'master' into threadpool-cleanup

commit sha 590ebd0eee76430ca3d6caf85dd022b5f8d5dba7

Improve the code based on feedback.

commit sha a91c527a81eb8bfa094a1d3686296c962346bb3a

Merge branch 'threadpool-cleanup' of github.com:saeta/penguin into threadpool-cleanup

push time in 3 days

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
 public struct InlineComputeThreadPool: TypedComputeThreadPool {
   /// Dispatch `fn` to be run at some point in the future (immediately).
   ///
   /// Note: this implementation just executes `fn` immediately.
-  public func dispatch(_ fn: (Self) -> Void) {
-    fn(self)
+  public func dispatch(_ fn: () -> Void) {
```

I didn't think you needed to add a completion handler at this level of abstraction. (Concretely, if you want to run something after `fn` completes, just wrap `fn` in a closure yourself, and put your code right there!) You can build up whatever you want on top, with whatever arbitrary synchronization primitives you'd like. (Concretely, I'm intentionally avoiding binding to any particular lock implementation.)

comment created time in 3 days

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
+  /// Convert a non-vectorized operation to a vectorized operation.
```

Hmmm, I thought that comments on extension methods that are implementations of methods on the protocols themselves don't show up in typical doc-generation, so I tried to write something different & more specific here. I can certainly just copy-pasta the doc comment from the protocol method itself if you think that's more appropriate... :-)

That said, I've attempted to refine this a bit (in the same direction, however).

comment created time in 3 days
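The conversion being discussed — implementing the per-element overload in terms of the vectorized one — can be sketched with a serial stand-in for a pool (the chunk-based splitting here is hypothetical; a real pool divides the range across worker threads):

```swift
// (start, end, total): each call covers the contiguous range start..<end,
// and the calls together cover 0..<total exactly once.
typealias VectorizedBody = (Int, Int, Int) -> Void

// A serial stand-in for a thread pool's vectorized parallelFor.
func vectorizedFor(n: Int, chunk: Int, _ fn: VectorizedBody) {
  var start = 0
  while start < n {
    let end = min(start + chunk, n)
    fn(start, end, n)  // a real pool would dispatch chunks to workers
    start = end
  }
}

// The per-element form is recovered by looping over each chunk,
// mirroring the extension under review.
func elementwiseFor(n: Int, _ fn: (Int, Int) -> Void) {
  vectorizedFor(n: n, chunk: 4) { start, end, total in
    for i in start..<end { fn(i, total) }
  }
}
```

Batching work into chunks is what makes the vectorized form faster: the per-invocation dispatch overhead is amortized over many elements.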

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
+  /// Executes `a` and `b` optionally in parallel, and returns when both are complete.
+  ///
+  /// Note: this implementation simply executes them serially.
+  public func join(_ a: () -> Void, _ b: () -> Void) {
```

For context: I picked `join` as the typical term-of-art in this space. I'm not fully sold on `concurrently` yet, because `join` represents *optional* concurrency, which is important for performance at scale.

I think that it would be good to go over this API and think hard about naming & how the abstractions compose, but only once we understand the performance limitations & constraints. (Concretely, some of the (internal) abstractions are being re-written due to performance limitations in the current structure of things.)

comment created time in 3 days
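The "optional concurrency" point is worth spelling out: a conforming implementation may legally run both closures serially, and callers cannot tell the difference. A minimal sketch, mirroring the serial `join` being reviewed (`InlinePool` and `parallelSum` are illustrative names, not penguin APIs):

```swift
// A pool whose join runs both closures serially. This satisfies the
// contract: join only promises that both closures have completed
// before it returns, not that they ran concurrently.
struct InlinePool {
  func join(_ a: () -> Void, _ b: () -> Void) {
    a()
    b()
  }
}

// Fork-join divide and conquer built on join; with a work-stealing
// pool the two halves could run on different threads.
func parallelSum(_ xs: ArraySlice<Int>, on pool: InlinePool) -> Int {
  if xs.count <= 2 { return xs.reduce(0, +) }
  let mid = xs.startIndex + xs.count / 2
  var left = 0
  var right = 0
  pool.join(
    { left = parallelSum(xs[..<mid], on: pool) },
    { right = parallelSum(xs[mid...], on: pool) })
  return left + right
}
```

Because concurrency is optional, a pool under load can execute the second closure inline rather than paying for a dispatch, which is exactly the performance property the comment alludes to.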

Pull request review comment saeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```swift
  /// The maximum amount of parallelism possible within this thread pool.
  var parallelism: Int { get }
```

lol had a similar thought after pondering the doc comment a bit further. 👍

comment created time in 3 days

Pull request review commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
@@ public protocol ComputeThreadPool { @@
 }

 extension ComputeThreadPool {
-  /// A default implementation of the non-throwing variation in terms of the throwing one.
-  public func join(_ a: () -> Void, _ b: () -> Void) {
-    withoutActuallyEscaping(a) { a in
-      let throwing: () throws -> Void = a
-      try! join(throwing, b)
-    }
-  }
-}

-/// Holds a parallel for function; this is used to avoid extra refcount overheads on the function
-/// itself.
-fileprivate struct ParallelForFunctionHolder {
-  var fn: ComputeThreadPool.ParallelForFunc
-}
-
-/// Uses `ComputeThreadPool.join` to execute `fn` in parallel.
-fileprivate func runParallelFor<C: ComputeThreadPool>(
-  pool: C,
-  start: Int,
-  end: Int,
-  total: Int,
-  fn: UnsafePointer<ParallelForFunctionHolder>
-) throws {
-  if start + 1 == end {
-    try fn.pointee.fn(start, total)
-  } else {
-    assert(end > start)
-    let distance = end - start
-    let midpoint = start + (distance / 2)
-    try pool.join(
-      { try runParallelFor(pool: pool, start: start, end: midpoint, total: total, fn: fn) },
-      { try runParallelFor(pool: pool, start: midpoint, end: end, total: total, fn: fn) })
-  }
-}
-
-extension ComputeThreadPool {
-  public func parallelFor(n: Int, _ fn: ParallelForFunc) rethrows {
-    try withoutActuallyEscaping(fn) { fn in
-      var holder = ParallelForFunctionHolder(fn: fn)
-      try withUnsafePointer(to: &holder) { holder in
-        try runParallelFor(pool: self, start: 0, end: n, total: n, fn: holder)
+  /// Convert a non-vectorized operation to a vectorized operation.
+  public func parallelFor(n: Int, _ fn: ParallelForFunction) {
+    parallelFor(n: n) { start, end, total in
+      for i in start..<end {
+        fn(i, total)
       }
     }
   }
-}

-/// Typed compute threadpools support additional sophisticated operations.
-public protocol TypedComputeThreadPool: ComputeThreadPool {
-  /// Submit a task to be executed on the threadpool.
-  ///
-  /// `pRun` will execute task in parallel on the threadpool and it will complete at a future time.
-  /// `pRun` returns immediately.
-  func dispatch(_ task: (Self) -> Void)
-
-  /// Run two tasks (optionally) in parallel.
-  ///
-  /// Fork-join parallelism allows for efficient work-stealing parallelism. The two non-escaping
-  /// functions will have finished executing before `pJoin` returns. The first function will execute on
-  /// the local thread immediately, and the second function will execute on another thread if resources
-  /// are available, or on the local thread if there are not available other resources.
-  func join(_ a: (Self) -> Void, _ b: (Self) -> Void)
-
-  /// Run two throwing tasks (optionally) in parallel; if one task throws, it is unspecified
-  /// whether the second task is even started.
-  ///
-  /// This is the throwing overloaded variation.
-  func join(_ a: (Self) throws -> Void, _ b: (Self) throws -> Void) throws
-}
-
-extension TypedComputeThreadPool {
-  /// Implement the non-throwing variation in terms of the throwing one.
-  public func join(_ a: (Self) -> Void, _ b: (Self) -> Void) {
-    withoutActuallyEscaping(a) { a in
-      let throwing: (Self) throws -> Void = a
-      // Implement the non-throwing in terms of the throwing implementation.
-      try! join(throwing, b)
+  /// Convert a non-vectorized operation to a vectorized operation.
+  public func parallelFor(n: Int, _ fn: ThrowingParallelForFunction) throws {
+    try parallelFor(n: n) { start, end, total in
+      for i in start..<end {
+        try fn(i, total)
+      }
     }
   }
 }

-extension TypedComputeThreadPool {
-  public func dispatch(_ fn: @escaping () -> Void) {
-    dispatch { _ in fn() }
-  }
-
-  public func join(_ a: () -> Void, _ b: () -> Void) {
-    join({ _ in a() }, { _ in b() })
-  }
-
-  public func join(_ a: () throws -> Void, _ b: () throws -> Void) throws {
-    try join({ _ in try a() }, { _ in try b() })
-  }
-}
-
 /// A `ComputeThreadPool` that executes everything immediately on the current thread.
 ///
 /// This threadpool implementation is useful for testing correctness, as well as avoiding context
 /// switches when a computation is designed to be parallelized at a coarser level.
-public struct InlineComputeThreadPool: TypedComputeThreadPool {
+public struct InlineComputeThreadPool: ComputeThreadPool {
```
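The default adapter in the hunk above — the per-index `parallelFor` expressed as a loop over each vectorized chunk — is language-neutral. This Python sketch (hypothetical names, not part of penguin) shows the same shape, with a trivial two-way split standing in for the thread pool's real vectorized implementation:

```python
def vectorized_parallel_for(n, fn):
    # Stand-in for a thread pool's vectorized parallelFor: it must invoke fn
    # with (start, end, total) chunks that exactly cover 0..<n. A real pool
    # would run the chunks on worker threads; here we just split in two.
    mid = n // 2
    fn(0, mid, n)
    fn(mid, n, n)

def parallel_for(n, fn):
    # The non-vectorized parallelFor expressed in terms of the vectorized one,
    # mirroring the Swift default implementation in the diff above.
    def body(start, end, total):
        for i in range(start, end):
            fn(i, total)
    vectorized_parallel_for(n, body)

seen = []
parallel_for(5, lambda i, total: seen.append((i, total)))
```

Each index in `0..<n` is visited exactly once, with `total` passed through unchanged.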

In terms of thread pools, there can be a number of different designs with different properties. In the same way that you can implement a random access collection in terms of a collection (just really inefficiently), I wanted to clearly distinguish what properties the thread pool has. Concretely, there are I/O-focused thread pools, where you can do blocking and/or non-blocking I/O. This thread pool abstraction is focused on compute-bound tasks, and is tuned / structured with APIs focused on that domain. Does that make sense?

Happy to ponder the names further... related work also uses `ConcurrentWorkQueue`.

comment created time in 3 days

Pull request review commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
@@ public protocol ComputeThreadPool { @@
   /// This is the throwing overload
   func join(_ a: () throws -> Void, _ b: () throws -> Void) throws

+  /// A function that can be executed in parallel.
+  ///
+  /// The first argument is the index of the invocation, and the second argument is the total number
+  /// of invocations.
+  typealias ParallelForFunction = (Int, Int) -> Void
+
   /// A function that can be executed in parallel.
   ///
   /// The first argument is the index of the copy, and the second argument is the total number of
   /// copies being executed.
-  typealias ParallelForFunc = (Int, Int) throws -> Void
+  typealias ThrowingParallelForFunction = (Int, Int) throws -> Void
+
+  /// A vectorized function that can be executed in parallel.
+  ///
+  /// The first argument is the start index for the vectorized operation, and the second argument
+  /// corresponds to the end of the range. The third argument contains the total size of the range.
+  typealias VectorizedParallelForFunction = (Int, Int, Int) -> Void
+
+  /// A vectorized function that can be executed in parallel.
+  ///
+  /// The first argument is the start index for the vectorized operation, and the second argument
+  /// corresponds to the end of the range. The third argument contains the total size of the range.
+  typealias ThrowingVectorizedParallelForFunction = (Int, Int, Int) throws -> Void

   /// Returns after executing `fn` `n` times.
   ///
   /// - Parameter n: The total times to execute `fn`.
-  func parallelFor(n: Int, _ fn: ParallelForFunc) rethrows
+  func parallelFor(n: Int, _ fn: ParallelForFunction)
+
+  /// Returns after executing `fn` an unspecified number of times, guaranteeing that `fn` has been
+  /// called with parameters that perfectly cover of the range `0..<n`.
+  ///
+  /// - Parameter n: The range of numbers `0..<n` to cover.
+  func parallelFor(n: Int, _ fn: VectorizedParallelForFunction)
+
+  /// Returns after executing `fn` `n` times.
+  ///
+  /// - Parameter n: The total times to execute `fn`.
+  func parallelFor(n: Int, _ fn: ThrowingParallelForFunction) throws
+
+  /// Returns after executing `fn` an unspecified number of times, guaranteeing that `fn` has been
+  /// called with parameters that perfectly cover of the range `0..<n`.
+  ///
+  /// - Parameter n: The range of numbers `0..<n` to cover.
+  func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws
+
   // TODO: Add this & a default implementation!
   // /// Returns after executing `fn` `n` times.
   // ///
   // /// - Parameter n: The total times to execute `fn`.
   // /// - Parameter blocksPerThread: The minimum block size to subdivide. If unspecified, a good
   // ///   value will be chosen based on the amount of available parallelism.
-  // func parallelFor(blockingUpTo n: Int, blocksPerThread: Int, _ fn: ParallelForFunc)
-  // func parallelFor(blockingUpTo n: Int, _ fn: ParallelForFunc)
+  // func parallelFor(blockingUpTo n: Int, blocksPerThread: Int, _ fn: ParallelForFunction)
+  // func parallelFor(blockingUpTo n: Int, _ fn: ParallelForFunction)

   /// The maximum amount of parallelism possible within this thread pool.
```

Took a quick pass, although this can probably be refined further.

comment created time in 3 days
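The vectorized overloads above promise that `fn` is called with parameters that perfectly cover `0..<n`. That contract is easy to check mechanically; a small Python harness (illustrative only, not part of the library) could validate any chunking strategy a pool implementation chooses:

```python
def check_perfect_cover(n, chunks):
    """Check the vectorized-parallelFor contract: the (start, end) ranges the
    pool passed to fn must be disjoint and their union must be exactly 0..<n."""
    covered = []
    for start, end in chunks:
        covered.extend(range(start, end))
    # Disjointness + full coverage: exactly n indices, and sorted they are 0..n-1.
    return len(covered) == n and sorted(covered) == list(range(n))
```

For example, a clean two-way split of `0..<10` passes, while overlapping or gappy chunkings fail.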

Pull request review commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
@@ public class NonBlockingThreadPool<Environment: ConcurrencyPlatform>: ComputeThr @@
     if let e = err { throw e }
   }

+  public func parallelFor(n: Int, _ fn: VectorizedParallelForFunction) {
+    let grainSize = n / parallelism  // TODO: Make adaptive!
+
+    func executeParallelFor(_ start: Int, _ end: Int) {
+      if start + grainSize >= end {
+        fn(start, end, n)
+      } else {
+        // Divide into 2 & recurse.
+        let rangeSize = end - start
+        let midPoint = start + (rangeSize / 2)
+        self.join({ executeParallelFor(start, midPoint) }, { executeParallelFor(midPoint, end)})
+      }
+    }
+
+    executeParallelFor(0, n)
+  }
+
+  public func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws {
+    let grainSize = n / parallelism  // TODO: Make adaptive!
+
+    func executeParallelFor(_ start: Int, _ end: Int) throws {
+      if start + grainSize >= end {
+        try fn(start, end, n)
+      } else {
+        // Divide into 2 & recurse.
+        let rangeSize = end - start
+        let midPoint = start + (rangeSize / 2)
+        try self.join({ try executeParallelFor(start, midPoint) }, { try executeParallelFor(midPoint, end) })
```

Ah, good point. That description is getting ahead of the actual implementation in this patch set. I'll update the description in the PR shortly.

comment created time in 3 days

Pull request review commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
@@ public class NonBlockingThreadPool<Environment: ConcurrencyPlatform>: ComputeThr @@
     if let e = err { throw e }
   }

+  public func parallelFor(n: Int, _ fn: VectorizedParallelForFunction) {
```

+1 to doc comment. PTAL?

I believe that this should most often be accessed as an operation on a random access (or likely some form of "splittable") collection. But in any case, that will have to be generic over the thread pool itself, so we don't get away from having this method and coming up with a name for it.

Note: I started going in this direction a while back but I think that direction needs a "reboot". For now, I'd like to focus on getting this low-level API implemented correctly and efficiently, and we can then refactor and/or stack on the further abstractions.

FWIW: I started out by having `VectorizedParallelForFunction` take a range instead of 2 integers representing the start and end, but that makes type inference not work as well (code requires annotations because the alternative API induces an ambiguity between the non-vectorized and vectorized APIs).

comment created time in 3 days

Pull request review commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

```diff
+  public func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws {
```

+1; done. (Although I suspect that this comment could be improved...)

comment created time in 3 days

push eventsaeta/penguin

commit sha 35e0208385c44945174a25be1cf6c92dfe45fd54

Update Sources/PenguinParallel/ThreadPool.swift Co-authored-by: Dave Abrahams <dabrahams@google.com>

push time in 3 days

issue openedsaeta/penguin

Polish fast RNG & make available publicly

https://github.com/saeta/penguin/pull/30 pulled the `PCGRandomNumberGenerator` out from `NonBlockingThreadPool.swift`, but it (and `fastFit`) should really move to `PenguinStructures` (or some similar library) and be made publicly available (and generic where appropriate).

Note: https://github.com/saeta/penguin/pull/36 pulls out the other numerical bits and cleans them up.

created time in 3 days

pull request commentsaeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

This relates to umbrella issue: https://github.com/saeta/penguin/issues/33

comment created time in 3 days

pull request commentsaeta/penguin

Refactor co-primes algorithm to be generic & avoid allocation.

This is a follow-up to https://github.com/saeta/penguin/pull/30

comment created time in 3 days

PR opened saeta/penguin

Thank you @dabrahams for the suggestions!

pr created time in 3 days

push eventsaeta/penguin

commit sha 4c41ec8912d9e0691a4b7a0943fc6c8cb13de209

Factor out number operations from NonBlockingThreadPool.swift (#30) The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

push time in 3 days

PR merged saeta/penguin

The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

pr closed time in 3 days

Pull request review commentsaeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+// Copyright 2020 Penguin Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+/// Returns an array of all positive integers that are co-prime with `n`.
+///
+/// Two numbers are co-prime if their GCD is 1.
+internal func makeCoprimes(upTo n: Int) -> [Int] {
+  var coprimes = [Int]()
+  for i in 1...n {
+    var a = i
+    var b = n
+    // If GCD(a, b) == 1, then a and b are coprimes.
+    while b != 0 {
+      let tmp = a
+      a = b
+      b = tmp % b
+    }
+    if a == 1 { coprimes.append(i) }
+  }
+  return coprimes
+}
+
+/// Reduce `lhs` into `[0, size)`.
+///
+/// This is a faster variation than computing `x % size`. For additional context, please see:
+/// https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction
+internal func fastFit(_ lhs: Int, into size: Int) -> Int {
+  let l = UInt32(lhs)
+  let r = UInt32(size)
+  return Int(l.multipliedFullWidth(by: r).high)
+}
+
+/// Fast random number generator using [permuted congruential
+/// generators](https://en.wikipedia.org/wiki/Permuted_congruential_generator)
```

I've definitely looked over that site a fair bit. That said, you're right that this link should go to the proper source and not Wikipedia. FWIW, I didn't port an implementation from Wikipedia, but I did skimp on some aspects of the implementation that could be improved. I consider this good enough for now, as I don't need a high-quality source of random number generation, but when we factor this out into a more reusable spot, we should definitely ensure we have a good implementation.

comment created time in 3 days
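For illustration, the two helpers being factored out in the hunk above translate almost line-for-line. This Python sketch mirrors the Swift `makeCoprimes` and `fastFit` (the names are transliterations for this example, not library API):

```python
def make_coprimes(up_to):
    # All i in 1...n whose GCD with n is 1, matching makeCoprimes(upTo:).
    coprimes = []
    for i in range(1, up_to + 1):
        a, b = i, up_to
        # Euclid's algorithm: if GCD(a, b) == 1, then i and up_to are coprime.
        while b != 0:
            a, b = b, a % b
        if a == 1:
            coprimes.append(i)
    return coprimes

def fast_fit(lhs, size):
    # Map a 32-bit value into [0, size) without a modulo, by taking the high
    # half of a 32x32 -> 64-bit multiply (Lemire's reduction). This is what
    # Swift's l.multipliedFullWidth(by: r).high computes.
    return ((lhs & 0xFFFFFFFF) * (size & 0xFFFFFFFF)) >> 32
```

Note that `fast_fit` only replaces `x % size` for *uniformly distributed* 32-bit inputs; for small inputs it biases toward 0, which is fine for the RNG-driven work-stealing use here.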

Pull request review commentsaeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+internal struct PCGRandomNumberGenerator {
```

For some reason, I thought I could only conform if I returned `UInt64`s. But as it turns out, apparently I don't need to... Thanks for the pointer!

comment created time in 3 days

Pull request review commentsaeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+internal func fastFit(_ lhs: Int, into size: Int) -> Int {
```

Definitely; will follow up to pull this out. One question to ponder: what should the name of this be, and what operator should we define for it (if any)? ;-)

comment created time in 3 days

Pull request review commentsaeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+/// Fast random number generator using [permuted congruential
```

Good point!

comment created time in 3 days

Pull request review commentsaeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+/// Returns an array of all positive integers that are co-prime with `n`.
```

Excellent. Thank you!

comment created time in 3 days

Pull request review commentsaeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+/// Reduce `lhs` into `[0, size)`.
```

Yeah, that's definitely much nicer.

comment created time in 3 days

Pull request review commentsaeta/penguin

Factor out number operations from NonBlockingThreadPool.swift

```diff
+internal func makeCoprimes(upTo n: Int) -> [Int] {
```

Yes, I will do this in a follow-up PR. Great suggestion! I will likely move this over to `PenguinStructures`, as that's probably the best spot to put this for now (unless you have a better suggestion?).

comment created time in 3 days

push eventsaeta/penguin

commit sha 52d266126e30b0e6582c5346b4148c07fffcb5b6

Add ArrayBuffer basics. (#25)

commit sha 45223e965b7ff965f5d1be8d66e344af23b8126f

Add FixedSizeArray. (#24) * Add FixedSizeArray. * Use new accessors * Guaranteed O(1) complexity for subscript. Also some doc comment updates.

commit sha 16fa3976ebd1c38ef8ffa1f9efad1457e534a846

Improve doc comments.

commit sha e2da4d6bfecc921cfd0c60f730e784b09c10add8

Fix indentation

commit sha 81d80b876449e29a33d174bb257a58ec381ac0f9

Value semantics verification for withUnsafeMutableBufferPointer. (#26)

commit sha ad6eb9b5af391635c924439d9ead18ccaca24735

Remove stale comment. (#29) The functionality referred to in the comment has been implemented in the `PenguinParallelWithFoundation` package, which leverages the `Foundation` dependency to query the number of available processors.

commit sha 9dd365d5e4a37efce149dfad42f339dfaf4d81ed

ComputeThreadPool cleanup (4/n): move NonBlockingSpinningState to its own file. (#34) This change moves `NonBlockingSpinningState` to its own file to reduce the size of the `NonBlockingThreadPool.swift` file.

commit sha b1fa87ce7e2c3f94bd137d099f0b8f46cf086cf1

Merge remote-tracking branch 'origin/master' into factor-out-numbers

commit sha 216e4bdaf22e7a30ebc0ab22c0002fbd7b65414b

Improve the code based on reviewer feedback.

push time in 3 days

pull request commentsaeta/penguin

Rename `WorkItem` to `JoinWorkItem`.

https://github.com/saeta/penguin/issues/33

comment created time in 5 days

pull request commentsaeta/penguin

ComputeThreadPool cleanup (4/n): NonBlockingSpinningState to new file.

https://github.com/saeta/penguin/issues/33

comment created time in 5 days

PR opened saeta/penguin

This is in preparation for changing the TaskQueues to not hold closures. The key insight is that closures, being reference types, induce a lot of ARC traffic. In order to achieve good performance, we need to move away from that, where possible.

pr created time in 5 days

PR opened saeta/penguin

This change moves `NonBlockingSpinningState` to its own file to reduce the size of the `NonBlockingThreadPool.swift` file.

pr created time in 5 days

issue commentsaeta/penguin

Improve the `ComputeThreadPool` abstraction

Experimental PR/branch with some of the key ideas: https://github.com/saeta/penguin/pull/11

comment created time in 5 days

issue commentsaeta/penguin

Improve the `ComputeThreadPool` abstraction

https://github.com/saeta/penguin/pull/29, https://github.com/saeta/penguin/pull/30, https://github.com/saeta/penguin/pull/32

comment created time in 5 days

issue openedsaeta/penguin

Improve the `ComputeThreadPool` abstraction

Creating a tracking issue to group related PRs.

created time in 5 days

pull request commentsaeta/penguin

ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co…

Note: I'm trying to break up a very large refactoring I've been working on in #11 (and related branches) into more easily reviewable pieces. Happy to explain how they all fit together out-of-band as appropriate.

comment created time in 5 days

PR opened saeta/penguin

…nfusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

- Removes the Naive thread pool implementation from `PJoin`.
- Removes the unnecessary `TypedComputeThreadPool` protocol refinement.
- Removes the badly implemented extensions that implemented `parallelFor` in terms of `join`.
- Removes use of `rethrows`, as the rethrows language feature is not expressive enough to allow the performance optimizations for the non-throwing case.
- Adds a vectorized API (which improves performance).

Performance measurements:

After:

```
name time std iterations
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level 700.0 ns ± 70289.84998218225 127457
NonBlockingThreadPool: join, two levels 2107.0 ns ± 131041.5070696377 31115
NonBlockingThreadPool: join, three levels 4960.0 ns ± 178122.9562964306 15849
NonBlockingThreadPool: join, four levels, three on thread pool thread 5893.0 ns ± 224021.47900401088 13763
NonBlockingThreadPool: parallel for, one level 22420.0 ns ± 203689.69689780468 7581
NonBlockingThreadPool: parallel for, two levels 500985.5 ns ± 642136.0139757036 1390
```

Before:

```
name time std iterations
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level 728.0 ns ± 78662.43173968921 115554
NonBlockingThreadPool: join, two levels 2149.0 ns ± 144611.11773139169 30425
NonBlockingThreadPool: join, three levels 5049.0 ns ± 188450.6773907647 15157
NonBlockingThreadPool: join, four levels, three on thread pool thread 5951.0 ns ± 229270.51587738466 10255
NonBlockingThreadPool: parallel for, one level 4919427.5 ns ± 887590.5386061076 302
NonBlockingThreadPool: parallel for, two levels 4327151.0 ns ± 855302.611386676 313
```
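The vectorized `parallelFor` behind the parallel-for speedup above splits `0..<n` by a grain size and recurses via `join`. A minimal sequential Python sketch of just the splitting logic (hypothetical; it adds a `max(1, …)` guard that the Swift `// TODO: Make adaptive!` version does not have, to cover `n < parallelism`):

```python
def split_ranges(n, parallelism):
    # Mirrors the Swift executeParallelFor recursion: halve [0, n) until a
    # chunk is no bigger than the grain size, then record it. A real pool
    # would join() the two recursive halves on worker threads.
    grain = max(1, n // parallelism)  # guard: Swift's n / parallelism can be 0
    chunks = []

    def go(start, end):
        if start + grain >= end:
            chunks.append((start, end))
        else:
            mid = start + (end - start) // 2
            go(start, mid)
            go(mid, end)

    if n > 0:
        go(0, n)
    return chunks
```

The resulting chunks perfectly cover `0..<n`, and no chunk exceeds the grain size, which bounds the per-invocation work handed to `fn`.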

pr created time in 5 days

PR closed saeta/penguin

This was an attempt to refactor `ComputeThreadPool` into a class. It ended up slowing things down by 2x, so this should not be merged (as-is), but making this PR for posterity.

pr closed time in 5 days

PR opened saeta/penguin

This was an attempt to refactor `ComputeThreadPool` into a class. It ended up slowing things down by 2x, so this should not be merged (as-is), but making this PR for posterity.

pr created time in 5 days

PR opened saeta/penguin

The file NonBlockingThreadPool.swift is a little long, and has grown a few utility functions within it that aren't tightly coupled to the implementation. This change factors them out into an adjacent file to simplify maintenance of the NonBlockingThreadPool.

pr created time in 5 days