
czarlos/OpenBitCL 0

BitCoin App

issue comment google/jax

Fused gather-scatter-add implementation

aren't input_indices already O(N)?

irhum

comment created time in a month

issue comment google/jax

Fused gather-scatter-add implementation

If N and M are a similar size and are less than 8k, a matmul is very likely the fastest solution.

irhum

comment created time in a month
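The matmul suggestion in the comment above can be illustrated with a minimal JAX sketch. It assumes the operation gathers rows of an (N, D) array and scatter-adds them into an (M, D) output. The thread excerpt does not spell out the exact semantics, so the function names, `output_indices`, and `num_outputs` are hypothetical. The sketch compares an indexed version with one that expresses the gather and the scatter as one-hot matrix products.

```python
import jax.numpy as jnp
from jax import nn


def gather_scatter_add_indexed(x, input_indices, output_indices, num_outputs):
    """Gather rows of x, then scatter-add them into a zero-initialized output."""
    gathered = x[input_indices]  # (K, D)
    out = jnp.zeros((num_outputs, x.shape[1]), dtype=x.dtype)
    # .at[].add accumulates rows that share the same output index.
    return out.at[output_indices].add(gathered)


def gather_scatter_add_matmul(x, input_indices, output_indices, num_outputs):
    """Same result, with both steps written as dense matmuls (assumed formulation)."""
    gather_mat = nn.one_hot(input_indices, x.shape[0], dtype=x.dtype)      # (K, N)
    scatter_mat = nn.one_hot(output_indices, num_outputs, dtype=x.dtype)   # (K, M)
    # (M, K) @ ((K, N) @ (N, D)) -> (M, D)
    return scatter_mat.T @ (gather_mat @ x)


# Small illustrative inputs: N = 4 source rows, M = 2 output rows, K = 4 index pairs.
x = jnp.arange(12.0).reshape(4, 3)
input_indices = jnp.array([0, 2, 2, 3])
output_indices = jnp.array([1, 0, 1, 1])

indexed = gather_scatter_add_indexed(x, input_indices, output_indices, 2)
dense = gather_scatter_add_matmul(x, input_indices, output_indices, 2)
```

The dense version does O(K·N·D + K·M·D) work instead of O(K·D), so it only makes sense when N and M are small enough for matmul hardware to make up the difference. Where that crossover lies depends on the backend and should be measured.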

issue comment tensorflow/tpu

Support for tf.where without x and y arguments

dynamic sized tf.where with a single argument should work now

aidangomez

comment created time in 4 months


pull request comment tensorflow/tensorflow

[XLA] Add DiagSlice HLO instruction

Do you have an example where the 1024x1024 kernel launch was not already the result of another operation, where the select and reduce could not be fused into the previous operation?

Either way this should be split into two parts.

  1. Add the HLO, make an evaluator implementation, and move the expander into an HLO pass.
  2. Add the implementation for the CPU/GPU backends.

xinan-jiang

comment created time in 5 months
