
KhronosGroup/MoltenVK 2531

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on iOS and macOS.

amethyst/specs 1673

Specs - Parallel ECS

gfx-rs/wgpu 1220

Native WebGPU implementation based on gfx-hal

brendanzab/gl-rs 495

An OpenGL function pointer loader for Rust

gfx-rs/naga 156

Universal shader translation in Rust

kvark/claymore 49

Just another tactical RPG in a dark fantasy setting

jrmuizel/glsl-to-spirv 19

A glsl to spirv compiler

brendanzab/sax-rs 13

DEPRECATED - use https://github.com/netvl/rust-xml/ instead.

kvark/binary-space-partition 9

Abstract BSP tree in Rust

PR opened gfx-rs/wgpu

Implicit layout

Connections: Closes #868

Description: The implementation can be split into 3 parts:

  1. reflecting the shader for its binding expectations, building a bind entry map from them, and merging the maps between stages. This is only done for shaders that can be reflected; we error on the rest, for now.
  2. based on this info, creating new bind group layouts and pipeline layouts. The tricky part here is that we can't generate the IDs out of thin air, so we have to pass them into the create_xx_pipeline function, which now also returns the number of IDs it consumed, allowing the client to free the rest.
  3. API changes in the descriptors, plus new methods to obtain the bind group layouts from a pipeline (see the sketch after this list).
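A minimal standalone model of the ID-handling contract from part 2, with made-up names and counts (illustrative only, not the actual wgpu-core signatures):

// Model of "client passes pre-generated IDs in, implementation reports
// how many it consumed" (hypothetical names, not the real wgpu-core API).
type BindGroupLayoutId = u32;

fn create_pipeline_with_ids(ids: &[BindGroupLayoutId]) -> usize {
    // Pretend shader reflection discovered 2 bind groups.
    let reflected_bind_groups = 2;
    reflected_bind_groups.min(ids.len())
}

fn main() {
    let pre_generated: Vec<BindGroupLayoutId> = vec![10, 11, 12, 13];
    let used = create_pipeline_with_ids(&pre_generated);
    // IDs at indices `used..` were not consumed; the client can free them.
    assert_eq!(used, 2);
    println!("consumed {} of {} IDs", used, pre_generated.len());
}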

Testing: This isn't tested, but I think it's fine: it doesn't affect the old path, and we'll be testing the new path while improving Naga and our reflection anyway.

+644 -267

0 comments

6 changed files

pr created time in 35 minutes

create branch kvark/wgpu

branch: implicit-layout

created branch time in an hour

Pull request review comment gpuweb/gpuweb

Add some restrictions about resource usage tracking/validation

 Issue(gpuweb/gpuweb#296): Consider merging all read-only usages.
 Textures may consist of separate [=mipmap levels=] and [=array layers=],
 which can be used differently at any given time.
 Each such <dfn dfn>subresource</dfn> is uniquely identified by a
-[=texture=], [=mipmap level=], and
-(for {{GPUTextureDimension/2d}} textures only) [=array layer=].
+[=texture=], [=mipmap level=],
+(for {{GPUTextureDimension/2d}} textures only) [=array layer=],
+and [=aspect=].

Yes, I agree. Looking forward, specifying them to be tracked separately is better.
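For illustration, the subresource identity implied by the new wording could be keyed like this (stand-in types, not the spec's definitions):

// A subresource is (texture, mip level, array layer, aspect).
type TextureId = u32;

#[derive(Hash, Eq, PartialEq)]
enum Aspect {
    Color,
    Depth,
    Stencil,
}

#[derive(Hash, Eq, PartialEq)]
struct SubresourceKey {
    texture: TextureId,
    mip_level: u32,
    array_layer: u32,
    aspect: Aspect,
}

fn main() {
    // Depth and stencil of the same level/layer are now distinct keys.
    let depth = SubresourceKey { texture: 1, mip_level: 0, array_layer: 0, aspect: Aspect::Depth };
    let stencil = SubresourceKey { texture: 1, mip_level: 0, array_layer: 0, aspect: Aspect::Stencil };
    assert!(depth != stencil);
}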

Richard-Yunchao

comment created time in 5 hours

Pull request review comment gpuweb/gpuweb

Add some restrictions about resource usage tracking/validation

 Issue(gpuweb/gpuweb#296): Consider merging all read-only usages.
 Textures may consist of separate [=mipmap levels=] and [=array layers=],
 which can be used differently at any given time.
 Each such <dfn dfn>subresource</dfn> is uniquely identified by a
-[=texture=], [=mipmap level=], and
-(for {{GPUTextureDimension/2d}} textures only) [=array layer=].
+[=texture=], [=mipmap level=],
+(for {{GPUTextureDimension/2d}} textures only) [=array layer=],
+and [=aspect=].

-The **main usage rule** is that any [=subresource=]
-at any given time can only be in either:
+<dfn dfn>atom resource</dfn> means a whole buffer or a single subresource of a texture.
+
+The **main usage rule** is that any [=atom resource=]
+at any given time within [=usage scope=] can only be in either:
   - a combination of [=read-only usage=]s
   - a single [=mutating usage=]
 
+The only exception is that a combination of writeonly-storage-texture usages, or a combination

Hmm, I was thinking about that differently. So supposing your suggestion is (1), here are the other 2:

(2) just specify beginRenderPass in a way that there has to be no collisions between subresources used for different attachments

(3) for each usage, we can have a property of whether it's "exclusive" or not. The only "exclusive" usage is OUTPUT_ATTACHMENT, I think? Exclusiveness means that multiple uses of the same resource with this usage can't be combined together.
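A sketch of option (3) in code, with hypothetical usage names (only OUTPUT_ATTACHMENT marked exclusive):

#[derive(Clone, Copy)]
enum Usage {
    Sampled,
    Storage,
    OutputAttachment,
}

impl Usage {
    // Exclusive: two uses of the same subresource with this usage can't be
    // combined, even though the usage is identical.
    fn is_exclusive(self) -> bool {
        matches!(self, Usage::OutputAttachment)
    }
}

// A set of simultaneous uses is allowed only if no use is exclusive
// (the read-only/mutating rules would still apply on top of this).
fn combinable(uses: &[Usage]) -> bool {
    uses.len() <= 1 || uses.iter().all(|u| !u.is_exclusive())
}

fn main() {
    assert!(combinable(&[Usage::Sampled, Usage::Sampled]));
    assert!(!combinable(&[Usage::OutputAttachment, Usage::OutputAttachment]));
}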

Richard-Yunchao

comment created time in 5 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 pub type Bytes = u8;
 
 /// Number of components in a vector.
 #[repr(u8)]
-#[derive(Clone, Copy, Debug, PartialEq)]
+#[derive(Clone, Copy, Debug, PartialEq, Hash, Eq)]

yes, sure :)

Napokue

comment created time in 5 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
                             crate::TypeInner::Vector { size, kind, width } => {
                                 vector_id = Some(left_id);
                                 for (k, v) in self.lookup_type.iter() {

why are we not just issuing a lookup into the map? we shouldn't need to iterate the arenas anywhere more than once, really
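A standalone illustration of the suggested O(1) lookup, keying the map by the type descriptor instead of scanning it (field types simplified to u8 for brevity):

use std::collections::HashMap;

#[derive(Hash, Eq, PartialEq)]
enum LocalType {
    Scalar { kind: u8, width: u8 },
    Vector { size: u8, kind: u8, width: u8 },
}

fn main() {
    let mut lookup_type: HashMap<LocalType, u32> = HashMap::new();
    lookup_type.insert(LocalType::Vector { size: 4, kind: 1, width: 4 }, 42);

    // Direct lookup instead of `for (k, v) in lookup_type.iter() { ... }`:
    let id = lookup_type[&LocalType::Vector { size: 4, kind: 1, width: 4 }];
    assert_eq!(id, 42);
}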

Napokue

comment created time in 5 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
                             crate::TypeInner::Vector { size, kind, width } => {
                                 vector_id = Some(right_id);
                                 for (k, v) in self.lookup_type.iter() {

same here

Napokue

comment created time in 5 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
         let instruction;
 
-        match ty.inner {
+        match ty.inner.clone() {

We shouldn't need cloning here. What was the problem that made you clone?
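For reference, matching on a reference is usually enough to avoid the clone, since the fields are then bound by reference (a minimal standalone model):

enum TypeInner {
    Vector { size: u8 },
    Scalar,
}

fn main() {
    let inner = TypeInner::Vector { size: 4 };
    // `match &inner` borrows instead of moving out of the struct.
    match &inner {
        TypeInner::Vector { size } => println!("vector of {}", size),
        TypeInner::Scalar => println!("scalar"),
    }
}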

Napokue

comment created time in 5 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
     /// Primitive Instructions
     ///
-    fn parse_type_declaration(
+    fn write_scalar(&self, id: Word, kind: crate::ScalarKind, width: u8) -> Instruction {
+        match kind {
+            crate::ScalarKind::Sint => {
+                self.instruction_type_int(id, (width * BITS_PER_BYTE) as u32, Signedness::Signed)

nit: let's move out this as let bits = (width * ...) as u32;
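Spelled out, the nit is to compute the shared expression once (BITS_PER_BYTE as in the snippet above):

const BITS_PER_BYTE: u8 = 8;

fn main() {
    let width: u8 = 4;
    let bits = (width * BITS_PER_BYTE) as u32; // shared by the Sint/Uint/Float arms
    assert_eq!(bits, 32);
}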

Napokue

comment created time in 5 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
     /// Primitive Instructions
     ///
-    fn parse_type_declaration(
+    fn write_scalar(&self, id: Word, kind: crate::ScalarKind, width: u8) -> Instruction {
+        match kind {
+            crate::ScalarKind::Sint => {
+                self.instruction_type_int(id, (width * BITS_PER_BYTE) as u32, Signedness::Signed)
+            }
+            crate::ScalarKind::Uint => {
+                self.instruction_type_int(id, (width * BITS_PER_BYTE) as u32, Signedness::Unsigned)
+            }
+            crate::ScalarKind::Float => {
+                self.instruction_type_float(id, (width * BITS_PER_BYTE) as u32)
+            }
+            crate::ScalarKind::Bool => self.instruction_type_bool(id),
+        }
+    }
+
+    fn get_or_create_scalar(
+        &mut self,
+        arena: &crate::Arena<crate::Type>,
+        kind: crate::ScalarKind,
+        width: u8,
+    ) -> Word {
+        if let Some(handle) = self.find_scalar_handle(arena, kind, width) {
+            self.get_type_id(arena, handle)
+        } else {
+            let (instruction, id) =
+                self.write_type_declaration_local(arena, LocalType::Scalar { kind, width });
+            self.lookup_type
+                .insert(LookupType::Local(LocalType::Scalar { kind, width }), id);
+            instruction.to_words(&mut self.logical_layout.declarations);
+            id
+        }
+    }
+
+    fn get_or_create_vector(
+        &mut self,
+        arena: &crate::Arena<crate::Type>,
+        size: crate::VectorSize,
+        kind: crate::ScalarKind,
+        width: u8,
+    ) -> Word {
+        if let Some(handle) = self.find_vector_handle(arena, size, kind, width) {
+            self.get_type_id(arena, handle)
+        } else {
+            let (instruction, id) = self.write_type_declaration_local(
+                arena,
+                LocalType::Vector {
+                    size,
+                    kind,
+                    width,
+                },
+            );
+            self.lookup_type.insert(
+                LookupType::Local(LocalType::Vector {
+                    size,
+                    kind,
+                    width,
+                }),
+                id,
+            );
+            instruction.to_words(&mut self.logical_layout.declarations);
+            id
+        }
+    }
+
+    fn parse_to_spirv_storage_class(&self, class: crate::StorageClass) -> spirv::StorageClass {
+        match class {
+            crate::StorageClass::Constant => spirv::StorageClass::UniformConstant,
+            crate::StorageClass::Function => spirv::StorageClass::Function,
+            crate::StorageClass::Input => spirv::StorageClass::Input,
+            crate::StorageClass::Output => spirv::StorageClass::Output,
+            crate::StorageClass::Private => spirv::StorageClass::Private,
+            crate::StorageClass::StorageBuffer => spirv::StorageClass::StorageBuffer,
+            crate::StorageClass::Uniform => spirv::StorageClass::Uniform,
+            crate::StorageClass::WorkGroup => spirv::StorageClass::Workgroup,
+        }
+    }
+
+    fn write_type_declaration_local(
+        &mut self,
+        arena: &crate::Arena<crate::Type>,
+        local_ty: LocalType,
+    ) -> (Instruction, Word) {
+        let id = self.generate_id();
+        match local_ty {
+            LocalType::Scalar { kind, width } => (self.write_scalar(id, kind, width), id),
+            LocalType::Vector { size, kind, width } => (
+                {
+                    let scalar_id = self.get_or_create_scalar(arena, kind, width);
+                    self.instruction_type_vector(id, scalar_id, size)
+                },
+                id,
+            ),
+            _ => unimplemented!(),

this would be just the Pointer, right? Let's list it explicitly instead of a catch-all?

Napokue

comment created time in 5 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
                 _ => continue,
             }
         }
-        scalar_handle.unwrap()
+        scalar_handle
+    }
+
+    fn find_vector_handle(

perhaps, we could have another lookup map FastHashMap<LocalType, crate::Handle<crate::Type>>, to make this faster and simpler?
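A sketch of that extra cache, with Handle stubbed as usize (naga uses its own arena handle type):

use std::collections::HashMap;

type Handle = usize;

#[derive(Hash, Eq, PartialEq)]
enum LocalType {
    Scalar { kind: u8, width: u8 },
}

fn main() {
    // Filled once while walking the arena, then queried in O(1).
    let mut cache: HashMap<LocalType, Handle> = HashMap::new();
    cache.insert(LocalType::Scalar { kind: 1, width: 4 }, 7);

    // find_scalar_handle becomes a plain map query:
    let handle = cache.get(&LocalType::Scalar { kind: 1, width: 4 }).copied();
    assert_eq!(handle, Some(7));
}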

Napokue

comment created time in 5 hours

push event kvark/webgpu-debate

Mehmet Oguz Derin

commit sha 03ac3a83d382e6d16b01471988062bd1475c3c0c

Subgroup: Add motivation and exclusions (#6)

* Add motivation and exclusions (this PR aims to make exclusions more visible)
* Format
* Improve wording and do linking
* Correct location

view details

push time in 5 hours

PR merged kvark/webgpu-debate

Add motivation and exclusions

This PR aims to make exclusions more visible

+56 -20

1 comment

1 changed file

mehmetoguzderin

pr closed time in 5 hours

pull request comment kvark/webgpu-debate

Add motivation and exclusions

next time, please mark the comments as resolved once you address them :)

mehmetoguzderin

comment created time in 5 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
   - <Quadgroup Extension>: If quad operations are made into their own
                            extension, both its potential market becomes
                            larger and subgroup operations' market grows.
+    +> <Viable>

Could you clarify this link for me? Does this statement make the original motivation more viable? I'm confused, because now there appears to be a loop: Motivation -> operation, and operation -> viable.

mehmetoguzderin

comment created time in 8 hours

push event gfx-rs/naga

Pelle Johnsen

commit sha f98054afdd7503e2ee27f9952f98f07fd72ebe59

[glsl-new] add vec2 and vec3

view details

Pelle Johnsen

commit sha 482c6d042d43c2ca9d5753ec7045709d5660ba1b

[glsl.new] Add initial declaration support

Focus on handling global in/out vars

view details

Pelle Johnsen

commit sha ede04ba4dc4037904fa22f7051279a5aef14589f

[glsl-new] Simplify declaration code

- Fix clippy issues

view details

push time in 8 hours

PR merged gfx-rs/naga

[glsl-new] initial declarations support

Still lots of todos, but can now parse global in/out vars 😄

+164 -9

0 comments

5 changed files

pjoe

pr closed time in 8 hours

issue comment gpuweb/gpuweb

GPUTextureDataLayout.offset shouldn't have to be a multiple of blockSize in writeTexture

Nice find! One more restriction that writeTexture eliminates :)

kainino0x

comment created time in 8 hours

issue comment gpuweb/gpuweb

Texture copies between buffers

I don't see a reason for us to support this case :)

> should we generalize our copy API to look more like D3D12's, where T2T, T2B, B2T, and "texture style" B2B all go through the same entry point

same entry point, effectively different overloads. I think what we have today is clear, and trying to converge to fewer methods with more complex definitions isn't going to improve it.

kainino0x

comment created time in 8 hours

issue comment gfx-rs/wgpu-rs

Depth value of textures with formats Depth24Plus or Depth24PlusStencil8 cannot be sampled in shader

Interesting, thank you for reporting! Could you push the changes to a branch, so that we can easily test this?

DasEtwas

comment created time in 9 hours

issue comment gpuweb/gpuweb

bytesPerRow should be validated only if copyExtent.height is greater than one *block* height

Yes, and it should also be validated if height is 1 but depth is not 1.
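In code form, the combined condition being discussed (a sketch of the validation logic, not normative spec text):

// bytesPerRow must be specified/validated whenever the copy touches
// more than one row of blocks, in height or in depth.
fn bytes_per_row_required(height_in_blocks: u32, depth: u32) -> bool {
    height_in_blocks > 1 || depth > 1
}

fn main() {
    assert!(!bytes_per_row_required(1, 1));
    assert!(bytes_per_row_required(2, 1));
    assert!(bytes_per_row_required(1, 2)); // height 1, but depth > 1
}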

kainino0x

comment created time in 9 hours

issue closed gfx-rs/gfx

DescriptorInit error on unaligned shader uniforms

When creating a pipeline defined with:

gfx_defines!{
    vertex Vertex {
        pos: [f32; 2] = "a_Pos",
        uv: [f32; 2] = "a_Uv",
    }

    constant Transform {
        transform: [[f32; 4]; 4] = "u_Transform",
    }

    // Values that are different for each rect.
    constant RectProperties {
        src: [f32; 4] = "u_Src",
        rotation: f32 = "u_Rotation",
        dest: [f32; 2] = "u_Dest",
        scale: [f32; 2] = "u_Scale",
        offset: [f32; 2] = "u_Offset",
        shear: [f32; 2] = "u_Shear",
    }

    pipeline pipe {
        ...
    }
}

My shader:

#version 150 core

in vec2 a_Pos;
in vec2 a_Uv;

uniform Transform {
    mat4 u_Transform;
};

uniform RectProperties {
    vec4 u_Src;
    float u_Rotation;
    vec2 u_Dest;
    vec2 u_Scale;
    vec2 u_Offset;
    vec2 u_Shear;
};

out vec2 v_Uv;

void main() {
    v_Uv = a_Uv;
    gl_Position = vec4((a_Pos * u_Scale) + u_Dest, 0.0, 1.0) * u_Transform;
}

When I create a pipeline with Factory::create_pipeline_simple() I get: DescriptorInit(ConstantBuffer("RectProperties", Some(Offset("u_Dest", 24))))

When I move rotation to the end of the uniform list it works, so I expect this is the driver refusing to handle non-aligned types. Can this either be padded automatically, or at least detected and warned against?
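For reference, a reordering that satisfies std140-style alignment with the same fields (vec2 members need 8-byte offsets, so keeping the lone scalar last avoids the misaligned u_Dest):

constant RectProperties {
    src: [f32; 4] = "u_Src",
    dest: [f32; 2] = "u_Dest",
    scale: [f32; 2] = "u_Scale",
    offset: [f32; 2] = "u_Offset",
    shear: [f32; 2] = "u_Shear",
    rotation: f32 = "u_Rotation", // scalar last: every vec2 stays 8-byte aligned
}

(The GLSL uniform block would need the same field order.)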

Thanks.

closed time in 9 hours

icefoxen

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
   - PR: https://github.com/gpuweb/gpuweb/pull/954
 -->
-[Extension]: We should have an extension to expose subgroup operations.
-  + <Performance>: depending on a workload and the platform, it's possible to run [multiple times](https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/) faster.
-  + <Support>: all 3 target APIs have some variation of subgroup operations, optionally.
-
-[Size Control]: We should let the developer to control the size of subgroups (in compute workloads).
-  + Vulkan has [VK_EXT_subgroup_size_control](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_subgroup_size_control.html) extension to control the size.
-  - D3D12 and Metal don't allow to control the size.
-
-[Size Query]: We should allow the developer to query the subgroup size from a compute pipeline.
-  + Vulkan has [VK_KHR_pipeline_executable_properties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_KHR_pipeline_executable_properties.html) extension to query the size.
-  + Metal has [threadExecutionWidth](https://developer.apple.com/documentation/metal/mtlcomputepipelinestate/1414911-threadexecutionwidth) to query the size.
-  - D3D12 doesn't have anything to query the exact size.
-  - <Useless>: it's not actionable in Vulkan and D3D12, unlike Metal where we can control the workgroup size from the API.
-
-[Quad Ops]: We should expose the quad operations in fragment shaders.
-  - new Adreno and PowerVR hardware doesn't support this
-//TODO: fill this out more
-
-[Non-uniform]: We should support the non-uniform subgroup model.
-  + <Everywhere>: Vulkan, D3D, and Metal use a non-uniform model.
+[Motivation]: We should have subgroup operations.

also, I think it needs "+> [Xxx Operations]" for each of "Shuffle", "Quad", and "Explicit". That would make the graph more connected, and I think it makes sense: the only reason we have these other statements is because there is a motivation for performance

mehmetoguzderin

comment created time in 9 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
+[Motivation]: We should have subgroup operations.
+  + <Standard Library>: Since subgroup operations are common across
+                        all three APIs, they were to be considered
+                        for [issue #667](https://github.com/gpuweb/gpuweb/issues/667).
+  + <Performance>: [Proven contribution](https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/)
+                   to general purpose algorithms.
+  + <Viable>: There is a safe subset of subgroup operations.
+
+
+[Extension]: Subgroup operations should be exposed as an extension.
+  + <Target>: Subgroup operations are not available on all WebGPU
+              target hardware.
+
+[Host Interface]: Subgroup size control and statistics should be exposed.

would be good to add "exposed to the API" or something, to clarify that we aren't talking about the shaders here

mehmetoguzderin

comment created time in 9 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
+[Host Interface]: Subgroup size control and statistics should be exposed.
+  - <Pipeline Statistics>: Exact subgroup size can't be queried in DirectX 12.

"statistics" term is not appropriate here. You are talking about pipeline execution properties, not statistics

mehmetoguzderin

comment created time in 9 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
+  + <Performance>: [Proven contribution](https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/)

Would be worth putting a bit more info on the amount of improvement here. My original wording had "multiple times". Did you intentionally remove that detail?

mehmetoguzderin

comment created time in 9 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
+[Motivation]: We should have subgroup operations.

it sounds like we should add "+> [Extension]" at the end to link the statement

mehmetoguzderin

comment created time in 9 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
+  + <Viable>: There is a safe subset of subgroup operations.

how do we know that there is a safe subset?

mehmetoguzderin

comment created time in 9 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
+  + <Standard Library>: Since subgroup operations are common across
+                        all three APIs, they were to be considered

" thee APIs, see investigation in ". Saying "they were to be considered" is a consequence, while you are describing an argument.

mehmetoguzderin

comment created time in 9 hours

pull request comment servo/servo

Update WebGPU CTS to main branch

@bors-servo r+

kunalmohan

comment created time in 9 hours

issue comment gfx-rs/wgpu-rs

Depth + Stencil attachment is discarded after RenderPass regardless of `stencil_ops::store` value

Hmm strange. I'm looking at the code, and it appears to be respecting the stencil ops in both wgpu and gfx-rs:

  • https://github.com/gfx-rs/wgpu/blob/430b29d781200009ef02839e41136718ff62456a/wgpu-core/src/command/render.rs#L633
  • https://github.com/gfx-rs/gfx/blob/14b32e5aed7f79ca309880f479e96517758d70b5/src/backend/vulkan/src/device.rs#L585

Would you be able to debug your case further and see where we are losing the values? Also, RenderDoc should be able to show you exactly what the store ops are on the render pass. I think you can click on the render pass object, then see its usage history, and it will give you the descriptor with which the render pass was created.

DasEtwas

comment created time in 9 hours

issue comment gfx-rs/rspirv

Publish a new version of rspirv

@antiagainst thank you! I'm fine publishing spirv_headers, but I'm not feeling positive about publishing rspirv itself, since we aren't using it anywhere. Perhaps it would make sense to share the publishing joy with @Jasper-Bekkers?

Jasper-Bekkers

comment created time in 9 hours

issue comment rustgd/cgmath

Rotation matrix handedness

Did you see https://github.com/rustgd/cgmath/pull/508, which deprecates look_at?

josh65536

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 pub enum MemberOrigin {
 
 /// Member of a user-defined structure.
 // Clone is used only for error reporting and is not intended for end users
-#[derive(Clone, Debug, PartialEq)]
+#[derive(Clone, Debug, PartialEq, Hash, Eq)]

these shouldn't be needed either

Napokue

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 pub struct Type {
 
 /// Enum with additional information, depending on the kind of type.
 // Clone is used only for error reporting and is not intended for end users
-#[derive(Clone, Debug, PartialEq)]
+#[derive(Clone, Debug, PartialEq, Hash, Eq)]

I think with the changes to LocalType that I suggested, these derives are no longer going to be needed

Napokue

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 enum Signedness {
     Signed = 1,
 }
 
-#[derive(Debug)]
+#[derive(Hash, Eq, PartialEq, Debug)]
 enum LocalType {
     Scalar(crate::TypeInner),

why are we storing TypeInner here? I.e. if we are going to store TypeInner, we wouldn't need LocalType at all :) The idea was to have these types fully specified, i.e.

enum LocalType {
  Scalar { kind, width },
  Vector { size, kind, width },
  Pointer { base, class },
}
Napokue

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 spirv = { package = "spirv_headers", version = "1.4.2", optional = true }
 glsl = { version = "4.1", optional = true }
 pomelo = { version = "0.1.4", optional = true }
 thiserror = "1.0"
+daggy = "*"

Also, it should be totally optional, behind the spirv feature. Currently, the spirv feature comes implicitly from the optional package. I think we'd want to avoid renaming spirv_headers to spirv here in the manifest, and just have it specified as:

spirv = ["daggy", "spirv_headers"]
MatusT

comment created time in 10 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 impl<I: Iterator<Item = u32>> Parser<I> {                 Instruction { op, .. } => return Err(Error::InvalidParameter(op)),             }         }-        // read body++        // Read body         let mut local_function_calls = FastHashMap::default();-        let mut control_flow_graph = FastHashMap::default();+        let mut control_flow_graph = Dag::<ControlFlowNode, ControlFlowEdge, u32>::new();+        let mut id_to_node = std::collections::HashMap::<u32, NodeIndex<u32>>::new();++        // Scan the blocks and add them as nodes         loop {             let fun_inst = self.next_inst()?;             log::debug!("\t\t{:?}", fun_inst.op);             match fun_inst.op {                 spirv::Op::Label => {+                    // Read the label ID                     fun_inst.expect(2)?;                     let label_id = self.next()?;+                     let node = self.next_block(+                        label_id,                         &mut fun.expressions,                         &mut fun.local_variables,                         &module.types,                         &module.constants,                         &module.global_variables,                         &mut local_function_calls,                     )?;-                    // temp until the CFG is fully processed-                    for assign in node.assignments.iter() {-                        fun.body.push(crate::Statement::Store {-                            pointer: assign.to,-                            value: assign.value,-                        });-                    }-                    match node.terminator {-                        Terminator::Return { value } => {-                            fun.body.push(crate::Statement::Return { value });-                        }-                        Terminator::Branch {-                            label_id,-                            condition,-                        } => {-                            let _ = (label_id, condition); //TODO-                        }-                    }-                    control_flow_graph.insert(label_id, node);++                    id_to_node.insert(label_id, control_flow_graph.add_node(node));                 }                 spirv::Op::FunctionEnd => {                     fun_inst.expect(1)?;                     break;                 }-                _ => return Err(Error::UnsupportedInstruction(self.state, fun_inst.op)),+                _ => panic!("SHOULD NOT HAPPEN"),+            }+        }++        // Create Edges.+        let mut edges = Vec::new();+        for node in control_flow_graph.node_weights_mut() {

is it feasible to try adding the edges in the same place where we add the nodes, so that we avoid this phase completely?

MatusT

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 struct LookupSampledImage {
     image: Handle<crate::Expression>,
     sampler: Handle<crate::Expression>,
 }
-
 struct DeferredFunctionCall {
     source_handle: Handle<crate::Function>,
     expr_handle: Handle<crate::Expression>,
     dst_id: spirv::Word,
 }
-
-enum Terminator {
+pub enum Merge {
+    Selection {
+        merge_block_id: spirv::Word,

how do you feel about doing something like this:

type BlockId = std::num::NonZeroU32;

If this is used in all the places where you have Option<spirv::Word> (and the Word is a block ID), we'd save a bunch of bytes and make the data cache utilization better.
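A quick check of the size win from the niche optimization (the all-zeroes bit pattern encodes None, and SPIR-V IDs are never 0):

use std::mem::size_of;
use std::num::NonZeroU32;

type BlockId = NonZeroU32;

fn main() {
    assert_eq!(size_of::<Option<BlockId>>(), 4); // same size as a bare u32
    assert_eq!(size_of::<Option<u32>>(), 8); // needs a separate discriminant
}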

MatusT

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 struct LookupSampledImage {
…
-enum Terminator {
+pub enum Merge {

Nice enum!

Are we matching it against anything anywhere? If we have to extract the block id somewhere, and we don't expect the enum to grow, we can refactor it into a struct:

pub struct Merge {
  block_id: spirv::Word,
  continue_id: Option<spirv::Word>,
}
MatusT

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 impl<I: Iterator<Item = u32>> Parser<I> {                 Instruction { op, .. } => return Err(Error::InvalidParameter(op)),             }         }-        // read body++        // Read body         let mut local_function_calls = FastHashMap::default();-        let mut control_flow_graph = FastHashMap::default();+        let mut control_flow_graph = Dag::<ControlFlowNode, ControlFlowEdge, u32>::new();+        let mut id_to_node = std::collections::HashMap::<u32, NodeIndex<u32>>::new();++        // Scan the blocks and add them as nodes         loop {             let fun_inst = self.next_inst()?;             log::debug!("\t\t{:?}", fun_inst.op);             match fun_inst.op {                 spirv::Op::Label => {+                    // Read the label ID                     fun_inst.expect(2)?;                     let label_id = self.next()?;+                     let node = self.next_block(+                        label_id,                         &mut fun.expressions,                         &mut fun.local_variables,                         &module.types,                         &module.constants,                         &module.global_variables,                         &mut local_function_calls,                     )?;-                    // temp until the CFG is fully processed-                    for assign in node.assignments.iter() {-                        fun.body.push(crate::Statement::Store {-                            pointer: assign.to,-                            value: assign.value,-                        });-                    }-                    match node.terminator {-                        Terminator::Return { value } => {-                            fun.body.push(crate::Statement::Return { value });-                        }-                        Terminator::Branch {-                            label_id,-                            condition,-                        } => {-                            let _ = (label_id, condition); //TODO-                        }-                    }-                    control_flow_graph.insert(label_id, node);++                    id_to_node.insert(label_id, control_flow_graph.add_node(node));                 }                 spirv::Op::FunctionEnd => {                     fun_inst.expect(1)?;                     break;                 }-                _ => return Err(Error::UnsupportedInstruction(self.state, fun_inst.op)),+                _ => panic!("SHOULD NOT HAPPEN"),+            }+        }++        // Create Edges.+        let mut edges = Vec::new();+        for node in control_flow_graph.node_weights_mut() {+            let source_node_index = id_to_node[&node.id];++            match node.terminator {+                Terminator::Branch { target_id } => {+                    let target_node_index = id_to_node[&target_id];++                    edges.push((+                        source_node_index,+                        target_node_index,+                        ControlFlowEdge::Forward,+                    ));+                }+                Terminator::BranchConditional {+                    true_id, false_id, ..+                } => {+                    let true_node_index = id_to_node[&true_id];+                    let false_node_index = id_to_node[&false_id];++                    edges.push((source_node_index, true_node_index, ControlFlowEdge::IfTrue));+                    edges.push((+                        source_node_index,+                        false_node_index,+                        
ControlFlowEdge::IfFalse,+                    ));+                }+                _ => {}             }         }+        for edge in edges {

can we have this written as for (src, dst, mode) in edges or something like this?
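Filled in, it would look like this (daggy's add_edge returns a Result, since an edge could introduce a cycle; the error handling here is just a sketch):

for (src, dst, mode) in edges {
    control_flow_graph
        .add_edge(src, dst, mode)
        .expect("CFG edges must not form a cycle in a Dag");
}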

MatusT

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 TODO: would be nice to find ways that avoid looking up as much
 
 !*/
 
+use std::convert::TryFrom;
+
+use crate::front::spv_dag::*;
 use crate::{
     arena::{Arena, Handle},
     FastHashMap, FastHashSet,
 };
 
+use daggy::petgraph::*;

Would be great to avoid doing this as well. I know this is a quick prototype, just leaving a note for us to not forget about this :)

MatusT

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 impl<I: Iterator<Item = u32>> Parser<I> {                 Instruction { op, .. } => return Err(Error::InvalidParameter(op)),             }         }-        // read body++        // Read body         let mut local_function_calls = FastHashMap::default();-        let mut control_flow_graph = FastHashMap::default();+        let mut control_flow_graph = Dag::<ControlFlowNode, ControlFlowEdge, u32>::new();+        let mut id_to_node = std::collections::HashMap::<u32, NodeIndex<u32>>::new();++        // Scan the blocks and add them as nodes         loop {             let fun_inst = self.next_inst()?;             log::debug!("\t\t{:?}", fun_inst.op);             match fun_inst.op {                 spirv::Op::Label => {+                    // Read the label ID                     fun_inst.expect(2)?;                     let label_id = self.next()?;+                     let node = self.next_block(+                        label_id,                         &mut fun.expressions,                         &mut fun.local_variables,                         &module.types,                         &module.constants,                         &module.global_variables,                         &mut local_function_calls,                     )?;-                    // temp until the CFG is fully processed-                    for assign in node.assignments.iter() {-                        fun.body.push(crate::Statement::Store {-                            pointer: assign.to,-                            value: assign.value,-                        });-                    }-                    match node.terminator {-                        Terminator::Return { value } => {-                            fun.body.push(crate::Statement::Return { value });-                        }-                        Terminator::Branch {-                            label_id,-                            condition,-                        } => {-                            let _ = (label_id, condition); //TODO-                        }-                    }-                    control_flow_graph.insert(label_id, node);++                    id_to_node.insert(label_id, control_flow_graph.add_node(node));                 }                 spirv::Op::FunctionEnd => {                     fun_inst.expect(1)?;                     break;                 }-                _ => return Err(Error::UnsupportedInstruction(self.state, fun_inst.op)),+                _ => panic!("SHOULD NOT HAPPEN"),

we shouldn't panic here. Any reason you switched this to a panic?

MatusT

comment created time in 9 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 TODO: would be nice to find ways that avoid looking up as much
…
+use crate::front::spv_dag::*;

we really need to turn SPV front-end into a real module, i.e. have src/front/spv/mod.rs for this file, then src/front/spv/dag.rs for your new one

MatusT

comment created time in 10 hours

Pull request review comment gfx-rs/naga

Draft: SPIR-V front-end Control Flow Graph

 spirv = { package = "spirv_headers", version = "1.4.2", optional = true }
 glsl = { version = "4.1", optional = true }
 pomelo = { version = "0.1.4", optional = true }
 thiserror = "1.0"
+daggy = "*"

let's not do wildcard dependencies :)

MatusT

comment created time in 10 hours

push event gfx-rs/naga

Timo de Kort

commit sha 6db5b373f87658daa36fe0aff3dec5387e910725

Add support for other matrices keywords (#121)

Add mat2x3, mat2x4, mat3x2, mat3x4, mat4x2, and mat4x3 keywords

view details

push time in 10 hours

PR merged gfx-rs/naga

Add support for other matrices keywords

Add mat2x3, mat2x4, mat3x2, mat3x4, mat4x2, and mat4x3 keywords.

Updates the WGSL front-end to support all the matrix keywords.

+54 -0

0 comments

1 changed file

Napokue

pr closed time in 10 hours

issue comment kvark/vange-rs

Enabling `game.physics.gpu_collision` causes a segfault

Thank you for filing! I haven't tested this code path with the latest changes.

suhr

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
-[Size Query]: We should allow the developer to query the subgroup size from a compute pipeline.

where did this go?

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
-[Non-uniform]: We should support the non-uniform subgroup model.

where did this go?

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
-[Size Control]: We should let the developer to control the size of subgroups (in compute workloads).

where did this go?

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
-[Non-uniform]: We should support the non-uniform subgroup model.
-  + <Everywhere>: Vulkan, D3D, and Metal use a non-uniform model.
-  - <Ambiguous Divergence>: when invocations in a subgroup are executing "together", how long is that guaranteed?
-  - <Ambiguous Reconvergence>: once invocations diverge, what are the guarantees about where you reconverge?

where did this go? The point of this document is to capture the debate; erasing concerns from history is not that.

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
…
-  - <Ambiguous Reconvergence>: once invocations diverge, what are the guarantees about where you reconverge?
-    + Vulkan has weak guarantees, D3D12 and Metal don't have anything.
-    - we can just say invocations never reconverge, which matches AMD ISA and CUDA models.
-  - <Ambiguous Forward Progress>: how blocks affect progress of other blocks, and invocations within blocks affect progress on other invocations?
-    + D3D, Metal, and Vulkan are silent on both of these.
-  - <Ambiguous Helpers>: do helper invocations participate in subgroup operations?
+[Motivation]: This PR works towards issue #667,
+              subgroup operations are common
+              across all three APIs.
+  + <Performance>: Proven contribution to
+                   general purpose algorithms.
+  + <Safe>: There is a safe subset of
+            subgroup operations.
+
+
+[Extension]: Subgroup operations should be
+             exposed as an extension.
+  + <Target>: Subgroup operations are not available
+              on all WebGPU target hardware.
+
+[Device Only]: Host statistics for subgroup
+               operations are not common
+               across all three APIs.
+
+[Compute Only]: Subgroup operations should only
+                work in compute kernels.
+  + <Hardware>: Restricting to compute only increases
+                market (e.g., Adreno).
+  + <Definition>: Operations are better defined
+                  for compute. And this makes helper
+                  invocations irrelevant.
+
+[No Shuffle Operations]: Shuffle operations should be excluded.

I'm unsure whether it's a good idea to have "[No Xxx]" statements in general. After all, arguments are symmetric: you can always flip the "+" into "-", thus negating the statement. And in this case, having "[Xxx]" means one less redirection, so it sounds like we could always avoid the "No" negation in statements.

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
   - PR: https://github.com/gpuweb/gpuweb/pull/954 -->
(same diff as above, anchored at:)
+[Extension]: Subgroup operations should be
+             exposed as an extension.
+  + <Target>: Subgroup operations are not available
+              on all WebGPU target hardware.
+
+[Device Only]: Host statistics for subgroup

what is "host statistics" here? needs clarification. What is this trying to state exactly? The "Device Only" hints that you are trying to say that there should be no host API exposed, but the description doesn't say that. Instead, the description sounds more like an argument here.

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
   - PR: https://github.com/gpuweb/gpuweb/pull/954 -->
(same diff as above, anchored at:)
+[Motivation]: This PR works towards issue #667,
+              subgroup operations are common
+              across all three APIs.
+  + <Performance>: Proven contribution to
+                   general purpose algorithms.
+  + <Safe>: There is a safe subset of

This sounds more like a statement: since it's not obvious, it needs arguments of its own. We can't use it as a plain premise like this.

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
   - PR: https://github.com/gpuweb/gpuweb/pull/954 -->
(same diff as above, continuing past [No Shuffle Operations]:)
+[No Shuffle Operations]: Shuffle operations should be excluded.
+  + <DirectX>: DirectX doesn't have shuffle operations.
+  + <Support>: Not having shuffle operations
+               increases support (e.g., ARM).
+
+[No Quad Operations]: Quad operations should be their
+                      own extension.
+  + <Market>: If quad operations are made into their
+              own extension, both its potential market
+              becomes larger and subgroup operations'
+              market grows.
+
+[No Explicit Operations]: Indexed or masked operations

Here, again, you are using the statement description for the argument. Let's not do that :) The description should say "We shouldn't expose explicit broadcast operations", or something like that.

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
   - PR: https://github.com/gpuweb/gpuweb/pull/954 -->
(same diff as above, anchored at:)
+[Motivation]: This PR works towards issue #667,

the "[]" syntax is for statements. "This PR works towards ..." is not a statement that we are going to be proving, or use to prove anything else.

mehmetoguzderin

comment created time in 10 hours

Pull request review comment kvark/webgpu-debate

Add motivation and exclusions

 Expose a subset of subgroup operations in WGSL shaders:
   - PR: https://github.com/gpuweb/gpuweb/pull/954 -->
(same diff as above, anchored at:)
+[Motivation]: This PR works towards issue #667,
+              subgroup operations are common
+              across all three APIs.
+  + <Performance>: Proven contribution to

links missing!

mehmetoguzderin

comment created time in 10 hours

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
         arena: &crate::Arena<crate::Type>,
         handle: crate::Handle<crate::Type>,
     ) -> Word {
-        match self.lookup_type.lookup_id(handle) {
-            Some(word) => word,
-            None => {
-                let (instruction, id) = self.parse_type_declaration(arena, handle);
-                instruction.to_words(&mut self.logical_layout.declarations);
-                id
-            }
+        let ty = &arena[handle];
+
+        if let Some(id) = self
+            .lookup_type_new
+            .iter()
+            .find(|&(_, v)| match v {

The way I think this would work: you construct LookupType::Handle(handle) as a key and call lookup_types.entry(key). If the entry is Occupied, just return the value, which is the SPIR-V ID. If it's Vacant, do the write_type_declaration work, which produces a new ID, and fill the vacant entry with it.
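A minimal self-contained sketch of that flow, with usize standing in for the IR handle type and a plain counter standing in for the actual instruction writing (names here are illustrative, not the real naga internals):

```rust
use std::collections::HashMap;

type Word = u32;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum LookupType {
    Handle(usize), // stand-in for crate::Handle<crate::Type>
}

// On a hit, return the cached SPIR-V ID; on a miss, "emit" the
// declaration (modeled as allocating a fresh ID) and fill the
// vacant entry with the result.
fn get_type_id(lookup: &mut HashMap<LookupType, Word>, next_id: &mut Word, handle: usize) -> Word {
    *lookup.entry(LookupType::Handle(handle)).or_insert_with(|| {
        *next_id += 1;
        *next_id
    })
}

fn main() {
    let mut lookup = HashMap::new();
    let mut next_id = 0;
    assert_eq!(get_type_id(&mut lookup, &mut next_id, 7), 1);
    assert_eq!(get_type_id(&mut lookup, &mut next_id, 7), 1); // cached, no second declaration
}
```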

Napokue

comment created time in a day

pull request comment servo/servo

Major fixes in error reporting in GPUCommandEncoder and ErrorScope Model

@bors-servo r+

kunalmohan

comment created time in 2 days

issue opened chances/deno-wgpu

Switch to using wgpu-core directly

The point of the wgpu crate (developed in https://github.com/gfx-rs/wgpu-rs) is to give users a convenient API that targets both native and the Web. Since Deno wraps it into its own JS API and (obviously) doesn't care about targeting the Web, it would make more sense to use wgpu-core and wgpu-types directly, which are developed in https://github.com/gfx-rs/wgpu .

I created https://github.com/gfx-rs/wgpu/issues/869 on our side in case anybody from the community wants to help with this.

created time in 2 days

issue opened gfx-rs/wgpu

Switch deno-wgpu from wgpu-rs to wgpu-core

There is a project of getting wgpu to work with Deno: https://github.com/chances/deno-wgpu

I think it's important, because there are users who want to work with wgpu or WebGPU in JS on native. They can do it via the NodeJS bindings of webgpu-headers, which are in progress, and we aim to be compatible with them in wgpu-native. There is a special sub-group now, called WebGPU-NAPI, that is also looking into this, see https://github.com/Kings-Distributed-Systems/webgpu-napi

It would be nice to explore further collaboration with Deno. The first step would be changing deno-wgpu to use wgpu-core and wgpu-types directly, instead of wgpu-rs.

created time in 2 days

started chances/deno-wgpu

started time in 2 days

issue comment servo/webrender

What is stacking context?

Yes, that's the idea

kaiwk

comment created time in 2 days

issue comment gfx-rs/rspirv

Publish a new version of rspirv

Just make sure to triage the issues that could have impacted the API, to see if they need to be resolved before publishing.

Jasper-Bekkers

comment created time in 2 days

push event kvark/webgpu-debate

Dzmitry Malyshau

commit sha 0b47a7d21e12aade2cd80706f366f8a6f3ee193f

Fix web components deployment

view details

push time in 3 days

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 pub struct Writer {
     annotations: Vec<Instruction>,
     writer_flags: WriterFlags,
     void_type: Option<u32>,
-    lookup_type: FastHashMap<Word, crate::Handle<crate::Type>>,
+    lookup_type_new: FastHashMap<Word, LookupType>,

is the name change accidental?

Napokue

comment created time in 3 days

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
     /// Primitive Instructions
     ///
+    fn parse_type_declaration_local(&mut self, local_ty: LocalType) -> (Instruction, Word) {

We shouldn't have "parse" in the names, since the backend isn't supposed to parse anything :) Consider "write" or "convert", or something along those lines.

Napokue

comment created time in 3 days

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 impl Writer {
(same diff as above, anchored at:)
+        if let Some(id) = self
+            .lookup_type_new
+            .iter()
+            .find(|&(_, v)| match v {

Once we reverse the table, the iteration here will go away, and a single lookup will remain.

Napokue

comment created time in 3 days

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 pub struct Writer {
     annotations: Vec<Instruction>,
     writer_flags: WriterFlags,
     void_type: Option<u32>,
-    lookup_type: FastHashMap<Word, crate::Handle<crate::Type>>,
+    lookup_type_new: FastHashMap<Word, LookupType>,

I believe all of the lookup tables here need to be reversed (were they copied from the front-end?). That is, you should never need to look up IR data by a SPIR-V ID. You need the opposite: given IR handles, be able to produce the SPIR-V IDs that you are saving to the file. So this table (and all the others) needs to look like this:

lookup_type: FastHashMap<LocalType, Word>,
Napokue

comment created time in 3 days

Pull request review comment gfx-rs/naga

Spirv lookup refactor

 enum Signedness {
     Signed = 1,
 }

+#[derive(Debug)]
+enum LocalType {

We probably want vectors here as well, since they're implicitly required by the matrix types in our IR.
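For instance, something along these lines (a sketch; the variants and field types are guesses rather than the final naga definition):

```rust
#[derive(Debug, PartialEq, Eq, Hash)]
enum ScalarKind {
    Sint,
    Uint,
    Float,
    Bool,
}

#[derive(Debug, PartialEq, Eq, Hash)]
enum LocalType {
    Scalar { kind: ScalarKind, width: u8 },
    // Needed implicitly: a SPIR-V matrix type is declared in terms
    // of its column vector type, so vectors must be declarable too.
    Vector { size: u8, kind: ScalarKind, width: u8 },
    Matrix { columns: u8, rows: u8, width: u8 },
}
```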

Napokue

comment created time in 3 days

Pull request review comment servo/servo

Major fixes in error reporting in GPUCommandEncoder and ErrorScope Model

 pub enum WebGPURequest {
     },
     CopyBufferToBuffer {

nit: might be good to move them together as Copy { command_encoder_id: id::CommandEncoderId, op: Option<CopyOperation> } or something like this. That would allow the server side to handle the errors more uniformly.
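A sketch of that shape (CopyOperation and its variants are hypothetical names here; the id module is a stub for the real wgpu ID types):

```rust
mod id {
    // Stub standing in for the real wgpu-types ID.
    pub type CommandEncoderId = u64;
}

// Grouping all copy commands under one variant lets the server side
// validate and error-handle them uniformly.
enum CopyOperation {
    BufferToBuffer { /* src, dst, offsets, size */ },
    BufferToTexture { /* ... */ },
    TextureToBuffer { /* ... */ },
    TextureToTexture { /* ... */ },
}

enum WebGPURequest {
    Copy {
        command_encoder_id: id::CommandEncoderId,
        // `None` could mean the copy was already known to be invalid,
        // so the server only has to propagate the error state.
        op: Option<CopyOperation>,
    },
    // ... other requests
}
```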

kunalmohan

comment created time in 3 days

pull request comment servo/servo

Major fixes in error reporting in GPUCommandEncoder and ErrorScope Model

@kunalmohan Reviewing the spec now: I don't think it says anything about errors being deferred to finish(). E.g., copyBufferToTexture says:

If any of the following conditions are unsatisfied, generate a validation error and stop.

So it doesn't look like any spec changes are needed?

kunalmohan

comment created time in 3 days

pull request comment gpuweb/gpuweb

Introduce Subgroup Operations Extension

I tried to crystallize the debate (in its current state, to be updated) at https://github.com/kvark/webgpu-debate/ ; see the plain or web-component views.

mehmetoguzderin

comment created time in 3 days

issue comment gpuweb/gpuweb

Summary of current robust buffer access opinions

I tried to crystallize the debate (in its current state, to be updated) at https://github.com/kvark/webgpu-debate/ ; see the plain or web-component views.

iraiter2

comment created time in 3 days

push event kvark/webgpu-debate

Dzmitry Malyshau

commit sha a3d9897ef5cd8c669943399188548474f60b5c9a

Add links to RBA and subgroups

view details

push time in 3 days

push event kvark/webgpu-debate

Dzmitry Malyshau

commit sha 9d0904339cde8331207f7a4e774b7b6689c72611

Fix web components deployment

view details

push time in 3 days

push event kvark/webgpu-debate

Dzmitry Malyshau

commit sha 8f76f1ce918bb0b0684acb0e4e3815f76b3c0ec8

Fix web components deployment

view details

push time in 3 days

pull request comment servo/servo

Major fixes in error reporting in GPUCommandEncoder and ErrorScope Model

@bors-servo try=wpt-mac

kunalmohan

comment created time in 3 days

issue comment christianvoigt/argdown

Error on HTML generation from multiple files

Thank you for fixing this promptly!

kvark

comment created time in 3 days

pull request comment gpuweb/gpuweb

No explicit grammar for decorations

Sorry for raising these concerns on the PR; I only now noticed that it's marked as resolved. I hadn't put enough thought into it previously when we discussed this, but feel free to proceed if the concerns don't ring any bells :)

jdashg

comment created time in 3 days

issue opened gfx-rs/wgpu

Make pipeline layout objects optional

Is your feature request related to a problem? Please describe.
See https://github.com/gpuweb/gpuweb/pull/543 getBindGroupLayout. Apparently, WebGPU CTS relies a lot on this behavior :/

Describe the solution you'd like
When creating pipelines, the layout would now be optional. If not provided, it's derived from the shader source according to the set of rules outlined in the spec. Users can then get the bind group layout object, to create the bind groups with.
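On the wgpu-rs side, usage could look roughly like this (a hypothetical sketch of the API shape, assuming a device and shader_module already exist; not the final signatures):

```rust
// The pipeline layout becomes optional; when omitted, it is derived
// from the shader by reflection, per the rules in the spec.
let pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
    layout: None, // previously a required field
    compute_stage: wgpu::ProgrammableStageDescriptor {
        module: &shader_module,
        entry_point: "main",
    },
});

// The derived layout can then be queried back to create bind groups.
let bind_group_layout = pipeline.get_bind_group_layout(0);
```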

Describe alternatives you've considered
Not much we can do - this is in upstream spec now.

Additional context
This may be blocked on https://github.com/gfx-rs/naga progress and WGSL adoption - it's easier to extract the data from naga modules.

created time in 3 days

issue opened gfx-rs/wgpu

Use a new ID on command_encoder_finish()

Is your feature request related to a problem? Please describe.
When implementing the error model in Servo, @kunalmohan faced a problem of tracking which encoders are error encoders, and how to work with the corresponding command buffers.

Describe the solution you'd like
Instead of re-using the same ID for command encoder and command buffer, we can always accept a new ID, and just move the entry over (possibly marking the old entry as Error).
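A sketch of the changed call shape (with stub ID types; names are illustrative):

```rust
// Stubs standing in for wgpu-core's real ID types.
type CommandEncoderId = u64;
type CommandBufferId = u64;

// Instead of reusing the encoder's ID for the command buffer, the
// caller supplies a fresh ID; the encoder's entry can be moved over,
// and the old entry kept around (e.g. marked as Error).
fn command_encoder_finish(
    encoder_id: CommandEncoderId,
    new_buffer_id: CommandBufferId,
) -> CommandBufferId {
    // Move the recorded commands from `encoder_id` to `new_buffer_id`.
    let _ = encoder_id;
    new_buffer_id
}
```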

Describe alternatives you've considered
Don't do anything, let Servo handle it. That's the temporary plan for now until this issue is resolved.

Additional context

created time in 3 days

pull request comment gpuweb/gpuweb

Updated createBindGroup to new method style

Are these intended to be actions (Add these subresources to this map now) or tests (Fail if these subresources are not already in this map with that usage?)

The answer here is "actions".

toji

comment created time in 3 days

pull request comment gfx-rs/rspirv

GitHub: Add dependabot configuration for update notifications

It's definitely good to track your duplicated dependencies. This is what would be driving the need to update derive_more. It's a better motivator than the hint from dependabot saying "hey, there is a new release of something".

MarijnS95

comment created time in 3 days

Pull request review comment gpuweb/gpuweb

Adding validation for GPURenderPassEncoderBase methods

 enum GPUStoreOp {

     : <dfn>drawIndirect(indirectBuffer, indirectOffset)</dfn>
     ::
+        Draws primitives using parameters read from a {{GPUBuffer}}.
+
+        The <dfn dfn>indirect draw parameters</dfn> encoded in the buffer must be a tightly
+        packed block of **four 32-bit unsigned integer values (16 bytes total)**, given in the same
+        order as the arguments for {{GPURenderEncoderBase/draw()}}. Written as a C struct,
+        the data layout would be:
+
+        ```c
+        struct GPUDrawIndirectParameters {
+            uint32_t vertexCount;
+            uint32_t instanceCount;
+            uint32_t firstVertex;
+            uint32_t firstInstance;
+        };
+        ```

         <div algorithm="GPURenderEncoderBase.drawIndirect">
             **Called on:** {{GPURenderEncoderBase}} this.

             **Arguments:**
             <pre class=argumentdef for="GPURenderEncoderBase/drawIndirect(indirectBuffer, indirectOffset)">
-                indirectBuffer:
-                indirectOffset:
+                |indirectBuffer|: Buffer containing the [=indirect draw parameters=].
+                |indirectOffset|: Offset in bytes into |indirectBuffer| where the drawing data begins.
             </pre>

             **Returns:** void

-            Issue: Describe {{GPURenderEncoderBase/drawIndirect()}} algorithm steps.
+            Issue the following steps on the [=Queue timeline=] of |this|:
+            <div class=queue-timeline>
+                1. If any of the following conditions are unsatisfied, generate a
+                    {{GPUValidationError}} error and stop.
+                    <div class=validusage>
+                        - |indirectBuffer| is [=valid=].
+                        - |indirectBuffer|.{{GPUObjectBase/[[device]]}} is |this|.
+                        - |indirectOffset| + sizeof([=indirect draw parameters=]) &le;
+                            |indirectBuffer|.{{GPUBuffer/[[size]]}}.
+
+                        Issue: Does |indirectOffset| need a particular alignment?

Yes, it needs to be aligned to 4 in Vulkan and Metal at least, and likely in D3D12 as well. In other words, the validation should also require |indirectOffset| to be a multiple of 4.

toji

comment created time in 3 days

Pull request review comment gpuweb/gpuweb

Adding validation for GPURenderPassEncoderBase methods

 enum GPUStoreOp {

     : <dfn>drawIndexed(indexCount, instanceCount, firstIndex, baseVertex, firstInstance)</dfn>
     ::
+        Draws indexed primitives.

         <div algorithm="GPURenderEncoderBase.drawIndexed">
             **Called on:** {{GPURenderEncoderBase}} this.

             **Arguments:**
             <pre class=argumentdef for="GPURenderEncoderBase/drawIndexed(indexCount, instanceCount, firstIndex, baseVertex, firstInstance)">
-                indexCount:
-                instanceCount:
-                firstIndex:
-                baseVertex:
-                firstInstance:
+                indexCount: The number of indices to draw.
+                instanceCount: The number of instances to draw.
+                firstIndex: Index to begin drawing from.

We should be careful here. We are in "indexed" drawing, which means we have indices of vertices. But this "firstIndex" is not that index: it's an offset into the index buffer, i.e. an index of an index. For example, with index data [8, 9, 10, 11] and firstIndex = 1, drawing starts by reading the index value 9; it does not mean starting from vertex 1.

toji

comment created time in 3 days

Pull request review comment gpuweb/gpuweb

Adding validation for GPURenderPassEncoderBase methods

 enum GPUStoreOp {

 <dl dfn-type=method dfn-for=GPURenderEncoderBase>
     : <dfn>setPipeline(pipeline)</dfn>
     ::
+        Sets the current {{GPURenderPipeline}}.

         <div algorithm="GPURenderEncoderBase.setPipeline">
             **Called on:** {{GPURenderEncoderBase}} this.

             **Arguments:**
             <pre class=argumentdef for="GPURenderEncoderBase/setPipeline(pipeline)">
-                pipeline:
+                |pipeline|: The render pipeline to use for subsequent drawing commands.
             </pre>

             **Returns:** void

-            Issue: Describe {{GPURenderEncoderBase/setPipeline()}} algorithm steps.
+            Issue the following steps on the [=Queue timeline=] of |this|:

This looks more like the "device timeline": validation of commands may happen during recording, not submission. The queue only cares about submissions.

The tricky part is that it's not going to be ordered with the other device operations on this timeline, but I think that's fine, since it doesn't depend on any other operations.

toji

comment created time in 3 days

Pull request review comment gpuweb/gpuweb

Adding validation for GPURenderPassEncoderBase methods

 enum GPUStoreOp {

     : <dfn>drawIndirect(indirectBuffer, indirectOffset)</dfn>
     ::
+        Draws primitives using parameters read from a {{GPUBuffer}}.
+
+        The <dfn dfn>indirect draw parameters</dfn> encoded in the buffer must be a tightly
+        packed block of **four 32-bit unsigned integer values (16 bytes total)**, given in the same
+        order as the arguments for {{GPURenderEncoderBase/draw()}}. Written as a C struct,

I wonder how appropriate it is to refer to C structs when describing a JS API?
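For comparison, the same 16-byte layout written in Rust (purely illustrative; nothing obliges implementations to define such a struct):

```rust
/// The four tightly packed u32 draw parameters, 16 bytes total.
#[repr(C)]
#[derive(Clone, Copy)]
struct DrawIndirectArgs {
    vertex_count: u32,
    instance_count: u32,
    first_vertex: u32,
    first_instance: u32,
}

// Compile-time check that the layout really is 16 bytes.
const _: () = assert!(std::mem::size_of::<DrawIndirectArgs>() == 16);
```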

toji

comment created time in 3 days

issue comment christianvoigt/argdown

Error on HTML generation from multiple files

See full log in https://github.com/kvark/webgpu-debate/runs/954040535?check_suite_focus=true

kvark

comment created time in 3 days

issue opened christianvoigt/argdown

Error on HTML generation from multiple files

[Error: EEXIST: file already exists, copyfile '/usr/local/lib/node_modules/@argdown/cli/node_modules/@argdown/core/dist/plugins/argdown.css' -> '/Users/dmalyshau/Code/webgpu-debate/html/argdown.css'] {
  errno: -17,
  code: 'EEXIST',
  syscall: 'copyfile',
  path: '/usr/local/lib/node_modules/@argdown/cli/node_modules/@argdown/core/dist/plugins/argdown.css',
  dest: '/Users/dmalyshau/Code/webgpu-debate/html/argdown.css'
}

Looks like it's trying to copy the CSS once per argdown file, and the second copy fails because the file already exists?

created time in 3 days

push event kvark/webgpu-debate

Dzmitry Malyshau

commit sha 972d2fb55a3ab8ffbfc94a88a4c1a3c2a13835ad

Add subgroup ops debate (#5)

view details

push time in 3 days

delete branch kvark/webgpu-debate

delete branch : subgroup

delete time in 3 days

PR merged kvark/webgpu-debate

Add subgroup ops debate

Mostly captured from https://github.com/gpuweb/gpuweb/pull/954 .
Merging is currently blocked by https://github.com/christianvoigt/argdown/issues/179

+38 -0

0 comment

2 changed files

kvark

pr closed time in 3 days

push event kvark/webgpu-debate

Dzmitry Malyshau

commit sha e20a68651f13c64b0c81222c8c6a60cc73549090

Add RBA debate (#4)

view details

Dzmitry Malyshau

commit sha 885713dcbad7c87a2be9f3894004b9a462eb694b

Add subgroup ops debate

view details

push time in 3 days

push event kvark/webgpu-debate

Dzmitry Malyshau

commit sha e20a68651f13c64b0c81222c8c6a60cc73549090

Add RBA debate (#4)

view details

push time in 3 days

delete branch kvark/webgpu-debate

delete branch : rba

delete time in 3 days

PR merged kvark/webgpu-debate

Add RBA debate

See https://github.com/gpuweb/gpuweb/issues/955 .
It's still in progress.

+47 -5

0 comment

5 changed files

kvark

pr closed time in 3 days

pull request comment gfx-rs/rspirv

GitHub: Add dependabot configuration for update notifications

I haven't worked on a project with a good demonstration of dependabot, and I'm struggling to see its usefulness for Rust projects. Let me try to explain why.

If you have an application, and you want to always use the latest patch versions of your dependencies, having dependabot ask you to do this in every Cargo.toml for every dependency of yours is a huge waste. Instead, you can simply run cargo update in a cron job or something on your side, and be done with it. And if we are talking about breaking (non-patch) versions, dependabot can't provide a working patch anyway; it can at most notify you that a new version is out. Updating to every breaking release urgently is also a waste, so typically there is a big gap between a breaking version being published and it being picked up by big applications. In this case, the benefit of a quick notification is not obvious.

Note: rspirv includes a Cargo.lock for the binary target rspirv-dis, not for the libraries.

MarijnS95

comment created time in 3 days
