Q2_0 group 64: Vulkan backend #42
Draft
khosravipasha wants to merge 1 commit into
Pull request overview
This PR extends the ggml Vulkan backend to support GGML_TYPE_Q2_0 (group size 64) end-to-end by adding the necessary GLSL type definitions, dequant/quant kernels, matmul load path, cooperative-matrix dequant helper, shader generation entries, and Vulkan pipeline wiring. It also introduces a CPU-side shader simulator test intended to validate the GLSL bit-extraction logic against the CPU reference, and enables Q2_0 in the backend ops test matrix.
Changes:
- Add Vulkan shader support for Q2_0: type/layout (`block_q2_0`), dequant kernel, dequant helpers, quantize kernel, matmul shared-memory load path, and CM2 decode helper.
- Wire Q2_0 into Vulkan pipeline creation/selection and shader generation.
- Add a CPU shader-simulation test and enable Q2_0 coverage in `test-backend-ops`.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test-vulkan-q2_0-shader-sim.cpp | New CPU-based simulator + tests intended to prove Q2_0 shader decoding matches CPU reference. |
| tests/test-backend-ops.cpp | Adds GGML_TYPE_Q2_0 to tested type sets. |
| tests/CMakeLists.txt | Builds/runs the new Q2_0 shader simulator test. |
| ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp | Adds q2_0 to shader generation type list and copy/set_rows loops. |
| ggml/src/ggml-vulkan/vulkan-shaders/types.glsl | Introduces block_q2_0 + QUANT_K_Q2_0/QUANT_R_Q2_0 and DATA_A_Q2_0 wiring. |
| ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_funcs.glsl | Adds Q2_0 branch to load A into shared memory for matmul. |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_q2_0.comp | New standalone dequant kernel for Q2_0 (group 64). |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs.glsl | Adds Q2_0 dequantize/dequantize4 + get_dm. |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs_cm2.glsl | Adds Q2_0 CM2 decode buffer + scalar dequant function, and macro selection. |
| ggml/src/ggml-vulkan/vulkan-shaders/copy_to_quant.comp | Adds Q2_0 quantize path (f32 → q2_0). |
| ggml/src/ggml-vulkan/ggml-vulkan.cpp | Wires Q2_0 into shader pipeline creation and capability/dispatch selection. |
Comment on lines +42 to +50

    // block format (matches ggml-common.h:187-192 and types.glsl:207-214)

    static const int QK2_0 = 128;

    struct block_q2_0 {
        uint16_t d;            // fp16 raw bits
        uint8_t  qs[QK2_0/4];  // 32 bytes
    };
    static_assert(sizeof(block_q2_0) == 2 + 32, "block_q2_0 must be 34 bytes");
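The quoted test snippet sets QK2_0 = 128 (34-byte blocks), while the PR title and the shader lines quoted further down use a group size of 64. As a rough sanity check on the arithmetic only, the sketch below computes the packed block size for both group sizes. The helper names `q2_qs_bytes` and `q2_block_bytes` are hypothetical and do not come from the PR; the layout assumed is an fp16 scale followed by 2-bit codes packed four per byte, with no padding.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the PR): bytes needed for `group` weights
// stored at 2 bits each, i.e. four codes per byte.
constexpr std::size_t q2_qs_bytes(std::size_t group) {
    return group / 4;
}

// Hypothetical helper (not from the PR): total block size, assuming the
// block is a 2-byte fp16 scale followed by the packed codes, unpadded.
constexpr std::size_t q2_block_bytes(std::size_t group) {
    return sizeof(std::uint16_t) + q2_qs_bytes(group);
}

// Group size 64 (as in the PR title): 2 + 16 = 18 bytes per block.
static_assert(q2_block_bytes(64) == 18, "group 64 -> 18-byte block");

// Group size 128 (as in the quoted test snippet): 2 + 32 = 34 bytes.
static_assert(q2_block_bytes(128) == 34, "group 128 -> 34-byte block");
```

Under these assumptions, a simulator built on the 34-byte layout would stride through a buffer differently from a shader that indexes in groups of 64, so the two constants are worth checking against `ggml-common.h` and `types.glsl`.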
Comment on lines +339 to +343

    // GLSL (dequant_q2_0.comp):
    //   #version 450
    //   #include "dequant_head.glsl"
    //   layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;
    //   layout (binding = 0) readonly buffer A {block_q2_0 data_a[];};
Comment on lines +19 to +20

    const uint ib = elem / 64;        // block index
    const uint bp = (elem % 64) / 8;  // byte-pair within the block (0..7)
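The two quoted lines split a flat element index into a block index and a byte-pair index. A minimal CPU-side sketch of that decomposition follows. The `ib` and `bp` formulas are copied from the quoted lines; the `shift` field and the `extract_code` helper are assumptions that the eight 2-bit codes of a byte pair are packed from the least significant bit upward, which the quoted snippet does not confirm. The names `Q2Index`, `locate`, and `extract_code` are hypothetical.

```cpp
#include <cstdint>

struct Q2Index {
    std::uint32_t ib;     // block index (elem / 64), as in the quoted line
    std::uint32_t bp;     // byte-pair within the block, 0..7, as quoted
    std::uint32_t shift;  // ASSUMPTION: bit offset of the code in the pair
};

// Decompose a flat element index for a group size of 64.
constexpr Q2Index locate(std::uint32_t elem) {
    return Q2Index{elem / 64, (elem % 64) / 8, 2 * (elem % 8)};
}

// ASSUMPTION: codes are packed LSB-first inside the 16-bit byte pair.
constexpr std::uint32_t extract_code(std::uint16_t pair, std::uint32_t shift) {
    return (static_cast<std::uint32_t>(pair) >> shift) & 0x3u;
}

// Element 71 is the 8th element of block 1: byte pair 0, top two bits.
static_assert(locate(71).ib == 1, "block index");
static_assert(locate(71).bp == 0, "byte-pair index");
static_assert(locate(71).shift == 14, "bit offset");
```

Each byte pair covers 8 elements (16 bits at 2 bits per code), so a 64-element block has exactly 8 byte pairs, matching the `0..7` range in the quoted comment.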
7c6c628 to 0f07ba4
f5b0db5 to 88f72e5
0f07ba4 to a69cff5
88f72e5 to 3c698af
a69cff5 to dc7c932
3c698af to 8a9cac1
DRAFT PR for testing and review