
Q2_0 group 64: Vulkan backend #42

Draft: khosravipasha wants to merge 1 commit into pr/q2_0-cpu from pr/q2_0-vulkan
@khosravipasha

Collaborator

DRAFT PR for testing and review

Copilot AI left a comment


Pull request overview

This PR extends the ggml Vulkan backend to support GGML_TYPE_Q2_0 (group size 64) end-to-end by adding the necessary GLSL type definitions, dequant/quant kernels, matmul load path, cooperative-matrix dequant helper, shader generation entries, and Vulkan pipeline wiring. It also introduces a CPU-side shader simulator test intended to validate the GLSL bit-extraction logic against the CPU reference, and enables Q2_0 in the backend ops test matrix.

Changes:

  • Add Vulkan shader support for Q2_0: type/layout (block_q2_0), dequant kernel, dequant helpers, quantize kernel, matmul shared-memory load path, and CM2 decode helper.
  • Wire Q2_0 into Vulkan pipeline creation/selection and shader generation.
  • Add a CPU shader-simulation test and enable Q2_0 coverage in test-backend-ops.
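A CPU-side shader simulation of the kind described in the last bullet can be sketched as follows. This is not the PR's test: the function names are hypothetical, and the packing order (four 2-bit codes per byte, lowest bits first, 16-bit pairs read little-endian) is an assumption made for illustration. The idea is to decode the same packed bytes two ways, once byte by byte as a CPU reference might and once through a 16-bit pair as a shader load might, and check that the two agree.

```cpp
#include <cassert>
#include <cstdint>

// Reference-style decode (assumed layout): code i lives in byte i/4,
// at bit offset 2*(i%4), lowest bits first.
inline uint32_t code_bytewise(const uint8_t* qs, uint32_t i) {
    return (static_cast<uint32_t>(qs[i / 4]) >> (2 * (i % 4))) & 0x3u;
}

// Shader-style decode (assumed): combine two bytes into one little-endian
// 16-bit word, then pull the code at bit offset 2*(i%8).
inline uint32_t code_pairwise(const uint8_t* qs, uint32_t i) {
    const uint32_t bp   = i / 8; // byte-pair index
    const uint32_t word = static_cast<uint32_t>(qs[2 * bp]) |
                          (static_cast<uint32_t>(qs[2 * bp + 1]) << 8);
    return (word >> (2 * (i % 8))) & 0x3u;
}

// True when both decoders return the same code for all n elements.
// n must be a multiple of 8 so every byte pair is fully inside qs.
inline bool decoders_agree(const uint8_t* qs, uint32_t n) {
    assert(n % 8 == 0);
    for (uint32_t i = 0; i < n; ++i) {
        if (code_bytewise(qs, i) != code_pairwise(qs, i)) {
            return false;
        }
    }
    return true;
}
```

Calling `decoders_agree(qs, 64)` on a 16-byte buffer covers one group of 64 codes. A mismatch would point at a bit-order or endianness disagreement between the two paths rather than at the scale handling, which this sketch leaves out.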

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test-vulkan-q2_0-shader-sim.cpp | New CPU-based simulator + tests intended to prove Q2_0 shader decoding matches CPU reference. |
| tests/test-backend-ops.cpp | Adds GGML_TYPE_Q2_0 to tested type sets. |
| tests/CMakeLists.txt | Builds/runs the new Q2_0 shader simulator test. |
| ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp | Adds q2_0 to shader generation type list and copy/set_rows loops. |
| ggml/src/ggml-vulkan/vulkan-shaders/types.glsl | Introduces block_q2_0 + QUANT_K_Q2_0/QUANT_R_Q2_0 and DATA_A_Q2_0 wiring. |
| ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_funcs.glsl | Adds Q2_0 branch to load A into shared memory for matmul. |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_q2_0.comp | New standalone dequant kernel for Q2_0 (group 64). |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs.glsl | Adds Q2_0 dequantize/dequantize4 + get_dm. |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs_cm2.glsl | Adds Q2_0 CM2 decode buffer + scalar dequant function, and macro selection. |
| ggml/src/ggml-vulkan/vulkan-shaders/copy_to_quant.comp | Adds Q2_0 quantize path (f32 → q2_0). |
| ggml/src/ggml-vulkan/ggml-vulkan.cpp | Wires Q2_0 into shader pipeline creation and capability/dispatch selection. |


Comment on lines +42 to +50
// block format (matches ggml-common.h:187-192 and types.glsl:207-214)
static const int QK2_0 = 128;

struct block_q2_0 {
    uint16_t d;          // fp16 raw bits
    uint8_t qs[QK2_0/4]; // 32 bytes
};
static_assert(sizeof(block_q2_0) == 2 + 32, "block_q2_0 must be 34 bytes");
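
The PR title refers to group size 64, while the constant quoted above is 128; which value is intended is not stated in the visible text. As a neutral illustration, the sketch below shows how the block byte count follows from the group size, assuming only what the quoted struct shows: a 2-byte fp16 scale followed by 2-bit codes packed four per byte. The names `q2_block_bytes` and `block_q2_sketch` are hypothetical and do not appear in the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes in one block that stores `group_size`
// 2-bit codes after a 2-byte fp16 scale.
constexpr std::size_t q2_block_bytes(std::size_t group_size) {
    return sizeof(std::uint16_t) + group_size / 4; // 4 codes per byte
}

// Templated mirror of the quoted struct, so both sizes can be checked.
template <std::size_t GroupSize>
struct block_q2_sketch {
    std::uint16_t d;                 // fp16 scale, raw bits
    std::uint8_t  qs[GroupSize / 4]; // packed 2-bit codes
};

static_assert(q2_block_bytes(128) == 34, "group 128: 2 + 32 bytes");
static_assert(q2_block_bytes(64)  == 18, "group 64: 2 + 16 bytes");
static_assert(sizeof(block_q2_sketch<128>) == q2_block_bytes(128),
              "struct has no padding for group 128");
static_assert(sizeof(block_q2_sketch<64>) == q2_block_bytes(64),
              "struct has no padding for group 64");
```

Under these assumptions a group of 128 gives the 34 bytes asserted in the quoted code, and a group of 64 would give 18.
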
Comment on lines +339 to +343
// GLSL (dequant_q2_0.comp):
// #version 450
// #include "dequant_head.glsl"
// layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;
// layout (binding = 0) readonly buffer A {block_q2_0 data_a[];};
Comment on lines +19 to +20
const uint ib = elem / 64; // block index
const uint bp = (elem % 64) / 8; // byte-pair within the block (0..7)
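
The index arithmetic in the two quoted lines can be mirrored on the CPU as below. This is a minimal sketch, not the shader itself: `ElemIndex` and `split_elem` are hypothetical names, and the `slot` field (position inside the byte pair) is an added assumption. The divisor 8 is consistent with a 16-bit pair holding eight 2-bit codes, and 64 elements per block then give the stated range of 0..7 for `bp`.

```cpp
#include <cstdint>

// Hypothetical container for the decomposed index.
struct ElemIndex {
    uint32_t ib;   // block index
    uint32_t bp;   // byte-pair within the block (0..7)
    uint32_t slot; // assumed: position of the code within the pair (0..7)
};

// Mirrors the quoted GLSL: 64 elements per block, 8 elements per byte pair.
inline ElemIndex split_elem(uint32_t elem) {
    ElemIndex r;
    r.ib   = elem / 64;
    r.bp   = (elem % 64) / 8;
    r.slot = elem % 8;
    return r;
}
```

For example, element 200 falls in block 3 (200 / 64), byte pair 1 ((200 % 64) / 8), slot 0.
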