Q2_0 group 64: Vulkan backend by khosravipasha · Pull Request #42 · PrismML-Eng/llama.cpp

khosravipasha · 2026-06-10T23:01:39Z

DRAFT PR for testing and review

Copilot

Pull request overview

This PR extends the ggml Vulkan backend to support GGML_TYPE_Q2_0 (group size 64) end-to-end by adding the necessary GLSL type definitions, dequant/quant kernels, matmul load path, cooperative-matrix dequant helper, shader generation entries, and Vulkan pipeline wiring. It also introduces a CPU-side shader simulator test intended to validate the GLSL bit-extraction logic against the CPU reference, and enables Q2_0 in the backend ops test matrix.

Changes:

- Add Vulkan shader support for Q2_0: type/layout (`block_q2_0`), dequant kernel, dequant helpers, quantize kernel, matmul shared-memory load path, and CM2 decode helper.
- Wire Q2_0 into Vulkan pipeline creation/selection and shader generation.
- Add a CPU shader-simulation test and enable Q2_0 coverage in `test-backend-ops`.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test-vulkan-q2_0-shader-sim.cpp | New CPU-based simulator + tests intended to prove Q2_0 shader decoding matches CPU reference. |
| tests/test-backend-ops.cpp | Adds `GGML_TYPE_Q2_0` to tested type sets. |
| tests/CMakeLists.txt | Builds/runs the new Q2_0 shader simulator test. |
| ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp | Adds `q2_0` to shader generation type list and copy/set_rows loops. |
| ggml/src/ggml-vulkan/vulkan-shaders/types.glsl | Introduces `block_q2_0` + `QUANT_K_Q2_0/QUANT_R_Q2_0` and `DATA_A_Q2_0` wiring. |
| ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_funcs.glsl | Adds Q2_0 branch to load A into shared memory for matmul. |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_q2_0.comp | New standalone dequant kernel for Q2_0 (group 64). |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs.glsl | Adds Q2_0 dequantize/dequantize4 + `get_dm`. |
| ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs_cm2.glsl | Adds Q2_0 CM2 decode buffer + scalar dequant function, and macro selection. |
| ggml/src/ggml-vulkan/vulkan-shaders/copy_to_quant.comp | Adds Q2_0 quantize path (f32 → q2_0). |
| ggml/src/ggml-vulkan/ggml-vulkan.cpp | Wires Q2_0 into shader pipeline creation and capability/dispatch selection. |

+// block format (matches ggml-common.h:187-192 and types.glsl:207-214)
+
+static const int QK2_0 = 128;
+
+struct block_q2_0 {
+    uint16_t d;          // fp16 raw bits
+    uint8_t  qs[QK2_0/4]; // 32 bytes
+};
+static_assert(sizeof(block_q2_0) == 2 + 32, "block_q2_0 must be 34 bytes");
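The size assertion above follows from simple arithmetic: one fp16 scale plus the packed 2-bit codes. A minimal sketch of that arithmetic, assuming a block holds only the scale and the packed codes; `block_bytes` and `bits_per_weight` are hypothetical helpers for illustration, not functions from the PR:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes in one block that stores an fp16 scale
// followed by `group` elements packed at `bits` bits each.
constexpr std::size_t block_bytes(std::size_t group, std::size_t bits) {
    return sizeof(std::uint16_t) + group * bits / 8;
}

// Hypothetical helper: effective storage cost per weight, scale included.
constexpr double bits_per_weight(std::size_t group, std::size_t bits) {
    return 8.0 * static_cast<double>(block_bytes(group, bits)) / static_cast<double>(group);
}

// 128 elements at 2 bits: 2 + 32 bytes, as in the static_assert above.
static_assert(block_bytes(128, 2) == 34, "2-byte scale + 32 bytes of codes");
// For comparison, 64 elements at 2 bits would need 2 + 16 bytes.
static_assert(block_bytes(64, 2) == 18, "2-byte scale + 16 bytes of codes");
```

Under these assumptions a 128-element block costs 2.125 bits per weight and a 64-element block costs 2.25.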


+// GLSL (dequant_q2_0.comp):
+//   #version 450
+//   #include "dequant_head.glsl"
+//   layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;
+//   layout (binding = 0) readonly buffer A {block_q2_0 data_a[];};


+    const uint ib = elem / 64;          // block index
+    const uint bp = (elem % 64) / 8;    // byte-pair within the block (0..7)
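The two index computations above split a flat element index into a block index and a byte-pair index. A CPU-side sketch of how such a decomposition could be extended to fetch a single 2-bit code is shown below. Only `ib` and `bp` come from the snippet; the struct `q2_pos`, the functions `locate` and `extract_code`, the `sub` index, and the low-bits-first little-endian packing are all assumptions made for illustration, and the PR's actual bit order may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical position record; `ib` and `bp` mirror the shader snippet.
struct q2_pos {
    std::uint32_t ib;   // block index
    std::uint32_t bp;   // byte-pair within the block (0..7)
    std::uint32_t sub;  // element within the byte-pair (0..7), assumed
};

inline q2_pos locate(std::uint32_t elem) {
    q2_pos p{elem / 64, (elem % 64) / 8, elem % 8};
    assert(p.bp < 8 && p.sub < 8);
    return p;
}

// `qs` points at the packed codes of the block selected by p.ib.
// Assumed packing: each byte-pair is a little-endian 16-bit word and
// element 0 of the pair sits in the two lowest bits.
inline std::uint32_t extract_code(const std::uint8_t *qs, const q2_pos &p) {
    const std::uint32_t word =
        std::uint32_t(qs[2 * p.bp]) | (std::uint32_t(qs[2 * p.bp + 1]) << 8);
    return (word >> (2 * p.sub)) & 0x3u;
}
```

Comparing a helper like this against the CPU reference dequantizer for every element of a block is one way a simulator test could check the shader's bit extraction.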


khosravipasha requested a review from Copilot June 10, 2026 23:05

Copilot started reviewing on behalf of khosravipasha June 10, 2026 23:05

Copilot AI reviewed Jun 10, 2026

khosravipasha force-pushed the pr/q2_0-cpu branch from 7c6c628 to 0f07ba4 Compare June 11, 2026 00:08

khosravipasha force-pushed the pr/q2_0-vulkan branch from f5b0db5 to 88f72e5 Compare June 11, 2026 00:08

khosravipasha force-pushed the pr/q2_0-cpu branch from 0f07ba4 to a69cff5 Compare June 11, 2026 00:28

khosravipasha force-pushed the pr/q2_0-vulkan branch from 88f72e5 to 3c698af Compare June 11, 2026 00:28

Q2_0 group 64: Vulkan backend

8a9cac1

khosravipasha force-pushed the pr/q2_0-cpu branch from a69cff5 to dc7c932 Compare June 11, 2026 00:37

khosravipasha force-pushed the pr/q2_0-vulkan branch from 3c698af to 8a9cac1 Compare June 11, 2026 00:37

khosravipasha wants to merge 1 commit into pr/q2_0-cpu from pr/q2_0-vulkan

3 participants
