[ExecuTorch][WebGPU] Add et_vk.embedding_q4gsw (4-bit groupwise-symmetric quantized embedding) by JulianCloudNTH · Pull Request #20263 · pytorch/executorch

JulianCloudNTH · 2026-06-13T00:08:45Z

Stack from ghstack (oldest at bottom):

[ExecuTorch][WebGPU] Add et_vk.prepack (constant-tensor packing) for E2E weight loading #20265
[ExecuTorch][WebGPU] Add et_vk.apply_rotary_emb (interleaved RoPE) + ValueList multi-output #20264
-> [ExecuTorch][WebGPU] Add et_vk.embedding_q4gsw (4-bit groupwise-symmetric quantized embedding) #20263
[ExecuTorch][WebGPU] linear_q4gsw test suite: Llama-1B shapes + 4k/8k sweep #20227

Adds the WebGPU backend handler for et_vk.embedding_q4gsw.default (a 4-bit groupwise-symmetric quantized embedding gather) plus the host-side integer-input infra it requires.

The op is a single compute dispatch with one stage. One thread handles each 32-element block of each gathered row and dequantizes the packed 4-bit table into the fp32 output: q = (nibble - 8) * scale, with even dims in the high nibble and odd dims in the low nibble. This mirrors the Vulkan embedding_q4gsw reference (flat buffer-backed weight; is_linear_weight=true is unsupported and throws). The workgroup size is a wg_size pipeline-override constant clamped to the device limit via WebGPUUtils::clamp_workgroup_size. The 1D dispatch count goes through WebGPUUtils::compute_1d_workgroup_count and is validated before any GPU-object allocation. The embedded WGSL string header is generated by gen_wgsl_headers.py.
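For reviewers who want to sanity-check the numerics, here is a minimal CPU reference sketch of the dequantization rule and the 1D dispatch arithmetic described above. It is not the WGSL shader or the handler from this PR. The function names, the per-row scale layout (one scale per `group_size` elements), and the 65535 workgroup cap (the WebGPU default for maxComputeWorkgroupsPerDimension) are assumptions made for illustration.

```python
import math


def dequantize_row(packed: bytes, scales: list[float], group_size: int = 32) -> list[float]:
    """Dequantize one packed 4-bit row: q = (nibble - 8) * scale.

    Assumed layout (hypothetical, for illustration): two elements per byte,
    even dim in the high nibble, odd dim in the low nibble, and one scale
    per `group_size` consecutive elements of the row.
    """
    dim = len(packed) * 2
    if dim % group_size != 0 or len(scales) != dim // group_size:
        raise ValueError("row length and scale count do not match group_size")
    out = []
    for d in range(dim):
        byte = packed[d // 2]
        nibble = (byte >> 4) if d % 2 == 0 else (byte & 0x0F)
        out.append((nibble - 8) * scales[d // group_size])
    return out


def embedding_gather(table: list[bytes], scales: list[list[float]],
                     indices: list[int], group_size: int = 32) -> list[list[float]]:
    """Gather and dequantize the rows selected by `indices`."""
    return [dequantize_row(table[i], scales[i], group_size) for i in indices]


def workgroup_count_1d(num_indices: int, embed_dim: int, wg_size: int,
                       block: int = 32, max_groups: int = 65535) -> int:
    """One thread per 32-element block of each gathered row, rounded up to workgroups.

    The bound check stands in for the validation step; what the real helper
    checks is an assumption here.
    """
    total_threads = num_indices * (embed_dim // block)
    count = math.ceil(total_threads / wg_size)
    if count > max_groups:
        raise ValueError("dispatch exceeds the per-dimension workgroup limit")
    return count
```

A reference like this can be compared against the GPU output element by element in a test, since the rule involves only an integer subtraction and one fp32 multiply per element.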

Embedding indices arrive as int64 at the program boundary, but the serialized graph stores them as int32. The shared input path is therefore extended with a host-side InputData view ({data, nbytes, host_is_int64}), and copy_inputs gains three branches: a byte-for-byte fast path when host and GPU sizes match; an int64->int32 narrowing copy when the buffer is int32 and the host input is twice as wide (mirroring the Vulkan kLong->kInt staging cast); and a fail-loud throw otherwise. WebGPUTensor gains elem_size/is_int to drive the narrowing decision, and update_symints_from_inputs takes the same InputData vector, so execute() builds a single input list consumed by both.
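The three copy_inputs branches can be sketched on the CPU as follows. This is an illustrative model, not the C++ code in this PR: the function name and parameters are hypothetical, little-endian byte order is assumed, and out-of-range values keep their low 32 bits, which is an assumption about how the narrowing cast behaves.

```python
import struct


def copy_input(host: bytes, gpu_nbytes: int, gpu_is_int32: bool, host_is_int64: bool) -> bytes:
    """Model the three input-copy branches described above (hypothetical helper)."""
    # Branch 1: sizes match, so copy byte for byte.
    if len(host) == gpu_nbytes:
        return bytes(host)

    # Branch 2: int32 GPU buffer fed by an int64 host input that is twice as wide.
    if gpu_is_int32 and host_is_int64 and len(host) == 2 * gpu_nbytes:
        count = len(host) // 8
        values = struct.unpack(f"<{count}q", host)
        # Assumption: keep the low 32 bits of each value, like a plain integer cast.
        return struct.pack(f"<{count}I", *(v & 0xFFFFFFFF for v in values))

    # Branch 3: anything else is a size/dtype mismatch, so fail loudly.
    raise ValueError(
        f"input size mismatch: host={len(host)} bytes, gpu={gpu_nbytes} bytes"
    )
```

For example, three int64 indices (24 bytes) passed with `gpu_nbytes=12` take the narrowing branch and yield 12 bytes of int32 data, while a 6-byte host input against a 4-byte buffer raises.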

Differential Revision: D108428753

[ghstack-poisoned]

pytorch-bot · 2026-06-13T00:08:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20263

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

❌ 22 New Failures, 1 Unrelated Failure

As of commit 0d9b542 with merge base 5526971:

NEW FAILURES - The following jobs have failed:

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t ea3b83ac0da168a180774df2558f1eeeabc4fa82931d851373007698b2d38ed2 /exec failed with exit code 1
pull / test-llama-runner-qnn-linux (fp32, qnn_8a8w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t ae7e2e44fe0f09e31e79095e653ca2e58f76fd49a3dd02c49ac6c110ee25302f /exec failed with exit code 1
pull / test-qnn-buck-build-linux / linux-job (gh)
RuntimeError: Command docker exec -t f45300f4b38ccbabf07cf43dd9f97cf56d5b6224d959e911033e184d5418a9bf /exec failed with exit code 1
pull / test-qnn-delegate-linux / linux-job (gh)
RuntimeError: Command docker exec -t 16b8b68ae0b358f4173b9c6185988d6fe888eb15c72328cafc8758e20e5d862b /exec failed with exit code 1
pull / test-qnn-direct-build-linux / linux-job (gh)
RuntimeError: Command docker exec -t c1e72a60b1a1d874b6ddef14962603ce9ba7b6ee79d9c5d5cce4e148331fa1ce /exec failed with exit code 1
pull / test-qnn-models-linux (dl3) / linux-job (gh)
RuntimeError: Command docker exec -t 5893cee5e4503995fcd2e61e02435557562cf68c131842351caa7e7529d06c88 /exec failed with exit code 1
pull / test-qnn-models-linux (mv2) / linux-job (gh)
RuntimeError: Command docker exec -t a5673c13282cb534455d1bc448a739d221f77e5829f6dae87f02ca010c968628 /exec failed with exit code 1
pull / test-qnn-models-linux (mv3) / linux-job (gh)
RuntimeError: Command docker exec -t 84feb2a4d75d82cfeedb308a9e63f80bb96f8a36ae4b2958fa1ccfda1a8cd92e /exec failed with exit code 1
pull / test-qnn-passes-linux / linux-job (gh)
RuntimeError: Command docker exec -t 8111aeb06562458ee59e6b7029d5a1cda6a88a671bec717408342eb58bc9ff79 /exec failed with exit code 1
pull / test-qnn-python-imports-linux / linux-job (gh)
RuntimeError: Command docker exec -t ef077f4c56afd4df6a4ced21d48a555b44364aeeadcdb42e83e5099c83a949d0 /exec failed with exit code 1
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh)
RuntimeError: Command docker exec -t b9ba2904b889d2024d76b47bc0791b940217d0cf0a844fa5b9a6055d42e1ee92 /exec failed with exit code 1
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, operators) / linux-job (gh)
RuntimeError: Command docker exec -t 5d371fc5c1554e298f263baadfb08af2af763d49e835087309777b69c9ac700f /exec failed with exit code 92
pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh)
RuntimeError: Command docker exec -t 2fd41fb3c3532406727de3c35e66b5cc378ce2ee5ee5cdcaa0607a9e80efce10 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.11) / linux-job (gh)
RuntimeError: Command docker exec -t 12b1df0058de97d8cb01a4ec5e91334502eb4ca6ed1d969ce48a5d8d63cf2ca1 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.12) / linux-job (gh)
RuntimeError: Command docker exec -t e4c3de056bc1548234b2485631c5dd6efe22b7b813d2e2341a5e30e5e0964dc1 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.13) / linux-job (gh)
RuntimeError: Command docker exec -t 3b2c990d43d0844799800cf56cd386835fbaf80f0e8d06995ad067ce64f61402 /exec failed with exit code 1
pull / test-sqnr-static-llm-qnn-linux (smollm2_135m) / linux-job (gh)
RuntimeError: Command docker exec -t 6cc5573b9fa47cd25544cdcb347d5574e434ae0f404c5637cf56cc7083fbe98b /exec failed with exit code 1
pull / test-static-llama-qnn-linux (stories_110m) / linux-job (gh)
RuntimeError: Command docker exec -t 684aee35927bdacddf5a7089081007e8d455953ad2c7ae0f635cd8ff10e306db /exec failed with exit code 1
pull / test-static-llama-qnn-linux (stories_260k_bc) / linux-job (gh)
RuntimeError: Command docker exec -t e6b3b26e3e812cccb77aae236d8b9d4daeb8d604ba297500c6c9fc84affc9647 /exec failed with exit code 1
pull / unittest / linux / linux-job (gh)
examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch
Test QNN Backend / test-qnn / test-backend-linux (qnn, models) / linux-job (gh)
RuntimeError: Command docker exec -t 32d385d0450935144c553e0f9d524d4c0376a01bd8165ba9350b2c1e9aea73bc /exec failed with exit code 1
Test QNN Backend / test-qnn / test-backend-linux (qnn, operators) / linux-job (gh)
RuntimeError: Command docker exec -t 11b96a05d48149a26be7a9536e56dc09d5f9ba5a512cba3f2e9c9a738203b74a /exec failed with exit code 1

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / android / build-android (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-13T00:09:36Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Update

0d9b542

[ghstack-poisoned]

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 13, 2026 00:08

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 13, 2026

JulianCloudNTH wants to merge 1 commit into gh/JulianCloudNTH/25/base from gh/JulianCloudNTH/25/head
