[ExecuTorch][WebGPU] Add fused SDPA (sdpa_with_kv_cache) with dynamic input_pos by pytorchbot · Pull Request #20261 · pytorch/executorch

pytorchbot · 2026-06-13T00:01:01Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20086 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/19/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/19/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/22/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/19/orig

@diff-train-skip-merge

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

…ation)

Pull Request resolved: #20201

Backend-agnostic GPU-timestamp infrastructure, split out so that the general implementation is foundational (below SDPA) while the SDPA-specific dispatch labeling stays above the SDPA op.

The core piece is `WebGPUQueryPool`, a faithful re-port of Vulkan's `vkapi::QueryPool` (`backends/vulkan/runtime/vk_api/QueryPool.{h,cpp}`) with the same `ShaderDuration` data model and ticks->ns conversion. Three deviations are forced by the WebGPU API: per-dispatch bracketing goes through a compute-pass `timestampWrites` descriptor, since there is no mid-encoder `writeTimestamp`; readback uses `resolveQuerySet` plus a buffer map rather than host-side `vkGetQueryPoolResults`; and the `TimestampQuery` capability is requested as an explicit device feature, failing open if the adapter lacks it.

`WebGPUDevice` gains timestamp-feature detection, and `WebGPUGraph` gains a per-dispatch `kernel_name` label plus `execute()` bracketing of each compute pass when the pool is active. The feature is opt-in via the `WEBGPU_TIMESTAMP_QUERY` env var and off by default, so the production `execute()` path is byte-identical. The SDPA per-kernel labeling lives in the companion "for SDPA" diff above the SDPA op.

Co-authored with Claude.

ghstack-source-id: 392975889
@exported-using-ghexport

Differential Revision: [D108188287](https://our.internmc.facebook.com/intern/diff/D108188287/)
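For readers unfamiliar with the query-pool pattern, here is a minimal Python sketch of the post-readback step only: pairing the begin/end timestamps of each bracketed compute pass with its `kernel_name` and converting ticks to nanoseconds. It is not the code in this diff. The field names, the `to_durations` helper, the `ns_per_tick` parameter, and the way the env var value is interpreted are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class ShaderDuration:
    # Illustrative fields only; not copied from the Vulkan or WebGPU struct.
    kernel_name: str
    start_ns: int
    end_ns: int

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def timestamps_enabled(env=os.environ) -> bool:
    # Assumption: any value other than "" or "0" opts in. The diff only says
    # the feature is gated on WEBGPU_TIMESTAMP_QUERY and is off by default.
    return env.get("WEBGPU_TIMESTAMP_QUERY", "0") not in ("", "0")


def to_durations(kernel_names, resolved_ticks, ns_per_tick=1.0):
    """Pair up [begin0, end0, begin1, end1, ...] with one name per dispatch.

    resolved_ticks stands in for the values read back from the mapped
    buffer. ns_per_tick is a hypothetical knob playing the role of a
    device timestamp period.
    """
    if len(resolved_ticks) != 2 * len(kernel_names):
        raise ValueError("expected one begin/end pair per dispatch")
    durations = []
    for i, name in enumerate(kernel_names):
        begin = resolved_ticks[2 * i]
        end = resolved_ticks[2 * i + 1]
        durations.append(
            ShaderDuration(name, round(begin * ns_per_tick), round(end * ns_per_tick))
        )
    return durations


if __name__ == "__main__":
    # Hypothetical kernel labels and made-up tick values.
    for d in to_durations(["qk", "softmax", "av"], [100, 350, 400, 450, 500, 900]):
        print(d.kernel_name, d.duration_ns)
```

Keeping the pairing logic separate from the readback makes it easy to unit-test without a GPU, which is the only reason the sketch is structured this way.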

… input_pos

Pull Request resolved: #20086

Adds the fused `sdpa_with_kv_cache` op (QK attention-weights, softmax, and attention-output sub-kernels over the KV cache), composing the three enablers below it: the base graph's inter-dispatch buffer passing (scratch buffers plus multi-pass execute), the `update_cache` op, and the SymInt live-scalar mechanism. The QK/softmax/AV kernels mirror the Vulkan reference's flat-index, GQA, and causal-mask math (NCHW, buffer-only, fp32).

`input_pos` is consumed dynamically via the SymInt mechanism: the op reads `symint_buffer()` as a uniform, sizes its scratch buffers and dispatches for the maximum context length, and registers a resize hook. As a result, a single delegate can run an autoregressive decode loop (feeding only the new token and an advancing `input_pos`) instead of using a fixed, baked-in position. This mirrors the Vulkan design, in which a SymInt is a live uniform buffer.

Tests live in the stacked test-suite diff above (clean op diff here).

Authored with assistance from Claude.

ghstack-source-id: 392609088
@exported-using-ghexport

Differential Revision: [D107595125](https://our.internmc.facebook.com/intern/diff/D107595125/)
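As a reading aid, the math that the three sub-kernels compute can be sketched as a plain-Python reference. This is not the WGSL in this diff: the nested-list layout (`[seq][head][dim]`), the function signatures, and the `1/sqrt(head_dim)` scale are assumptions for illustration, and the real kernels work on flat buffer indices.

```python
import math


def update_cache(cache, new_rows, input_pos):
    """Write new_rows ([S][H_kv][D]) into cache ([C][H_kv][D]) at input_pos."""
    for s, row in enumerate(new_rows):
        cache[input_pos + s] = [list(head) for head in row]


def sdpa_with_kv_cache(q, k_cache, v_cache, input_pos):
    """Reference attention over the cache.

    q is [S][H_q][D]; the caches are [C][H_kv][D] and are assumed to already
    hold the keys/values for positions 0 .. input_pos + S - 1.
    """
    seq_len, n_heads, head_dim = len(q), len(q[0]), len(q[0][0])
    n_kv_heads = len(k_cache[0])
    group = n_heads // n_kv_heads        # GQA: query heads per KV head
    scale = 1.0 / math.sqrt(head_dim)    # assumed standard scaling
    out = []
    for s in range(seq_len):
        # Causal mask: token s sees cache positions 0 .. input_pos + s.
        context_len = input_pos + s + 1
        token_out = []
        for h in range(n_heads):
            kv_h = h // group
            # QK: scaled dot product against every visible key.
            scores = [
                scale * sum(a * b for a, b in zip(q[s][h], k_cache[t][kv_h]))
                for t in range(context_len)
            ]
            # Softmax, with the max subtracted for numerical stability.
            m = max(scores)
            exps = [math.exp(x - m) for x in scores]
            denom = sum(exps)
            weights = [e / denom for e in exps]
            # AV: weighted sum of the visible values.
            token_out.append([
                sum(weights[t] * v_cache[t][kv_h][d] for t in range(context_len))
                for d in range(head_dim)
            ])
        out.append(token_out)
    return out
```

A decode loop in this sketch calls `update_cache` for the new token's K and V, then `sdpa_with_kv_cache` with a one-token `q`, and increments `input_pos` by one each step. The caches are allocated once at the maximum context length, which is the same reason the op sizes its scratch buffers for the maximum context length.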

pytorch-bot · 2026-06-13T00:01:05Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20261

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-06-13T00:37:30Z

The committers listed above are authorized under a signed CLA.

✅ login: JulianCloudNTH / name: Julian Ng-Thow-Hing (49c6160, b7d4d31)
✅ login: JulianCloudNTH / name: JulianCloudNTH (1e81b4f)

JulianCloudNTH added 2 commits June 12, 2026 15:35

pytorchbot requested review from kirklandsign and larryliu0820 as code owners June 13, 2026 00:01

meta-cla Bot added the CLA Signed label Jun 13, 2026

Merge branch 'main' into gh/JulianCloudNTH/19/orig

1e81b4f

JulianCloudNTH requested review from JacobSzwejbka, SS-JIA, abhinaykukkadapu, digantdesai, manuelcandales, mergennachin, psiddh, rascani and robert-kalmar as code owners June 13, 2026 00:37

JulianCloudNTH temporarily deployed to cadence June 13, 2026 00:37 — with GitHub Actions Inactive

JulianCloudNTH had a problem deploying to cadence June 13, 2026 00:37 — with GitHub Actions Error

github-actions Bot added the ciflow/trunk and module: arm labels Jun 13, 2026

JulianCloudNTH closed this Jun 13, 2026

JulianCloudNTH reopened this Jun 13, 2026

JulianCloudNTH had a problem deploying to cadence June 13, 2026 00:42 — with GitHub Actions Error

JulianCloudNTH closed this Jun 13, 2026

JulianCloudNTH self-requested a review June 13, 2026 00:58

JulianCloudNTH reopened this Jun 13, 2026

JulianCloudNTH approved these changes Jun 13, 2026

JulianCloudNTH temporarily deployed to cadence June 13, 2026 00:59 — with GitHub Actions Inactive

JulianCloudNTH closed this Jun 13, 2026

JulianCloudNTH temporarily deployed to upload-benchmark-results June 13, 2026 02:06 — with GitHub Actions Inactive

pytorchbot wants to merge 3 commits into gh/JulianCloudNTH/22/orig from gh/JulianCloudNTH/19/orig


2 participants
