Intel QuickAssist: multi-device utilization + software-fallback / Cavium fixes #10772
dgarske wants to merge 7 commits
Pull request overview
This PR improves hardware-accelerated crypto offload behavior for Intel QuickAssist (QAT) and Cavium/Nitrox, focusing on better multi-device utilization, more robust software fallback when QAT isn’t available, and a Nitrox polling safety fix.
Changes:
- Reorders QAT crypto instances to interleave across devices by default (opt-out via `QAT_NO_DEV_INTERLEAVE`) to improve utilization at lower thread counts.
- Fixes software-fallback behavior in the QAT NUMA allocator when the QAT service isn’t started, allowing crypto to proceed in software.
- Fixes a Cavium/Nitrox multi-request polling OOB condition by resetting `req_count` after buffer flush; also corrects an RSA public free heap parameter and expands Intel QAT documentation.
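The interleaving in the first item can be illustrated with a minimal sketch. It is not the PR's `IntelQaInterleaveInstances()` implementation; the `qa_instance_t` type, the `interleave_by_device()` name, and the assumption that device ids run from 0 to `num_devices - 1` are all hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a QAT instance handle plus its owning device. */
typedef struct {
    int device_id;   /* physical device that owns the instance */
    int instance_id; /* position in the original, device-grouped list */
} qa_instance_t;

/* Copy `in` (grouped by device) into `out` so that consecutive entries
 * rotate across devices. Returns 0 on success, -1 on bad arguments or if
 * some device id falls outside [0, num_devices). */
int interleave_by_device(const qa_instance_t* in, qa_instance_t* out,
                         size_t count, int num_devices)
{
    size_t written = 0;
    size_t round;

    if (in == NULL || out == NULL || num_devices <= 0)
        return -1;

    /* Round r takes the r-th instance of each device, in device order. */
    for (round = 0; written < count; round++) {
        size_t before = written;
        int dev;
        for (dev = 0; dev < num_devices; dev++) {
            size_t i, seen = 0;
            for (i = 0; i < count; i++) {
                if (in[i].device_id != dev)
                    continue;
                if (seen++ == round) {
                    out[written++] = in[i];
                    break;
                }
            }
        }
        if (written == before)
            return -1; /* nothing left to place: unexpected device id */
    }
    return 0;
}
```

With six instances grouped as devices 0,0,1,1,2,2, the output order of device ids becomes 0,1,2,0,1,2, so a per-thread round-robin over the first three entries already touches all three devices.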
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| wolfssl/wolfcrypt/port/intel/quickassist_mem.h | Adds an internal “is QAT started” query used by the QAT memory layer to decide when to fall back to regular memory. |
| wolfcrypt/src/port/intel/README.md | Updates Intel QAT usage docs (non-sudo operation, serialized testing guidance, multi-device benchmarking, diagnostics). |
| wolfcrypt/src/port/intel/quickassist.c | Adds IntelQaIsStarted() and instance interleaving logic; fixes RSA public free heap usage. |
| wolfcrypt/src/port/intel/quickassist_mem.c | Adds fallback to regular malloc when NUMA allocation fails and QAT service is not started. |
| wolfcrypt/src/async.c | Resets Cavium req_count after flushing multi-request poll buffer to avoid OOB writes. |
| Makefile.am | Serializes make execution when Intel QAT is enabled via .NOTPARALLEL. |
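The `req_count` reset listed for `wolfcrypt/src/async.c` can be sketched in isolation. This is a simplified model, not the wolfSSL code: `MAX_POLL`, `multi_req_t`, `flush_batch()`, and `queue_request()` are hypothetical names standing in for the Cavium multi-request buffer and its poll loop.

```c
#include <stddef.h>

#define MAX_POLL 4 /* hypothetical stand-in for CAVIUM_MAX_POLL */

/* Simplified model of a fixed-size multi-request poll buffer. */
typedef struct {
    int    req[MAX_POLL]; /* pending request ids */
    size_t req_count;     /* number of valid entries in req[] */
    size_t flushes;       /* how many times the batch was flushed */
} multi_req_t;

/* Flush the batch. Real code would poll the hardware for each entry. */
void flush_batch(multi_req_t* m)
{
    m->flushes++;
    /* The reset: without it the next append would index req[MAX_POLL],
     * one element past the end of the array. */
    m->req_count = 0;
}

/* Append one request and flush as soon as the buffer is full. */
void queue_request(multi_req_t* m, int request_id)
{
    m->req[m->req_count++] = request_id;
    if (m->req_count == MAX_POLL)
        flush_batch(m);
}
```

Queuing ten requests with `MAX_POLL` set to 4 flushes twice and leaves two entries pending, and every write stays inside `req[]`.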
retest this please
Jenkins retest this please - no history
Retest this please Jenkins: Windows ACVP test failed to find file.
Fenrir Automated Review — PR #10772
Scan targets checked: wolfcrypt-bugs, wolfcrypt-port-bugs, wolfcrypt-rs-bugs, wolfcrypt-src, wolfssl-bugs, wolfssl-src
No new issues found in the changed files. ✅
    #ifdef WOLFSSL_ASYNC_CRYPT_SW
    WOLFSSL_API int wc_AsyncSwInit(WC_ASYNC_DEV* dev, int type);
    /* Test hook: force the given WC_ASYNC_SW_TYPE to complete synchronously
     * (do not suspend) so the software simulator can reproduce a specific
     * suspend ordering. Pass ASYNC_SW_NONE to disable. */
    WOLFSSL_API void wolfAsync_SwForceSyncType(int type);
    #endif

    /* Returns nonzero when the QAT crypto service is running. The memory layer
     * uses this to decide whether a failed NUMA allocation should fall back to
     * regular memory (service not started -> software mode) or remain NULL (real
     * NUMA exhaustion while the device is in use). */
    int IntelQaIsStarted(void)
    {
        return (g_cyServiceStarted == CPA_TRUE) ? 1 : 0;
    }
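The decision described in the comment on `IntelQaIsStarted()` can be sketched as follows. This is an assumption-laden model rather than the code in `quickassist_mem.c`: `g_service_started`, `numa_alloc_stub()`, and `alloc_with_fallback()` are hypothetical, and the NUMA allocator is stubbed to always fail so that the fallback path is exercised.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the state that IntelQaIsStarted() reports. */
int g_service_started = 0;

/* Hypothetical NUMA allocator stub: always fails, to model a device that
 * could not be opened or a NUMA pool that is exhausted. */
void* numa_alloc_stub(size_t size)
{
    (void)size;
    return NULL;
}

/* Try NUMA memory first. On failure, use regular heap memory only when the
 * service is not running (software mode). When the service is running, a
 * NULL result is kept so the caller sees the allocation failure. */
void* alloc_with_fallback(size_t size, int* used_fallback)
{
    void* ptr = numa_alloc_stub(size);

    if (used_fallback != NULL)
        *used_fallback = 0;

    if (ptr == NULL && !g_service_started) {
        ptr = malloc(size);
        if (ptr != NULL && used_fallback != NULL)
            *used_fallback = 1;
    }
    return ptr;
}
```

A real allocator would also need to record which path produced each pointer so the matching free routine is used; the `used_fallback` flag here only hints at that bookkeeping.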
Changes
- `cpaCyGetInstances()` returns instances grouped by device, so the per-thread round-robin piled thread counts below the instance count onto device 0. `IntelQaInterleaveInstances()` reorders by device so consecutive threads land on different devices. Default on; opt-out `QAT_NO_DEV_INTERLEAVE`.
- […] (`-142`/`-140`/`-173`) whenever the device couldn't be opened. It now falls back to regular memory so crypto runs in software, gated by `IntelQaIsStarted()` so a live device still gets a clean error on real NUMA exhaustion.
- `req_count` OOB write. `wolfAsync_EventQueuePoll()` did not reset `req_count` after the multi-request flush, indexing past `multi_req.req[CAVIUM_MAX_POLL]`. `HAVE_CAVIUM`-gated. (CWE-787, Project Vanessa.)
- […] `dev` instead of `dev->heap`.
- […] (`port/intel/README.md`): sudo-free operation, serial `make check`, multi-device benchmark guidance, and a QAT health-diagnostics section.

Performance (3x Intel C62x, RSA-2048 sign, ops/sec)
The interleave spreads load across all 3 devices at thread counts below the instance count (18); neutral above that. AES unchanged vs master.