Reduce FLUX int8 test peak memory with sequential offload by jiqing-feng · Pull Request #13776 · huggingface/diffusers

jiqing-feng · 2026-05-21T02:01:49Z

Summary

Update the slow FLUX bitsandbytes int8 tests to use sequential CPU offload instead of model CPU offload.

enable_model_cpu_offload() can move an entire sub-model onto the GPU at once. For black-forest-labs/FLUX.1-dev, this can OOM on <=24 GB cards even when the T5 encoder and transformer are loaded from the pre-quantized int8 test checkpoint. Sequential CPU offload keeps peak memory lower by materializing one layer at a time, which lets the int8 FLUX tests run in more constrained environments.

The LoRA-loading assertion tolerance is also relaxed from 1e-3 to 2e-3 to account for small backend-specific numerical differences observed in the slow int8 path.

Changes

Switch SlowBnb8bitFluxTests setup from enable_model_cpu_offload() to enable_sequential_cpu_offload().
Document why sequential offload is needed for the FLUX int8 slow tests.
Relax the test_lora_loading cosine-distance tolerance to 2e-3.

Validation

Run the affected slow tests:

RUN_SLOW=1 python -m pytest \
  tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitFluxTests::test_quality \
  tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitFluxTests::test_lora_loading \
  -x -s

Observed result:

2 passed

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng · 2026-05-21T02:08:30Z

require change: huggingface/accelerate#4044 merged.

jiqing-feng · 2026-05-29T02:25:36Z

Hi @sayakpaul . Would you please review the PR? Thanks!

sayakpaul · 2026-05-29T06:17:04Z

+        # enable_model_cpu_offload moves an entire sub-model to GPU at once, which OOMs on
+        # <=24 GB cards for FLUX.1-dev even with int8 quantization.
+        # This requires the bitsandbytes fix that preserves Int8Params.SCB across .to() calls.
+        self.pipeline_8bit.enable_sequential_cpu_offload()


Why do we keep making the same kind of changes i.e., if something fails on your particular environment, it's always better to guard them accordingly rather than doing it in a straightforward way like this.

Thanks for the feedback! Updated to guard by device memory instead of unconditionally switching:

_, total_mem = torch.accelerator.get_memory_info(0) if total_mem <= 25 * (1024**3): self.pipeline_8bit.enable_sequential_cpu_offload() else: self.pipeline_8bit.enable_model_cpu_offload()

This keeps the original enable_model_cpu_offload path on large-memory devices and only falls back to sequential offload on ≤24 GB cards. torch.accelerator works across CUDA/XPU/ROCm.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng · 2026-06-02T01:18:32Z

Hi @sayakpaul . I have fixed your comment, please review the new change. Thanks!

jiqing-feng · 2026-06-05T02:44:54Z

Hi @sayakpaul . Would you please review the PR? Thanks!

jiqing-feng added 3 commits May 20, 2026 14:12

fix oom

6d36ba9

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

revert

53e0a7c

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

adjust tol

862eb67

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

github-actions Bot added tests size/S PR with diff < 50 LOC labels May 21, 2026

jiqing-feng changed the title ~~Fix OOM on int8 tests~~ Reduce FLUX int8 test peak memory with sequential offload May 21, 2026

Merge branch 'main' into test_xpu

1ee339d

Merge branch 'main' into test_xpu

622d830

sayakpaul reviewed May 29, 2026

View reviewed changes

jiqing-feng added 3 commits May 29, 2026 14:32

fix memory check

84eda18

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

Merge branch 'main' into test_xpu

5eb61bf

Merge branch 'main' into test_xpu

7e22527

Merge branch 'main' into test_xpu

cd46c81

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce FLUX int8 test peak memory with sequential offload#13776

Reduce FLUX int8 test peak memory with sequential offload#13776
jiqing-feng wants to merge 9 commits into
huggingface:mainfrom
jiqing-feng:test_xpu

jiqing-feng commented May 21, 2026 •

edited

Loading

Uh oh!

jiqing-feng commented May 21, 2026

Uh oh!

jiqing-feng commented May 29, 2026

Uh oh!

sayakpaul May 29, 2026

Uh oh!

jiqing-feng May 29, 2026

Uh oh!

jiqing-feng commented Jun 2, 2026

Uh oh!

jiqing-feng commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jiqing-feng commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Validation

Uh oh!

jiqing-feng commented May 21, 2026

Uh oh!

jiqing-feng commented May 29, 2026

Uh oh!

sayakpaul May 29, 2026

Choose a reason for hiding this comment

Uh oh!

jiqing-feng May 29, 2026

Choose a reason for hiding this comment

Uh oh!

jiqing-feng commented Jun 2, 2026

Uh oh!

jiqing-feng commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jiqing-feng commented May 21, 2026 •

edited

Loading