
Feat: Integrate Gemma3 weight mappings, vLLM adapter, and E2E pipelines#4068

Open
RexBearIU wants to merge 1 commit into main from jackyf/gemma3-lora-e2e-integration



@RexBearIU RexBearIU commented Jun 4, 2026

Collaborator

Description

Integrates Gemma3 with the MaxText-vLLM adapter and end-to-end post-training pipelines, adding full weight conversions, logical partition rules, and verification scripts for post-SFT serving.

Note: This PR depends on PR #4066 (Gemma3/4 decoders fixes) and PR #4067 (LoRA restore refactoring) being merged first.

Key Changes:

  • Gemma3 Weight Mapping (src/maxtext/integration/tunix/weight_mapping/gemma3.py): Added full weight mappings for attention queries, keys,
    values, output projections, and gated MLPs.
  • vLLM Adapter Integration (src/maxtext/integration/vllm/maxtext_vllm_adapter/adapter.py): Integrated context initialization and logical
    partition rules for Gemma3 serving.
  • E2E Integration Script (tests/end_to_end/tpu/gemma3/4b/test_gemma3_lora.sh): Added end-to-end shell test script validating full-pass
    generation correctness, logit matches, and successful decoding post-SFT.
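
To illustrate the general shape of such a weight mapping, here is a minimal sketch. It is not the contents of gemma3.py: every parameter path, target name, and the helper `map_param_name` below is hypothetical and chosen only for illustration. The idea is a table of regex patterns that translates source parameter paths (attention Q/K/V/output projections and the gated MLP) into target names, carrying the layer index across.

```python
import re

# Hypothetical source -> target patterns. The real paths used by the PR may differ.
WEIGHT_MAP = {
    r"decoder/layers_(\d+)/attention/query/kernel": r"model.layers.\1.self_attn.q_proj.weight",
    r"decoder/layers_(\d+)/attention/key/kernel": r"model.layers.\1.self_attn.k_proj.weight",
    r"decoder/layers_(\d+)/attention/value/kernel": r"model.layers.\1.self_attn.v_proj.weight",
    r"decoder/layers_(\d+)/attention/out/kernel": r"model.layers.\1.self_attn.o_proj.weight",
    # Gated MLP: two input projections (gate and up) and one output projection.
    r"decoder/layers_(\d+)/mlp/wi_0/kernel": r"model.layers.\1.mlp.gate_proj.weight",
    r"decoder/layers_(\d+)/mlp/wi_1/kernel": r"model.layers.\1.mlp.up_proj.weight",
    r"decoder/layers_(\d+)/mlp/wo/kernel": r"model.layers.\1.mlp.down_proj.weight",
}


def map_param_name(src_name):
    """Translate one source parameter path into its target name."""
    for pattern, template in WEIGHT_MAP.items():
        match = re.fullmatch(pattern, src_name)
        if match:
            # expand() substitutes the captured layer index for \1.
            return match.expand(template)
    raise KeyError(f"no mapping for parameter: {src_name}")


print(map_param_name("decoder/layers_3/attention/query/kernel"))
```

Raising on an unmapped name, rather than skipping it silently, makes a missing entry visible at load time instead of at decode time.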

Tests

  • Validated full post-SFT weight loading and end-to-end decoding steps on TPU.
  • Executed integration script:
bash tests/end_to_end/tpu/gemma3/4b/test_gemma3_lora.sh 

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@RexBearIU RexBearIU force-pushed the jackyf/gemma3-lora-e2e-integration branch from e16648e to faf1c26 Compare June 4, 2026 14:23

codecov Bot commented Jun 4, 2026


@RexBearIU RexBearIU requested a review from darisoy as a code owner June 4, 2026 15:02
@RexBearIU RexBearIU changed the title feat(lora): Add Gemma3 weight mappings, vLLM adapter serving, and end… Integrate Gemma3 weight mappings, vLLM adapter, and E2E pipelines Jun 8, 2026
@RexBearIU RexBearIU changed the title Integrate Gemma3 weight mappings, vLLM adapter, and E2E pipelines Feat: Integrate Gemma3 weight mappings, vLLM adapter, and E2E pipelines Jun 8, 2026
@RexBearIU RexBearIU force-pushed the jackyf/gemma3-lora-e2e-integration branch from faf1c26 to 50963ee Compare June 8, 2026 07:19
@RexBearIU RexBearIU force-pushed the jackyf/gemma3-lora-e2e-integration branch 2 times, most recently from 27fc440 to eadf862 Compare June 26, 2026 08:14
Comment thread src/maxtext/inference/vllm_decode.py Outdated
rollout_vllm_additional_config = {
    "maxtext_config": {
        "model_name": config.model_name,
        "weight_dtype": "bfloat16",

Collaborator


Is it a good idea to hardcode the weight dtype as bf16? What about using config.weight_dtype?
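
One possible shape for this, as a rough sketch only: the helper name `build_rollout_vllm_additional_config` and the `default_dtype` fallback are hypothetical, and it assumes `config.weight_dtype` is already a string the adapter accepts (it may need normalizing if it is a dtype object).

```python
from types import SimpleNamespace


def build_rollout_vllm_additional_config(config, default_dtype="bfloat16"):
    """Hypothetical helper: take weight_dtype from config, with a fallback default."""
    # Fall back to the default only when the config does not set a dtype.
    weight_dtype = getattr(config, "weight_dtype", None) or default_dtype
    return {
        "maxtext_config": {
            "model_name": config.model_name,
            "weight_dtype": weight_dtype,
        }
    }


# Stand-in config object for illustration; the real config class is not shown here.
cfg = SimpleNamespace(model_name="gemma3-4b", weight_dtype="float32")
print(build_rollout_vllm_additional_config(cfg))
```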

vllm_hf_overrides='{architectures: ["MaxTextForCausalLM"]}' \
hbm_utilization_vllm=0.6 \
prompt="Suggest some famous landmarks in London." \
use_chat_template=True scan_layers=false

Collaborator


Put scan_layers=false on a new line.


# Step 4: Run inference on the checkpoint generated from the previous run
python3 -m maxtext.inference.vllm_decode \
--use_tunix=True \

Collaborator


remove --?

@RexBearIU RexBearIU force-pushed the jackyf/gemma3-lora-e2e-integration branch from eadf862 to f93d241 Compare July 1, 2026 08:43
@RexBearIU RexBearIU requested a review from xibinliu as a code owner July 1, 2026 08:43

2 participants