ci: add test duration tracking workflow by hsuan-lun-chiang · Pull Request #4290 · AI-Hypercomputer/maxtext

hsuan-lun-chiang · 2026-06-29T09:44:22Z

Description

This PR implements a comprehensive automated mechanism to detect and prevent test suite run-time inflation before code is merged into main, addressing the need for a fast, reliable, and responsive CI pipeline.

Changes Made:

Macro-Level Suite Tracking (gh-pages):
- Integrated github-action-benchmark to track macro-level execution times (suite/shard level).
- Added tests/utils/parse_junit_to_benchmark.py to convert JUnit XML test results into the JSON format required by the benchmark action.
- Results are automatically pushed to the gh-pages branch for historical trend visualization.
Micro-Level Individual Test Tracking (Codecov):
- Integrated codecov/test-results-action@v1 to track fine-grained execution times of individual test functions.
- This avoids UI/artifact bloat while giving developers deep visibility into exactly which test function slowed down.
Absolute Hard Limit Enforcement:
- Added tests/utils/enforce_test_limits.py to automatically flag and block any new test that exceeds absolute execution time limits.
- Current limits: 240.0s for unit tests and 300.0s for integration tests, based on existing run times.
Dummy Demonstration Test (This will be deleted before final merge.):
- Added tests/unit/dummy_timeout_test.py (which sleeps for 250s) to demonstrate the CI blocking mechanism.
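
The JUnit-to-benchmark conversion described above could look roughly like the sketch below. The contents of tests/utils/parse_junit_to_benchmark.py are not shown in this PR page, so this is an illustration under assumptions: it targets the `customSmallerIsBetter` input of github-action-benchmark (a JSON array of objects with `name`, `unit`, and `value`), and the entry names are modeled on the ones visible in the performance alert further down. The function names `suite_seconds` and `to_benchmark_entries` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def suite_seconds(junit_xml: str) -> float:
    """Sum the `time` attribute of every <testsuite> element in a JUnit report.

    Assumes suites are not nested; nested suites would be counted twice.
    """
    root = ET.fromstring(junit_xml)
    # Element.iter() includes the root itself, so this handles both a
    # <testsuites> wrapper and a bare <testsuite> root element.
    return sum(float(suite.get("time", 0.0)) for suite in root.iter("testsuite"))


def to_benchmark_entries(reports: dict[str, str], device: str = "CPU") -> list[dict]:
    """Build one benchmark entry per report file plus a grand total.

    `reports` maps a file name to the XML text of that JUnit report.
    """
    entries = []
    total = 0.0
    for filename, xml_text in sorted(reports.items()):
        seconds = suite_seconds(xml_text)
        total += seconds
        entries.append(
            {"name": f"[{device}] File: {filename}", "unit": "sec", "value": seconds}
        )
    entries.append(
        {"name": f"Total {device} Tests Duration", "unit": "sec", "value": total}
    )
    return entries


if __name__ == "__main__":
    # Hypothetical two-test report, only for demonstration.
    sample = (
        '<testsuites><testsuite name="pytest" tests="2" time="3.5">'
        '<testcase classname="a" name="t1" time="1.5"/>'
        '<testcase classname="a" name="t2" time="2.0"/>'
        "</testsuite></testsuites>"
    )
    print(json.dumps(to_benchmark_entries({"test-results-cpu-1.xml": sample}), indent=2))
```

Reporting one value per shard file plus a total keeps the chart at the macro level, which matches the suite/shard granularity this PR describes; per-test timings are left to Codecov.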

Tests

Validated macro-level tracking via github-action-benchmark in CI.
Validated micro-level Codecov upload step.
Tested enforce_test_limits.py locally and in CI. The pipeline successfully catches dummy_timeout_test.py and emits the correct GitHub Actions error annotation.
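
A minimal sketch of how a hard-limit check of this kind might work is shown below. It is not the actual enforce_test_limits.py, whose source is not included here. It assumes per-test durations come from the `time` attribute of JUnit `<testcase>` elements and that failures are reported with the GitHub Actions `::error` workflow command. It checks every test in the report rather than only new ones, and the names `find_slow_tests`, `format_annotation`, `check`, and the test name `test_sleep` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Limits quoted in the PR description, in seconds.
LIMITS_SEC = {"unit": 240.0, "integration": 300.0}


def find_slow_tests(junit_xml: str, limit_sec: float) -> list[tuple[str, float]]:
    """Return (test id, seconds) for every test case slower than the limit."""
    root = ET.fromstring(junit_xml)
    slow = []
    for case in root.iter("testcase"):
        seconds = float(case.get("time", 0.0))
        if seconds > limit_sec:
            test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
            slow.append((test_id, seconds))
    return slow


def format_annotation(test_id: str, seconds: float, limit_sec: float) -> str:
    """Format a GitHub Actions error annotation for one offending test."""
    return (
        "::error title=Test exceeds time limit::"
        f"{test_id} took {seconds:.1f}s (limit {limit_sec:.1f}s)"
    )


def check(junit_xml: str, limit_sec: float) -> int:
    """Print one annotation per slow test and return a process exit code."""
    slow = find_slow_tests(junit_xml, limit_sec)
    for test_id, seconds in slow:
        print(format_annotation(test_id, seconds, limit_sec))
    return 1 if slow else 0
```

In a CI step, the return value of `check(...)` would be passed to `sys.exit` so that a non-zero code fails the job. With the 240.0s unit-test limit, a test that sleeps for 250s would be flagged.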

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

google-cla · 2026-06-29T09:44:34Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

codecov · 2026-07-01T03:08:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

xibinliu · 2026-07-01T15:19:10Z

+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-results-${{ inputs.device_type }}-${{ inputs.worker_group }}


I think we should append ${{ github.run_id }} to the artifact name. This is to avoid mixing files from different runs.

For download, we should download files of the same run_id.

Can we also set an expiration time for these temporary files, such as:

retention-days: 1 # Automatically delete the artifact from the cloud after 24 hours

hsuan-lun-chiang · 2026-07-02

Thanks for the suggestions. I have addressed all of these points:

Added ${{ github.run_id }} to the artifact name in the upload step and updated the download pattern in ci_pipeline.yml to match.

Ensured that temporary files are automatically cleaned up.

xibinliu · 2026-07-01T15:57:48Z

+          comment-on-alert: true
+          fail-on-alert: true
+
+      - name: Upload Updated Baseline to GCS


Can we use gh-pages to store the benchmark data? github-action-benchmark can generate an index.html so we can see an interactive chart of the benchmark history.

hsuan-lun-chiang · 2026-07-02

Yes, we used a GCS bucket as a temporary method to test the core logic. We have now configured the github-action-benchmark step to automatically push the generated index.html to the gh-pages branch (under the dev/bench directory).

For the live github.io URL, someone needs to manually flip the switch in the repo settings to tell GitHub to host the site. Until then, we can download the artifacts to view the interactive chart instead.

github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'MaxText Test Execution Times'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

| Benchmark suite | Current: `fa4f50f` | Previous: `b92a594` | Ratio |
| --- | --- | --- | --- |
| `[CPU] File: test-results-cpu-4.xml` | `2523.7775` sec | `1069.585` sec | `2.36` |
| `[CPU] File: test-results-cpu-3.xml` | `585.0975` sec | `262.184` sec | `2.23` |
| `[CPU] File: test-results-cpu-2.xml` | `366.72249999999997` sec | `167.654` sec | `2.19` |
| `[CPU] File: test-results-cpu-1.xml` | `452.88` sec | `183.82` sec | `2.46` |
| `Total CPU Tests Duration` | `3928.4775` sec | `1683.243` sec | `2.33` |

This comment was automatically generated by workflow using github-action-benchmark.
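
The ratios in the alert can be reproduced with a few lines of arithmetic. The sketch below assumes that the ratio is the current duration divided by the previous one and that an alert fires when it exceeds the threshold of 1.50 quoted in the alert text. The helper name `regressions` is hypothetical; the numbers are copied from the alert.

```python
ALERT_THRESHOLD = 1.50  # "exceeding threshold 1.50" in the alert text

# Durations in seconds, copied from the alert.
CURRENT = {
    "[CPU] File: test-results-cpu-4.xml": 2523.7775,
    "[CPU] File: test-results-cpu-3.xml": 585.0975,
    "[CPU] File: test-results-cpu-2.xml": 366.72249999999997,
    "[CPU] File: test-results-cpu-1.xml": 452.88,
    "Total CPU Tests Duration": 3928.4775,
}
PREVIOUS = {
    "[CPU] File: test-results-cpu-4.xml": 1069.585,
    "[CPU] File: test-results-cpu-3.xml": 262.184,
    "[CPU] File: test-results-cpu-2.xml": 167.654,
    "[CPU] File: test-results-cpu-1.xml": 183.82,
    "Total CPU Tests Duration": 1683.243,
}


def regressions(current, previous, threshold=ALERT_THRESHOLD):
    """Return {name: ratio} for entries whose current/previous ratio exceeds threshold."""
    flagged = {}
    for name, cur in current.items():
        prev = previous.get(name)
        if prev is None or prev <= 0:
            continue  # no usable baseline for this entry
        ratio = cur / prev
        if ratio > threshold:
            flagged[name] = round(ratio, 2)
    return flagged


if __name__ == "__main__":
    for name, ratio in regressions(CURRENT, PREVIOUS).items():
        print(f"{name}: {ratio:.2f}x slower than baseline")
```

All five entries exceed 1.50, so each one is flagged; the total rounds to a ratio of 2.33, matching the alert.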

hsuan-lun-chiang force-pushed the ci/test-duration-tracking branch 2 times, most recently from b8d8e85 to 4473112 Compare July 1, 2026 03:03

xibinliu reviewed Jul 1, 2026

github-actions Bot reviewed Jul 2, 2026

hsuan-lun-chiang force-pushed the ci/test-duration-tracking branch 5 times, most recently from 6707cd8 to b2781a8 Compare July 3, 2026 10:59

ci: add test duration tracking workflow

9fc2de3

hsuan-lun-chiang force-pushed the ci/test-duration-tracking branch from b2781a8 to 9fc2de3 Compare July 3, 2026 11:08

hsuan-lun-chiang wants to merge 1 commit into main from ci/test-duration-tracking

2 participants
