ci: add test duration tracking workflow #4290

Draft
hsuan-lun-chiang wants to merge 1 commit into main from ci/test-duration-tracking
@hsuan-lun-chiang hsuan-lun-chiang commented Jun 29, 2026

Collaborator

Description

This PR adds automated checks that detect and block test-suite run-time inflation before code is merged into main, so that the CI pipeline stays fast and reliable.

Changes Made:

  1. Macro-Level Suite Tracking (gh-pages):

    • Integrated github-action-benchmark to track macro-level execution times (suite/shard level).
    • Added tests/utils/parse_junit_to_benchmark.py to convert JUnit XML test results into the JSON format required by the benchmark action.
    • Results are automatically pushed to the gh-pages branch for historical trend visualization.
  2. Micro-Level Individual Test Tracking (Codecov):

    • Integrated codecov/test-results-action@v1 to track fine-grained execution times of individual test functions.
    • This avoids UI/artifact bloat while giving developers deep visibility into exactly which test function slowed down.
  3. Absolute Hard Limit Enforcement:

    • Added tests/utils/enforce_test_limits.py to automatically flag and block any new test that exceeds absolute execution time limits.
    • Current limits: 240.0s for unit tests and 300.0s for integration tests, based on existing run times.
  4. Dummy Demonstration Test (to be deleted before the final merge):

    • Added tests/unit/dummy_timeout_test.py (which sleeps for 250s) to demonstrate the CI blocking mechanism.
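
To make the conversion step in item 1 concrete, here is a minimal sketch of turning JUnit XML into the JSON array that github-action-benchmark accepts for custom tools (objects with `name`, `unit`, and `value`). The actual tests/utils/parse_junit_to_benchmark.py is not shown in this conversation, so the function names, the sample XML, and the per-file-plus-total layout are assumptions chosen to mirror the entry names that appear in the benchmark alert.

```python
import json
import xml.etree.ElementTree as ET


def suite_duration_seconds(junit_xml: str) -> float:
    """Sum the `time` attribute of every <testcase> in a JUnit XML document."""
    root = ET.fromstring(junit_xml)
    return sum(float(case.get("time", 0.0)) for case in root.iter("testcase"))


def to_benchmark_entries(reports: dict, device: str = "CPU") -> list:
    """Build one benchmark entry per report file, plus a grand total.

    `reports` maps a file name to its XML text (hypothetical input shape).
    """
    entries = []
    total = 0.0
    for file_name, xml_text in sorted(reports.items()):
        seconds = suite_duration_seconds(xml_text)
        total += seconds
        entries.append(
            {"name": f"[{device}] File: {file_name}", "unit": "sec", "value": seconds}
        )
    entries.append(
        {"name": f"Total {device} Tests Duration", "unit": "sec", "value": total}
    )
    return entries


# Hypothetical sample report with two test cases.
SAMPLE = """<testsuites><testsuite name="pytest" tests="2">
  <testcase classname="tests.unit.a_test" name="test_fast" time="0.25"/>
  <testcase classname="tests.unit.a_test" name="test_slow" time="1.5"/>
</testsuite></testsuites>"""

if __name__ == "__main__":
    print(json.dumps(to_benchmark_entries({"test-results-cpu-1.xml": SAMPLE}), indent=2))
```

Summing per file keeps the tracked series small (one point per shard), which is why per-test detail is left to the Codecov upload in item 2.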

Tests

  • Validated macro-level tracking via github-action-benchmark in CI.
  • Validated micro-level Codecov upload step.
  • Tested enforce_test_limits.py locally and in CI. The pipeline successfully catches dummy_timeout_test.py and emits the correct GitHub Actions error annotation.
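
For readers unfamiliar with the blocking mechanism, the following is a rough sketch of how a hard-limit check could work. It is not the contents of tests/utils/enforce_test_limits.py, which is not shown here. The 240.0s and 300.0s limits come from the description; the rule for telling unit from integration tests (an `integration` component in the dotted class name), the test name `test_sleep`, and the sample durations are assumptions. The `::error title=...::message` line is the standard GitHub Actions workflow command for an error annotation.

```python
import xml.etree.ElementTree as ET

UNIT_LIMIT_S = 240.0         # limit for unit tests, from the PR description
INTEGRATION_LIMIT_S = 300.0  # limit for integration tests, from the PR description


def limit_for(classname: str) -> float:
    # Assumption: integration tests live under a package named "integration".
    if "integration" in classname.split("."):
        return INTEGRATION_LIMIT_S
    return UNIT_LIMIT_S


def find_violations(junit_xml: str) -> list:
    """Return (test id, seconds, limit) for every test case over its limit."""
    violations = []
    for case in ET.fromstring(junit_xml).iter("testcase"):
        classname = case.get("classname", "")
        seconds = float(case.get("time", 0.0))
        limit = limit_for(classname)
        if seconds > limit:
            test_id = f"{classname}.{case.get('name', '')}"
            violations.append((test_id, seconds, limit))
    return violations


def main(junit_xml: str) -> int:
    """Print one error annotation per violation; return a process exit code."""
    violations = find_violations(junit_xml)
    for test_id, seconds, limit in violations:
        print(
            f"::error title=Test exceeds time limit::"
            f"{test_id} took {seconds:.1f}s (limit {limit:.1f}s)"
        )
    return 1 if violations else 0


# Hypothetical report: a 250s unit test (over 240s) and a 250s integration test (under 300s).
SAMPLE = """<testsuite name="pytest">
  <testcase classname="tests.unit.dummy_timeout_test" name="test_sleep" time="250.0"/>
  <testcase classname="tests.integration.some_test" name="test_ok" time="250.0"/>
</testsuite>"""

if __name__ == "__main__":
    raise SystemExit(main(SAMPLE))
```

Returning a non-zero exit code is what fails the CI step; the annotation only controls how the failure is surfaced in the GitHub UI.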

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

google-cla Bot commented Jun 29, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

hsuan-lun-chiang force-pushed the ci/test-duration-tracking branch 2 times, most recently from b8d8e85 to 4473112 on July 1, 2026 03:03

codecov Bot commented Jul 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

if: always()
uses: actions/upload-artifact@v4
with:
  name: test-results-${{ inputs.device_type }}-${{ inputs.worker_group }}

@xibinliu xibinliu Jul 1, 2026

Collaborator

I think we should append ${{ github.run_id }} to the artifact name. This is to avoid mixing files from different runs.

For download, we should download files of the same run_id.

Can we also set an expiration time for these temporary files, for example:

retention-days: 1 # Automatically delete the artifact from the cloud after 24 hours

@hsuan-lun-chiang hsuan-lun-chiang Jul 2, 2026

Collaborator Author

Thanks for the suggestions. I have addressed all of these points:

  1. Added ${{ github.run_id }} to the artifact name in the upload step and updated the download pattern in ci_pipeline.yml to match.
  2. Ensured that temporary files are cleaned up automatically.

Comment thread .github/workflows/ci_pipeline.yml Outdated
comment-on-alert: true
fail-on-alert: true

- name: Upload Updated Baseline to GCS

Collaborator

Can we use gh-pages to store the benchmark data? github-action-benchmark can generate an index.html so we can see an interactive chart of the benchmark history.

Collaborator Author

Yes, we used a GCS bucket as a temporary way to test the core logic. We have now configured the github-action-benchmark step to automatically push the generated index.html to the gh-pages branch (under the dev/bench directory).

Collaborator Author

For the live github.io URL, we need to manually flip the switch in the repo settings to tell GitHub to host the site. Until then, we can download the artifacts to view the interactive chart instead.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'MaxText Test Execution Times'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

| Benchmark suite | Current: fa4f50f | Previous: b92a594 | Ratio |
|---|---|---|---|
| [CPU] File: test-results-cpu-4.xml | 2523.7775 sec | 1069.585 sec | 2.36 |
| [CPU] File: test-results-cpu-3.xml | 585.0975 sec | 262.184 sec | 2.23 |
| [CPU] File: test-results-cpu-2.xml | 366.72249999999997 sec | 167.654 sec | 2.19 |
| [CPU] File: test-results-cpu-1.xml | 452.88 sec | 183.82 sec | 2.46 |
| Total CPU Tests Duration | 3928.4775 sec | 1683.243 sec | 2.33 |

This comment was automatically generated by workflow using github-action-benchmark.
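
The ratios in the alert above can be reproduced by dividing each current duration by the previous one and comparing against the 1.50 threshold. The sketch below only illustrates that arithmetic with the reported numbers. It is not github-action-benchmark's own code, and the strict "greater than" comparison and the function name are assumptions.

```python
ALERT_THRESHOLD = 1.50  # threshold quoted in the alert

# (current seconds, previous seconds), copied from the alert.
RESULTS = {
    "[CPU] File: test-results-cpu-4.xml": (2523.7775, 1069.585),
    "[CPU] File: test-results-cpu-3.xml": (585.0975, 262.184),
    "[CPU] File: test-results-cpu-2.xml": (366.72249999999997, 167.654),
    "[CPU] File: test-results-cpu-1.xml": (452.88, 183.82),
    "Total CPU Tests Duration": (3928.4775, 1683.243),
}


def flag_regressions(results: dict, threshold: float) -> dict:
    """Return {name: ratio rounded to 2 places} for entries over the threshold.

    Durations are "smaller is better", so ratio = current / previous.
    """
    flagged = {}
    for name, (current, previous) in results.items():
        ratio = current / previous
        if ratio > threshold:  # assumption: strict comparison
            flagged[name] = round(ratio, 2)
    return flagged


if __name__ == "__main__":
    for name, ratio in flag_regressions(RESULTS, ALERT_THRESHOLD).items():
        print(f"{name}: {ratio:.2f}x slower than the previous run")
```

All five entries exceed the threshold, which matches the alert listing every shard plus the total.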

hsuan-lun-chiang force-pushed the ci/test-duration-tracking branch 5 times, most recently from 6707cd8 to b2781a8 on July 3, 2026 10:59
hsuan-lun-chiang force-pushed the ci/test-duration-tracking branch from b2781a8 to 9fc2de3 on July 3, 2026 11:08