ci: add test duration tracking workflow #4290
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Codecov Report
✅ All modified and coverable lines are covered by tests.
if: always()
uses: actions/upload-artifact@v4
with:
  name: test-results-${{ inputs.device_type }}-${{ inputs.worker_group }}
I think we should append ${{ github.run_id }} to the artifact name. This is to avoid mixing files from different runs.
For download, we should download files of the same run_id.
Can we also add the expiration time of these temporary files, such as:
retention-days: 1 # Automatically delete the artifact from the cloud after 24 hours
Thanks for the suggestions. I have addressed all of these points:
- Added ${{ github.run_id }} to the artifact name in the upload step and updated the download pattern in ci_pipeline.yml to match.
- Ensured temporary files are automatically cleaned up.
    comment-on-alert: true
    fail-on-alert: true

- name: Upload Updated Baseline to GCS
Can we use gh-pages to store the benchmark data? github-action-benchmark can generate an index.html so we can see an interactive chart of the benchmark history.
Yes, we used a GCS bucket as a temporary way to test the core logic. We have now configured the github-action-benchmark step to automatically push the generated index.html to the gh-pages branch (under the dev/bench directory).
For the live github.io URL, we need to manually flip the switch in the repo settings to tell GitHub to host the site. Until then, we can download the artifacts to view the interactive chart instead.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'MaxText Test Execution Times'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.
| Benchmark suite | Current: fa4f50f | Previous: b92a594 | Ratio |
|---|---|---|---|
| [CPU] File: test-results-cpu-4.xml | 2523.7775 sec | 1069.585 sec | 2.36 |
| [CPU] File: test-results-cpu-3.xml | 585.0975 sec | 262.184 sec | 2.23 |
| [CPU] File: test-results-cpu-2.xml | 366.72249999999997 sec | 167.654 sec | 2.19 |
| [CPU] File: test-results-cpu-1.xml | 452.88 sec | 183.82 sec | 2.46 |
| Total CPU Tests Duration | 3928.4775 sec | 1683.243 sec | 2.33 |
This comment was automatically generated by workflow using github-action-benchmark.
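The ratios in the alert table are consistent with dividing the current duration by the previous one and flagging anything above the quoted threshold of 1.50. The following is a minimal sketch of that check, not the action's actual source; the function names are hypothetical and the numbers are copied from the table.

```python
# Sketch of a ratio-based regression check for a smaller-is-better metric.
# Function names are illustrative; only the numbers come from the alert above.

ALERT_THRESHOLD = 1.5  # the "threshold 1.50" quoted in the alert


def regression_ratio(current_sec, previous_sec):
    """Return current / previous; values above 1 mean the run got slower."""
    if previous_sec <= 0:
        raise ValueError("previous duration must be positive")
    return current_sec / previous_sec


def is_regression(current_sec, previous_sec, threshold=ALERT_THRESHOLD):
    """True when the slowdown ratio strictly exceeds the threshold."""
    return regression_ratio(current_sec, previous_sec) > threshold


# (current, previous) durations in seconds, taken from the alert table.
durations = {
    "test-results-cpu-4.xml": (2523.7775, 1069.585),
    "test-results-cpu-3.xml": (585.0975, 262.184),
    "test-results-cpu-2.xml": (366.72249999999997, 167.654),
    "test-results-cpu-1.xml": (452.88, 183.82),
    "Total CPU Tests Duration": (3928.4775, 1683.243),
}

for name, (current, previous) in durations.items():
    ratio = regression_ratio(current, previous)
    print(f"{name}: ratio {ratio:.2f}, alert={is_regression(current, previous)}")
```

Every row in this run exceeds 1.50, which is why the comment lists all four shards as well as the total.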
Description
This PR implements an automated mechanism to detect and prevent test suite run-time inflation before code is merged into main, addressing the need for a fast, reliable, and responsive CI pipeline.
Changes Made:
- Macro-Level Suite Tracking (gh-pages):
  - Uses github-action-benchmark to track macro-level execution times (suite/shard level).
  - Adds tests/utils/parse_junit_to_benchmark.py to convert JUnit XML test results into the JSON format required by the benchmark action.
  - Pushes results to the gh-pages branch for historical trend visualization.
- Micro-Level Individual Test Tracking (Codecov):
  - Uses codecov/test-results-action@v1 to track fine-grained execution times of individual test functions.
- Absolute Hard Limit Enforcement:
  - Adds tests/utils/enforce_test_limits.py to automatically flag and block any new test that exceeds absolute execution time limits.
- Dummy Demonstration Test (this will be deleted before final merge):
  - Adds tests/unit/dummy_timeout_test.py (which sleeps for 250s) to demonstrate the CI blocking mechanism.
Tests
- github-action-benchmark: verified in CI.
- enforce_test_limits.py: run locally and in CI. The pipeline successfully catches dummy_timeout_test.py and emits the correct GitHub Actions error annotation.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.
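The JUnit-to-benchmark conversion described under Changes Made could look roughly like the sketch below. It is not the contents of tests/utils/parse_junit_to_benchmark.py, which is not shown in this conversation. It assumes each report's duration is the sum of the `time` attributes on its non-nested `<testsuite>` elements, and it emits the name/unit/value entries used by github-action-benchmark's custom smaller-is-better input. The entry names mirror those in the alert table; the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def suite_duration_seconds(junit_xml):
    """Sum the `time` attribute of every <testsuite> element in one report.

    Assumes suites are not nested; nested suites would be double counted.
    """
    root = ET.fromstring(junit_xml)
    # root.iter() also yields the root itself, so a bare <testsuite> root
    # and a <testsuites> wrapper are both handled.
    return sum(float(suite.get("time", 0.0)) for suite in root.iter("testsuite"))


def to_benchmark_entries(reports, device="CPU"):
    """Map {filename: xml_text} to a list of name/unit/value entries."""
    entries = []
    for filename, xml_text in sorted(reports.items()):
        entries.append({
            "name": f"[{device}] File: {filename}",
            "unit": "sec",
            "value": suite_duration_seconds(xml_text),
        })
    # A final aggregate entry, mirroring "Total CPU Tests Duration" above.
    total = sum(entry["value"] for entry in entries)
    entries.append({
        "name": f"Total {device} Tests Duration",
        "unit": "sec",
        "value": total,
    })
    return entries


# Made-up report used only to exercise the sketch.
sample_report = (
    '<testsuites><testsuite name="pytest" tests="2" time="3.5">'
    '<testcase classname="a" name="t1" time="1.5"/>'
    '<testcase classname="a" name="t2" time="2.0"/>'
    "</testsuite></testsuites>"
)
print(json.dumps(to_benchmark_entries({"test-results-cpu-1.xml": sample_report}), indent=2))
```

In a workflow, the resulting JSON would be written to a file and passed to the benchmark step; the exact wiring in this PR may differ.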
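The hard-limit enforcement can be illustrated in a similar way. This sketch is not the PR's tests/utils/enforce_test_limits.py. The 200-second limit is a hypothetical value (the description only says the dummy test sleeps for 250s), the test names in the sample are made up, and for simplicity it checks every test case rather than only newly added ones. It reads per-test `time` values from a JUnit report and prints a GitHub Actions `::error` workflow command for each offender.

```python
import xml.etree.ElementTree as ET

MAX_TEST_SECONDS = 200.0  # hypothetical limit; the real value is not stated in the PR


def find_slow_tests(junit_xml, limit=MAX_TEST_SECONDS):
    """Return (test_id, seconds) for every <testcase> slower than `limit`."""
    root = ET.fromstring(junit_xml)
    slow = []
    for case in root.iter("testcase"):
        seconds = float(case.get("time", 0.0))
        if seconds > limit:
            test_id = f'{case.get("classname", "")}::{case.get("name", "")}'
            slow.append((test_id, seconds))
    return slow


def format_annotation(test_id, seconds, limit):
    """Build a GitHub Actions error annotation (the `::error` workflow command)."""
    return (
        f"::error title=Test exceeds time limit::"
        f"{test_id} took {seconds:.1f}s (limit {limit:.0f}s)"
    )


def check_report(junit_xml, limit=MAX_TEST_SECONDS):
    """Print one annotation per slow test; return a process-style exit code."""
    slow = find_slow_tests(junit_xml, limit)
    for test_id, seconds in slow:
        print(format_annotation(test_id, seconds, limit))
    return 1 if slow else 0


# Made-up report: one fast test and one that mimics the 250s dummy test.
sample_report = (
    '<testsuite name="pytest" tests="2" time="251.0">'
    '<testcase classname="tests.unit.fast_test" name="test_ok" time="1.0"/>'
    '<testcase classname="tests.unit.dummy_timeout_test" name="test_sleep" time="250.0"/>'
    "</testsuite>"
)
exit_code = check_report(sample_report)
```

A CI step would pass the returned code to `sys.exit` so that a non-zero value fails the job and blocks the merge.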