feat(EC-1816): add multi-component stress benchmark #3331
📝 Walkthrough
Three new files are added under
Changes: Stress Benchmark Infrastructure
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		panic(fmt.Sprintf("invalid %s value %q: %v", name, v, err))
[low] edge-case
envInt does not validate that the returned integer is positive. Setting EC_STRESS_COMPONENTS=0 produces a snapshot with zero components, and EC_STRESS_WORKERS=0 or a negative value is passed directly to --workers. These degenerate inputs silently produce meaningless benchmark results rather than failing fast.
Suggested fix: Add a lower-bound check after parsing: if n < 1 { panic(fmt.Sprintf("%s must be >= 1, got %d", name, n)) }
already addressed.
Add a stress benchmark under benchmark/stress/ that validates a multi-component snapshot with configurable worker count, simulating real-world release pipeline workloads that caused OOM (EC-1805).

- Component count controlled via EC_STRESS_COMPONENTS (default 10)
- Worker count controlled via EC_STRESS_WORKERS (default 35)
- Uses the same golden-container image as the simple benchmark, duplicated across components at runtime
- Reuses the existing benchmark/internal/suite harness
- Includes prepare_data.sh to regenerate offline data archive
- Automatically supported by make benchmark_stress via Makefile wildcard rules

Resolves: EC-1816
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🤖 Finished Review · ✅ Success · Started 12:58 PM UTC · Completed 1:06 PM UTC
	}

	closer, err := registry.Launch(path.Join(dir, "data/registry/data"))
	if err != nil {
[low] resource-leak
In setup(), if registry.Launch() fails, the temporary directory created by untar.UnTar() is never cleaned up. The panic exits without removing the temp dir. This matches the existing pattern in benchmark/simple/simple.go and the OS reclaims the directory on process exit, so practical risk is minimal.
Suggested fix: Call os.RemoveAll(dir) before panicking on registry.Launch failure, or defer cleanup unconditionally.
Since this code matches the existing setup in simple.go, and the OS reclaims the directory on process exit, let's not apply this fix.
func envInt(name string, fallback int) int {
	v, ok := os.LookupEnv(name)
	if !ok {
		return fallback
[low] missing-input-validation
envInt accepts zero and negative values for EC_STRESS_COMPONENTS and EC_STRESS_WORKERS. Setting these to zero or negative values could produce confusing benchmark results.
Suggested fix: Add a check that the returned value is at least 1.
good point, addressed.
Pull pre-built data.tar.gz from quay.io/conforma/benchmark-data in prepare_data.sh, falling back to upstream regeneration. Add push_data.sh for uploading the archive.

Resolves: EC-1816
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🤖 Review · Started 12:25 PM UTC
Reject zero and negative values for EC_STRESS_COMPONENTS and EC_STRESS_WORKERS to fail fast instead of producing meaningless benchmark results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🤖 Finished Review · ✅ Success · Started 12:34 PM UTC · Completed 12:44 PM UTC
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benchmark/stress/prepare_data.sh`:
- Around line 30-36: The oras pull command suppresses error output with
2>/dev/null and the script always falls back to regenerating from upstream on
failure, making CI runs non-reproducible and hiding infrastructure issues.
Remove the error suppression (2>/dev/null) from the oras pull command on line 30
and restructure the logic so that if the oras pull fails, the script exits with
an error rather than continuing to the regeneration fallback. This ensures
benchmark input remains deterministic and surfaces any Quay or authentication
failures instead of silently working around them.
In `@benchmark/stress/stress.go`:
- Around line 26-38: The imports in the stress.go file are not properly ordered
according to the gci formatting standards. Run the project's Go import
formatting tool (typically gci write or go fmt) on the stress.go file to
automatically reorder the imports into the correct grouping: standard library
imports first, followed by blank line, then third-party imports (like
golang.org/x/benchmarks), followed by blank line, then local package imports
(like github.com/conforma/cli). This will resolve the gci formatting check
failure.
📒 Files selected for processing (3)
benchmark/stress/prepare_data.sh
benchmark/stress/push_data.sh
benchmark/stress/stress.go
if command -v oras &>/dev/null && oras pull "${quay_ref}" -o . 2>/dev/null; then
    echo "Downloaded data.tar.gz from ${quay_ref}"
    exit 0
fi

echo "Quay pull failed or oras not available, regenerating from upstream..."
Fail closed on Quay pull failures to keep benchmark input deterministic
Line 30 suppresses pull errors and Line 35 always regenerates from upstream on any failure. That makes CI runs non-reproducible and can hide artifact-hosting/auth outages behind a “successful” local rebuild.
Suggested change
-if command -v oras &>/dev/null && oras pull "${quay_ref}" -o . 2>/dev/null; then
+if command -v oras &>/dev/null && oras pull "${quay_ref}" -o .; then
echo "Downloaded data.tar.gz from ${quay_ref}"
exit 0
fi
-echo "Quay pull failed or oras not available, regenerating from upstream..."
+if [[ "${EC_STRESS_ALLOW_REGEN:-0}" != "1" ]]; then
+ echo "Failed to pull ${quay_ref}. Set EC_STRESS_ALLOW_REGEN=1 to regenerate from upstream." >&2
+ exit 1
+fi
+echo "Quay pull failed or oras not available, regenerating from upstream (EC_STRESS_ALLOW_REGEN=1)..."
import (
	"encoding/json"
	"fmt"
	"os"
	"path"
	"strconv"

	"golang.org/x/benchmarks/driver"

	"github.com/conforma/cli/benchmark/internal/registry"
	"github.com/conforma/cli/benchmark/internal/suite"
	"github.com/conforma/cli/benchmark/internal/untar"
)
stress.go is currently failing gci formatting checks
Static analysis reports a gci formatting error (reported at Line 82). Please run the repo’s Go formatting/import-order step for this file to unblock lint/CI.
Source: Linters/SAST tools
}

type snapshot struct {
	Components []component `json:"components"`
[medium] stale-reference
The git source URL for golden-container uses the old organization name enterprise-contract (https://github.com/enterprise-contract/golden-container.git) while the rest of the codebase has migrated to https://github.com/conforma/golden-container. The git revision also differs from the simple benchmark, suggesting it may reference a commit in the old repo.
Suggested fix: Use https://github.com/conforma/golden-container to match the existing simple benchmark pattern, and verify the revision hash exists in the conforma fork.
const (
	defaultComponents = 10
	defaultWorkers    = 35
[low] edge-case
The envInt function panics on values < 1 but does not guard against unreasonably large values. An extremely large EC_STRESS_COMPONENTS value would cause buildSnapshot to allocate a massive slice, likely causing OOM.
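One way to address this, sketched under assumptions: a small range check applied to the parsed value before buildSnapshot allocates anything. The `checkRange` helper and the `maxComponents` ceiling of 1000 are hypothetical; neither appears in the PR, and a sensible ceiling depends on the CI host.

```go
package main

import "fmt"

// maxComponents is a hypothetical ceiling, not a value from the PR.
const maxComponents = 1000

// checkRange returns an error when n falls outside the inclusive range
// [lo, hi], and nil otherwise.
func checkRange(name string, n, lo, hi int) error {
	if n < lo || n > hi {
		return fmt.Errorf("%s must be between %d and %d, got %d", name, lo, hi, n)
	}
	return nil
}

func main() {
	for _, n := range []int{10, 1000000} {
		if err := checkRange("EC_STRESS_COMPONENTS", n, 1, maxComponents); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", n)
	}
}
```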
	if err != nil {
		panic(err)
	}
	return string(data)
[info] pattern-violation
The policy JSON string uses well-formed JSON (no trailing commas) while the simple benchmark has trailing commas. The stress benchmark is more correct, but the inconsistency is notable.
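To illustrate why the distinction matters, here is a small sketch showing that Go's encoding/json treats a trailing comma as invalid. The policy strings are made-up examples, not the ones from either benchmark, and whichever parser consumes the simple benchmark's string is not shown in this review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isStrictJSON reports whether s is well-formed JSON according to
// encoding/json, which rejects trailing commas.
func isStrictJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	wellFormed := `{"sources": [{"policy": ["example"]}]}`
	trailingComma := `{"sources": [{"policy": ["example"],}]}`

	fmt.Println("well-formed valid:", isStrictJSON(wellFormed))
	fmt.Println("trailing comma valid:", isStrictJSON(trailingComma))
}
```

Building the string with json.Marshal, as the stress benchmark excerpt does, avoids the question entirely because the encoder never emits trailing commas.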
- Adds a stress benchmark under benchmark/stress/ that validates a multi-component snapshot with 35 workers, simulating the workload that caused the OOM incident (EC-1805)
- Component count (EC_STRESS_COMPONENTS, default 10) and worker count (EC_STRESS_WORKERS, default 35) are parameterized via env vars for CI tuning
- Reuses the existing benchmark harness (benchmark/internal/suite, registry, untar) and the same golden-container image data, duplicated across components at runtime

Resolves: EC-1816