USHIFT-6808: C2CC Disruption Tests (Restarting Services, toggling ethernet)#6930
USHIFT-6808: C2CC Disruption Tests (Restarting Services, toggling ethernet)#6930pmtk wants to merge 10 commits into
Conversation
|
@pmtk: This pull request references USHIFT-6808 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThe PR adds disruptive C2CC test coverage, shared Robot helpers for cluster validation and recovery, shell runner updates for test filtering, and new bootc scenario scripts that execute the disruptive suite. ChangesC2CC disruptive test flow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 13 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (13 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/bin/c2cc_common.sh`:
- Around line 186-193: The `setup_hosts` logic is handling `get_host_ip`
failures with early returns, but `full_vm_name` for `host2_vm` and `host3_vm` is
not checked the same way. Update the assignments around `full_vm_name`,
`host2_vm`, and `host3_vm` so a failure aborts immediately instead of
propagating empty `HOST2_VM_NAME`/`HOST3_VM_NAME` values into the NIC chaos
keywords. Keep the error handling consistent with the existing `get_host_ip`
pattern by returning from the function on lookup failure before the `readonly`
exports.
In `@test/resources/c2cc.resource`:
- Around line 562-563: The deregistration cleanup currently always calls
SSHLibrary.Close Connection even when SSHLibrary.Switch Connection fails, which
can close the wrong active SSH session on повторное deregistration. Update the
teardown logic in the affected resource so Close Connection only runs when the
switch to ${alias} succeeds, using the existing SSHLibrary.Switch Connection and
SSHLibrary.Close Connection keywords together with a conditional or status check
to preserve the current session when the alias is missing.
- Around line 541-549: In Disable All NICs For VM, add a fail-fast guard after
Get Vnet Devices For MicroShift Host returns ${vnet_ifaces} so the test errors
immediately when no vnet devices are found instead of silently doing nothing.
Check for an empty list before the FOR loop, and raise a clear failure that
includes ${vm_name} and the lack of discovered NICs so the issue is obvious in
the test output.
- Around line 483-488: The precondition gate in Ensure All Clusters Healthy only
verifies RemoteCluster CR status and can let a degraded local MicroShift/OVN-K
environment pass. Update this keyword to run the actual cluster healthcheck
before fault injection, alongside or instead of Verify RemoteCluster State, so
the setup blocks unless the real cluster health is healthy for each alias
(cluster-a, cluster-b, cluster-c).
- Around line 170-184: The IP rule checks in Verify Service IP Rules and Verify
All IP Rules only validate the rule targets, so they can miss wrong ordering.
Update these Robot Framework keywords to also assert the expected priority value
for each rule returned by Command On Cluster, using the existing `${stdout}`
checks with Should Contain against the full rule strings so table 200/201
entries are verified at the correct priority.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 9ab20d10-7e3c-468d-a472-1cea8e83ce11
📒 Files selected for processing (6)
test/bin/c2cc_common.shtest/resources/c2cc.resourcetest/scenarios-bootc/c2cc/el98-src@c2cc-chaos.shtest/suites/c2cc/chaos.robottest/suites/c2cc/cleanup.robottest/suites/c2cc/probe.robot
💤 Files with no reviewable changes (2)
- test/suites/c2cc/cleanup.robot
- test/suites/c2cc/probe.robot
- c2cc_common.sh: add || return 1 to full_vm_name calls for consistency - c2cc.resource: fail fast when no vnet interfaces found for VM - c2cc.resource: only close SSH connection after successful switch Co-Authored-By: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- c2cc_common.sh: add || return 1 to full_vm_name calls for consistency - c2cc.resource: fail fast when no vnet interfaces found for VM - c2cc.resource: only close SSH connection after successful switch
30371f0 to
0d99900
Compare
|
/test ? |
|
/test e2e-aws-tests-bootc-c2cc e2e-aws-tests-bootc-c2cc-arm |
MicroShift, NetworkManager, OVN-K restarts. NIC disabling.
- c2cc_common.sh: add || return 1 to full_vm_name calls for consistency - c2cc.resource: fail fast when no vnet interfaces found for VM - c2cc.resource: only close SSH connection after successful switch
0d99900 to
bcfed52
Compare
|
/test e2e-aws-tests-bootc-c2cc e2e-aws-tests-bootc-c2cc-arm |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/suites/c2cc/chaos.robot`:
- Line 55: The cluster command in the chaos test is too disruptive for Command
On Cluster because restarting NetworkManager can drop the SSH session before the
call returns. Update the test in chaos.robot to use Disruptive Command On
Cluster for the systemctl restart NetworkManager step, using the existing
cluster-c command block so the session loss is handled safely.
- Around line 64-76: The disabled-state flag is set too late in the chaos test
flow, so a failure inside Disable All NICs For VM can leave NICs down while
Restore NICs And Reconnect sees no disabled VM to clean up. In the c2cc chaos
scenario, set ${DISABLED_VM} before calling Disable All NICs For VM and keep the
existing reset after recovery; use the existing keywords Disable All NICs For
VM, Restore NICs And Reconnect, and ${DISABLED_VM} to locate the change.
- Around line 149-157: The setup flow is missing validation for VM name
environment variables, which lets NIC-outage tests fail later with unclear
errors. Update Check Required Env Variables to also require HOST2_VM_NAME and
HOST3_VM_NAME alongside the existing host/IP/port/kubeconfig checks, and ensure
the Setup sequence still calls that validation before Register Remote Cluster so
failures surface early with a clear message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 8b370a02-f8f1-4408-b570-8eb1581bfc0e
📒 Files selected for processing (5)
test/bin/c2cc_common.shtest/resources/c2cc.resourcetest/scenarios-bootc/c2cc/el98-src@c2cc-chaos.shtest/suites/c2cc/chaos.robottest/suites/c2cc/probe.robot
💤 Files with no reviewable changes (1)
- test/suites/c2cc/probe.robot
🚧 Files skipped from review as they are similar to previous changes (2)
- test/bin/c2cc_common.sh
- test/resources/c2cc.resource
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/suites/c2cc/chaos.robot`:
- Around line 58-60: Persist the disabled NIC list before calling Disable All
NICs For VM in the NIC-outage test flow so teardown always has recovery data.
Update the chaos.robot test cases that use Restore NICs And Reconnect and the
@{DISABLED_IFACES} variable so the interface list is assigned to test scope
before the disruptive keyword runs, ensuring it remains available even if
Disable All NICs For VM fails partway through.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 6125ba91-d4ee-44ad-ba23-e69b06d6e318
📒 Files selected for processing (3)
test/resources/c2cc.resourcetest/scenarios-bootc/c2cc/el102-src@c2cc-chaos.shtest/suites/c2cc/chaos.robot
💤 Files with no reviewable changes (1)
- test/resources/c2cc.resource
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/suites/c2cc/disruptive.robot (1)
58-60: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftNIC recovery state is still not failure-safe
If
Disable All NICs For VMfails before returning (Line 58 / Line 96),${DISABLED_VM}and@{DISABLED_IFACES}are never populated, so teardown (Line 145-148) skips or cannot restore NICs. Please move recovery-state capture into a failure-safe path (ideally inside the keyword with try/finally-style handling) so teardown can always re-enable interfaces after partial disruption.Based on learnings: in this C2CC NIC-outage flow, setting only
${DISABLED_VM}earlier is insufficient; teardown must reliably receive disabled interface state even on failure paths.Also applies to: 96-99, 145-148
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/suites/c2cc/disruptive.robot` around lines 58 - 60, The NIC recovery state setup is not failure-safe because `${DISABLED_VM}` and `@{DISABLED_IFACES}` are only populated after `Disable All NICs For VM` returns, so failures leave teardown without the information it needs. Move the disabled-VM/interface capture into a failure-safe path inside `Disable All NICs For VM` (or wrap it with try/finally-style handling in the disruptive flow) so the disabled interface state is always preserved and available to the teardown/recovery logic in `disruptive.robot`.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/suites/c2cc/disruptive.robot`:
- Around line 58-60: The NIC recovery state setup is not failure-safe because
`${DISABLED_VM}` and `@{DISABLED_IFACES}` are only populated after `Disable All
NICs For VM` returns, so failures leave teardown without the information it
needs. Move the disabled-VM/interface capture into a failure-safe path inside
`Disable All NICs For VM` (or wrap it with try/finally-style handling in the
disruptive flow) so the disabled interface state is always preserved and
available to the teardown/recovery logic in `disruptive.robot`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4677d390-5c12-4593-b462-d134f95c615c
📒 Files selected for processing (3)
test/scenarios-bootc/c2cc/el102-src@c2cc-disruptive.shtest/scenarios-bootc/c2cc/el98-src@c2cc-disruptive.shtest/suites/c2cc/disruptive.robot
|
/lgtm |
|
/verified by CI |
|
@agullon: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: agullon, pmtk The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary by CodeRabbit
New Features
Bug Fixes