Skip to content

Update in force.ha value configuration value obtention#13391

Open
erikbocks wants to merge 2 commits into
apache:4.20from
scclouds:update-force-ha-value-obtention
Open

Update in force.ha value configuration value obtention#13391
erikbocks wants to merge 2 commits into
apache:4.20from
scclouds:update-force-ha-value-obtention

Conversation

@erikbocks

Copy link
Copy Markdown
Collaborator

Description

Currently, workflows that use the force.ha configuration value only consider the global value of the configuration, even though it has a Cluster scope. To fix this, changes were made to consider the configuration's cluster scope value. If it is not configured at cluster level, the global value is used.

Sometimes, the cluster ID was not available at the method for the configuration value obtention, and the host could be null. Thus, the findClusterAndHostIdForVM was used, as it searches for the VM's host/last host and the returned host's cluster. If they (host and cluster) are both null, the cluster ID is obtained from the storage where the VM volume is allocated is returned.

This PR also removes the host's cluster cleanup that was executed during host's removal. It was observed that there was no impact in removing the host' cluster ID, and processes that could be affected by it already contained validations to prevent errors.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

First, I validated that without the changes, VMs without HA enabled in their offerings were not restarted, even though the force.ha configuration was enabled at cluster scope. After installing the packages with the new changes, I made the following tests. All the tests were executed one time with the configuration enabled globally, and one time with it enabled at cluster scope:

Test Result
Kill the VM's process with kill -9 <pid> The VM was identified as shutdown, and restarted automatically.
Hard power off the VM's host The host was identified as Disconnected, and the VM was restarted in another environment node.
Host's forced removal ACS identified that that were VMs running in the host, and restarted them automatically in another node.

Regarding the cluster ID cleanup removal, the following tests were conducted to check whether keeping it would cause inconsistencies:

Test Result
Host removal The host was removed, and its cluster ID was kept.
Host removal with running VMs The host was removed, the cluster ID was kept, and the VMs were restarted on another node.
Host reintroduction The host was reintroduced with success.
Cluster and host creation No issue was found during the creation of both resources. No issue was found during the removal of both resources.
Removing host from Cluster 1, deleting Cluster 1, and adding host to Cluster 2 The host was added with success.

Before the cluster ID cleanup removal, during the host's forced removal, HA restart jobs are created for the workers (ha.workers configuration) to process. However, if there aren't enough workers available to process the jobs and the force.ha configuration is enabled only at cluster scope, it is possible that the host's removal flow finishes before processing all HA jobs, leading to inconsistent VMs.

In order to validate if this was fixed after the cleanup removal, in a environment with 2 hosts, I provisioned 35 VMs in one host and set the ha.workers amount to 1. Then, I forcefully removed the host where the VMs were provisioned, and validated that all the VMs were stopped, and restarted on the other host.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.26%. Comparing base (a3970bb) to head (7463667).

Files with missing lines Patch % Lines
...n/java/com/cloud/vm/VirtualMachineManagerImpl.java 0.00% 3 Missing ⚠️
...java/com/cloud/ha/HighAvailabilityManagerImpl.java 0.00% 3 Missing ⚠️
...n/java/com/cloud/resource/ResourceManagerImpl.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #13391   +/-   ##
=========================================
  Coverage     16.26%   16.26%           
- Complexity    13434    13435    +1     
=========================================
  Files          5666     5666           
  Lines        500645   500646    +1     
  Branches      60801    60801           
=========================================
+ Hits          81426    81428    +2     
+ Misses       410112   410109    -3     
- Partials       9107     9109    +2     
Flag Coverage Δ
uitests 4.14% <ø> (ø)
unittests 17.12% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@winterhazel

Copy link
Copy Markdown
Member

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@winterhazel winterhazel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, did not test.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18219

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants