
[4.22.1.0-shapeblue1] KVM HA fixes #13373 and #13377 #138

Open
harikrishna-patnala wants to merge 5 commits into 4.22.1.0-shapeblue1 from ha-checkonhostanswer-fix-4.22.1

harikrishna-patnala (Member) commented Jun 17, 2026

Description

This PR duplicates upstream PRs apache#13373 and apache#13377 to apply the same fixes to the ShapeBlue custom patch release 4.22.1.0-shapeblue1.

cc @sureshanaparti

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

harikrishna-patnala (Member, Author):

@blueorangutan package

harikrishna-patnala changed the title from "[4.22.1.0-shapeblue1] KVM HA: Fix CheckOnHostAnswer success flag when there is no heartbeat" on Jun 19, 2026
sureshanaparti and others added 5 commits June 19, 2026 09:54
KVMHAProvider.fence() declared a host fenced only when the out-of-band power-off
command reported success. Against an already-off chassis the BMC rejects the
power-off (e.g. Redfish returns HTTP 409), so fence() failed and the host stayed
stuck in the Fencing HA state, which maps to Disconnected (not Down). VM-HA
therefore never restarted the VMs until the dead host was powered back on.

Fencing now succeeds based on the actual chassis power state:
 - if the host is already powered off (OOBM STATUS == Off), treat it as fenced;
 - otherwise issue a best-effort power-off and confirm via OOBM STATUS;
 - only a confirmed Off state counts as success; if the state cannot be confirmed
   (e.g. unreachable BMC) the fence fails and is retried, to avoid split-brain.
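The fencing rule above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual `KVMHAProvider.fence()` code: `OobmClient`, `PowerState` and `FakeOobm` are stand-ins invented here for the out-of-band management driver, and the fake BMC models the HTTP 409 rejection of a power-off against an already-off chassis with an exception.

```java
/**
 * Hypothetical sketch of the fence decision described in the commit message.
 * OobmClient, PowerState and FakeOobm are illustrative stand-ins, not CloudStack APIs.
 */
public class FenceSketch {

    /** Chassis power state as reported by an OOBM STATUS query (assumed). */
    enum PowerState { ON, OFF, UNKNOWN }

    /** Hypothetical stand-in for an out-of-band management driver. */
    interface OobmClient {
        /** Returns UNKNOWN when the BMC cannot be reached. */
        PowerState status();

        /** Best-effort power-off; may throw, e.g. when the BMC rejects it. */
        void powerOff();
    }

    /** Returns true only when the chassis is confirmed to be Off. */
    static boolean fence(OobmClient oobm) {
        // Already powered off: treat the host as fenced without sending a power-off.
        if (oobm.status() == PowerState.OFF) {
            return true;
        }
        // Best-effort power-off; its outcome is not trusted on its own.
        try {
            oobm.powerOff();
        } catch (RuntimeException e) {
            // Ignored here: the STATUS check below decides success.
        }
        // Only a confirmed Off counts; ON or UNKNOWN fails so the caller retries.
        return oobm.status() == PowerState.OFF;
    }

    /** In-memory fake BMC used for the demo (assumption, not CloudStack code). */
    static final class FakeOobm implements OobmClient {
        private PowerState state;
        private final boolean reachable;
        private final boolean powerOffWorks;

        FakeOobm(PowerState initial, boolean reachable, boolean powerOffWorks) {
            this.state = initial;
            this.reachable = reachable;
            this.powerOffWorks = powerOffWorks;
        }

        @Override
        public PowerState status() {
            return reachable ? state : PowerState.UNKNOWN;
        }

        @Override
        public void powerOff() {
            if (!reachable) {
                throw new IllegalStateException("BMC unreachable");
            }
            if (state == PowerState.OFF) {
                // Models a Redfish-style HTTP 409 against an already-off chassis.
                throw new IllegalStateException("HTTP 409: already off");
            }
            if (powerOffWorks) {
                state = PowerState.OFF;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fence(new FakeOobm(PowerState.OFF, true, true)));  // true: already off
        System.out.println(fence(new FakeOobm(PowerState.ON, true, true)));   // true: confirmed Off
        System.out.println(fence(new FakeOobm(PowerState.ON, false, true)));  // false: cannot confirm
    }
}
```

The design point is that the result of the power-off call is never trusted on its own: success is decided solely by a follow-up STATUS query, so an unreachable BMC yields a failed fence (and a retry) rather than a false positive that could cause split-brain.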

Also map Redfish PowerOperation.OFF to ForceOff (hard power-off) instead of
GracefulShutdown, consistent with the ipmitool driver and appropriate for fencing
an unresponsive host (SOFT remains the graceful ACPI shutdown).
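A minimal sketch of the Redfish mapping described above. The local `PowerOperation` enum only mirrors the two operations the commit names (`OFF` and `SOFT`); it is an assumption for illustration, not CloudStack's own enum, and other operations are deliberately left unmapped. `ForceOff` and `GracefulShutdown` are standard Redfish `ResetType` values for the `ComputerSystem.Reset` action; the `resetBody` helper is hypothetical.

```java
/**
 * Hypothetical sketch of the OOBM-to-Redfish reset mapping described in the commit.
 * PowerOperation here is a local stand-in, not CloudStack's enum.
 */
public class RedfishResetTypeSketch {

    /** Only the two operations named in the commit message. */
    enum PowerOperation { OFF, SOFT }

    /** Maps an operation to a Redfish ComputerSystem.Reset ResetType value. */
    static String toResetType(PowerOperation op) {
        switch (op) {
            case OFF:
                return "ForceOff";          // hard power-off, as used for fencing
            case SOFT:
                return "GracefulShutdown";  // graceful ACPI shutdown
            default:
                throw new IllegalArgumentException("Unmapped operation: " + op);
        }
    }

    /** Hypothetical helper building the JSON body of a Reset action request. */
    static String resetBody(PowerOperation op) {
        return "{\"ResetType\": \"" + toResetType(op) + "\"}";
    }

    public static void main(String[] args) {
        for (PowerOperation op : PowerOperation.values()) {
            System.out.println(op + " -> " + resetBody(op));
        }
    }
}
```

Using `ForceOff` for `OFF` matches the intent of fencing: a host that has stopped responding may never complete an ACPI-driven graceful shutdown.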

Fixes apache#13376
harikrishna-patnala force-pushed the ha-checkonhostanswer-fix-4.22.1 branch from 5025f5d to 1613b41 on June 19, 2026 04:24
kiranchavala (Member) commented Jun 19, 2026

@weizhouapache @NuxRo @rajujith @sureshanaparti @harikrishna-patnala @andrijapanicsb

What should the expected VM HA behaviour be when a KVM host is soft powered off?

Steps to reproduce the issue

  1. Create an HA-enabled offering
  2. Deploy a VM with the HA-enabled offering on KVM host 1
  3. Log in to KVM host 1
  4. Issue the shutdown command
  5. VM HA doesn't get triggered
  6. Global setting value: commands.timeout = CheckHealthCommand=5,CheckOnHostCommand=5
[root@ref-trl-11991-k-Mol8-kiran-chavala-kvm1 ~]# virsh list
 Id   Name       State
--------------------------
 1    i-2-6-VM   running

[root@ref-trl-11991-k-Mol8-kiran-chavala-kvm1 ~]# shutdown now
Connection to 10.0.32.193 closed by remote host.
Connection to 10.0.32.193 closed.
  1. The host goes into the Disconnected state
  2. No entry is created in the op_ha_work table (select * from op_ha_work)
  3. grep "status reported from itself" /var/log/cloudstack/management/management-server.log

VM HA is triggered only if the KVM host is hard powered off.
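For reference, the `commands.timeout` value in step 6 is a comma-separated list of `CommandName=timeout` pairs, so both `CheckHealthCommand` and `CheckOnHostCommand` get a timeout of 5 (taken here to be seconds). The parser below is a hypothetical sketch of that format only; `CommandsTimeoutSketch` and `parse` are illustrative names, not CloudStack's own parsing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a "Name=timeout,Name=timeout" setting value. */
public class CommandsTimeoutSketch {

    /** Returns command name -> timeout, preserving order; malformed entries are skipped. */
    static Map<String, Integer> parse(String value) {
        Map<String, Integer> timeouts = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String[] kv = entry.trim().split("=", 2);
            if (kv.length != 2) {
                continue; // no '=' in this entry
            }
            try {
                timeouts.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
            } catch (NumberFormatException e) {
                // Non-numeric timeout: skipped in this sketch.
            }
        }
        return timeouts;
    }

    public static void main(String[] args) {
        // Prints {CheckHealthCommand=5, CheckOnHostCommand=5}
        System.out.println(parse("CheckHealthCommand=5,CheckOnHostCommand=5"));
    }
}
```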

weizhouapache (Member):

If the host goes into the Disconnected state, VM HA will NOT be triggered.
It is a known issue; we will address it in a customer FR.

kiranchavala (Member):