KVM: return null state instead of Disconnected when investigate a host without NFS #10515

Merged
DaanHoogland merged 1 commit into apache:4.19 from weizhouapache:4.19-fix-kvm-investigator
Mar 10, 2025

@weizhouapache
Member

Description

Currently, when a KVM host has no NFS storage, KVMInvestigator reports it as Disconnected during agent/VM investigation, so the remaining investigators are never run.

This PR fixes the issue by returning a null state in that case, so that the other investigators are run.
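
To illustrate why a null state matters, here is a minimal sketch of an investigator chain in which the first non-null answer ends the investigation. It is not the CloudStack implementation: the `Investigator` interface, the `Status` enum, and the `of` and `investigate` helpers are hypothetical stand-ins, and the investigator order is simply the one that appears in the management server logs in this conversation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InvestigatorChainSketch {

    /** Simplified stand-in for host states (hypothetical, not CloudStack's enum). */
    enum Status { UP, DOWN, DISCONNECTED }

    /** Hypothetical investigator: returns null when it cannot decide. */
    interface Investigator {
        String name();
        Status investigate(String host);
    }

    /** Names of the investigators consulted during the last run. */
    static final List<String> consulted = new ArrayList<>();

    /** Builds a stub investigator that always returns the given result. */
    static Investigator of(String name, Status result) {
        return new Investigator() {
            @Override public String name() { return name; }
            @Override public Status investigate(String host) {
                consulted.add(name);
                return result;
            }
        };
    }

    /** Asks each investigator in order; the first non-null answer wins. */
    static Status investigate(String host, List<Investigator> chain) {
        consulted.clear();
        for (Investigator inv : chain) {
            Status status = inv.investigate(host);
            if (status != null) {
                return status; // later investigators are never asked
            }
        }
        return null; // nobody could determine the state
    }

    public static void main(String[] args) {
        // Before the fix: the KVM stub answers DISCONNECTED for a host without NFS.
        List<Investigator> before = Arrays.asList(
            of("SimpleInvestigator", null),
            of("XenServerInvestigator", null),
            of("KVMInvestigator", Status.DISCONNECTED),
            of("HypervInvestigator", null),
            of("VMwareInvestigator", null),
            of("PingInvestigator", Status.DISCONNECTED));
        System.out.println(investigate("kvm1", before) + " after asking " + consulted);

        // After the fix: the KVM stub returns null, so the chain continues.
        List<Investigator> after = Arrays.asList(
            of("SimpleInvestigator", null),
            of("XenServerInvestigator", null),
            of("KVMInvestigator", null),
            of("HypervInvestigator", null),
            of("VMwareInvestigator", null),
            of("PingInvestigator", Status.DISCONNECTED));
        System.out.println(investigate("kvm2", after) + " after asking " + consulted);
    }
}
```

In this sketch both runs end in the same state, but only the second one reaches the ping-based check, which is the behaviour this PR aims for.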

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Below is an example of the investigation process with this PR (on the KVM host, I added a firewall rule to drop packets to port 8250 of the management server):

[image]

How did you try to break this feature and the system with this change?

Contributor

@sureshanaparti sureshanaparti left a comment

clgtm

@sureshanaparti
Contributor

@blueorangutan package

@codecov

codecov bot commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 15.16%. Comparing base (b41acf2) to head (f9fd642).
Report is 2 commits behind head on 4.19.

Files with missing lines Patch % Lines
...vm/src/main/java/com/cloud/ha/KVMInvestigator.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19   #10515      +/-   ##
============================================
- Coverage     15.17%   15.16%   -0.01%     
+ Complexity    11332    11328       -4     
============================================
  Files          5414     5414              
  Lines        474802   474802              
  Branches      57909    57909              
============================================
- Hits          72028    72008      -20     
- Misses       394718   394742      +24     
+ Partials       8056     8052       -4     
Flag Coverage Δ
uitests 4.28% <ø> (ø)
unittests 15.89% <0.00%> (-0.01%) ⬇️

@weizhouapache weizhouapache added this to the 4.19.3 milestone Mar 6, 2025
@weizhouapache weizhouapache marked this pull request as ready for review March 6, 2025 14:36
@weizhouapache
Member Author

@blueorangutan package

@blueorangutan

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12686

@yadvr yadvr modified the milestones: 4.19.3, 4.20.1 Mar 7, 2025
@yadvr
Member

yadvr commented Mar 7, 2025

@blueorangutan test

@blueorangutan

@rohityadavcloud a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12603)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 48986 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10515-t12603-kvm-ol8.zip
Smoke tests completed. 133 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Member

@kiranchavala kiranchavala left a comment

LGTM. Verified the fix manually with the following steps:

  1. Create a CloudStack environment with 2 hosts and no NFS primary storage.
  2. On one of the KVM hosts, configure and enable HA.
  3. Add a firewall rule that drops packets to port 8250:

iptables -I OUTPUT -p tcp -m tcp --dport 8250 -j DROP

  4. Check the management server logs.

Before the fix:

CloudStack doesn't run the HypervInvestigator, VMwareInvestigator, or PingInvestigator.

2025-03-06 13:36:30,022 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Investigating why host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} has disconnected with event PingTimeout
2025-03-06 13:36:30,023 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) checking if agent (Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"}) is alive
2025-03-06 13:36:30,025 DEBUG [c.c.a.t.Request] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Sending  { Cmd , MgmtId: 32986892337576, via: 1(ref-trl-8094-k-mol8-kiran-chavala-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:37:10,041 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Waiting some more time because this is the current command
2025-03-06 13:37:10,041 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Waiting some more time because this is the current command
2025-03-06 13:37:10,042 WARN  [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Timed out on Seq 1-8864491441548689460:  { Cmd , MgmtId: 32986892337576, via: 1(ref-trl-8094-k-mol8-kiran-chavala-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:37:10,047 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Cancelling.
2025-03-06 13:37:10,047 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Operation timed out: Commands 8864491441548689460 to Host 1 timed out after 100
2025-03-06 13:37:10,067 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) SimpleInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:37:10,067 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) XenServerInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:37:10,083 WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Agent investigation was requested on host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"}, but host does not support investigation because it has no NFS storage. Skipping investigation.
2025-03-06 13:37:10,083 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) KVMInvestigator was able to determine host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} is in Disconnected
2025-03-06 13:37:10,083 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) The agent from host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} state determined is Disconnected
2025-03-06 13:37:10,083 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Agent is disconnected but the host is still up: Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} state: Enabled

After the fix:

CloudStack runs the HypervInvestigator, VMwareInvestigator, and PingInvestigator.

 [root@ol8 ~]# cat /var/log/cloudstack/management/management-server.log |grep -i "logid:b39c7f05"
2025-03-06 13:08:59,485 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Investigating why host Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"} has disconnected with event PingTimeout
2025-03-06 13:08:59,485 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) checking if agent (Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"}) is alive
2025-03-06 13:08:59,487 DEBUG [c.c.a.t.Request] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Sending  { Cmd , MgmtId: 32987949302884, via: 2(ref-trl-8087-k-mol8-kiran-chavala-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:09:49,487 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Waiting some more time because this is the current command
2025-03-06 13:10:39,487 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Waiting some more time because this is the current command
2025-03-06 13:10:39,488 WARN  [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Timed out on Seq 2-5748563449361727501:  { Cmd , MgmtId: 32987949302884, via: 2(ref-trl-8087-k-mol8-kiran-chavala-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:10:39,488 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Cancelling.
2025-03-06 13:10:39,489 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Operation timed out: Commands 5748563449361727501 to Host 2 timed out after 100
2025-03-06 13:10:39,491 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) SimpleInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,491 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) XenServerInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,494 WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Agent investigation was requested on host Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"}, but host does not support investigation because it has no NFS storage. Skipping investigation.
2025-03-06 13:10:39,494 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) KVMInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,494 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) HypervInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,494 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) VMwareInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,495 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) checking if agent (Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"}) is alive
2025-03-06 13:10:39,496 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) sending ping from (Host {"id":1,"name":"ol8.localdomain","type":"Routing","uuid":"c0fd498b-e0ff-433c-a68d-698a982a5f6f"}) to agent's host ip address (10.0.35.136)
2025-03-06 13:10:39,497 DEBUG [c.c.a.t.Request] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 1-728457239727181052: Sending  { Cmd , MgmtId: 32987949302884, via: 1(ol8.localdomain), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.0.35.136","wait":"20","bypassHostMaintenance":"false"}}] }
2025-03-06 13:10:39,511 DEBUG [c.c.a.t.Request] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 1-728457239727181052: Received:  { Ans: , MgmtId: 32987949302884, via: 1(ol8.localdomain), Ver: v1, Flags: 10, { Answer } }
2025-03-06 13:10:39,512 DEBUG [c.c.h.AbstractInvestigatorImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) host (10.0.35.136) has been successfully pinged, returning that host is up
2025-03-06 13:10:39,512 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) ping from (Host {"id":1,"name":"ol8.localdomain","type":"Routing","uuid":"c0fd498b-e0ff-433c-a68d-698a982a5f6f"}) to agent's host ip address (10.0.35.136) successful, returning that agent is disconnected
2025-03-06 13:10:39,512 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) PingInvestigator was able to determine host Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"} is in Disconnected
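
When comparing logs like the ones above, it can help to reduce each investigation to one line per investigator. The following is a rough sketch of such a summarizer, assuming the two message shapes seen in these logs ("... unable to determine the state of the host" and "... was able to determine host ... is in ..."). The class and method names are hypothetical, and the sample lines in `main` are shortened from the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvestigationLogSummary {

    // Message emitted when an investigator cannot decide.
    private static final Pattern UNDETERMINED =
        Pattern.compile("(\\w+) unable to determine the state of the host");

    // Message emitted when an investigator reports a state.
    private static final Pattern DETERMINED =
        Pattern.compile("(\\w+) was able to determine host .* is in (\\w+)");

    /** Returns one "Investigator -> outcome" entry per matching log line. */
    static List<String> summarize(List<String> lines) {
        List<String> summary = new ArrayList<>();
        for (String line : lines) {
            Matcher determined = DETERMINED.matcher(line);
            if (determined.find()) {
                summary.add(determined.group(1) + " -> " + determined.group(2));
                continue;
            }
            Matcher undetermined = UNDETERMINED.matcher(line);
            if (undetermined.find()) {
                summary.add(undetermined.group(1) + " -> undetermined");
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        // Sample lines shortened from the "after the fix" log (host JSON trimmed).
        List<String> lines = Arrays.asList(
            "DEBUG [c.c.h.HighAvailabilityManagerImpl] KVMInvestigator unable to determine the state of the host.  Moving on.",
            "DEBUG [c.c.h.HighAvailabilityManagerImpl] HypervInvestigator unable to determine the state of the host.  Moving on.",
            "DEBUG [c.c.h.HighAvailabilityManagerImpl] PingInvestigator was able to determine host Host {\"id\":2} is in Disconnected");
        for (String entry : summarize(lines)) {
            System.out.println(entry);
        }
    }
}
```

Run against the "before" log, a summary like this would stop at KVMInvestigator; against the "after" log it would continue through to PingInvestigator.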

@weizhouapache
Member Author

Great, thanks @kiranchavala for testing!

@DaanHoogland DaanHoogland merged commit cd6d1a2 into apache:4.19 Mar 10, 2025
24 of 25 checks passed
@DaanHoogland DaanHoogland deleted the 4.19-fix-kvm-investigator branch March 10, 2025 08:06
@Pearl1594 Pearl1594 moved this to Done in ACS 4.20.1 Mar 17, 2025
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jun 19, 2025