Fetch health info action#89820
Conversation
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Hi @masseyke, I've created a changelog YAML for you. |
…asseyke/elasticsearch into feature/health-api-fetch-health-info-2
server/src/main/java/org/elasticsearch/health/node/HealthInfo.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/health/node/FetchHealthInfoCacheActionTests.java
Show resolved
Hide resolved
What would the integration test test in this case? I thought about adding one but decided not to because the only thing beyond what FetchHealthInfoCacheActionTests is testing that be testing is TransportHealthNodeAction functionality. And that ought to have an integration test of its own (I'm not sure if it does). |
If |
Wouldn't that be really easy to test in a unit test? And really hard to test in an integration test (because I'm guessing that |
…asseyke/elasticsearch into feature/health-api-fetch-health-info-2
|
@masseyke I think the workflow for health reporting is rather complex (many moving parts in terms of reporting, caching, and fetching) and would benefit for as much testing as possible. |
|
@elasticsearchmachine run elasticsearch-ci/part-1 |
|
@elasticmachine run elasticsearch-ci/part-1 |
|
@elasticmachine run elasticsearch-ci/bwc |
|
I agree with @andreidan , maybe an integration test doesn't have to simulate a weird edge case but the edge case might present itself. What we are trying to achieve with such an integration test is to test a situation as close to reality as possible and effectively test our assumptions. @masseyke what do you think? Does this make sense? |
|
I strongly disagree with both of you, but I added several integration tests yesterday. |
|
That doesn't feel very good to me, you are leading this PR I wouldn't want you to do something you strongly disagree :). I would like to understand better your objection. Do you think that there is no value to have an integration test for this as a sanity check, or that you were planning to add it in the next PR that includes the disk indicator where it would test a more end to end scenario? |
|
The reason is that there is nothing to really integration test here that's not already covered elsewhere. It doesn't tell us anything new or mitigate any risk. There's no complex interaction in the code in this PR -- unlike the code that runs automatically on all nodes to generate and upload the health data, this just runs on a single node when someone calls it (and doesn't even transform the data in any way). Assuming TransportHealthNodeAction has been thoroughly tested and the unit test for FetchHealthInfoCacheAction is decent, I haven't figured out how to mitigate any additional risk with an integration test (other than maybe that HealthInfoCache can't be injected). |
| @@ -121,12 +135,14 @@ public void testMasterFailure() throws Exception { | |||
| assertThat(newHealthNode, notNullValue()); | |||
| assertBusy(() -> { | |||
| Map<String, DiskHealthInfo> healthInfoCache = internalCluster.getInstance(HealthInfoCache.class, newHealthNode.getName()) | |||
There was a problem hiding this comment.
I think we should replace this check all together, and compare the info we get from the action you introduced. What do you think?
There was a problem hiding this comment.
Makes sense (I just pushed that).
|
@elasticmachine update branch |
|
Thank you for sharing your view on the matter. I like your proposal to piggy back on the |
|
@elasticmachine run elasticsearch-ci/part-1 |
gmarouli
left a comment
There was a problem hiding this comment.
I had a few minor comments, but it looks good thanks for carrying it through. I hope you are happy with the result too.
server/src/internalClusterTest/java/org/elasticsearch/health/UpdateHealthInfoCacheIT.java
Outdated
Show resolved
Hide resolved
| * @param notExpectedNodeId A nullable nodeId that we do _not_ expect to see in the results | ||
| * @throws Exception | ||
| */ | ||
| private void assertResultsCanBeFetched( |
There was a problem hiding this comment.
To simplify this what if we pick a node randomly? Isn't this good enough?
There was a problem hiding this comment.
That's what it's doing right? It picks a random non-health node, and then also the health node (to cover the 2 main cases).
There was a problem hiding this comment.
Yes, it does both, I was wondering since as you said the routing is tested in the TransportHealthNodeAction, that maybe it's sufficient to just pick a node and fetch the data once. Without differentiating between the health and other nodes. WDYT?
…asseyke/elasticsearch into feature/health-api-fetch-health-info-2
|
@elasticmachine update branch |
* main: (34 commits) Make sure ivy repo directory exists before downloading artifacts Use 'file://' scheme for local repository URL Use DRA artifacts for release build CI jobs Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241) Script: Write Field API path manipulation (elastic#89889) Fetch health info action (elastic#89820) Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935) [ML] Performance improvements for categorization jobs (elastic#89824) [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931) Fix deadlock bug exposed by a test (elastic#89934) [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497) Fix segment stats in tsdb (elastic#89754) Synthetic _source: support dense_vector (elastic#89840) REST tests fetching fields with synthetic _source (elastic#89888) Do not deserialize back BytesTransportRequest to clone a request in MockTransportService (elastic#89926) Add SDK request logging to debug failures of S3BlobStoreRepositoryTests#testRequestStats (elastic#89912) Fix SnapshotStatusApisIT.testGetSnapshotsWithSnapshotInProgress (elastic#89925) Document synthetic source for text and keyword (elastic#89893) Fix CloneSnapshotIT.testRemoveFailedCloneFromCSWithQueuedSnapshotInProgress (elastic#89914) Add missing index.mapping.total_fields.limit setting to the target index (elastic#89875) ...
* main: (176 commits) Fix RandomSamplerAggregatorTests testAggregationSamplingNestedAggsScaled test failure (elastic#89958) [Downsampling] Replace document map with SMILE encoded doc (elastic#89495) Remove full cluster state from error logging in MasterService (elastic#89960) [ML] Truncate categorization fields (elastic#89827) [TSDB] Removed `summary` and `histogram` metric types (elastic#89937) Update testNodeSelectorRouting so that it does not depend on iteration order (elastic#89879) Make sure listener is resolved when file queue is cleared (elastic#89929) [Stable plugin api] Extensible annotation (elastic#89903) Fix double sending of response in TransportOpenIdConnectPrepareAuthenticationAction (elastic#89930) Make sure ivy repo directory exists before downloading artifacts Use 'file://' scheme for local repository URL Use DRA artifacts for release build CI jobs Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241) Script: Write Field API path manipulation (elastic#89889) Fetch health info action (elastic#89820) Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935) [ML] Performance improvements for categorization jobs (elastic#89824) [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931) Fix deadlock bug exposed by a test (elastic#89934) [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497) ...
This PR updates HealthService to fetch HealthInfo data using FetchHealthInfoCacheAction (#89820), and to pass it to the HealthIndicatorServices' calculate method. This will allow HealthIndicatorServices to use any data from the health node's HealthInfoCache.
This PR adds FetchHealthInfoCacheAction, which fetches the data sent to the health node by the UpdateHealthInfoCacheAction (#89275).
Relates to #84811