Issue-105420: Fix bug causing incorrect error on force deleting already deleted model#107188
Issue-105420: Fix bug causing incorrect error on force deleting already deleted model#107188batcity wants to merge 9 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/ml-core (Team:ML) |
|
@elasticmachine test this please |
There was a problem hiding this comment.
Thanks for submitting this change! I have requested a few changes, but I am open to doing it a different way. Let me know what you think!
edit: please run this command once your next commit is ready ./gradlew :x-pack:plugin:core:spotlessApply :x-pack:plugin:core:precommit :x-pack:plugin:ml:spotlessApply :x-pack:plugin:ml:precommit :x-pack:plugin:inference:spotlessApply :x-pack:plugin:inference:precommit :x-pack:plugin:ml:qa:native-multi-node-tests:precommit
| IngestMetadata currentIngestMetadata = state.metadata().custom(IngestMetadata.TYPE); | ||
| Set<String> referencedModels = getReferencedModelKeys(currentIngestMetadata, ingestService); | ||
|
|
||
| if (modelExists(request.getId()) == false) { |
There was a problem hiding this comment.
nice! this looks very clean and readable.
| trainedModelProvider.getTrainedModel(modelId, GetTrainedModelsAction.Includes.empty(), null, trainedModelListener); | ||
|
|
||
| try { | ||
| boolean latchReached = latch.await(5, TimeUnit.SECONDS); |
There was a problem hiding this comment.
I don't think we want a 5 second timeout here which will throw an exception. If this timeout occurs, I don't think it will be any clearer what happened for the end user.
There was a problem hiding this comment.
removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L242
| } | ||
| }; | ||
|
|
||
| trainedModelProvider.getTrainedModel(modelId, GetTrainedModelsAction.Includes.empty(), null, trainedModelListener); |
There was a problem hiding this comment.
If we don't pass a parent task to this request, it wont be cancelable. I think it would be better to pass in the task from the masterOperation here.
There was a problem hiding this comment.
| protected boolean modelExists(String modelId) { | ||
| CountDownLatch latch = new CountDownLatch(1); | ||
| final AtomicBoolean modelExists = new AtomicBoolean(false); | ||
|
|
There was a problem hiding this comment.
To avoid using a latch and requiring a timeout, I think we could replace this function with an actionListener. What do you think?
There was a problem hiding this comment.
|
|
||
| @Override | ||
| public void onFailure(Exception e) { | ||
| logger.error("Failed to retrieve model {}: {}", modelId, e.getMessage(), e); |
There was a problem hiding this comment.
| logger.error("Failed to retrieve model {}: {}", modelId, e.getMessage(), e); | |
| logger.error("Failed to retrieve model [" + modelId + "]: [" + e.getMessage() + "]", e); |
There was a problem hiding this comment.
removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L234
| } | ||
| } | ||
|
|
||
| protected boolean modelExists(String modelId) { |
There was a problem hiding this comment.
You can avoid the countdown latch and hence blocking the calling thread by using a listener.
You don't have to timeout the call to trainedModelProvider.getTrainedModel() if it does timeout simply out let the error propagate from the call.
private void modelExists(String modelId, ActionListener<Boolean> listener) {
trainedModelProvider.getTrainedModel(modelId, GetTrainedModelsAction.Includes.empty(), null,
ActionListener.wrap(
model -> listener.onResponse(Boolean.TRUE),
exception -> {
if (ExceptionsHelper.unwrapCause(exception) instanceof ResourceNotFoundException) {
listener.onResponse(Boolean.FALSE);
} else {
listener.onFailure(exception);
}
}
)
);
}
There was a problem hiding this comment.
After you make the modelExists() function async with a listener you will need to chain the various processing steps together. The best way to do this is use a SubscribableListener
Here's an example if it being used:
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportGetTrainedModelsStatsAction.java#L128
There was a problem hiding this comment.
Fixed in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825R223
Apologies about the delayed response, I was busy with work 😅
|
|
||
| @Override | ||
| public void onFailure(Exception e) { | ||
| logger.error("Failed to retrieve model {}: {}", modelId, e.getMessage(), e); |
There was a problem hiding this comment.
This isn't an error as we are checking the model's existence here. If the model doesn't exist then that is fine and it should be reported back to the caller rather than logged.
There was a problem hiding this comment.
Removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L234
| throw new ElasticsearchException("Timeout while waiting for trained model to be retrieved"); | ||
| } | ||
| } catch (InterruptedException e) { | ||
| throw new ElasticsearchException("Unexpected exception", e); |
There was a problem hiding this comment.
This code is not necessary if you take my suggestion but in Java it's best practice in to reset the interrupt flag with Thread.currentThread().interrupt();
There was a problem hiding this comment.
Removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L245

What I did:
Previously, attempting to force delete a referenced model in Elasticsearch that had already been deleted would result in an error indicating that the model is still being referenced by ingest processors. This behavior was misleading since the model no longer exists in the system.
Changes made:
deleteModelmethod to to properly check for the existence of the model before proceeding with the deletion process.With this fix, attempting to force delete an already deleted model will now correctly indicate that the model isn't found, rather than misleadingly reporting it as still being referenced.
Related issue:
Fixes #105420
How did I test my fix:
I built and started up elasticsearch locally, then I downloaded a new model using the following command:
curl --insecure -u "elastic:<password>" -X PUT "https://localhost:9200/_ml/trained_models/.elser_model_2?pretty" -H 'Content-Type: application/json' -d' { "input": { "field_names": ["text_field"] } } 'Then I attempted to delete it and it said that it's being referenced as expected:
curl --insecure -u "elastic:<password>" -X DELETE "https://localhost:9200/_ml/trained_models/.elser_model_2?pretty" { "error" : { "root_cause" : [ { "type" : "status_exception", "reason" : "Cannot delete model [.elser_model_2] as it is still referenced by ingest processors; use force to delete the model" } ], "type" : "status_exception", "reason" : "Cannot delete model [.elser_model_2] as it is still referenced by ingest processors; use force to delete the model" }, "status" : 409 }I force deleted it the first time:
curl --insecure -u "elastic:<password>" -X DELETE "https://localhost:9200/_ml/trained_models/.elser_model_2?force=true" {"acknowledged":true}%It now returns the right error response when I attempt to delete the same model again:
curl --insecure -u "elastic:<password>" -X DELETE "https://localhost:9200/_ml/trained_models/.elser_model_2" {"error":{"root_cause":[{"type":"resource_not_found_exception","reason":"Could not find trained model [.elser_model_2]"}],"type":"resource_not_found_exception","reason":"Could not find trained model [.elser_model_2]"},"status":404}%PR checklist:
Have you signed the contributor license agreement? ✔️Yes
Have you followed the contributor guidelines? ✔️Yes
If submitting code, have you built your formula locally prior to submission with
gradle check?I ran tests in the package to make sure I didn't break anything, here's the command I ran:
./gradlew :x-pack:plugin:ml:testHere's a screenshot showing that the tests succeeded: