Issue-105420: Fix bug causing incorrect error on force deleting already deleted model by batcity · Pull Request #107188 · elastic/elasticsearch

batcity · 2024-04-08T00:07:18Z

What I did:

Previously, attempting to force delete a referenced model in Elasticsearch that had already been deleted would result in an error indicating that the model is still being referenced by ingest processors. This behavior was misleading since the model no longer exists in the system.

Changes made:

Modified the deleteModel method to properly check for the existence of the model before proceeding with the deletion process.

With this fix, attempting to force delete an already deleted model will now correctly indicate that the model isn't found, rather than misleadingly reporting it as still being referenced.
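To illustrate the ordering described above, here is a minimal, self-contained sketch. It is not the code in TransportDeleteTrainedModelAction: the class `DeleteOrderSketch`, its two in-memory sets, and the integer status return are hypothetical stand-ins chosen only to show why the existence check has to come before the reference check.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the delete flow; not the actual Elasticsearch implementation.
class DeleteOrderSketch {
    final Set<String> storedModels = new HashSet<>();
    // Pipeline references outlive the model: a force delete does not remove them.
    final Set<String> referencedByPipelines = new HashSet<>();

    // Returns the HTTP-style status the request would end with.
    int delete(String modelId, boolean force) {
        // Existence is checked first, so a leftover pipeline reference cannot mask a missing model.
        if (storedModels.contains(modelId) == false) {
            return 404; // resource_not_found_exception
        }
        if (referencedByPipelines.contains(modelId) && force == false) {
            return 409; // still referenced by ingest processors
        }
        storedModels.remove(modelId);
        return 200; // acknowledged
    }
}
```

With the two checks in the opposite order, the second delete of a referenced model would report 409 even though the model is already gone, which is the behavior described in the issue.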

Related issue:

Fixes #105420

How did I test my fix:

I built and started Elasticsearch locally, then downloaded a new model using the following command:

curl --insecure -u "elastic:<password>" -X PUT "https://localhost:9200/_ml/trained_models/.elser_model_2?pretty" -H 'Content-Type: application/json' -d' { "input": { "field_names": ["text_field"] } } '

Then I attempted to delete it and it said that it's being referenced as expected:

curl --insecure -u "elastic:<password>" -X DELETE "https://localhost:9200/_ml/trained_models/.elser_model_2?pretty"

{
  "error" : {
    "root_cause" : [
      {
        "type" : "status_exception",
        "reason" : "Cannot delete model [.elser_model_2] as it is still referenced by ingest processors; use force to delete the model"
      }
    ],
    "type" : "status_exception",
    "reason" : "Cannot delete model [.elser_model_2] as it is still referenced by ingest processors; use force to delete the model"
  },
  "status" : 409
}

I force deleted it the first time:

curl --insecure -u "elastic:<password>" -X DELETE "https://localhost:9200/_ml/trained_models/.elser_model_2?force=true"

{"acknowledged":true}

It now returns the right error response when I attempt to delete the same model again:

curl --insecure -u "elastic:<password>" -X DELETE "https://localhost:9200/_ml/trained_models/.elser_model_2"

{"error":{"root_cause":[{"type":"resource_not_found_exception","reason":"Could not find trained model [.elser_model_2]"}],"type":"resource_not_found_exception","reason":"Could not find trained model [.elser_model_2]"},"status":404}

PR checklist:

Have you signed the contributor license agreement? ✔️Yes
Have you followed the contributor guidelines? ✔️Yes
If submitting code, have you built your formula locally prior to submission with gradle check?

I ran tests in the package to make sure I didn't break anything, here's the command I ran:

./gradlew :x-pack:plugin:ml:test

Here's a screenshot showing that the tests succeeded:

If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed. ✔️Yes
If submitting code, have you checked that your submission is for an OS and architecture that we support? ✔️Yes
If you are submitting this code for a class then read our policy for that. N/A

elasticsearchmachine · 2024-04-10T09:12:55Z

Pinging @elastic/ml-core (Team:ML)

davidkyle · 2024-04-10T09:15:11Z

@elasticmachine test this please

maxhniebergall

Thanks for submitting this change! I have requested a few changes, but I am open to doing it a different way. Let me know what you think!

edit: please run this command once your next commit is ready:

./gradlew :x-pack:plugin:core:spotlessApply :x-pack:plugin:core:precommit :x-pack:plugin:ml:spotlessApply :x-pack:plugin:ml:precommit :x-pack:plugin:inference:spotlessApply :x-pack:plugin:inference:precommit :x-pack:plugin:ml:qa:native-multi-node-tests:precommit

maxhniebergall · 2024-04-10T18:30:52Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

        IngestMetadata currentIngestMetadata = state.metadata().custom(IngestMetadata.TYPE);
        Set<String> referencedModels = getReferencedModelKeys(currentIngestMetadata, ingestService);

+        if (modelExists(request.getId()) == false) {


nice! this looks very clean and readable.

maxhniebergall · 2024-04-10T18:42:07Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

+        trainedModelProvider.getTrainedModel(modelId, GetTrainedModelsAction.Includes.empty(), null, trainedModelListener);
+
+        try {
+            boolean latchReached = latch.await(5, TimeUnit.SECONDS);


I don't think we want a 5 second timeout here which will throw an exception. If this timeout occurs, I don't think it will be any clearer what happened for the end user.

removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L242

maxhniebergall · 2024-04-10T18:42:46Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

+            }
+        };
+
+        trainedModelProvider.getTrainedModel(modelId, GetTrainedModelsAction.Includes.empty(), null, trainedModelListener);


If we don't pass a parent task to this request, it won't be cancelable. I think it would be better to pass in the task from the masterOperation here.

Fixed in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825R227

maxhniebergall · 2024-04-10T18:52:49Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

+    protected boolean modelExists(String modelId) {
+        CountDownLatch latch = new CountDownLatch(1);
+        final AtomicBoolean modelExists = new AtomicBoolean(false);
+


To avoid using a latch and requiring a timeout, I think we could replace this function with an actionListener. What do you think?

Fixed in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825R223-R235

maxhniebergall · 2024-04-10T18:59:23Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

+
+            @Override
+            public void onFailure(Exception e) {
+                logger.error("Failed to retrieve model {}: {}", modelId, e.getMessage(), e);


Suggested change

-                logger.error("Failed to retrieve model {}: {}", modelId, e.getMessage(), e);
+                logger.error("Failed to retrieve model [" + modelId + "]: [" + e.getMessage() + "]", e);

removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L234

davidkyle · 2024-04-10T17:19:05Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

        }
    }

+    protected boolean modelExists(String modelId) {


You can avoid the countdown latch and hence blocking the calling thread by using a listener.

You don't have to time out the call to trainedModelProvider.getTrainedModel(); if it does time out, simply let the error propagate from the call.

    private void modelExists(String modelId, ActionListener<Boolean> listener) {
        trainedModelProvider.getTrainedModel(modelId, GetTrainedModelsAction.Includes.empty(), null,
            ActionListener.wrap(
                model -> listener.onResponse(Boolean.TRUE),
                exception -> {
                    if (ExceptionsHelper.unwrapCause(exception) instanceof ResourceNotFoundException) {
                        listener.onResponse(Boolean.FALSE);
                    } else {
                        listener.onFailure(exception);
                    }
                }
            )
        );
    }

After you make the modelExists() function async with a listener, you will need to chain the various processing steps together. The best way to do this is to use a SubscribableListener.

Here's an example of it being used:
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportGetTrainedModelsStatsAction.java#L128
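For readers without the Elasticsearch codebase at hand, the shape of this suggestion can be sketched with plain JDK types. In the sketch below `CompletableFuture` is swapped in for the listener chain; it is not `SubscribableListener` or `ActionListener`, and `AsyncExistsSketch`, `NotFound`, `getModel`, and `delete` are hypothetical names introduced only for illustration. The point it shows is the same one made above: "not found" becomes a `false` result, any other failure propagates, and the next step is chained instead of waiting on a latch.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Stand-in sketch using JDK futures; all names are hypothetical, not Elasticsearch APIs.
class AsyncExistsSketch {
    static class NotFound extends RuntimeException {
        NotFound(String id) {
            super("Could not find trained model [" + id + "]");
        }
    }

    final Set<String> stored;

    AsyncExistsSketch(Set<String> stored) {
        this.stored = stored;
    }

    // Hypothetical async lookup: fails with NotFound when the model is missing.
    CompletableFuture<String> getModel(String id) {
        if (stored.contains(id)) {
            return CompletableFuture.completedFuture(id);
        }
        return CompletableFuture.<String>failedFuture(new NotFound(id));
    }

    // Maps "not found" to false and lets every other failure propagate, without blocking a thread.
    CompletableFuture<Boolean> modelExists(String id) {
        return getModel(id).handle((model, e) -> {
            if (e == null) {
                return true;
            }
            // Unwrap a possible CompletionException, similar in spirit to unwrapping the cause.
            Throwable cause = (e instanceof CompletionException && e.getCause() != null) ? e.getCause() : e;
            if (cause instanceof NotFound) {
                return false;
            }
            throw new CompletionException(cause);
        });
    }

    // Chains the next step onto the existence check; the actual deletion work is omitted here.
    CompletableFuture<String> delete(String id) {
        return modelExists(id).thenCompose(exists -> {
            if (exists) {
                return CompletableFuture.<String>completedFuture("acknowledged");
            }
            return CompletableFuture.<String>failedFuture(new NotFound(id));
        });
    }
}
```

In the real action the same chaining would be expressed with listeners rather than futures, as in the linked TransportGetTrainedModelsStatsAction example.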

Fixed in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825R223

Apologies about the delayed response, I was busy with work 😅

I couldn't spin up the service and test the endpoints this time around, since the ML endpoints won't activate locally for some reason (I get a "no handler found" error for them), so I've only verified that the unit tests succeed.

davidkyle · 2024-04-10T17:22:16Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

+
+            @Override
+            public void onFailure(Exception e) {
+                logger.error("Failed to retrieve model {}: {}", modelId, e.getMessage(), e);


This isn't an error as we are checking the model's existence here. If the model doesn't exist then that is fine and it should be reported back to the caller rather than logged.

Removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L234

davidkyle · 2024-04-10T17:25:50Z

...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportDeleteTrainedModelAction.java

+                throw new ElasticsearchException("Timeout while waiting for trained model to be retrieved");
+            }
+        } catch (InterruptedException e) {
+            throw new ElasticsearchException("Unexpected exception", e);


This code is not necessary if you take my suggestion, but in Java it's best practice to reset the interrupt flag with Thread.currentThread().interrupt();
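As a small illustration of that practice (independent of this PR, which removed the latch entirely), the sketch below restores the interrupt status after catching InterruptedException. The class `InterruptSketch` and the helper `awaitQuietly` are hypothetical names; only the JDK calls are real.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing how to preserve the interrupt status when swallowing InterruptedException.
class InterruptSketch {
    // Waits on the latch; returns false on timeout or interruption.
    static boolean awaitQuietly(CountDownLatch latch, long millis) {
        try {
            return latch.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // await() clears the interrupt status when it throws, so set it again for callers further up.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Without the call to `interrupt()`, code higher up the stack would have no way to tell that the thread was asked to stop.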

Removed this in commit: 04bed02#diff-9c8c7464beb6c8c177bd79140cdb557b630bdbeb06fb8ecac032eeb518487825L245

Issue-105420: Fix bug causing incorrect error on force deleting alrea…

413c789

…dy deleted model

elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v8.14.0 labels Apr 8, 2024

Issue-105420: Fixed incorrect test

1191993

batcity changed the title ~~draft: Issue-105420: Fix bug causing incorrect error on force deleting already deleted model~~ Issue-105420: Fix bug causing incorrect error on force deleting already deleted model Apr 9, 2024

batcity marked this pull request as ready for review April 9, 2024 23:45

elasticsearchmachine added the needs:triage Requires assignment of a team area label label Apr 9, 2024

batcity mentioned this pull request Apr 9, 2024

Wrong error message returned on delete trained model when model already deleted #105420

Open

davidkyle added :ml Machine learning and removed needs:triage Requires assignment of a team area label labels Apr 10, 2024

elasticsearchmachine added the Team:ML Meta label for the ML team label Apr 10, 2024

davidkyle assigned maxhniebergall Apr 10, 2024

davidkyle added the >bug label Apr 10, 2024

maxhniebergall self-requested a review April 10, 2024 18:27

maxhniebergall suggested changes Apr 10, 2024

maxhniebergall reviewed Apr 10, 2024

davidkyle reviewed Apr 11, 2024

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

maxhniebergall removed their assignment May 15, 2024

Merge branch 'elastic:main' into issue-105420-bugfix

9e57be6

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024

elasticsearchmachine added v9.1.0 and removed v9.0.0 labels Jan 30, 2025

Merge branch 'main' into issue-105420-bugfix

076e2ac

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025

Merge branch 'elastic:main' into issue-105420-bugfix

749cd68

elasticsearchmachine added v9.3.0 and removed v9.2.0 labels Oct 2, 2025

batcity added 2 commits October 8, 2025 19:35

Merge branch 'elastic:main' into issue-105420-bugfix

9db600c

Issue-105420: Fix issues mentioned in code review

04bed02

batcity requested a review from davidkyle October 8, 2025 14:25

batcity added 2 commits October 8, 2025 20:03

Issue-105420: add changelog for PR

c195844

Merge branch 'main' into issue-105420-bugfix

b91ce2d

elasticsearchmachine added v9.4.0 and removed v9.3.0 labels Dec 17, 2025

davidkyle removed their request for review March 30, 2026 10:06

5 participants