Skip to content

Validating mappings when simulate ingest is called#106440

Closed
masseyke wants to merge 36 commits intoelastic:mainfrom
masseyke:simulate-validate-mappings
Closed

Validating mappings when simulate ingest is called#106440
masseyke wants to merge 36 commits intoelastic:mainfrom
masseyke:simulate-validate-mappings

Conversation

@masseyke
Copy link
Member

This adds functionality to the simulate ingest API (#101409). If the index exists when the ingest simulate API is called that the transformed documents would be indexed into if this were not a simulation, then it now reports any mapping validations that would occur. For example, if we have an index named test-index with strict mappings and a single property named foo, then:

POST _ingest/_simulate?pretty&index=test-index
{
  "docs": [
    {
      "_source": {
        "foob": "bar"
      }
    },
    {
      "_source": {
        "foo": "baz"
      }
    }
  ]
}

would result in something like:

{
    "docs" : [
      {
        "doc" : {
          "_id" : "id",
          "_index" : "test-index",
          "_version" : -3,
          "_source" : {
            "foob" : "bar"
          },
          "executed_pipelines" : [
            "test-pipeline"
          ],
          "error" : {
            "type" : "strict_dynamic_mapping_exception",
            "reason" : "[1:9] mapping set to strict, dynamic introduction of [foob] within [_doc] is not allowed"
          }
        }
      },
      {
        "doc" : {
          "_id" : "id",
          "_index" : "test-index",
          "_version" : -3,
          "_source" : {
            "foo" : "baz"
          },
          "executed_pipelines" : [
            "test-pipeline"
          ]
        }
      }
    ]
}

@masseyke masseyke added >enhancement :Distributed/Ingest Node Execution or management of Ingest Pipelines v8.14.0 labels Mar 18, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look. I am not a fan of the isSimulated flag. I wonder if we can try an alternative approach at least where we build a specific simulateIndex API instead? That goes both for the transport layer and the engine/index-shard layer.

I left a couple examples of problems. I am sure you could fix those too - but I'd be worried about the amount of ifs and us not catching some that need to be caught. The simulate action is a read-only action, whereas the bulk action is a write one, I think we have to treat them quite separately.

@@ -1250,7 +1249,7 @@ public IndexResult index(Index index) throws IOException {
);
}
localCheckpointTracker.markSeqNoAsProcessed(indexResult.getSeqNo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should also not be done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! I missed that one.

ActionListener.wrap(v -> executor.execute(this), this::onRejection),
documentParsingProvider
documentParsingProvider,
request.isSimulated()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really for this line, but I wonder what would happen if a simulation failed to be replicated? I could imagine us failing the shard, which would be a pity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "failing the shard"? The only consequence would be that the call to simulate would fail, right (which might even be desirable, because that is what would happen if you were to attempt to actually index data)? Or are there worse consequences that I am not aware of (which is entirely possible since I'm new to this part of the codebase)?

Copy link
Member Author

@masseyke masseyke May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to not run simulation in a TransportWriteRequest, while attempting to still reuse as much code as possible. Simulation now goes through TransportSimulateBulkShardAction, which is an ordinary TransportAction that calls the dispatchedShardOperationOnPrimary method on TransportShardBulkAction.

@masseyke
Copy link
Member Author

masseyke commented Jul 8, 2024

Closed in favor of #110606

@masseyke masseyke closed this Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Ingest Node Execution or management of Ingest Pipelines >enhancement v8.16.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants