Skip to content

[Ingest] Simulate Endpoint#14572

Merged
talevy merged 15 commits intoelastic:feature/ingestfrom
talevy:simulate_ingest
Nov 13, 2015
Merged

[Ingest] Simulate Endpoint#14572
talevy merged 15 commits intoelastic:feature/ingestfrom
talevy:simulate_ingest

Conversation

@talevy
Copy link
Copy Markdown
Contributor

@talevy talevy commented Nov 6, 2015

I think it is ready.

Adds _simulate endpoint to enable the processing of provided documents without indexing

@talevy talevy added review :Distributed/Ingest Node Execution or management of Ingest Pipelines labels Nov 6, 2015
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this:

threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(() -> {
    payload.pipeline().execute(data);
    responses.add(new SimulatedItemResponse(data));         
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool.

then I think all of this can be moved into the SimulatePipelineRequestPayload class (might need to rename class to be shorter 😄)

and called like

payload.execute(threadPool)

@talevy talevy force-pushed the simulate_ingest branch 2 times, most recently from 226d809 to 8e1934f Compare November 6, 2015 06:11
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we go for supporting get too, use the util RestActions#hasBodyContent and getRestContent instead, those support the body passed in as the source query string parameter, which all of the core apis should support when they support a get with body (for clients that don't support sending a body with get)

@javanna
Copy link
Copy Markdown
Contributor

javanna commented Nov 6, 2015

I did a round of review and left some comments and some questions, but it looks very good already!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move the pipeline factory to be a field of the factory class?

@talevy talevy force-pushed the simulate_ingest branch 2 times, most recently from c393940 to c844da3 Compare November 12, 2015 02:33
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this needs to be updated to include the metadata as it was given in the input

so something more like

Map<String, Object> dataDoc = ...
dataDoc.put("_index", data.getIndex());
dataDoc.put("_type", data.getType());
dataDoc.put("_id", data.getId());
builder.field(Fields.DOCUMENT, dataDoc);

Although, I think it would be better for Data to provide a getDocumentWithMetadata to do this sort of thing so it can be re-used and agnostic to any new fields we have in metadata.

what do you think @javanna, @martijnvg ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately Data can't implement ToXContent... maybe we should just have a static helper method that serialises the Data with an XContentBuilder?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I just realized that I suggested below to make Data implement both Streamable and ToXContent...which is not correct as it would mess up dependencies. I think sooner or later we should extract these classes to a different module to prevent mistakes and confusion and enforce the no deps rule that we follow. We should have some shared code around outputting Data to ToXContent and serializing it too. Shall we have an enhanced version of Data as part of the plugin that accepts a Data instance but also implements ToXContent and Streamable?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see yet specific equals/hashcode tests, I think we could benefit from those. I know that they have been generated and are most likely fine at the moment, but pretty sure without tests they will get out of sync with future modifications.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a couple,

found a really annoying problem that needs to be worked around...

Throwable comparison does not work...

Objects.equals(Throwable, Throwable)

I added logic to say that two Throwables are equivalent so long as they are of the same class... I am not a fan of this solution, have any ideas?

@talevy talevy force-pushed the simulate_ingest branch 3 times, most recently from 54d2a90 to cccad8d Compare November 13, 2015 01:44
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs cleaning up.

testing whether two Throwable objects are equal is not working on the object level, so I decided to just test whether they are of the same class type

@talevy
Copy link
Copy Markdown
Contributor Author

talevy commented Nov 13, 2015

@javanna I don't know what I've done! 😄

In trying to clean things up and making them more straightforward to test... I introduced a few things that are not ideal. I left them as inline comments. let me know what you think! I do not think they are that big of a deal, but they added a lot of code.

@talevy talevy force-pushed the simulate_ingest branch 2 times, most recently from e453fee to 154faf0 Compare November 13, 2015 02:22
Removed equals and hashcode whenever they wouldn't be reliable because of exception comparison. at the end of the day we use them for testing and we can simplify our tests without requiring equals and hashcode in prod code, which also would require more tests if maintained.

Add equals/hashcode test for Data/TransportData and randomize existing serialization tests
@javanna
Copy link
Copy Markdown
Contributor

javanna commented Nov 13, 2015

@talevy what you did looks good. I did simplify a few things around serialization in my last commit. I also moved some of the classes to implement Writeable rather than Streamable so we can have final fields and we can drop the default constructors that were needed for serialization only. I removed equals/hashcode when they required exceptions comparisons as that wasn't ideal and used only for testing, which can be simplified. Can you please have a look and tell me what you think?

@talevy
Copy link
Copy Markdown
Contributor Author

talevy commented Nov 13, 2015

++ LGTM

that commit really addressed what I was trying to clean up

Only MutateProcessor implemented equals / hashcode hence we would only use that one in our tests, since they relied on them. Better to not rely on equals/hashcode, drop them and mock processor/pipeline in our tests that need them. That also allow to make MutateProcessor constructor package private as the other processors.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize we would return NPE. Is that because a field is not there? In general I think we should return nicer errors as much as we can.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this NPE is thrown by the MutateProcessor.

Data#getProperty should wrap the exception is the suggestion?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea ok let's do it on a separate issue though, doesn't need to be part of the PR, although it will require changing these tests ;)

@javanna
Copy link
Copy Markdown
Contributor

javanna commented Nov 13, 2015

I pushed a few more more cleanup commits and left a comment inline on error messages. One more thing is about the fact that we continue processing in case of failure. I understand why but it seems inconsistent given that we do the opposite in reality (when not simulating) till we will add support for continue_on_failure. Seems like this api should behave the same as the real one? What do you think?

@talevy
Copy link
Copy Markdown
Contributor Author

talevy commented Nov 13, 2015

continuing through the processors is only done during a Verbose execution. It is still the case that the pipeline halts during a regular non-verbose _simulate run

@javanna
Copy link
Copy Markdown
Contributor

javanna commented Nov 13, 2015

continuing through the processors is only done during a Verbose execution. It is still the case that the pipeline halts during a regular non-verbose _simulate run

ok I guess it's fine for now, we have to remember to adapt it to continue on failure once we add the flag I think

@talevy
Copy link
Copy Markdown
Contributor Author

talevy commented Nov 13, 2015

ok I guess it's fine for now, we have to remember to adapt it to continue on failure once we add the flag I think

I agree. I added a note to the continue_on_failure ticket: #14548

talevy added a commit that referenced this pull request Nov 13, 2015
@talevy talevy merged commit 082686d into elastic:feature/ingest Nov 13, 2015
@talevy talevy deleted the simulate_ingest branch November 13, 2015 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Ingest Node Execution or management of Ingest Pipelines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants