Pinging @elastic/es-core-infra
Pinging @elastic/ml-core
droberts195
left a comment
This is a great start. I left a few notes and ideas. I'll also ask the rest of the team to think about some of these design decisions.
| /** | ||
| * Specifies the end of the bucket resetting range | ||
| * | ||
| * @param resetEnd String representation of a timestamp; may be an epoch seconds, epoch millis or an ISO string |
Might be best to say ISO 8601 instead of just ISO.
Also, this can be set to the string "now".
| for(int i = 0; i < 10; i++) { | ||
| Map<String, Object> hashMap = new HashMap<>(); | ||
| hashMap.put("total", randomInt(1000)); | ||
| postDataRequest.addDoc(hashMap); |
It would be better if these docs contained the job's time field, with the loop generating some ascending values for it. Then the server could validate that they were received in ascending order.
To make this change, an overload of buildJob() will have to be added that sets the time field and time format to specific values rather than random ones, so that the loop knows what to add.
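A rough sketch of what the loop could look like once the time field is known. The helper name buildDocs, the field name passed in, and the use of epoch seconds are assumptions made for illustration; the real test would use whatever the buildJob() overload configures, and would likely use the test framework's random helpers rather than java.util.Random.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class AscendingDocsSketch {
    // Hypothetical helper: builds `count` docs whose time field strictly increases.
    // Assumes the job's time format is epoch seconds.
    static List<Map<String, Object>> buildDocs(String timeField, long startEpochSeconds,
                                               int count, Random random) {
        List<Map<String, Object>> docs = new ArrayList<>();
        long timestamp = startEpochSeconds;
        for (int i = 0; i < count; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put(timeField, timestamp);
            doc.put("total", random.nextInt(1000));
            docs.add(doc);
            // Always move forward by at least one second so the order stays ascending.
            timestamp += 1 + random.nextInt(60);
        }
        return docs;
    }
}
```

Each map could then be passed to postDataRequest.addDoc() in list order.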
| } | ||
| for (Map<String, Object> objectMap : objectMaps) { | ||
| builder.map(objectMap); | ||
| } |
There's a problem here because if data has been added as a mixture of bytes references and maps then all the maps will be sent after all the bytes references, and that could mean the data is not sent in ascending time order.
Two possible solutions I can think of are:
- Document that all data must be supplied in the same format, either bytes references or maps, and enforce this in the addDoc() methods.
- Convert the maps to bytes references in addDoc() so that there are only bytes references being stored.
This also shows that we should say somewhere in the Javadocs that docs will be processed by the job in the order they're added to the request and that therefore they should be added to the request in ascending time order.
There was a problem hiding this comment.
I was thinking of doing option 2 and simply calling the bytesReferences overload of addDoc from the object map one.
The downside of serializing each map just to reuse the BytesReference overload is performance, though I may be optimizing prematurely here.
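For reference, option 2 might look roughly like the sketch below. It is simplified to plain Java so it stands alone: byte[] stands in for BytesReference, and toJson is a hypothetical placeholder for real serialization (it only handles flat maps of strings and numbers, with no escaping). The point is only that a single list preserves insertion order across both overloads.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class OrderedDocsSketch {
    // One list for every doc, so insertion order is the send order.
    private final List<byte[]> docs = new ArrayList<>();

    void addDoc(byte[] serializedDoc) {
        docs.add(Objects.requireNonNull(serializedDoc, "serializedDoc must not be null"));
    }

    // The map overload delegates to the bytes overload instead of keeping a second list.
    void addDoc(Map<String, Object> objectMap) {
        Objects.requireNonNull(objectMap, "objectMap must not be null");
        addDoc(toJson(objectMap).getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical stand-in for real serialization: flat maps only, no escaping.
    private static String toJson(Map<String, Object> map) {
        return map.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":"
                + (e.getValue() instanceof String
                    ? "\"" + e.getValue() + "\""
                    : String.valueOf(e.getValue())))
            .collect(Collectors.joining(",", "{", "}"));
    }

    // Returns the docs as strings in the order they would be sent.
    List<String> docsInSendOrder() {
        return docs.stream()
            .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
            .collect(Collectors.toList());
    }
}
```

The serialization cost is paid once per map at add time, which is the performance trade-off mentioned above.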
| * @param bytesReference document to add to bulk request, format must match the set XContentType | ||
| */ | ||
| public void addDoc(BytesReference bytesReference) { | ||
| this.bytesReferences.add(Objects.requireNonNull(bytesReference, "bytesReferences must not be null")); |
Should we throw an exception if content != null? Or otherwise return early to ignore as the Javadoc says.
There was a problem hiding this comment.
Good thought. Since they set the whole bulk content earlier, there is no need to continue collecting the individual docs.
| * @param objectMap document object to add to bulk request | ||
| */ | ||
| public void addDoc(Map<String, Object> objectMap) { | ||
| this.objectMaps.add(Objects.requireNonNull(objectMap, "objectMap must not be null")); |
Should we throw an exception if content != null? Or otherwise return early to ignore as the Javadoc says.
| } | ||
|
|
||
| /** | ||
| * Set the total content to post. |
Maybe add again that this takes precedence over any individual docs already added.
It could also .clear() the lists of individual docs.
An alternative to consider would be to throw an exception if individual docs have already been added. It seems like client code is badly designed if it adds individual docs then wipes them out by overriding with externally formatted content.
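To make the exception-throwing alternative concrete, here is a minimal sketch. The class name is hypothetical, String stands in for the real doc and content types, and the exception type and messages are assumptions; the idea is just that mixing the two styles fails fast in either order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class ExclusiveContentSketch {
    private final List<String> docs = new ArrayList<>();
    private String content; // whole pre-formatted payload, if set

    void addDoc(String doc) {
        // Reject individual docs once the whole content has been supplied.
        if (content != null) {
            throw new IllegalStateException("content already set; individual docs cannot be added");
        }
        docs.add(Objects.requireNonNull(doc, "doc must not be null"));
    }

    void setContent(String content) {
        // Reject whole content if individual docs were added first.
        if (!docs.isEmpty()) {
            throw new IllegalStateException("individual docs already added; content cannot be set");
        }
        this.content = Objects.requireNonNull(content, "content must not be null");
    }
}
```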
I've had a thought about this. Here is what I propose:
All these constructors end up building the Then the
The above means the users cannot possibly create an invalid What do you think?
droberts195
left a comment
The new way of structuring the request object is much better. I just left a few more minor comments.
| public PostDataRequest(String jobId, XContentType xContentType, BytesReference content) { | ||
| this.jobId = Objects.requireNonNull(jobId, "job_id must not be null"); | ||
| this.xContentType = Objects.requireNonNull(xContentType, "content_type must not be null"); | ||
| this.content = content; |
Should we also requireNonNull for content?
| this.xContentType = Objects.requireNonNull(xContentType, "content_type must not be null"); | ||
| ByteBuffer buffer = ByteBuffer.wrap(content); | ||
| ByteBuffer[] buffers = new ByteBuffer[]{ buffer }; | ||
| this.content = BytesReference.fromByteBuffers(buffers); |
Can you avoid the ByteBuffer[] by using this.content = new BytesArray(content)? IIRC BytesArray extends BytesReference.
|
|
||
| @Override | ||
| public int hashCode() { | ||
| return Objects.hash(jobId, resetStart, resetEnd, xContentType); |
I think it would be good to add a comment that content is deliberately left out, as it is on the server side too. (I'm not convinced that was a good decision, as it means two radically different posts can be equal, but we are where we are.) At least having a comment stops someone from adding it in one place but not the other in the future.
Same for the comparison in equals() below.
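Something along these lines is what the comments could look like. This is a cut-down sketch, not the actual class: the name is hypothetical, and String and byte[] stand in for the real field types.

```java
import java.util.Objects;

class PostDataRequestSketch {
    private final String jobId;
    private final String resetStart;
    private final String resetEnd;
    private final String contentType;
    private final byte[] content;

    PostDataRequestSketch(String jobId, String resetStart, String resetEnd,
                          String contentType, byte[] content) {
        this.jobId = Objects.requireNonNull(jobId, "job_id must not be null");
        this.resetStart = resetStart;
        this.resetEnd = resetEnd;
        this.contentType = Objects.requireNonNull(contentType, "content_type must not be null");
        this.content = Objects.requireNonNull(content, "content must not be null");
    }

    @Override
    public int hashCode() {
        // content is deliberately left out, mirroring the server-side request
        return Objects.hash(jobId, resetStart, resetEnd, contentType);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        // content is deliberately left out, mirroring the server-side request
        PostDataRequestSketch other = (PostDataRequestSketch) obj;
        return Objects.equals(jobId, other.jobId)
            && Objects.equals(resetStart, other.resetStart)
            && Objects.equals(resetEnd, other.resetEnd)
            && Objects.equals(contentType, other.contentType);
    }
}
```

As noted, this does mean two requests with different payloads compare equal, so the comment is there mainly to keep hashCode() and equals() consistent with each other.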
* master: (30 commits)
  Include fallback settings when checking dependencies (elastic#33522)
  [DOCS] Fixing formatting issues in breaking changes
  CRUD: Disable wait for refresh tests with delete
  Test: Fix test name (elastic#33510)
  HLRC: split ingest request converters (elastic#33435)
  Logging: Configure the node name when we have it (elastic#32983)
  HLRC: split xpack request converters (elastic#33444)
  HLRC: split watcher request converters (elastic#33442)
  HLRC: add enable and disable user API support (elastic#33481)
  [DOCS] Fixes formatting error
  TEST: Ensure merge triggered in _source retention test (elastic#33487)
  [ML] Add a file structure determination endpoint (elastic#33471)
  HLRC: ML Forecast Job (elastic#33506)
  HLRC: split migration request converters (elastic#33436)
  HLRC: split snapshot request converters (elastic#33439)
  Make Watcher validation message copy/pasteable
  Removes redundant test method in SQL tests (elastic#33498)
  HLRC: ML Post Data (elastic#33443)
  Pass Directory instead of DirectoryService to Store (elastic#33466)
  Collapse package structure for metrics aggs (elastic#33463)
  ...
Adds the ability to Post Data to an ML job in the HLRC.
This relates to (#29827)