Skip to content

Script: Write Field API with basic path resolution#89738

Merged
stu-elastic merged 13 commits intoelastic:mainfrom
stu-elastic:0818-wfield-fn-proto
Sep 1, 2022
Merged

Script: Write Field API with basic path resolution#89738
stu-elastic merged 13 commits intoelastic:mainfrom
stu-elastic:0818-wfield-fn-proto

Conversation

@stu-elastic
Copy link
Copy Markdown
Contributor

@stu-elastic stu-elastic commented Aug 30, 2022

Adds WriteField to the ingest context via field(<path>).

WriteField implements APIs:

  • String getName(): The path
  • boolean exists(): Does the path exist in the document?
  • WriteField set(def): Set the value at the path, creates nested path elements if necessary
  • WriteField append(def): Appends value to the path, creates nested path elements if necessary, the value at path is always a List after this call?
  • boolean isEmpty(): Does the path contain a value?
  • int size(): How many elements does the path contain?
  • Iterator iterator(): Iterate over all elements at the path.
  • def get(def): Get the value at the path if it exists, otherwise return the given default
  • def get(int, def): Get the value at the path and index if it exists, otherwise return the given default
  • boolean hasValue(Predicate): Is there a value at the path that passes the filter?
  • WriteField transform(Function): Change all values at the path
  • WriteField deduplicate(): Remove duplicates from the path
  • WriteField removeValuesIf(Predicate): Remove a values from the path that pass the filter
  • WriteField removeValue(int): Remove the index from the path, if it exists

Some APIs remain unimplemented:

  • void move(String)
  • void overwrite(String)
  • void remove()

This PR does not handle equivalent paths, which are paths that differ in the source but flatten to the same field in the indexed document.

Path resolution is basic, each path element is assumed to be a key in the current container, starting with the root. If there is not an entry in the map at a given level, the algorithm checks to see if the remaining path exists as a flat key. The nested then flat nature algorithm handles the common case of some or none nesting followed by flat keys.

Refs: #79155

@stu-elastic stu-elastic added >enhancement :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache labels Aug 30, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team v8.5.0 labels Aug 30, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @stu-elastic, I've created a changelog YAML for you.

}

public void testIsExists() {
Map<String, Object> a = new HashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add some exists() tests for the null and missing container value?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was going to be changed to exists and we weren't going to worry about painless shortcuts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I missed the test, will rename.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

List<Object> values;
if (v == null) {
values = new ArrayList<>(4);
} else if (v instanceof List<?> list) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone supply a unmodifiable list? Should we copy the original list for unexpected side-effects?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, someone could pass an unmodifiable list, this came up during my testing, in fact.

Should we copy the original list for unexpected side-effects?

My thinking is that would violate expectations, if we attempt to append to an UnmodifiableList then we should propagate the UnsupportedOperationException.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth specializing here with a good error message as these things can be hard to track down in scripting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As indicated by @grcevski's comments below, there are a number of places we'd have to specialize to handle UnmodifiableList (or UnmodifiableRandomAccessList) and, to be consistent, we may want to also do the same for unmodifiable Maps.

That's a lot of code to provide a marginally better error message. Do you think it's worth it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth it especially because there's no reasonable way to do this in a performant way.

}

if (value instanceof List<?> list) {
// Assume modifiable list
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an assert here, and perhaps related to my earlier comment, should we consider always making sure we have mutable list.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An assert is going to be turned off in production. The primary reason for using an UnmodifiableList is to error on modifications.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth having a better error to let the user know what happened here as well?

}

if (value instanceof List<?> list) {
list.removeIf(filter);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert for mutable list here too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is we should let it fail in the standard way, what do you think?

}

if (value instanceof List<?> list) {
if (index < list.size()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutable list assert.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, an actual assert would be turned off and any attempt to modify behavior based on an UnmodifiableList works around the scripts intent. Thoughts?

Copy link
Copy Markdown
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me as I already posted my major concern in the WIP PR about possible issues with out-of-sync write fields due to lack of a consistent backing structure. On the topic of an immutable list -- would it ever make sense for a user to set the value to one that already exists? An advanced user may work around this, so I guess the options are:

  1. assert as suggested by @grcevski
  2. better error message for the user
  3. do nothing and let users encounter it as the built in error
  4. copy values if we detect an immutable list though this probably fragile

@stu-elastic
Copy link
Copy Markdown
Contributor Author

On the topic of an immutable list -- would it ever make sense for a user to set the value to one that already exists?

Since scripting users do not have access to List.of(...), they would have to be very intentional in their creation of such a List. My preference 3 for the time being.

  1. do nothing and let users encounter it as the built in error

The thinking is there's no easy way to detect unmodifiable lists since they are implemented in multiple classes and not part of the java stdlib API contract. That would mean sprinkling try/catchs everywhere.

If users are tripped up by the errors, that is something we can add later.

@grcevski
Copy link
Copy Markdown
Contributor

grcevski commented Sep 1, 2022

On the topic of an immutable list -- would it ever make sense for a user to set the value to one that already exists?

Since scripting users do not have access to List.of(...), they would have to be very intentional in their creation of such a List. My preference 3 for the time being.

  1. do nothing and let users encounter it as the built in error

The thinking is there's no easy way to detect unmodifiable lists since they are implemented in multiple classes and not part of the java stdlib API contract. That would mean sprinkling try/catchs everywhere.

If users are tripped up by the errors, that is something we can add later.

If they don't have List.of then I think it's OK to assume the list will be mutable. 3. looks like a reasonable choice.

Copy link
Copy Markdown
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@stu-elastic stu-elastic merged commit e3f34dc into elastic:main Sep 1, 2022
stu-elastic added a commit that referenced this pull request Sep 8, 2022
Adds `WriteField` path manipulation APIs:
 * `WriteField` `move(String)`:  Moves this path to the destination path.  Throws an error if destination path exists.
 * `WriteField` `overwrite(String)`: Same as `move(String)` but overwrites the destination path if it exists;
 * `void` `remove()`: Remove the leaf value from this path.

See also: #89738
Refs: #79155
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Core/Infra Meta label for core/infra team v8.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants