Skip to content

Add ES|QL Locate function#106899

Merged
tteofili merged 25 commits intoelastic:mainfrom
tteofili:esql_locate
Apr 5, 2024
Merged

Add ES|QL Locate function#106899
tteofili merged 25 commits intoelastic:mainfrom
tteofili:esql_locate

Conversation

@tteofili
Copy link
Copy Markdown
Contributor

@tteofili tteofili commented Mar 29, 2024

This adds a LOCATE function to ES|QL.

PUT testidx
{
  "mappings": {
    "properties": {
      "text_field": {
        "type": "keyword"
      }
    }
  }
}

POST testidx/_doc
{ "text_field": "hello" }

POST _query
{
  "query": """
  FROM testidx|
  EVAL s = locate(text_field, 'll')
  """
}

closes #106818

@github-actions
Copy link
Copy Markdown
Contributor

Documentation preview:

least |"integer|long|double|boolean|keyword|text|ip|version least(first:integer|long|double|boolean|keyword|text|ip|version, ?rest...:integer|long|double|boolean|keyword|text|ip|version)" | first |"integer|long|double|boolean|keyword|text|ip|version" |"" |"integer|long|double|boolean|keyword|text|ip|version" | "Returns the minimum value from many columns." | false | true | false
left |"keyword left(string:keyword|text, length:integer)" |[string, length] |["keyword|text", "integer"] |["The string from which to return a substring.", "The number of characters to return."] |keyword | "Returns the substring that extracts 'length' chars from 'string' starting from the left." | [false, false] | false | false
length |"integer length(string:keyword|text)" |string |"keyword|text" | "" |integer | "Returns the character length of a string." | false | false | false
locate |"integer locate(str:keyword|text, substr:keyword|text)" |[str,substr] |["keyword|text", "keyword|text"] |["", ""] |integer | "Returns an integer that indicates the position of a keyword substring within another string" | [false, false] | false | false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for this test. It was a good idea, but I think it's grown to be a problem. @luigidellaquila, would you be ok if I broke this test apart into a few tests so updating it is easier? Right now it's a huge blob of text to work through.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+100, updating this test is always painful for me as well

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 it's a good idea to break it down into smaller tests, imho.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing it. Now. Well. Maybe in an hour.

@tteofili tteofili marked this pull request as ready for review April 4, 2024 12:45
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Apr 4, 2024
@tteofili tteofili added >enhancement :Search/Search Search-related issues that do not fall into other categories and removed :Search/Search Search-related issues that do not fall into other categories labels Apr 4, 2024
@tteofili tteofili added the :Analytics/ES|QL AKA ESQL label Apr 4, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @tteofili, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Apr 4, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment regarding parameter names. Also, please add more complex csv-spec tests using from command:

  • nested function calls locate(another_function(...)) or another_function(locate(...))
  • use the result of locate in stats, sort and other commands
  • check the warning messages it creates. In string.csv-spec look for examples (warning:Line 1:).

)
public Locate(
Source source,
@Param(name = "str", type = { "keyword", "text" }, description = "An input string") Expression str,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the full word instead of str. There was a recent effort in having a common naming convention as much as possible in all functions. Same statement for substr below.

Copy link
Copy Markdown
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left one small optional comment.

@tteofili tteofili merged commit 54eeb62 into elastic:main Apr 5, 2024
int codePointCount = UnicodeUtil.codePointCount(str);
int indexStart = indexStart(codePointCount, start);
String utf8ToString = str.utf8ToString();
return 1 + utf8ToString.indexOf(substr.utf8ToString(), utf8ToString.offsetByCodePoints(0, indexStart));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about this one with characters that need two chars in utf-16.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right @nik9000, but I had to write a test to confirm. And at that point I decided to raise this PR #107172

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for noticing this @nik9000, and thanks @ChrisHegarty for opening #107172

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 9, 2024
This is automatically generated and should probably have been committed
as part of elastic#106899.
alex-spies added a commit that referenced this pull request Apr 9, 2024
This is automatically generated and was created as part of #106899.
@leemthompo
Copy link
Copy Markdown
Member

leemthompo commented Apr 10, 2024

@tteofili @nik9000 looks like we forgot to actually render all the generated docs here...

Just noticed this as I look back over history for the Kibana inline docs updates I need to do for 8.14

[Opening PR to fix]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.14.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ES|QL LOCATE function

7 participants