Skip to content

#639: allow metadata fields and score opensearch function (#228)#1456

Merged
MaxKsyunz merged 9 commits intoopensearch-project:mainfrom
Bit-Quill:integ-metadata-fields
Apr 10, 2023
Merged

#639: allow metadata fields and score opensearch function (#228)#1456
MaxKsyunz merged 9 commits intoopensearch-project:mainfrom
Bit-Quill:integ-metadata-fields

Conversation

@acarbonetto
Copy link
Copy Markdown
Collaborator

Description

OpenSearch reserved fields (_id, _index, _sort, _score, _max_score) are not allowed to be used in SQL clauses (SELECT, WHERE, ORDER BY) because the field format starting with underscore _ is not allowed.

  • This ticket adds specific identifiers to the language, and opens up support for OpenSearch reserved identifiers.
  • As an aside, identifiers with double underscore at the start (such as __myCoolField) is acceptable as an identifier.

This ticket allows for the score(), score_query() and scorequery() function to be used to wrap around relevance-search queries to force the _score and _max_score metadata fields to be returned with values. For some queries, _score returns null unless the score() function is included.

  • The score function also allows for an optional boost argument to be included that boosts the score of the child relevance function.

Example - metadata fields returned:

SELECT calcs.key, str0, _id, _sort, _score, _maxscore FROM calcs WHERE _id="5"
Result:
{ "key04", "OFFICE SUPPLIES", "5", -2, null, null }

Example - Metadata fields not requested are not displayed:

SELECT *, _id FROM bigint WHERE _id="2"
Result (assuming bigint only has one field):
[9223372026854775807, "2"]

Example - relevance search without and with score function

SELECT _id, _index, _score, _maxscore, str0, str1 FROM calcs WHERE QUERY_STRING([\"*\"], `BINDING SUPPLIES`);
result: 
[
    "7",
    "calcs",
    null,
    null,
    "OFFICE SUPPLIES",
    "BINDING SUPPLIES"
]

SELECT _id, _index, _score, _maxscore, str0, str1 FROM calcs WHERE SCORE(QUERY_STRING([\"*\"], `BINDING SUPPLIES`));
[
    "7",
    "calcs",
    2.4849067,
    2,
    "OFFICE SUPPLIES",
    "BINDING SUPPLIES"
]

Example - boost score on the _sql/_explain call:

SELECT _id, _score, _maxscore FROM calcs WHERE SCORE(QUERY_STRING([\"*\"], `BINDING SUPPLIES`, boost=2.5));
result: 
[
    "7",
    6.2122664,
    6
]

SELECT _id, _score, _maxscore FROM calcs WHERE SCOREQUERY(QUERY_STRING([\"*\"], `BINDING SUPPLIES`, boost=2.5), 2.0);
result: 
[
    "7",
    12.424533,
    12
]

SELECT _id, _index, _score, _maxscore, str0, str1 FROM calcs WHERE SCORE_QUERY(QUERY_STRING([\"*\"], `BINDING SUPPLIES`), 2.5);
result:
[
    "7",
    6.2122664,
    6
]

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…nction (#228)

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update to define and include metadata when visiting the expr node

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add specific metadata identifiers

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add IT tests and add parser changes

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update score function expression analyzer to return boosted relevance function

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update builder to track scores

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Remove ScoreExpression.java and cleanup checkstyle

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* cleanup checkstyle

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Cleanup and add alternative score function syntax

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Cleanup and add alternative score function syntax

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Fix some bugs and add Expression tests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add expresssion and analyzer tests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add score doctests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add score function doctests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add metafield tests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Move legacy test and mark old as ignore

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* fix checkstyle violations

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* fix checkstyle violations

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update tests and identifier to accept metafields

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Checkstyle fixes

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* fix checkstyle violations

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Revert bad conflict resolution

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Fix for review comments

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update IT tests and legacy tests for comments

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Minor comment

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Updates for whitespace

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update basics.rst to show OS result

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update basics.rst to show OS result

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update basics.rst description

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Change Score function to accept a double/integer not an unresolved

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update functions.rst

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Checkstyle update

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Move reserved world symbol table to OpenSearchTable

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update functions.rst for review comments

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Removed parser meta tokens; Changes ImmutableMap to Map

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Removed parser meta tokens; Changes ImmutableMap to Map

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

---------

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
@acarbonetto acarbonetto requested a review from a team as a code owner March 21, 2023 18:18
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 21, 2023

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.49%. Comparing base (23cc0f6) to head (05c42f7).
⚠️ Report is 1034 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1456      +/-   ##
============================================
+ Coverage     98.46%   98.49%   +0.02%     
- Complexity     3869     3928      +59     
============================================
  Files           345      347       +2     
  Lines          9603     9771     +168     
  Branches        616      645      +29     
============================================
+ Hits           9456     9624     +168     
  Misses          142      142              
  Partials          5        5              
Flag Coverage Δ
sql-engine 98.49% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

MaxKsyunz
MaxKsyunz previously approved these changes Mar 21, 2023
Copy link
Copy Markdown
Collaborator

@MaxKsyunz MaxKsyunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@acarbonetto Do metafields work with aliases? E.g. SELECT h._id FROM tableA AS h

Is there a test that confirms this?

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
@acarbonetto
Copy link
Copy Markdown
Collaborator Author

@acarbonetto Do metafields work with aliases? E.g. SELECT h._id FROM tableA AS h

Is there a test that confirms this?

IT test for this added in f3d89b8

GumpacG
GumpacG previously approved these changes Mar 23, 2023
Yury-Fridlyand
Yury-Fridlyand previously approved these changes Mar 25, 2023
MaxKsyunz
MaxKsyunz previously approved these changes Mar 27, 2023
Comment on lines +475 to +476
// Identifiers cannot start with a single '_' since this an OpenSearch reserved
// metadata field. Two underscores (or more) is acceptable, such as '__field'.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment match the rule below? So ID_LITERAL won't accept token starting with single underscore?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I need to update this comment since I've updated the pattern.
Thank you!

Yury-Fridlyand
Yury-Fridlyand previously approved these changes Apr 6, 2023
Comment on lines +100 to +103
FetchSourceContext fetchSource = this.sourceBuilder.fetchSource();
List<String> includes = fetchSource != null && fetchSource.includes() != null
? Arrays.asList(fetchSource.includes())
: List.of();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be do this in the query builder?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenSearchQueryRequest and OpenSearchScrollRequest needs a refactor before this makes any sense.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assist in creating an issue to monitor and manage future modifications.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
@MaxKsyunz MaxKsyunz merged commit e805151 into opensearch-project:main Apr 10, 2023
@MaxKsyunz MaxKsyunz deleted the integ-metadata-fields branch April 10, 2023 19:42
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 10, 2023
…ction (#228) (#1456)

Allow metadata fields and score OpenSearch function.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
(cherry picked from commit e805151)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 10, 2023
…ction (#228) (#1456)

Allow metadata fields and score OpenSearch function.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
(cherry picked from commit e805151)
penghuo pushed a commit that referenced this pull request Apr 12, 2023
…ion (#228) (#1509)

* #639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (#1456)

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Co-authored-by: Andrew Carbonetto <andrewc@bitquilltech.com>
penghuo pushed a commit that referenced this pull request Apr 12, 2023
…ion (#228) (#1508)

* #639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (#1456)

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Co-authored-by: Andrew Carbonetto <andrewc@bitquilltech.com>
acarbonetto added a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Apr 18, 2023
…elds and the score OpenSearch function (#228) (opensearch-project#1456)

Allow metadata fields and score OpenSearch function.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants