Skip to content

jsonpath: add like_regex, string and null scalars#143240

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
normanchenn:norman/regex
Mar 27, 2025
Merged

jsonpath: add like_regex, string and null scalars#143240
craig[bot] merged 3 commits intocockroachdb:masterfrom
normanchenn:norman/regex

Conversation

@normanchenn
Copy link
Copy Markdown
Contributor

@normanchenn normanchenn commented Mar 20, 2025

jsonpath/parser: add support for string scalars

This commit adds string scalars, enabling string comparisons within
jsonpath queries.

Epic: None
Release note (sql change): Support string comparisons within jsonpath
queries.

jsonpath/parser: add support for null scalars.

This commit adds null scalars, enabling null comparisons within jsonpath
queries.

Epic: None
Release note (sql change): Support null comparisons within jsonpath
queries.

jsonpath: add like_regex support

This commit add like_regex predicate evaluation support. Flags for
like_regex are not supported yet.

Informs: #143243
Epic: None
Release note (sql change): Add like_regex predicate evaluation support
for jsonpath queries. Flags for like_regex are not supported yet.

@normanchenn normanchenn self-assigned this Mar 20, 2025
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@normanchenn normanchenn mentioned this pull request Mar 20, 2025
45 tasks
This commit adds string scalars, enabling string comparisons within
jsonpath queries.

Epic: None
Release note (sql change): Support string comparisons within jsonpath
queries.
@normanchenn normanchenn force-pushed the norman/regex branch 2 times, most recently from d5e30ec to 86603a3 Compare March 21, 2025 18:12
@normanchenn normanchenn marked this pull request as ready for review March 21, 2025 20:12
@normanchenn normanchenn requested a review from a team as a code owner March 21, 2025 20:12
@normanchenn normanchenn requested a review from yuzefovich March 21, 2025 20:12
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Reviewed 4 of 4 files at r1, 3 of 3 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @normanchenn)


-- commits line 28 at r3:
nit: should we mention that flag part is not supported? Or do you think we'll add that soon?


pkg/util/jsonpath/parser/jsonpath.y line 459 at r1 (raw file):

unreserved_keyword:
  AND

Hm, why AND, NOT, and OR are being removed in the first commit?


pkg/sql/logictest/testdata/logic_test/jsonb_path_query line 794 at r2 (raw file):


query T
SELECT jsonb_path_query('{}', 'null == null');

nit: did you mean to make this !=?


pkg/util/jsonpath/eval/operation.go line 112 at r3 (raw file):

		return jsonpathBoolUnknown, err
	}
	res := r.Find([]byte(*text))

nit: I think r.MatchString would be cleaner here.


pkg/util/jsonpath/parser/jsonpath.y line 460 at r1 (raw file):

any_identifier:
  IDENT

nit: should we remove IDENT since I think it's no longer used?


pkg/sql/logictest/testdata/logic_test/jsonb_path_query line 851 at r3 (raw file):

"987_650"
"987_424"

nit: should we add a couple test cases where regex uses a backslash?

Copy link
Copy Markdown
Contributor Author

@normanchenn normanchenn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @normanchenn and @yuzefovich)


-- commits line 28 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we mention that flag part is not supported? Or do you think we'll add that soon?

I'm not sure if we should support it yet - I'm not sure if theres a straightforward way to add these flags with the current implementation.


pkg/util/jsonpath/parser/jsonpath.y line 459 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, why AND, NOT, and OR are being removed in the first commit?

I don't think AND, NOT, and OR should be considered unreserved keywords since when we're scanning for these tokens, they are symbols such as &&, ||, and !, so when lexbase.GetKeywordID is called, I think it would make more sense to identify the string as an IDENT instead of AND for a key such as "and"


pkg/util/jsonpath/parser/jsonpath.y line 460 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we remove IDENT since I think it's no longer used?

We are still using IDENT, because most key accessor names are identified as IDENT during scanning. For example, in $.a, a is an IDENT. See the call to lexbase.GetKeywordID in JSONPathScanner.scanIdent.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Glad that UnescapePattern does what we need (and, philosophically speaking, it makes sense that we'd need to do something very similar for LIKE comparison operator). I do have one more comment.

Reviewed 10 of 10 files at r4, 13 of 13 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @normanchenn)


pkg/util/jsonpath/parser/jsonpath.y line 459 at r1 (raw file):

Previously, normanchenn (Norman Chen) wrote…

I don't think AND, NOT, and OR should be considered unreserved keywords since when we're scanning for these tokens, they are symbols such as &&, ||, and !, so when lexbase.GetKeywordID is called, I think it would make more sense to identify the string as an IDENT instead of AND for a key such as "and"

I see, sgtm.


pkg/util/jsonpath/parser/jsonpath.y line 460 at r1 (raw file):

Previously, normanchenn (Norman Chen) wrote…

We are still using IDENT, because most key accessor names are identified as IDENT during scanning. For example, in $.a, a is an IDENT. See the call to lexbase.GetKeywordID in JSONPathScanner.scanIdent.

Ack, thanks, I missed this usage.


pkg/sql/logictest/testdata/logic_test/jsonb_path_query line 794 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: did you mean to make this !=?

nit: this fixup belongs in the second commit, not third 😃


pkg/sql/logictest/testdata/logic_test/jsonb_path_query line 879 at r5 (raw file):

"https://test.com/path"

nit: double newline.


pkg/util/jsonpath/eval/operation.go line 115 at r5 (raw file):

	regex := op.Right.(jsonpath.Regex)
	regex.Regex, err = eval.UnescapePattern(regex.Regex, `\`, false)

Hm, I think this might be problematic since we're modifying the right operand in place. What will happen if we evaluate the same jsonpath like_regex expression on multiple JSON objects (i.e. multiple rows from the table)? I'd guess that we'd keep unescaping the right side every time, in-place.

Also nit: add inlined comments like /* escapeToken */ and for false too.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick run of relevant pg_regress files, and one difference caught my eye:

select jsonb_path_query('[null, 1, "abc", "abd", "aBdC", "abdacb", "babc", "adc\nabc", "ab\nadc"]', 'lax $[*] ? (@ like_regex "^ab.*c")');

PG returns two rows whereas we return three. At the same time we do match PG on:

select 'ab\nadc' like '^ab.*c';

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @normanchenn)


pkg/util/jsonpath/eval/operation.go line 115 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, I think this might be problematic since we're modifying the right operand in place. What will happen if we evaluate the same jsonpath like_regex expression on multiple JSON objects (i.e. multiple rows from the table)? I'd guess that we'd keep unescaping the right side every time, in-place.

Also nit: add inlined comments like /* escapeToken */ and for false too.

Hm, on the second thought, we are operating with the value, not the pointer, so perhaps this will be fine. Actually, your tests should already cover this - we should be invoking evalRegex multiple times when we have multiple paths.

This commit adds null scalars, enabling null comparisons within jsonpath
queries.

Epic: None
Release note (sql change): Support null comparisons within jsonpath
queries.
This commit add `like_regex` predicate evaluation support. Flags for
`like_regex` are not supported yet.

Epic: None
Release note (sql change): Add `like_regex` predicate evaluation support
for jsonpath queries. Flags for `like_regex` are not supported yet.
Copy link
Copy Markdown
Contributor Author

@normanchenn normanchenn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/util/jsonpath/eval/operation.go line 115 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, on the second thought, we are operating with the value, not the pointer, so perhaps this will be fine. Actually, your tests should already cover this - we should be invoking evalRegex multiple times when we have multiple paths.

I changed this to not use UnescapePattern anymore, as setting allowEscapes = true during scanning and retrieving the json string with AsText() seems to do the trick.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! :lgtm:

Reviewed 13 of 13 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @normanchenn)

@normanchenn
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 27, 2025

@craig craig bot merged commit 697f493 into cockroachdb:master Mar 27, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants