Bug
Evaluation code generally expects that each column in the query matches at least one column in a schema. For pure wildcard columns, however, this isn't necessarily true: when a wildcard is dynamically expanded, we may find that no column in the schema matches it.
During schema matching we currently only check that at least one column whose type matches the wildcard column exists somewhere in the archive (irrespective of namespace and subtree type); we don't check for matching types on a per-schema basis. That is, we don't consider pure wildcard columns in intersect_and_sub_expr.
The current behaviour during evaluation is to return false from a wildcard filter that matches no columns. This leads to incorrect results when the wildcard filter is inverted, since NOT turns that false into true for records that have no column of a matching type.
For example, the query NOT *: 0
against the dataset
will yield the result {"a": "b"}. This result is unexpected: our query semantics evaluate a filter only against columns of a matching type, and the returned record has no such column.
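A minimal sketch of why the current two-valued behaviour goes wrong. It assumes a hypothetical record model (a map from column name to an integer or string value), hypothetical function names, and that *: 0 matches only integer-typed columns; it is not CLP's actual evaluation code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Hypothetical record model: column name -> integer or string value.
using Value = std::variant<std::int64_t, std::string>;
using Record = std::map<std::string, Value>;

// Two-valued evaluation of the wildcard filter `*: literal`, mirroring the
// current behaviour: false is returned both when an integer column exists
// but doesn't equal the literal, and when no integer column exists at all.
bool eval_wildcard_int_eq(Record const& record, std::int64_t literal) {
    for (auto const& [name, value] : record) {
        if (auto const* v = std::get_if<std::int64_t>(&value); nullptr != v && *v == literal) {
            return true;
        }
    }
    return false;
}

// `NOT *: literal` simply inverts the boolean result, so "no matching
// column" silently becomes a match.
bool eval_not_wildcard_int_eq(Record const& record, std::int64_t literal) {
    return !eval_wildcard_int_eq(record, literal);
}
```

With the record {"a": "b"}, eval_wildcard_int_eq finds no integer column and returns false, so eval_not_wildcard_int_eq returns true and the record is reported as a match.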
The simplest solution is to update our evaluation code in QueryRunner to use the three results True, False, and Pruned, as we do in kv-ir evaluation, so that dynamic wildcard expansion can dynamically prune the AST.
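A sketch of the proposed three-valued evaluation under the same hypothetical record model. All names are illustrative, not the QueryRunner or kv-ir API, and the propagation rules are assumptions chosen to mirror pruning of an empty subexpression: NOT keeps Pruned, an AND with any pruned child is pruned, and an OR ignores pruned children (it is pruned only if every child is).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Hypothetical names throughout; not CLP's actual API.
enum class EvalResult { True, False, Pruned };

using Value = std::variant<std::int64_t, std::string>;
using Record = std::map<std::string, Value>;

struct Node {
    enum class Kind { WildcardIntEq, Not, And, Or } kind;
    std::int64_t literal{0};                      // used by WildcardIntEq
    std::vector<std::shared_ptr<Node>> children;  // used by Not/And/Or
};

EvalResult evaluate(Node const& node, Record const& record) {
    switch (node.kind) {
        case Node::Kind::WildcardIntEq: {
            bool any_matching_column = false;
            for (auto const& [name, value] : record) {
                if (auto const* v = std::get_if<std::int64_t>(&value)) {
                    any_matching_column = true;
                    if (*v == node.literal) {
                        return EvalResult::True;
                    }
                }
            }
            // Expansion found no column of a matching type: prune, don't return False.
            return any_matching_column ? EvalResult::False : EvalResult::Pruned;
        }
        case Node::Kind::Not: {
            EvalResult const r = evaluate(*node.children.at(0), record);
            if (EvalResult::Pruned == r) {
                return EvalResult::Pruned;  // inverting a pruned filter stays pruned
            }
            return EvalResult::True == r ? EvalResult::False : EvalResult::True;
        }
        case Node::Kind::And: {
            bool any_false = false;
            for (auto const& child : node.children) {
                EvalResult const r = evaluate(*child, record);
                if (EvalResult::Pruned == r) {
                    return EvalResult::Pruned;
                }
                any_false = any_false || EvalResult::False == r;
            }
            return any_false ? EvalResult::False : EvalResult::True;
        }
        case Node::Kind::Or: {
            bool all_pruned = true;
            for (auto const& child : node.children) {
                EvalResult const r = evaluate(*child, record);
                if (EvalResult::True == r) {
                    return EvalResult::True;
                }
                all_pruned = all_pruned && EvalResult::Pruned == r;
            }
            return all_pruned ? EvalResult::Pruned : EvalResult::False;
        }
    }
    return EvalResult::Pruned;  // unreachable for valid Kind values
}

// Only a definite True at the root counts as a match.
bool matches(Node const& root, Record const& record) {
    return EvalResult::True == evaluate(root, record);
}
```

Under these rules NOT *: 0 evaluates to Pruned on {"a": "b"}, so the record is no longer returned, while records that do have an integer column keep the usual NOT semantics.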
CLP version
47ff53a
Environment
ubuntu 22.04 container
Reproduction steps
See issue.