feat: Expand structured data types and type casting to reference check#2602
feat: Expand structured data types and type casting to reference check#2602
Conversation
|
|
||
| def _get_id_properties(self) -> dict[str, any]: | ||
| id_properties: dict[str, str] = super()._get_id_properties() | ||
| id_properties["column"] = str(self.column_expression) |
There was a problem hiding this comment.
adding to identity is handled in super
column name is not necessary - if no expression is provided by user then column_expression is equal to the column name already anyway.
| referencing_column_expression: COLUMN | SqlExpressionStr = self.metric_impl.column_expression | ||
|
|
||
| if isinstance(referencing_column_expression, SqlExpressionStr): | ||
| # find and replace the column name in the column expression with the aliased version |
There was a problem hiding this comment.
tested on example in comment and on json extraction (that one is in tests) - this may not be bulletproof, but should cover standard use cases
mivds
left a comment
There was a problem hiding this comment.
Some tricky stuff in there with the referencing. Not fully confident reviewing that, but it seems to make sense 🙂
|
|
||
| full_referencing_column_expression = referencing_column_expression | ||
|
|
||
| referencing_column_name: str = self.metric_impl.column_impl.column_yaml.name | ||
| if isinstance(referencing_column_expression, COLUMN): | ||
| full_referencing_column_expression = referencing_column_expression.IN(self.referencing_alias) |
There was a problem hiding this comment.
If I understand correctly, can't this be simplified to an else clause?
| full_referencing_column_expression = referencing_column_expression | |
| referencing_column_name: str = self.metric_impl.column_impl.column_yaml.name | |
| if isinstance(referencing_column_expression, COLUMN): | |
| full_referencing_column_expression = referencing_column_expression.IN(self.referencing_alias) | |
| else: | |
| referencing_column_expression = referencing_column_expression.IN(self.referencing_alias) |
There was a problem hiding this comment.
yes it is, and the type cast gives you a hint on what the else covers, but honestly I like the explicit isinstance as it's 100% clear on first glance
| aliased_column_name = f'"{self.referencing_alias}".{column_name}' | ||
| pattern = r"\b" + re.escape(column_name) + r"\b" | ||
| referencing_column_expression = SqlExpressionStr( | ||
| re.sub(pattern, aliased_column_name, referencing_column_expression.expression_str, count=1) |
There was a problem hiding this comment.
Why the count=1 here? Can there never be more than one occurrence? Not entirely clear on what might be inside this referencing_column_expression, so feel free to ignore
There was a problem hiding this comment.
I am also not sure what's best, this was suggested by Claude and I am kinda hesitant on the reasoning. Can there be more than one occurrence of the column in the expression? Certainly, but I am not sure what will people attempt to use this for, but normally type casting and struct data extraction should be covered by using the column only once. I kinda like the "replace only once" limitation as we don't want to encourage expressions that are too complex, we have dedicated check types for that.
There was a problem hiding this comment.
we had a discussion and decided to allow multi occurrence replacement. I updated the PR, thanks for flagging
|



Description
expand the feature to reference check. Postponed after simpler check types for extra testing and cautious release.
Checklist