Skip to content

feat: Expand structured data types and type casting to reference check#2602

Merged
m1n0 merged 4 commits intomainfrom
dtl-1657-support-structured-data-and-type-casting
Feb 26, 2026
Merged

feat: Expand structured data types and type casting to reference check#2602
m1n0 merged 4 commits intomainfrom
dtl-1657-support-structured-data-and-type-casting

Conversation

@m1n0
Copy link
Contributor

@m1n0 m1n0 commented Feb 25, 2026

Description

expand the feature to reference check. Postponed after simpler check types for extra testing and cautious release.

Checklist

  • I added a test to verify the new functionality.
  • I verified this PR does not break soda-extensions.


def _get_id_properties(self) -> dict[str, any]:
id_properties: dict[str, str] = super()._get_id_properties()
id_properties["column"] = str(self.column_expression)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding to identity is handled in super
column name is not necessary - if no expression is provided by user then column_expression is equal to the column name already anyway.

referencing_column_expression: COLUMN | SqlExpressionStr = self.metric_impl.column_expression

if isinstance(referencing_column_expression, SqlExpressionStr):
# find and replace the column name in the column expression with the aliased version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested on example in comment and on json extraction (that one is in tests) - this may not be bulletproof, but should cover standard use cases

@m1n0 m1n0 requested a review from mivds February 25, 2026 16:26
Copy link
Contributor

@mivds mivds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tricky stuff in there with the referencing. Not fully confident reviewing that, but it seems to make sense 🙂

Comment on lines +292 to +296

full_referencing_column_expression = referencing_column_expression

referencing_column_name: str = self.metric_impl.column_impl.column_yaml.name
if isinstance(referencing_column_expression, COLUMN):
full_referencing_column_expression = referencing_column_expression.IN(self.referencing_alias)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, can't this be simplified to an else clause?

Suggested change
full_referencing_column_expression = referencing_column_expression
referencing_column_name: str = self.metric_impl.column_impl.column_yaml.name
if isinstance(referencing_column_expression, COLUMN):
full_referencing_column_expression = referencing_column_expression.IN(self.referencing_alias)
else:
referencing_column_expression = referencing_column_expression.IN(self.referencing_alias)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is, and the type cast gives you a hint on what the else covers, but honestly I like the explicit isinstance as it's 100% clear on first glance

aliased_column_name = f'"{self.referencing_alias}".{column_name}'
pattern = r"\b" + re.escape(column_name) + r"\b"
referencing_column_expression = SqlExpressionStr(
re.sub(pattern, aliased_column_name, referencing_column_expression.expression_str, count=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the count=1 here? Can there never be more than one occurrence? Not entirely clear on what might be inside this referencing_column_expression, so feel free to ignore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure what's best, this was suggested by Claude and I am kinda hesitant on the reasoning. Can there be more than one occurrence of the column in the expression? Certainly, but I am not sure what will people attempt to use this for, but normally type casting and struct data extraction should be covered by using the column only once. I kinda like the "replace only once" limitation as we don't want to encourage expressions that are too complex, we have dedicated check types for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had a discussion and decided to allow multi occurrence replacement. I updated the PR, thanks for flagging

@sonarqubecloud
Copy link

@m1n0 m1n0 merged commit f65c079 into main Feb 26, 2026
41 checks passed
@m1n0 m1n0 deleted the dtl-1657-support-structured-data-and-type-casting branch February 26, 2026 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants