Skip to content

Fix: small fixes for Synapse data source#2608

Merged
Niels-b merged 3 commits intomainfrom
r-471-synapse
Feb 27, 2026
Merged

Fix: small fixes for Synapse data source#2608
Niels-b merged 3 commits intomainfrom
r-471-synapse

Conversation

@Niels-b
Copy link
Contributor

@Niels-b Niels-b commented Feb 26, 2026

Description

Note: this builds on the Fabric PR. I've split them up so changes are more clear.

synapse_data_source.py

  • Added connection parameter to SynapseDataSourceImpl.init (needed for create_copy_with_different_connection)
  • Added build_create_table_sql override with WITH (HEAP) (Synapse columnstore indexes don't support varchar(MAX))
  • Removed encode_string_for_sql call in _build_insert_into_values_row_sql (was corrupting newlines)

soda-synapse/.../synapse_data_source_test_helper.py

  • Added drop_schema_if_exists override (Synapse doesn't support IF EXISTS / CASCADE on DROP SCHEMA)

  • What problem are you solving?

    • Fixing issues surfaced by Synapse development
  • Any expected impact on downstream packages/services?

    • Yes, these issues were surfaced while developing extensions
  • Reference issue # if available.

Checklist

@sonarqubecloud
Copy link

@Niels-b Niels-b marked this pull request as ready for review February 26, 2026 16:04
@Niels-b Niels-b requested review from m1n0, mivds and paulteehan February 26, 2026 16:05
Comment on lines +54 to +56
# Synapse uses clustered columnstore indexes by default, which don't support varchar(MAX).
# Use HEAP to create a heap table instead.
create_table_sql = create_table_sql + "\nWITH (HEAP)"
Copy link
Contributor

@mivds mivds Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any negative impact with regard to query performance? I guess this is also used to create DWH tables where we store failed rows. Since that's potentially a lot of data, we should ensure query performance is good.

Some input from the friendly neighborhood AI:

When WITH (HEAP) is a good idea:

* Staging tables / transient loads (ELT pipelines)
* Some small tables / lookup-ish scenarios (though clustered index can also be a good fit)

When it’s usually a bad idea

* Large fact tables / “main warehouse tables” where you benefit from columnstore compression and scan performance (the default CCI is usually preferred).

For our use case I'm not sure it's a good idea from the query perspective. But then I have no insight on the VARCHAR(MAX) support or alternatives. So, your call

Copy link
Contributor Author

@Niels-b Niels-b Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was doubting this as well. There is a significant trade-off here. After your review on this I checked with the PM: we decided to do it with HEAP now, so we can still support VARCHAR(MAX). The alternative is to use VARCHAR(8000), which is a relatively low limit.

This can still change. We'll include so we can test it, but won't enable it for customers. So we're not locking ourselves in either method at the moment. We'll verify this with CEs as well.

Thank you for pointing this out again 🙌 !


def _build_insert_into_values_row_sql(self, values: VALUES_ROW) -> str:
values_sql: str = "SELECT " + ", ".join([self.literal(value) for value in values.values])
values_sql = self.encode_string_for_sql(values_sql)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned issues with encoding \n in the PR description, but do we risk introducing different issues by removing this? (Also applies to the Fabric PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only applies to testing code and DWH IIRC.
The issue is tested specifically in DWH tests, where I've seen some issues with the strings containing \n or special quoting or...
Removing it actually solves it.

Base automatically changed from r-472-fabric to main February 27, 2026 08:40
@Niels-b Niels-b merged commit 85602e4 into main Feb 27, 2026
41 checks passed
@Niels-b Niels-b deleted the r-471-synapse branch February 27, 2026 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants