Conversation
|
| # Synapse uses clustered columnstore indexes by default, which don't support varchar(MAX). | ||
| # Use HEAP to create a heap table instead. | ||
| create_table_sql = create_table_sql + "\nWITH (HEAP)" |
There was a problem hiding this comment.
Does this have any negative impact with regard to query performance? I guess this is also used to create DWH tables where we store failed rows. Since that's potentially a lot of data, we should ensure query performance is good.
Some input from the friendly neighborhood AI:
When WITH (HEAP) is a good idea:
* Staging tables / transient loads (ELT pipelines)
* Some small tables / lookup-ish scenarios (though clustered index can also be a good fit)
When it’s usually a bad idea
* Large fact tables / “main warehouse tables” where you benefit from columnstore compression and scan performance (the default CCI is usually preferred).
For our use case I'm not sure it's a good idea from the query perspective. But then I have no insight on the VARCHAR(MAX) support or alternatives. So, your call
There was a problem hiding this comment.
I was doubting this as well. There is a significant trade-off here. After your review on this I checked with the PM: we decided to do it with HEAP now, so we can still support VARCHAR(MAX). The alternative is to use VARCHAR(8000), which is a relatively low limit.
This can still change. We'll include so we can test it, but won't enable it for customers. So we're not locking ourselves in either method at the moment. We'll verify this with CEs as well.
Thank you for pointing this out again 🙌 !
|
|
||
| def _build_insert_into_values_row_sql(self, values: VALUES_ROW) -> str: | ||
| values_sql: str = "SELECT " + ", ".join([self.literal(value) for value in values.values]) | ||
| values_sql = self.encode_string_for_sql(values_sql) |
There was a problem hiding this comment.
You mentioned issues with encoding \n in the PR description, but do we risk introducing different issues by removing this? (Also applies to the Fabric PR)
There was a problem hiding this comment.
It only applies to testing code and DWH IIRC.
The issue is tested specifically in DWH tests, where I've seen some issues with the strings containing \n or special quoting or...
Removing it actually solves it.



Description
Note: this builds on the Fabric PR. I've split them up so changes are more clear.
synapse_data_source.pysoda-synapse/.../synapse_data_source_test_helper.pyAdded drop_schema_if_exists override (Synapse doesn't support IF EXISTS / CASCADE on DROP SCHEMA)
What problem are you solving?
Any expected impact on downstream packages/services?
Reference issue # if available.
Checklist