fix composite keys order#669
Conversation
ccf8dff to
7ae539b
Compare
| ] | ||
|
|
||
| # Sort referred columns by key sequence | ||
| v["referred_columns"] = [col for _, col in sorted(v["referred_columns"])] |
There was a problem hiding this comment.
There are a few repetitions of this sorting.
Given the comment on lexicographic sort this could be replaced with helper
@staticmethod
def _sorted_by_key_sequence(pairs):
"""Sort (key_sequence, col_name) pairs and return col_names in order."""
return [col for _, col in sorted(pairs, key=lambda x: int(x[0]))]There was a problem hiding this comment.
Good point, added helper
| constraint["column_names"] = [ | ||
| col for _, col in sorted(constraint["column_names"]) | ||
| ] |
There was a problem hiding this comment.
row._mapping["key_sequence"] is likely returned as a string or Decimal by the Snowflake
Python connector. Sorting tuples of (str, str) is lexicographic, which breaks for
sequences beyond 9:
"1", "10", "2" ← string sort (wrong)
1, 2, 10 ← int sort (correct)
This is a latent bug that will only surface on tables with 10+ column composite keys —
uncommon, but definitely possible in data-warehouse schemas.
There was a problem hiding this comment.
Good point, added NamedTuple which casts to int and helper works on this NamedTuple to ensure we are sorting on ints rather than strings
|
|
||
| ans = defaultdict(list) | ||
| for constraint in unique_constraints.values(): | ||
| # Sort constrained columns by key sequence |
There was a problem hiding this comment.
These are sorted by 'sequence' or 'column_names'?
There was a problem hiding this comment.
Adjusted within helper to make it more readable; it should be sorted by key_sequence
| eq_(fks[0]["referred_columns"], ["col_a", "col_b", "col_c"]) | ||
| eq_(fks[0]["referred_table"], "test_keys_fk_parent_decl") | ||
| finally: | ||
| child.drop(engine_testaccount) |
There was a problem hiding this comment.
question: what if child.drop fails (e.g. throwing the exception)? parent.drop wouldn't be executed them, right? SHouldn't we use metadata.drop_all(engine_testaccount) instead in all such tests (example:
snowflake-sqlalchemy/tests/test_core.py
Line 559 in fc2e8fa
snowflake-sqlalchemy/tests/test_core.py
Line 326 in fc2e8fa
| from sqlalchemy.testing.assertions import eq_ | ||
|
|
||
|
|
||
| def test_composite_fk_reflects_key_order(engine_testaccount): |
There was a problem hiding this comment.
nit: these tests are pretty repeatable (they share similar setup, execution, and validation), I could see them as parametrized tests (it would make it easier to add more cases and focus on use cases). There are multiple options to implement paramterized tests, so maybe we could discuss later on.
31841ab to
589437d
Compare
| @staticmethod | ||
| def _sort_columns_by_key_sequence(columns: list[_KeyedColumn]) -> list[str]: | ||
| """Sort columns by key_sequence and return column names.""" | ||
| columns.sort(key=lambda c: c.key_sequence) |
There was a problem hiding this comment.
.sort() mutates the caller's list. The method signature and docstring both look like a pure function that returns a transformed list, but it also has an invisible side effect. The callers are fine today because they immediately reassign the result but this could silently corrupt state if the list were ever passed from somewhere that also holds a reference. Use sorted() instead:
return [c.column_name for c in sorted(columns, key=lambda c: c.key_sequence)]This is also more idiomatic Python for "return a sorted copy."
There was a problem hiding this comment.
test_composite_fk_reflects_key_order and test_composite_fk_when_parent_pk_order_differs_from_columns both test multi-column FK reflection. The second one adds a twist (PK column order differs from table column order), which is the most important edge case. Consider merging the two, or making it clear in the first test's name that it only covers the "same order" case.
There was a problem hiding this comment.
Shortened by parametrizing
| class _KeyedColumn(NamedTuple): | ||
| key_sequence: int | ||
| column_name: str | ||
|
|
||
| @classmethod | ||
| def new(cls, key_sequence, column_name: str) -> "_KeyedColumn": | ||
| return cls(int(key_sequence), column_name) |
There was a problem hiding this comment.
I suggest to remove the factory entirely and use as
_KeyedColumn(int(row._mapping["key_sequence"]), self.normalize_name(...))- The int() cast is now visible at the call site — it's explicit that the DB cursor returns a string and we're converting it. That's not something worth hiding.
- Fewer moving parts. The factory saves no lines; every call site with
.new(x, y)is the same length as(int(x), y). - The
NamedTupleconstructor signature already documents the types(key_sequence: int). A reader who seesint(row._mapping["key_sequence"])immediately understands the intent. - The
.new()name is also misleading — it looks like it might do validation or have richer logic, but it's just a coercion wrapper.
The only scenario where a factory would be justified is if key_sequence required non-trivial parsing (e.g., stripping units, handling nulls, mapping enums). A bare
int() cast doesn't clear that bar.
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-922868: Preserve the sequence order of foreign keys when dealing with composite keys. #450
References SNOW-922868 Preserve the sequence order of foreign keys when dealing with composite keys #449
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Adding sequence key to column queries so that in case of composite keys, they're being returned in order, which matters in cases like Alembic migrations (also following other dialects' behavior).