Skip to content

fix composite keys order#669

Merged
sfc-gh-jcieslak merged 12 commits into
mainfrom
jcieslak/fix-composite-keys-order
Apr 20, 2026
Merged

fix composite keys order#669
sfc-gh-jcieslak merged 12 commits into
mainfrom
jcieslak/fix-composite-keys-order

Conversation

@sfc-gh-jcieslak

@sfc-gh-jcieslak sfc-gh-jcieslak commented Mar 12, 2026

Copy link
Copy Markdown
Member
  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-922868: Preserve the sequence order of foreign keys when dealing with composite keys. #450
    References SNOW-922868 Preserve the sequence order of foreign keys when dealing with composite keys #449

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Adding sequence key to column queries so that in case of composite keys, they're being returned in order, which matters in cases like Alembic migrations (also following other dialects' behavior).

@sfc-gh-jcieslak sfc-gh-jcieslak marked this pull request as ready for review March 25, 2026 15:22
@sfc-gh-jcieslak sfc-gh-jcieslak requested a review from a team as a code owner March 25, 2026 15:22
@sfc-gh-jcieslak sfc-gh-jcieslak force-pushed the jcieslak/fix-composite-keys-order branch from ccf8dff to 7ae539b Compare March 25, 2026 15:23
Comment thread src/snowflake/sqlalchemy/snowdialect.py Outdated
]

# Sort referred columns by key sequence
v["referred_columns"] = [col for _, col in sorted(v["referred_columns"])]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few repetitions of this sorting.
Given the comment on lexicographic sort this could be replaced with helper

  @staticmethod
  def _sorted_by_key_sequence(pairs):
      """Sort (key_sequence, col_name) pairs and return col_names in order."""
      return [col for _, col in sorted(pairs, key=lambda x: int(x[0]))]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added helper

Comment thread src/snowflake/sqlalchemy/snowdialect.py Outdated
Comment on lines +450 to +452
constraint["column_names"] = [
col for _, col in sorted(constraint["column_names"])
]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

row._mapping["key_sequence"] is likely returned as a string or Decimal by the Snowflake
Python connector. Sorting tuples of (str, str) is lexicographic, which breaks for
sequences beyond 9:

  "1", "10", "2"  ← string sort (wrong)
    1,   2,  10  ← int sort (correct)

This is a latent bug that will only surface on tables with 10+ column composite keys —
uncommon, but definitely possible in data-warehouse schemas.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added NamedTuple which casts to int and helper works on this NamedTuple to ensure we are sorting on ints rather than strings

Comment thread src/snowflake/sqlalchemy/snowdialect.py Outdated

ans = defaultdict(list)
for constraint in unique_constraints.values():
# Sort constrained columns by key sequence

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are sorted by 'sequence' or 'column_names'?

@sfc-gh-jcieslak sfc-gh-jcieslak Mar 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted within helper to make it more readable; it should be sorted by key_sequence

Comment thread tests/test_keys.py Outdated
eq_(fks[0]["referred_columns"], ["col_a", "col_b", "col_c"])
eq_(fks[0]["referred_table"], "test_keys_fk_parent_decl")
finally:
child.drop(engine_testaccount)

@sfc-gh-asawicki sfc-gh-asawicki Apr 1, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what if child.drop fails (e.g. throwing the exception)? parent.drop wouldn't be executed them, right? SHouldn't we use metadata.drop_all(engine_testaccount) instead in all such tests (example:

metadata.drop_all(engine_testaccount)
)? (I see that it's also implemented like this in other tests, eg
addresses.drop(engine_testaccount)
)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread tests/test_keys.py Outdated
from sqlalchemy.testing.assertions import eq_


def test_composite_fk_reflects_key_order(engine_testaccount):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these tests are pretty repeatable (they share similar setup, execution, and validation), I could see them as parametrized tests (it would make it easier to add more cases and focus on use cases). There are multiple options to implement paramterized tests, so maybe we could discuss later on.

@sfc-gh-jcieslak sfc-gh-jcieslak force-pushed the jcieslak/fix-composite-keys-order branch from 31841ab to 589437d Compare April 15, 2026 12:32
Comment thread src/snowflake/sqlalchemy/snowdialect.py Outdated
@staticmethod
def _sort_columns_by_key_sequence(columns: list[_KeyedColumn]) -> list[str]:
"""Sort columns by key_sequence and return column names."""
columns.sort(key=lambda c: c.key_sequence)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.sort() mutates the caller's list. The method signature and docstring both look like a pure function that returns a transformed list, but it also has an invisible side effect. The callers are fine today because they immediately reassign the result but this could silently corrupt state if the list were ever passed from somewhere that also holds a reference. Use sorted() instead:

return [c.column_name for c in sorted(columns, key=lambda c: c.key_sequence)]

This is also more idiomatic Python for "return a sorted copy."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread tests/test_keys.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_composite_fk_reflects_key_order and test_composite_fk_when_parent_pk_order_differs_from_columns both test multi-column FK reflection. The second one adds a twist (PK column order differs from table column order), which is the most important edge case. Consider merging the two, or making it clear in the first test's name that it only covers the "same order" case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shortened by parametrizing

Comment thread src/snowflake/sqlalchemy/snowdialect.py Outdated
Comment on lines +81 to +87
class _KeyedColumn(NamedTuple):
key_sequence: int
column_name: str

@classmethod
def new(cls, key_sequence, column_name: str) -> "_KeyedColumn":
return cls(int(key_sequence), column_name)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to remove the factory entirely and use as

  _KeyedColumn(int(row._mapping["key_sequence"]), self.normalize_name(...))
  • The int() cast is now visible at the call site — it's explicit that the DB cursor returns a string and we're converting it. That's not something worth hiding.
  • Fewer moving parts. The factory saves no lines; every call site with .new(x, y) is the same length as (int(x), y).
  • The NamedTuple constructor signature already documents the types (key_sequence: int). A reader who sees int(row._mapping["key_sequence"]) immediately understands the intent.
  • The .new() name is also misleading — it looks like it might do validation or have richer logic, but it's just a coercion wrapper.

The only scenario where a factory would be justified is if key_sequence required non-trivial parsing (e.g., stripping units, handling nulls, mapping enums). A bare
int() cast doesn't clear that bar.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sfc-gh-jcieslak sfc-gh-jcieslak merged commit c1b7d72 into main Apr 20, 2026
63 checks passed
@sfc-gh-jcieslak sfc-gh-jcieslak deleted the jcieslak/fix-composite-keys-order branch April 20, 2026 10:00
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 20, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SNOW-922868: Preserve the sequence order of foreign keys when dealing with composite keys.

3 participants