Cross database joins#670
Conversation
sfc-gh-mraba
left a comment
There was a problem hiding this comment.
Pls consider pausing this change and applying it after #675 is reviewed and accepted.
After rebasing following steps should bring this solution to life - I'd be happy to jump in:
_split_schema_by_dotcan be dropped - already present in #675- Restore same-schema FK detection (the critical regression)
The root cause:_get_schema_foreign_keysnow receivesfull_schema_name("DB"."SCHEMA") as its schema parameter.normalize_nameon a composite string never equals a plainpk_schema, so the user's schema is never insame_schemas.
The right fix is to separate SQL identity (what goes into SHOW) from the FK comparison identity (what to compare normalized pk_schema against). The cleanest way without changing the
cached function's cache key:
# In get_foreign_keys / get_multi_foreign_keys — pass raw schema through
def get_foreign_keys(self, connection, table_name, schema=None, **kw):
schema = schema or self.default_schema_name
if self._is_single_table_reflection(schema, **kw):
return self._get_table_foreign_keys(connection, table_name, schema, **kw)
full_schema_name = self._get_full_schema_name(connection, schema, **kw)
# Pass both: full_schema_name for the SHOW SQL,
# schema (raw) for same-schema FK detection
return self._get_schema_foreign_keys(
connection, full_schema_name, user_schema=schema, **kw
).get(table_name, [])
@reflection.cache
def _get_schema_foreign_keys(self, connection, schema, user_schema=None, **kw):
current_database, current_schema = self._current_database_schema(connection, **kw)
same_schemas = {self.default_schema_name, current_schema}
# Add the user-specified schema (schema part only, not db prefix)
# so same-schema FKs within a non-default schema get referred_schema=None
if user_schema:
parts = self.identifier_preparer._split_schema_by_dot(user_schema)
schema_only = parts[-1] # last part is always the schema
if "." not in str(schema_only): # guard: don't add composite strings
same_schemas.add(self.normalize_name(str(schema_only)))
result = connection.execute(
text(f"SHOW /* sqlalchemy:_get_schema_foreign_keys */ IMPORTED KEYS IN SCHEMA {schema}")
)
return self._parse_fk_rows(result, same_schemas, current_database)Apply the same fix to _get_table_foreign_keys (which already receives the raw schema; it just needs to stop discarding it from same_schemas when the schema is cross-database — the
current PR #670 logic of if "." not in normalized_schema is correct there, just the composite-string-as-input problem is unique to the schema-wide path).
▎ Why this matters: the existing docstring in _get_schema_foreign_keys explicitly states that the reflected schema must be in same_schemas for non-default-schema reflection to work. PR
#670 breaks this invariant silently.
- Quote referred_schema in
_parse_fk_rows(small, defensive) - Remove double-parse in
_query_all_columns_info(optional but recommended) - Add one regression test
def test_get_foreign_keys_non_current_schema_same_db(engine_testaccount, ...):
"""FKs within a non-current schema should have referred_schema=None."""
# Create a schema that differs from current_schema
# Create two tables with an FK between them in that schema
# Reflect FKs, assert referred_schema is NoneThis would have caught the same_schemas regression.
152c502 to
2885da4
Compare
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-948287: Best practice for doing cross database joins using CORE #456
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Add the possibility to create cross-database joins by adjusting how the schema is parsed.