Skip to content

Cross database joins#670

Merged
sfc-gh-jcieslak merged 14 commits into
mainfrom
jcieslak/cross-database-joins
Apr 28, 2026
Merged

Cross database joins#670
sfc-gh-jcieslak merged 14 commits into
mainfrom
jcieslak/cross-database-joins

Conversation

@sfc-gh-jcieslak

Copy link
Copy Markdown
Member

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-948287: Best practice for doing cross database joins using CORE #456

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Add the possibility to create cross-database joins by adjusting how the schema is parsed.

@sfc-gh-jcieslak sfc-gh-jcieslak requested a review from a team as a code owner April 15, 2026 12:43

@sfc-gh-mraba sfc-gh-mraba left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls consider pausing this change and applying it after #675 is reviewed and accepted.
After rebasing following steps should bring this solution to life - I'd be happy to jump in:

  • _split_schema_by_dot can be dropped - already present in #675
  • Restore same-schema FK detection (the critical regression)
    The root cause: _get_schema_foreign_keys now receives full_schema_name ("DB"."SCHEMA") as its schema parameter. normalize_name on a composite string never equals a plain pk_schema, so the user's schema is never in same_schemas.

The right fix is to separate SQL identity (what goes into SHOW) from the FK comparison identity (what to compare normalized pk_schema against). The cleanest way without changing the
cached function's cache key:

  # In get_foreign_keys / get_multi_foreign_keys — pass raw schema through
  def get_foreign_keys(self, connection, table_name, schema=None, **kw):
      schema = schema or self.default_schema_name
      if self._is_single_table_reflection(schema, **kw):
          return self._get_table_foreign_keys(connection, table_name, schema, **kw)
      full_schema_name = self._get_full_schema_name(connection, schema, **kw)
      # Pass both: full_schema_name for the SHOW SQL,
      # schema (raw) for same-schema FK detection
      return self._get_schema_foreign_keys(
          connection, full_schema_name, user_schema=schema, **kw
      ).get(table_name, [])

  @reflection.cache
  def _get_schema_foreign_keys(self, connection, schema, user_schema=None, **kw):
      current_database, current_schema = self._current_database_schema(connection, **kw)
      same_schemas = {self.default_schema_name, current_schema}

      # Add the user-specified schema (schema part only, not db prefix)
      # so same-schema FKs within a non-default schema get referred_schema=None
      if user_schema:
          parts = self.identifier_preparer._split_schema_by_dot(user_schema)
          schema_only = parts[-1]          # last part is always the schema
          if "." not in str(schema_only):  # guard: don't add composite strings
              same_schemas.add(self.normalize_name(str(schema_only)))

      result = connection.execute(
          text(f"SHOW /* sqlalchemy:_get_schema_foreign_keys */ IMPORTED KEYS IN SCHEMA {schema}")
      )
      return self._parse_fk_rows(result, same_schemas, current_database)

Apply the same fix to _get_table_foreign_keys (which already receives the raw schema; it just needs to stop discarding it from same_schemas when the schema is cross-database — the
current PR #670 logic of if "." not in normalized_schema is correct there, just the composite-string-as-input problem is unique to the schema-wide path).

▎ Why this matters: the existing docstring in _get_schema_foreign_keys explicitly states that the reflected schema must be in same_schemas for non-default-schema reflection to work. PR
#670 breaks this invariant silently.

  • Quote referred_schema in _parse_fk_rows (small, defensive)
  • Remove double-parse in _query_all_columns_info (optional but recommended)
  • Add one regression test
  def test_get_foreign_keys_non_current_schema_same_db(engine_testaccount, ...):
      """FKs within a non-current schema should have referred_schema=None."""
      # Create a schema that differs from current_schema
      # Create two tables with an FK between them in that schema
      # Reflect FKs, assert referred_schema is None

This would have caught the same_schemas regression.

@sfc-gh-mraba sfc-gh-mraba force-pushed the jcieslak/cross-database-joins branch from 152c502 to 2885da4 Compare April 28, 2026 12:04
@sfc-gh-jcieslak sfc-gh-jcieslak merged commit 1044177 into main Apr 28, 2026
121 of 122 checks passed
@sfc-gh-jcieslak sfc-gh-jcieslak deleted the jcieslak/cross-database-joins branch April 28, 2026 14:04
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 28, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SNOW-948287: Best practice for doing cross database joins using CORE

2 participants