
Improve partition indexing resilience #968

Merged
rgalanakis merged 1 commit into main from partition-index-optimize
Jun 17, 2025

@rgalanakis (Contributor)

Indexing a partitioned table was problematic,
since it could fail and was not reliably idempotent.

This also cleans up invalid indexes that could be left behind
when migrating a non-partitioned table,
and in general should keep all indexes tidy.
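One way to find the invalid indexes mentioned here is the `pg_index.indisvalid` catalog flag, which is false for an index left behind by a failed `CREATE INDEX CONCURRENTLY`. The following is a minimal sketch, assuming Postgres 11 or later (where `pg_class.relkind` is `'I'` for a partitioned index and `'i'` for an ordinary one). The query constant, the `%s` placeholder style (a DB-API driver such as psycopg), and the `drop_invalid_statements` helper are hypothetical and do not come from the PR.

```python
# Hypothetical catalog query: lists every index defined directly on one table.
# For a partitioned table, run it for the parent and for each partition.
# Placeholder style assumes a DB-API driver such as psycopg.
INDEX_STATE_SQL = """
SELECT c.relname, i.indisvalid, c.relkind
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = %s::regclass
"""


def drop_invalid_statements(rows):
    """Hypothetical helper: turn (relname, indisvalid, relkind) rows into DROP statements.

    Index names are assumed to be plain identifiers; real code should quote them.
    """
    stmts = []
    for relname, indisvalid, relkind in rows:
        if indisvalid:
            continue  # healthy index, keep it
        if relkind == "I":
            # Partitioned (parent) index: DROP INDEX CONCURRENTLY is not allowed here.
            stmts.append(f"DROP INDEX IF EXISTS {relname}")
        else:
            # Ordinary index: drop it without locking out concurrent reads and writes.
            # Like CREATE INDEX CONCURRENTLY, this must run outside a transaction block.
            stmts.append(f"DROP INDEX CONCURRENTLY IF EXISTS {relname}")
    return stmts
```

Only invalid rows produce statements, so running the cleanup repeatedly is harmless.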

From the docs:

For partitioned tables, we need to be careful with how we create the indexes. If the overall function gets interrupted (the process gets killed, etc.), we can be left with an invalid index.
We can minimize this possibility by doing the following:

  • Create an index for each partition, if it does not exist already.
    • This is done outside a transaction, since the indexes are built concurrently, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
  • Then we open a transaction for the final steps, which are all very fast/metadata-only.
    • A transaction here ensures that the 'parent' index only ever exists in a successful, completed state.
  • Create the 'parent' index on the parent table only (CREATE INDEX ... ON ONLY), without recursing into the partitions.
    • This is a very fast operation.
  • Attach all the partition indexes to the parent index. This is metadata-only, so it is also very fast.
    • At this point, the parent index should be valid: Postgres marks it valid once every partition has an attached index.
  • These steps mean that the process can be interrupted at any point and resumed without losing progress:
    • The concurrent index creation for a partition can fail and leave an invalid index, but the next call to update the schema will drop invalid indexes for the table. Note that partition indexes that were created successfully but not yet attached are valid.
    • Creating the parent index and attaching the partitions to it happen in one transaction, so they are atomic.
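The ordering above can be sketched as a plan of SQL statements, split into the slow per-partition work that runs outside any transaction and the fast metadata work that runs inside one. This is a minimal sketch under stated assumptions, not the PR's implementation: `plan_partitioned_index`, `IndexPlan`, the `<table>_<columns>_idx` naming scheme, and passing in already-known invalid index names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexPlan:
    # Send each of these on its own in autocommit mode:
    # CREATE/DROP INDEX CONCURRENTLY cannot run inside a transaction block.
    outside_tx: list[str] = field(default_factory=list)
    # Send these together inside one BEGIN ... COMMIT; all are metadata-only.
    in_tx: list[str] = field(default_factory=list)


def plan_partitioned_index(parent, partitions, columns, invalid_indexes=()):
    """Hypothetical helper: build the SQL for indexing a partitioned table."""
    cols = ", ".join(columns)
    suffix = "_".join(columns) + "_idx"  # hypothetical naming scheme
    plan = IndexPlan()

    # 1. Drop invalid partition indexes left by an interrupted run, so the
    #    IF NOT EXISTS below does not skip over a broken index of the same name.
    #    (Assumed to be leaf indexes; a partitioned index can't be dropped CONCURRENTLY.)
    for name in invalid_indexes:
        plan.outside_tx.append(f"DROP INDEX CONCURRENTLY IF EXISTS {name}")

    # 2. Build each partition's index concurrently; already-built ones are skipped.
    child_indexes = []
    for part in partitions:
        name = f"{part}_{suffix}"
        child_indexes.append(name)
        plan.outside_tx.append(
            f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} ON {part} ({cols})"
        )

    # 3. In one transaction: create the parent index on the parent only
    #    (it starts out invalid), then attach every partition index to it.
    parent_index = f"{parent}_{suffix}"
    plan.in_tx.append(
        f"CREATE INDEX IF NOT EXISTS {parent_index} ON ONLY {parent} ({cols})"
    )
    for name in child_indexes:
        plan.in_tx.append(f"ALTER INDEX {parent_index} ATTACH PARTITION {name}")
    return plan
```

For example, `plan_partitioned_index("events", ["events_p0", "events_p1"], ["created_at"])` yields two concurrent `CREATE INDEX` statements and three transactional statements. Splitting the plan this way mirrors the reasoning above: the slow work can resume partition by partition, and the parent-index work is all-or-nothing.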

rgalanakis force-pushed the partition-index-optimize branch from c6b0fb5 to 374b567 on June 17, 2025 15:21
codecov bot commented Jun 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.58%. Comparing base (b4f0daf) to head (374b567).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #968      +/-   ##
==========================================
- Coverage   97.62%   97.58%   -0.04%     
==========================================
  Files         490      490              
  Lines       31098    31117      +19     
==========================================
+ Hits        30358    30365       +7     
- Misses        740      752      +12     
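As a sanity check on the report's arithmetic, coverage is hits divided by coverable lines (hits plus misses). This small sketch recomputes both percentages from the table above, assuming two-decimal rounding (an assumption; truncating gives the same figures for these numbers). The `coverage_pct` helper is hypothetical.

```python
def coverage_pct(hits: int, misses: int) -> float:
    # Coverable lines = hits + misses; coverage is the share that was hit.
    lines = hits + misses
    return round(100 * hits / lines, 2)


base = coverage_pct(30358, 740)  # main
head = coverage_pct(30365, 752)  # PR head
print(base, head, round(head - base, 2))
```

The 12 new misses outweigh the 7 new hits among the 19 added lines, which is why coverage dips slightly.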


rgalanakis merged commit 6bf71f9 into main on June 17, 2025
4 checks passed
rgalanakis deleted the partition-index-optimize branch on June 17, 2025 15:29