ddl: use advertise-addr as the instance address #43957

Merged: ti-chi-bot[bot] merged 4 commits into pingcap:master from tangenta:add-index-fix-host on May 19, 2023

@tangenta tangenta commented May 18, 2023

What problem does this PR solve?

Issue Number: close #43983

Problem Summary:

When the DDL owner changes, the local checkpoint is mistakenly considered valid because the previous and current instance addresses are identical.

[checkpoint.go:289] ["[ddl-ingest] resume checkpoint"] ["job ID"=127] ["index ID"=30] ["local checkpoint"=74800000000000xxx] ["global checkpoint"=] ["previous instance"=0.0.0.0:4000:/data] ["current instance"=0.0.0.0:4000:/data]

What is changed and how it works?

  • Use "advertise-address" instead of the listen host to identify the TiDB instance.
  • Fix the issue that the checkpoint may not take effect on partitioned tables.
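
The first change can be illustrated with a minimal sketch. It assumes the instance identifier is the host, port, and local directory joined together, as the `0.0.0.0:4000:/data` values in the log above suggest. The function names, the example IPs, and the reuse rule are hypothetical illustrations and are not taken from this PR's diff.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// instanceID builds an identifier for a TiDB instance from a host, a port,
// and a local directory. Hypothetical helper, not the code from the PR.
func instanceID(host string, port uint, dir string) string {
	return net.JoinHostPort(host, strconv.Itoa(int(port))) + ":" + dir
}

// canReuseLocalCheckpoint reports whether a local checkpoint written by
// prevInstance may be resumed by curInstance (assumed rule: only when both
// identifiers are non-empty and equal).
func canReuseLocalCheckpoint(prevInstance, curInstance string) bool {
	return prevInstance != "" && prevInstance == curInstance
}

func main() {
	// Identified by listen host: two different nodes both bind 0.0.0.0,
	// so their identifiers collide and the stale checkpoint looks valid.
	oldOwner := instanceID("0.0.0.0", 4000, "/data")
	newOwner := instanceID("0.0.0.0", 4000, "/data")
	fmt.Println(canReuseLocalCheckpoint(oldOwner, newOwner)) // true (the bug)

	// Identified by advertise address: the nodes are told apart.
	oldOwner = instanceID("10.0.0.1", 4000, "/data")
	newOwner = instanceID("10.0.0.2", 4000, "/data")
	fmt.Println(canReuseLocalCheckpoint(oldOwner, newOwner)) // false
}
```

With the advertised address in the identifier, a new DDL owner on a different node no longer matches the previous instance, so under this assumed rule it would not resume from a local checkpoint it does not have.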

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot bot commented May 18, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Benjamin2037
  • wjhuang2016

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added the do-not-merge/needs-linked-issue, release-note-none (Denotes a PR that doesn't merit a release note.), and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) labels May 18, 2023
@tangenta tangenta changed the title ddl: get instance addr from cluster instead of config.Host ddl: get instance addr from cluster instead of config.Host May 18, 2023
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.1 (Should cherry pick this PR to release-7.1 branch.) label and removed the do-not-merge/needs-linked-issue label May 18, 2023
@tangenta tangenta changed the title ddl: get instance addr from cluster instead of config.Host ddl: use advertise-addr as the instance address May 18, 2023
@ti-chi-bot ti-chi-bot bot added the status/LGT1 (Indicates that a PR has LGTM 1.) label May 18, 2023

@Benjamin2037 Benjamin2037 left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added the status/LGT2 (Indicates that a PR has LGTM 2.) label and removed the status/LGT1 (Indicates that a PR has LGTM 1.) label May 19, 2023
@tangenta

/merge

ti-chi-bot bot commented May 19, 2023

This pull request has been accepted and is ready to merge.

Details

Commit hash: b045e7d

@ti-chi-bot ti-chi-bot bot added the status/can-merge (Indicates a PR has been approved by a committer.) label May 19, 2023
@ti-chi-bot ti-chi-bot bot merged commit 5ebad25 into pingcap:master May 19, 2023
@ti-chi-bot

In response to a cherrypick label: new pull request created to branch release-7.1: #43990.

tangenta added a commit to ti-chi-bot/tidb that referenced this pull request May 22, 2023
ti-chi-bot bot pushed a commit that referenced this pull request May 22, 2023
wjhuang2016 added a commit to wjhuang2016/tidb that referenced this pull request Jan 5, 2026
Add test cases for the following previously uncovered scenarios:

1. TestIngestOwnerTransferEmptyPartition (pingcap#44265): Tests owner transfer
   with empty partitions ensures checkpoint contains partition ID.

2. TestIngestPartitionCheckpointRecovery (pingcap#43997/pingcap#44024): Tests that
   checkpoint correctly saves partition info for recovery.

3. TestIngestConcurrentJobCleanupRace (pingcap#44137/pingcap#44140): Tests parallel
   add index jobs don't cause panic from cleanup race.

4. TestIngestCancelCleanupOrder (pingcap#43323/pingcap#43326): Tests cancel during
   execution doesn't cause nil pointer panic.

5. TestIngestGCSafepointBlocking (pingcap#40074/pingcap#40081): Tests add index
   uses correct TS and blocks GC safepoint advancement.

6. TestCheckpointInstanceAddrValidation (pingcap#43983/pingcap#43957): Tests
   checkpoint instance address validation works correctly.

7. TestCheckpointPhysicalIDValidation: Tests checkpoint physical
   table ID validation during recovery.

Co-Authored-By: Warp <agent@warp.dev>
wjhuang2016 added a commit to wjhuang2016/tidb that referenced this pull request Jan 5, 2026
Add tests to verify checkpoint validation in add index ingest mode:

- TestCheckpointInstanceAddrValidation: Verify instance address uses
  unique identifier (AdvertiseAddress + TempDir) instead of just
  host:port, covering pingcap#43983 and pingcap#43957

- TestCheckpointPhysicalIDValidation: Verify checkpoint physical_id
  is always a valid partition ID by parsing from reorg_meta JSON

- TestAddIndexWithEmptyPartitions: Verify all partitions (including
  empty ones) are correctly iterated during add index, using
  afterUpdatePartitionReorgInfo failpoint to capture each partition
  switch, covering pingcap#44265

Also clean up redundant/slow tests in realtikvtest:
- Remove TestIngestOwnerTransferEmptyPartition (51s, too slow)
- Remove TestIngestPartitionCheckpointRecovery (redundant)
- Simplify TestIngestCancelCleanupOrder

Issue Number: close pingcap#65420
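
The physical ID check described in this commit message can be sketched roughly as follows. The JSON shape of `reorg_meta` used here is a hypothetical simplification (the real TiDB structure may differ), and the helper names and partition IDs are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reorgMeta is a hypothetical, simplified shape for the reorg_meta JSON.
// The real structure in TiDB may use different fields and nesting.
type reorgMeta struct {
	Checkpoint struct {
		PhysicalID int64 `json:"physical_id"`
	} `json:"checkpoint"`
}

// checkpointPhysicalID extracts the physical_id recorded in the checkpoint.
func checkpointPhysicalID(raw []byte) (int64, error) {
	var m reorgMeta
	if err := json.Unmarshal(raw, &m); err != nil {
		return 0, err
	}
	return m.Checkpoint.PhysicalID, nil
}

// isValidPartitionID reports whether id is one of the table's partition IDs.
func isValidPartitionID(id int64, partitionIDs []int64) bool {
	for _, p := range partitionIDs {
		if p == id {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative values: a table with partitions 101..103 and a
	// checkpoint that recorded partition 102.
	raw := []byte(`{"checkpoint":{"physical_id":102}}`)
	id, err := checkpointPhysicalID(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(isValidPartitionID(id, []int64{101, 102, 103}))
}
```

A test along these lines would fail if the checkpoint stored something other than a partition ID, for example the logical table ID of a partitioned table.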
Labels

  • needs-cherry-pick-release-7.1: Should cherry pick this PR to release-7.1 branch.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Checkpoint is unexpectedly used when DDL owner is changed

4 participants