Skip to content

ddl: return early when encounter error in BackendCtx.Unregister() (#43326)#43330

Merged
ti-chi-bot[bot] merged 2 commits intopingcap:release-7.1from
ti-chi-bot:cherry-pick-43326-to-release-7.1
Jul 12, 2023
Merged

ddl: return early when encounter error in BackendCtx.Unregister() (#43326)#43330
ti-chi-bot[bot] merged 2 commits intopingcap:release-7.1from
ti-chi-bot:cherry-pick-43326-to-release-7.1

Conversation

@ti-chi-bot
Copy link
Member

This is an automated cherry-pick of #43326

What problem does this PR solve?

Issue Number: close #43323

Problem Summary:

[2023/04/23 15:41:46.619 +08:00] [WARN] [backfilling.go:496] ["[ddl] backfill worker failed"] ["job ID"=95] ["result next key"=] [error="[ddl:8214]Cancelled DDL job"]
[2023/04/23 15:41:46.863 +08:00] [WARN] [backfilling.go:450] ["[ddl] backfill worker handle tasks failed"] ["total added count"=18078289] ["start key"=74800000000000005a5f728000000000000001] ["task failed error"="[ddl:8214]Cancelled DDL job"]
[2023/04/23 15:41:46.863 +08:00] [WARN] [index.go:1015] ["[ddl] run add index job failed, convert job to rollback"] [job="ID:95, Type:add index, State:cancelling, SchemaState:write reorganization, SchemaID:88, TableID:90, RowCount:13332166, ArgLen:6, start time: 2023-04-23 15:41:25.554 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:440987991671832583, UniqueWarnings:0"] [error="[ddl:8214]Cancelled DDL job"]
[2023/04/23 15:41:46.899 +08:00] [INFO] [backend.go:305] ["engine close start"] [engineTag=sbtest1:3] [engineUUID=462b4eef-7a5c-5d2f-b4d3-35fd1b503f75]
[2023/04/23 15:41:46.899 +08:00] [ERROR] [backend.go:307] ["engine close failed"] [engineTag=sbtest1:3] [engineUUID=462b4eef-7a5c-5d2f-b4d3-35fd1b503f75] [takeTime=77.083µs] [error="open /tmp/tidb/tmp_ddl-4000/95/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/ff91b8bd-8af7-4f4b-88ca-e64ed4f78189.sst: no such file or directory"]
[2023/04/23 15:41:46.899 +08:00] [ERROR] [engine.go:110] ["[ddl-ingest] close engine error"] [error="open /tmp/tidb/tmp_ddl-4000/95/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/ff91b8bd-8af7-4f4b-88ca-e64ed4f78189.sst: no such file or directory"] ["job ID"=95] ["index ID"=3]
[2023/04/23 15:41:46.899 +08:00] [ERROR] [engine.go:116] ["[ddl-ingest] close writer error"] [error="open /tmp/tidb/tmp_ddl-4000/95/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/e8e8fc52-3e40-4a30-b883-ad23ae05d67f.sst: no such file or directory"] ["job ID"=95] ["index ID"=3]
[2023/04/23 15:41:46.904 +08:00] [ERROR] [misc.go:116] ["panic in the recoverable goroutine"] [label=ddl-worker] [funcInfo="worker 5, tp add index runDDLJob"] [r="\"invalid memory address or nil pointer dereference\""] [stack="github.com/pingcap/tidb/util.Recover\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/util/misc.go:120\nruntime.gopanic\n\t/usr/local/go1.20.3/src/runtime/panic.go:884\nruntime.panicmem\n\t/usr/local/go1.20.3/src/runtime/panic.go:260\nruntime.sigpanic\n\t/usr/local/go1.20.3/src/runtime/signal_unix.go:837\ngithub.com/pingcap/tidb/br/pkg/lightning/backend.(*ClosedEngine).Cleanup\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/lightning/backend/backend.go:362\ngithub.com/pingcap/tidb/ddl/ingest.(*engineInfo).Clean\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine.go:120\ngithub.com/pingcap/tidb/ddl/ingest.(*litBackendCtx).Unregister\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine_mgr.go:86\ngithub.com/pingcap/tidb/ddl/ingest.(*litBackendCtx).unregisterAll\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/engine_mgr.go:110\ngithub.com/pingcap/tidb/ddl/ingest.(*litBackendCtxMgr).Unregister\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ingest/backend_mgr.go:151\ngithub.com/pingcap/tidb/ddl.runIngestReorgJob\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:926\ngithub.com/pingcap/tidb/ddl.doReorgWorkForCreateIndex\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:866\ngithub.com/pingcap/tidb/ddl.(*worker).onCreateIndex\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/index.go:683\ngithub.com/pingcap/tidb/ddl.rollingbackAddIndex\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/rollingback.go:250\ngithub.com/pingcap/tidb/ddl.convertJob2RollbackJob\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/rollingback.go:393\ngithub.com/pingcap/tidb/ddl.(*worker).runDDLJob\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:960\ngithub.com/pingcap/tidb/ddl.(*worker).HandleDDLJobTable\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_worker.go:752\ngithub.com/pingcap/tidb/ddl.(*ddl).delivery2worker.func1\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/job_table.go:281\ngithub.com/pingcap/tidb/util.(*WaitGroupWrapper).Run.func1\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/util/wait_group_wrapper.go:154"]
[2023/04/23 15:41:46.904 +08:00] [INFO] [ddl_worker.go:580] ["[ddl] finish DDL job"] [worker="worker 5, tp add index"] [job="ID:95, Type:add index, State:cancelled, SchemaState:delete only, SchemaID:88, TableID:90, RowCount:13332166, ArgLen:3, start time: 2023-04-23 15:41:25.554 +0800 CST, Err:<nil>, ErrCount:1, SnapshotVersion:440987991671832583, UniqueWarnings:0"]
[2023/04/23 15:41:46.909 +08:00] [ERROR] [conn.go:1039] ["connection running loop panic"] [conn=5172439147616403883] [lastSQL="alter table sbtest1 add index idx(c, pad)"] [err="When the state is JobStateRollbackDone or JobStateCancelled, historyJob.Error should never be nil"] [stack="github.com/pingcap/tidb/server.(*clientConn).Run.func1\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1042\nruntime.gopanic\n\t/usr/local/go1.20.3/src/runtime/panic.go:884\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec.func1\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:459\nruntime.gopanic\n\t/usr/local/go1.20.3/src/runtime/panic.go:884\ngithub.com/pingcap/tidb/ddl.(*ddl).DoDDLJob\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl.go:1153\ngithub.com/pingcap/tidb/ddl.(*ddl).createIndex\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:6793\ngithub.com/pingcap/tidb/ddl.(*ddl).AlterTable\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/ddl/ddl_api.go:3410\ngithub.com/pingcap/tidb/executor.(*DDLExec).executeAlterTable\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/ddl.go:384\ngithub.com/pingcap/tidb/executor.(*DDLExec).Next\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/ddl.go:148\ngithub.com/pingcap/tidb/executor.Next\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/executor.go:326\ngithub.com/pingcap/tidb/executor.(*ExecStmt).next\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:1194\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:939\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:764\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:564\ngithub.com/pingcap/tidb/session.runStmt\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2348\ngithub.com/pingcap/tidb/session.(*session).ExecuteStmt\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2205\ngithub.com/pingcap/tidb/server.(*TiDBContext).ExecuteStmt\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/server/driver_tidb.go:252\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmt\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2061\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1852\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1339\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1120\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/Users/pingcap/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:677"]
  1. When the job is canceled, the ddl worker converts the job to a corresponding DROP INDEX as a rollback job.
  2. In convertAddIdxJob2RollbackJob , cleanupLocalIndexData() is invoked to remove the lightning sort path.
  3. In BackendCtx.Unregister(), TiDB fails to close the engine because the sort path is removed.
  4. No error is returned despite the failure of closing.
  5. "Nil pointer dereference" happens, and the DDL worker panic.
  6. The index state stays in delete-only, however the job is state cancelled.

What is changed and how it works?

To address step 2 and step 4, this PR

  • removes cleanupLocalIndexData() because the sort path removing is considered by Unregister().
  • returns early when it encounters an error in BackendCtx.Unregister()

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    [2023/04/23 17:25:27.259 +08:00] [WARN] [backfilling.go:496] ["[ddl] backfill worker failed"] ["job ID"=94] ["result next key"=] [error="[ddl:8214]Cancelled DDL job"]
    [2023/04/23 17:25:27.259 +08:00] [WARN] [backfilling.go:450] ["[ddl] backfill worker handle tasks failed"] ["total added count"=10271756] ["start key"=74800000000000005a5f728000000000000001] ["task failed error"="[ddl:8214]Cancelled DDL job"]
    [2023/04/23 17:25:27.259 +08:00] [WARN] [index.go:1015] ["[ddl] run add index job failed, convert job to rollback"] [job="ID:94, Type:add index, State:cancelling, SchemaState:write reorganization, SchemaID:88, TableID:90, RowCount:5199243, ArgLen:6, start time: 2023-04-23 17:25:15.49 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:440989624812175364, UniqueWarnings:0"] [error="[ddl:8214]Cancelled DDL job"]
    [2023/04/23 17:25:27.266 +08:00] [INFO] [backend.go:305] ["engine close start"] [engineTag=sbtest1:3] [engineUUID=462b4eef-7a5c-5d2f-b4d3-35fd1b503f75]
    [2023/04/23 17:25:28.587 +08:00] [INFO] [engine.go:1485] ["compact sst"] [fileCount=16] [size=1073745568] [count=4511536] [cost=7.5140025s] [file=/tmp/tidb/tmp_ddl-4000/94/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/e656c8a7-e075-4c2b-98ca-2b80e14ec8b6.sst]
    [2023/04/23 17:25:28.587 +08:00] [INFO] [engine.go:811] ["write data to local DB"] [size=1073745568] [kvs=4511536] [files=1] [sstFileSize=861974666] [file=/tmp/tidb/tmp_ddl-4000/94/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/e656c8a7-e075-4c2b-98ca-2b80e14ec8b6.sst] [firstKey=74800000000000005A5F698000000000000003013030303030303439FF3235332D35313830FF353831353132362DFF3230323036383330FF3231372D36343337FF343533333530312DFF3730383239383435FF3036382D32353532FF353637393233342DFF3637303031373138FF3830332D34333132FF333338383238382DFF3535353932333934FF3139312D31353438FF3132323739393200FE013738313433353036FF3332322D35353335FF333632393233312DFF3135313439393532FF3935312D35393438FF383138373230302DFF3332383433383333FF3336310000000000FA0380000000001B2258] [lastKey=74800000000000005A5F698000000000000003013939393939393833FF3538382D31363733FF353037363638372DFF3337333430363434FF3037352D30393134FF323032313832382DFF3736373733353437FF3035342D34393835FF323138343932332DFF3733313632303332FF3630362D34303434FF303936343431342DFF3037383436383536FF3735392D31373536FF3537323633323300FE013737353233343039FF3431312D35323731FF383337303230302DFF3238313434393330FF3434312D35343833FF303236373933372DFF3436353030343038FF3131340000000000FA038000000000202585]
    [2023/04/23 17:25:28.823 +08:00] [INFO] [engine.go:1485] ["compact sst"] [fileCount=5] [size=191948190] [count=806505] [cost=1.12589575s] [file=/tmp/tidb/tmp_ddl-4000/94/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/a71b3d29-c09c-48b9-bff7-b596fd31bdd3.sst]
    [2023/04/23 17:25:28.823 +08:00] [INFO] [engine.go:811] ["write data to local DB"] [size=191948190] [kvs=806505] [files=1] [sstFileSize=154764907] [file=/tmp/tidb/tmp_ddl-4000/94/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/a71b3d29-c09c-48b9-bff7-b596fd31bdd3.sst] [firstKey=74800000000000005A5F698000000000000003013030303030303132FF3137352D39343639FF353033353939362DFF3132383834333238FF3230342D35373839FF383538333031312DFF3236343135353336FF3138372D39383539FF333136393438352DFF3235363330363631FF3831332D38333637FF323736303836322DFF3437373334383931FF3831322D38323235FF3035383130313400FE013030393035303733FF3339322D33303431FF393638383532332DFF3230363230313237FF3036382D38363730FF303938353435332DFF3036313030303431FF3433380000000000FA03800000000091FBAD] [lastKey=74800000000000005A5F698000000000000003013939393939393930FF3733342D33333332FF363932363031382DFF3137323039303637FF3631302D35373936FF303439353934352DFF3530323037353833FF3435322D30383632FF363031343830352DFF3430373734353439FF3437352D31353331FF383639343332332DFF3837313539383731FF3331342D35313638FF3634383736373000FE013031353732333037FF3533362D38383233FF363232343637352DFF3438353334343735FF3036352D38393932FF393939303037382DFF3631393639383038FF3132310000000000FA0380000000008CFBDE]
    [2023/04/23 17:25:31.157 +08:00] [INFO] [engine.go:1485] ["compact sst"] [fileCount=16] [size=1073745568] [count=4511536] [cost=5.827834334s] [file=/tmp/tidb/tmp_ddl-4000/94/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/cd612571-de3b-4c62-bcca-122827de333d.sst]
    [2023/04/23 17:25:31.157 +08:00] [INFO] [engine.go:811] ["write data to local DB"] [size=1073745568] [kvs=4511536] [files=1] [sstFileSize=861966051] [file=/tmp/tidb/tmp_ddl-4000/94/462b4eef-7a5c-5d2f-b4d3-35fd1b503f75.sst/cd612571-de3b-4c62-bcca-122827de333d.sst] [firstKey=74800000000000005A5F698000000000000003013030303030303435FF3131382D35343633FF363237333636312DFF3333353136333333FF3331392D34343538FF333236373334342DFF3332373736383830FF3937352D38333638FF323538323139352DFF3334343736333334FF3032302D35313637FF383537373935362DFF3838363035363932FF3635342D37343732FF3632373637393500FE013530393731303839FF3237382D38323837FF363330353539382DFF3839343130333431FF3032332D30313339FF343238353038312DFF3236323731353030FF3130340000000000FA03800000000059A701] [lastKey=74800000000000005A5F698000000000000003013939393939393733FF3830372D34363738FF393336383931382DFF3936323130393938FF3533352D36353537FF363033393432312DFF3634393735393835FF3239392D39303238FF393737353039352DFF3730393832303835FF3439322D39323433FF313839383033382DFF3039373139383331FF3030342D37333134FF3433353832373000FE013230393332313430FF3835312D30373934FF303133353035322DFF3731363835333637FF3137392D38363031FF323536383432312DFF3037353331343734FF3038310000000000FA0380000000008018B4]
    [2023/04/23 17:25:31.257 +08:00] [INFO] [backend.go:307] ["engine close completed"] [engineTag=sbtest1:3] [engineUUID=462b4eef-7a5c-5d2f-b4d3-35fd1b503f75] [takeTime=3.9908425s] []
    [2023/04/23 17:25:31.257 +08:00] [INFO] [backend.go:362] ["cleanup start"] [engineTag=sbtest1:3] [engineUUID=462b4eef-7a5c-5d2f-b4d3-35fd1b503f75]
    [2023/04/23 17:25:31.327 +08:00] [INFO] [backend.go:364] ["cleanup completed"] [engineTag=sbtest1:3] [engineUUID=462b4eef-7a5c-5d2f-b4d3-35fd1b503f75] [takeTime=69.791833ms] []
    
    And DDL cancels the ADD INDEX job successfully.
    mysql> alter table sbtest1 add index idx(c, pad);
    ERROR 8214 (HY000): Cancelled DDL job
    
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Copy link

ti-chi-bot bot commented Apr 23, 2023

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. type/cherry-pick-for-release-7.1 This PR is cherry-picked to release-7.1 from a source PR. labels Apr 23, 2023
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/cherry-pick-not-approved labels Apr 23, 2023
@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 25, 2023
@ti-chi-bot ti-chi-bot bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 12, 2023
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jul 12, 2023
@ti-chi-bot ti-chi-bot added the cherry-pick-approved Cherry pick PR approved by release team. label Jul 12, 2023
@ti-chi-bot ti-chi-bot bot added the lgtm label Jul 12, 2023
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jul 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hawkingrei, zimulala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [hawkingrei,zimulala]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jul 12, 2023
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jul 12, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-07-12 03:28:51.404789126 +0000 UTC m=+475423.175127838: ☑️ agreed by zimulala.
  • 2023-07-12 04:11:02.14332763 +0000 UTC m=+477953.913666340: ☑️ agreed by hawkingrei.

@ti-chi-bot ti-chi-bot bot merged commit 385424e into pingcap:release-7.1 Jul 12, 2023
@ti-chi-bot ti-chi-bot added cherry-pick-approved Cherry pick PR approved by release team. and removed cherry-pick-approved Cherry pick PR approved by release team. labels Jul 24, 2023
@ti-chi-bot ti-chi-bot deleted the cherry-pick-43326-to-release-7.1 branch April 15, 2025 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved cherry-pick-approved Cherry pick PR approved by release team. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. type/cherry-pick-for-release-7.1 This PR is cherry-picked to release-7.1 from a source PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants