-
Notifications
You must be signed in to change notification settings - Fork 410
Closed
Labels
Description
Steps to reproduce: checkout this commit: JaySon-Huang@597dd86
I've added a failpoint exception_during_mpp_write_err_to_tunnel to mock this bug.
create table t (a int primary key clustered);
insert into test.t values (1),(2),(3),(4);
set tidb_enforce_mpp=1
-- 1. With no failpoint injected, running this SQL is OK, MPPTask can be released as expected
tidb> select * from (select * from t) a join (select * from t) b;
ch> select * from dt_tables where tidb_database='test' and tidb_table='t' \G
...
tidb_database: test
tidb_table: t
...
storage_stable_oldest_snapshot_lifetime: 0 <-- the snapshot will be released as expected
-- 2. With failpoint "exception_during_mpp_non_root_task_run" enabled
ch> DBGInvoke enable_fail_point(exception_during_mpp_non_root_task_run);
-- MPPTask can be released as expected too
tidb> select * from (select * from t) a join (select * from t) b;
ch> select * from dt_tables where tidb_database='test' and tidb_table='t' \G
...
tidb_database: test
tidb_table: t
...
storage_stable_oldest_snapshot_lifetime: 0 <-- the snapshot will be released as expected
-- 3. With failpoint "exception_during_mpp_non_root_task_run" and "exception_during_mpp_write_err_to_tunnel" enabled
ch> DBGInvoke enable_fail_point(exception_during_mpp_non_root_task_run);
ch> DBGInvoke enable_fail_point(exception_during_mpp_write_err_to_tunnel);
-- MPPTask can not be released, it will hold the resources of IStorage, and cause other trouble, such as:
-- TiFlash stucks after applying DDL operations
-- DeltaTree can not run GC on its delta data
tidb> select * from (select * from t) a join (select * from t) b;
ch> select * from dt_tables where tidb_database='test' and tidb_table='t' \G
...
tidb_database: test
tidb_table: t
...
storage_stable_oldest_snapshot_lifetime: 45.19153881 <-- the snapshot will not be released
...And we can see that ~MPPTask() is not called in case 3, there is no logging like "finish MPPTask", while case 1 and case 2 have.
JaySon-Huang@c4ba52f#diff-31a1170d1da0609fde7eb7068713665415014352587f2256fd15a29fe581b3caR302-R336
I guess these changes can resolve part of the problem, cause MPPTunnel::close won't throw exception like MPPTunnel::write when the tunnel get closed.
However, if other exceptions are thrown in MPPTunnel::close, then MPPTask still gets stuck and can not release resources of the Storage layer.
Reactions are currently unavailable