@vinlee19 (Contributor) commented Sep 13, 2025

What problem does this PR solve?

Following PR #55404, which fixed incorrect schema results for Paimon tables, this PR further optimizes the Paimon time travel implementation. It removes redundant code and fixes a critical issue: the schema ID and snapshot ID were fetched separately, which caused consistency problems and unnecessary I/O overhead.

Key Problem Solved

The core challenge in Paimon time travel is efficiently constructing a table for a specified snapshot ID and schema ID. Previously, the two IDs were fetched in separate calls, which added unnecessary I/O and could yield a schema ID that did not match the snapshot.
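The idea of resolving both IDs in one lookup can be illustrated with a minimal sketch. Everything here is hypothetical: `SnapshotRef` and `InMemorySnapshotLog` are illustration-only names, not Doris or Paimon classes, and the sketch assumes each committed snapshot records the schema ID it was written with.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical value type: a snapshot ID paired with the schema ID it was written with.
final class SnapshotRef {
    final long snapshotId;
    final long schemaId;

    SnapshotRef(long snapshotId, long schemaId) {
        this.snapshotId = snapshotId;
        this.schemaId = schemaId;
    }
}

// Hypothetical in-memory stand-in for a table's snapshot metadata, keyed by commit time.
final class InMemorySnapshotLog {
    private final NavigableMap<Long, SnapshotRef> byCommitMillis = new ConcurrentSkipListMap<>();

    void commit(long commitMillis, long snapshotId, long schemaId) {
        byCommitMillis.put(commitMillis, new SnapshotRef(snapshotId, schemaId));
    }

    // A single lookup returns the latest snapshot committed at or before the
    // timestamp together with its schema ID, so the two can never disagree.
    Optional<SnapshotRef> asOf(long timestampMillis) {
        Map.Entry<Long, SnapshotRef> entry = byCommitMillis.floorEntry(timestampMillis);
        return Optional.ofNullable(entry).map(Map.Entry::getValue);
    }
}
```

Because the caller receives one immutable `SnapshotRef`, there is no window between two separate reads in which the schema could change.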

Major Improvements

  1. Performance Optimization
  • Unified schema and snapshot retrieval: Combined separate API calls for schema ID and snapshot ID into a single operation
  • Removed redundant metadata fetches: Eliminated duplicate calls to Paimon metadata store
  • Optimized branch schema handling: Streamlined branch-specific schema resolution
  2. Bug Fixes
  • Fixed timezone-related query errors: Resolved incorrect results when using Paimon time travel syntax with non-UTC timezones
  • Fixed branch schema consistency: Ensured snapshot ID and schema ID are always fetched atomically to prevent mismatched metadata
  3. Enhanced Timestamp Format Support for FOR TIME AS OF

Now supports the following timestamp formats:

  • YYYY-MM-DD HH:MM:SS.SSS - Full timestamp with milliseconds (e.g., 2024-01-15 10:30:45.123)
  • YYYY-MM-DD HH:MM:SS - Timestamp with seconds precision (e.g., 2024-01-15 10:30:45)
  • YYYY-MM-DD - Date only format (defaults to 00:00:00.000) (e.g., 2024-01-15)

Example usage:

-- Using different timestamp formats
SELECT * FROM paimon_table FOR TIME AS OF "2024-01-15 10:30:45.123";
SELECT * FROM paimon_table FOR TIME AS OF "2024-01-15 10:30:45";
SELECT * FROM paimon_table FOR TIME AS OF "2024-01-15";
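For illustration only, the three formats and the timezone handling described above could be resolved to an epoch-millisecond value roughly as follows. This is a sketch under assumptions, not the code in this PR: `TimeTravelTimestamps` and `toEpochMillis` are hypothetical names, and it assumes the literal is interpreted in the session time zone.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration; not the Doris implementation.
final class TimeTravelTimestamps {
    private static final DateTimeFormatter MILLIS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");
    private static final DateTimeFormatter SECONDS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    private TimeTravelTimestamps() {}

    // Parses one of the three supported literal shapes and converts it to
    // epoch milliseconds using the given (assumed session) time zone.
    static long toEpochMillis(String text, ZoneId sessionZone) {
        String s = text.trim();
        LocalDateTime local;
        try {
            if (s.length() == 10) {
                // Date only: defaults to 00:00:00.000
                local = LocalDate.parse(s).atStartOfDay();
            } else if (s.length() == 19) {
                // Seconds precision
                local = LocalDateTime.parse(s, SECONDS);
            } else {
                // Millisecond precision
                local = LocalDateTime.parse(s, MILLIS);
            }
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Unsupported timestamp: " + text, e);
        }
        return local.atZone(sessionZone).toInstant().toEpochMilli();
    }
}
```

With this sketch, the same literal maps to different instants in different zones: `2024-01-15` is eight hours earlier in epoch time under `Asia/Shanghai` than under UTC, which is the kind of difference a timezone bug would get wrong.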

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@vinlee19
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34863 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b862ab6d43defa4505af35cee1fd8ca9a5280e93, data reload: false

------ Round 1 ----------------------------------
q1	17603	5225	5075	5075
q2	1978	348	209	209
q3	10218	1317	723	723
q4	10211	1021	537	537
q5	7518	2422	2360	2360
q6	186	163	140	140
q7	926	780	634	634
q8	9369	1344	1212	1212
q9	6886	5147	5158	5147
q10	6951	2416	1993	1993
q11	516	306	286	286
q12	355	365	233	233
q13	17803	3644	3081	3081
q14	250	235	216	216
q15	582	493	485	485
q16	1014	994	965	965
q17	615	866	358	358
q18	7380	7125	7061	7061
q19	1526	950	574	574
q20	349	349	230	230
q21	4148	2518	2359	2359
q22	1073	1019	985	985
Total cold run time: 107457 ms
Total hot run time: 34863 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5476	5132	5105	5105
q2	254	333	232	232
q3	2150	2668	2336	2336
q4	1361	1761	1302	1302
q5	4221	4501	4514	4501
q6	215	178	134	134
q7	2092	1971	1829	1829
q8	2645	2596	2551	2551
q9	7376	7345	7309	7309
q10	3143	3296	2866	2866
q11	570	520	534	520
q12	687	763	615	615
q13	3477	3898	3493	3493
q14	287	311	273	273
q15	554	499	485	485
q16	1033	1154	1062	1062
q17	1275	1552	1388	1388
q18	8045	7830	7741	7741
q19	877	913	966	913
q20	1899	1953	1817	1817
q21	4748	4349	4213	4213
q22	1073	1037	1026	1026
Total cold run time: 53458 ms
Total hot run time: 51711 ms

@doris-robot

TPC-DS: Total hot run time: 188750 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b862ab6d43defa4505af35cee1fd8ca9a5280e93, data reload: false

query1	1078	431	414	414
query2	6549	1761	1702	1702
query3	6751	222	233	222
query4	26091	23162	23138	23138
query5	4403	632	477	477
query6	334	243	232	232
query7	4655	520	298	298
query8	316	257	254	254
query9	8698	2637	2692	2637
query10	528	367	311	311
query11	15479	15643	14960	14960
query12	174	116	113	113
query13	1689	572	442	442
query14	11072	9263	9328	9263
query15	220	209	187	187
query16	7706	700	476	476
query17	1276	756	622	622
query18	2030	419	330	330
query19	203	201	169	169
query20	132	128	118	118
query21	215	136	116	116
query22	4129	4303	3949	3949
query23	33997	33193	33177	33177
query24	8517	2424	2448	2424
query25	575	512	463	463
query26	1247	275	164	164
query27	2745	505	355	355
query28	4326	2237	2223	2223
query29	778	609	487	487
query30	280	222	200	200
query31	945	796	720	720
query32	79	73	73	73
query33	587	376	328	328
query34	800	857	535	535
query35	832	829	742	742
query36	993	1010	903	903
query37	110	111	85	85
query38	3479	3582	3535	3535
query39	1511	1482	1438	1438
query40	213	134	117	117
query41	64	60	64	60
query42	128	114	124	114
query43	530	544	468	468
query44	1353	844	845	844
query45	187	178	170	170
query46	864	1024	654	654
query47	1775	1799	1734	1734
query48	396	411	320	320
query49	763	512	408	408
query50	644	699	408	408
query51	3965	3922	3921	3921
query52	107	109	99	99
query53	240	266	194	194
query54	601	582	543	543
query55	89	85	85	85
query56	307	312	307	307
query57	1189	1185	1108	1108
query58	290	274	268	268
query59	2509	2714	2651	2651
query60	346	352	317	317
query61	186	158	163	158
query62	817	729	671	671
query63	229	195	204	195
query64	4482	1153	826	826
query65	4054	4012	3977	3977
query66	1110	453	366	366
query67	15483	15341	15055	15055
query68	8007	947	577	577
query69	491	312	283	283
query70	1330	1359	1319	1319
query71	546	363	313	313
query72	6045	5076	5176	5076
query73	725	681	358	358
query74	9252	8843	8704	8704
query75	4005	3331	2795	2795
query76	3678	1156	746	746
query77	809	401	318	318
query78	9597	9790	8904	8904
query79	2479	862	591	591
query80	686	548	552	548
query81	500	267	222	222
query82	476	176	135	135
query83	277	261	248	248
query84	271	105	95	95
query85	922	458	422	422
query86	390	300	303	300
query87	3798	3728	3638	3638
query88	3778	2222	2233	2222
query89	406	333	295	295
query90	1875	223	213	213
query91	178	164	140	140
query92	83	71	66	66
query93	2202	1006	650	650
query94	686	449	287	287
query95	387	321	310	310
query96	482	578	274	274
query97	2924	2947	2876	2876
query98	241	219	212	212
query99	1346	1404	1293	1293
Total cold run time: 277322 ms
Total hot run time: 188750 ms

@doris-robot

ClickBench: Total hot run time: 29.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b862ab6d43defa4505af35cee1fd8ca9a5280e93, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.06
query3	0.26	0.08	0.08
query4	1.60	0.11	0.11
query5	0.28	0.26	0.25
query6	1.18	0.64	0.65
query7	0.03	0.03	0.02
query8	0.05	0.05	0.05
query9	0.62	0.54	0.52
query10	0.58	0.58	0.58
query11	0.17	0.11	0.11
query12	0.14	0.12	0.12
query13	0.63	0.64	0.63
query14	1.02	1.03	1.03
query15	0.87	0.86	0.86
query16	0.40	0.40	0.39
query17	1.07	1.05	1.02
query18	0.21	0.20	0.19
query19	1.93	1.86	1.83
query20	0.02	0.01	0.01
query21	15.40	0.95	0.60
query22	0.76	1.15	0.66
query23	15.01	1.37	0.64
query24	7.15	1.58	0.47
query25	0.50	0.18	0.14
query26	0.66	0.16	0.13
query27	0.06	0.05	0.06
query28	8.73	0.88	0.44
query29	12.56	3.95	3.23
query30	0.29	0.14	0.11
query31	2.83	0.61	0.39
query32	3.24	0.58	0.47
query33	3.16	3.07	3.11
query34	16.05	5.47	4.93
query35	4.93	4.91	4.92
query36	0.69	0.52	0.50
query37	0.10	0.07	0.08
query38	0.07	0.05	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.14
query41	0.09	0.03	0.02
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 103.81 s
Total hot run time: 29.65 s

@shuke987
Collaborator

run buildall

@doris-robot

TPC-H: Total hot run time: 34769 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b862ab6d43defa4505af35cee1fd8ca9a5280e93, data reload: false

------ Round 1 ----------------------------------
q1	17703	5195	5079	5079
q2	2010	364	216	216
q3	10197	1286	713	713
q4	10236	1031	550	550
q5	7554	2412	2380	2380
q6	181	168	136	136
q7	949	763	646	646
q8	9358	1276	1136	1136
q9	7011	5210	5116	5116
q10	6912	2390	1987	1987
q11	486	296	286	286
q12	342	368	234	234
q13	17777	3651	3040	3040
q14	247	243	216	216
q15	552	501	497	497
q16	1003	1003	950	950
q17	591	860	378	378
q18	7560	7043	7081	7043
q19	1222	943	556	556
q20	350	351	230	230
q21	3803	3209	2397	2397
q22	1068	1027	983	983
Total cold run time: 107112 ms
Total hot run time: 34769 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5166	5129	5115	5115
q2	249	358	226	226
q3	2245	2716	2291	2291
q4	1405	1790	1338	1338
q5	4218	4576	4573	4573
q6	218	172	130	130
q7	2026	1978	1793	1793
q8	2680	2630	2573	2573
q9	7445	7398	7285	7285
q10	3072	3306	2945	2945
q11	577	526	505	505
q12	726	791	604	604
q13	3496	3895	3568	3568
q14	286	308	297	297
q15	526	503	481	481
q16	1036	1138	1082	1082
q17	1123	1466	1463	1463
q18	7972	7724	7578	7578
q19	813	830	929	830
q20	2011	2139	1921	1921
q21	5075	4566	4331	4331
q22	1116	1040	1004	1004
Total cold run time: 53481 ms
Total hot run time: 51933 ms

@doris-robot

TPC-DS: Total hot run time: 188828 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b862ab6d43defa4505af35cee1fd8ca9a5280e93, data reload: false

query1	1077	476	414	414
query2	6593	1691	1732	1691
query3	6750	218	220	218
query4	26313	23600	23149	23149
query5	4874	617	493	493
query6	337	229	222	222
query7	4673	522	306	306
query8	298	254	239	239
query9	8678	2637	2644	2637
query10	521	355	278	278
query11	15632	14933	14752	14752
query12	166	119	114	114
query13	1660	595	470	470
query14	10892	9078	9182	9078
query15	214	198	181	181
query16	7750	671	492	492
query17	1181	756	623	623
query18	2035	430	348	348
query19	217	195	171	171
query20	128	123	118	118
query21	213	135	115	115
query22	4105	4191	4185	4185
query23	33823	33174	33140	33140
query24	8478	2389	2385	2385
query25	587	522	468	468
query26	1236	277	167	167
query27	2729	524	370	370
query28	4379	2260	2279	2260
query29	794	648	512	512
query30	334	222	201	201
query31	925	833	733	733
query32	84	76	69	69
query33	594	372	351	351
query34	787	846	513	513
query35	813	824	746	746
query36	992	1017	921	921
query37	115	107	87	87
query38	3516	3505	3516	3505
query39	1497	1472	1508	1472
query40	215	137	120	120
query41	66	62	64	62
query42	131	115	112	112
query43	509	527	486	486
query44	1334	851	844	844
query45	183	173	172	172
query46	860	1027	648	648
query47	1841	1809	1742	1742
query48	394	428	324	324
query49	777	494	414	414
query50	649	699	401	401
query51	4036	3998	3909	3909
query52	112	110	101	101
query53	236	271	190	190
query54	605	592	530	530
query55	91	90	86	86
query56	322	323	309	309
query57	1202	1216	1121	1121
query58	282	286	282	282
query59	2613	2628	2580	2580
query60	356	355	343	343
query61	197	184	204	184
query62	806	757	667	667
query63	237	197	196	196
query64	4500	1127	831	831
query65	4079	3972	4022	3972
query66	1095	442	364	364
query67	15823	15456	15353	15353
query68	8090	939	591	591
query69	486	316	286	286
query70	1405	1303	1271	1271
query71	556	341	315	315
query72	6036	5017	5076	5017
query73	738	665	364	364
query74	8850	9436	8708	8708
query75	3933	3356	2878	2878
query76	3739	1153	736	736
query77	826	406	320	320
query78	9863	9930	8844	8844
query79	2139	817	601	601
query80	631	561	496	496
query81	508	268	232	232
query82	520	171	128	128
query83	267	259	255	255
query84	262	107	92	92
query85	891	519	425	425
query86	386	298	291	291
query87	3807	3747	3563	3563
query88	3626	2215	2217	2215
query89	392	318	294	294
query90	1891	218	213	213
query91	172	167	133	133
query92	86	71	64	64
query93	1641	995	633	633
query94	664	429	321	321
query95	394	317	305	305
query96	493	570	275	275
query97	2987	2964	2867	2867
query98	247	212	213	212
query99	1369	1421	1346	1346
Total cold run time: 277436 ms
Total hot run time: 188828 ms

@doris-robot

ClickBench: Total hot run time: 30.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b862ab6d43defa4505af35cee1fd8ca9a5280e93, data reload: false

query1	0.06	0.04	0.05
query2	0.10	0.05	0.06
query3	0.25	0.08	0.08
query4	1.60	0.11	0.12
query5	0.28	0.27	0.26
query6	1.22	0.67	0.64
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.62	0.53	0.51
query10	0.59	0.59	0.59
query11	0.16	0.12	0.12
query12	0.15	0.12	0.12
query13	0.63	0.62	0.62
query14	1.04	1.04	1.04
query15	0.89	0.86	0.87
query16	0.43	0.41	0.42
query17	1.06	1.06	1.05
query18	0.23	0.21	0.20
query19	1.95	1.91	1.88
query20	0.02	0.02	0.01
query21	15.40	0.94	0.60
query22	0.76	1.37	0.96
query23	14.69	1.42	0.66
query24	7.55	0.79	1.14
query25	0.52	0.20	0.08
query26	0.56	0.16	0.14
query27	0.06	0.06	0.06
query28	9.02	0.89	0.43
query29	12.57	3.95	3.23
query30	0.29	0.13	0.11
query31	2.82	0.61	0.39
query32	3.24	0.57	0.49
query33	3.08	3.12	3.07
query34	16.07	5.48	4.84
query35	4.89	4.88	4.98
query36	0.68	0.50	0.50
query37	0.10	0.07	0.07
query38	0.06	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.17	0.14
query41	0.08	0.03	0.03
query42	0.03	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 104.08 s
Total hot run time: 30.26 s

@vinlee19
Contributor Author

run p0

@vinlee19
Contributor Author

run cloud_p0

@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 23, 2025
@morningman morningman merged commit cd475b2 into apache:master Sep 23, 2025
31 of 33 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 23, 2025
…ma Consistency Issues (#56023)

morrySnow pushed a commit that referenced this pull request Sep 25, 2025
…and Fix Schema Consistency Issues #56023 (#56338)

Cherry-picked from #56023

Co-authored-by: Petrichor <xiaowenli@selectdb.com>
Labels

approved Indicates a PR has been approved by one committer. dev/3.1.2-merged reviewed
