Skip to content

Conversation

@Gabriel39
Copy link
Contributor

@Gabriel39 Gabriel39 commented Jan 13, 2026

What problem does this PR solve?

pick #57397 #58283 #58290 #58282 #58832 #58905 #58960 #59005 #59088 #59098 #59126 #59187 #59581 #59625 #59775

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…che#58960)

Disable push down all `cast` expr to avoid tricky value conversion in
column predicates.
TopN filter should be push down from scan operators like other
predicates.
…e#59098)

### What problem does this PR solve?

Introduced by apache#58905

==2037076==ERROR: AddressSanitizer: heap-use-after-free on address
0x7baaae908730 at pc 0x561b769a1fd0 bp 0x7b3caf4ebdf0 sp 0x7b3caf4ebde8
22:30:08  READ of size 1 at 0x7baaae908730 thread T12303 (rs_normal
[work)
22:30:08  #0 0x561b769a1fcf in doris::(anonymous
namespace)::string_compare(char const*, long, char const*, long, long)
/root/doris/be/src/vec/common/string_ref.h:170:29
22:30:08  apache#1 0x561b769a1fcf in
doris::StringRef::compare(doris::StringRef const&) const
/root/doris/be/src/vec/common/string_ref.h:259:30
22:30:08  apache#2 0x561b76f537cd in doris::StringRef::ge(doris::StringRef
const&) const /root/doris/be/src/vec/common/string_ref.h:282:52
22:30:08  apache#3 0x561b76f537cd in
doris::StringRef::operator>=(doris::StringRef const&) const
/root/doris/be/src/vec/common/string_ref.h:292:60
22:30:08  apache#4 0x561b76f537cd in bool
doris::Compare::greater_equal<doris::StringRef>(doris::StringRef const&,
doris::StringRef const&) /root/doris/be/src/common/compare.h:42:18
22:30:08  apache#5 0x561b76f537cd in
doris::ComparisonPredicateBase<(doris::PrimitiveType)23,
(doris::PredicateType)6>::camp_field(doris::vectorized::Field const&,
doris::vectorized::Field const&) const
/root/doris/be/src/olap/comparison_predicate.h:192:20
22:30:08  apache#6 0x561b76f4baa4 in
doris::ComparisonPredicateBase<(doris::PrimitiveType)23,
(doris::PredicateType)6>::evaluate_and(doris::vectorized::ParquetPredicate::ColumnStat*)
const /root/doris/be/src/olap/comparison_predicate.h:207:26
22:30:08  apache#7 0x561b76765284 in
doris::AndBlockColumnPredicate::evaluate_and(doris::vectorized::ParquetPredicate::ColumnStat*)
const /root/doris/be/src/olap/block_column_predicate.h:251:42
22:30:08  apache#8 0x561b89acd735 in
doris::vectorized::ParquetReader::_process_column_stat_filter(tparquet::RowGroup
const&, std::vector<std::unique_ptr<doris::MutilColumnBlockPredicate,
std::default_delete<doris::MutilColumnBlockPredicate> >,
std::allocator<std::unique_ptr<doris::MutilColumnBlockPredicate,
std::default_delete<doris::MutilColumnBlockPredicate> > > > const&,
bool*, bool*, bool*)
/root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:1225:25
22:30:08  apache#9 0x561b89ac8dd7 in
doris::vectorized::ParquetReader::_process_min_max_bloom_filter(doris::vectorized::RowGroupReader::RowGroupIndex
const&, tparquet::RowGroup const&,
std::vector<std::unique_ptr<doris::MutilColumnBlockPredicate,
std::default_delete<doris::MutilColumnBlockPredicate> >,
std::allocator<std::unique_ptr<doris::MutilColumnBlockPredicate,
std::default_delete<doris::MutilColumnBlockPredicate> > > > const&,
doris::segment_v2::RowRanges*)
/root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:1108:9
22:30:08  apache#10 0x561b89ac3e73 in
doris::vectorized::ParquetReader::_next_row_group_reader()
/root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:718:9
22:30:08  apache#11 0x561b89ac008f in
doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*,
unsigned long*, bool*)
/root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:607:21
22:30:08  apache#12 0x561b8a07c6f7 in
doris::vectorized::HiveReader::get_next_block_inner(doris::vectorized::Block*,
unsigned long*, bool*)
/root/doris/be/src/vec/exec/format/table/hive_reader.cpp:32:5
22:30:08  apache#13 0x561b89fee256 in
doris::vectorized::TableFormatReader::get_next_block(doris::vectorized::Block*,
unsigned long*, bool*)
/root/doris/be/src/vec/exec/format/table/table_format_reader.h:81:16
22:30:08  apache#14 0x561b89f71b97 in
doris::vectorized::FileScanner::_get_block_wrapped(doris::RuntimeState*,
doris::vectorized::Block*, bool*)
/root/doris/be/src/vec/exec/scan/file_scanner.cpp:472:13
22:30:08  apache#15 0x561b89f7086f in
doris::vectorized::FileScanner::_get_block_impl(doris::RuntimeState*,
doris::vectorized::Block*, bool*)
/root/doris/be/src/vec/exec/scan/file_scanner.cpp:409:17
22:30:08  apache#16 0x561b8a19f86e in
doris::vectorized::Scanner::get_block(doris::RuntimeState*,
doris::vectorized::Block*, bool*)
/root/doris/be/src/vec/exec/scan/scanner.cpp:109:17
22:30:08  apache#17 0x561b8a19f0a6 in
doris::vectorized::Scanner::get_block_after_projects(doris::RuntimeState*,
doris::vectorized::Block*, bool*)
/root/doris/be/src/vec/exec/scan/scanner.cpp:85:16
22:30:08  apache#18 0x561b8a1ccd0f in
doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)
/root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:173:5
22:30:08  apache#19 0x561b8a1d6875 in
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()::operator()() const::'lambda'()::operator()() const
/root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:76:17
22:30:08  apache#20 0x561b8a1d6875 in
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()::operator()() const
/root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:75:27
22:30:08  apache#21 0x561b8a1d6875 in bool std::__invoke_impl<bool,
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()&>(std::__invoke_other,
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()&)
/usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:63:14
22:30:08  apache#22 0x561b8a1d6875 in std::enable_if<is_invocable_r_v<bool,
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()&>, bool>::type std::__invoke_r<bool,
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()&>(doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()&)
/usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:116:9
22:30:08  apache#23 0x561b8a1d6875 in std::_Function_handler<bool (),
doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>,
std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()()
const::'lambda'()>::_M_invoke(std::_Any_data const&)
/usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292:9
22:30:08  apache#24 0x561b8a1d5f07 in std::function<bool ()>::operator()()
const
/usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593:9
22:30:08  apache#25 0x561b8a1d5f07 in
doris::vectorized::ScannerSplitRunner::process_for(std::chrono::duration<long,
std::ratio<1l, 1000000000l> >)
/root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:407:25
22:30:08  apache#26 0x561b8a2c56d4 in
doris::vectorized::PrioritizedSplitRunner::process()
/root/doris/be/src/vec/exec/executor/time_sharing/prioritized_split_runner.cpp:103:35
22:30:08  apache#27 0x561b8a29045c in
doris::vectorized::TimeSharingTaskExecutor::_dispatch_thread()
/root/doris/be/src/vec/exec/executor/time_sharing/time_sharing_task_executor.cpp:570:77
22:30:08  apache#28 0x561b7b9fecb6 in std::function<void ()>::operator()()
const
/usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593:9
22:30:08  apache#29 0x561b7b9fecb6 in doris::Thread::supervise_thread(void*)
/root/doris/be/src/util/thread.cpp:460:5
22:30:08  apache#30 0x561b76044d26 in asan_thread_start(void*)
(/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P1/Cluster0/be/lib/doris_be+0x23962d26)
22:30:08  apache#31 0x7f4aaae68608 in start_thread
/build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8
22:30:08  apache#32 0x7f4aaad7b132 in __clone
/build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This PR refine predicates generation. Previously, predicates are
generated on ScanOperator for OlapTable and push down to TabletReader.
However, for other types of tables, Exprs are just push down simply and
converted to predicates on own file readers. This introduces complexity
and overhead for us to maintain.
And then, this PR makes all predicates generated on ScanOperator for all
tables.
@Gabriel39 Gabriel39 requested a review from yiguolei as a code owner January 13, 2026 02:21
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39
Copy link
Contributor Author

run buildall

@Gabriel39
Copy link
Contributor Author

run buildall

@Gabriel39
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 35.73% (622/1741) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.42% (18844/35276)
Line Coverage 39.26% (174912/445509)
Region Coverage 33.92% (135388/399127)
Branch Coverage 34.86% (58571/168009)

@Gabriel39
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 35.73% (622/1741) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.42% (18845/35280)
Line Coverage 39.26% (174937/445571)
Region Coverage 33.87% (135234/399221)
Branch Coverage 34.85% (58576/168077)

@yiguolei yiguolei merged commit 0174967 into apache:branch-4.0 Jan 14, 2026
24 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants