Skip to content

Vector Verification: Rework to run based on env variable DUCKDB_DEBUG_VERIFY_VECTOR and move to Main.yml#16957

Merged
Mytherin merged 8 commits intoduckdb:mainfrom
Mytherin:seqverify
Apr 2, 2025
Merged

Vector Verification: Rework to run based on env variable DUCKDB_DEBUG_VERIFY_VECTOR and move to Main.yml#16957
Mytherin merged 8 commits intoduckdb:mainfrom
Mytherin:seqverify

Conversation

@Mytherin
Copy link
Collaborator

@Mytherin Mytherin commented Apr 2, 2025

This PR modifies the vector verification tests so they run based on an environment variable, e.g.:

DUCKDB_DEBUG_VERIFY_VECTOR=dictionary_expression build/debug/test/unittest ...

This then allows all of the vector verification tests to be run in a single workflow instead of requiring multiple re-compilations.

We move this workflow from nightly to Main.yml as well.

In addition, we fix a minor bug with Reinterpret on dictionary vectors which fixes an issue in the sequence operator verification run.

@carlopi
Copy link
Contributor

carlopi commented Apr 2, 2025

+1 on moving to this to ENV variable

@Mytherin Mytherin merged commit 9c0ec2c into duckdb:main Apr 2, 2025
46 of 62 checks passed
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Vector Verification: Rework to run based on env variable `DUCKDB_DEBUG_VERIFY_VECTOR` and move to `Main.yml` (duckdb/duckdb#16957)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Vector Verification: Rework to run based on env variable `DUCKDB_DEBUG_VERIFY_VECTOR` and move to `Main.yml` (duckdb/duckdb#16957)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
Vector Verification: Rework to run based on env variable `DUCKDB_DEBUG_VERIFY_VECTOR` and move to `Main.yml` (duckdb/duckdb#16957)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
Vector Verification: Rework to run based on env variable `DUCKDB_DEBUG_VERIFY_VECTOR` and move to `Main.yml` (duckdb/duckdb#16957)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Vector Verification: Rework to run based on env variable `DUCKDB_DEBUG_VERIFY_VECTOR` and move to `Main.yml` (duckdb/duckdb#16957)
@Mytherin Mytherin deleted the seqverify branch June 12, 2025 15:29
Mytherin added a commit that referenced this pull request Jul 16, 2025
#16957 fixed the bug in `Vector::Reinterpret` for dictionary vectors by
creating a new child buffer every time. However, creating a new child
buffer is much more expensive than copying a shared_ptr, which results
in about 3% performance regression in TPC-H benchmarks.

The main issue is that a dictionary vector cannot directly copy the
child buffer if the types don't match, as this would make the child
vector's type inconsistent. So we can check whether the types are
different and only create a new child buffer when the types are not
equal - which isn't the case in most scenarios.

benchmark result:
```
python scripts/regression/test_runner.py --old build/release-old/benchmark/benchmark_runner --new build/release/benchmark/benchmark_runner --benchmarks .github/regression/tpch.csv --threads 2
Old timing geometric mean: 0.02501917589777838
New timing geometric mean: 0.024095980407583723, roughly 3% faster

python scripts/regression/test_runner.py --old build/release-old/benchmark/benchmark_runner --new build/release/benchmark/benchmark_runner --benchmarks .github/regression/tpcds.csv --threads 2
Old timing geometric mean: 0.021899101352704627
New timing geometric mean: 0.021254155592682008, roughly 2% faster
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants