
Nightly test encryption fixes#20450

Closed
ccfelius wants to merge 4504 commits into duckdb:v1.4-andium from ccfelius:encryption_test_fixes

Conversation


@ccfelius ccfelius commented Jan 8, 2026

Fixes https://github.com/duckdblabs/duckdb-internal/issues/7080 and other potential nightly test failures related to USE + restart in encryption tests.

Same cause as here #20409

yan-alex and others added 30 commits December 19, 2025 15:10
…, and grab a vacuum lock when initiating a delete operation
  The original code had an invalid check:
```c++
  if (s_ele.__isset.num_children && s_ele.num_children > 0) { // inner node
```
This incorrectly assumed that `num_children == 0` or `!__isset.num_children` meant the schema element was a leaf node. According to the Parquet specification:
- Leaf nodes are defined by having a type field set
- Inner nodes (groups) are defined by NOT having a type field set

This caused DuckDB to fail when reading parquet files with schemas that
contain empty groups, mistakenly stating that the file wasn't following
the parquet specification.
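A minimal Python sketch of the corrected classification, assuming a simplified dict-based schema element rather than DuckDB's actual Thrift structures:

```python
# Hypothetical sketch: classify Parquet schema elements per the spec.
# The dict fields below stand in for Thrift's SchemaElement; they are
# illustrative, not DuckDB's actual data structures.

def is_inner_node(element):
    # Per the Parquet spec, groups (inner nodes) are exactly the elements
    # with no physical type set; the child count is irrelevant.
    return element.get("type") is None

# An empty group has num_children == 0 but is still an inner node:
empty_group = {"name": "g", "num_children": 0}
leaf = {"name": "v", "type": "INT64"}
```

With the original check, `empty_group` would have been misclassified as a leaf and the file rejected as non-conforming.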
… make appending to "removed_data_during_checkpoint" conditional
…b#19930)

Note that I currently kept the checkpointing in the destructor as a last back-up, but I am not sure if we should keep it. Technically, that should never happen now - maybe we can instead log an error there? See my `FIXME` in `src/main/attached_database.cpp`.

Fixes duckdblabs/duckdb-internal#6643
- fix `make generate-files` to also work in case the root of the repo is
not checked out in directory `duckdb`
- removed the deprecated `asyncore` import, no longer supported since
`Python 3.12`
  - see: https://docs.python.org/3/library/asyncore.html
The PR fixes big-endian related issues with these:
- FSST compression ("gracefully" merges the latest of
https://github.com/cwida/fsst to pick up [BE
support](cwida/fsst#36));
- arrow conversion;
- GEOMETRY and HASH types;
- md5 functions.
Basically all automatic, BUT for
duckdb@1cc1ab7,
which needs to be reverted via
duckdb@d29f8cb.

Note: on 1.4.4 and subsequent minor releases on v1.4-andium branch, same
dance again.

See more details at
duckdb#20227 (comment) and linked
conversations.
lnkuiper and others added 24 commits January 6, 2026 15:04
duckdb#20107)

This PR introduces a new scalar function `parse_formatted_bytes(VARCHAR)` to DuckDB, which converts a human-readable byte size (e.g. "16 KiB") into a numeric (UBIGINT) number of bytes.

Changes:
- Added `parse_formatted_bytes` (throws errors) and
`try_parse_formatted_bytes` (returns `NULL` on errors)
- Implementation of parsing logic in `StringUtil` (originally taken from
`DBConfig`)
- Updated the existing component (DBConfig) that relied on the old
parsing logic to use the new `StringUtil` version.
- Tests for the new scalar function

## Why?
The initial motivation came from the implementation of
duckdb#19726, which aimed to expose
another column representing byte sizes.
During the review, @Mytherin suggested exposing a dedicated function
instead, keeping `duckdb_settings` lean while providing a more generally
useful utility.
`try_parse_formatted_bytes` can now be reused across various use cases
where parsing byte-size strings is needed.

Note: `try(parse_formatted_bytes)` acted differently than expected; it seems it doesn't handle runtime `Exception`s. Exposing `try_parse_formatted_bytes` was my workaround.
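As an illustration, here is a minimal Python sketch of this kind of parsing; the unit table and error behavior are assumptions, and DuckDB's actual `StringUtil` implementation may differ:

```python
# Hypothetical sketch of byte-size parsing comparable to
# parse_formatted_bytes; decimal (KB) and binary (KiB) units supported.
UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40,
}

def parse_formatted_bytes(text):
    s = text.strip().upper()
    # Split into the numeric prefix and the unit suffix.
    idx = len(s)
    while idx > 0 and not (s[idx - 1].isdigit() or s[idx - 1] == "."):
        idx -= 1
    number, unit = s[:idx].strip(), s[idx:].strip()
    if not number or unit not in UNITS:
        raise ValueError(f"cannot parse byte size: {text!r}")
    return int(float(number) * UNITS[unit])
```

For example, `parse_formatted_bytes("16 KiB")` yields 16384; a `try_` variant would catch the `ValueError` and return `NULL` instead.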
…uckdb#20272)

Fixes duckdb#20187

## Description
- This PR fixes the internal error "Pivot aggregate count mismatch" that
occurs when using PIVOT with duplicate aggregates on ranges >= 21.
- Previously, the binder did not correctly duplicate aggregate
expressions for repeated aggregates in the USING clause.

## Changes
- Modified `BindBoundPivot()` in
`src/planner/binder/tableref/bind_pivot.cpp`
- Added logic to detect and handle duplicate aggregates.
- Expanded the deduplicated aggregates to match the user's expected
column count.

## Testing
Added test cases in
`test/sql/pivot/test_pivot_duplicate_aggregates.test` covering mixed
duplicate and unique aggregates and different range sizes.
I ran into this issue when I tried to wrap `JsonReader`'s filesystem with the caching filesystem.

The current implementation of `CachingFileSystemWrapper` is buggy for `CanSeek`:
- There are two types of filesystem instances in DuckDB
+ One is the raw filesystem, which interacts directly with storage backends, for example the local filesystem and the S3 filesystem
+ The other is the wrapper filesystem, which wraps filesystem(s) and provides additional features (e.g., the virtual FS provides an abstraction layer over multiple FSs, the caching FS enables a read cache)
- A file handle's invocation [delegates to](https://github.com/duckdb/duckdb/blob/041f4cac6889afa6604c8deee9e6574b8d6ae3aa/src/common/file_system.cpp#L748-L750) its filesystem's implementation
- For the caching fs wrapper, that is the [cache-wrapped internal filesystem](https://github.com/duckdb/duckdb/blob/041f4cac6889afa6604c8deee9e6574b8d6ae3aa/src/storage/caching_file_system_wrapper.cpp#L339-L341), which could be either a raw filesystem or a wrapped filesystem, and the two should be treated separately

~~Proposed solution implemented in this PR:~~
~~- Seek-ability is an immutable attribute for a filesystem~~
~~+ For raw file systems, `CanSeek` (the one with no arguments provided)
has been already implemented~~
~~+ For wrapped filesystems, the function call should be properly
delegated to internal filesystem instances, just as what we do for other
APIs~~

Update: follow Lauren's suggestion to integrate `CachingMode` into
`FileOpenFlags` to get the caching work handled inside of
VirtualFileSystem.
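The delegation issue can be sketched in Python; the class and method names here are hypothetical stand-ins, not DuckDB's actual API:

```python
# Hypothetical sketch of the wrapper-delegation pattern described above.

class LocalFileSystem:
    def can_seek(self):
        return True  # raw filesystems know their own seek-ability

class HTTPFileSystem:
    def can_seek(self):
        return False  # plain HTTP streams are not seekable

class CachingFileSystemWrapper:
    def __init__(self, inner):
        self.inner = inner

    def can_seek(self):
        # Delegate to the wrapped filesystem instead of assuming a fixed
        # answer; the inner fs may itself be another wrapper.
        return self.inner.can_seek()
```

The point is that a wrapper must forward such queries to whatever it wraps, raw or wrapped, rather than answering on its own.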
This PR resolves several Windows‑specific inconsistencies in DuckDB’s
formatting scripts, ensuring they behave deterministically across
Windows and Unix‑like environments.

### **Improvements**
- **Force Linux‑style line endings (`\n`)** when writing temporary
formatted files.
The `open_utf8` helper now forwards `newline='\n'` (and any additional
keyword arguments), preventing Windows from injecting `\r\n` and causing
spurious diffs in `scripts/format.py`.

- **Normalize all paths to POSIX (`/`) form** when generating test
metadata.
`format_test_benchmark.py` now uses `Path(...).as_posix()` to ensure
stable, platform‑independent `# name:` headers.

  **Example:**
  ```
  good
  # name: benchmark/appian_benchmarks/q01.benchmark
  # description: Run query 01 from the appian benchmarks

  bad
  # name: benchmark\appian_benchmarks\q01.benchmark
  # description: Run query 01 from the appian benchmarks
  ```
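A minimal Python sketch of the two normalizations, assuming a simplified `open_utf8` helper (the real one in `scripts/` differs in detail); `PureWindowsPath` is used in the header example only so the separator conversion is visible on any OS:

```python
from pathlib import PureWindowsPath

def open_utf8(path, mode="r", **kwargs):
    # Force '\n' line endings so Windows does not inject '\r\n'.
    kwargs.setdefault("newline", "\n")
    return open(path, mode, encoding="utf-8", **kwargs)

def posix_header_name(path):
    # POSIX-style separators keep '# name:' headers platform-independent.
    return "# name: " + PureWindowsPath(path).as_posix()
```

In the scripts themselves, `Path(...).as_posix()` achieves the same on the platform's native paths.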

### **Why This Matters**
These changes eliminate Windows‑specific formatting noise, making the
formatting scripts fully deterministic across developer environments.
This improves contributor experience and prevents CI failures caused by
platform‑dependent EOL or path differences.

### **Scope**
- No functional changes to DuckDB itself  
- Only affects developer tooling under `scripts/`
When the environment variable DUCKDB_FORMAT_SKIP_VERSION_CHECKS is defined,
skip failing on a specific version.

This is out of spite - I could/should fix up my local setup, but I also
think a way to skip the checks makes sense.
This PR tries to fix issue duckdb#20233.
Hi duckdb team, I added the `DoesColumnAliasExist` function to
`QualifyBinder` based on the code in the `where_binder.cpp` file.
Testing shows that it works correctly, but I'm not sure if it will
affect other modules. Thanks~
…uckdb#20407)

This PR removes unused and redundant headers across the storage subsystem.

The cleanup exposed missing explicit includes that were previously satisfied via transitive dependencies; these have been fixed by adding the required headers in the appropriate locations, improving include hygiene and Windows/MSVC compatibility.

All CI workflows (Linux, macOS, Windows) pass on this branch.
Fixes duckdblabs/duckdb-internal#6956 and
duckdblabs/duckdb-internal#7005

The test failed due to usage of `USE db_name` + `restart`. This pattern
fails in any case, e.g.

```
loop i 1 2

statement ok
ATTACH '__TEST_DIR__/test.db' as test;

statement ok
USE test

restart

endloop
```

also fails with 

```
Catalog Error: SET search_path: No catalog + schema named "test.main" found.
```

Removing `USE` and using explicit `db_name.tbl` references solved the issue.

Not 100% sure if this is desired behavior, but I would assume so.
Picking up duckdb#19606, solving merge
conflict, and removing unnecessary exclusions of `time_ns` type.

We are still missing the tests for the different precision types of
`TIME`, these should be added in our Python client after this code gets
merged, as we need arrow to produce these precision types.
…ion and improve ingest name resolution (duckdb#20369)

Add support for `adbc.ingest.target_catalog` in the DuckDB ADBC ingest
path.

Close duckdb#20128

This allows ingesting into attached databases (catalogs) by propagating
the catalog into both the CREATE/DROP SQL and the appender.

When only target_catalog is provided (no target_db_schema), the driver
defaults the schema to `main` and uses a fully-qualified 3-part name
(`catalog_name.main.table_name`) to avoid DuckDB’s catalog/schema
ambiguity with 2-part names.

Temporary ingestion is also updated to align with common ADBC
expectations: temporary tables are created in the `temp` schema and are
distinct from persistent tables with the same name.
`temporary` remains incompatible with target_db_schema / target_catalog
at execution time, but enabling `temporary` after setting schema/catalog
clears those options so ingest can proceed (mirroring behavior in the
Postgres driver in the arrow-adbc repo).

cc @lidavidm
`FileOpener` is needed for HTTP utils, which should be propagated from
virtual filesystem, to caching wrapper, to caching filesystem.

Checked with previously failed SQL and confirmed to work:
```sql
memory D select
             unnest(data) as customers
         from
             read_json('https://non.existant/endpoint');
IO Error:
Could not establish connection error for HTTP HEAD to 'https://non.existant/endpoint'

LINE 4:     read_json('https://non.existant/endpoint');
            ^
```
We may want to eliminate the NOT operator for efficiency. This can
1. save an operator
2. benefit statistics propagation
3. make ternary (three-valued) processing more efficient

In this PR, the rule eliminates the NOT operator for the following tree
patterns:
1. Nested NOT
```
NOT NOT col1 > 1   TO    col1 > 1
NOT NOT NOT col1 > 1   TO    NOT col1 > 1

``` 
2. NOT with IS_NOT_NULL/IS_NULL
```
NOT IS_NOT_NULL/IS_NULL ==> IS_NULL/IS_NOT_NULL
```
3. NOT with AND
```
NOT (col1 > 1 AND col2 <= 2)   TO  col1 <= 1 OR col2 > 2
```
4. NOT with OR
```
NOT (col1 > 1 OR col2 <= 2)   TO  col1 <= 1 AND col2 > 2
```
Please let me know if there is anything wrong with the above
understanding :), thanks
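These rewrites can be sketched in Python over a toy tuple-based expression tree; the representation is hypothetical, and DuckDB's optimizer of course works on its own expression classes:

```python
# Hypothetical sketch of NOT elimination: double negation, NULL-check
# flipping, and De Morgan's laws over (op, *args) tuples.

NEGATED = {">": "<=", "<=": ">", "<": ">=", ">=": "<",
           "=": "<>", "<>": "=",
           "IS_NULL": "IS_NOT_NULL", "IS_NOT_NULL": "IS_NULL"}

def eliminate_not(expr):
    op, *args = expr
    if op == "NOT":
        return negate(eliminate_not(args[0]))
    if op in ("AND", "OR"):
        return (op, *[eliminate_not(a) for a in args])
    return expr

def negate(expr):
    op, *args = expr
    if op == "NOT":                # NOT NOT x  ->  x
        return args[0]
    if op in NEGATED:              # flip comparisons and NULL checks
        return (NEGATED[op], *args)
    if op == "AND":                # De Morgan: NOT (a AND b) -> NOT a OR NOT b
        return ("OR", *[negate(a) for a in args])
    if op == "OR":                 # De Morgan: NOT (a OR b) -> NOT a AND NOT b
        return ("AND", *[negate(a) for a in args])
    return ("NOT", expr)           # fall back: keep the NOT
```

For instance, `NOT (col1 > 1 AND col2 <= 2)` rewrites to `col1 <= 1 OR col2 > 2`, matching the patterns listed above.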
@ccfelius ccfelius changed the title Nightly encryption test fixes Nightly test encryption fixes Jan 8, 2026
@ccfelius ccfelius changed the base branch from main to v1.4-andium January 9, 2026 07:13
@ccfelius ccfelius closed this Jan 9, 2026
@ccfelius ccfelius deleted the encryption_test_fixes branch January 9, 2026 07:50
lnkuiper added a commit that referenced this pull request Jan 12, 2026
Fixes duckdblabs/duckdb-internal#7080 and
other potential nightly test failures related to USE + restart in
encryption tests.

Same cause as here #20409

Was #20450, now targeted to the
right branch
