Improve compressed DML expression pushdown by svenklemm · Pull Request #6895 · timescale/timescaledb

svenklemm · 2024-05-08T10:47:20Z

Try to constify query constraints when filtering batches to cover
a wider range of expressions that are safe to evaluate when we
do the batch filtering.

This will allow the following expressions to be usable with
compressed DML batch filtering:

- IMMUTABLE and STABLE functions
- SQLValueFunction (CURRENT_TIME, CURRENT_USER, LOCALTIME, ...)
- Prepared statement parameters
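
The constification idea described above can be sketched in plain C. This is a toy model under stated assumptions, not the TimescaleDB or PostgreSQL code: `Expr`, `try_constify`, `minus_one_hour`, and the enum names are all hypothetical, and PostgreSQL's real expression nodes are much richer. It only shows which kinds of expression can be reduced to a constant before batches are filtered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified expression model; not PostgreSQL's Node types. */
typedef enum { EXPR_CONST, EXPR_EXTERN_PARAM, EXPR_FUNC, EXPR_COLUMN } ExprKind;
typedef enum { VOL_IMMUTABLE, VOL_STABLE, VOL_VOLATILE } Volatility;

typedef struct Expr
{
	ExprKind kind;
	long value;             /* EXPR_CONST: the constant value */
	int param_id;           /* EXPR_EXTERN_PARAM: 1-based parameter number */
	Volatility volatility;  /* EXPR_FUNC: volatility class of the function */
	long (*fn)(long);       /* EXPR_FUNC: one-argument function to call */
	const struct Expr *arg; /* EXPR_FUNC: argument expression */
} Expr;

/* Stand-in for a STABLE expression such as "some timestamp - 1 hour". */
static long
minus_one_hour(long ts)
{
	return ts - 3600;
}

/*
 * Try to reduce expr to a constant. bound_params may be NULL when no
 * parameter values are available. Returns true and stores the value in
 * *result on success; returns false if the expression cannot be folded.
 */
static bool
try_constify(const Expr *expr, const long *bound_params, int nparams, long *result)
{
	long arg_value;

	switch (expr->kind)
	{
		case EXPR_CONST:
			*result = expr->value;
			return true;
		case EXPR_EXTERN_PARAM:
			/* Prepared statement parameter: usable only if a value is bound. */
			if (bound_params == NULL || expr->param_id < 1 || expr->param_id > nparams)
				return false;
			*result = bound_params[expr->param_id - 1];
			return true;
		case EXPR_FUNC:
			/* Volatile functions may return a different value on every call. */
			if (expr->volatility == VOL_VOLATILE || expr->fn == NULL || expr->arg == NULL)
				return false;
			if (!try_constify(expr->arg, bound_params, nparams, &arg_value))
				return false;
			*result = expr->fn(arg_value);
			return true;
		case EXPR_COLUMN:
		default:
			/* Column references depend on the row and are never constants. */
			return false;
	}
}
```

In this sketch a caller that gets `false` back simply skips the expression for batch filtering, which mirrors the conservative behaviour the description implies: anything that cannot be safely evaluated up front is left to the normal per-row evaluation.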

codecov · 2024-05-08T10:56:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.73%. Comparing base (59f50f2) to head (19d05ec).
Report is 148 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6895      +/-   ##
==========================================
+ Coverage   80.06%   80.73%   +0.66%     
==========================================
  Files         190      199       +9     
  Lines       37181    37209      +28     
  Branches     9450     9726     +276     
==========================================
+ Hits        29770    30041     +271     
- Misses       2997     3206     +209     
+ Partials     4414     3962     -452


akuzm · 2024-05-08T15:22:46Z

tsl/src/compression/compression.c

 				   List **index_filters, List **is_null)
 {
 	ListCell *lc;
+	PlannerGlobal glob = { .boundParams = NULL };


Should we take es_param_list_info from the node state here, like in runtime chunk exclusion? Then it could work for prepared statement params, and under joins.

I think for joins it would not be safe, as we don't repeat decompression in rescan. For now this is intentional. I'll add some more tests, because having prepared statement support would be nice.

Oh, right, I think it's not even possible, because the rescans happen underneath the DELETE, and it receives the tids only. And we have to guess which batches we have to decompress even before the underlying nested loop starts running.

Would be nice if the underlying scans could return the matching compressed tids as well, and then we'd use that to decompress, but that's a completely different approach.

Might also create additional weirdness with the snapshots between the internal scan and the normal scan.

Prepared statements work too, as their parameters are handled differently from join params. I'll add a test.

I've added a comment as well.

akuzm · 2024-05-08T15:26:33Z

tsl/src/compression/compression.c

 				if (!IsA(expr, Const))
-					continue;
+				{
+					expr = (Expr *) estimate_expression_value(&root, (Node *) expr);


Currently this means we're able to evaluate stable functions in WHERE conditions when they are on one side of a supported operator, right?

Correct, all the SQLValueFunction expressions (current_timestamp, current_user, ...) will also be supported with this.
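
To illustrate why a constant on one side of a supported operator matters, here is a minimal sketch of batch filtering. It assumes each compressed batch carries min/max metadata for the filtered column; `BatchMeta`, `CompareOp`, `batch_may_match`, and `count_batches_to_decompress` are hypothetical names, not the extension's API. Once the other side of the operator is a known constant, batches that cannot contain a match can be skipped without decompressing them.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT } CompareOp;

/* Hypothetical per-batch metadata: smallest and largest value of one column. */
typedef struct BatchMeta
{
	long min;
	long max;
} BatchMeta;

/* Can any row in the batch satisfy "column <op> constant"? */
static bool
batch_may_match(const BatchMeta *batch, CompareOp op, long constant)
{
	switch (op)
	{
		case OP_LT:
			return batch->min < constant;
		case OP_LE:
			return batch->min <= constant;
		case OP_EQ:
			return batch->min <= constant && constant <= batch->max;
		case OP_GE:
			return batch->max >= constant;
		case OP_GT:
			return batch->max > constant;
	}
	return true; /* unknown operator: be conservative and keep the batch */
}

/* Count the batches that would have to be decompressed for the predicate. */
static size_t
count_batches_to_decompress(const BatchMeta *batches, size_t nbatches, CompareOp op,
							long constant)
{
	size_t n = 0;

	for (size_t i = 0; i < nbatches; i++)
		if (batch_may_match(&batches[i], op, constant))
			n++;
	return n;
}
```

With three batches covering 0..99, 100..199, and 200..299, a predicate such as `column < 100` touches only the first batch in this model. Without a constant, no such comparison is possible and every batch would be a candidate.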

antekresic

lgtm



This release contains significant performance improvements when working with compressed data, extended join support in continuous aggregates, and the ability to define foreign keys from regular tables towards hypertables. We recommend that you upgrade at the next available opportunity.

In TimescaleDB v2.16.0 we:

* Introduce multiple performance-focused optimizations for data manipulation operations (DML) over compressed chunks. Upsert performance improves by more than 100x in some cases, and some update/delete scenarios improve by more than 1000x.
* Add the ability to define chunk skipping indexes on non-partitioning columns of compressed hypertables. TimescaleDB v2.16.0 extends chunk exclusion to use those skipping (sparse) indexes when queries filter on the relevant columns, and prunes chunks that do not include any relevant data for calculating the query response.
* Offer new options for use cases that require foreign keys. You can now add foreign keys from regular tables towards hypertables. We have also removed some really annoying locks in the reverse direction that blocked access to referenced tables while compression was running.
* Extend continuous aggregates to support more types of analytical queries. More types of joins are supported, as are additional equality operators on join clauses and joins between multiple regular tables.

**Highlighted features in this release**

* Improved query performance through chunk exclusion on compressed hypertables. You can now define chunk skipping indexes on compressed chunks for any column with one of the following integer or date/time data types: `smallint`, `int`, `bigint`, `serial`, `bigserial`, `date`, `timestamp`, `timestamptz`. After you call `enable_chunk_skipping` on a column, TimescaleDB tracks the min and max values for that column. For queries that filter on that column, TimescaleDB uses that information to exclude chunks that cannot contain matching data.
* Improved upsert performance on compressed hypertables. By using index scans to verify constraints during inserts on compressed chunks, TimescaleDB speeds up some ON CONFLICT clauses by more than 100x.
* Improved performance of updates, deletes, and inserts on compressed hypertables. By filtering data while accessing the compressed data and before decompressing, TimescaleDB has improved performance for updates and deletes on all types of compressed chunks, as well as inserts into compressed chunks with unique constraints. By signaling constraint violations without decompressing, or decompressing only when matching records are found in the case of updates, deletes and upserts, TimescaleDB v2.16.0 speeds up those operations by more than 1000x in some update/delete scenarios, and by 10x for upserts.
* You can add foreign keys from regular tables to hypertables, with support for all types of cascading options. This is useful for hypertables that partition using sequential IDs and need to reference those IDs from other tables.
* Lower locking requirements during compression for hypertables with foreign keys. Advanced foreign key handling removes the need for locking referenced tables when new chunks are compressed. DML is no longer blocked on referenced tables while compression runs on a hypertable.
* Improved support for queries on continuous aggregates. `INNER/LEFT` and `LATERAL` joins are now supported. Plus, you can now join with multiple regular tables, and you can have more than one equality operator on join clauses.

**PostgreSQL 13 support removal announcement**

Following the deprecation announcement for PostgreSQL 13 in TimescaleDB v2.13, PostgreSQL 13 is no longer supported in TimescaleDB v2.16. The currently supported PostgreSQL major versions are 14, 15 and 16.

**Features**

* #6880: Add support for the array operators used for compressed DML batch filtering.
* #6895: Improve the compressed DML expression pushdown.
* #6897: Add support for replica identity on compressed hypertables.
* #6918: Remove support for PG13.
* #6920: Rework compression activity WAL markers.
* #6989: Add support for foreign keys when converting plain tables to hypertables.
* #7020: Add support for the chunk column statistics tracking.
* #7048: Add an index scan for INSERT DML decompression.
* #7075: Reduce decompression on the compressed INSERT.
* #7101: Reduce decompressions for the compressed UPDATE/DELETE.
* #7108: Reduce decompressions for INSERTs with UNIQUE constraints.
* #7116: Use DELETE instead of TRUNCATE after compression.
* #7134: Refactor foreign key handling for compressed hypertables.
* #7161: Fix `mergejoin input data is out of order`.

**Bugfixes**

* #6987: Fix REASSIGN OWNED BY for background jobs.
* #7018: Fix `search_path` quoting in the compression defaults function.
* #7046: Prevent locking for compressed tuples.
* #7055: Fix the `scankey` for `segment by` columns, where the type `constant` is different to `variable`.
* #7064: Fix the bug in the default `order by` calculation in compression.
* #7069: Fix the index column name usage.
* #7074: Fix the bug in the default `segment by` calculation in compression.

**Thanks**

* @jledentu For reporting a problem with mergejoin input order

svenklemm self-assigned this May 8, 2024

svenklemm added the Columnstore label (Related to the column store / compression) May 8, 2024

svenklemm requested review from akuzm, antekresic and fabriziomello May 8, 2024 15:17

svenklemm force-pushed the compressed_dml_constify branch from 14f4fd6 to 2be3ae9 Compare May 8, 2024 15:20

akuzm reviewed May 8, 2024

svenklemm force-pushed the compressed_dml_constify branch 5 times, most recently from 10d6685 to 54c5d04 Compare May 9, 2024 05:33

antekresic approved these changes May 9, 2024

akuzm approved these changes May 9, 2024

svenklemm force-pushed the compressed_dml_constify branch from 54c5d04 to 16c077e Compare May 9, 2024 12:34

svenklemm force-pushed the compressed_dml_constify branch from 16c077e to 19d05ec Compare May 9, 2024 12:39

svenklemm enabled auto-merge (rebase) May 9, 2024 12:56

svenklemm merged commit f41cf0c into timescale:main May 9, 2024

pallavisontakke added this to the TimescaleDB 2.16.0 milestone Jul 12, 2024

pallavisontakke mentioned this pull request Jul 18, 2024

Release 2.16.0 #7135

Closed

pallavisontakke mentioned this pull request Jul 25, 2024

Release 2.16.0 #7156

Closed

pallavisontakke mentioned this pull request Jul 31, 2024

Release 2.16.0 #7169

Closed

bayandin mentioned this pull request Aug 1, 2024

timescaledb 2.16.0 bayandin/homebrew-tap#173

Closed

svenklemm merged 1 commit into timescale:main from svenklemm:compressed_dml_constify
