Rework compression activity wal markers #6920

Merged
fabriziomello merged 1 commit into main from jg/fun-with-compression on May 28, 2024

Contributor

@JamesGuthrie JamesGuthrie commented May 15, 2024

  • Adds WAL markers around all compression and decompression activities.
  • Renames the GUC controlling this behaviour.
  • Enables the WAL marker GUC by default.

This makes it possible to distinguish between "user-driven" and
"compression-driven" DML on uncompressed chunks. This is a requirement
for supporting DML on compressed chunks in live migration.

Note: A previous commit (b5b46a3) added WAL markers before and after
inserts which were part of "transparent decompression". Transparent
decompression is triggered when an UPDATE or DELETE statement affects
compressed data, or an INSERT statement inserts into a range of
compressed data which has a unique or primary key constraint. In these
cases, the data is first moved from the compressed chunk to the
uncompressed chunk, and then the DML is applied.

This change extends the existing behaviour on two fronts:

  1. It adds WAL markers for both chunk compression and decompression
    events.
  2. It extends the WAL markers for transparent decompression to cover
    not only INSERTs into the uncompressed chunk, but also the TimescaleDB
    catalog operations which were part of the decompression.

@JamesGuthrie JamesGuthrie force-pushed the jg/fun-with-compression branch 9 times, most recently from efd6c57 to 3f7717a Compare May 16, 2024 10:07
@JamesGuthrie JamesGuthrie requested a review from svenklemm May 16, 2024 10:38
@JamesGuthrie JamesGuthrie marked this pull request as ready for review May 16, 2024 10:38
src/guc.c Outdated
@@ -477,10 +477,9 @@ _guc_init(void)
NULL);

DefineCustomBoolVariable(MAKE_EXTOPTION("enable_decompression_logrep_markers"),
Member

At the moment, decompression markers sound specific to decompression activity. With this PR, the markers will be available for recompression as well. So, I think we should generalize the GUC to be compression markers (representing both decompression and recompression).

Suggested change
- DefineCustomBoolVariable(MAKE_EXTOPTION("enable_decompression_logrep_markers"),
+ DefineCustomBoolVariable(MAKE_EXTOPTION("enable_compression_logrep_markers"),

Contributor Author

I had considered that, but it would break compatibility with applications which are already using this GUC. The way I justified retaining the name is to imagine that there are brackets around the "de" in decompression, like this: enable_(de)compression_logrep_markers.

Contributor Author

I renamed this to enable_compression_wal_markers.

@noctarius
Contributor

If I understand the PR correctly, your patch wouldn't send the catalog updates for the chunk tables themselves, would it? That would actually break functionality in my tool, since I completely replicate the catalog in memory and need that information.

I'd probably define a second set of markers with slightly different prefix names. In this case everybody would be able to decide to either ignore them or act on them. Alternatively, we could use the message body, like adding the table names, but I think the second prefix is easier and more obvious.

Changing the GUC name is ok, I can let people know in the docs :)

Contributor Author

JamesGuthrie commented May 21, 2024

> If I understand the PR correctly, your patch wouldn't send the catalog updates for the chunk tables themselves, would it?

This PR doesn't change which catalog updates are emitted. One thing that has changed is where exactly the "end decompression marker" is in the WAL. It now surrounds both inserts into the uncompressed chunk, and the associated catalog changes. To visualise this:

Previously the WAL would look like this for a "transparent decompression" event (e.g. UPDATE to compressed data):

BEGIN
message: transactional: 1 prefix: ::timescaledb-decompression-start, sz: 0 content:
table _timescaledb_internal.compress_hyper_2_2_chunk: DELETE: (no-tuple-data)
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 18:00:00-07' device_id[bigint]:1 value[double precision]:1
message: transactional: 1 prefix: ::timescaledb-decompression-end, sz: 0 content:
table _timescaledb_catalog.chunk: UPDATE: id[integer]:1 hypertable_id[integer]:1 schema_name[name]:'_timescaledb_internal' table_name[name]:'_hyper_1_1_chunk' compressed_chunk_id[integer]:2 dropped[boolean]:false status[integer]:9 osm_chunk[boolean]:false
table _timescaledb_internal._hyper_1_1_chunk: UPDATE: old-key: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1 new-tuple: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:22
COMMIT

Now, it looks like this:

BEGIN
message: transactional: 1 prefix: ::timescaledb-decompression-start, sz: 0 content:
table _timescaledb_internal.compress_hyper_2_2_chunk: DELETE: (no-tuple-data)
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 18:00:00-07' device_id[bigint]:1 value[double precision]:1
table _timescaledb_catalog.chunk: UPDATE: id[integer]:1 hypertable_id[integer]:1 schema_name[name]:'_timescaledb_internal' table_name[name]:'_hyper_1_1_chunk' compressed_chunk_id[integer]:2 dropped[boolean]:false status[integer]:9 osm_chunk[boolean]:false
message: transactional: 1 prefix: ::timescaledb-decompression-end, sz: 0 content:
table _timescaledb_internal._hyper_1_1_chunk: UPDATE: old-key: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1 new-tuple: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:22
COMMIT

Note that the timescaledb-decompression-end message now comes after the UPDATE to the _timescaledb_catalog.chunk table.

IMHO this is "more correct", as the decompression markers now surround all decompression activities. Your tool can decide which kinds of activities it does or does not ignore in the context of those markers.
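
As a rough illustration of how a consumer might act on these markers, here is a minimal sketch in C. It assumes the consumer reads the textual `test_decoding` output shown above one line at a time. The names `MarkerFilter` and `marker_filter_should_apply` are hypothetical and not part of TimescaleDB; only the two marker prefixes and the `_timescaledb_catalog` schema name are taken from the output above. The policy it implements (inside a marker pair, keep catalog changes and skip everything else) is just one possible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Marker prefixes as they appear in the test_decoding output quoted above. */
#define DECOMPRESSION_START "prefix: ::timescaledb-decompression-start"
#define DECOMPRESSION_END "prefix: ::timescaledb-decompression-end"
/* Catalog changes are reported against tables in this schema. */
#define CATALOG_TABLE_PREFIX "table _timescaledb_catalog."

/* Hypothetical consumer state: are we between a start and an end marker? */
typedef struct MarkerFilter
{
	bool in_decompression;
} MarkerFilter;

/*
 * Returns true if the consumer should apply the decoded line.
 *
 * Assumed policy for this sketch:
 *  - logical messages only update the filter state and are never applied;
 *  - outside a marker pair, every change is treated as user-driven DML;
 *  - inside a marker pair, only catalog changes are kept.
 */
bool
marker_filter_should_apply(MarkerFilter *filter, const char *line)
{
	assert(filter != NULL && line != NULL);

	if (strncmp(line, "message:", 8) == 0)
	{
		if (strstr(line, DECOMPRESSION_START) != NULL)
			filter->in_decompression = true;
		else if (strstr(line, DECOMPRESSION_END) != NULL)
			filter->in_decompression = false;
		return false;
	}

	/* BEGIN, COMMIT and anything else that is not a table change passes through. */
	if (strncmp(line, "table ", 6) != 0)
		return true;

	if (!filter->in_decompression)
		return true;

	/* Compression-driven change: keep it only if it touches the catalog. */
	return strncmp(line, CATALOG_TABLE_PREFIX, strlen(CATALOG_TABLE_PREFIX)) == 0;
}
```

Fed the second stream above line by line, this sketch passes BEGIN and COMMIT through, keeps the UPDATE to _timescaledb_catalog.chunk and the final user-driven UPDATE, and skips the DELETE and the two INSERTs between the markers.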

@JamesGuthrie JamesGuthrie force-pushed the jg/fun-with-compression branch from 3f7717a to c972677 Compare May 21, 2024 07:49
@noctarius
Contributor

Ah I see. Yeah that is totally fine, since I can still act on all catalog entries, while ignoring anything outside the catalog 👍

Member

@antekresic antekresic left a comment

Great stuff, makes the replication markers more consistent across the whole codebase.

@JamesGuthrie JamesGuthrie force-pushed the jg/fun-with-compression branch 4 times, most recently from 9e2dd49 to d5c0eed Compare May 22, 2024 14:27
ereport((if_not_compressed ? NOTICE : ERROR),
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("chunk \"%s\" is already compressed", get_rel_name(chunk->table_id))));
write_logical_replication_msg_compression_end();
Member

This will not be triggered in the error path. I guess it doesn't matter, but if you want it in the error path, you should place it above the ereport.
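
To make the point about the error path concrete: in PostgreSQL, ereport at ERROR level does not return to its caller, so a statement placed after it only runs for lower levels such as NOTICE. The following self-contained sketch simulates that control flow with setjmp/longjmp. Everything in it (`report`, `Level`, `write_end_marker`, `run_already_compressed`) is a hypothetical stand-in, not PostgreSQL or TimescaleDB code.

```c
#include <setjmp.h>
#include <stdbool.h>

typedef enum
{
	LEVEL_NOTICE,
	LEVEL_ERROR
} Level;

/* Stand-in for the error handler that an ERROR unwinds to. */
static jmp_buf error_context;

/* Counts how many simulated end markers were written. */
int end_markers_written = 0;

/* Stand-in for ereport(): NOTICE returns, ERROR jumps away and never returns. */
void
report(Level level)
{
	if (level == LEVEL_ERROR)
		longjmp(error_context, 1);
}

void
write_end_marker(void)
{
	end_markers_written++;
}

/* Same shape as the reviewed hunk: report first, then write the end marker. */
void
already_compressed(bool if_not_compressed)
{
	report(if_not_compressed ? LEVEL_NOTICE : LEVEL_ERROR);
	write_end_marker(); /* reached only on the NOTICE path */
}

/* Runs the hunk under the simulated handler; returns true if it completed normally. */
bool
run_already_compressed(bool if_not_compressed)
{
	if (setjmp(error_context) != 0)
		return false; /* arrived here via longjmp from report() */
	already_compressed(if_not_compressed);
	return true;
}
```

In this sketch, calling `run_already_compressed(true)` increments the counter, while `run_already_compressed(false)` leaves it unchanged, which is the behaviour the comment describes.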

Member

@svenklemm svenklemm left a comment

LGTM

@JamesGuthrie JamesGuthrie force-pushed the jg/fun-with-compression branch 2 times, most recently from 65aa711 to 51edab2 Compare May 27, 2024 11:27
@JamesGuthrie JamesGuthrie changed the title from "Full coverage of compression activity wal markers" to "Rework compression activity wal markers" May 27, 2024
Comment on lines +6 to +9
#include "guc.h"
#include <replication/message.h>
Contributor

Suggested change
- #include "guc.h"
- #include <replication/message.h>
+ #pragma once
+ #include <postgres.h>
+ #include <replication/message.h>
+ #include "guc.h"

All headers should have #pragma once to make sure they are included only once. Also, all headers should always include "postgres.h" as the first header.

Contributor Author

done

- Adds WAL markers around all compression and decompression activities.
- Renames the GUC controlling this behaviour.
- Enables the WAL marker GUC by default.

This makes it possible to distinguish between "user-driven" and
"compression-driven" DML on uncompressed chunks. This is a requirement
for supporting DML on compressed chunks in live migration.

Note: A previous commit [1] added WAL markers before and after inserts
which were part of "transparent decompression". Transparent
decompression is triggered when an UPDATE or DELETE statement affects
compressed data, or an INSERT statement inserts into a range of
compressed data which has a unique or primary key constraint. In these
cases, the data is first moved from the compressed chunk to the
uncompressed chunk, and then the DML is applied.

This change extends the existing behaviour on two fronts:

1. It adds WAL markers for both chunk compression and decompression
   events.
2. It extends the WAL markers for transparent decompression to cover
   not only INSERTs into the uncompressed chunk, but also the TimescaleDB
   catalog operations which were part of the decompression.

[1]: b5b46a3
@JamesGuthrie JamesGuthrie force-pushed the jg/fun-with-compression branch from 51edab2 to cbad494 Compare May 28, 2024 09:00
@fabriziomello fabriziomello merged commit 3066605 into main May 28, 2024
@fabriziomello fabriziomello deleted the jg/fun-with-compression branch May 28, 2024 22:35
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request May 29, 2024
timescale#6920 introduced a dependency on the Postgres contrib extension
`test_decoding` for TAP tests, but we forgot to include it in the
sanitizer tests.
fabriziomello added a commit that referenced this pull request May 29, 2024
#6920 introduced a dependency on the Postgres contrib extension
`test_decoding` for TAP tests, but we forgot to include it in the
sanitizer tests.
@pallavisontakke pallavisontakke added this to the TimescaleDB 2.16.0 milestone Jul 12, 2024
pallavisontakke added a commit to pallavisontakke/timescaledb that referenced this pull request Jul 18, 2024
This release contains performance improvements and bug fixes since
the 2.15.3 release. We recommend that you upgrade at the next
available opportunity.

**Features**
* timescale#6880: Add support for the array operators used for compressed DML batch filtering.
* timescale#6895: Improve the compressed DML expression pushdown.
* timescale#6897: Add support for replica identity on compressed hypertables.
* timescale#6918: Remove support for PG13.
* timescale#6920: Rework compression activity wal markers.
* timescale#6989: Add support for foreign keys when converting plain tables to hypertables.
* timescale#7020: Add support for the chunk column statistics tracking.
* timescale#7048: Add an index scan for INSERT DML decompression.
* timescale#7075: Reduce decompression on the compressed INSERT.
* timescale#7101: Reduce decompressions for the compressed UPDATE/DELETE.
* timescale#7108: Reduce decompressions for INSERTs with UNIQUE constraints.

**Bugfixes**
* timescale#7018: Fix `search_path` quoting in the compression defaults function.
* timescale#7046: Prevent locking for compressed tuples.
* timescale#7055: Fix the `scankey` for `segment by` columns, where the type `constant` is different to `variable`.
* timescale#7064: Fix the bug in the default `order by` calculation in compression.
* timescale#7069: Fix the index column name usage.
* timescale#7074: Fix the bug in the default `segment by` calculation in compression.

**Thanks**
@pallavisontakke pallavisontakke mentioned this pull request Jul 18, 2024
svenklemm added a commit that referenced this pull request Jul 31, 2024
This release contains significant performance improvements when working with compressed data, extended join
support in continuous aggregates, and the ability to define foreign keys from regular tables towards hypertables.
We recommend that you upgrade at the next available opportunity.

In TimescaleDB v2.16.0 we:

* Introduce multiple performance-focused optimizations for data manipulation operations (DML) over compressed chunks.

  Improved upsert performance by more than 100x in some cases and more than 1000x in some update/delete scenarios.

* Add the ability to define chunk skipping indexes on non-partitioning columns of compressed hypertables

  TimescaleDB v2.16.0 extends chunk exclusion to use those skipping (sparse) indexes when queries filter on the relevant columns,
  and prune chunks that do not include any relevant data for calculating the query response.

* Offer new options for use cases that require foreign keys defined.

  You can now add foreign keys from regular tables towards hypertables. We have also removed
  some really annoying locks in the reverse direction that blocked access to referenced tables
  while compression was running.

* Extend Continuous Aggregates to support more types of analytical queries.

  More types of joins are supported, along with additional equality operators on join clauses
  and joins between multiple regular tables.

**Highlighted features in this release**

* Improved query performance through chunk exclusion on compressed hypertables.

  You can now define chunk skipping indexes on compressed chunks for any column with one of the following
  data types: `smallint`, `int`, `bigint`, `serial`, `bigserial`, `date`, `timestamp`, `timestamptz`.

  After you call `enable_chunk_skipping` on a column, TimescaleDB tracks the min and max values for
  that column. TimescaleDB uses that information to exclude chunks for queries that filter on that
  column, and would not find any data in those chunks.

* Improved upsert performance on compressed hypertables.

  By using index scans to verify constraints during inserts on compressed chunks, TimescaleDB speeds
  up some ON CONFLICT clauses by more than 100x.

* Improved performance of updates, deletes, and inserts on compressed hypertables.

  By filtering data while accessing the compressed data and before decompressing, TimescaleDB has
  improved performance for updates and deletes on all types of compressed chunks, as well as inserts
  into compressed chunks with unique constraints.

  By signaling constraint violations without decompressing, or decompressing only when matching
  records are found in the case of updates, deletes and upserts, TimescaleDB v2.16.0 speeds
  up those operations more than 1000x in some update/delete scenarios, and 10x for upserts.

* You can add foreign keys from regular tables to hypertables, with support for all types of cascading options.
  This is useful for hypertables that partition using sequential IDs, and need to reference those IDs from other tables.

* Lower locking requirements during compression for hypertables with foreign keys

  Advanced foreign key handling removes the need for locking referenced tables when new chunks are compressed.
  DML is no longer blocked on referenced tables while compression runs on a hypertable.

* Improved support for queries on Continuous Aggregates

  `INNER/LEFT` and `LATERAL` joins are now supported. Plus, you can now join with multiple regular tables,
  and you can have more than one equality operator on join clauses.

**PostgreSQL 13 support removal announcement**

Following the deprecation announcement for PostgreSQL 13 in TimescaleDB v2.13,
PostgreSQL 13 is no longer supported in TimescaleDB v2.16.

The currently supported PostgreSQL major versions are 14, 15 and 16.

**Features**
* #6880: Add support for the array operators used for compressed DML batch filtering.
* #6895: Improve the compressed DML expression pushdown.
* #6897: Add support for replica identity on compressed hypertables.
* #6918: Remove support for PG13.
* #6920: Rework compression activity wal markers.
* #6989: Add support for foreign keys when converting plain tables to hypertables.
* #7020: Add support for the chunk column statistics tracking.
* #7048: Add an index scan for INSERT DML decompression.
* #7075: Reduce decompression on the compressed INSERT.
* #7101: Reduce decompressions for the compressed UPDATE/DELETE.
* #7108: Reduce decompressions for INSERTs with UNIQUE constraints.
* #7116: Use DELETE instead of TRUNCATE after compression.
* #7134: Refactor foreign key handling for compressed hypertables.
* #7161: Fix `mergejoin input data is out of order`.

**Bugfixes**
* #6987: Fix REASSIGN OWNED BY for background jobs.
* #7018: Fix `search_path` quoting in the compression defaults function.
* #7046: Prevent locking for compressed tuples.
* #7055: Fix the `scankey` for `segment by` columns, where the type `constant` is different to `variable`.
* #7064: Fix the bug in the default `order by` calculation in compression.
* #7069: Fix the index column name usage.
* #7074: Fix the bug in the default `segment by` calculation in compression.

**Thanks**
* @jledentu For reporting a problem with mergejoin input order