Skip to content

sql: fix sql compaction job full scan#108947

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
j82w:108936
Aug 18, 2023
Merged

sql: fix sql compaction job full scan#108947
craig[bot] merged 1 commit intocockroachdb:masterfrom
j82w:108936

Conversation

@j82w
Copy link
Copy Markdown
Contributor

@j82w j82w commented Aug 17, 2023

The sql-stats-compaction is failing with TransactionRetryError. This is caused by the internal executor uses the zero-values for the settings, rather than the cluster defaults. This causes SET reorder_joins_limit = 0; which then causes the sql-stats-compaction delete statement to do a full scan. The full scan is causing the query to take a long time causing other queries to conflict with it.

Error:
TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict: committed value on key

The fix to use the correct default value instead of 0 is made in #101486.

Solution is to change the query to avoid the join and thus the full scan.

Fixes: #108936

Release note (sql change): Optimized the sql-stats-compaction job delete query to avoid full scan. This helps avoid the TransactionRetryError which can cause the job to fail.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Aug 17, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

The sql-stats-compaction is failing with TransactionRetryError. This
is caused by the internal executor uses the zero-values for the
settings, rather than the cluster defaults. This causes `SET
reorder_joins_limit = 0;` which then causes the `sql-stats-compaction`
delete statement to do a full scan. The full scan is causing the query
to take a long time causing other queries to conflict with it.

Error:
`TransactionRetryWithProtoRefreshError: TransactionRetryError: retry
txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict:
committed value on key`

The fix to use the correct default value instead of 0 is made in
#101486.

Solution is to change the query to avoid the join and thus the full
scan.

Fixes: #108936

Release note (sql change): Optimized the sql-stats-compaction job delete
query to avoid full scan. This helps avoid the TransactionRetryError
which can cause the job to fail.
@j82w j82w marked this pull request as ready for review August 18, 2023 00:09
@j82w j82w requested a review from a team August 18, 2023 00:09
@j82w j82w requested a review from a team as a code owner August 18, 2023 00:09
@j82w j82w requested review from a team and yuzefovich and removed request for a team August 18, 2023 00:09
Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @yuzefovich)

@kevin-v-ngo
Copy link
Copy Markdown

Hi @j82w and @DrewKimball , out of curiosity, is this what's happening?

  1. DELETE performs a full scan
  2. DELETE has it's timestamp pushed due to the timestamp cache
  3. Another write A modifies a row that the DELETE had read (during the full scan)
  4. DELETE fails with 40001

If so, do we know who is write A?

@DrewKimball
Copy link
Copy Markdown
Collaborator

Hi @j82w and @DrewKimball , out of curiosity, is this what's happening?

Swap 3 and 4, and yes. The write should be new statement statistics which are avoided by the constrained scan, but not by the full scan.

@maryliag maryliag added backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only backport-23.1.9-rc labels Aug 18, 2023
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @j82w)


pkg/sql/sqlstats/persistedsqlstats/compaction_exec.go line 287 at r2 (raw file):

		unconstrainedDeleteStmt: `
      DELETE FROM system.statement_statistics
      WHERE (aggregated_ts, fingerprint_id, transaction_fingerprint_id, plan_hash, app_name, node_id) IN (

Any idea why we had this self-join in the first place? On a quick glance the queries are equivalent.

@j82w
Copy link
Copy Markdown
Contributor Author

j82w commented Aug 18, 2023

pkg/sql/sqlstats/persistedsqlstats/compaction_exec.go line 287 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Any idea why we had this self-join in the first place? On a quick glance the queries are equivalent.

Not sure. It was added in the following PR, but I don't see a reason for it.
#80341

@j82w
Copy link
Copy Markdown
Contributor Author

j82w commented Aug 18, 2023

@DrewKimball can a SELECT that reads a row while the DELETE transaction is in progress also cause the DELETE to fail with 40001 or is it only if the row is modified?

@j82w
Copy link
Copy Markdown
Contributor Author

j82w commented Aug 18, 2023

pkg/sql/sqlstats/persistedsqlstats/compaction_exec.go line 287 at r2 (raw file):

Previously, j82w (Jake) wrote…

Not sure. It was added in the following PR, but I don't see a reason for it.
#80341

The queries should be identical. We are just avoiding the overhead of the join that is causing a scan.

@j82w
Copy link
Copy Markdown
Contributor Author

j82w commented Aug 18, 2023

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 18, 2023

Build succeeded:

@craig craig bot merged commit efa3cfa into cockroachdb:master Aug 18, 2023
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Aug 18, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 91f452f to blathers/backport-release-23.1-108947: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from 91f452f to blathers/backport-release-23.1.9-rc-108947: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.9-rc failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@DrewKimball
Copy link
Copy Markdown
Collaborator

@DrewKimball can a SELECT that reads a row while the DELETE transaction is in progress also cause the DELETE to fail with 40001 or is it only if the row is modified?

There has to be a concurrent update.

@j82w
Copy link
Copy Markdown
Contributor Author

j82w commented Aug 18, 2023

blathers backport 22.2

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Aug 18, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 91f452f to blathers/backport-release-22.2-108947: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql-stats-compaction: failing with TransactionRetryError

6 participants