admission: investigate TPC-E online index creation problem #85641

@sumeerbhola

Motivation: experiments run by @nvanbenschoten, described in https://cockroachlabs.slack.com/archives/C038JEXC5AT/p1658247509643359?thread_ts=1657630075.576439&cid=C038JEXC5AT and https://docs.google.com/document/d/1wzkBXaA3Ap_daMV1oY1AhQqlnAjO3pIVLZTXY53m0Xk/edit#heading=h.fhho371lyula
@nvanbenschoten has packaged this in a roachtest (#85002).

Summary: Admission control (AC) kept read amplification in check, but foreground throughput and latency were severely affected. Our working assumption is that the rate at which foreground work is admitted has dropped.

  • We expect some improvement from pending PRs for better byte token estimation and for tracking follower writes that bypass admission control. We should wait for them to merge.
  • To diagnose this, we need to add admission latency and admission rate metrics with a label per priority (see "admission: additional observability" #82743).
  • We may still see an isolation failure due to follower writes bypassing admission control, so we should set admission.kv.pause_replication_io_threshold = 0.8.

@irfansharif @nvanbenschoten

Jira issue: CRDB-18354

Labels: A-admission-control, C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception)
