Intensify "xxx_one_in"'s default value in crash test by hx235 · Pull Request #12127 · facebook/rocksdb

hx235 · 2023-12-07T20:17:36Z

Context/Summary:
My experimental stress runs with more frequent "xxx_one_in" operations surfaced a couple of interesting bugs/issues in RocksDB and in the crash test framework in the past. We are now considering changing the default values so these operations run more frequently in the production testing environment.

Increase frequency by 2 orders of magnitude for most parameters, except for error-prone features, e.g., manual compaction and file ingestion (increased by 3 orders), and expensive features, e.g., checksum verification (increased by 1 order).
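To make the scaling above concrete, here is a minimal sketch, assuming that an "xxx_one_in" value of N means the operation runs on roughly 1 in N iterations and that 0 disables it. The parameter names and baseline values are hypothetical and chosen only for illustration; they are not the values changed in this PR.

```python
import random

# Hypothetical baseline values, for illustration only (not the PR's values).
baseline = {
    "flush_one_in": 1_000_000,
    "compact_range_one_in": 1_000_000,
    "verify_checksum_one_in": 1_000_000,
}

# Orders of magnitude by which each frequency is raised, per the description.
orders = {
    "flush_one_in": 2,            # most parameters
    "compact_range_one_in": 3,    # error-prone feature (manual compaction)
    "verify_checksum_one_in": 1,  # expensive feature (checksum verification)
}


def intensify(one_in: int, order: int) -> int:
    """Raise the frequency by `order` orders of magnitude; never go below 1."""
    return max(1, one_in // 10 ** order)


def should_run(one_in: int, rng: random.Random) -> bool:
    """Assumed gate: run on about 1 in `one_in` iterations; 0 disables."""
    return one_in > 0 and rng.randrange(one_in) == 0


intensified = {name: intensify(v, orders[name]) for name, v in baseline.items()}
```

A smaller "one_in" value means a more frequent operation, so dividing by 10 raises the frequency by one order of magnitude.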

Test:
Monitor CI to see whether it surfaces more interesting bugs/issues. If not, we may consider intensifying even more.

facebook-github-bot · 2023-12-07T20:18:01Z

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pdillinger

While I do generally believe this biases the bug finding toward places where bugs are most common, I believe it also widens some areas that could go under-stressed. In particular:

- The high occurrence of IO-heavy or DB mutex-heavy operations reduces stress on the fast write and read paths. With HCC, both write path and read path have lock-free algorithms for which stress test is our best defense against regression bugs.
- Consistently high flush rates could mask issues that only show up with long-lived memtables or large SST files, etc.

Perhaps one way you could think about it is that there could be a higher-level random decision about where each run should be on a spectrum between "let the read and write paths flow as freely as possible" and "throw as many wrenches and curveballs into smooth DB operation as possible" and you could derive these other parameters from that one.

In other other words, there is risk in being too consistent with our randomness. So adding some higher-level randomness to the behavioral parameters could reproduce a larger suite of stress conditions. IMHO.
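One way to read the suggestion above is sketched below, under stated assumptions: a single per-run `disruption` level in [0, 1] is drawn first, and each "one_in" parameter is derived from it by interpolating on a log scale between a calm value and an intense value. The names `RANGES`, `derive_params`, and `pick_params`, and all numeric values, are hypothetical and not part of the crash test script.

```python
import random

# Hypothetical (calm, intense) pairs, for illustration only.
RANGES = {
    "flush_one_in": (1_000_000, 1_000),
    "compact_range_one_in": (1_000_000, 1_000),
}


def derive_params(disruption: float) -> dict:
    """Interpolate each one_in on a log scale: 0.0 is calm, 1.0 is intense."""
    params = {}
    for name, (calm, intense) in RANGES.items():
        params[name] = round(calm * (intense / calm) ** disruption)
    return params


def pick_params(rng: random.Random) -> dict:
    """Draw one higher-level random decision and derive the rest from it."""
    disruption = rng.random()
    return {"disruption": disruption, **derive_params(disruption)}
```

Because every run draws its own `disruption`, some runs leave the read and write paths mostly undisturbed while others inject frequent flushes and compactions.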

pdillinger

Still overall a better config than before IMHO

hx235 · 2023-12-07T21:46:32Z

> In other other words, there is risk in being too consistent with our randomness. So adding some higher-level randomness to the behavioral parameters could reproduce a larger suite of stress conditions. IMHO.

Yes - good idea. Let me do that as a follow-up. An immediate follow-up could be `lambda: random.choice()` between the original value and the intensified value, as @akankshamahajan15 suggested.

akankshamahajan15 · 2023-12-07T21:50:10Z

> Yes - good idea. Let me do that as a follow up. An immediate follow-up could be lambda: random.choice() between 0 and intensified value as @akankshamahajan15 suggested.

I meant `lambda: random.choice([current_value, intensified_value])`. That way it only selects either the current value or the intensified value.
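A minimal sketch of that suggestion, assuming parameters live in a dict whose values are either constants or zero-argument callables resolved once per run. The `finalize` helper, the parameter names, and the numeric values are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical values, for illustration only.
current_value = 1_000_000
intensified_value = 10_000

params = {
    # Each run picks either the original or the intensified frequency.
    "flush_one_in": lambda: random.choice([current_value, intensified_value]),
    # Plain (non-callable) values pass through unchanged.
    "max_key": 100_000,
}


def finalize(params: dict) -> dict:
    """Resolve callables once so each run gets one concrete value per key."""
    return {k: (v() if callable(v) else v) for k, v in params.items()}


resolved = finalize(params)
```

Choosing between exactly two values keeps the original, calmer configuration in the mix roughly half the time instead of replacing it outright.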

facebook-github-bot · 2023-12-07T21:56:53Z

@hx235 has updated the pull request. You must reimport the pull request before landing.

hx235 · 2023-12-07T21:57:17Z

> > Yes - good idea. Let me do that as a follow up. An immediate follow-up could be lambda: random.choice() between 0 and intensified value as @akankshamahajan15 suggested.
>
> I meant lambda: random.choice([current_value, intensified_value]) That way it only selects either the current value or the intensified value.

Yes, sorry. I mistyped it and meant to say the original value. Edited it.

facebook-github-bot · 2023-12-07T21:57:40Z

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pdillinger · 2023-12-07T23:40:09Z

> I meant lambda: random.choice([current_value, intensified_value]) That way it only selects either the current value or the intensified value.

A good suggestion to reduce loss of coverage on the "let the read and write paths flow as freely as possible" side of the spectrum!

My speculation that there's value in covering even further on that side is... just speculation.

facebook-github-bot · 2023-12-08T18:29:19Z

@hx235 merged this pull request in 179d2c7.

Summary:

**Context/Summary:**
Continued from #12127, we can randomly reduce the # max key to coerce more operations on the same key. My experimental run shows it surfaced more issues than just #12127. I also randomly reduce the related parameters, write buffer size and target file base, to adapt to the randomly lowered # max key. This creates 4 situations of testing, 3 of which are new:

1. **high** # max key with **high** write buffer size and target file base (existing)
2. **high** # max key with **low** write buffer size and target file base (new, will go through some rehearsal testing to ensure we don't run out of space with many files)
3. **low** # max key with **high** write buffer size and target file base (new, keys will stay in memory longer)
4. **low** # max key with **low** write buffer size and target file base (new, experimental runs show it surfaced even more issues)

Pull Request resolved: #12148

Test Plan:
- [Ongoing] Rehearsal stress test
- Monitor production stress test

Reviewed By: jaykorean

Differential Revision: D52174980

Pulled By: hx235

fbshipit-source-id: bd5e11280826819ca9314c69bbbf05d481c6d105
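The four situations in the commit summary above can be sketched as two independent high/low choices, assuming write buffer size and target file base move together as the enumeration implies. All names and numeric values below are hypothetical and are not taken from #12148.

```python
import itertools
import random

# Hypothetical high/low values, for illustration only.
MAX_KEY = {"high": 25_000_000, "low": 100_000}
WRITE_BUFFER_SIZE = {"high": 32 * 1024 * 1024, "low": 1024 * 1024}
TARGET_FILE_SIZE_BASE = {"high": 16 * 1024 * 1024, "low": 512 * 1024}

LEVELS = ["high", "low"]

# (key level, size level) pairs: 2 x 2 = 4 situations.
ALL_SITUATIONS = list(itertools.product(LEVELS, repeat=2))


def pick_config(rng: random.Random) -> dict:
    """Pick the key-space level and the size level independently."""
    key_level = rng.choice(LEVELS)
    size_level = rng.choice(LEVELS)  # buffer and file base share one level
    return {
        "max_key": MAX_KEY[key_level],
        "write_buffer_size": WRITE_BUFFER_SIZE[size_level],
        "target_file_size_base": TARGET_FILE_SIZE_BASE[size_level],
    }
```

With independent choices, each of the four situations is drawn with equal probability, so the existing high/high configuration still appears in about a quarter of runs.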

hx235 requested a review from ajkr December 7, 2023 20:17

facebook-github-bot added the CLA Signed label Dec 7, 2023

pdillinger reviewed Dec 7, 2023

pdillinger approved these changes Dec 7, 2023

Optionally intensify some crash test parameter

c20a9cb

hx235 force-pushed the aggresive_stress_test_value branch from b65af52 to c20a9cb Compare December 7, 2023 21:56

facebook-github-bot closed this in 179d2c7 Dec 8, 2023

facebook-github-bot added the Merged label Dec 8, 2023

hx235 mentioned this pull request Dec 14, 2023

Intensify operations on same key in crash test #12148

Closed

hx235 mentioned this pull request Jan 8, 2024

[CI only]Aggressive crash test value #10761

Closed

hx235 wants to merge 1 commit into facebook:main from hx235:aggresive_stress_test_value
