Fix flaky test case for absolute TTL replication #9069

ny0312 · 2021-06-10T21:54:25Z

Fix flaky test case reported here: #8474

The root cause is that one test (`5 keys in, 5 keys out`) leaks a volatile key that can expire while a later test (`All TTL in commands are propagated as absolute timestamp in replication stream`) is running. The expiration of the leaked key injects an unexpected `DEL` command into the replication stream during the later test, causing it to fail.

The fix is twofold:

1. Plug the leak in the first test.
2. Add FLUSHALL to the later test, to avoid future interference from other tests.
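
To make the failure mode concrete, here is a minimal sketch in Python. It is a toy model, not Redis and not the actual test code (the Redis test suite is written in Tcl). `ToyServer`, `leaky_earlier_test`, and `run_later_test` are hypothetical names invented for illustration, and the simulated clock replaces real time so the outcome is deterministic.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a primary (not real Redis)."""

    def __init__(self):
        self.now_ms = 0      # simulated clock, in milliseconds
        self.data = {}       # key -> value
        self.expires = {}    # key -> absolute expiry time in ms
        self.stream = None   # replication stream, once a reader attaches

    def attach_replication_stream(self):
        # Only commands propagated after this call are recorded.
        self.stream = []
        return self.stream

    def _propagate(self, *cmd):
        if self.stream is not None:
            self.stream.append(cmd)

    def set(self, key, value, px=None):
        self.data[key] = value
        if px is None:
            self.expires.pop(key, None)
            self._propagate("SET", key, value)
        else:
            deadline = self.now_ms + px
            self.expires[key] = deadline
            # The toy propagates a relative TTL as an absolute timestamp.
            self._propagate("SET", key, value, "PXAT", deadline)

    def flushall(self):
        self.data.clear()
        self.expires.clear()
        self._propagate("FLUSHALL")

    def advance(self, ms):
        # Move the clock forward and expire keys, propagating a DEL for each.
        self.now_ms += ms
        for key in [k for k, t in self.expires.items() if t <= self.now_ms]:
            del self.data[key]
            del self.expires[key]
            self._propagate("DEL", key)


def leaky_earlier_test(server):
    # Leaves a volatile key behind instead of cleaning it up.
    server.set("leaked", "v", px=100)


def run_later_test(server, flush_first):
    if flush_first:
        server.flushall()          # clean slate before attaching
    stream = server.attach_replication_stream()
    server.set("foo", "bar", px=10_000)
    server.advance(150)            # time passes while the test runs
    return stream
```

In this model, running `leaky_earlier_test` and then `run_later_test(server, flush_first=False)` yields a stream containing an extra `("DEL", "leaked")` entry, while `flush_first=True` yields only the `SET` the later test issued. In the toy the flush happens before the reader attaches, so it does not appear in the recorded stream.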

madolson · 2021-06-10T23:25:38Z

@ny0312 I think it would be okay to merge it now so we can start watching the nightly runs. It's Oran's weekend now.

eliblight

Nice

oranagra

Good catch!!!
Isn't it sufficient to just add a FLUSHALL to that test before attaching to the replication stream,
i.e. instead of this massive LOC change?

ny0312 · 2021-06-11T18:51:18Z

> Good catch!!!
> Isn't it sufficient to just add a FLUSHALL to that test before attaching to the replication stream,
> i.e. instead of this massive LOC change?

That would also be sufficient. It depends on which convention we want to establish when it comes to isolating a test case from all side effects of other test cases:

1. Use a dedicated server, or
2. Do a "clean up" on the same server. Such "clean up" may include FLUSHALL and other things, for example, reverting/resetting configurations that have been set in earlier tests.

Personally I would prefer (1). It is slower, but it is explicit, simple and clean. It frees the author of a new test from having to understand the side effects of all preceding tests in order to properly "clean up" after them. I would encourage all contributors to follow this convention.

oranagra · 2021-06-11T19:27:52Z

Spinning up a dedicated server for each test would slow down the test suite a lot.
We obviously do that for tests that cause crashes, or tests that are really very fragile, where any slight change can break them.
For other tests it is sufficient to do FLUSHALL.
There's also another concern, which is configuration: here we usually prefer that tests restore the original values of any configs they changed.
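
As an illustration of the "restore what you changed" convention, here is a minimal Python sketch. It is not how the Redis test suite does it (that suite is written in Tcl); `restored_config` and `FakeClient` are hypothetical names, and the client is only assumed to expose `config_get(name)` returning a dict and `config_set(name, value)`, modelled on the redis-py client.

```python
from contextlib import contextmanager


@contextmanager
def restored_config(client, overrides):
    """Apply config overrides for one test, then put the old values back."""
    # Remember the current value of every parameter we are about to change.
    original = {name: client.config_get(name)[name] for name in overrides}
    try:
        for name, value in overrides.items():
            client.config_set(name, value)
        yield client
    finally:
        # Runs even if the test body raises, so later tests see the
        # configuration they expect.
        for name, value in original.items():
            client.config_set(name, value)


class FakeClient:
    """Hypothetical stand-in so the sketch runs without a server."""

    def __init__(self, **config):
        self._config = dict(config)

    def config_get(self, name):
        return {name: self._config[name]}

    def config_set(self, name, value):
        self._config[name] = value
```

A test body would then run inside `with restored_config(client, {"appendonly": "yes"}):` and the previous value would be written back on exit, whether the test passed or failed.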

ny0312 · 2021-06-11T20:25:10Z

> Spinning up a dedicated server for each test would slow down the test suite a lot.
> We obviously do that for tests that cause crashes, or tests that are really very fragile, where any slight change can break them.
> For other tests it is sufficient to do FLUSHALL.
> There's also another concern, which is configuration: here we usually prefer that tests restore the original values of any configs they changed.

So your preferred convention is "re-use servers across test cases unless it is impossible or too hard to make work".

I agree that this approach would give us the fastest tests and therefore the shortest feedback cycle. It would be more friendly to contributors of Redis, which is very important.

I will change to FLUSHALL if there are no other objections.

ny0312 · 2021-06-13T03:42:42Z

Updated. Please review. @madolson @oranagra

ny0312 mentioned this pull request Jun 10, 2021

Always replicate TTLs as absolute timestamps in milliseconds #8474

Merged

madolson previously approved these changes Jun 10, 2021

eliblight previously approved these changes Jun 11, 2021

oranagra reviewed Jun 11, 2021

Fix flaky test case for absolute TTL replication

efe45cf

ny0312 dismissed stale reviews from eliblight and madolson via efe45cf June 13, 2021 03:34

ny0312 force-pushed the fix-absolute-ttl-test branch from e751c0f to efe45cf Compare June 13, 2021 03:34

oranagra approved these changes Jun 13, 2021

oranagra merged commit fb140a1 into redis:unstable Jun 13, 2021

ny0312 deleted the fix-absolute-ttl-test branch July 2, 2021 17:28
