Skip to content

Conversation

@enjoy-binbin
Copy link
Contributor

@enjoy-binbin enjoy-binbin commented Feb 12, 2022

Added regression tests for #10020 / #10081 / #10243.
The above PRs fixed some crashes due to an asserting,
see function clientHasPendingReplies (introduced in #9166).

This commit added some tests to cover the above scenario.
These tests will all fail in #9166, althought fixed not,
there is value in adding these tests to cover and verify
the changes. And it also can cover #8868 (verify the logs).

Other changes:

  1. Reduces the wait time in waitForBgsave and waitForBgrewriteaof
    from 1s to 50ms, which should reduce the time for some tests.
  2. Improve the test infra to print context when assert_match fails.
  3. Improve the test infra to print $error when assert_error fails.
Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test)

Added regression tests for redis#10020 / redis#10081 / redis#10243.
The above PRs fixed some crashes due to an asserting,
see function `clientHasPendingReplies` (introduced in redis#9166).

This commit added some tests to cover the above scenario.
These tests will all fail in redis#9166, althought fixed not,
there is value in adding these tests to cover and verify
the changes. And it also can cover redis#8868 (verify the logs).

Other changes: reduces the wait time in `waitForBgsave` and
`waitForBgrewriteaof` from 1s to 50ms, which should reduce
the time for some tests.
1. sometimes assert_error will fails, use catch instead, more logs.
2. improve the test infra to print $error when assert_error fails.
```
Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test)
```
@enjoy-binbin
Copy link
Contributor Author

Tested locally(centos + sanitizer + valgrind) many times (100+) all passed.
github action full ci: https://github.com/enjoy-binbin/redis/actions/runs/1832482372 (Several attempts all passed, failed tests are not these)
so I guess there should be no sort of timing issues
those cleanups is IDE work, (the space doesn't break the blame log)

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure there's a high value in adding regression tests for these bugs, but i suppose there's no harm, as long as:

  1. the tests make it clear what they come to check.
  2. they're relatively quick (take about a second or two, combined).

so bottom line, i guess we'll merge it.
thanks.

@oranagra oranagra merged commit 62c8be2 into redis:unstable Feb 13, 2022
@enjoy-binbin enjoy-binbin deleted the regression_test_for_sync_psync_crash branch February 13, 2022 07:53
assert_error {ERR value is not an integer or out of range} {r psync replicationid offset_str}
set logs [exec tail -n 100 < [srv 0 stdout]]
assert_match {*Replica * asks for synchronization but with a wrong offset} $logs
assert_equal "PONG" [r ping]
Copy link
Contributor Author

@enjoy-binbin enjoy-binbin Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression that I have seen it fail once or twice (fails here, r ping):
today daily: https://github.com/redis/redis/runs/7511033997?check_suite_focus=true#step:7:4183

[exception]: Executing test client: MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error..
MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
    while executing
"[srv $level "client"] {*}$args"
    (procedure "r" line 7)
    invoked from within
"r ping"

i see PING will also check writeCommandsDeniedByDiskError, so i should remove the [r ping] or do a SAVE or ignore stop-writes-on-bgsave-error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was the purpose for which we added PING there? is it to check that the server is alive? maybe change it to INFO?
can you explain why it'll sometimes fail and sometimes pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it to check that the server is alive?

yes, change it to INFO can do the work, i will update it.

can you explain why it'll sometimes fail and sometimes pass?

i suppose the reason is: in the tests before this test, bgsave was triggered,
but in order to end the test faster, i do a kill -9, so lastbgsave_status became C_ERR.
and then in this test, PING got rejected because writeCommandsDeniedByDiskError

    /* Don't accept write commands if there are problems persisting on disk
     * unless coming from our master, in which case check the replica ignore
     * disk write error config to either log or crash. */
    int deny_write_type = writeCommandsDeniedByDiskError();
    if (deny_write_type != DISK_ERROR_TYPE_NONE &&
        (is_write_command || c->cmd->proc == pingCommand))
    {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh, you mean that the kill is sometimes successful and sometimes already too late.
i.e. it's just there for fast termination..
ok, so maybe it's better to normalize the state and trigger a successful SAVE when that test terminates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, do it in #11043

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants