Dial timeout error is not redirectable

Issue tracker is used for reporting bugs and discussing new features. Please use
[stackoverflow](https://stackoverflow.com) for supporting issues.

The MaxRedirects mechanism does not work if a dial tcp: i/o timeout error is received

## Expected Behavior

When using ClusterClient, it seems to me that a strange behavior occurs. When accessing a server that is unavailable, I get the error dial tcp: I/o timeout. I expect the library to use the MaxRedirects mechanism, repeat the request, and then, in case of an error, redirect the request to another slave.

## Current Behavior

However, now the shouldRetry function returns false in case of a dial tcp: i/o timeout error and the request is not repeated.

The possible problem is here
```go
// error.go
func isTimeoutError(err error) (isTimeout bool, hasTimeoutFlag bool) {
	// Check for timeoutError interface (works with wrapped errors)
	var te timeoutError
	if errors.As(err, &te) {
		return true, te.Timeout()
	}

	// Check for net.Error specifically (common case for network timeouts)
	var netErr net.Error
	if errors.As(err, &netErr) {
		return true, netErr.Timeout()
	}

	return false, false
}

func shouldRetry(err error, retryTimeout bool) bool {
	...

	// Check for timeout errors (works with wrapped errors)
	if isTimeout, hasTimeoutFlag := isTimeoutError(err); isTimeout {
		if hasTimeoutFlag {
			return retryTimeout
		}
		return true
	}

//osscluster.go
func (c *ClusterClient) process(ctx context.Context, cmd Cmder) error {
...
if shouldRetry(lastErr, cmd.readTimeout() == nil) && !cmd.NoRetry() {
			// First retry the same node.
			if attempt == 0 {
				continue
			}

			// Second try another node.
			node.MarkAsFailing()
			node = nil
			continue
		}
...
}
```

So when you have readTimeout you always will get false on dial tcp errors and request will not be repeated because this function from net package will always return true
```go
func (e *OpError) Timeout() bool {
	if ne, ok := e.Err.(*os.SyscallError); ok {
		t, ok := ne.Err.(timeout)
		return ok && t.Timeout()
	}
	t, ok := e.Err.(timeout)
	return ok && t.Timeout()
}
```

## Possible Solution

Add an additional check for the dial tcp: i/o timeout error to the shouldRetry method and repeat it on one and then on the other node.

As a fix I can

```go
// errors.go

// isDialError reports whether err is a connection-establishment error,
// meaning the TCP dial failed before any data was sent to the server.
func isDialError(err error) bool {
	var opErr *net.OpError
	return errors.As(err, &opErr) && opErr.Op == "dial"
}

func shouldRetry(err error, retryTimeout bool) bool {
	...

	// Check for timeout errors (works with wrapped errors)
	if isTimeout, hasTimeoutFlag := isTimeoutError(err); isTimeout {
		if hasTimeoutFlag {
			// A dial error means the TCP connection was never established and the
			// command was never sent to the server, so retry is always safe
			if isDialError(err) {
				return true
			}
			return retryTimeout
		}
		return true
        ...
}
```

## Steps to Reproduce

Use a redis cluster, make key requests from certain slaves, and disable one of them so that requests start falling by error dial tcp: i/o timeout

## Context (Environment)

In the case of updating redis machines or in the case when one slave crashes for some reason, I expect that the MaxRedirects mechanism will turn on, requests will be repeated first to the same slave, then to another (healthy) slave. But in the case of a dial tcp: i/o timeout error, this does not happen and the code has to go to the database, thereby increasing the load on it and reducing the overall response time. Although the dial tcp: i/o timeout error seems completely safe to repeat.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dial timeout error is not redirectable #3786

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce

Context (Environment)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Dial timeout error is not redirectable #3786

Description

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce

Context (Environment)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions