
Timeout error on query if timeout=1 in a distributed deployment #5463

@mautini


Current Behavior

When sending a recommendation query to Qdrant, if the collection is distributed (across multiple shards) and the timeout is set to 1 second, Qdrant may return a timeout error while distributing the request to the shards, even if the request itself takes less than 1 second to execute. Without a timeout specified, or with timeout=2, the request succeeds.

Steps to Reproduce

  1. Create a Qdrant cluster with multiple nodes using the official Helm Chart.
  2. Create a collection with multiple shards.
  3. Execute a simple recommend query like the one below (with timeout=1):
POST /collections/<collection_name>/points/query?timeout=1
{
  "query": {
    "recommend": {
      "positive": [
        "c026bde1-8399-488d-954a-c777a42ab74a"
      ]
    }
  },
  "with_payload": false,
  "with_vector": false
}
  4. Qdrant outputs the following error:
{
  "error": "Service internal error: 2 of 2 read operations failed:\n  Timeout error: Deadline Exceeded: status: DeadlineExceeded, message: \"Timeout: Timeout error: Operation 'Search' timed out after 0 seconds\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Mon, 18 Nov 2024 17:28:13 GMT\", \"content-length\": \"0\"} }\n  Timeout error: Deadline Exceeded: status: DeadlineExceeded, message: \"Timeout: Timeout error: Operation 'Search' timed out after 0 seconds\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Mon, 18 Nov 2024 17:28:13 GMT\", \"content-length\": \"0\"} }"
}
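For convenience, the request from step 3 can also be built from Python. This is only a minimal sketch using the standard library: the base URL (`http://localhost:6333`, Qdrant's default REST port) and the collection name `my_collection` are assumptions to be replaced with real values, and `build_query_request` / `send` are hypothetical helper names, not part of any Qdrant client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6333"  # assumption: default REST port, adjust to your cluster
COLLECTION = "my_collection"        # placeholder for <collection_name>
POINT_ID = "c026bde1-8399-488d-954a-c777a42ab74a"  # point ID from the report


def build_query_request(base_url, collection, point_id, timeout_s=None):
    """Build the POST request from step 3; timeout_s=None omits the parameter."""
    url = f"{base_url}/collections/{collection}/points/query"
    if timeout_s is not None:
        url += f"?timeout={timeout_s}"
    body = {
        "query": {"recommend": {"positive": [point_id]}},
        "with_payload": False,
        "with_vector": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply.

    urllib raises urllib.error.HTTPError for non-2xx responses, so the
    error described in this report would surface as an exception here.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_query_request(BASE_URL, COLLECTION, POINT_ID, timeout_s=1)
print(request.get_method(), request.full_url)
```

Calling `send(request)` against a multi-shard collection should show the failure, while building the request with `timeout_s=2` or `timeout_s=None` should succeed, matching the behavior described above.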

Expected Behavior

The request should return a result without any errors. (Running the same request without a timeout set returns a response in less than 1 second.)

Possible Solution

The issue seems to stem from the timeout settings between nodes (shards), as indicated by the message timed out after 0 seconds. This might be due to subtracting 1 (an integer) instead of a float (less than 1). The issue could be related to the following commits:
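To make the hypothesis concrete (this is an illustration, not one of the referenced commits and not Qdrant's actual code): if the time already spent is subtracted from the timeout and the remainder is then truncated to whole seconds before being forwarded to the shards, a 1-second timeout becomes 0 while a 2-second timeout becomes 1. The function names and the 5 ms elapsed time below are assumptions chosen for the example.

```python
def remaining_whole_seconds(timeout_s: int, elapsed_s: float) -> int:
    """Hypothetical buggy variant: remaining budget truncated to whole seconds."""
    remaining = max(timeout_s - elapsed_s, 0.0)
    return int(remaining)  # int() truncates toward zero, so 0.995 becomes 0


def remaining_fractional_seconds(timeout_s: int, elapsed_s: float) -> float:
    """Hypothetical fixed variant: the sub-second part of the budget is kept."""
    return max(timeout_s - elapsed_s, 0.0)


ELAPSED_S = 0.005  # assumed time spent before the request is forwarded

for timeout in (1, 2):
    print(
        f"timeout={timeout}",
        f"truncated={remaining_whole_seconds(timeout, ELAPSED_S)}",
        f"fractional={remaining_fractional_seconds(timeout, ELAPSED_S)}",
    )
```

Under these assumptions the truncated value is 0 for timeout=1 and 1 for timeout=2, which would be consistent with both the "timed out after 0 seconds" message and the observation that timeout=2 succeeds.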

Context (Environment)

  • Deployment: Qdrant deployed in a private Kubernetes cluster using the official Helm Chart.
  • Affected Versions: At least Qdrant 1.11.5 and 1.12.3. Version 1.10.1 worked fine.
  • Reproducibility: The issue is not related to any specific client, as it can be reproduced using Python or curl.
  • Scope: The issue occurs only with distributed clusters (multiple shards).

cc @evelynegroen
