RW - wrong status code for storage.ErrTooOldSample (out of order time window feature) #13334

@nmdanny

Description

What did you do?

The remote write spec says that a remote write (RW) sender must retry a request when it receives a 5xx status code.

When I use the experimental out-of-order window feature and try to ingest a sample older than the window,
I receive "500 Internal Server Error: too old sample".

Assuming there is no server misconfiguration or time sync issue, this error is truly non-recoverable, so retrying it makes no sense. Such a scenario is likely when there can be significant lag between the RW sender and the receiver, e.g., when a message queue sits between them.
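To illustrate why the status code matters, here is a minimal sketch of how a sender following the "retry on 5xx" rule would classify responses. `shouldRetry` is a hypothetical helper, not code from Prometheus or from my service, and it deliberately ignores any other status codes the spec may treat specially.

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldRetry is a hypothetical helper. It reports whether a sender that
// only applies the "retry on 5xx" rule would resend the request.
func shouldRetry(statusCode int) bool {
	return statusCode >= 500 && statusCode <= 599
}

func main() {
	// 500 is what the receiver returns today for a too-old sample;
	// 400 is what this issue suggests it should return.
	for _, code := range []int{http.StatusInternalServerError, http.StatusBadRequest} {
		fmt.Printf("%d %s: retry=%v\n", code, http.StatusText(code), shouldRetry(code))
	}
}
```

With a 500, a compliant sender keeps resending a sample that can never be accepted; with a 400, it drops the request and moves on.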

The following lines (and perhaps others) look like they should be adjusted to handle storage.ErrTooOldSample as well:

https://github.com/prometheus/prometheus/blob/6b8e9453881bffe5fe3aec99a6f3462676a95489/storage/remote/write_handler.go#L71C1-L74C9

What did you expect to see?

No response

What did you see instead? Under which circumstances?

Sending RW requests (from a Go service) with samples older than the out-of-order window yields status code 500 (it should probably be 400).
Prometheus 2.45.1

System information

No response

Prometheus version

No response

Prometheus configuration file

No response

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

No response
