Skip to content

Alert API Values as Strings#5303

Closed
dtrejod wants to merge 4 commits intoprometheus:masterfrom
dtrejod:dt/nan-inf-alert-api
Closed

Alert API Values as Strings#5303
dtrejod wants to merge 4 commits intoprometheus:masterfrom
dtrejod:dt/nan-inf-alert-api

Conversation

@dtrejod
Copy link
Contributor

@dtrejod dtrejod commented Mar 4, 2019

This patch changes the api/v1/alert endpoint to return string alert values mimicking the api/v1/query return values. The change ensures null, and +-Inf values can be properly parsed.

We are using the client_golang library in our applications to interact with the api/v1/alert endpoint. We've recently hit an issue where the library failed to parse the return values from a +Inf alert value. We traced the problem back to the alert api not using string return values. We noticed NaN values are also affected.

Problem
Create an alert with a +Inf return value; for example count(up)/0 != 0. When curling the api endpoint you can see the returned value is as follows:

$ curl -s http://$(hostname)/prometheus/api/v1/rules |jq '.data.groups[] |select(.name == "breaking")'
{
  "name": "breaking",
  "file": "rules.yml",
  "rules": [
    {
      "name": "InfRule",
      "query": "count(up) / 0 != 0",
      "duration": 30,
      "labels": {
        "severy": "p2"
      },
      "annotations": {
        "decription": " This rule should break ",
        "summary": " Breaking +INF rule "
      },
      "alerts": [
        {
          "labels": {
            "alertname": "InfRule",
            "severy": "p2"
          },
          "annotations": {
            "decription": " This rule should break ",
            "summary": " Breaking +INF rule "
          },
          "state": "firing",
          "activeAt": "2019-02-26T12:43:21.527666941-05:00",
          "value": 1.7976931348623157e+308
        }
      ],
      "health": "ok",
      "type": "alerting"
    }
  ],
  "interval": 60
}

Running the same query against the query api results in:

$ curl -s  -g 'http://$(hostname)/prometheus/api/v1/query?query=count(up)/0'|jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1551461908.309,
          "+Inf"
        ]
      }
    ]
  }
}

The resulting error when parsing using the client_golang library is:

bad_response: invalid character '+' looking for beginning of value

Solution
This patch converts the return values for the alert api to strings. Running the same query now results with:

{                                                           
  "name": "breaking",    
  "file": "rules.yml",
  "rules": [
    {                
      "name": "InfRule",
      "query": "count(up) / 0 != 0",
      "duration": 30,
      "labels": {       
        "severity": "p2"                          
      },             
      "annotations": {
        "decription": " This rule should break {{ $labels.value }}",
        "summary": " Breaking +INF rule "
      },              
      "alerts": [                                                  
        {                               
          "labels": {
            "alertname": "InfRule",
            "severity": "p2"
          },         
          "annotations": {         
            "decription": " This rule should break ",
            "summary": " Breaking +INF rule "
          },              
          "state": "firing",                         
          "activeAt": "2019-03-04T09:39:21.527666941-05:00",
          "value": "+Inf"
        }                   
      ],                                                    
      "health": "ok",   
      "type": "alerting"
    },  
    {                
      "name": "NanRule",
      "query": "(count(up) - count(up)) / 0 != 0",
      "duration": 30,
      "labels": {
        "severity": "p2"
      },
      "annotations": {                                             
        "decription": " This rule should break {{ $labels.value}}",
        "summary": " Breaking Nan rule "
      },                           
      "alerts": [           
        {            
          "labels": {              
            "alertname": "NanRule",                  
            "severity": "p2"                 
          },              
          "annotations": {                           
            "decription": " This rule should break ",       
            "summary": " Breaking Nan rule "
          },                
          "state": "firing",                                
          "activeAt": "2019-03-04T09:39:21.527666941-05:00",
          "value": "NaN"
        }
      ],             
      "health": "ok",   
      "type": "alerting"                          
    }                
  ],             
  "interval": 60        
}               

Prometheus Version: 2.6.0

…"INF" and "Nan" special values. Mimics the return values used in api/v1/query

Signed-off-by: Devin Trejo <dtrejo@palantir.com>
@dtrejod dtrejod force-pushed the dt/nan-inf-alert-api branch from 98e97f7 to 86ce89f Compare March 4, 2019 20:56
@simonpasquier
Copy link
Member

Thanks for spotting this! Tracking it down, the culprit is the json-iterator library which doesn't produce 100% compliant data. Another option would be to encode such values as null but I like your proposal better as it is consistent with the query API endpoints.

In general the v1 API is considered stable but we have a disclaimer in the documentation for rules and alerts.

cc @mxinden @brian-brazil

@brian-brazil
Copy link
Contributor

What does the standard Go JSON library do with these?

@simonpasquier
Copy link
Member

simonpasquier commented Mar 8, 2019

It returns an error: json: unsupported value: +Inf

@brian-brazil
Copy link
Contributor

Okay, using the same approach we do for the query apis seems best then. I think the code can be cleaner, we should do whatever we do in the query api.

@dtrejod
Copy link
Contributor Author

dtrejod commented Mar 8, 2019

Willing to rework the PR if need be. It looks like the other Marshalling functions used by the query api are under https://github.com/prometheus/prometheus/blob/master/promql/value.go

Would adding a reusable Value definition there be cleaner?

@brian-brazil
Copy link
Contributor

A Scalar should be enough here.

@dtrejod dtrejod force-pushed the dt/nan-inf-alert-api branch from 91dfb68 to 5422131 Compare March 11, 2019 16:04
dtrejod added 3 commits March 11, 2019 12:17
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
@dtrejod dtrejod force-pushed the dt/nan-inf-alert-api branch from f81c3ec to 1042ec9 Compare March 11, 2019 16:17
@dtrejod
Copy link
Contributor Author

dtrejod commented Mar 11, 2019

Latest commit uses Scalars. Apologies for the mess of commits as I pushed the wrong branch originally.

Let me know if all looks good.

},
"state": "firing",
"value": 1
"value": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not right, it should be the value as a string - not a list with a timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using the Scalar type as suggested which is a timestamp + value. If we are okay with modifying the API to return a different value type I think this approach is the best solution as it mimics what the query API currently returns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're technically allowed to make breaking changes here, however if we wanted the timestamp I think we'd put it in a different field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep is the same as the query API?

$ curl -s  -g 'http://$(hostname)/prometheus/api/v1/query?query=count(up)/0'|jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1551461908.309,
          "+Inf"
        ]
      }
    ]
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd break existing users unnecessarily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I understand. I thought we were moving forward with breaking the API given the clause.

Also, sorry I've been busy with other work. Hopefully I can revisit sometime next week. If someone wants to implement this before me, feel free.

@simonpasquier
Copy link
Member

@dtrejod are you willing to carry on the PR?

@brian-brazil
Copy link
Contributor

Superseded by #5582

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants