Skip to content

Add query log#6520

Merged
brian-brazil merged 10 commits intoprometheus:masterfrom
roidelapluie:slowquery
Jan 8, 2020
Merged

Add query log#6520
brian-brazil merged 10 commits intoprometheus:masterfrom
roidelapluie:slowquery

Conversation

@roidelapluie
Copy link
Member

It is set in the configuration file to be changed online.

Fixes #1315

Signed-off-by: Julien Pivotto roidelapluie@inuits.eu

@roidelapluie roidelapluie force-pushed the slowquery branch 4 times, most recently from 6d755d9 to 0983159 Compare December 25, 2019 16:37
@brian-brazil
Copy link
Contributor

Request logs still don't belong in application logs.

@roidelapluie
Copy link
Member Author

roidelapluie commented Dec 26, 2019

As stated before in this project, we try not to reinvent the wheel.

Given that file rotation is usually a solved problem, I think we can:

  • Not do it ourselve
  • Reopen the log file upon every reload (that would align us with other softwares). That makes us compatible with e.g. logrotate.

@roidelapluie roidelapluie force-pushed the slowquery branch 3 times, most recently from 034b277 to 07e99d1 Compare December 26, 2019 22:52
@brian-brazil
Copy link
Contributor

Reopen the log file upon every reload

That seems like a reasonable approach.

@roidelapluie roidelapluie changed the title Add option to log slow queries Add query log Dec 27, 2019
@roidelapluie roidelapluie marked this pull request as ready for review December 27, 2019 21:26
@roidelapluie roidelapluie force-pushed the slowquery branch 2 times, most recently from 1a15436 to 9ff502d Compare December 27, 2019 21:41
@roidelapluie
Copy link
Member Author

  • Updated to remove the "slow" part ; now we will log all queries independently of their duration.
  • Added tests to check that we reopen the files and keep them open in case of reload failure
  • Issue a warning if we can't close previous file (that does not make reload fail)
  • Added comments in code

@roidelapluie
Copy link
Member Author

I have also added the query log format to the list of things considered unstable so we have the ability to make the format evolve in the future.

@roidelapluie
Copy link
Member Author

roidelapluie commented Dec 27, 2019

Further questions @brian-brazil:

Should we add a metric query_log_enabled 1 or 0 ? (could be done as a separated PR)

Should we log as INFO in the application log the slow query log activation / deactivation / filepath change ? (could be done as a separated PR)

@brian-brazil
Copy link
Contributor

Should we log as INFO in the application log the slow query log activation / deactivation / filepath change ?

Sounds more debug to me offhand.

Should we add a metric query_log_enabled 1 or 0 ? (could be done as a separated PR)

Makes sense, this may have performance impact.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the dev summit this should include IP addresses when it's coming from the query API.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Julien Pivotto added 4 commits January 7, 2020 14:17
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@roidelapluie
Copy link
Member Author

roidelapluie commented Jan 7, 2020

Exemple log line:
{"clientIP":"127.0.0.1","end":"2020-01-07T13:54:16.128Z","query":"vector(1)","start":"2020-01-07T13:54:16.128Z","stats":{"timings":{"evalTotalTime":0.000026106,"resultSortTime":0,"queryPreparationTime":0.000012256,"innerEvalTime":0.000007964,"execQueueTime":0.000002085,"execTotalTime":0.000041266}},"step":"0s","ts":"2020-01-07T13:54:16.160Z"}

With error:
{"clientIP":"127.0.0.1","end":"2020-01-07T13:56:19.132Z","error":"found duplicate series for the match group {} on the right hand-side of the operation: [{__name__=\"up\", instance=\"127.0.1.2:9090\", job=\"prometheus\"}, {__name__=\"up\", instance=\"127.0.1.1:9090\", job=\"prometheus\"}];many-to-many matching not allowed: matching labels must be unique on one side","query":"vector(9999) + on () up","start":"2020-01-07T13:56:19.132Z","stats":{"timings":{"evalTotalTime":0.000224988,"resultSortTime":0,"queryPreparationTime":0.000078192,"innerEvalTime":0,"execQueueTime":0.000003185,"execTotalTime":0.000239282}},"step":"0s","ts":"2020-01-07T13:56:19.169Z"}

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@roidelapluie
Copy link
Member Author

Should we use a dedupe logger for the errors? That would imply to add another logger to the engine, specifically for the query log errors.

@brian-brazil
Copy link
Contributor

Dedupe logger loses us line numbers, and disk filling up is fatal to Prometheus unlike remote storage going away.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor style stuff

Julien Pivotto added 2 commits January 8, 2020 14:14
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@brian-brazil brian-brazil merged commit 9d9bc52 into prometheus:master Jan 8, 2020
@brian-brazil
Copy link
Contributor

Thanks!

}

// SetQueryLogger sets the query logger.
func (ng *Engine) SetQueryLogger(l QueryLogger) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just drop this return value? Seems it only returns nil


# File to which PromQL queries are logged.
# Reloading the configuration will reopen the file.
[ query_log_file: <string> ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly we discussed at the devsummit that we should document how to allow people who want to to have this be part of the stdout log stream and have some docs around it. I assume that it's just a matter of setting this to /dev/stdout, I just wonder about stream syncronization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best for other docs. Does CON still exist on Windows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth a complete section? under Configuration? under Querying?

op

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querying is for the PromQL language.

It wouldn't be part of the reference docs at all, this is generic Unix/Windows knowledge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a small post on the blog, with example with logrotate etc ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(blog post would be published after the release)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a blog with examples for logrotate and stdout sounds like a great idea, or a user guide.

@roidelapluie
Copy link
Member Author

roidelapluie commented Jan 9, 2020 via email

@cstyan
Copy link
Member

cstyan commented Feb 7, 2020

/prombench v2.15.2

@prombot
Copy link
Contributor

prombot commented Feb 7, 2020

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-6520 and v2.15.2

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart v2.12.0

@cstyan
Copy link
Member

cstyan commented Feb 7, 2020

/prombench cancel

@prombot
Copy link
Contributor

prombot commented Feb 7, 2020

Benchmark cancel is in progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add option to log slow queries and recording rules

7 participants