Add parser for the unofficial prometheus dump format #8320
ArthurSens wants to merge 2 commits into prometheus:main from ArthurSens:promdump_textparser
Conversation
Signed-off-by: arthursens <arthursens2005@gmail.com>
I think that the format of tsdb dump is already covered; it is the Prometheus standard text parser.
It really looks like it is the Prometheus exposition format.

Maybe just the handling of
Yeah, that's what I understood as well. I've just tried to parse it. The metric name handling is indeed different. Also, the dump format always includes a timestamp, which is optional in the standard one.
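To illustrate the two differences mentioned above with a hypothetical `up` series (the sample value and timestamp are made up, and the exact field order of the dump output is illustrative, not normative):

```
# promtool tsdb dump style: metric name inside the label set, timestamp (ms) always present
{__name__="up",instance="localhost:9090",job="prometheus"} 1606751400000 1

# Prometheus exposition format: name outside the braces, timestamp optional
up{instance="localhost:9090",job="prometheus"} 1
```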
cc @beorn7, you might want to take a look at this one :) |
Signed-off-by: arthursens <arthursens2005@gmail.com>
Why is a whole new metrics format being used here? We shouldn't be proliferating additional formats. The textparse directory is only for the two parsers used for scraping, and is tightly bound to the tsdb. No other parsers should go in here as that'll only cause confusion.
While reading the codebase and figuring out how to backfill from the dump format, I saw that the current command… That's why I thought that this directory should be the home for a new textparser used in a to-be-developed… If this new command is going to be accepted and the textparse directory is indeed not the ideal place, where should this code live? 🤔
My point is that there shouldn't be a new parser at all, as we shouldn't be using any metrics formats that aren't the Prometheus or OpenMetrics format. |
Hmm ok, got your point. Do you see this idea moving forward if the dump created a file in the Prometheus format instead of the current one?
To provide more context: The premise here is what @ArthurSens described: You can already run…
Personally, I would just go for (1) for now, as it is low effort, unlocks the feature quickly, and doesn't commit us to anything beyond what we already do and what's written on the tin (it's just undump of whatever is dumped). If we later went with (2) or (3), we would still fulfill the contract of (1) implicitly.

While (2) is certainly nice-to-have, I see it as the least necessary of all options. It's needlessly expensive if you just want to transfer some data between Prometheus instances and don't care about the format. And it doesn't unlock anything for non-Prometheus-related tools (which will be happy with JSON using some ad-hoc schema, but those tools are unlikely to ever get native OpenMetrics import). It's only really useful once other tools exist that process files in OpenMetrics format.

(3) kind of gives you the best of both worlds. We can design the format to be efficient for what we want here (including the "few time series with many samples" use case), and we open up import of the data by arbitrary tools. But it needs more upfront work to get it going.

So I suggest we implement (1) with a migration towards (3) in mind. An OpenMetrics dump can be implemented independently of this effort at any time.
That's a problem for reading when we're trying to produce 2h blocks. It's not a problem for writing out, as the tsdb interfaces are already as we want them. The same applies for the Prometheus text format (ignoring that multiple timestamps there is technically defined behaviour).
I'm not seeing how JSON would help or be efficient here compared to the existing metrics formats. We'd have all the exact same problems when it comes to reading it back in when trying to write out blocks efficiently. Plus, we'd still be defining yet another new metrics format. I think we should be looking to eliminate the extra metrics format that was inadvertently created, and do (2).
Though thinking on it, the OM format doesn't work, as it doesn't support stale NaNs. If the goal is to be able to dump and undump, I suspect that the only reasonable format is the tsdb block.
Yeah, just dumping the pieces of data we want as blocks to copy into another Prometheus instance would be an elegant and direct way to solve the "let me transfer this piece of data from Prometheus A to Prometheus B" problem. Definitely another option.

WRT stale markers: that's a good point. I still think a text dump including the stale markers would be worthwhile. Originally, the dump command was introduced to let humans inspect what's actually in the TSDB. Stale markers are of potential interest in this use case, too. And they are interesting "for data science applications" etc., too.

The revelation about stale markers proves the viability of option (1): such a feature can just be added without breaking any contract.
WRT efficiency: The strict ordering requirement of OpenMetrics could get in the way of efficient retrieval. Perhaps it won't, but we cannot know. OpenMetrics is not designed to be ordered in a way that fits the use case of dumping data from the Prometheus TSDB or reading it back. A format with more flexible ordering would be, well, more flexible. We would be free to pick the right trade-off in the ordering between dumping and un-dumping. Another aspect is that OpenMetrics requires us to repeat the whole metric identifier (name and labels) for each sample.
It won't; the tsdb APIs always return samples for a given series in order. So dumping should be fine. Reading it back is trickier due to what you likely want to then do with it.
Yeah, some way to view those that's not the remote read API would be nice. On the other hand, I've never had a need to view raw stale markers since implementing them so it would seem that that's not important for debugging in practice. I note that the current code will dump them as
Anything that's not effectively the raw tsdb block is going to be less efficient than a tsdb block.
I don't believe we have any stability guarantees around promtool. In relation to this issue, I think we should not do anything here right now. This PR is speculative in terms of use cases, and per the above it seems it is not the best way to go about that class of use case. I think the dump subcommand should be left to ad-hoc debugging/extraction, and we should not try to tie it into use cases for which it is not suited merely because it seems like it might make sense at a surface level.
But the ordering requirements of OpenMetrics are not per (Prometheus) series. Series belonging to the same metric (_count, _sum, _created, …) have to be stitched together. See https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md#metricpoint-1
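As an illustration of that stitching requirement (hypothetical histogram, made-up values): in OpenMetrics, the `_bucket`, `_count`, and `_sum` series of one histogram belong to a single MetricFamily and must appear contiguously, even though the TSDB stores, and would naturally dump, them as independent series:

```
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 4
http_request_duration_seconds_bucket{le="+Inf"} 6
http_request_duration_seconds_count 6
http_request_duration_seconds_sum 0.42
# EOF
```

A dump that emitted each TSDB series as it is retrieved would have to buffer and regroup samples to satisfy this ordering.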
I would have liked to.
That might even help in debugging cases, but it's also ambiguous and therefore dangerous.
That's true, but it also doesn't add anything of value to the discussion. The TSDB block only solves one use case.
We have none of that information here; everything would be an unknown type, as things stand anyway.
Indeed, that should be fixed.
Oh, nice to see that the PR at least brought a good discussion around the idea :) Thank you Beorn for also bringing up the other ideas that were already discussed.

Yeah, the approach I was aiming for in this PR was approach (1). Despite the complexity of building a text parser, it was fairly easy to understand promparse and make some minor changes to fit the current dump format. Adding a backfiller on top of that parser shouldn't be too complicated either, since I can just copy the OpenMetrics one and apply the necessary changes.

That being said, I agree that a new format, and even worse, one with no stability contract between different Prometheus versions, might guide the community to create tooling based on several different and incompatible formats. I think we need to choose an already formalized format, or, if a new format is going to be created, it should at least be formalized with a contract.
Yep, after the above discussion, I also agree that there is no need for a review of the parser for now. There is no concrete use case yet, but that sounds like something that might be explored in the future, right? As I was already advised before, I think it's time to write a design doc around the idea and all the possible approaches 😅. I wasn't totally confident about what other approaches existed, but this PR alone has enough reading material to start the design document. Thanks again to you both!
As briefly discussed at the Prometheus office hours and at the last Storage Working Group meeting, it might be useful to have a way to backfill blocks from a file resulting from promtool tsdb dump. This is not intended to be an efficient strategy to move huge amounts of data between Prometheus instances, but an easy way to "dump and undump" specific filtered samples. A more efficient dump-undump strategy for big amounts of data is still being discussed.
It is still necessary to add more logic to be able to do the backfilling via promtool, but I believe it will be easier to review if we separate the work into two PRs.
This PR is only adding a text parser to the dump format.
PS: I've never written a text parser before. I wrote this one by reading the OpenMetrics and Prometheus exposition format parsers, trying to understand them, and replicating what they do. I couldn't find any documentation or tutorials for golex, and the code generated following these instructions doesn't even compile without some manual changes.
If anyone knows a good place where I could learn more about golex, that would be great 😛