Add XML parser using XPath queries #8047
|
Looks like the build failure is related to shirou/gopsutil#912 |
|
This is also related to PR #7988, which tries to bump gopsutil... |
|
Hello! It looks like we are trying to solve one problem in parallel (https://github.com/M0rdecay/telegraf/tree/xml_parser/plugins/parsers/xml). I like your solution, but I also have a couple of questions:
|
|
Hey @M0rdecay, indeed we are trying to solve the same problem... :-) Let me answer your questions:
I thought about this one and I think it can be done. I want to be able to specify something like and similar for tags. When the field_query_name/_value is given, we take the values of the query relative to each field_query node. This allows using attributes as names/values. If not given, we simply take the node names and values as field/tag names and values. The problems I see:
I think you can do this as an XPath query (see
This should also be possible with XPath syntax I think. This being said, I first want to get this merged without adding more complexity. Let's extend this in a second round. I'm looking forward to your opinion on my thoughts above! Any help appreciated! :-) |
|
Thank you for the quick response! I agree, let's continue in the second round :) Please take a look at the solution I proposed for your example - #7460 (comment) - I will wait for your opinion. Answering your questions and suggestions:
It seems to me that by trying to provide maximum flexibility, we may end up in configuration hell. Upd.: For example, I need to get from a document like this:
<Document>
  <Data>
    <Server_first>
      <Data>
        <Value>1</Value>
      </Data>
      <Hosts>
        <Host_1>
          <Name>Host1.local</Name>
          <Uptime>1000</Uptime>
          <Connections>
            <Total>15</Total>
            <Current>2</Current>
          </Connections>
        </Host_1>
        <Host_2>
          <Name>Host2.local</Name>
          <Uptime>1240</Uptime>
          <Connections>
            <Total>33</Total>
            <Current>4</Current>
          </Connections>
        </Host_2>
      </Hosts>
    </Server_first>
    <Server_second>
      <Data>
        <Value>1</Value>
      </Data>
      <Hosts>
        <Host_3>
          <Name>Host3.local</Name>
          <Uptime>3000</Uptime>
          <Connections>
            <Total>35</Total>
            <Current>33</Current>
          </Connections>
        </Host_3>
        <Host_4>
          <Name>Host4.local</Name>
          <Uptime>5240</Uptime>
          <Connections>
            <Total>63</Total>
            <Current>78</Current>
          </Connections>
        </Host_4>
      </Hosts>
    </Server_second>
  </Data>
</Document>
metrics like this: I'll be glad to see how this can be done with your parser! |
|
@M0rdecay here is the config to reproduce your output on the given XML:
[[inputs.file]]
  files = ["M0rdecay.xml"]
  data_format = "xml"
  [[inputs.file.xml]]
    selected_nodes = "//Hosts/*[starts-with(name(), 'Host_')]"
    [inputs.file.xml.tags]
      name = "Name"
      xml_node_name = "name()"
    [inputs.file.xml.fields_int]
      current = "Connections/Current"
      total = "Connections/Total"
      uptime = "Uptime"
It is a bit more verbose than your config without the field/tag_queries we discussed earlier, but as I said, I'm willing to add this later. |
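To make the config above easier to follow, here is a rough sketch of what it asks for: select every `Host_*` node below any `Hosts` element, then evaluate the tag and field queries relative to each selected node. This is not the parser from this PR (which is written in Go on top of antchfx/xpath); it only uses Python's standard `xml.etree.ElementTree`, which lacks `starts-with()` and `name()`, so the name filter is done in plain Python. The function name `select_hosts` and the shortened sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document from the discussion (one server only).
SAMPLE = """
<Document>
  <Data>
    <Server_first>
      <Data><Value>1</Value></Data>
      <Hosts>
        <Host_1>
          <Name>Host1.local</Name>
          <Uptime>1000</Uptime>
          <Connections><Total>15</Total><Current>2</Current></Connections>
        </Host_1>
        <Host_2>
          <Name>Host2.local</Name>
          <Uptime>1240</Uptime>
          <Connections><Total>33</Total><Current>4</Current></Connections>
        </Host_2>
      </Hosts>
    </Server_first>
  </Data>
</Document>
"""


def select_hosts(xml_text):
    """Hypothetical helper emulating the selection and the relative queries."""
    root = ET.fromstring(xml_text)
    metrics = []
    # ".//Hosts/*" picks every child of any Hosts element; the Host_ prefix
    # check stands in for the starts-with(name(), 'Host_') predicate.
    for node in root.findall(".//Hosts/*"):
        if not node.tag.startswith("Host_"):
            continue
        # Tag queries, evaluated relative to the selected node.
        tags = {"name": node.findtext("Name"), "xml_node_name": node.tag}
        # Integer field queries, also relative to the selected node.
        fields = {
            "current": int(node.findtext("Connections/Current")),
            "total": int(node.findtext("Connections/Total")),
            "uptime": int(node.findtext("Uptime")),
        }
        metrics.append((tags, fields))
    return metrics


if __name__ == "__main__":
    for tags, fields in select_hosts(SAMPLE):
        print(tags, fields)
```

Each selected node yields one (tags, fields) pair, which corresponds to one metric per host in the config above.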
|
@srebhan hello! Please tell me: when parsing an array, does your parser allow getting tags/fields from the beginning of the document (like In one of my cases, I found that I was missing such functionality. |
|
Hey @M0rdecay, sure. The readme states that if you start the path with a |
|
@M0rdecay: |
Sure. I'll try to do it by tomorrow night. |
|
I have checked whether this works for me with one example. For a part of a document like this, information will be lost:
<COMMAND>
  <CMD>41</CMD>
  <ALL>
    <PENDING_COUNT>0</PENDING_COUNT>
    <MIN_DURATION>0</MIN_DURATION>
    <MAX_DURATION>0</MAX_DURATION>
    <DURATION_SUM>0</DURATION_SUM>
    <DURATION_SQR_SUM>0</DURATION_SQR_SUM>
    <DET_COUNT>0</DET_COUNT>
  </ALL>
  <SUCC>
    <PENDING_COUNT>0</PENDING_COUNT>
    <MIN_DURATION>0</MIN_DURATION>
    <MAX_DURATION>0</MAX_DURATION>
    <DURATION_SUM>0</DURATION_SUM>
    <DURATION_SQR_SUM>0</DURATION_SQR_SUM>
    <DET_COUNT>0</DET_COUNT>
  </SUCC>
  <FAIL>
    <PENDING_COUNT>0</PENDING_COUNT>
    <MIN_DURATION>0</MIN_DURATION>
    <MAX_DURATION>0</MAX_DURATION>
    <DURATION_SUM>0</DURATION_SUM>
    <DURATION_SQR_SUM>0</DURATION_SQR_SUM>
    <DET_COUNT>0</DET_COUNT>
  </FAIL>
</COMMAND> |
|
Okay, we've discussed the case again - it looks like it's working very well now. I will need a little more time for other tests. |
I like the current. After all, we first parse XML, and only then generate metrics. |
|
I think the parser is flexible and functional. The only thing that scares me is the complexity of the configuration and of the queries themselves. I didn't succeed the first time -_- |
|
Well, so far so good. @ssoroka, when is this PR expected to be merged? |
This PR adds an XML parser that uses XPath expressions to define fields, tags, etc. To achieve this, we use the underlying antchfx/xpath library. See that library's documentation for the supported expressions.
The PR closes #1758 and #6968. The configuration syntax is borrowed from suggestions made by @danielnelson in #1758 (comment).
Furthermore, the code is heavily inspired by the json parser plugin.