feat(parsers/csv): Add metadata support to CSV parser plugin #10083
Conversation
Force-pushed ed79ce1 to 2feefd9
Hi @sspaink, I have another linter issue. It is flagging things that I have not changed in the README.md file. Would you be able to have a look, please? I know how to fix the issue, but I am not sure if I should. Thanks,

Hi, can someone also tell me why the tiger bot has given my PR the bugfix label and not the feature label? Is there a place where I could find out what other labels there are and how to add them? Cheers,
sspaink
left a comment
@etycomputer thanks for working on this new feature. I noticed some minor typos in the README.md, so I've requested some changes. To answer your questions:
- The bot sometimes gets the labels wrong; it's a to-do item to make it more accurate by using the conventional commit message. I've fixed the labels manually in the meantime :)
- The markdown linter unfortunately doesn't just check your changes but the whole file. I decided to fix the linter errors in a separate pull request: #10093
Force-pushed bede9b0 to d937d84
Hi @sspaink, the only part that I don't like is where I am using the different readers. Basically, the part that gets messy is when I am trying to read lines to skip or parse metadata. In these cases, I don't want to use the CSV reader and just want to use a basic reader. I look forward to hearing from you. Kind regards,
Force-pushed d937d84 to c284d9b
🥳 This pull request decreases the Telegraf binary size by 0.01 % for linux amd64 (new size: 131.5 MB, nightly size: 131.5 MB). 📦 New artifacts were built from this PR.
Force-pushed d493863 to 3442aac
Force-pushed 3442aac to 75544ec
Hi @sspaink, when you have a few minutes could you have a look at this PR? I look forward to receiving your feedback. Kind regards,
Force-pushed 5b9dd16 to 65b960f
Hipska
left a comment
The general idea looks good. Only it seems more logical for metadata to end up in metric tags instead of metric fields. Or maybe have it as an option if you don't agree.
Hey @Hipska, thanks for your rapid feedback. I really appreciate it. Yes, I agree with you: metadata is commonly going to be saved as tags. But just in case that is not the case, I have made fields the default and allowed adding metadata as tags by listing the tag keys. The other reason for making fields the default for metadata is that we usually know exactly which keys are going to be tags, but dynamic fields are harder to predict. Finally, the current CSV parser considers all columns to be fields unless they are flagged as tags, so I am following the same pattern. Kind regards,
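The "fields by default, tags only when listed" rule described above can be sketched in a few lines of Go. This is an illustrative sketch, not the plugin's actual code: `classify` and `tagKeys` are hypothetical names.

```go
package main

import "fmt"

// classify sketches the column-classification rule discussed above:
// every parsed metadata key becomes a field unless it is listed in tagKeys.
func classify(metadata map[string]string, tagKeys []string) (tags, fields map[string]string) {
	tags = map[string]string{}
	fields = map[string]string{}
	isTag := map[string]bool{}
	for _, k := range tagKeys {
		isTag[k] = true
	}
	for k, v := range metadata {
		if isTag[k] {
			tags[k] = v
		} else {
			fields[k] = v
		}
	}
	return tags, fields
}

func main() {
	tags, fields := classify(
		map[string]string{"version": "1.0", "file created": "2021-10-08T12:34:18+10:00"},
		[]string{"version"},
	)
	fmt.Println(tags["version"], fields["file created"])
}
```

With `TagColumns = []string{"type", "version"}` as in the test below, `version` would land in tags while unlisted keys such as `file created` stay fields.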
Force-pushed 93b1e69 to 3dec379
Hi @srebhan, I had a unit test that covers the metadata example you mentioned, but I have also updated that unit test to cover the empty row among the skipped rows. Please look at the unit test:
```go
p = &Parser{
	HeaderRowCount:     1,
	SkipRows:           2,
	MetadataRows:       4,
	Comment:            "#",
	TagColumns:         []string{"type", "version"},
	MetadataSeparators: []string{":", "="},
	MetadataTrimSet:    " #",
}
err = p.Init()
require.NoError(t, err)
testCSVRows := []string{
	"garbage nonsense that needs be skipped",
	"",
	"# version= 1.0\r\n",
	"",
	" invalid meta data that can be ignored.\r\n",
	"file created: 2021-10-08T12:34:18+10:00",
	"timestamp,type,name,status\n",
	"2020-11-23T08:19:27+10:00,Reader,R002,1\r\n",
	"#2020-11-04T13:23:04+10:00,Reader,R031,0\n",
	"2020-11-04T13:29:47+10:00,Coordinator,C001,0",
}
```
Why define the same thing twice? Couldn't p.Init() be run a second time to reset internal counters? Also, testCSVRows and the resulting tests could be reused IMHO.
No, Init() will not reset the counters because we are using the variable set by the config (i.e. the MetadataRows field) directly.
Okay, but still, defining the test CSV rows and the resulting tests twice seems unnecessary and error-prone.
@etycomputer could you change it to define the test CSV only once, please?
Still not changed, and no comment given on why not.
@srebhan I discussed this with @etycomputer on Slack. Do you agree we should only test whether the processor can handle CSVs with metadata in this test, and maybe add another test that checks compatibility with different line endings?
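The `MetadataSeparators` and `MetadataTrimSet` options exercised in the test above can be sketched as follows. This is a hedged illustration, not the plugin's actual implementation: `parseMetadataRow` is a hypothetical helper that splits on the first matching separator and trims the configured character set from both sides.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMetadataRow splits a metadata line on the first separator found
// (trying separators in order) and trims the characters in trimSet from
// the resulting key and value. Rows without any separator are ignored.
func parseMetadataRow(line string, separators []string, trimSet string) (key, value string, ok bool) {
	line = strings.TrimRight(line, "\r\n")
	for _, sep := range separators {
		if i := strings.Index(line, sep); i >= 0 {
			key = strings.Trim(line[:i], trimSet)
			value = strings.Trim(line[i+len(sep):], trimSet)
			return key, value, true
		}
	}
	return "", "", false // no separator: row contributes no metadata
}

func main() {
	k, v, _ := parseMetadataRow("# version= 1.0\r\n", []string{":", "="}, " #")
	fmt.Printf("%s=%s\n", k, v)
	_, _, ok := parseMetadataRow(" invalid meta data that can be ignored.\r\n", []string{":", "="}, " #")
	fmt.Println(ok)
}
```

Under this sketch, `"# version= 1.0\r\n"` yields the pair `version` / `1.0`, while the "invalid meta data" row contains neither `:` nor `=` and is skipped, matching the behaviour the test expects.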
Force-pushed 3dec379 to 7ea2cfe
@etycomputer I now see my fault. Even an empty line contains the …
@etycomputer I'm hoping to fix your CI error with #10497. After this (and a rebase from your side) I guess we are good to go.

No problem at all. It's your job to ask and mine to make sure everything is answered.
- ✨ Added support to parse metadata for the CSV parser plugin
- 💡 Added comments
- ✅ Updated the unit tests
- 📝 Updated the README.md
Force-pushed 7ea2cfe to ac807a2
The reason for this change is that there is a \n at the start of simpleCSVWithHeader.
📦 Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
I have rebased my branch and fixed all CI issues. It is ready for your final review. I did have to fix a test bug introduced on 04/02/2022 by @srebhan. The issue is that there is an empty line at the start and one comment followed by a header, but the test had only two SkipRows and no header rows. I have fixed the test by changing SkipRows to 3.
srebhan
left a comment
Looks good to me. Thanks for working on this @etycomputer!
sspaink
left a comment
Thank you for working on this, I have two questions for you if you could help answer them.
Thanks, everyone; it has been a long journey.
@etycomputer, I appreciate your commitment and hard work to get this over the finishing line!
Hey @etycomputer, thanks for implementing this feature! In the merged version you put all metadata in tags, but the comment and the README still point out that metadata will be fields by default.
The README should definitely be updated to reflect the actual behaviour! Please create a new issue for that (or fix it in your PR #10742). This was my opinion back when we started reviewing this PR, and it has not changed since then:
Required for all PRs:
resolves #10079
Added support to parse metadata for the CSV parser plugin.