feat(parsers/csv): Add metadata support to CSV parser plugin #10083
Conversation
Force-pushed ed79ce1 to 2feefd9
Hi @sspaink, I have another linter issue. It is flagging things that I have not changed in the README.md file. Would you be able to have a look, please? I know how to fix the issue, but I am not sure if I should. Thanks,

Hi, can someone also tell me why the tiger bot has given my PR the bugfix label and not the feature label? Is there a place where I could find out what other labels there are and how to add them? Cheers,
sspaink
left a comment
@etycomputer thanks for working on this new feature. I noticed some minor typos in the README.md, so I've requested some changes. To answer your questions:
- The bot sometimes gets the labels wrong; it's a to-do item to make it more accurate by using the conventional commit message. I've fixed the labels manually in the meantime :)
- The markdown linter unfortunately doesn't just check your changes but the whole file. I decided to fix the linter errors in a separate pull request: #10093
Force-pushed bede9b0 to d937d84
Hi @sspaink, the only part that I don't like is where I am using the different readers. Basically, the part that gets messy is when I am trying to read lines to skip or parse metadata. In these cases, I don't want to use the CSV reader and just want to use a basic reader. I look forward to hearing from you. Kind regards,
Force-pushed d937d84 to c284d9b
🥳 This pull request decreases the Telegraf binary size by 0.01 % for linux amd64 (new size: 131.5 MB, nightly size: 131.5 MB). 📦 New artifacts were built from this PR.
Force-pushed d493863 to 3442aac
Force-pushed 3442aac to 75544ec
Hi @sspaink, when you have a few minutes could you have a look at this PR? I look forward to receiving your feedback. Kind regards,
Force-pushed 5b9dd16 to 65b960f
Hipska
left a comment
The general idea looks good. Only it seems more logical for metadata to end up in metric tags instead of metric fields. Or maybe have it as an option if you don't agree.
Hey @Hipska, thanks for your rapid feedback. I really appreciate it. Yes, I agree with you: metadata is commonly going to be saved as tags. But just in case that is not the case, I have made fields the default and allowed adding metadata as tags by listing the tag keys. The other reason for making fields the default for metadata is that we usually know exactly which keys are going to be tags, but dynamic fields are harder to predict. Finally, the current CSV parser considers all columns to be fields unless they are flagged as tags, so I am following the same pattern. Kind regards,
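The "fields by default, tags only when listed" rule described above can be sketched in a few lines of Go. This is an illustrative sketch, not the plugin's actual code: `classify` and `tagKeys` are hypothetical names.

```go
package main

import "fmt"

// classify sketches the column-classification rule discussed above:
// every parsed metadata key becomes a field unless it is listed in tagKeys.
func classify(metadata map[string]string, tagKeys []string) (tags, fields map[string]string) {
	tags = map[string]string{}
	fields = map[string]string{}
	isTag := map[string]bool{}
	for _, k := range tagKeys {
		isTag[k] = true
	}
	for k, v := range metadata {
		if isTag[k] {
			tags[k] = v
		} else {
			fields[k] = v
		}
	}
	return tags, fields
}

func main() {
	tags, fields := classify(
		map[string]string{"version": "1.0", "file created": "2021-10-08T12:34:18+10:00"},
		[]string{"version"},
	)
	fmt.Println(tags["version"], fields["file created"])
}
```

With `TagColumns = []string{"type", "version"}` as in the test below, `version` would land in tags while unlisted keys such as `file created` stay fields.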
Force-pushed 93b1e69 to 3dec379
Hi @srebhan, I had a unit test that covers the metadata example you mentioned, but I have also updated that unit test to cover the empty row among the skipped rows. Please look at the unit test:
```go
p = &Parser{
	HeaderRowCount:     1,
	SkipRows:           2,
	MetadataRows:       4,
	Comment:            "#",
	TagColumns:         []string{"type", "version"},
	MetadataSeparators: []string{":", "="},
	MetadataTrimSet:    " #",
}
err = p.Init()
require.NoError(t, err)
testCSVRows := []string{
	"garbage nonsense that needs be skipped",
	"",
	"# version= 1.0\r\n",
	"",
	" invalid meta data that can be ignored.\r\n",
	"file created: 2021-10-08T12:34:18+10:00",
	"timestamp,type,name,status\n",
	"2020-11-23T08:19:27+10:00,Reader,R002,1\r\n",
	"#2020-11-04T13:23:04+10:00,Reader,R031,0\n",
	"2020-11-04T13:29:47+10:00,Coordinator,C001,0",
}
```
Why define the same thing twice? Couldn't p.Init() be run a second time to reset internal counters? Also, testCSVRows and the resulting tests could be reused IMHO.
No, Init() will not reset the counters because we are using the variable set by the config (i.e. the MetadataRows field) directly.
Okay, but still, defining the test CSV rows and the resulting tests twice seems unnecessary and error-prone.
@etycomputer could you change it to define the test CSV only once, please?
Still not changed, and no comment given on why not.
@srebhan I discussed this with @etycomputer on Slack. Do you agree we should only test whether the processor can handle CSVs with metadata in this test, and maybe add another test that checks compatibility with different line endings?
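The `MetadataSeparators` and `MetadataTrimSet` options exercised in the test above can be sketched as follows. This is a hedged illustration, not the plugin's actual implementation: `parseMetadataRow` is a hypothetical helper that splits on the first matching separator and trims the configured character set from both sides.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMetadataRow splits a metadata line on the first separator found
// (trying separators in order) and trims the characters in trimSet from
// the resulting key and value. Rows without any separator are ignored.
func parseMetadataRow(line string, separators []string, trimSet string) (key, value string, ok bool) {
	line = strings.TrimRight(line, "\r\n")
	for _, sep := range separators {
		if i := strings.Index(line, sep); i >= 0 {
			key = strings.Trim(line[:i], trimSet)
			value = strings.Trim(line[i+len(sep):], trimSet)
			return key, value, true
		}
	}
	return "", "", false // no separator: row contributes no metadata
}

func main() {
	k, v, _ := parseMetadataRow("# version= 1.0\r\n", []string{":", "="}, " #")
	fmt.Printf("%s=%s\n", k, v)
	_, _, ok := parseMetadataRow(" invalid meta data that can be ignored.\r\n", []string{":", "="}, " #")
	fmt.Println(ok)
}
```

Under this sketch, `"# version= 1.0\r\n"` yields the pair `version` / `1.0`, while the "invalid meta data" row contains neither `:` nor `=` and is skipped, matching the behaviour the test expects.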
Force-pushed 3dec379 to 7ea2cfe
@etycomputer I now see my fault. Even an empty line contains the …
@etycomputer I'm hoping to fix your CI error with #10497. After this (and a rebase from your side) I guess we are good to go.

No problem at all. It's your job to ask and mine to make sure everything is answered.
- ✨ Added support to parse metadata for the CSV parser plugin
- 💡 Added comments
- ✅ Updated the unit tests
- 📝 Updated the README.md
Force-pushed 7ea2cfe to ac807a2
The reason for this change is that there is a \n at the start of simpleCSVWithHeader.
📦 Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
I have rebased my branch and fixed all CI issues. It is ready for your final review. I did have to fix a test bug introduced on 04/02/2022 by @srebhan. The issue is that there is an empty line at the start and one comment followed by a header, but the test had only two SkipRows and no header rows. I have fixed the test by changing SkipRows to 3.
srebhan
left a comment
Looks good to me. Thanks for working on this @etycomputer!
sspaink
left a comment
Thank you for working on this, I have two questions for you if you could help answer them.
Thanks, everyone; it has been a long journey.
@etycomputer, I appreciate your commitment and hard work to get this over the finishing line!
Hey @etycomputer, thanks for implementing this feature! In the merged version you put all metadata in tags, but the comment and the README still point out that metadata will be fields by default.
The README should definitely be updated to reflect the actual behaviour! Please create a new issue for that (or fix it in your PR #10742). This was my opinion back when we started reviewing this PR, and it has not changed since then:
Required for all PRs:
resolves #10079
Added support to parse metadata for the CSV parser plugin.