
Add multi-line csv support #5784

Merged
kkondaka merged 2 commits into opensearch-project:main from kkondaka:csv-processor on Jun 16, 2025

@kkondaka (Collaborator)

Description

Add multi-line CSV support. When the multi_line config option is enabled, the value at the source key is expected to contain multiple lines of CSV, with the first line containing the column names.
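
As a rough illustration of what the option implies, the sketch below splits a multi-line CSV string into one record per data line, using the first line as the column names. It is not the processor's implementation: the class MultiLineCsvSketch and its parse method are hypothetical, and a plain String.split stands in for a real CSV parser, so quoted fields and embedded delimiters are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Data Prepper.
final class MultiLineCsvSketch {

    // Returns one map per data line, keyed by the column names from the first line.
    // Assumption: fields contain no quotes, embedded delimiters, or embedded newlines.
    static List<Map<String, String>> parse(final String source, final String delimiter) {
        final List<Map<String, String>> rows = new ArrayList<>();
        final String[] lines = source.split("\\R");
        if (lines.length == 0 || lines[0].isEmpty()) {
            return rows; // no header line, so there is nothing to map
        }
        final String quotedDelimiter = Pattern.quote(delimiter);
        final String[] header = lines[0].split(quotedDelimiter, -1);
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines between records
            }
            final String[] values = lines[i].split(quotedDelimiter, -1);
            final Map<String, String> row = new LinkedHashMap<>();
            // Pair columns by position; extra values or extra headers are ignored.
            for (int c = 0; c < header.length && c < values.length; c++) {
                row.put(header[c].trim(), values[c].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(final String[] args) {
        // prints [{name=alice, age=30}, {name=bob, age=25}]
        System.out.println(parse("name,age\nalice,30\nbob,25", ","));
    }
}
```

In the PR itself, each parsed row is written into a clone of the original event (see the review thread on lines +100 to +103), so one input event yields one output event per data line.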

Issues Resolved

Resolves #5783

Check List

  • [X] New functionality includes testing.
  • [ ] New functionality has a documentation issue. Please link to it in this PR.
    • [ ] New functionality has javadoc added
  • [X] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Comment on lines +100 to +103
Event clonedEvent = JacksonEvent.fromEvent(event);
final List<String> row = messageIterator.nextValue();
putDataInEvent(clonedEvent, header, row);
recordsOut.add(new Record<Event>(clonedEvent));

Collaborator

Since this creates new events, do we need to add them to the acknowledgement set of the original event, similar to what was done for SplitEventProcessor?

protected void addToAcknowledgementSetFromOriginEvent(Event recordEvent, Event originRecordEvent) {

Collaborator Author

Good point. Thanks for pointing it out. I missed that.
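
To make the concern in this thread concrete, here is a toy model of why derived events need to be registered. It does not use Data Prepper's acknowledgement API: AckSetSketch and all of its methods are hypothetical stand-ins, and the only assumption carried over is that a source is acknowledged once every tracked event has been released.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model, not Data Prepper's AcknowledgementSet.
final class AckSetSketch {

    // Events that have been registered but not yet released.
    private final Set<String> pending = new HashSet<>();

    void add(final String eventId) {
        pending.add(eventId);
    }

    void release(final String eventId) {
        pending.remove(eventId);
    }

    // The source would be acknowledged as soon as this returns true.
    boolean isComplete() {
        return pending.isEmpty();
    }

    // Simulates one origin event being split into two derived events.
    // When trackDerived is false, the derived events are never registered.
    static boolean completesBeforeDerivedEventsAreReleased(final boolean trackDerived) {
        final AckSetSketch ackSet = new AckSetSketch();
        ackSet.add("origin");
        if (trackDerived) {
            ackSet.add("row-1");
            ackSet.add("row-2");
        }
        ackSet.release("origin"); // the origin is finished once it has been split
        return ackSet.isComplete();
    }
}
```

In this model, completesBeforeDerivedEventsAreReleased(false) returns true, meaning the set completes while the derived rows are still in flight, whereas passing true keeps it open until the rows themselves are released.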

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
"is parsed. If there is no event header, no action is taken. Default value is true.")
private Boolean deleteHeader = DEFAULT_DELETE_HEADERS;

@JsonProperty(value = "multi_line", defaultValue = "false")

Member

Should we use multiline as the name?

@kkondaka kkondaka merged commit 14f53cd into opensearch-project:main Jun 16, 2025
49 of 51 checks passed
@kkondaka kkondaka added this to the v2.12 milestone Jun 24, 2025
@kkondaka kkondaka deleted the csv-processor branch July 1, 2025 17:04
JonahCalvo pushed a commit to JonahCalvo/os-data-prepper that referenced this pull request Jul 17, 2025
* Add multi-line csv support

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>

* Addressed review comments

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>

---------

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Jonah Calvo <caljonah@amazon.com>

Development

Successfully merging this pull request may close these issues.

Add option to process multi-line CSV data in csv processor

3 participants