Add detect format processor #5774

Merged
kkondaka merged 2 commits into opensearch-project:main from kkondaka:detect-format
Jun 13, 2025

Conversation

@kkondaka
Collaborator

Description

Adds a detect format processor, which detects the format of the data in a specified field.

Issues Resolved

Resolves #5731

Check List

  • [X] New functionality includes testing.
  • [ ] New functionality has a documentation issue. Please link to it in this PR.
    • [ ] New functionality has javadoc added
  • [X] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
sb2k16 previously approved these changes Jun 13, 2025
sourceData = sourceData.trim();

// JSON: Starts with { and ends with }
if (sourceData.startsWith("{") && sourceData.endsWith("}")) {
Collaborator

For JSON, it could also start with [ and end with ].
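
The suggestion above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class name JsonShapeCheck and the method looksLikeJson are hypothetical. Like the quoted snippet, it only inspects the first and last characters of the trimmed input and does not parse the content.

```java
// Hypothetical sketch; names are illustrative and not taken from the PR.
class JsonShapeCheck {

    // Returns true when the trimmed input is delimited like a JSON object
    // ({...}) or a JSON array ([...]). This is a shape check only; it does
    // not validate that the content is well-formed JSON.
    static boolean looksLikeJson(String sourceData) {
        if (sourceData == null) {
            return false;
        }
        String trimmed = sourceData.trim();
        boolean isObject = trimmed.startsWith("{") && trimmed.endsWith("}");
        boolean isArray = trimmed.startsWith("[") && trimmed.endsWith("]");
        return isObject || isArray;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJson(" {\"a\": 1} "));
        System.out.println(looksLikeJson("[1, 2, 3]"));
        System.out.println(looksLikeJson("a,b,c"));
    }
}
```

Pairing each opening delimiter with its own closing delimiter keeps mismatched input such as {1, 2] from being classified as JSON.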

if (commas == expectedCommas)
numMatches++;
}
if (numMatches >= lines.length/2) {
Collaborator

If we are trying to check whether half of the rows have the same number of commas, within the number of lines we are actually checking, then this condition should probably be numMatches >= Math.min(lines.length, 10)/2
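
To make the point above concrete, here is a minimal sketch of a comma-count heuristic whose threshold is based on the number of lines actually sampled. It is not the PR's implementation. The class CsvShapeCheck, the method looksLikeCsv, and the constant MAX_SAMPLE_LINES = 10 (inferred from the Math.min(lines.length, 10) in the comment) are assumptions, as is using the first line's comma count as the expected value.

```java
// Hypothetical sketch; names and the sample size of 10 are assumptions.
class CsvShapeCheck {

    static final int MAX_SAMPLE_LINES = 10;

    // Counts the commas in a single line.
    static int countCommas(String line) {
        int commas = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ',') {
                commas++;
            }
        }
        return commas;
    }

    // Returns true when at least half of the sampled lines have the same
    // non-zero comma count as the first line.
    static boolean looksLikeCsv(String sourceData) {
        if (sourceData == null || sourceData.trim().isEmpty()) {
            return false;
        }
        String[] lines = sourceData.trim().split("\\r?\\n");
        int sampled = Math.min(lines.length, MAX_SAMPLE_LINES);
        int expectedCommas = countCommas(lines[0]);
        if (expectedCommas == 0) {
            return false;
        }
        int numMatches = 0;
        for (int i = 0; i < sampled; i++) {
            if (countCommas(lines[i]) == expectedCommas) {
                numMatches++;
            }
        }
        // Compare against the sampled count, not the total line count.
        // Integer division rounds the threshold down, as in the comment.
        return numMatches >= sampled / 2;
    }
}
```

If only the first 10 lines are counted, comparing numMatches against lines.length/2 can never succeed for inputs longer than 20 lines, which is why the threshold is derived from the sampled count.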

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Collaborator

@san81 san81 left a comment

nice addition to the processors

@kkondaka kkondaka merged commit 75acb38 into opensearch-project:main Jun 13, 2025
69 of 74 checks passed
@kkondaka kkondaka added this to the v2.12 milestone Jun 24, 2025
@kkondaka kkondaka deleted the detect-format branch July 1, 2025 17:04
Development

Successfully merging this pull request may close these issues.

Add content format detector processor

3 participants