Skip to content

Add support for replacing invalid characters in parsing processors#5823

Merged
kkondaka merged 6 commits intoopensearch-project:mainfrom
kkondaka:invalid-keys-handling
Jul 1, 2025
Merged

Add support for replacing invalid characters in parsing processors#5823
kkondaka merged 6 commits intoopensearch-project:mainfrom
kkondaka:invalid-keys-handling

Conversation

@kkondaka
Copy link
Copy Markdown
Collaborator

Description

Add support for replacing invalid characters in parsing processors
When normalize_key config option is used, csv processor, key-value processor and parse-json processor would replace invalid characters in the keys with "_"(underscore). These three processors currently creates keys with invalid characters.

parse-ion and parse-xml processors do not generate keys with invalid keys currently and that behavior stays same.

Issues Resolved

Resolves #5757

Check List

  • [X ] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [X ] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

return key.length() > 1 && key.endsWith(SEPARATOR) ? key.substring(0, key.length() - 1) : key;
}

static String replaceInvalidCharacters(final String key) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests for this function. I dont see any tests in JacksonEventKeyTest.java

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added in JacksonEventTest.java. There is no difference because JacksonEvent is just a wrapper.

private final CsvMapper mapper;
private final CsvSchema schema;

private final Boolean normalizeKeys;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can this be a primitive boolean


@JsonProperty(value = "normalize_keys", defaultValue = "false")
@JsonPropertyDescription("If specified, replaces invalid characters with underscore character")
private Boolean normalizeKeys = false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the default behavior be true instead.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. We don't want to break existing behavior.


@Override
public Boolean getNormalizeKeys() {
return false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please check if we can add a test to cover this.

kkondaka added 2 commits June 26, 2025 11:35
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
@kkondaka kkondaka force-pushed the invalid-keys-handling branch from 7a70e0f to dc647d3 Compare June 26, 2025 18:39
Copy link
Copy Markdown
Collaborator

@oeyh oeyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Some minor issues below:

private Boolean deleteHeader = DEFAULT_DELETE_HEADERS;

@JsonProperty(value = "normalize_keys", defaultValue = "false")
@JsonPropertyDescription("If specified, replaces invalid characters with underscore character")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: If set to true is probably clearer. Same for other processors.

Suggested change
@JsonPropertyDescription("If specified, replaces invalid characters with underscore character")
@JsonPropertyDescription("If set to true, replaces invalid characters with underscore character")

if (value instanceof Map) {
replaceInvalidChars((Map<String, Object>)value);
}
if (newKey != key) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be if (!newKey.equals(key))

Suggested change
if (newKey != key) {
if (!newKey.equals(key)) {

}


public static String replaceInvalidKeyChars(final String key) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be ideal to add this into the builder.

        final Log event = eventFactory.eventBuilder(LogEventBuilder.class)
                .withReplaceInvalidCharacters(true)
                .withData(json)
                .build();

All the work will be part of the EventBuilder code.

Then the codecs have very little work to do!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good idea but the current changes do not need this

if (normalizeKeys) {
key = JacksonEvent.replaceInvalidKeyChars(key);
}
event.put(key, data.get(providedHeaderColIdx));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be:

event.put(key, data.get(providedHeaderColIdx, true);

Where the last boolean is a property to replace invalid keys.

sb2k16
sb2k16 previously approved these changes Jun 26, 2025
Signed-off-by: Kondaka <krishkdk@amazon.com>
oeyh
oeyh previously approved these changes Jun 30, 2025
sb2k16
sb2k16 previously approved these changes Jun 30, 2025
Signed-off-by: Kondaka <krishkdk@amazon.com>
@kkondaka kkondaka dismissed stale reviews from sb2k16 and oeyh via 24b2304 June 30, 2025 18:08
sb2k16
sb2k16 previously approved these changes Jun 30, 2025
oeyh
oeyh previously approved these changes Jun 30, 2025
}
}

if (normalizeKeys) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this code now? And this utility method can be package protected to avoid using it.

Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
Copy link
Copy Markdown
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you for consolidating the normalization logic!

@kkondaka kkondaka merged commit 48d5135 into opensearch-project:main Jul 1, 2025
45 of 47 checks passed
@kkondaka kkondaka deleted the invalid-keys-handling branch July 1, 2025 17:03
JonahCalvo pushed a commit to JonahCalvo/os-data-prepper that referenced this pull request Jul 17, 2025
…pensearch-project#5823)

* Add support for replacing invalid characters in parsing processors

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Addressed review comments

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Addressed review comments

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Fixed javadoc error

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Addressed review comments

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Fixed checkstyle errors

Signed-off-by: Kondaka <krishkdk@amazon.com>

---------

Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Jonah Calvo <caljonah@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rename keys processor enhancements

4 participants