
Add auto conversion option to convert_type processor #5782

Merged
kkondaka merged 3 commits into opensearch-project:main from kkondaka:auto-convert
Jun 17, 2025


@kkondaka
Collaborator

Description

Adds an auto conversion option to the convert_type processor. The following config option makes the processor convert strings automatically.

processor:
  - convert_type:
      <existing options>
      coerce_strings:
        time_formats: ["format1", "format2"] # optional

Issues Resolved

Resolves #5733

Check List

  • [x] New functionality includes testing.
  • [ ] New functionality has a documentation issue. Please link to it in this PR.
    • [ ] New functionality has javadoc added
  • [x] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Comment on lines +68 to +73
} else {
this.type = null;
this.converter = null;
}
Collaborator

nit: The else block is probably not needed if we assign null to these fields at their initialization.

Collaborator Author

I got an error that the variables are not initialized.

Collaborator

If null values are assigned at their initialization, it won't complain.
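
For readers following this exchange, here is a minimal sketch of one way to avoid the else block. It assumes the two fields are declared `final` (the excerpt does not show their declarations), which would explain the "not initialized" error: the Java compiler requires a final field to be assigned exactly once on every constructor path. Initializing at the declaration only works for non-final fields. The class name `FinalFieldSketch`, its constructor argument, and the placeholder converter are hypothetical, not code from the PR.

```java
// Hypothetical stand-in for the processor under review; names are illustrative only.
final class FinalFieldSketch {
    private final String type;       // final: must be assigned exactly once
    private final Object converter;  // placeholder type for a real converter

    FinalFieldSketch(final String configuredType) {
        // Resolve into locals first, so the final fields are assigned once below
        // and no else branch is needed.
        String resolvedType = null;
        Object resolvedConverter = null;
        if (configuredType != null) {
            resolvedType = configuredType;
            resolvedConverter = new Object(); // stands in for a looked-up converter
        }
        this.type = resolvedType;
        this.converter = resolvedConverter;
    }

    String getType() {
        return type;
    }

    boolean hasConverter() {
        return converter != null;
    }
}
```

Whether this reads better than the explicit else block is a style choice; both satisfy definite assignment.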

final LocalDate localDateForDefaultValues = LocalDate.now(DEFAULT_ZONE_ID);

final DateTimeFormatterBuilder dateTimeFormatterBuilder = new DateTimeFormatterBuilder()
.appendPattern(pattern)
Collaborator

What if the pattern already includes a zone?
If possible, avoiding the use of DEFAULT_ZONE_ID will make it more flexible.
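
One possible way to address the zone question, sketched under assumptions: `DateTimeFormatter.parseBest` tries `ZonedDateTime` first and falls back to `LocalDateTime`, so a fallback zone is applied only when the pattern carries no zone or offset. The class name, the method name, and the choice of UTC as the fallback are hypothetical; the PR's actual DEFAULT_ZONE_ID value is not shown in this thread.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

final class ZoneAwareParseSketch {
    // Assumption: UTC stands in for the DEFAULT_ZONE_ID mentioned in the review.
    static final ZoneId FALLBACK_ZONE = ZoneOffset.UTC;

    // Parses text with the given pattern and returns epoch milliseconds.
    // Throws DateTimeParseException if the text matches neither query.
    static long toEpochMillis(final String text, final String pattern) {
        final DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        final TemporalAccessor parsed =
                formatter.parseBest(text, ZonedDateTime::from, LocalDateTime::from);
        if (parsed instanceof ZonedDateTime) {
            // The pattern supplied a zone or offset, so use it as-is.
            return ((ZonedDateTime) parsed).toInstant().toEpochMilli();
        }
        // No zone in the pattern: apply the fallback zone.
        return ((LocalDateTime) parsed).atZone(FALLBACK_ZONE).toInstant().toEpochMilli();
    }
}
```

This sketch still needs both a date and a time in the pattern; a date-only pattern would need an additional `LocalDate::from` query.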

Object result = null;
try {
result = autoConvert(event, entry.getValue(), keyPrefix+entry.getKey()+"/");
if (result != null) {
Collaborator

If the value is null, do we still want to set null for that specific key? It looks like we are dropping that key here.

Collaborator Author

What do you mean? autoConvert only converts strings to boolean/integer/float/long/double. If the string is "null", it won't be converted to null; that's not something auto conversion currently handles.

for (DateTimeFormatter formatter : coerceDateTimeFormatters) {
try {
ZonedDateTime tmp = ZonedDateTime.parse(str, formatter);
long r = (long)tmp.toInstant().toEpochMilli();
Collaborator

Returning an Instant is better here, I guess.

Collaborator Author

Why? We want long with milliseconds. How's Instant useful?

return null;
} else if (lstr.contains(".") || lstr.contains("e")) {
Double d = Double.parseDouble(lstr);
if (d <= Float.MAX_VALUE && d >= Float.MIN_VALUE) {
Collaborator

nit: Probably keeping it as Double is better?

Collaborator Author

It looks like TargetType doesn't have Float. I guess, for consistency, I will remove Float.

}
return d;
} else if (Character.isDigit(firstChar) || firstChar == '-' || firstChar == '+') {
Long l = Long.parseLong(str);
Collaborator

parseLong could throw an exception. We may want to catch and ignore it?

Collaborator Author

It is ignored in the caller of autoConvert.

Collaborator

@san81 san81 left a comment

Thank you for adding this enhancement. Just left a few comments.

kkondaka added 2 commits June 16, 2025 22:07
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
if (result != null) {
event.put(keyPrefix+entry.getKey(), result);
}
} catch (Exception ignored) {}
Member

Might be worth having a metric here to help debug if users have issues with coerce_strings.

Collaborator Author

Coercion is best effort, so I'm not sure a metric makes sense here. A string that starts with a digit, like 1234-sdfjh-34, will fail one or more of the coercions, but that's not necessarily metric-worthy.

}
}
return null;
} else if (lstr.contains(".") || lstr.contains("e")) {
Member

Is it safe to assume this is a double? Or, if we try and fail, is it just a no-op? What happens if parseDouble throws?

Collaborator Author

Again, it is best effort. Floats/doubles will always contain one or both of these, and they will be converted properly. A string like "sdfsdf.eeee" can also match this condition; the code will try to convert it and fail, and that's OK because it is a no-op. Yes, some cycles are wasted, but there is no way to be sure a string cannot be coerced without actually parsing it. These checks reduce unnecessary parse calls.
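
The best-effort behavior described above can be sketched roughly as follows. This is an illustration, not the merged code: the class and method names are hypothetical, time-format handling is omitted, and the exception is caught inside the method to keep the example self-contained (in the PR it is ignored in the caller of autoConvert). Booleans are checked first in this sketch because "true" and "false" both contain an "e" and would otherwise reach the double branch.

```java
import java.util.Locale;

final class CoerceSketch {
    // Returns a Boolean, Double, or Long, or null when the string should be left as-is.
    static Object coerce(final String str) {
        if (str == null || str.isEmpty()) {
            return null;
        }
        final String lstr = str.toLowerCase(Locale.ROOT);
        final char firstChar = lstr.charAt(0);
        try {
            if (lstr.equals("true") || lstr.equals("false")) {
                return Boolean.parseBoolean(lstr);
            } else if (lstr.contains(".") || lstr.contains("e")) {
                // Cheap pre-check only; parseDouble decides whether it really is a number.
                return Double.parseDouble(lstr);
            } else if (Character.isDigit(firstChar) || firstChar == '-' || firstChar == '+') {
                return Long.parseLong(str);
            }
        } catch (NumberFormatException ignored) {
            // Best effort: a failed parse is a no-op, the original string is kept.
        }
        return null;
    }
}
```

With this sketch, "42" yields a Long, "1e3" yields a Double, and both "sdfsdf.eeee" and "1234-sdfjh-34" yield null, so the caller would leave those strings untouched.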

}

} else if (objValue instanceof Map) {
doAutoConversion(event, (Map<String, Object>)objValue, keyPrefix);
Member

Should we put a limit on the recursion here?

Collaborator Author

Why? It will be finite, right?
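
To make the recursion question concrete, here is a small sketch of a nested-map walk that builds "/"-separated key paths, as the diff excerpt does with keyPrefix. The depth guard `MAX_DEPTH` is purely hypothetical and is included only to show where a limit could go; the thread does not settle on one. If the map comes from parsed JSON, nesting is finite because JSON cannot express cycles. All names here are illustrative, and only long conversion is shown.

```java
import java.util.Map;

final class NestedWalkSketch {
    // Hypothetical guard; shown only to illustrate where a recursion limit would sit.
    static final int MAX_DEPTH = 64;

    // Walks source recursively and records converted values under their full key path.
    static void walk(final Map<String, Object> source, final String keyPrefix,
                     final int depth, final Map<String, Object> converted) {
        if (depth > MAX_DEPTH) {
            return; // stop descending past the guard
        }
        for (final Map.Entry<String, Object> entry : source.entrySet()) {
            final String path = keyPrefix + entry.getKey();
            final Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                final Map<String, Object> child = (Map<String, Object>) value;
                walk(child, path + "/", depth + 1, converted);
            } else if (value instanceof String) {
                try {
                    converted.put(path, Long.parseLong((String) value));
                } catch (NumberFormatException ignored) {
                    // Best effort: non-numeric strings are left untouched.
                }
            }
        }
    }
}
```

Calling `walk(data, "", 0, out)` on a map with a nested "inner" map would record keys such as "inner/count" in `out`.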

String t3 = zonedDateTime.format(DateTimeFormatter.ofPattern(ConvertEntryTypeProcessorConfig.DEFAULT_TIME_STRING_FORMATS.get(2)));
String t4 = zonedDateTimePST.format(DateTimeFormatter.ofPattern(ConvertEntryTypeProcessorConfig.DEFAULT_TIME_STRING_FORMATS.get(3)));

final Map<String, Object> testData1 = new HashMap<>();
Member

Might be good to add coverage for nested maps

Collaborator Author

There is coverage for nested maps already: testData1 is inside testData.

@kkondaka kkondaka merged commit 69737c1 into opensearch-project:main Jun 17, 2025
46 of 47 checks passed
@kkondaka kkondaka added this to the v2.12 milestone Jun 24, 2025
@kkondaka kkondaka deleted the auto-convert branch July 1, 2025 17:04
JonahCalvo pushed a commit to JonahCalvo/os-data-prepper that referenced this pull request Jul 17, 2025
…ect#5782)

* Add auto conversion option to convert_type processor

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>

* Addressed review comments

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>

* Modified to coerse floats to double

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>

---------

Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Jonah Calvo <caljonah@amazon.com>
Development

Successfully merging this pull request may close these issues.

convert_entry_type processor should support "automatic" mode

3 participants