Deal with Control Characters in DynamoDB#5437
Conversation
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
|
Not sure if I need to make the same change in Export Record Converter because I'm not sure what the export option does for dynamo db source |
| this.bytesProcessedSummary = pluginMetrics.summary(BYTES_PROCESSED); | ||
| this.streamConfig = streamConfig; | ||
|
|
||
| MAPPER.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true); |
There was a problem hiding this comment.
Thank you @Galactus22625 for working this.
I tend to think this is not the right approach. Here are a few reasons.
- The DDB records do in fact contain properly escaped control characters. So the ultimate bug appears to be that we are unescaping them (probably through the
EnhancedDocument). - This may result in allowing leniency that we don't actually want to allow. Again, this is not a problem with the input data.
- There may be other issues with the serialization we miss from this.
- Looking at the code, there is a bit of overhead in this EnhancedDocument approach anyway. We take the DDB record, make a JSON string, then deserialize it.
See this comment in particular:
It appears that the best options are:
- Contact the DDB team to see if this is the expected behavior of
EnhancedDocument. Perhaps by opening a GitHub issue. - or; Change the implementation to parse the DDB record using the approach I took in the proof-of-concept that I shared in the comment above.
There was a problem hiding this comment.
ok, I will try implementing a solution in the style of the proof-of-concept from the comment above
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
| private Map<String, Object> convertData(Map<String, AttributeValue> data) throws JsonProcessingException { | ||
| String jsonData = EnhancedDocument.fromAttributeValueMap(data).toJson(); | ||
| return MAPPER.readValue(jsonData, MAP_TYPE_REFERENCE); | ||
| private Map<String, Object> convertData(Map<String, AttributeValue> data) { |
There was a problem hiding this comment.
Please add unit tests to cover all these data types to verify that we handle them correctly.
| break; | ||
| case BS: // BS for Binary Set | ||
| result.put(attributeName, attributeValue.bs().stream() | ||
| .map(buffer -> BASE64_ENCODER.encodeToString(buffer.asByteArray())) |
There was a problem hiding this comment.
Is this the current behavior - base64 encoding the string?
There was a problem hiding this comment.
yes, i made sure the output matches what was being outputted before
There was a problem hiding this comment.
and also dynamo db only takes base64 binary values
|
|
||
| data.forEach(((attributeValue) -> { | ||
| //dealing with all dynamo db attribute types | ||
| switch (attributeValue.type()) { |
There was a problem hiding this comment.
You could consolidate this code with the code above.
private Object processAttributeValue(AttributeValue attributeValue) {
switch (attributeValue.type()) {
case N:
return new BigDecimal(attributeValue.n());
case B:
return BASE64_ENCODER.encodeToString(attributeValue.b().asByteArray());
....
}
}
And then use it. The following should work (or be close to working)
convertListData(...) {
return data.stream.map(this::processAttributeValue).collect(toList());
}
convertMapData(...) {
return data.stream.collect(toMap(attributeValue::type, this::processAttributeValue));
}
There was a problem hiding this comment.
ok, will consolidate
| result.add(null); | ||
| break; | ||
| default: | ||
| throw new IllegalStateException("Unsupported attribute type: " + attributeValue.type()); |
There was a problem hiding this comment.
This should probably be InvalidArgumentException.
There was a problem hiding this comment.
ok, will change. I was matching the error message that was being sent before.
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
dlvenable
left a comment
There was a problem hiding this comment.
Very nice. Thank you for this improvement!
* Deal with Control Characters in DynamoDB Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com>
* Deal with Control Characters in DynamoDB Signed-off-by: Maxwell Brown <mxwelwbr@amazon.com> Signed-off-by: George Chen <qchea@amazon.com>
Description
Manually Parse DDB Records
Issues Resolved
Resolves #5027 . Bug ingesting control characters from dynamo db
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.