[VL] Skip UTF-8 validation in JSON parsing #6661
|
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename the commit message and pull request title in the following format? See also: |
|
@kecookier, do you care about the UTF-8 encoding validity of JSON strings, or do we just need to align with Spark's behavior? cc @wang-zhun |
|
@philo-he We don't need UTF-8 validity, we just need to be consistent with Spark. |
|
@philo-he Thanks. Just keep it consistent with Spark. |
|
I have opened a thread in Velox to discuss this issue: facebookincubator/velox#10639. If Velox accepts this change, we can close this PR. |
c64d553 to 61d4df4
|
@wang-zhun, can you validate this PR with your cases? We can merge it if CI passes and you see no issues on your side. |
|
@philo-he Thank you, I’ve verified that this patch is effective for my scenario. [Screenshots: before / now] |
|
@kecookier, could you approve this PR if you have no further comments? |
|
Thanks. |


What changes were proposed in this pull request?
See discussion: #5253 (comment)
By default, the simdjson library validates that the input JSON string is well-formed UTF-8. Spark, however, always ignores invalid UTF-8 encoding in the input.
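
To illustrate the difference between a validating and a non-validating parse, here is a minimal Python sketch. It is only an analogy and does not use simdjson, Velox, or Spark code: the helper names `parse_strict` and `parse_lenient` and the sample input are hypothetical, and replacing bad bytes with U+FFFD is just one possible lenient policy, not a statement of what Spark does with them.

```python
import json

# Hypothetical input: a JSON document whose string value contains 0xFF,
# a byte that can never appear in well-formed UTF-8.
RAW = b'{"k": "a\xffb"}'


def parse_strict(raw: bytes):
    """Validating parse: reject the whole document on invalid UTF-8."""
    try:
        # json.loads decodes bytes strictly, so invalid UTF-8 raises.
        return json.loads(raw)
    except UnicodeDecodeError:
        return None


def parse_lenient(raw: bytes):
    """Non-validating parse: tolerate invalid bytes instead of failing."""
    # Assumption for illustration: substitute U+FFFD for undecodable bytes.
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


strict_result = parse_strict(RAW)
lenient_result = parse_lenient(RAW)
```

With validation on, the document is rejected outright even though its JSON structure is intact; with validation skipped, the key can still be extracted. That is the kind of mismatch with Spark this PR is meant to remove.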