[Enhancement] Support Illegal Character in Regex Name Group #4549
Open
Labels
PPL (Piped processing language), enhancement (New feature or request)
Description
Currently, the regex-based extraction commands, which use the Java regex library, cannot include special characters such as -, _ or @ in the name of a named capture group that creates a new column in the result set. Here are some related issues:
- [FEATURE] Support `_`/`-` as parsed field name #3944
- [Enhancement] Error handling for unsupported characters in java regex library #4467
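The limitation comes from `java.util.regex.Pattern` itself and can be reproduced outside the plugin. The snippet below is only an illustrative sketch: the class name `GroupNameLimitation` and the helper `compiles` are hypothetical and not part of the plugin code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, used only to demonstrate the JDK behavior.
final class GroupNameLimitation {

    // Returns true if the JDK accepts the pattern, false if compilation fails.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Letters and digits only: accepted.
        System.out.println(compiles("(?<domain>[^.]+)"));      // true
        // Underscore in the group name: rejected by the JDK.
        System.out.println(compiles("(?<domain_name>[^.]+)")); // false
    }
}
```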
PR #4434 unified the error handling for this case across both the parse and rex commands. Here is the current behavior:
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'{
"query": "source=accounts | rex field=email \"(?<username>[^@]+)@(?<domain_name>[^.]+)\" | fields email, username, domain_name | head 3"
}' | jq
{
"error": {
"reason": "Invalid Query",
"details": "Invalid capture group name 'domain_name'. Java regex group names must start with a letter and contain only letters and digits.",
"type": "IllegalArgumentException"
},
"status": 400
}

However, in #4434 (comment), @ykmr1224 pointed out that we should be able to support the invalid characters by rewriting the regex and mapping the extracted values back to their original names.
Expected Behavior
e.g.: (?<user_name>.+)(?<username>.+)(?<username1>.+) => (?<username2>.+)(?<username>.+)(?<username1>.+), mapping = {username2 => user_name, username => username, username1 => username1}
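A rough sketch of how such a rewrite could work is below. It is not the plugin's implementation: the class `GroupNameRewriter`, its `rewrite` and `extract` methods, and the naming scheme (strip illegal characters, then append a counter until the name is unique) are all assumptions made for illustration. It also takes shortcuts that a real implementation would need to handle, such as escaped parentheses, `(?<` inside character classes, and `\k<name>` backreferences.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rewrites illegal group names and keeps a map back to the originals.
final class GroupNameRewriter {
    // "(?<name>" openers; names starting with '=' or '!' are lookbehinds and are skipped.
    private static final Pattern NAMED_GROUP = Pattern.compile("\\(\\?<([^=!>][^>]*)>");
    // What java.util.regex accepts as a group name.
    private static final Pattern LEGAL_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    static final class Result {
        final String pattern;               // rewritten regex, safe to compile
        final Map<String, String> mapping;  // safe name -> original name
        Result(String pattern, Map<String, String> mapping) {
            this.pattern = pattern;
            this.mapping = mapping;
        }
    }

    static Result rewrite(String regex) {
        // Pass 1: reserve every name that is already legal so it is never reused.
        Set<String> used = new HashSet<>();
        Matcher scan = NAMED_GROUP.matcher(regex);
        while (scan.find()) {
            if (LEGAL_NAME.matcher(scan.group(1)).matches()) {
                used.add(scan.group(1));
            }
        }
        // Pass 2: replace each illegal name with a unique legal one.
        Map<String, String> mapping = new LinkedHashMap<>();
        StringBuffer out = new StringBuffer();
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            String original = m.group(1);
            String safe = original;
            if (!LEGAL_NAME.matcher(original).matches()) {
                String base = original.replaceAll("[^A-Za-z0-9]", "");
                if (base.isEmpty() || !Character.isLetter(base.charAt(0))) {
                    base = "g" + base; // group names must start with a letter
                }
                safe = base;
                for (int n = 1; used.contains(safe); n++) {
                    safe = base + n;
                }
                used.add(safe);
            }
            mapping.put(safe, original);
            m.appendReplacement(out, Matcher.quoteReplacement("(?<" + safe + ">"));
        }
        m.appendTail(out);
        return new Result(out.toString(), mapping);
    }

    // Matches the input with the rewritten regex and returns values keyed by the original names.
    static Map<String, String> extract(String regex, String input) {
        Result r = rewrite(regex);
        Matcher m = Pattern.compile(r.pattern).matcher(input);
        Map<String, String> fields = new LinkedHashMap<>();
        if (m.find()) {
            for (Map.Entry<String, String> e : r.mapping.entrySet()) {
                fields.put(e.getValue(), m.group(e.getKey()));
            }
        }
        return fields;
    }
}
```

Under these assumptions, `GroupNameRewriter.rewrite("(?<user_name>.+)(?<username>.+)(?<username1>.+)")` yields the rewritten pattern and mapping from the example above, and `extract` returns the captured values under the user's original field names.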
Exit Criteria
- Proper testing that covers all the edge cases of rewriting; see [Enhancement] Error handling for illegal character usage in java regex named capture group #4434 (comment)
- Double-check the debugging flows (e.g. `/_explain` and the server log) to make sure the rewrite does not cause confusion
- Performance testing to make sure there is no notable performance degradation
- Update the documentation if the behavior changes (both `parse` and `rex`)
Projects
Status
Not Started