[Enhancement] Support Illegal Character in Regex Name Group #4549

@RyanL1997

Description

Currently, regex-based extraction commands that use the Java regex library cannot include special characters such as -, _, or @ in a named capture group that creates a new column in the result set. Here are some related issues:

PR: #4434 unified the error handling for this case across both the parse and rex commands. Here is the current behavior:

curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'{
    "query": "source=accounts | rex field=email \"(?<username>[^@]+)@(?<domain_name>[^.]+)\" | fields email, username, domain_name | head 3"
  }' | jq

{
  "error": {
    "reason": "Invalid Query",
    "details": "Invalid capture group name 'domain_name'. Java regex group names must start with a letter and contain only letters and digits.",
    "type": "IllegalArgumentException"
  },
  "status": 400
}
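
The restriction comes from java.util.regex itself, which only accepts group names made of ASCII letters and digits that start with a letter. A minimal sketch to confirm this outside the plugin follows; the class name GroupNameCheck and the helper compiles are hypothetical and not part of the SQL plugin.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, for illustration only.
final class GroupNameCheck {
    /** Returns true if java.util.regex accepts the pattern, false if it rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Thrown for group names containing characters such as '_' or '-'.
            return false;
        }
    }
}
```

With this helper, a pattern using (?<domain>...) compiles, while the same pattern with (?<domain_name>...) or (?<domain-name>...) is rejected.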

However, in #4434 (comment), @ykmr1224 pointed out that we should be able to support the invalid characters by rewriting the regex and mapping the extracted values back to the original names.

Expected Behavior

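For example, (?<user_name>.+)(?<username>.+)(?<username1>.+) would be rewritten to (?<username2>.+)(?<username>.+)(?<username1>.+), with mapping = {username2 => user_name, username => username, username1 => username1}. The illegal name user_name receives a generated legal name (username2) that does not collide with the groups already present.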
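
One possible way to sketch the rewrite-and-map-back idea is shown below. This is not the plugin's implementation: the class GroupNameRewriter, its Result type, and the methods rewrite and extract are all hypothetical names. The sketch also assumes the pattern contains no escaped parentheses or character classes that happen to look like a named group opener, which a real implementation would need to handle.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rewrite illegal group names, then map results back.
final class GroupNameRewriter {
    // "(?<name>" openers; the lookahead skips lookbehinds "(?<=" and "(?<!".
    private static final Pattern NAMED_GROUP = Pattern.compile("\\(\\?<(?![=!])([^>]+)>");
    private static final Pattern LEGAL_NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    static final class Result {
        final String regex;                 // rewritten, Java-legal pattern
        final Map<String, String> mapping;  // rewritten name -> original name
        Result(String regex, Map<String, String> mapping) {
            this.regex = regex;
            this.mapping = mapping;
        }
    }

    static Result rewrite(String regex) {
        // Pass 1: reserve names that are already legal so generated names avoid them.
        Set<String> used = new HashSet<>();
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            if (LEGAL_NAME.matcher(m.group(1)).matches()) {
                used.add(m.group(1));
            }
        }
        // Pass 2: replace each illegal name with a unique legal one.
        Map<String, String> mapping = new LinkedHashMap<>();
        StringBuffer out = new StringBuffer();
        m.reset();
        while (m.find()) {
            String original = m.group(1);
            String safe = original;
            if (!LEGAL_NAME.matcher(original).matches()) {
                String base = original.replaceAll("[^a-zA-Z0-9]", "");
                if (base.isEmpty() || !Character.isLetter(base.charAt(0))) {
                    base = "g" + base; // names must start with a letter
                }
                safe = base;
                // Append 1, 2, ... until the candidate is unused.
                for (int i = 1; !used.add(safe); i++) {
                    safe = base + i;
                }
            }
            mapping.put(safe, original);
            m.appendReplacement(out, Matcher.quoteReplacement("(?<" + safe + ">"));
        }
        m.appendTail(out);
        return new Result(out.toString(), mapping);
    }

    /** Runs the rewritten pattern and returns values keyed by the original names. */
    static Map<String, String> extract(String regex, String input) {
        Result r = rewrite(regex);
        Matcher m = Pattern.compile(r.regex).matcher(input);
        Map<String, String> row = new LinkedHashMap<>();
        if (m.find()) {
            for (Map.Entry<String, String> e : r.mapping.entrySet()) {
                row.put(e.getValue(), m.group(e.getKey()));
            }
        }
        return row;
    }
}
```

Under these assumptions, rewrite applied to the three-group pattern above yields the username2 mapping described, and extract applied to the rex pattern from the curl example with an input such as amberduke@pyrami.com returns columns named username and domain_name. Reserving the legal names first is what keeps a generated name from shadowing a group the user wrote explicitly.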

Exit Criteria

Metadata

Assignees

No one assigned

    Labels

    PPL (Piped processing language), enhancement (New feature or request)

    Type

    No type

    Projects

    Status

    Not Started

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
