Regular expressions: should the allowed input be limited? #15

@mspiegel

Hello! I went back and read the regular expression section of the JSON Schema standard. It turns out that a full ECMA 262 (aka JavaScript) regular expression engine is not required. The specification says that implementations SHOULD follow ECMA 262, not that they MUST. The specification also recommends that "schema authors SHOULD limit themselves to the following regular expression tokens..."

One possible feature for this library is validation that enforces that regular expressions use only the tokens recommended in the specification. This is one way to ensure compatibility between Java regular expressions and ECMA 262 regular expressions. The recommended tokens are:

  • individual Unicode characters, as defined by the JSON specification [RFC7159];
  • simple character classes ([abc]), range character classes ([a-z]);
  • complemented character classes ([^abc], [^a-z]);
  • simple quantifiers: "+" (one or more), "*" (zero or more), "?" (zero or one), and their lazy versions ("+?", "*?", "??");
  • range quantifiers: "{x}" (exactly x occurrences), "{x,y}" (at least x, at most y, occurrences), {x,} (x occurrences or more), and their lazy versions;
  • the beginning-of-input ("^") and end-of-input ("$") anchors;
  • simple grouping ("(...)") and alternation ("|").

Implementing logic that accepts only the above tokens would be a little tricky. Is it a good idea? What if someone really wants to use Java-specific regular expressions? Should the library stop them?
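
To make the question more concrete, here is a rough sketch of what such a check might look like. It is not an existing part of this library: the class name `RestrictedRegexCheck` and the method `usesOnlyRecommendedTokens` are made up for illustration. It is deliberately conservative and assumes that anything not on the list above (backslash escapes, ".", "(?...)" groups, possessive quantifiers such as "a++", nested or intersected character classes) should be rejected. Well-formedness is delegated to `java.util.regex.Pattern.compile`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration only; not part of any existing API.
final class RestrictedRegexCheck {

    // Matches the range quantifiers "{x}", "{x,}" and "{x,y}".
    private static final Pattern RANGE = Pattern.compile("\\{\\d+(,\\d*)?\\}");

    static boolean usesOnlyRecommendedTokens(String regex) {
        int n = regex.length();
        int i = 0;
        // 0 = previous token is not a quantifier, 1 = greedy quantifier, 2 = lazy quantifier
        int quant = 0;
        while (i < n) {
            char c = regex.charAt(i);
            switch (c) {
                case '\\': case '.': case ']': case '}':
                    // Assumption: escapes, the dot, and stray closers are not on the list.
                    return false;
                case '[':
                    i = endOfCharClass(regex, i);
                    if (i < 0) return false;
                    quant = 0;
                    break;
                case '(':
                    // Rejects "(?:...)", lookarounds, named groups, inline flags.
                    if (i + 1 < n && regex.charAt(i + 1) == '?') return false;
                    i++;
                    quant = 0;
                    break;
                case '+': case '*':
                    // A second quantifier here would be possessive ("a++") or stacked.
                    if (quant != 0) return false;
                    quant = 1;
                    i++;
                    break;
                case '?':
                    if (quant == 2) return false;
                    quant = (quant == 1) ? 2 : 1; // "?" after a quantifier makes it lazy
                    i++;
                    break;
                case '{': {
                    if (quant != 0) return false;
                    Matcher m = RANGE.matcher(regex);
                    m.region(i, n);
                    if (!m.lookingAt()) return false;
                    i = m.end();
                    quant = 1;
                    break;
                }
                default:
                    // Literal characters, "^", "$", "|" and ")".
                    i++;
                    quant = 0;
            }
        }
        try {
            Pattern.compile(regex); // leave well-formedness checks to Java
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns the index just past the closing "]", or -1 if the class is
    // empty, unterminated, or contains "[", "\" or "&" (Java-only syntax).
    private static int endOfCharClass(String regex, int open) {
        int j = open + 1;
        if (j < regex.length() && regex.charAt(j) == '^') j++;
        int start = j;
        while (j < regex.length() && regex.charAt(j) != ']') {
            char c = regex.charAt(j);
            if (c == '[' || c == '\\' || c == '&') return -1;
            j++;
        }
        return (j >= regex.length() || j == start) ? -1 : j + 1;
    }
}
```

With this sketch, a pattern such as `^[a-z]+(foo|bar){2,3}?$` would pass, while `\d+`, `a++` and `(?i)abc` would be rejected. If the library did adopt something like this, making it an opt-in strict mode rather than the default would leave room for people who really want Java-specific expressions.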
