Should the allowed regular expression input be limited? #15
Description
Hello! I went back and read the regular expression section of the JSON Schema standard. It turns out that a full ECMA 262 (aka JavaScript) regular expression engine is not required. The specification says that implementations SHOULD follow ECMA 262, not that they MUST. The specification also recommends that "schema authors SHOULD limit themselves to the following regular expression tokens..."
One possible feature for this library is validation that enforces that regular expressions use only the tokens recommended in the specification. This is one way to ensure compatibility between Java regular expressions and ECMA 262 regular expressions. The recommended tokens are:
- individual Unicode characters, as defined by the JSON specification [RFC7159];
- simple character classes ([abc]), range character classes ([a-z]);
- complemented character classes ([^abc], [^a-z]);
- simple quantifiers: "+" (one or more), "*" (zero or more), "?" (zero or one), and their lazy versions ("+?", "*?", "??");
- range quantifiers: "{x}" (exactly x occurrences), "{x,y}" (at least x, at most y, occurrences), {x,} (x occurrences or more), and their lazy versions;
- the beginning-of-input ("^") and end-of-input ("$") anchors;
- simple grouping ("(...)") and alternation ("|").
It would be a little tricky to implement logic that accepts only the above tokens. Is it a good idea? What if someone really wants to use Java-specific regular expressions? Should the library stop them?
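To give a sense of what such a check might involve, here is a rough sketch of a scanner that accepts only the tokens listed above. It is not part of this library; the class name `RestrictedRegexCheck` and the method `isRestricted` are hypothetical. It assumes a deliberately strict reading of the list: backslash escapes, ".", stray "]" or "}", and Java-only constructs such as "(?...)" groups, possessive quantifiers, and "&&" inside character classes are all rejected. A final `Pattern.compile` call catches the remaining malformed cases, such as a reversed range.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper, not part of the library: a strict reading of the recommended token list. */
public final class RestrictedRegexCheck {
    private RestrictedRegexCheck() {}

    /** Returns true if the pattern uses only the recommended tokens. */
    public static boolean isRestricted(String p) {
        int depth = 0;                 // open "(" groups
        boolean quantifiable = false;  // can the previous token take a quantifier?
        int i = 0;
        while (i < p.length()) {
            char c = p.charAt(i);
            if (c == '\\' || c == '.' || c == ']' || c == '}') {
                return false;          // assumption: escapes, ".", and stray closers are not allowed
            } else if (c == '[') {
                i = scanClass(p, i);
                if (i < 0) return false;
                quantifiable = true;
            } else if (c == '(') {
                depth++; i++; quantifiable = false;  // a following "?" is rejected, so no "(?...)"
            } else if (c == ')') {
                if (depth-- == 0) return false;
                i++; quantifiable = true;
            } else if (c == '|' || c == '^' || c == '$') {
                i++; quantifiable = false;
            } else if (c == '+' || c == '*' || c == '?' || c == '{') {
                if (!quantifiable) return false;
                i = (c == '{') ? scanRange(p, i) : i + 1;
                if (i < 0) return false;
                if (i < p.length() && p.charAt(i) == '?') i++;  // lazy version
                quantifiable = false;  // rejects possessive forms such as "a++"
            } else {
                i++; quantifiable = true;  // ordinary literal character
            }
        }
        if (depth != 0) return false;
        try {
            Pattern.compile(p);        // catches leftovers such as "[z-a]" or "a{3,1}"
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Scans "[...]" or "[^...]" starting at start; returns the index after "]" or -1. */
    private static int scanClass(String p, int start) {
        int j = start + 1;
        if (j < p.length() && p.charAt(j) == '^') j++;
        int first = j;
        for (; j < p.length(); j++) {
            char ch = p.charAt(j);
            if (ch == ']') return j > first ? j + 1 : -1;  // empty classes are rejected
            if (ch == '\\' || ch == '[') return -1;        // no escapes or nested classes
            if (ch == '&' && j + 1 < p.length() && p.charAt(j + 1) == '&') return -1;
        }
        return -1;                                         // unterminated class
    }

    /** Scans "{x}", "{x,}" or "{x,y}" starting at start; returns the index after "}" or -1. */
    private static int scanRange(String p, int start) {
        int j = skipDigits(p, start + 1);
        if (j == start + 1) return -1;                     // at least one digit is required
        if (j < p.length() && p.charAt(j) == ',') j = skipDigits(p, j + 1);
        return (j < p.length() && p.charAt(j) == '}') ? j + 1 : -1;
    }

    private static int skipDigits(String p, int j) {
        while (j < p.length() && p.charAt(j) >= '0' && p.charAt(j) <= '9') j++;
        return j;
    }
}
```

With this sketch, `RestrictedRegexCheck.isRestricted("^[a-z]+$")` is accepted, while `"\\d+"` and `"(?<name>a)"` are rejected. Whether a failed check should be a hard error or only a warning is exactly the open question above; a library could expose it as an opt-in strict mode so that users who want Java-specific patterns are not blocked.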