PatternValidator does not correctly validate input with newlines #495

Hi,

I have found that neither the Java regex based (`PatternValidatorJava`) nor the Joni based (`PatternValidatorEcma262`) pattern validation works correctly with newlines.

Neither implementation correctly interprets the `^` and `$` anchors. When I use them at the start and end of a pattern (e.g. `^[a-z]{1,10}$`), I would expect them not to let any trailing newline character through (e.g. `abc\n` should **not** be matched). The two implementations have separate problems, so I will describe them individually.

1. **Joni**
The problem is the default configuration of the ECMAScript syntax in the Joni library, which has multiline matching enabled by default. From the json-schema-validator code:
```java
  private boolean matches(String value) {
      if (compiledRegex == null) {
          return true;
      }

      byte[] bytes = value.getBytes();
      return compiledRegex.matcher(bytes).search(0, bytes.length, Option.NONE) >= 0;
  }
```
As a quick fix, the last line can be changed to:
```java
      return compiledRegex.matcher(bytes).search(0, bytes.length, -Option.MULTILINE) >= 0;
```
but I have also raised an [issue](https://github.com/jruby/joni/issues/57) in the Joni library, as I believe this is not the correct default (ECMAScript has multiline matching disabled by default).

Even more interesting: because multiline matching is enabled, the input `\r\nab\nab\n` currently **matches** the pattern `^[a-z]{1,10}$` and passes validation. The pattern is meant to allow only a single word made of letters, yet the entire multi-line input passes.
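The effect of line-anchor semantics can be reproduced without Joni by using the JDK's `Pattern.MULTILINE` flag as a stand-in. This is only an analogy, assuming Joni's current default behaves like Java's multiline `^`/`$`; the class and helper names are hypothetical and nothing here calls Joni.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: uses java.util.regex MULTILINE as an analogy for
// the line-anchor behaviour described for Joni's ECMAScript default.
class MultilineAnchorDemo {

    // Search anywhere in the input (find() semantics).
    static boolean search(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "^[a-z]{1,10}$";
        String input = "\r\nab\nab\n";

        // With MULTILINE, ^ and $ act as line anchors: "ab" on the second line
        // sits between a line start and a line end, so the search succeeds.
        System.out.println(search(regex, Pattern.MULTILINE, input)); // true

        // Without MULTILINE, ^ only matches at index 0, where '\r' is not [a-z].
        System.out.println(search(regex, 0, input)); // false
    }
}
```

Only the second result reflects the non-multiline anchor semantics that ECMAScript uses by default.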

2. **Java built-in regex: `find` vs `matches`**
The current implementation of matching with Java regex looks like this:
```java
  private boolean matches(String value) {
      return compiledPattern == null || compiledPattern.matcher(value).find();
  }
``` 
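The problem is how the `find` method works. From the documentation: `Attempts to find the next subsequence of the input sequence that matches the pattern.` So this method matches a subsequence of the input, and with the two JDKs I tried it returns `true` for the input `abab\n` and the pattern `^[a-z]{1,10}$`, even though I used the `^` and `$` anchors. This happens because, when `MULTILINE` is not enabled, Java's `$` matches not only at the very end of the input but also just before a final line terminator, so the `abab` prefix satisfies the whole pattern. As a result, a trailing newline is always allowed for such patterns.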
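A minimal reproduction with the JDK alone shows the trailing-newline behaviour of `$`. For comparison only, it also uses `\z`, Java's absolute end-of-input anchor; this is not something the validator currently does, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class reproducing the find() result described above.
class FindVsDollarDemo {

    static boolean dollarFind(String input) {
        // Without MULTILINE, $ matches at the end of input and also just
        // before a final line terminator, so "abab\n" is accepted.
        return Pattern.compile("^[a-z]{1,10}$").matcher(input).find();
    }

    static boolean absoluteEndFind(String input) {
        // \z only matches at the absolute end of input, so the trailing
        // '\n' cannot be skipped.
        return Pattern.compile("^[a-z]{1,10}\\z").matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abab\n";
        System.out.println("$  -> " + dollarFind(input));      // $  -> true
        System.out.println("\\z -> " + absoluteEndFind(input)); // \z -> false
    }
}
```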

A possible solution is to use the `matches` method, which attempts to match the entire region against the pattern, but this implicitly adds `^` and `$` anchors to every pattern.
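The trade-off of switching to `matches` can be sketched with the JDK: it rejects the trailing newline for anchored patterns, but it also changes the result for unanchored ones. JSON Schema does not implicitly anchor `pattern`, so an unanchored `[a-z]+` is expected to accept `123abc`. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting find() and matches().
class FindVsMatchesDemo {

    static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    static boolean matches(String regex, String input) {
        // matches() succeeds only if the whole input is consumed.
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Anchored pattern: matches() rejects the trailing newline find() lets through.
        System.out.println(find("^[a-z]{1,10}$", "abab\n"));    // true
        System.out.println(matches("^[a-z]{1,10}$", "abab\n")); // false

        // Unanchored pattern: a search should accept "123abc", matches() does not.
        System.out.println(find("[a-z]+", "123abc"));    // true
        System.out.println(matches("[a-z]+", "123abc")); // false
    }
}
```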
