Invalid UTF-8 sequences "crash" Logstash


**Logstash information**:

Tested with versions 8.8.1, 7.12.1, and 7.17.10.


**JVM** (e.g. `java -version`):
openjdk 11.0.19 2023-04-18
OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7)
OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode)


**OS version** (`uname -a` if on a Unix-like system):
Linux qadebuglog 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux

**Description of the problem including expected versus actual behavior**:

Pipeline
```
input {
  tcp {
    port => 33039
    codec => fluent
  }
}

filter {
  if [log] =~ "\A\{.+\}" {
    json {
      source => "log"
    }
    if "_jsonparsefailure" not in [tags] {
      mutate {
        remove_field => ["log"]
      }
    }
  }
  else {
    mutate {
      rename => ["log", "message"]
    }
  }
}

output {
  file {
    path => "/var/log/logstash/data.log"
  }
}
```

If the `log` field contains an invalid UTF-8 sequence, Logstash stops the pipeline and shuts down (see the log output below).
The failure happens on this line:
`if [log] =~ "\A\{.+\}" {`
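
The backtrace in the log output points at `RubyRegexp.match?`, which is consistent with Ruby raising `ArgumentError` when a regular expression is applied to a string whose bytes are not valid in its declared encoding. As a minimal sketch in plain Ruby (not Logstash's actual code), the check below guards the match with `String#valid_encoding?`. The helper name `json_like?` and the sample byte 0xC3 are assumptions made for illustration.

```ruby
# Pattern copied from the pipeline conditional.
JSON_LIKE = /\A\{.+\}/

# Hypothetical helper: returns true only for well-formed strings that
# match the pattern. Invalid UTF-8 yields false instead of an exception.
def json_like?(value)
  return false unless value.is_a?(String) && value.valid_encoding?

  value.match?(JSON_LIKE)
end

# 0xC3 with no continuation byte is an assumed example of invalid UTF-8.
bad = "{\xC3}".dup.force_encoding(Encoding::UTF_8)

puts json_like?('{"a": 1}')   # well-formed, JSON-looking text
puts json_like?("plain text") # well-formed, not JSON-looking
puts json_like?(bad)          # invalid UTF-8, skipped by the guard
```

A guard of this kind treats malformed input as "not JSON" rather than as a fatal error, which is the behavior this report expects from the conditional.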

Yes, the filters could probably be written more elegantly, but invalid UTF-8 input should not "crash" Logstash itself.

Last year someone reported the same problem at https://discuss.elastic.co/t/input-tcp-codec-fluent-invalid-byte-sequence-in-utf-8-in-regex/296290; unfortunately, it received no response.

**Steps to reproduce**:
Deliver a JSON line like this via a Fluentd instance:
`{"log": "�"}`

The invalid byte sequence is a 0x3c character.
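
The `�` in the sample is the Unicode replacement character U+FFFD, which tools commonly display in place of bytes they cannot decode. The sketch below builds a malformed string in plain Ruby and shows how `String#valid_encoding?` and `String#scrub` treat it. The byte 0xC3 is an assumption (a lone lead byte of a two-byte sequence), so substitute whatever bytes actually arrive from Fluentd.

```ruby
# Assumed stand-in for the offending payload: 0xC3 starts a two-byte
# UTF-8 sequence, so on its own it is malformed.
raw = "\xC3".dup.force_encoding(Encoding::UTF_8)

puts raw.valid_encoding?   # reports whether the bytes are well-formed

# String#scrub replaces malformed bytes. For UTF-8 strings the default
# replacement is U+FFFD.
clean = raw.scrub

puts clean.valid_encoding?
puts clean == "\uFFFD"
```

Checking a captured payload this way can help confirm which bytes are actually malformed before attaching them to a reproduction.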



**Provide logs (if relevant)**:
```
[2023-06-13T15:55:50,315][ERROR][logstash.javapipeline    ][main] Pipeline worker error, the pipeline will be stopped {:pipeline_id=>"main", :error=>"(ArgumentError) invalid byte sequence in UTF-8", :exception=>Java::OrgJrubyExceptions::ArgumentError, :backtrace=>["org.jruby.RubyRegexp.match?(org/jruby/RubyRegexp.java:1170)", "RUBY.start_workers(/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:304)"], :thread=>"#<Thread:0x2c8de6c@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 sleep>"}
[2023-06-13T15:55:52,330][WARN ][logstash.javapipeline    ][main] Waiting for input plugin to close {:pipeline_id=>"main", :thread=>"#<Thread:0x2c8de6c@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}
[2023-06-13T15:55:54,658][INFO ][logstash.javapipeline    ][main] Pipeline terminated {"pipeline.id"=>"main"}
[2023-06-13T15:55:54,965][INFO ][logstash.pipelinesregistry] Removed pipeline from registry successfully {:pipeline_id=>:main}
[2023-06-13T15:55:54,972][INFO ][logstash.runner          ] Logstash shut down.

```

invalid UTF-8 sequences "crashes" the logstash #15091

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

invalid UTF-8 sequences "crashes" the logstash #15091

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions