invalid UTF-8 sequences "crashes" the logstash  #15091

@mlausch

Logstash information:

tested with versions:
8.8.1, 7.12.1 and 7.17.10

JVM (e.g. java -version):
openjdk 11.0.19 2023-04-18
OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7)
OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux qadebuglog 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Pipeline

input {
  tcp {
    port => 33039
    codec => fluent
  }
}

filter {
  if [log] =~ "\A\{.+\}" {
    json {
      source => "log"
    }
    if "_jsonparsefailure" not in [tags] {
      mutate {
        remove_field => ["log"]
      }
    }
  }
  else {
    mutate {
      rename => ["log", "message"]
    }
  }
}

output {
  file {
    path => "/var/log/logstash/data.log"
  }
}

If the "log" field contains an invalid UTF-8 sequence, Logstash stops itself (see the logs below).
The issue happens on this line:
if [log] =~ "\A\{.+\}" {
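
The failure mode can be sketched in plain Ruby, since the backtrace points at a JRuby regex match. This is a minimal illustration, not Logstash's actual code: `safe_for_regex` is a hypothetical helper, and the byte 0xC3 is an assumed example of an incomplete UTF-8 sequence, not necessarily the byte from this report. It relies only on the core methods `String#valid_encoding?` and `String#scrub`.

```ruby
# Same pattern as the pipeline conditional.
PATTERN = /\A\{.+\}/

# Hypothetical helper (not part of Logstash): returns a value that is safe
# to match against a regex. Non-strings and valid strings pass through.
def safe_for_regex(value)
  return value unless value.is_a?(String)
  return value if value.valid_encoding?
  value.scrub # replaces each invalid byte sequence (U+FFFD for UTF-8)
end

# Build a string tagged as UTF-8 that contains a lone lead byte.
# 0xC3 is an assumption chosen for illustration.
bad_byte = [0xC3].pack("C")
raw = ("{" + bad_byte + "}").force_encoding(Encoding::UTF_8)

puts "valid before scrub: #{raw.valid_encoding?}"

# An unguarded match may raise ArgumentError, as in the log above.
begin
  PATTERN.match?(raw)
rescue ArgumentError => e
  puts "unguarded match raised: #{e.message}"
end

clean = safe_for_regex(raw)
puts "valid after scrub: #{clean.valid_encoding?}"
puts "guarded match: #{PATTERN.match?(clean)}"
```

In a pipeline, a similar guard could presumably run in a `ruby` filter placed before the conditional, reading the field with `event.get("log")` and writing the scrubbed value back with `event.set`. That placement is an untested assumption, and it works around the problem; it does not change how the worker reacts to the exception.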

Yes, the filters could probably be written more elegantly, but I think invalid UTF-8 shouldn't "crash" Logstash itself.

Last year someone posted this issue at https://discuss.elastic.co/t/input-tcp-codec-fluent-invalid-byte-sequence-in-utf-8-in-regex/296290 but unfortunately there wasn't any response.

Steps to reproduce:
Deliver a JSON line like this via a fluentd instance:
{"log": "�"}

The invalid byte sequence is a 0x3c character.

Provide logs (if relevant):

[2023-06-13T15:55:50,315][ERROR][logstash.javapipeline    ][main] Pipeline worker error, the pipeline will be stopped {:pipeline_id=>"main", :error=>"(ArgumentError) invalid byte sequence in UTF-8", :exception=>Java::OrgJrubyExceptions::ArgumentError, :backtrace=>["org.jruby.RubyRegexp.match?(org/jruby/RubyRegexp.java:1170)", "RUBY.start_workers(/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:304)"], :thread=>"#<Thread:0x2c8de6c@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 sleep>"}
[2023-06-13T15:55:52,330][WARN ][logstash.javapipeline    ][main] Waiting for input plugin to close {:pipeline_id=>"main", :thread=>"#<Thread:0x2c8de6c@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}
[2023-06-13T15:55:54,658][INFO ][logstash.javapipeline    ][main] Pipeline terminated {"pipeline.id"=>"main"}
[2023-06-13T15:55:54,965][INFO ][logstash.pipelinesregistry] Removed pipeline from registry successfully {:pipeline_id=>:main}
[2023-06-13T15:55:54,972][INFO ][logstash.runner          ] Logstash shut down.
