rfc5424 syslog not parsed properly #2815

@ZigZagT

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug

rfc5424 syslog messages can't be parsed correctly with a regex alone.

Examples for incorrect behavior

  1. [] characters in the message:
    Given this log message:
[e@123 ...][meta ...] [this is message]

Because fluentd parses the extra data with the greedy pattern (\[(.*)\] , the match runs to the last ] and the real message part is treated as extra data as well, which is incorrect.

source at https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/parser_syslog.rb#L30
rfc reference at https://tools.ietf.org/html/rfc5424#section-6.3
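
To make the first example concrete, here is a minimal Ruby sketch. It is not the code in parser_syslog.rb: `GREEDY` is a simplified stand-in for the greedy approach, `SD_AWARE` is one possible stricter pattern that I am assuming for illustration, and the SD-PARAMs in `SAMPLE` are made up because the example above elides them.

```ruby
# Hypothetical sample: the "..." parts of the example are replaced with
# made-up SD-PARAMs so that the snippet can run.
SAMPLE = '[e@123 a="1"][meta b="2"] [this is message]'

# Simplified stand-in for the greedy approach: `.*` runs to the LAST `]`.
GREEDY = /\A(?<extradata>\[.*\])(?: (?<message>.+))?\z/m

# Assumed alternative: match one SD-ELEMENT at a time. Outside quotes, any
# character except `]`, `\` and `"` is accepted; inside a quoted PARAM-VALUE,
# a backslash escapes the next character.
SD_ELEMENT = /\[(?:[^\]\\"]|"(?:[^"\\]|\\.)*")*\]/m
SD_AWARE = /\A(?<extradata>(?:#{SD_ELEMENT})+)(?: (?<message>.+))?\z/m

greedy = GREEDY.match(SAMPLE)
sd_aware = SD_AWARE.match(SAMPLE)

puts greedy[:message].inspect    # nil, the message was swallowed by extradata
puts sd_aware[:message].inspect  # "[this is message]"
```

The stricter pattern still relies on the SP between STRUCTURED-DATA and MSG, so it only addresses this one failure mode.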

  2. Can't deal with the Unicode BOM:
    As documented at https://tools.ietf.org/html/rfc5424#section-6.4 for the MSG part:

If a syslog application encodes MSG in UTF-8, the string MUST start
with the Unicode byte order mask (BOM), which for UTF-8 is ABNF
%xEF.BB.BF.

The current regex does not handle this.
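
A rough sketch of what BOM handling could look like in Ruby, assuming the MSG part has already been separated from the header and is available as raw bytes. `decode_msg` and `UTF8_BOM` are hypothetical names for illustration, not part of fluentd.

```ruby
# The UTF-8 BOM from RFC 5424 section 6.4 (%xEF.BB.BF), as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: if MSG starts with the BOM, drop it and tag the rest
# as UTF-8; otherwise return the bytes unchanged with binary encoding.
def decode_msg(raw)
  bytes = raw.b
  return bytes unless bytes.start_with?(UTF8_BOM)

  bytes.byteslice(UTF8_BOM.bytesize..-1).force_encoding(Encoding::UTF_8)
end

with_bom = decode_msg("\xEF\xBB\xBFhello".b)
without_bom = decode_msg("plain".b)

puts with_bom               # hello
puts with_bom.encoding      # UTF-8
puts without_bom.encoding   # ASCII-8BIT
```

This sketch does not validate that the bytes after the BOM are well-formed UTF-8; a real implementation would probably want to check `valid_encoding?` as well.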

Your Environment
fluentd v1.9.1

Proposals for fixing

For issues like those given in the examples, I can fix them by slightly enhancing the regex. However, since rfc5424 is indeed a binary-based protocol, regex is definitely not the way that we should go at the end of the day.

Instead, we need to find a way to implement rfc5424 parsing properly, probably by integrating with a 3rd party library. However, I can't help with this part in a reasonably short time since I don't know Ruby at all.
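
As a rough idea of what a non-regex approach might look like, here is a small hand-written scanner in Ruby that splits STRUCTURED-DATA from MSG. This is only a sketch under my own assumptions: `split_structured_data` is a hypothetical helper, not an existing fluentd or library API, and it expects the input to start right at the STRUCTURED-DATA field.

```ruby
# Hypothetical helper: returns [structured_data, msg] for the part of an
# rfc5424 line that starts at STRUCTURED-DATA. msg is nil when absent.
def split_structured_data(rest)
  # NILVALUE ("-"), optionally followed by SP and MSG.
  return ['-', rest[2..-1]] if rest == '-' || rest.start_with?('- ')

  i = 0
  while rest[i] == '['          # one iteration per SD-ELEMENT
    i += 1
    in_quotes = false
    loop do
      c = rest[i]
      raise ArgumentError, 'unterminated SD-ELEMENT' if c.nil?
      if in_quotes && c == '\\'
        i += 2                  # skip the escaped character
        next
      end
      i += 1
      if c == '"'
        in_quotes = !in_quotes
      elsif c == ']' && !in_quotes
        break                   # end of this SD-ELEMENT
      end
    end
  end
  raise ArgumentError, 'expected STRUCTURED-DATA' if i.zero?

  # MSG, if present, is separated from STRUCTURED-DATA by a single SP.
  msg = rest[i] == ' ' ? rest[(i + 1)..-1] : nil
  [rest[0...i], msg]
end

# Made-up SD-PARAMs, since the example in this issue elides them.
sd, msg = split_structured_data('[e@123 a="1"][meta b="2"] [this is message]')
puts sd   # [e@123 a="1"][meta b="2"]
puts msg  # [this is message]
```

Because the scanner tracks quoting and escapes explicitly, brackets inside MSG or inside a quoted PARAM-VALUE do not confuse it.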
