Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
Describe the bug
RFC 5424 syslog messages can't be parsed correctly with a regex alone.
Examples for incorrect behavior
- [] characters in message:
Given this log message:
[e@123 ...][meta ...] [this is message]
Because fluentd parses extradata with (\[(.*)\] , the greedy match also treats the real message part as extra data, which is incorrect.
source at https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/parser_syslog.rb#L30
rfc reference at https://tools.ietf.org/html/rfc5424#section-6.3
- Can't deal with the Unicode BOM:
As documented at https://tools.ietf.org/html/rfc5424#section-6.4 for the MSG part:
If a syslog application encodes MSG in UTF-8, the string MUST start
with the Unicode byte order mask (BOM), which for UTF-8 is ABNF
%xEF.BB.BF.
The regex just can't deal with it.
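
To make the two cases above concrete, here is a minimal Ruby sketch. GREEDY_SD is a simplified stand-in for the extradata portion of the parser regex, not the actual pattern from parser_syslog.rb. STRICT_SD and strip_bom are hypothetical names used only for illustration, and the sample PARAM-VALUEs are made up.

```ruby
# Simplified stand-in (assumption): only the greedy extradata part is modeled.
GREEDY_SD = /\A(?<extradata>\[.*\]) ?(?<message>.*)\z/m

# Hypothetical alternative: an SD-ELEMENT runs from "[" to the first "]" that
# is not escaped with a backslash (RFC 5424 section 6.3.3 requires escaping).
STRICT_SD = /\A(?<extradata>(?:\[(?:\\.|[^\]\\])*\])+) ?(?<message>.*)\z/m

# The three bytes that RFC 5424 section 6.4 puts in front of a UTF-8 MSG.
BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: drop a leading BOM at the byte level, then label the
# remainder as UTF-8. Input without a BOM is returned unchanged.
def strip_bom(msg)
  bytes = msg.b
  return msg unless bytes.start_with?(BOM)
  bytes.byteslice(3, bytes.bytesize - 3).force_encoding(Encoding::UTF_8)
end

line = '[e@123 a="1"][meta b="2"] [this is message]'

greedy = GREEDY_SD.match(line)
strict = STRICT_SD.match(line)

p greedy[:message]   # "" because the greedy match swallowed the message
p strict[:extradata] # "[e@123 a=\"1\"][meta b=\"2\"]"
p strict[:message]   # "[this is message]"
p strip_bom("\xEF\xBB\xBFhello") # "hello"
```

This only shows that the first example can be patched in the regex; it does not validate SD-IDs or PARAM-NAMEs.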
Your Environment
fluentd v1.9.1
Proposals for fixing
For issues like those given in the examples, I can fix them by slightly enhancing the regex. However, since rfc5424 is indeed a binary-based protocol, regex is definitely not the way we should go in the long run.
Instead, we need to find a way to implement rfc5424 parsing properly, probably by integrating a third-party library. However, I can't help with this part in the short term since I don't know Ruby at all.
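
As a rough idea of what a non-regex approach could look like, here is a small hand-written scanner built on Ruby's standard strscan library. It is only a sketch: split_structured_data is a hypothetical helper, not part of fluentd or of any third-party library, and it covers only the STRUCTURED-DATA and MSG parts of a line.

```ruby
require "strscan"

# Hypothetical helper: takes the part of an RFC 5424 line that follows MSGID
# and returns [array_of_sd_elements, message].
def split_structured_data(rest)
  s = StringScanner.new(rest)

  # NILVALUE ("-") means there is no structured data at all.
  return [[], s.rest.sub(/\A /, "")] if s.skip(/-/)

  elements = []
  while s.peek(1) == "["
    start = s.pos
    s.getch                       # consume the opening "["
    in_quotes = false
    loop do
      ch = s.getch
      raise ArgumentError, "unterminated SD-ELEMENT" if ch.nil?
      case ch
      when "\\" then s.getch      # skip whatever character is escaped
      when '"'  then in_quotes = !in_quotes
      when "]"
        # Design choice of this sketch: an unescaped "]" inside a quoted
        # PARAM-VALUE is tolerated instead of rejected.
        break unless in_quotes
      end
    end
    # StringScanner#pos is a byte offset, so slice by bytes.
    elements << rest.byteslice(start, s.pos - start)
  end

  s.skip(/ /)                     # single space separating SD from MSG
  [elements, s.rest]
end

elements, message = split_structured_data('[e@123 a="1"][meta b="x\]y"] [this is message]')
p elements # ["[e@123 a=\"1\"]", "[meta b=\"x\\]y\"]"]
p message  # "[this is message]"
```

Because the scanner stops at the first character after the last SD-ELEMENT, brackets in the message itself are never mistaken for structured data.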