Bug
When compressing unstructured logs using CLP-JSON --unstructured (via log-converter + KV-IR ingestion), the compression worker fails with:
Failed to parse timestamp `2015-03-23 05:48:30,122 ` against known timestamp patterns.
This occurs on log lines where a timestamp is followed only by a trailing space and no message content (e.g., 2015-03-23 05:48:30,122 \n).
Root cause: The cTimestampSchema regex in LogConverter.cpp contains an optional timezone group ([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1} whose leading [ ]{0,1} greedily matches a trailing space even when no timezone content follows. This causes log_surgeon to capture the timestamp as "2015-03-23 05:48:30,122 " (with trailing space). The clp_s timestamp parser (TimestampParser.cpp:1841) then rejects it because it requires the pattern to consume the entire timestamp string, and no known pattern accounts for a trailing space.
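The greedy match can be seen in isolation with a short sketch. The timezone group below is copied verbatim from the text above; the date/time prefix (`DATE_TIME`) is a hypothetical stand-in, not the actual contents of cTimestampSchema. Python's `re` engine is used only for illustration. log_surgeon has its own regex engine, but the capture described above is the same.

```python
import re

# Optional timezone group, copied verbatim from the report.
TIMEZONE_GROUP = r"([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1}"

# Hypothetical stand-in for the date/time part of cTimestampSchema (assumption).
DATE_TIME = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"

timestamp_re = re.compile(DATE_TIME + TIMEZONE_GROUP)

# Timestamp-only line with a trailing space, as in the affected file.
line = "2015-03-23 05:48:30,122 \n"

match = timestamp_re.match(line)
captured = match.group(0)

# The leading "[ ]{0,1}" consumes the space even though no timezone follows,
# so the captured timestamp ends with a space.
print(repr(captured))
```

Every element inside the timezone group is optional, so a lone space satisfies the group on its own.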
Note that CLP-TEXT (clp) handles this case differently — its TimestampPattern::parse_timestamp only requires the format to be fully consumed (prefix match), not the entire line. The trailing space becomes part of the message content.
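The difference between the two behaviours can be sketched as follows. This is a minimal illustration and not the code from TimestampParser.cpp or TimestampPattern::parse_timestamp. `PATTERN`, `parse_full`, and `parse_prefix` are hypothetical names introduced for the example.

```python
import re

# Hypothetical pattern for "YYYY-MM-DD HH:MM:SS,mmm" (assumption, not CLP's pattern table).
PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def parse_full(text):
    """Succeed only if the pattern consumes the entire string (the clp_s behaviour described above)."""
    m = PATTERN.fullmatch(text)
    return m.group(0) if m else None


def parse_prefix(text):
    """Succeed if the pattern matches a prefix; return (timestamp, remainder) (the clp behaviour described above)."""
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.group(0), text[m.end():]


captured = "2015-03-23 05:48:30,122 "

print(parse_full(captured))    # rejected: the trailing space is left unconsumed
print(parse_prefix(captured))  # accepted: the trailing space becomes the remainder
```

Under full-consumption matching the trailing space makes the parse fail, while under prefix matching it is simply handed on as message content.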
Affected file in the Hive 24hr dataset: hive-24hr/i-89ca0986/hive_formatted.log (line 1 is a timestamp-only line with a trailing space).
CLP version
3f66363
Environment
Ubuntu (Linux 6.8.0-107-generic)
Reproduction steps
- Download the Hive 24hr dataset.
- Compress using clp-s with unstructured mode (which routes through log-converter + KV-IR ingestion).
- Compression fails on hive-24hr/i-89ca0986/hive_formatted.log with the error:
Failed to parse timestamp `2015-03-23 05:48:30,122 ` against known timestamp patterns.
- The same file compresses successfully using CLP-TEXT (clp).
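For a smaller test input than the full dataset, a one-line file containing just the timestamp-only line might be enough. The sketch below only writes such a file. It has not been confirmed to trigger the failure, and the file name and the `write_repro_log` helper are hypothetical.

```python
import os
import tempfile

# The timestamp-only line from the report: timestamp, one trailing space, newline.
REPRO_LINE = "2015-03-23 05:48:30,122 \n"


def write_repro_log(path):
    """Write a one-line log file containing only the problematic line."""
    # newline="" disables newline translation so the bytes are identical on every platform.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(REPRO_LINE)
    return path


# Hypothetical file name inside a fresh temporary directory.
repro_path = write_repro_log(os.path.join(tempfile.mkdtemp(), "timestamp_only.log"))

with open(repro_path, "rb") as f:
    print(f.read())
```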