Trailing space in timestamp-only log lines causes timestamp parsing failure #2187

@junhaoliao

Bug

When compressing unstructured logs using CLP-JSON --unstructured (via log-converter + KV-IR ingestion), the compression worker fails with:

Failed to parse timestamp `2015-03-23 05:48:30,122 ` against known timestamp patterns.

This occurs on log lines where a timestamp is followed only by a trailing space and no message content (e.g., 2015-03-23 05:48:30,122 \n).

Root cause: The cTimestampSchema regex in LogConverter.cpp contains an optional timezone group ([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1} whose leading [ ]{0,1} greedily matches a trailing space even when no timezone content follows. This causes log_surgeon to capture the timestamp as "2015-03-23 05:48:30,122 " (with trailing space). The clp_s timestamp parser (TimestampParser.cpp:1841) then rejects it because it requires the pattern to consume the entire timestamp string, and no known pattern accounts for a trailing space.
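The over-capture can be reproduced outside of CLP. The sketch below is only an illustration: it uses Python's re module rather than log_surgeon, and PREFIX is a hypothetical stand-in for the date-time part of cTimestampSchema (the real schema covers more formats). TZ_GROUP is the optional timezone group quoted above. TZ_GROUP_STRICT is one possible tightening, shown as an assumption and not as the project's actual fix: it consumes the leading space only when timezone content follows.

```python
import re

# Hypothetical date-time prefix for illustration only; the real
# cTimestampSchema in LogConverter.cpp is not reproduced here.
PREFIX = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"

# Optional timezone group as quoted in the root-cause analysis.
TZ_GROUP = r"([ ]{0,1}(UTC){0,1}([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}){0,1}"

# Illustrative variant (an assumption, not the project's fix): every
# alternative requires at least one timezone token after the optional space.
TZ_GROUP_STRICT = (
    r"([ ]{0,1}("
    r"UTC([\+\-]\d{2}(:{0,1}\d{2}){0,1}){0,1}Z{0,1}"
    r"|[\+\-]\d{2}(:{0,1}\d{2}){0,1}Z{0,1}"
    r"|Z"
    r")){0,1}"
)


def captured_timestamp(line: str, tz_group: str) -> str:
    """Return the text captured as the timestamp at the start of `line`."""
    match = re.match(PREFIX + tz_group, line)
    return match.group(0) if match else ""


line = "2015-03-23 05:48:30,122 \n"
print(repr(captured_timestamp(line, TZ_GROUP)))         # trailing space captured
print(repr(captured_timestamp(line, TZ_GROUP_STRICT)))  # trailing space left out
```

With the original group, the captured text ends in a space, which matches the string shown in the error message. The strict variant still accepts inputs such as a trailing " UTC+08:00" or "Z", but whether log_surgeon's regex dialect accepts this exact alternation is not verified here.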

Note that CLP-TEXT (clp) handles this case differently: its TimestampPattern::parse_timestamp only requires the format to be fully consumed (a prefix match against the line), not the entire line, so the trailing space becomes part of the message content.
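The difference between the two parsing strategies can be sketched as follows. This is a simplified model, not the actual clp or clp_s code: PATTERN, parse_full and parse_prefix are hypothetical names, and a single regex stands in for the known timestamp patterns.

```python
import re

# Hypothetical single pattern standing in for the known timestamp patterns.
PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def parse_full(timestamp: str) -> bool:
    """clp_s-style check (simplified): the pattern must consume the whole string."""
    return PATTERN.fullmatch(timestamp) is not None


def parse_prefix(line: str):
    """CLP-TEXT-style check (simplified): the pattern only has to match a prefix.

    Returns (timestamp, remaining message content), or None if nothing matches.
    """
    match = PATTERN.match(line)
    if match is None:
        return None
    return match.group(0), line[match.end():]


# The captured string with a trailing space fails the full-consumption check...
print(parse_full("2015-03-23 05:48:30,122 "))
# ...while the prefix match succeeds and leaves the space in the message.
print(parse_prefix("2015-03-23 05:48:30,122 \n"))
```

Under the full-consumption rule, the extra space makes an otherwise valid timestamp unparseable. Under the prefix rule, the same space is harmless.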

Affected file in the Hive 24hr dataset: hive-24hr/i-89ca0986/hive_formatted.log (line 1 is a timestamp-only line with a trailing space).

CLP version

3f66363

Environment

Ubuntu (Linux 6.8.0-107-generic)

Reproduction steps

  1. Download the Hive 24hr dataset.
  2. Compress using clp-s with unstructured mode (which routes through log-converter + KV-IR ingestion).
  3. Compression fails on hive-24hr/i-89ca0986/hive_formatted.log with the error:
    Failed to parse timestamp `2015-03-23 05:48:30,122 ` against known timestamp patterns.
    
  4. The same file compresses successfully using CLP-TEXT (clp).
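To locate affected lines in other files without running a full compression, a small scan like the one below may help. It is a hypothetical helper, not part of CLP, and it assumes the same "YYYY-MM-DD HH:MM:SS,mmm" timestamp layout as the failing line above.

```python
import re

# Timestamp followed only by one or more spaces and an optional newline.
# The timestamp layout is an assumption based on the failing line above.
TIMESTAMP_ONLY = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[ ]+\n?")


def find_timestamp_only_lines(lines):
    """Return 1-based numbers of lines that hold only a timestamp plus trailing spaces."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if TIMESTAMP_ONLY.fullmatch(line)
    ]


sample = [
    "2015-03-23 05:48:30,122 \n",              # timestamp-only, trailing space
    "2015-03-23 05:48:31,000 INFO started\n",  # normal log line
    "2015-03-23 05:48:32,001\n",               # timestamp-only, no trailing space
]
print(find_timestamp_only_lines(sample))
```

To scan a real file, pass an open file object, since iterating over it yields lines with their newline characters intact.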

Labels

bug: Something isn't working
