CLP-JSON package fails to ingest unstructured text log files with events larger than 64 MiB #2176

@junhaoliao

Description

Bug

When using the CLP-JSON package (i.e., the clp-s storage engine, not the CLP-Text package) to ingest unstructured text log files via the --unstructured flag, the log-converter binary fails if any single log event exceeds 64 MiB. The error message is:

[error] Failed to convert input <path> to structured representation: generic - Numerical result out of range

Root cause

The log-converter converts unstructured text logs into KV-pair IR streams before feeding them to clp-s for compression. Its internal buffer starts at 64 KiB and doubles as needed to accommodate large log events, but is capped at 64 MiB (LogConverter.hpp:41):

static constexpr size_t cDefaultBufferSize{64ULL * 1024ULL};  // 64 KiB
static constexpr size_t cMaxBufferSize{64ULL * 1024ULL * 1024ULL};  // 64 MiB

When a log event exceeds this cap, grow_buffer_if_full() returns std::errc::result_out_of_range, which propagates through refill_buffer() → convert_file() and is logged by log_converter.cpp:71-78 as the error shown above.
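The following is a minimal sketch of the doubling-with-cap behaviour described above, not the actual log-converter code. Only the two constants and the error code come from this issue; `grow_if_full`, `describe`, the `std::vector<char>` buffer, and the exact boundary check are assumptions made for illustration. It also shows why the message reads "generic - ...": `std::make_error_code(std::errc::...)` yields a code in the generic category, and the message text comes from the platform (on glibc it is "Numerical result out of range").

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <system_error>
#include <vector>

// Constants as quoted from LogConverter.hpp in this issue.
constexpr std::size_t cDefaultBufferSize{64ULL * 1024ULL};        // 64 KiB
constexpr std::size_t cMaxBufferSize{64ULL * 1024ULL * 1024ULL};  // 64 MiB

// Hypothetical helper: doubles `buffer` once it is full, but refuses to grow
// past cMaxBufferSize. The real grow_buffer_if_full() may differ in detail.
[[nodiscard]] auto grow_if_full(std::vector<char>& buffer, std::size_t num_bytes_used)
        -> std::error_code {
    if (num_bytes_used < buffer.size()) {
        return {};  // Still room; nothing to do.
    }
    if (buffer.size() >= cMaxBufferSize) {
        // Already at the cap: the event cannot fit.
        return std::make_error_code(std::errc::result_out_of_range);
    }
    auto const new_size = buffer.empty()
            ? cDefaultBufferSize
            : std::min(buffer.size() * 2, cMaxBufferSize);
    buffer.resize(new_size);
    return {};
}

// Hypothetical helper: formats an error code as "<category> - <message>",
// the shape seen in the log line above.
auto describe(std::error_code const& ec) -> std::string {
    return std::string{ec.category().name()} + " - " + ec.message();
}
```

Starting from 64 KiB, ten doublings reach the 64 MiB cap (64 KiB × 2^10), after which any further growth request fails with the generic-category error.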

Note: the KV-pair IR format also has additional size limits (logtype ~2 GiB, dictionary variable ~2 GiB, IR metadata 64 KiB) that would become relevant if cMaxBufferSize were raised. These are tracked separately in #2175.

CLP version

3b4d13f

Environment

Any environment using the CLP-JSON package (clp-s storage engine) to ingest unstructured text log files with the --unstructured flag (via sbin/compress.sh or the WebUI compression page).

Reproduction steps

  1. Create a text log file containing a single log event larger than 64 MiB. For example, a single-line Hadoop/Hive log:
    $ wc -c hive-single-event-64MB.txt
    67108866 hive-single-event-64MB.txt
    $ wc -l hive-single-event-64MB.txt
    0 hive-single-event-64MB.txt
    
  2. Ingest the file using the CLP-JSON package with --unstructured (e.g., via sbin/compress.sh or the WebUI with the unstructured option).
  3. Observe the error in the compression worker stderr log:
    [error] Failed to convert input /mnt/logs/.../hive-single-event-64MB.txt to structured representation: generic - Numerical result out of range
    

Labels

bug
