Bug
When using the CLP-JSON package (i.e., the clp-s storage engine, not the CLP-Text package) to ingest unstructured text log files via the --unstructured flag, the log-converter binary fails if any single log event exceeds 64 MiB. The error message is:
[error] Failed to convert input <path> to structured representation: generic - Numerical result out of range
Root cause
The log-converter converts unstructured text logs into KV-pair IR streams before feeding them to clp-s for compression. Its internal buffer starts at 64 KiB and doubles as needed to accommodate large log events, but is capped at 64 MiB (LogConverter.hpp:41):
static constexpr size_t cDefaultBufferSize{64ULL * 1024ULL}; // 64 KiB
static constexpr size_t cMaxBufferSize{64ULL * 1024ULL * 1024ULL}; // 64 MiB
When a log event exceeds this cap, grow_buffer_if_full() returns std::errc::result_out_of_range. The error propagates up through refill_buffer() and convert_file(), and log_converter.cpp:71-78 logs it as the error shown above.
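The growth policy described above can be sketched as follows. This is a minimal illustration and not the actual log-converter code: next_buffer_size(), count_doublings_until_cap(), and format_error() are hypothetical helpers, and only the two constants and the returned error code come from this report. It shows why the message starts with "generic": std::make_error_code(std::errc) yields a code in the generic category.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <system_error>

// Values quoted in this report (LogConverter.hpp).
constexpr std::size_t cDefaultBufferSize{64ULL * 1024ULL};        // 64 KiB
constexpr std::size_t cMaxBufferSize{64ULL * 1024ULL * 1024ULL};  // 64 MiB

// Hypothetical stand-in for the size decision inside grow_buffer_if_full():
// doubles the size, or fails once the buffer has already reached the cap.
[[nodiscard]] auto next_buffer_size(std::size_t current_size, std::size_t& new_size)
        -> std::error_code {
    assert(current_size > 0);
    if (current_size >= cMaxBufferSize) {
        return std::make_error_code(std::errc::result_out_of_range);
    }
    new_size = current_size * 2;
    return {};
}

// Counts how many times the buffer can double before growth is refused.
[[nodiscard]] auto count_doublings_until_cap() -> int {
    std::size_t size{cDefaultBufferSize};
    std::size_t next{0};
    int num_doublings{0};
    while (!next_buffer_size(size, next)) {
        size = next;
        ++num_doublings;
    }
    return num_doublings;
}

// Hypothetical formatter mimicking the "<category> - <message>" shape of the log line.
[[nodiscard]] auto format_error(std::error_code const& ec) -> std::string {
    return std::string{ec.category().name()} + " - " + ec.message();
}
```

Under these assumptions the buffer doubles ten times (64 KiB to 64 MiB) before growth is refused. The message part of the formatted string comes from the platform's description of ERANGE, which is "Numerical result out of range" on glibc.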
Note: the KV-pair IR format also has additional size limits (logtype ~2 GiB, dictionary variable ~2 GiB, IR metadata 64 KiB) that would become relevant if cMaxBufferSize were raised. These are tracked separately in #2175.
CLP version
3b4d13f
Environment
Any environment using the CLP-JSON package (clp-s storage engine) to ingest unstructured text log files with the --unstructured flag (via sbin/compress.sh or the WebUI compression page).
Reproduction steps
- Create a text log file containing a single log event larger than 64 MiB. For example, a single-line Hadoop/Hive log:
$ wc -c hive-single-event-64MB.txt
67108866 hive-single-event-64MB.txt
$ wc -l hive-single-event-64MB.txt
0 hive-single-event-64MB.txt
- Ingest the file using the CLP-JSON package with --unstructured (e.g., via sbin/compress.sh or the WebUI with the unstructured option).
- Observe the error in the compression worker stderr log:
[error] Failed to convert input /mnt/logs/.../hive-single-event-64MB.txt to structured representation: generic - Numerical result out of range
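If no suitably large log is at hand, a synthetic input can be generated. The sketch below is only an illustration: write_single_event_file() is a hypothetical helper, and the timestamped prefix is made-up filler rather than real Hadoop/Hive output. It writes exactly num_bytes bytes with no newline, so the file is a single line like the one in the first step.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes a file of exactly `num_bytes` bytes containing one
// line with no trailing newline. Returns false on I/O failure or if `num_bytes`
// is too small to hold the prefix.
auto write_single_event_file(std::string const& path, std::size_t num_bytes) -> bool {
    // Made-up prefix for illustration only; not taken from a real log.
    std::string const prefix{"2015-03-23 05:38:39,222 INFO [main] example: "};
    if (num_bytes < prefix.size()) {
        return false;
    }

    std::ofstream out{path, std::ios::binary | std::ios::trunc};
    if (!out) {
        return false;
    }
    out.write(prefix.data(), static_cast<std::streamsize>(prefix.size()));

    // Pad with filler in 1 MiB chunks to avoid holding the whole event in memory.
    std::string const chunk(1024 * 1024, 'x');
    std::size_t remaining{num_bytes - prefix.size()};
    while (remaining > 0) {
        auto const n = std::min(remaining, chunk.size());
        out.write(chunk.data(), static_cast<std::streamsize>(n));
        remaining -= n;
    }
    return out.good();
}
```

Calling write_single_event_file("hive-single-event-64MB.txt", 67108866) would produce a file with the same byte count as the one shown in the first step (64 MiB plus 2 bytes).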