refactor(core): Make BufferedFileReader wrap a ReaderInterface instead of a FileReader; Rename BufferedFileReader to BufferedReader.#523
Conversation
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
components/core/src/clp/BufferReader.hpp (1)
36-39: Document nullptr on empty peekpeek_buffer now sets buf=nullptr when peek_size==0. Please note this in the header to avoid caller assumptions.
Apply this doc tweak:
- * @param buf Returns a pointer to the remaining content in the buffer + * @param buf Returns a pointer to the remaining content in the buffer (nullptr if none)components/core/src/clp/BufferReader.cpp (2)
43-49: Avoid signed/unsigned mix; compute bytes read relative to buffer_headUsing ptrdiff_t then subtracting a size_t risks warnings and confusion. Compute directly from buffer_head.
Apply:
- auto const delim_pos{delim_ptr - m_internal_buf}; - num_bytes_read = (delim_pos - m_internal_buf_pos) + 1; + num_bytes_read = static_cast<size_t>(delim_ptr - buffer_head) + 1;
60-71: Set num_bytes_read on zero-length readsEarly-return path for num_bytes_to_read==0 should set num_bytes_read=0 for consistency.
Apply:
- if (0 == remaining_data_size) { - if (0 == num_bytes_to_read) { - num_bytes_read = 0; - return ErrorCode_Success; - } + if (0 == remaining_data_size) { + if (0 == num_bytes_to_read) { + num_bytes_read = 0; + return ErrorCode_Success; + }(Already correct here; ensure same convention in BufferedFileReader::try_read.)
components/core/src/clp/BufferedFileReader.cpp (1)
206-223: Critical: over-reading when checkpoint set inflates buffer massivelynum_bytes_to_read should be the delta to the next boundary, not the absolute rounded position. Current code can request gigabytes.
Apply:
- num_bytes_to_read = int_round_up_to_multiple( - buffer_end_pos + num_bytes_to_refill, - m_base_buffer_size - ); + num_bytes_to_read = int_round_up_to_multiple( + buffer_end_pos + num_bytes_to_refill, + m_base_buffer_size + ) - buffer_end_pos;components/core/src/clp/BufferedFileReader.hpp (1)
92-93: Fix copy-assignment signatureThe deleted copy-assignment should take const&, not by value.
Apply:
- auto operator=(BufferedFileReader) -> BufferedFileReader& = delete; + auto operator=(BufferedFileReader const&) -> BufferedFileReader& = delete;
♻️ Duplicate comments (3)
components/core/src/clp/BufferReader.cpp (1)
11-15: Bounds check on pos is correct (matches prior feedback)The pos > data_size guard prevents invalid construction. Good catch and resolution.
components/core/src/clp/BufferedFileReader.cpp (1)
159-163: EOF only when zero bytes read — correctMatches intended semantics discussed earlier. Keep as-is.
components/core/src/clp/clp/FileCompressor.cpp (1)
124-126: canonical() may throw; tracked separatelyException handling around std::filesystem::canonical was discussed and is tracked; fine to defer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/core/src/clp/BufferReader.cpp(1 hunks)components/core/src/clp/BufferReader.hpp(1 hunks)components/core/src/clp/BufferedFileReader.cpp(7 hunks)components/core/src/clp/BufferedFileReader.hpp(8 hunks)components/core/src/clp/clp/FileCompressor.cpp(7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/BufferReader.cppcomponents/core/src/clp/BufferReader.hppcomponents/core/src/clp/clp/FileCompressor.cppcomponents/core/src/clp/BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hpp
🧠 Learnings (11)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/BufferReader.cppcomponents/core/src/clp/BufferReader.hppcomponents/core/src/clp/clp/FileCompressor.cppcomponents/core/src/clp/BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hpp
📚 Learning: 2024-12-10T16:56:33.545Z
Learnt from: gibber9809
PR: y-scope/clp#624
File: components/core/src/clp/BoundedReader.hpp:25-27
Timestamp: 2024-12-10T16:56:33.545Z
Learning: In the CLP project, within constructors like `BoundedReader` in `components/core/src/clp/BoundedReader.hpp`, it's acceptable to call methods like `get_pos()` that may throw exceptions without handling potential errors via `try_get_pos()`, since the constructor itself handles errors by throwing.
Applied to files:
components/core/src/clp/BufferReader.cpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/src/clp/BufferReader.cppcomponents/core/src/clp/BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
🧬 Code graph analysis (4)
components/core/src/clp/BufferReader.cpp (1)
components/core/src/clp/BufferReader.hpp (8)
BufferReader(26-26)BufferReader(28-28)BufferReader(30-30)pos(77-77)pos(83-83)OperationFailed(16-17)buf(39-39)buf(68-69)
components/core/src/clp/clp/FileCompressor.cpp (2)
components/core/src/clp/clp/FileCompressor.hpp (6)
target_data_size_of_dicts(43-50)target_data_size_of_dicts(68-77)target_data_size_of_dicts(79-87)target_data_size_of_dicts(89-97)target_data_size_of_dicts(110-118)target_data_size_of_dicts(131-139)components/core/src/clp/LibarchiveReader.hpp (1)
reader(47-47)
components/core/src/clp/BufferedFileReader.cpp (5)
components/core/src/clp/BufferedFileReader.hpp (13)
BufferedFileReader(78-81)BufferedFileReader(83-84)BufferedFileReader(83-83)BufferedFileReader(87-87)BufferedFileReader(90-90)BufferedFileReader(91-91)OperationFailed(43-49)OperationFailed(51-58)buf(110-110)buf(161-162)pos(133-133)pos(149-149)pos(208-208)components/core/src/clp/BufferReader.hpp (5)
OperationFailed(16-17)buf(39-39)buf(68-69)pos(77-77)pos(83-83)components/core/src/clp/StringReader.cpp (4)
try_seek_from_begin(41-48)try_seek_from_begin(41-41)try_read(18-39)try_read(18-18)components/core/src/clp/BufferReader.cpp (4)
try_seek_from_begin(83-89)try_seek_from_begin(83-83)try_read(60-81)try_read(60-61)components/core/src/clp/FileReader.cpp (4)
try_seek_from_begin(53-60)try_seek_from_begin(53-53)try_read(34-51)try_read(34-34)
components/core/src/clp/BufferedFileReader.hpp (2)
components/core/src/clp/BufferedFileReader.cpp (1)
BufferedFileReader(21-44)components/core/src/clp/BufferReader.hpp (5)
nodiscard(20-22)nodiscard(33-33)nodiscard(99-101)pos(77-77)pos(83-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: antlr-code-committed (macos-15)
🔇 Additional comments (11)
components/core/src/clp/BufferReader.hpp (1)
26-27: Default constructor addition looks goodEnables safe default-initialization for embedders (e.g., BufferedFileReader). No issues.
components/core/src/clp/BufferReader.cpp (1)
21-22: Confirm callers handle nullptr from peek_bufferBehaviour changed to return nullptr when empty. Ensure all callers (new and existing) check peek_size before dereferencing buf.
components/core/src/clp/BufferedFileReader.cpp (2)
26-33: Constructor validation LGTMNull-check and error messaging are appropriate; base_buffer_size multiple-of-page guard is good.
242-245: Reinitializing BufferReader on EOF is correctEnsures peek returns nullptr when empty; consistent with BufferReader change.
components/core/src/clp/clp/FileCompressor.cpp (3)
129-130: BufferedFileReader adoption LGTMConstruction via make_unique aligns with the new ReaderInterface path.
132-149: Error logging paths OKDifferentiating errno vs generic reader errors is reasonable here. No change requested.
189-220: Wrapper parse_and_encode reads wellClear split between heuristic and library paths; good API.
components/core/src/clp/BufferedFileReader.hpp (4)
72-85: Ctor overloads and null-check (in .cpp) align with designAPI shape is good; default-size delegating ctor is convenient.
95-101: Doc: return references to internal functions“Same as refill_reader_buffer” is fine here; consistent with implementation.
145-147: Doc correctnessReferencing both BufferReader::try_seek_from_begin and ReaderInterface::try_seek_from_begin is appropriate given the code path.
204-209: get_buffer_end_pos and update_pos APIs are clearGood encapsulation of position math.
BufferedFileReader wrap a ReaderInterface instead of a FileReader.
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/tests/test-BufferedFileReader.cpp (1)
80-84: Assert zero bytes read at EOFWhen
try_readreturnsErrorCode_EndOfFile,num_bytes_readshould be 0. Tighten the assertion to catch regressions.REQUIRE(ErrorCode_EndOfFile == reader.try_read(read_buf.begin() + buf_pos, num_bytes_to_read, num_bytes_read)); + REQUIRE(0 == num_bytes_read);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/tests/test-BufferedFileReader.cpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/tests/test-BufferedFileReader.cpp
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/tests/test-BufferedFileReader.cpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/tests/test-BufferedFileReader.cpp
🧬 Code graph analysis (1)
components/core/tests/test-BufferedFileReader.cpp (4)
components/core/src/clp/BufferedFileReader.hpp (6)
BufferedFileReader(78-81)BufferedFileReader(83-84)BufferedFileReader(83-83)BufferedFileReader(87-87)BufferedFileReader(90-90)BufferedFileReader(91-91)components/core/src/clp/BufferedFileReader.cpp (1)
BufferedFileReader(21-44)components/core/src/clp/FileReader.hpp (2)
FileReader(28-28)FileReader(30-30)components/core/src/clp/FileReader.cpp (2)
FileReader(15-23)FileReader(25-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (2)
components/core/tests/test-BufferedFileReader.cpp (2)
45-45: Good adoption of ReaderInterface in testsConstructing
BufferedFileReaderwithmake_unique<FileDescriptorReader>(...)matches the refactor intent and prevents double-buffering.
291-299: LGTM: Non-zero initial position coverageSeeking the
FileDescriptorReaderbefore handing it toBufferedFileReaderprovides solid coverage for non-zero starting offsets; the mirroredFileReaderreference is a good oracle.Please confirm this path also runs under TSAN/ASAN configurations in CI to catch any hidden UB in the reader stack.
| #include <array> | ||
| #include <random> | ||
|
|
||
| #include <boost/filesystem.hpp> | ||
| #include <catch2/catch_test_macros.hpp> | ||
| #include <catch2/generators/catch_generators.hpp> | ||
|
|
||
| #include "../src/clp/BufferedFileReader.hpp" | ||
| #include "../src/clp/FileDescriptorReader.hpp" | ||
| #include "../src/clp/FileReader.hpp" |
There was a problem hiding this comment.
🧹 Nitpick
Add missing headers for std::max and memcmp
std::max (used on Line 158) and memcmp (used multiple times) require <algorithm> and <cstring> respectively; avoid relying on transitive includes.
#include <array>
+#include <algorithm>
#include <random>
+#include <cstring>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #include <array> | |
| #include <random> | |
| #include <boost/filesystem.hpp> | |
| #include <catch2/catch_test_macros.hpp> | |
| #include <catch2/generators/catch_generators.hpp> | |
| #include "../src/clp/BufferedFileReader.hpp" | |
| #include "../src/clp/FileDescriptorReader.hpp" | |
| #include "../src/clp/FileReader.hpp" | |
| #include <array> | |
| #include <algorithm> | |
| #include <random> | |
| #include <cstring> | |
| #include <boost/filesystem.hpp> | |
| #include <catch2/catch_test_macros.hpp> | |
| #include <catch2/generators/catch_generators.hpp> | |
| #include "../src/clp/BufferedFileReader.hpp" | |
| #include "../src/clp/FileDescriptorReader.hpp" | |
| #include "../src/clp/FileReader.hpp" |
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp lines 1 to 10, the test
uses std::max and memcmp but is missing the required headers; add #include
<algorithm> for std::max and #include <cstring> for memcmp at the top of the
file (with the other standard headers) to avoid relying on transitive includes.
| auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>(); | ||
| auto& test_data = *test_data_uniq_ptr; | ||
| for (size_t i = 0; i < test_data.size(); ++i) { |
There was a problem hiding this comment.
🧹 Nitpick
Consider std::vector over heap-allocated std::array
unique_ptr<std::array<...>> adds template-size coupling and indirection. std::vector<char> is simpler and dynamic while keeping contiguous storage.
Example for the first test block:
- auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>();
- auto& test_data = *test_data_uniq_ptr;
+ std::vector<char> test_data(test_data_size);- auto read_buf_uniq_ptr = make_unique<std::array<char, test_data_size>>();
- auto& read_buf = *read_buf_uniq_ptr;
+ std::vector<char> read_buf(test_data_size);Similarly for the delimiter test’s test_data.
Also applies to: 42-44, 274-276
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 29-31 (and
similarly at 42-44 and 274-276), replace the heap-allocated
unique_ptr<std::array<char, test_data_size>> pattern with a plain
std::vector<char> to avoid template-size coupling and extra indirection: create
a std::vector<char> test_data; resize or reserve it to test_data_size, populate
via test_data[i] or std::fill, and use test_data.data() where a contiguous
pointer is needed; apply the same change to the delimiter test and the other
referenced blocks, adjusting types and accesses accordingly.
| string const test_file_path{"BufferedFileReader.test"}; | ||
| // Write to test file |
There was a problem hiding this comment.
Use unique temp paths to avoid test flakiness and collisions in parallel runs
Hardcoded filenames can collide across parallel CI shards or stale artefacts. Prefer boost::filesystem::temp_directory_path() + unique_path().
- string const test_file_path{"BufferedFileReader.test"};
+ auto const tmp_dir = boost::filesystem::temp_directory_path();
+ string const test_file_path = (tmp_dir / boost::filesystem::unique_path("BufferedFileReader-%%%%%%%%.test")).string();- string const test_file_path{"BufferedFileReader.delimiter.test"};
+ auto const tmp_dir = boost::filesystem::temp_directory_path();
+ string const test_file_path = (tmp_dir / boost::filesystem::unique_path("BufferedFileReaderDelimiter-%%%%%%%%.test")).string();Also applies to: 282-283
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 35-36 (and
also update the occurrence at lines 282-283), the test uses a hardcoded filename
"BufferedFileReader.test" which can collide in parallel CI or with stale
artifacts; replace these hardcoded paths by constructing unique temp file paths
using boost::filesystem::temp_directory_path() /
boost::filesystem::unique_path() (or equivalent APIs), create/write the file at
that unique path for the test, and ensure cleanup after the test to avoid
residue.
| // Initialize random number generator | ||
| std::random_device rd; | ||
| std::mt19937 gen(rd()); | ||
| std::uniform_int_distribution<> uniformly_distributed_alphabet('a', 'a' + cNumAlphabets - 1); | ||
|
|
There was a problem hiding this comment.
Seed RNG deterministically for reproducible tests
Using std::random_device makes failures non-reproducible. Use a fixed seed.
- std::random_device rd;
- std::mt19937 gen(rd());
+ // Fixed seed for reproducibility in CI
+ std::mt19937 gen(0xC0FFEEu);🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 267 to 271,
the RNG is seeded with std::random_device which makes test runs
non-deterministic; replace the nondeterministic seeding with a fixed
deterministic seed (e.g. use a constant seed value or configurable deterministic
seed) when constructing std::mt19937 so the uniform distribution produces
reproducible sequences across test runs.
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/core/src/clp/BufferedFileReader.hpp (1)
90-93: Fix deleted copy-assignment signatureDeleting operator=(BufferedFileReader) (by value) is non-idiomatic and redundant with the rvalue overload. Delete the const& overload instead.
- auto operator=(BufferedFileReader) -> BufferedFileReader& = delete; - auto operator=(BufferedFileReader&&) -> BufferedFileReader& = delete; + auto operator=(BufferedFileReader const&) -> BufferedFileReader& = delete; + auto operator=(BufferedFileReader&&) -> BufferedFileReader& = delete;components/core/src/clp/BufferedFileReader.cpp (1)
214-223: Critical: over‑read when checkpoint is set (absolute vs delta math)num_bytes_to_read is computed as an absolute end position, then used as a byte count for read/resize. This can trigger huge reads/resizes.
- num_bytes_to_read = int_round_up_to_multiple( - buffer_end_pos + num_bytes_to_refill, - m_base_buffer_size - ); - // Grow the buffer if necessary - if (num_bytes_to_read > available_buffer_space) { - m_buffer.resize(data_size + num_bytes_to_read); - } + auto const desired_end_pos = int_round_up_to_multiple( + buffer_end_pos + num_bytes_to_refill, + m_base_buffer_size + ); + num_bytes_to_read = desired_end_pos - buffer_end_pos; + // Grow the buffer if necessary + if (num_bytes_to_read > available_buffer_space) { + m_buffer.resize(data_size + num_bytes_to_read); + } next_buffer_pos = data_size;
♻️ Duplicate comments (5)
components/core/src/clp/BufferedFileReader.hpp (2)
158-159: Doc correctly references BufferReader::try_readMatches the implementation detail noted earlier; keep as-is.
74-81: Document constructor params and null-check behaviourDocstring is missing details for reader_interface and the null-check enforced in the .cpp.
- * @param reader_interface + * @param reader_interface Shared ownership of the underlying input source; must be non-null. + * Throws OperationFailed(ErrorCode_BadParam) if null.components/core/src/clp/BufferedFileReader.cpp (3)
57-59: Fix invalid comparison: std::optional<size_t> vs size_tThis is ill‑formed and won’t compile. Compare the contained value.
- if (m_checkpoint_pos.has_value() && m_checkpoint_pos < m_pos + if (m_checkpoint_pos.has_value() && m_checkpoint_pos.value() < m_pos && m_buffer_reader.get_buffer_size() != m_base_buffer_size)
39-43: Non‑zero start bug: initialise buffer begin to reader positionIf the underlying reader starts at a non‑zero position, the buffer’s absolute origin must match it. Otherwise, absolute/relative math (e.g., get_buffer_end_pos, seeks, refills) is inconsistent.
Apply:
- m_pos = m_reader->get_pos(); - m_buffer_begin_pos = 0; + m_pos = m_reader->get_pos(); + m_buffer_begin_pos = m_pos; m_buffer_reader = BufferReader{m_buffer.data(), 0}; m_highest_read_pos = m_pos;
130-135: Zero‑length read semantics and nullptr handlingFor num_bytes_to_read==0, set num_bytes_read=0 and succeed regardless of buf. Current order returns BadParam for nullptr + 0.
- if (nullptr == buf) { - return ErrorCode_BadParam; - } - if (num_bytes_to_read == 0) { - return ErrorCode_Success; - } + if (0 == num_bytes_to_read) { + num_bytes_read = 0; + return ErrorCode_Success; + } + if (nullptr == buf) { + return ErrorCode_BadParam; + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/BufferedFileReader.cpp(7 hunks)components/core/src/clp/BufferedFileReader.hpp(8 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hpp
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/src/clp/BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hpp
🧬 Code graph analysis (2)
components/core/src/clp/BufferedFileReader.cpp (6)
components/core/src/clp/BufferedFileReader.hpp (13)
BufferedFileReader(78-81)BufferedFileReader(83-84)BufferedFileReader(83-83)BufferedFileReader(87-87)BufferedFileReader(90-90)BufferedFileReader(91-91)OperationFailed(43-49)OperationFailed(51-58)buf(110-110)buf(161-162)pos(133-133)pos(149-149)pos(208-208)components/core/src/clp/BufferReader.hpp (5)
OperationFailed(16-17)buf(39-39)buf(68-69)pos(77-77)pos(83-83)components/core/src/clp/StringReader.cpp (4)
try_seek_from_begin(41-48)try_seek_from_begin(41-41)try_read(18-39)try_read(18-18)components/core/src/clp/BufferReader.cpp (4)
try_seek_from_begin(83-89)try_seek_from_begin(83-83)try_read(60-81)try_read(60-61)components/core/src/clp/FileReader.cpp (4)
try_seek_from_begin(53-60)try_seek_from_begin(53-53)try_read(34-51)try_read(34-34)components/core/src/clp/math_utils.hpp (2)
int_round_up_to_multiple(13-18)int_round_up_to_multiple(13-13)
components/core/src/clp/BufferedFileReader.hpp (2)
components/core/src/clp/BufferedFileReader.cpp (1)
BufferedFileReader(20-43)components/core/src/clp/BufferReader.hpp (5)
nodiscard(20-22)nodiscard(33-33)nodiscard(99-101)pos(77-77)pos(83-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: package-image
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (8)
components/core/src/clp/BufferedFileReader.hpp (6)
83-85: LGTM: delegating ctor uses default buffer sizeThe delegating constructor is clean and explicit.
86-88: LGTM: trivial virtual dtorDefaulted and marked override; good.
98-101: LGTM: failure semantics reference the right helperReturning the same error as refill_reader_buffer keeps docs precise.
172-173: LGTM: ties delimiter read failures to refill pathAccurately describes the failure source.
186-188: LGTM: refill helper clearly states propagated error sourceThe contract aligns with wrapping ReaderInterface.
205-206: LGTM: buffer end computed from BufferReader sizeUsing get_buffer_size() keeps this in sync with rebinds.
components/core/src/clp/BufferedFileReader.cpp (2)
241-243: Good: buffer reader reset points at newly appended dataRe‑initialising BufferReader with size and starting pos at next_buffer_pos correctly exposes only the newly refilled region.
If ReaderInterface ever returns Success with num_bytes_read==0 for a positive request, the read loop could spin. Please confirm the contract guarantees EndOfFile in that case.
248-257: LGTM: buffer compaction is correctUsing get_pos()/get_buffer_size() to compute remaining, erasing the consumed prefix, updating m_buffer_begin_pos, and re‑seeding BufferReader is sound.
| * @param file_pos | ||
| * @return \p file_pos relative to the beginning of the buffer | ||
| * @return file_pos relative to the beginning of the buffer | ||
| */ | ||
| [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t { | ||
| return file_pos - m_buffer_begin_pos; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick
Guard against underflow in get_buffer_relative_pos
If file_pos < m_buffer_begin_pos, subtraction underflows. Add a debug assert and document the precondition.
- [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
- return file_pos - m_buffer_begin_pos;
- }
+ [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
+ // Precondition: file_pos >= m_buffer_begin_pos
+ assert(file_pos >= m_buffer_begin_pos);
+ return file_pos - m_buffer_begin_pos;
+ }Add the include near the other headers:
#include <cstddef>
+#include <cassert>🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around lines 197 to 202,
get_buffer_relative_pos can underflow when file_pos < m_buffer_begin_pos; add a
debug assertion of the precondition (e.g., assert(file_pos >=
m_buffer_begin_pos)) and update the comment to document the precondition that
file_pos must be >= m_buffer_begin_pos, and add #include <cassert> alongside the
other headers near the top of the file.
| auto update_pos(size_t pos) -> void; | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick
Document update_pos responsibilities
Please document whether update_pos also advances m_highest_read_pos and any invariants (e.g., pos >= m_pos).
Proposed doc stub:
- auto update_pos(size_t pos) -> void;
+ // Advances m_pos to pos; if pos exceeds m_highest_read_pos, updates it as well.
+ // Precondition: pos >= m_buffer_begin_pos and pos <= get_buffer_end_pos().
+ auto update_pos(size_t pos) -> void;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto update_pos(size_t pos) -> void; | |
| // Advances m_pos to pos; if pos exceeds m_highest_read_pos, updates it as well. | |
| // Precondition: pos >= m_buffer_begin_pos and pos <= get_buffer_end_pos(). | |
| auto update_pos(size_t pos) -> void; |
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around lines 208-209, add a
doc comment for update_pos that states its responsibilities: specify whether it
advances m_highest_read_pos (yes/no), the precondition invariant (e.g., pos >=
m_pos), any postconditions (new m_pos == pos and when m_highest_read_pos is
updated to max(old, pos)), side effects (no IO, no allocation), and
thread-safety/exception behavior; ensure the comment is concise, uses the same
doxygen/style convention as surrounding methods, and include a short example or
note if callers must validate pos before calling.
| size_t m_pos{0}; | ||
| std::shared_ptr<ReaderInterface> m_reader; | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick
Centralize BufferReader rebind after buffer reallocations
Since BufferReader holds a raw pointer into m_buffer, any resize that changes capacity must rebind it. Consider a small helper to prevent drift.
private:
+ // Rebinds m_buffer_reader to the current m_buffer data and size.
+ inline void rebind_buffer_reader(size_t size) {
+ m_buffer_reader = BufferReader{m_buffer.data(), size};
+ }Then use rebind_buffer_reader(...) in places that resize m_buffer (e.g., after drop_content_before_current_pos()).
Also applies to: 220-221
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around lines 214-216 (and also
220-221), BufferReader holds a raw pointer into m_buffer so any reallocation of
m_buffer can invalidate that pointer; add a small private helper method
rebind_buffer_reader() that resets the BufferReader to point at m_buffer.data()
+ m_pos (and any other needed offsets) and call this helper immediately after
every place that can change m_buffer capacity/resize (e.g., after
drop_content_before_current_pos() and any other buffer-resizing code paths) to
ensure the raw pointer is updated and prevent pointer drift.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/src/clp/BufferedFileReader.cpp (1)
214-223: Over-read in checkpoint mode: num_bytes_to_read computed as absolute position.This can trigger unnecessary large reads and buffer growth. Compute delta from buffer_end_pos.
- num_bytes_to_read = int_round_up_to_multiple( - buffer_end_pos + num_bytes_to_refill, - m_base_buffer_size - ); + auto const target_end = int_round_up_to_multiple( + buffer_end_pos + num_bytes_to_refill, + m_base_buffer_size + ); + num_bytes_to_read = target_end - buffer_end_pos;
♻️ Duplicate comments (5)
components/core/tests/test-BufferedFileReader.cpp (4)
1-7: Add missing headers for std::max and memcmp.Compilation relies on transitive includes; add the explicit headers.
#include <array> +#include <algorithm> // std::max #include <random> +#include <cstring> // std::memcmp
29-31: Prefer std::vector over heap-allocated std::array.Simpler, fewer indirections; contiguous storage preserved.
- auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>(); - auto& test_data = *test_data_uniq_ptr; + std::vector<char> test_data(test_data_size); @@ - auto read_buf_uniq_ptr = make_unique<std::array<char, test_data_size>>(); - auto& read_buf = *read_buf_uniq_ptr; + std::vector<char> read_buf(test_data_size); @@ - auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>(); - auto& test_data = *test_data_uniq_ptr; + std::vector<char> test_data(test_data_size);Also applies to: 42-44, 274-276
35-35: Use unique temp paths to avoid collisions in parallel runs.Hardcoded filenames are flaky in CI.
- string const test_file_path{"BufferedFileReader.test"}; + auto const tmp_dir = boost::filesystem::temp_directory_path(); + string const test_file_path = + (tmp_dir / boost::filesystem::unique_path("BufferedFileReader-%%%%%%%%.test")).string(); @@ - string const test_file_path{"BufferedFileReader.delimiter.test"}; + auto const tmp_dir = boost::filesystem::temp_directory_path(); + string const test_file_path = + (tmp_dir / boost::filesystem::unique_path("BufferedFileReaderDelimiter-%%%%%%%%.test")).string();Also applies to: 282-282
267-271: Seed RNG deterministically for reproducible tests.Avoid non‑deterministic failures.
- std::random_device rd; - std::mt19937 gen(rd()); + // Fixed seed for reproducibility + std::mt19937 gen(0xC0FFEEu);components/core/src/clp/BufferedFileReader.hpp (1)
145-147: Resolve conflicting @return lines.This method can fail from BufferReader seek or underlying ReaderInterface operations; document unambiguously.
- * @return Same as BufferReader::try_seek_from_begin if it fails - * @return Same as ReaderInterface::try_seek_from_begin if it fails + * @return Same as BufferReader::try_seek_from_begin; may also return the error from + * ReaderInterface::try_seek_from_begin/refill when advancing the underlying reader.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp/BufferedFileReader.cpp(7 hunks)components/core/src/clp/BufferedFileReader.hpp(8 hunks)components/core/tests/test-BufferedFileReader.cpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/tests/test-BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hppcomponents/core/src/clp/BufferedFileReader.cpp
🧠 Learnings (5)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/tests/test-BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hppcomponents/core/src/clp/BufferedFileReader.cpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/tests/test-BufferedFileReader.cppcomponents/core/src/clp/BufferedFileReader.hppcomponents/core/src/clp/BufferedFileReader.cpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Applied to files:
components/core/tests/test-BufferedFileReader.cpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/tests/test-BufferedFileReader.cpp
🧬 Code graph analysis (3)
components/core/tests/test-BufferedFileReader.cpp (3)
components/core/src/clp/BufferedFileReader.cpp (1)
BufferedFileReader(20-43)components/core/src/clp/BufferedFileReader.hpp (6)
BufferedFileReader(78-81)BufferedFileReader(83-84)BufferedFileReader(83-83)BufferedFileReader(87-87)BufferedFileReader(90-90)BufferedFileReader(91-91)components/core/src/clp/FileReader.cpp (2)
FileReader(15-23)FileReader(25-32)
components/core/src/clp/BufferedFileReader.hpp (2)
components/core/src/clp/BufferedFileReader.cpp (1)
BufferedFileReader(20-43)components/core/src/clp/BufferReader.hpp (5)
nodiscard(20-22)nodiscard(33-33)nodiscard(99-101)pos(77-77)pos(83-83)
components/core/src/clp/BufferedFileReader.cpp (4)
components/core/src/clp/BufferedFileReader.hpp (13)
BufferedFileReader(78-81)BufferedFileReader(83-84)BufferedFileReader(83-83)BufferedFileReader(87-87)BufferedFileReader(90-90)BufferedFileReader(91-91)OperationFailed(43-49)OperationFailed(51-58)buf(110-110)buf(161-162)pos(133-133)pos(149-149)pos(213-213)components/core/src/clp/BufferReader.hpp (5)
OperationFailed(16-17)buf(39-39)buf(68-69)pos(77-77)pos(83-83)components/core/src/clp/BufferReader.cpp (4)
try_seek_from_begin(83-89)try_seek_from_begin(83-83)try_read(60-81)try_read(60-61)components/core/src/clp/math_utils.hpp (2)
int_round_up_to_multiple(13-18)int_round_up_to_multiple(13-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build-macos (macos-14, true)
| auto BufferedFileReader::try_refill_buffer_if_empty() -> ErrorCode { | ||
| if (-1 == m_fd) { | ||
| return ErrorCode_NotInit; | ||
| } | ||
| if (m_buffer_reader->get_buffer_size() > 0) { | ||
| if (m_buffer_reader.get_buffer_size() != m_buffer_reader.get_pos()) { | ||
| return ErrorCode_Success; | ||
| } | ||
| return refill_reader_buffer(m_base_buffer_size); |
There was a problem hiding this comment.
🧹 Nitpick
Refill predicate reads clearer as “empty buffer” check.
Current condition is correct; minor readability tweak optional.
- if (m_buffer_reader.get_buffer_size() != m_buffer_reader.get_pos()) {
+ // If there is unread data, no refill is needed
+ if (m_buffer_reader.get_pos() != m_buffer_reader.get_buffer_size()) {
return ErrorCode_Success;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto BufferedFileReader::try_refill_buffer_if_empty() -> ErrorCode { | |
| if (-1 == m_fd) { | |
| return ErrorCode_NotInit; | |
| } | |
| if (m_buffer_reader->get_buffer_size() > 0) { | |
| if (m_buffer_reader.get_buffer_size() != m_buffer_reader.get_pos()) { | |
| return ErrorCode_Success; | |
| } | |
| return refill_reader_buffer(m_base_buffer_size); | |
| auto BufferedFileReader::try_refill_buffer_if_empty() -> ErrorCode { | |
| // If there is unread data, no refill is needed | |
| if (m_buffer_reader.get_pos() != m_buffer_reader.get_buffer_size()) { | |
| return ErrorCode_Success; | |
| } | |
| return refill_reader_buffer(m_base_buffer_size); | |
| } |
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.cpp around lines 45 to 49, the
refill predicate is correct but reads less clearly; change the buffer-empty
check to a clearer expression (e.g., if (m_buffer_reader.get_pos() <
m_buffer_reader.get_buffer_size()) return ErrorCode_Success; or explicitly check
for empty with if (m_buffer_reader.get_pos() >=
m_buffer_reader.get_buffer_size()) then call refill_reader_buffer), keeping the
existing behavior but making the intent (“buffer empty” vs “not empty”) more
readable.
| if (nullptr == buf) { | ||
| return ErrorCode_BadParam; | ||
| } | ||
| num_bytes_read = 0; | ||
| if (num_bytes_to_read == 0) { | ||
| return ErrorCode_Success; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick
Align null-buffer handling with zero-byte reads.
Permit buf == nullptr when num_bytes_to_read == 0 to match BufferReader semantics and avoid surprising BadParam.
- if (nullptr == buf) {
- return ErrorCode_BadParam;
- }
- num_bytes_read = 0;
- if (num_bytes_to_read == 0) {
+ num_bytes_read = 0;
+ if (0 == num_bytes_to_read) {
return ErrorCode_Success;
}
+ if (nullptr == buf) {
+ return ErrorCode_BadParam;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (nullptr == buf) { | |
| return ErrorCode_BadParam; | |
| } | |
| num_bytes_read = 0; | |
| if (num_bytes_to_read == 0) { | |
| return ErrorCode_Success; | |
| } | |
| num_bytes_read = 0; | |
| if (0 == num_bytes_to_read) { | |
| return ErrorCode_Success; | |
| } | |
| if (nullptr == buf) { | |
| return ErrorCode_BadParam; | |
| } |
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.cpp around lines 130 to 136, the
code rejects a null buf before checking for a zero-length read; change the logic
so that zero-byte reads return ErrorCode_Success (and set num_bytes_read = 0)
even if buf is nullptr—either move the num_bytes_to_read == 0 check before the
buf null check or explicitly allow buf == nullptr when num_bytes_to_read == 0,
keeping the existing behavior for non-zero reads.
| #define CLP_BUFFEREDFILEREADER_HPP | ||
|
|
||
| #include <cstdio> | ||
| #include <cstddef> |
There was a problem hiding this comment.
🧹 Nitpick
Guard get_buffer_relative_pos against underflow (debug).
Assert the precondition to catch misuse in debug builds.
-#include <cstddef>
+#include <cstddef>
+#include <cassert>
@@
- [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
- return file_pos - m_buffer_begin_pos;
- }
+ [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
+ assert(file_pos >= m_buffer_begin_pos);
+ return file_pos - m_buffer_begin_pos;
+ }Also applies to: 200-206
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around line 4 and also apply
to lines ~200-206, add a debug-only precondition assert in
get_buffer_relative_pos to guard against unsigned underflow when computing the
relative position; specifically assert that the absolute position (or pointer)
is not before the buffer start (or that pos >= buffer_start) before performing
the subtraction, so misuse is caught in debug builds and the subtraction cannot
underflow.
| // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset | ||
| buffered_file_reader.clear_checkpoint(); | ||
| REQUIRE(reader_begin_offset == buffered_file_reader.get_pos()); | ||
|
|
||
| // Validate that a FileReader and a BufferedFileReader return the same strings (split by |
There was a problem hiding this comment.
🧹 Nitpick
Also test set+clear checkpoint with no reads.
Current check only clears without a prior set; add set_checkpoint() to cover the regression discussed in earlier reviews.
- // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset
- buffered_file_reader.clear_checkpoint();
+ // Validate setting then clearing a checkpoint without any reading keeps the beginning offset
+ buffered_file_reader.set_checkpoint();
+ buffered_file_reader.clear_checkpoint();
REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset | |
| buffered_file_reader.clear_checkpoint(); | |
| REQUIRE(reader_begin_offset == buffered_file_reader.get_pos()); | |
| // Validate that a FileReader and a BufferedFileReader return the same strings (split by | |
| // Validate setting then clearing a checkpoint without any reading keeps the beginning offset | |
| buffered_file_reader.set_checkpoint(); | |
| buffered_file_reader.clear_checkpoint(); | |
| REQUIRE(reader_begin_offset == buffered_file_reader.get_pos()); | |
| // Validate that a FileReader and a BufferedFileReader return the same strings (split by |
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 300 to 304,
the test only calls clear_checkpoint() without first calling set_checkpoint();
update the test to call buffered_file_reader.set_checkpoint() immediately before
buffered_file_reader.clear_checkpoint() so it validates the set+clear checkpoint
behavior when no reads have occurred, and keep the subsequent REQUIRE asserting
reader_begin_offset == buffered_file_reader.get_pos().
LinZhihao-723
left a comment
There was a problem hiding this comment.
Did a large refactoring on the latest commit, which includes:
- Renamed it to
BufferedReader, since the source is no longer a file. - Updated the docstring to reflect the latest changes.
- Use
source readerto refer to the underlying reader. - Polish some docstrings.
- Update the docstring to match our latest docstring guideline.
- Use
- Renamed some variables to add
in_srcto indicate that the position is relative to the source reader.
Notice that I didn't do any logical refactoring on the existing implementation. All unit tests passed locally, but I can't conclude all edge cases are covered.
There was a problem hiding this comment.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/core/src/clp/clp/FileCompressor.hpp (1)
100-110: Doc nit: clarify wording (“archive” vs. “archive_writer”).“Tries to compress the given file as if it were a generic archive_writer” is confusing. It’s compressing input “as an archive”, not “as an archive_writer”.
- * Tries to compress the given file as if it were a generic archive_writer + * Tries to compress the given file as if it were a generic archivecomponents/core/tests/test-BufferedReader.cpp (2)
61-63: Fix outdated comment.- // Read a large chunk of data, so BufferedFileReader will refill the + // Read a large chunk of data, so BufferedReader will refill the
180-184: Use try_seek_from_begin consistently (and assert result).Direct seek_from_begin here bypasses the test’s error checks.
- reader.seek_from_begin(seek_pos_2); + REQUIRE(ErrorCode_Success == reader.try_seek_from_begin(seek_pos_2));
♻️ Duplicate comments (1)
components/core/src/clp/clp/FileCompressor.cpp (1)
124-124: Known issue: Handle exceptions fromstd::filesystem::canonicalAs discussed in previous reviews,
std::filesystem::canonicalcan throw exceptions if the file doesn't exist or there are permission issues. This is tracked in Issue #553.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
components/core/CMakeLists.txt(2 hunks)components/core/src/clp/BufferedReader.cpp(1 hunks)components/core/src/clp/BufferedReader.hpp(1 hunks)components/core/src/clp/clp/CMakeLists.txt(1 hunks)components/core/src/clp/clp/FileCompressor.cpp(7 hunks)components/core/src/clp/clp/FileCompressor.hpp(4 hunks)components/core/tests/test-BufferedReader.cpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/BufferedReader.hppcomponents/core/tests/test-BufferedReader.cppcomponents/core/src/clp/BufferedReader.cppcomponents/core/src/clp/clp/FileCompressor.hppcomponents/core/src/clp/clp/FileCompressor.cpp
🧠 Learnings (14)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/clp/CMakeLists.txtcomponents/core/CMakeLists.txtcomponents/core/src/clp/BufferedReader.hppcomponents/core/tests/test-BufferedReader.cppcomponents/core/src/clp/BufferedReader.cppcomponents/core/src/clp/clp/FileCompressor.hppcomponents/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
components/core/src/clp/clp/CMakeLists.txt
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/CMakeLists.txt
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/CMakeLists.txtcomponents/core/src/clp/BufferedReader.hppcomponents/core/tests/test-BufferedReader.cppcomponents/core/src/clp/BufferedReader.cpp
📚 Learning: 2024-12-10T16:56:33.545Z
Learnt from: gibber9809
PR: y-scope/clp#624
File: components/core/src/clp/BoundedReader.hpp:25-27
Timestamp: 2024-12-10T16:56:33.545Z
Learning: In the CLP project, within constructors like `BoundedReader` in `components/core/src/clp/BoundedReader.hpp`, it's acceptable to call methods like `get_pos()` that may throw exceptions without handling potential errors via `try_get_pos()`, since the constructor itself handles errors by throwing.
Applied to files:
components/core/src/clp/BufferedReader.cpp
📚 Learning: 2024-10-24T14:25:17.978Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Applied to files:
components/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/src/clp/clp/FileCompressor.hppcomponents/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
🧬 Code graph analysis (5)
components/core/src/clp/BufferedReader.hpp (1)
components/core/src/clp/BufferedReader.cpp (1)
BufferedReader(20-43)
components/core/tests/test-BufferedReader.cpp (2)
components/core/src/clp/BufferedReader.cpp (1)
BufferedReader(20-43)components/core/src/clp/BufferedReader.hpp (6)
BufferedReader(72-75)BufferedReader(77-78)BufferedReader(77-77)BufferedReader(81-81)BufferedReader(84-84)BufferedReader(85-85)
components/core/src/clp/BufferedReader.cpp (1)
components/core/src/clp/BufferedReader.hpp (15)
BufferedReader(72-75)BufferedReader(77-78)BufferedReader(77-77)BufferedReader(81-81)BufferedReader(84-84)BufferedReader(85-85)OperationFailed(37-43)OperationFailed(45-52)buf(104-104)buf(156-157)pos(127-127)pos(143-143)pos(208-208)num_bytes_to_refill(184-184)delim(171-172)
components/core/src/clp/clp/FileCompressor.hpp (4)
components/core/src/clp/streaming_compression/zstd/Decompressor.hpp (1)
reader(83-83)components/core/src/clp/streaming_compression/Decompressor.hpp (1)
reader(54-54)components/core/src/clp/streaming_compression/passthrough/Decompressor.hpp (1)
reader(77-77)components/core/src/clp/LibarchiveReader.hpp (1)
reader(47-47)
components/core/src/clp/clp/FileCompressor.cpp (2)
components/core/src/clp/clp/FileCompressor.hpp (6)
target_data_size_of_dicts(43-50)target_data_size_of_dicts(68-77)target_data_size_of_dicts(79-87)target_data_size_of_dicts(89-97)target_data_size_of_dicts(110-118)target_data_size_of_dicts(131-139)components/core/src/clp/LibarchiveReader.hpp (1)
reader(47-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (10)
components/core/src/clp/clp/FileCompressor.hpp (1)
58-77: New parse_and_encode wrapper: API shape is fine.Signature and docs align with the ReaderInterface refactor and void return. No issues from an API standpoint.
components/core/src/clp/BufferedReader.cpp (5)
233-239: Consider validating buffer allocation before readThe code attempts to read into
&m_buffer[next_buffer_pos]without explicitly checking if the resize operations (lines 219 or implicit vector capacity) were successful. Whilestd::vector::resize()will throwstd::bad_allocon failure, it's worth noting this implicit dependency.The implementation correctly handles buffer management with appropriate resizing before reads.
138-139: Prefer explicit condition check per coding guidelinesPer the coding guidelines, prefer
false == <expression>rather than using the negation operator.- while (false == dst_view.empty()) { + while (false == dst_view.empty()) {Note: This line already follows the guideline correctly. No change needed.
166-168: Prefer explicit condition check per coding guidelinesPer the coding guidelines, prefer
false == <expression>rather than using the negation operator.- if (false == append) { + if (false == append) {Note: This line already follows the guideline correctly. No change needed.
33-34: Prefer explicit condition check per coding guidelinesPer the coding guidelines, prefer
false == <expression>rather than using the negation operator.- if (base_buffer_size % cMinBufferSize != 0) { + if (false == (base_buffer_size % cMinBufferSize == 0)) {Likely an incorrect or invalid review comment.
67-69: Unnecessary empty return checkThe empty check before return is redundant since returning from a void function with no value is valid regardless.
auto BufferedReader::clear_checkpoint() -> void { - if (false == m_checkpoint_pos.has_value()) { - return; - } - - auto error_code = try_seek_from_begin(m_highest_read_in_src_pos); - if (ErrorCode_Success != error_code) { - // Should never happen - throw OperationFailed(error_code, __FILENAME__, __LINE__); + if (m_checkpoint_pos.has_value()) { + auto error_code = try_seek_from_begin(m_highest_read_in_src_pos); + if (ErrorCode_Success != error_code) { + // Should never happen + throw OperationFailed(error_code, __FILENAME__, __LINE__); + } + drop_content_before_current_pos(); + m_checkpoint_pos.reset(); } - drop_content_before_current_pos(); - m_checkpoint_pos.reset(); }⛔ Skipped due to learnings
Learnt from: haiqi96 PR: y-scope/clp#523 File: components/core/src/clp/BufferedFileReader.cpp:96-106 Timestamp: 2024-10-24T14:45:26.265Z Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.components/core/src/clp/clp/FileCompressor.cpp (4)
129-129: LGTM! Clean migration to BufferedReaderThe refactoring to use
BufferedReaderwithFileReaderwrapped in a smart pointer is well-implemented and aligns with the PR objectives of supporting ReaderInterface-based sources.
189-220: Well-structured delegation patternThe new
parse_and_encodewrapper method cleanly delegates to the appropriate encoding strategy based on theuse_heuristicflag. This design improves maintainability and supports the ReaderInterface abstraction.
393-402: Excellent use of unified interfaceThe migration to use the new
parse_and_encodewrapper withm_libarchive_file_readerdemonstrates good code reuse and maintains consistency across different reader types.
311-311: Prefer consistent variable declaration styleConsider moving the variable declaration closer to its first use (line 313) for better code locality.
- string filename_if_compressed; if (file_boost_path.has_stem()) { - filename_if_compressed = file_boost_path.stem().string(); + string filename_if_compressed = file_boost_path.stem().string(); } else { - filename_if_compressed = file_boost_path.filename().string(); + string filename_if_compressed = file_boost_path.filename().string(); } + auto error_code = m_libarchive_reader.try_open(file_reader, filename_if_compressed);Wait, the variable is used after the if-else block. The current placement is correct.
| buffered_file_reader.clear_checkpoint(); | ||
| REQUIRE(reader_begin_offset == buffered_file_reader.get_pos()); | ||
|
|
||
| // Validate that a FileReader and a BufferedFileReader return the same strings (split by | ||
| // delimiters) | ||
| clp::ErrorCode error_code{ErrorCode_Success}; | ||
| auto delimiter = (char)('a' + (std::rand() % (cNumAlphabets))); | ||
| ErrorCode error_code{ErrorCode_Success}; | ||
| auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen)); |
There was a problem hiding this comment.
🧹 Nitpick
Comment/name consistency: refer to BufferedReader.
- // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset
+ // Validate that clearing a checkpoint without any reading doesn't change the beginning offset
@@
- // Validate that a FileReader and a BufferedFileReader return the same strings (split by
+ // Validate that a FileReader and a BufferedReader return the same strings (split by📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| buffered_file_reader.clear_checkpoint(); | |
| REQUIRE(reader_begin_offset == buffered_file_reader.get_pos()); | |
| // Validate that a FileReader and a BufferedFileReader return the same strings (split by | |
| // delimiters) | |
| clp::ErrorCode error_code{ErrorCode_Success}; | |
| auto delimiter = (char)('a' + (std::rand() % (cNumAlphabets))); | |
| ErrorCode error_code{ErrorCode_Success}; | |
| auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen)); | |
| // Validate that clearing a checkpoint without any reading doesn't change the beginning offset | |
| buffered_file_reader.clear_checkpoint(); | |
| REQUIRE(reader_begin_offset == buffered_file_reader.get_pos()); | |
| // Validate that a FileReader and a BufferedReader return the same strings (split by | |
| // delimiters) | |
| ErrorCode error_code{ErrorCode_Success}; | |
| auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen)); |
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedReader.cpp around lines 301 to 307, the
test uses wording/identifiers referencing "BufferedFileReader" while the
file/class is BufferedReader; update the comment and any variable identifiers
for consistency: change the comment text to mention BufferedReader and rename
variables (and related setup/usage) from buffered_file_reader to buffered_reader
(and reader_begin_offset accordingly) or vice versa so names match the actual
class under test, and update any includes or factory calls if necessary to use
the correct BufferedReader symbol.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
components/core/tests/test-BufferedReader.cpp (1)
61-63: Comment naming: refer to BufferedReader.- // Read a large chunk of data, so BufferedFileReader will refill the + // Read a large chunk of data, so BufferedReader will refill thecomponents/core/src/clp/clp/FileCompressor.hpp (3)
34-49: Doc missing for new parameteruse_heuristic.Please add
@param use_heuristicto thecompress_filecomment for completeness.
109-117: Add param detail forfile_readerin archive path.State that
file_readeris positioned at the start of the container and will be read as the archive source.* @param archive_writer - * @param file_reader + * @param file_reader Reader positioned at the start of the container; consumed but not closed. * @param use_heuristic
4-16: Avoid relying on transitive includes; includeReaderInterface.hppexplicitly.This header declares functions taking
ReaderInterface&but does not include it directly. Add the include (or forward-declare and qualify as needed) to reduce coupling onLibarchiveReader.hpp.#include "../ir/LogEventDeserializer.hpp" #include "../LibarchiveFileReader.hpp" #include "../LibarchiveReader.hpp" #include "../MessageParser.hpp" #include "../ParsedMessage.hpp" #include "../streaming_archive/writer/Archive.hpp" #include "FileToCompress.hpp" +#include "../ReaderInterface.hpp"components/core/src/clp/clp/FileCompressor.cpp (1)
321-327: Consider loggingerror_codefromtry_openfor diagnosability.Including the specific
ErrorCodecan ease triage.- if (ErrorCode_Success != error_code) { - SPDLOG_ERROR( - "Cannot compress {} - failed to open with libarchive.", - file_to_compress.get_path().c_str() - ); + if (ErrorCode_Success != error_code) { + SPDLOG_ERROR( + "Cannot compress {} - failed to open with libarchive, error={}", + file_to_compress.get_path().c_str(), + error_code + );
♻️ Duplicate comments (11)
components/core/tests/test-BufferedReader.cpp (10)
1-2: Add missing standard headers for algorithms, C string ops, and smart pointers.Required for std::max, memcmp, and std::make_unique/std::make_shared.
Apply:
#include <array> -#include <random> +#include <random> +#include <algorithm> // std::max +#include <cstring> // std::memcmp +#include <memory> // std::make_unique, std::make_shared
26-26: Rename test tag to match the new class.-TEST_CASE("Test reading data", "[BufferedFileReader]") { +TEST_CASE("Test reading data", "[BufferedReader]") {
28-33: std::array size must be compile‑time constant. Make test_data_size constexpr (and use unsigned literal).- size_t const test_data_size = 4L * 1024 * 1024 + 1; // 4MB + 1 + constexpr size_t test_data_size = 4ULL * 1024 * 1024 + 1; // 4 MiB + 1
35-35: Align temporary filename with the new class name (clarity).- string const test_file_path{"BufferedFileReader.test"}; + string const test_file_path{"BufferedReader.test"};
44-46: Construct BufferedReader with a shared_ptr explicitly (matches API and avoids subtle conversions).- size_t const base_buffer_size = BufferedReader::cMinBufferSize << 4; - BufferedReader reader{make_unique<FileDescriptorReader>(test_file_path), base_buffer_size}; + size_t const base_buffer_size = BufferedReader::cMinBufferSize << 4; + BufferedReader reader{std::make_shared<FileDescriptorReader>(test_file_path), base_buffer_size};
266-266: Rename second test tag to “[BufferedReader]”.-TEST_CASE("Test delimiter", "[BufferedFileReader]") { +TEST_CASE("Test delimiter", "[BufferedReader]") {
273-279: Make size constexpr here as well (same compile‑time constant issue).- size_t const test_data_size = 1L * 1024 * 1024 + 1; // 1MB + constexpr size_t test_data_size = 1ULL * 1024 * 1024 + 1; // 1 MiB + 1
281-283: Rename delimiter test filename for consistency.- string const test_file_path{"BufferedFileReader.delimiter.test"}; + string const test_file_path{"BufferedReader.delimiter.test"};
288-299: Use shared_ptr for reader construction, and align variable names/comments with BufferedReader.- // Instantiate BufferedFileReader and the reference FileReader from a non-zero pos - auto fd_reader = make_unique<FileDescriptorReader>(test_file_path); - fd_reader->seek_from_begin(reader_begin_offset); - BufferedReader buffered_file_reader{std::move(fd_reader)}; - string test_string; + // Instantiate BufferedReader and the reference FileReader from a non-zero position + auto fd_reader = std::make_shared<FileDescriptorReader>(test_file_path); + fd_reader->seek_from_begin(reader_begin_offset); + BufferedReader buffered_reader{fd_reader}; + string test_string; @@ - FileReader ref_file_reader{test_file_path}; + FileReader ref_file_reader{test_file_path};
300-307: Grammar/name consistency and follow‑through on rename.- // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset - buffered_file_reader.clear_checkpoint(); - REQUIRE(reader_begin_offset == buffered_file_reader.get_pos()); + // Validate that clearing a checkpoint without any reading doesn't change the beginning offset + buffered_reader.clear_checkpoint(); + REQUIRE(reader_begin_offset == buffered_reader.get_pos()); @@ - // Validate that a FileReader and a BufferedFileReader return the same strings (split by + // Validate that a FileReader and a BufferedReader return the same strings (split by @@ - auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen)); + auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen));Also update subsequent uses:
- auto const error_code2 - = buffered_file_reader.try_read_to_delimiter(delimiter, true, false, test_string); + auto const error_code2 + = buffered_reader.try_read_to_delimiter(delimiter, true, false, test_string);components/core/src/clp/clp/FileCompressor.cpp (1)
125-126:std::filesystem::canonicalmay throw; tracked in Issue #553.No action in this PR per prior discussion, just flagging that this still may throw on missing/permission errors.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp/clp/FileCompressor.cpp(7 hunks)components/core/src/clp/clp/FileCompressor.hpp(3 hunks)components/core/tests/test-BufferedReader.cpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/tests/test-BufferedReader.cppcomponents/core/src/clp/clp/FileCompressor.cppcomponents/core/src/clp/clp/FileCompressor.hpp
🧠 Learnings (12)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/tests/test-BufferedReader.cppcomponents/core/src/clp/clp/FileCompressor.cppcomponents/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/tests/test-BufferedReader.cppcomponents/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/tests/test-BufferedReader.cpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Applied to files:
components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/src/clp/clp/FileCompressor.cppcomponents/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2024-10-24T14:25:17.978Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Applied to files:
components/core/src/clp/clp/FileCompressor.hpp
🧬 Code graph analysis (3)
components/core/tests/test-BufferedReader.cpp (2)
components/core/src/clp/BufferedReader.hpp (6)
BufferedReader(72-75)BufferedReader(77-78)BufferedReader(77-77)BufferedReader(81-81)BufferedReader(84-84)BufferedReader(85-85)components/core/src/clp/BufferedReader.cpp (1)
BufferedReader(20-43)
components/core/src/clp/clp/FileCompressor.cpp (1)
components/core/src/clp/LibarchiveReader.hpp (1)
reader(47-47)
components/core/src/clp/clp/FileCompressor.hpp (4)
components/core/src/clp/streaming_compression/zstd/Decompressor.hpp (1)
reader(83-83)components/core/src/clp/streaming_compression/Decompressor.hpp (1)
reader(54-54)components/core/src/clp/streaming_compression/passthrough/Decompressor.hpp (1)
reader(77-77)components/core/src/clp/LibarchiveReader.hpp (1)
reader(47-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: package-image
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/core/src/clp/clp/FileCompressor.cpp (4)
157-166: Exception propagation bypasses profiler stop; confirm intended behaviour.If
parse_and_encodethrows,stop_continuous_measurementwon’t run. If that’s undesired, consider a small scope guard or catching here and settingsucceeded = false.- if (is_utf8_encoded({utf8_validation_buf, utf8_validation_buf_len})) { - parse_and_encode( ... ); - } else { + if (is_utf8_encoded({utf8_validation_buf, utf8_validation_buf_len})) { + try { + parse_and_encode( ... ); + } catch (std::exception const& e) { + SPDLOG_ERROR("Failed to parse/encode {}: {}", file_to_compress.get_path(), e.what()); + succeeded = false; + } + } else {
190-221: Wrapper looks good; consistent routing byuse_heuristic.The dispatcher cleanly centralizes the path choice. No concerns.
395-403: LGTM: archive member path uses the sameparse_and_encodewrapper.Consistent with the non-archive path; good reuse.
5-8: No unused<iostream>orunique_ptr>includes/usings to remove
<iostream>and itsstd::cout/std::endlusages appear later in the file, andstd::unique_ptr/std::make_uniqueare in use, so these includes and using-declarations must remain.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
components/core/src/clp/BufferedReader.hpp (2)
28-30: Doc fix: don't claim "page size" when enforcing a fixed 4 KiB multiple.The comment states "multiple of the page size" but the implementation enforces a fixed 4 KiB multiple (cMinBufferSize).
Apply this diff to fix the documentation:
- * NOTE: This class restricts the buffer size to a multiple of the page size and we avoid reading - * anything less than a page to avoid multiple page faults. + * NOTE: This class restricts the buffer size to a multiple of cMinBufferSize (4 KiB) and avoids + * reading less than that to reduce page faults.
86-87: Use canonical signatures for deleted assignment operators.Using
operator=(BufferedReader const&)instead of the typical by-value parameter for copy assignment is preferable for clarity.The current code already uses the canonical signature
operator=(BufferedReader const&)for the copy assignment operator, which is good. The move assignment operator on line 87 is also correctly deleted. No changes needed here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/BufferedReader.hpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/BufferedReader.hpp
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/BufferedReader.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/src/clp/BufferedReader.hpp
🧬 Code graph analysis (1)
components/core/src/clp/BufferedReader.hpp (1)
components/core/src/clp/BufferedReader.cpp (1)
BufferedReader(20-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: package-image
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (3)
components/core/src/clp/BufferedReader.hpp (3)
95-95: Well-designed error handling delegation.Good practice forwarding
refill_reader_buffer's return values for consistent error propagation.
127-127: Clear and accurate documentation.The method documentation clearly states it returns the position in the source reader, which aligns with the implementation.
156-157: Documentation correctly referencesBufferReader::try_read.Based on the retrieved learnings, the documentation appropriately references
BufferReader::try_readsinceBufferedReader::try_readexplicitly calls it.
| [[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t { | ||
| return in_src_pos - m_buffer_begin_in_src_pos; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick
Ensure overflow safety in position calculation.
The subtraction in_src_pos - m_buffer_begin_in_src_pos could underflow if in_src_pos is less than m_buffer_begin_in_src_pos.
Consider adding a debug assertion to catch programming errors:
[[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t {
+ // Ensure the position is within the buffered range
+ assert(in_src_pos >= m_buffer_begin_in_src_pos);
return in_src_pos - m_buffer_begin_in_src_pos;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| [[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t { | |
| return in_src_pos - m_buffer_begin_in_src_pos; | |
| } | |
| [[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t { | |
| // Ensure the position is within the buffered range | |
| assert(in_src_pos >= m_buffer_begin_in_src_pos); | |
| return in_src_pos - m_buffer_begin_in_src_pos; | |
| } |
🤖 Prompt for AI Agents
components/core/src/clp/BufferedReader.hpp around lines 195-197: the subtraction
return in_src_pos - m_buffer_begin_in_src_pos can underflow if in_src_pos <
m_buffer_begin_in_src_pos; add a debug-time assertion (e.g.,
assert/instrumentation/Expects) that in_src_pos >= m_buffer_begin_in_src_pos to
catch programming errors, then perform the subtraction only after the check;
also consider a clear fail-fast behavior for release builds (throw
std::out_of_range or clamp) if you want to avoid silent underflow in production.
BufferedFileReader wrap a ReaderInterface instead of a FileReader.BufferedFileReader wrap a ReaderInterface instead of a FileReader; rename BufferedFileReader to BufferedReader.
gibber9809
left a comment
There was a problem hiding this comment.
After the recent edits these changes LGTM. Directly edited the PR title as well.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
components/core/src/clp/clp/CMakeLists.txt (1)
4-5: BufferedReader source swap looks good; verify no stale BufferedFileReader refs remain.Change is correct. Please re-check repo for lingering BufferedFileReader sources/includes/tests to avoid duplicate/unused files.
#!/bin/bash # Find any remaining BufferedFileReader references (code, CMake, tests, lint) rg -n --hidden --no-ignore -S '\bBufferedFileReader\b|test-BufferedFileReader\.cpp' || true fd -HI 'BufferedFileReader.*' || truecomponents/core/CMakeLists.txt (1)
444-446: Unit test/source rename to BufferedReader is aligned; ensure legacy refs are removed project‑wide.Looks consistent with the refactor. Please confirm no leftover BufferedFileReader files/references (incl. taskfiles/lint.yaml, glt).
#!/bin/bash # Repo-wide verification for stale BufferedFileReader artifacts rg -n --hidden --no-ignore -S '\bBufferedFileReader\b|test-BufferedFileReader\.cpp' || true fd -HI 'BufferedFileReader.*|test-BufferedFileReader.cpp' || trueAlso applies to: 692-692
components/core/src/clp/BufferedReader.hpp (2)
28-29: Doc fix: “page size” → cMinBufferSize (4 KiB).Update to reflect the actual constraint.
- * NOTE: This class restricts the buffer size to a multiple of the page size and we avoid reading - * anything less than a page to avoid multiple page faults. + * NOTE: This class restricts the buffer size to a multiple of cMinBufferSize (4 KiB) and avoids + * reading less than that to reduce page faults.
195-197: Guard against underflow in position math.Add a debug assertion before subtraction.
[[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t { + assert(in_src_pos >= m_buffer_begin_in_src_pos); return in_src_pos - m_buffer_begin_in_src_pos; }Also add the header at the top of this file:
#include <cassert>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/CMakeLists.txt(2 hunks)components/core/src/clp/BufferedReader.hpp(1 hunks)components/core/src/clp/clp/CMakeLists.txt(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/BufferedReader.hpp
🧠 Learnings (5)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txtcomponents/core/src/clp/BufferedReader.hpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/CMakeLists.txt
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
components/core/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txt
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Applied to files:
components/core/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txtcomponents/core/src/clp/BufferedReader.hpp
🧬 Code graph analysis (1)
components/core/src/clp/BufferedReader.hpp (1)
components/core/src/clp/BufferedReader.cpp (1)
BufferedReader(20-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
🔇 Additional comments (1)
components/core/src/clp/BufferedReader.hpp (1)
84-88: Good: canonical deletion of copy/move assignment operators.Signatures are explicit and standard.
BufferedFileReader wrap a ReaderInterface instead of a FileReader; rename BufferedFileReader to BufferedReader.BufferedFileReader wrap a ReaderInterface instead of a FileReader; Rename BufferedFileReader to BufferedReader.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Updated title directly. Otherwise lgtm.
Description
The BufferedFileReader was designed with the assumption that it always reads from the local file system. However, supporting compressing from a different input source, (s3 for example) would break this assumption. In this PR, we refactor the BufferedFileReader so it can takes in any abitrary reader interface as its data source. A more detailed list of changes:
Note: this PR makes some modifications to the unit-test, but we decide to not update the unit-test to match the latest coding standard. In fact, the unit-test is very poorly written and would make more sense to refactor it in a separate PR.
Validation performed
1.Made sure exisitng Unit test passes.
2.Added a new unit test to cover the case where the BufferedFileReader takes in an reader interface with non-zero pos.
3. Compressed and decompressed a single file, verified the decompressed file matches original
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
Build