Skip to content

refactor(core): Make BufferedFileReader wrap a ReaderInterface instead of a FileReader; Rename BufferedFileReader to BufferedReader.#523

Merged
LinZhihao-723 merged 84 commits into
y-scope:mainfrom
haiqi96:BufferedReaderRefactor
Sep 18, 2025
Merged

Conversation

@haiqi96

@haiqi96 haiqi96 commented Aug 22, 2024

Copy link
Copy Markdown
Contributor

Description

The BufferedFileReader was designed with the assumption that it always reads from the local file system. However, supporting compressing from a different input source, (s3 for example) would break this assumption. In this PR, we refactor the BufferedFileReader so it can takes in any abitrary reader interface as its data source. A more detailed list of changes:

  1. Refactored BufferedFileReader to use RAII model.
  2. Remove the internal FileDescriptor and C-style read/write from BufferedFileReader
  3. Update the instantiation of all BufferedFileReader instance to use a FileDescriptorReader (instead of FileReader) to avoid double buffering.
  4. Rewrite some BufferedFileReader apis using modern C++ features.

Note: this PR makes some modifications to the unit-test, but we decide to not update the unit-test to match the latest coding standard. In fact, the unit-test is very poorly written and would make more sense to refactor it in a separate PR.

Validation performed

1.Made sure exisitng Unit test passes.
2.Added a new unit test to cover the case where the BufferedFileReader takes in an reader interface with non-zero pos.
3. Compressed and decompressed a single file, verified the decompressed file matches original

Summary by CodeRabbit

  • New Features

    • Added a new buffered reader backed by a pluggable ReaderInterface with checkpointing, delimiter-aware reads and a unified parse-and-encode entry supporting an optional heuristic.
  • Refactor

    • Replaced descriptor/file-based I/O with ReaderInterface-backed buffering; public open/close file APIs removed; compression APIs now accept external reader sources.
  • Bug Fixes

    • Improved BufferReader constructor validation and safer peek semantics.
  • Tests

    • Unit tests updated to exercise the new buffered reader and file-descriptor reader.
  • Build

    • Build inputs updated to use the new buffered reader sources.

@gibber9809 gibber9809 requested a review from a team as a code owner September 16, 2025 18:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
components/core/src/clp/BufferReader.hpp (1)

36-39: Document nullptr on empty peek

peek_buffer now sets buf=nullptr when peek_size==0. Please note this in the header to avoid caller assumptions.

Apply this doc tweak:

- * @param buf Returns a pointer to the remaining content in the buffer
+ * @param buf Returns a pointer to the remaining content in the buffer (nullptr if none)
components/core/src/clp/BufferReader.cpp (2)

43-49: Avoid signed/unsigned mix; compute bytes read relative to buffer_head

Using ptrdiff_t then subtracting a size_t risks warnings and confusion. Compute directly from buffer_head.

Apply:

-        auto const delim_pos{delim_ptr - m_internal_buf};
-        num_bytes_read = (delim_pos - m_internal_buf_pos) + 1;
+        num_bytes_read = static_cast<size_t>(delim_ptr - buffer_head) + 1;

60-71: Set num_bytes_read on zero-length reads

Early-return path for num_bytes_to_read==0 should set num_bytes_read=0 for consistency.

Apply:

-    if (0 == remaining_data_size) {
-        if (0 == num_bytes_to_read) {
-            num_bytes_read = 0;
-            return ErrorCode_Success;
-        }
+    if (0 == remaining_data_size) {
+        if (0 == num_bytes_to_read) {
+            num_bytes_read = 0;
+            return ErrorCode_Success;
+        }

(Already correct here; ensure same convention in BufferedFileReader::try_read.)

components/core/src/clp/BufferedFileReader.cpp (1)

206-223: Critical: over-reading when checkpoint set inflates buffer massively

num_bytes_to_read should be the delta to the next boundary, not the absolute rounded position. Current code can request gigabytes.

Apply:

-        num_bytes_to_read = int_round_up_to_multiple(
-                buffer_end_pos + num_bytes_to_refill,
-                m_base_buffer_size
-        );
+        num_bytes_to_read = int_round_up_to_multiple(
+                buffer_end_pos + num_bytes_to_refill,
+                m_base_buffer_size
+        ) - buffer_end_pos;
components/core/src/clp/BufferedFileReader.hpp (1)

92-93: Fix copy-assignment signature

The deleted copy-assignment should take const&, not by value.

Apply:

-    auto operator=(BufferedFileReader) -> BufferedFileReader& = delete;
+    auto operator=(BufferedFileReader const&) -> BufferedFileReader& = delete;
♻️ Duplicate comments (3)
components/core/src/clp/BufferReader.cpp (1)

11-15: Bounds check on pos is correct (matches prior feedback)

The pos > data_size guard prevents invalid construction. Good catch and resolution.

components/core/src/clp/BufferedFileReader.cpp (1)

159-163: EOF only when zero bytes read — correct

Matches intended semantics discussed earlier. Keep as-is.

components/core/src/clp/clp/FileCompressor.cpp (1)

124-126: canonical() may throw; tracked separately

Exception handling around std::filesystem::canonical was discussed and is tracked; fine to defer.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d843ec and 95eae6b.

📒 Files selected for processing (5)
  • components/core/src/clp/BufferReader.cpp (1 hunks)
  • components/core/src/clp/BufferReader.hpp (1 hunks)
  • components/core/src/clp/BufferedFileReader.cpp (7 hunks)
  • components/core/src/clp/BufferedFileReader.hpp (8 hunks)
  • components/core/src/clp/clp/FileCompressor.cpp (7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/BufferReader.cpp
  • components/core/src/clp/BufferReader.hpp
  • components/core/src/clp/clp/FileCompressor.cpp
  • components/core/src/clp/BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
🧠 Learnings (11)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/BufferReader.cpp
  • components/core/src/clp/BufferReader.hpp
  • components/core/src/clp/clp/FileCompressor.cpp
  • components/core/src/clp/BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
📚 Learning: 2024-12-10T16:56:33.545Z
Learnt from: gibber9809
PR: y-scope/clp#624
File: components/core/src/clp/BoundedReader.hpp:25-27
Timestamp: 2024-12-10T16:56:33.545Z
Learning: In the CLP project, within constructors like `BoundedReader` in `components/core/src/clp/BoundedReader.hpp`, it's acceptable to call methods like `get_pos()` that may throw exceptions without handling potential errors via `try_get_pos()`, since the constructor itself handles errors by throwing.

Applied to files:

  • components/core/src/clp/BufferReader.cpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/src/clp/BufferReader.cpp
  • components/core/src/clp/BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
🧬 Code graph analysis (4)
components/core/src/clp/BufferReader.cpp (1)
components/core/src/clp/BufferReader.hpp (8)
  • BufferReader (26-26)
  • BufferReader (28-28)
  • BufferReader (30-30)
  • pos (77-77)
  • pos (83-83)
  • OperationFailed (16-17)
  • buf (39-39)
  • buf (68-69)
components/core/src/clp/clp/FileCompressor.cpp (2)
components/core/src/clp/clp/FileCompressor.hpp (6)
  • target_data_size_of_dicts (43-50)
  • target_data_size_of_dicts (68-77)
  • target_data_size_of_dicts (79-87)
  • target_data_size_of_dicts (89-97)
  • target_data_size_of_dicts (110-118)
  • target_data_size_of_dicts (131-139)
components/core/src/clp/LibarchiveReader.hpp (1)
  • reader (47-47)
components/core/src/clp/BufferedFileReader.cpp (5)
components/core/src/clp/BufferedFileReader.hpp (13)
  • BufferedFileReader (78-81)
  • BufferedFileReader (83-84)
  • BufferedFileReader (83-83)
  • BufferedFileReader (87-87)
  • BufferedFileReader (90-90)
  • BufferedFileReader (91-91)
  • OperationFailed (43-49)
  • OperationFailed (51-58)
  • buf (110-110)
  • buf (161-162)
  • pos (133-133)
  • pos (149-149)
  • pos (208-208)
components/core/src/clp/BufferReader.hpp (5)
  • OperationFailed (16-17)
  • buf (39-39)
  • buf (68-69)
  • pos (77-77)
  • pos (83-83)
components/core/src/clp/StringReader.cpp (4)
  • try_seek_from_begin (41-48)
  • try_seek_from_begin (41-41)
  • try_read (18-39)
  • try_read (18-18)
components/core/src/clp/BufferReader.cpp (4)
  • try_seek_from_begin (83-89)
  • try_seek_from_begin (83-83)
  • try_read (60-81)
  • try_read (60-61)
components/core/src/clp/FileReader.cpp (4)
  • try_seek_from_begin (53-60)
  • try_seek_from_begin (53-53)
  • try_read (34-51)
  • try_read (34-34)
components/core/src/clp/BufferedFileReader.hpp (2)
components/core/src/clp/BufferedFileReader.cpp (1)
  • BufferedFileReader (21-44)
components/core/src/clp/BufferReader.hpp (5)
  • nodiscard (20-22)
  • nodiscard (33-33)
  • nodiscard (99-101)
  • pos (77-77)
  • pos (83-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: antlr-code-committed (macos-15)
🔇 Additional comments (11)
components/core/src/clp/BufferReader.hpp (1)

26-27: Default constructor addition looks good

Enables safe default-initialization for embedders (e.g., BufferedFileReader). No issues.

components/core/src/clp/BufferReader.cpp (1)

21-22: Confirm callers handle nullptr from peek_buffer

Behaviour changed to return nullptr when empty. Ensure all callers (new and existing) check peek_size before dereferencing buf.

components/core/src/clp/BufferedFileReader.cpp (2)

26-33: Constructor validation LGTM

Null-check and error messaging are appropriate; base_buffer_size multiple-of-page guard is good.


242-245: Reinitializing BufferReader on EOF is correct

Ensures peek returns nullptr when empty; consistent with BufferReader change.

components/core/src/clp/clp/FileCompressor.cpp (3)

129-130: BufferedFileReader adoption LGTM

Construction via make_unique aligns with the new ReaderInterface path.


132-149: Error logging paths OK

Differentiating errno vs generic reader errors is reasonable here. No change requested.


189-220: Wrapper parse_and_encode reads well

Clear split between heuristic and library paths; good API.

components/core/src/clp/BufferedFileReader.hpp (4)

72-85: Ctor overloads and null-check (in .cpp) align with design

API shape is good; default-size delegating ctor is convenient.


95-101: Doc: return references to internal functions

“Same as refill_reader_buffer” is fine here; consistent with implementation.


145-147: Doc correctness

Referencing both BufferReader::try_seek_from_begin and ReaderInterface::try_seek_from_begin is appropriate given the code path.


204-209: get_buffer_end_pos and update_pos APIs are clear

Good encapsulation of position math.

Comment thread components/core/src/clp/BufferedFileReader.cpp Outdated
Comment thread components/core/src/clp/BufferedFileReader.cpp Outdated
@gibber9809 gibber9809 changed the title clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. refactor(core): Make BufferedFileReader wrap a ReaderInterface instead of a FileReader. Sep 16, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/core/tests/test-BufferedFileReader.cpp (1)

80-84: Assert zero bytes read at EOF

When try_read returns ErrorCode_EndOfFile, num_bytes_read should be 0. Tighten the assertion to catch regressions.

         REQUIRE(ErrorCode_EndOfFile
                 == reader.try_read(read_buf.begin() + buf_pos, num_bytes_to_read, num_bytes_read));
+        REQUIRE(0 == num_bytes_read);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95eae6b and 1e7d6fa.

📒 Files selected for processing (1)
  • components/core/tests/test-BufferedFileReader.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/test-BufferedFileReader.cpp
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/tests/test-BufferedFileReader.cpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/tests/test-BufferedFileReader.cpp
🧬 Code graph analysis (1)
components/core/tests/test-BufferedFileReader.cpp (4)
components/core/src/clp/BufferedFileReader.hpp (6)
  • BufferedFileReader (78-81)
  • BufferedFileReader (83-84)
  • BufferedFileReader (83-83)
  • BufferedFileReader (87-87)
  • BufferedFileReader (90-90)
  • BufferedFileReader (91-91)
components/core/src/clp/BufferedFileReader.cpp (1)
  • BufferedFileReader (21-44)
components/core/src/clp/FileReader.hpp (2)
  • FileReader (28-28)
  • FileReader (30-30)
components/core/src/clp/FileReader.cpp (2)
  • FileReader (15-23)
  • FileReader (25-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (2)
components/core/tests/test-BufferedFileReader.cpp (2)

45-45: Good adoption of ReaderInterface in tests

Constructing BufferedFileReader with make_unique<FileDescriptorReader>(...) matches the refactor intent and prevents double-buffering.


291-299: LGTM: Non-zero initial position coverage

Seeking the FileDescriptorReader before handing it to BufferedFileReader provides solid coverage for non-zero starting offsets; the mirrored FileReader reference is a good oracle.

Please confirm this path also runs under TSAN/ASAN configurations in CI to catch any hidden UB in the reader stack.

Comment on lines 1 to 10
#include <array>
#include <random>

#include <boost/filesystem.hpp>
#include <catch2/catch_test_macros.hpp>
#include <catch2/generators/catch_generators.hpp>

#include "../src/clp/BufferedFileReader.hpp"
#include "../src/clp/FileDescriptorReader.hpp"
#include "../src/clp/FileReader.hpp"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Add missing headers for std::max and memcmp

std::max (used on Line 158) and memcmp (used multiple times) require <algorithm> and <cstring> respectively; avoid relying on transitive includes.

 #include <array>
+#include <algorithm>
 #include <random>
+#include <cstring>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <array>
#include <random>
#include <boost/filesystem.hpp>
#include <catch2/catch_test_macros.hpp>
#include <catch2/generators/catch_generators.hpp>
#include "../src/clp/BufferedFileReader.hpp"
#include "../src/clp/FileDescriptorReader.hpp"
#include "../src/clp/FileReader.hpp"
#include <array>
#include <algorithm>
#include <random>
#include <cstring>
#include <boost/filesystem.hpp>
#include <catch2/catch_test_macros.hpp>
#include <catch2/generators/catch_generators.hpp>
#include "../src/clp/BufferedFileReader.hpp"
#include "../src/clp/FileDescriptorReader.hpp"
#include "../src/clp/FileReader.hpp"
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp lines 1 to 10, the test
uses std::max and memcmp but is missing the required headers; add #include
<algorithm> for std::max and #include <cstring> for memcmp at the top of the
file (with the other standard headers) to avoid relying on transitive includes.

Comment on lines +29 to 31
auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>();
auto& test_data = *test_data_uniq_ptr;
for (size_t i = 0; i < test_data.size(); ++i) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Consider std::vector over heap-allocated std::array

unique_ptr<std::array<...>> adds template-size coupling and indirection. std::vector<char> is simpler and dynamic while keeping contiguous storage.

Example for the first test block:

-    auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>();
-    auto& test_data = *test_data_uniq_ptr;
+    std::vector<char> test_data(test_data_size);
-    auto read_buf_uniq_ptr = make_unique<std::array<char, test_data_size>>();
-    auto& read_buf = *read_buf_uniq_ptr;
+    std::vector<char> read_buf(test_data_size);

Similarly for the delimiter test’s test_data.

Also applies to: 42-44, 274-276

🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 29-31 (and
similarly at 42-44 and 274-276), replace the heap-allocated
unique_ptr<std::array<char, test_data_size>> pattern with a plain
std::vector<char> to avoid template-size coupling and extra indirection: create
a std::vector<char> test_data; resize or reserve it to test_data_size, populate
via test_data[i] or std::fill, and use test_data.data() where a contiguous
pointer is needed; apply the same change to the delimiter test and the other
referenced blocks, adjusting types and accesses accordingly.

Comment on lines 35 to 36
string const test_file_path{"BufferedFileReader.test"};
// Write to test file

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use unique temp paths to avoid test flakiness and collisions in parallel runs

Hardcoded filenames can collide across parallel CI shards or stale artefacts. Prefer boost::filesystem::temp_directory_path() + unique_path().

-    string const test_file_path{"BufferedFileReader.test"};
+    auto const tmp_dir = boost::filesystem::temp_directory_path();
+    string const test_file_path = (tmp_dir / boost::filesystem::unique_path("BufferedFileReader-%%%%%%%%.test")).string();
-    string const test_file_path{"BufferedFileReader.delimiter.test"};
+    auto const tmp_dir = boost::filesystem::temp_directory_path();
+    string const test_file_path = (tmp_dir / boost::filesystem::unique_path("BufferedFileReaderDelimiter-%%%%%%%%.test")).string();

Also applies to: 282-283

🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 35-36 (and
also update the occurrence at lines 282-283), the test uses a hardcoded filename
"BufferedFileReader.test" which can collide in parallel CI or with stale
artifacts; replace these hardcoded paths by constructing unique temp file paths
using boost::filesystem::temp_directory_path() /
boost::filesystem::unique_path() (or equivalent APIs), create/write the file at
that unique path for the test, and ensure cleanup after the test to avoid
residue.

Comment on lines +267 to +271
// Initialize random number generator
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> uniformly_distributed_alphabet('a', 'a' + cNumAlphabets - 1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Seed RNG deterministically for reproducible tests

Using std::random_device makes failures non-reproducible. Use a fixed seed.

-    std::random_device rd;
-    std::mt19937 gen(rd());
+    // Fixed seed for reproducibility in CI
+    std::mt19937 gen(0xC0FFEEu);
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 267 to 271,
the RNG is seeded with std::random_device which makes test runs
non-deterministic; replace the nondeterministic seeding with a fixed
deterministic seed (e.g. use a constant seed value or configurable deterministic
seed) when constructing std::mt19937 so the uniform distribution produces
reproducible sequences across test runs.

Comment thread components/core/tests/test-BufferedFileReader.cpp

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/core/src/clp/BufferedFileReader.hpp (1)

90-93: Fix deleted copy-assignment signature

Deleting operator=(BufferedFileReader) (by value) is non-idiomatic and redundant with the rvalue overload. Delete the const& overload instead.

-    auto operator=(BufferedFileReader) -> BufferedFileReader& = delete;
-    auto operator=(BufferedFileReader&&) -> BufferedFileReader& = delete;
+    auto operator=(BufferedFileReader const&) -> BufferedFileReader& = delete;
+    auto operator=(BufferedFileReader&&) -> BufferedFileReader& = delete;
components/core/src/clp/BufferedFileReader.cpp (1)

214-223: Critical: over‑read when checkpoint is set (absolute vs delta math)

num_bytes_to_read is computed as an absolute end position, then used as a byte count for read/resize. This can trigger huge reads/resizes.

-        num_bytes_to_read = int_round_up_to_multiple(
-                buffer_end_pos + num_bytes_to_refill,
-                m_base_buffer_size
-        );
-        // Grow the buffer if necessary
-        if (num_bytes_to_read > available_buffer_space) {
-            m_buffer.resize(data_size + num_bytes_to_read);
-        }
+        auto const desired_end_pos = int_round_up_to_multiple(
+                buffer_end_pos + num_bytes_to_refill,
+                m_base_buffer_size
+        );
+        num_bytes_to_read = desired_end_pos - buffer_end_pos;
+        // Grow the buffer if necessary
+        if (num_bytes_to_read > available_buffer_space) {
+            m_buffer.resize(data_size + num_bytes_to_read);
+        }
         next_buffer_pos = data_size;
♻️ Duplicate comments (5)
components/core/src/clp/BufferedFileReader.hpp (2)

158-159: Doc correctly references BufferReader::try_read

Matches the implementation detail noted earlier; keep as-is.


74-81: Document constructor params and null-check behaviour

Docstring is missing details for reader_interface and the null-check enforced in the .cpp.

- * @param reader_interface
+ * @param reader_interface Shared ownership of the underlying input source; must be non-null.
+ *                         Throws OperationFailed(ErrorCode_BadParam) if null.
components/core/src/clp/BufferedFileReader.cpp (3)

57-59: Fix invalid comparison: std::optional<size_t> vs size_t

This is ill‑formed and won’t compile. Compare the contained value.

-    if (m_checkpoint_pos.has_value() && m_checkpoint_pos < m_pos
+    if (m_checkpoint_pos.has_value() && m_checkpoint_pos.value() < m_pos
         && m_buffer_reader.get_buffer_size() != m_base_buffer_size)

39-43: Non‑zero start bug: initialise buffer begin to reader position

If the underlying reader starts at a non‑zero position, the buffer’s absolute origin must match it. Otherwise, absolute/relative math (e.g., get_buffer_end_pos, seeks, refills) is inconsistent.

Apply:

-    m_pos = m_reader->get_pos();
-    m_buffer_begin_pos = 0;
+    m_pos = m_reader->get_pos();
+    m_buffer_begin_pos = m_pos;
     m_buffer_reader = BufferReader{m_buffer.data(), 0};
     m_highest_read_pos = m_pos;

130-135: Zero‑length read semantics and nullptr handling

For num_bytes_to_read==0, set num_bytes_read=0 and succeed regardless of buf. Current order returns BadParam for nullptr + 0.

-    if (nullptr == buf) {
-        return ErrorCode_BadParam;
-    }
-    if (num_bytes_to_read == 0) {
-        return ErrorCode_Success;
-    }
+    if (0 == num_bytes_to_read) {
+        num_bytes_read = 0;
+        return ErrorCode_Success;
+    }
+    if (nullptr == buf) {
+        return ErrorCode_BadParam;
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e7d6fa and 1bf6cff.

📒 Files selected for processing (2)
  • components/core/src/clp/BufferedFileReader.cpp (7 hunks)
  • components/core/src/clp/BufferedFileReader.hpp (8 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/src/clp/BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
🧬 Code graph analysis (2)
components/core/src/clp/BufferedFileReader.cpp (6)
components/core/src/clp/BufferedFileReader.hpp (13)
  • BufferedFileReader (78-81)
  • BufferedFileReader (83-84)
  • BufferedFileReader (83-83)
  • BufferedFileReader (87-87)
  • BufferedFileReader (90-90)
  • BufferedFileReader (91-91)
  • OperationFailed (43-49)
  • OperationFailed (51-58)
  • buf (110-110)
  • buf (161-162)
  • pos (133-133)
  • pos (149-149)
  • pos (208-208)
components/core/src/clp/BufferReader.hpp (5)
  • OperationFailed (16-17)
  • buf (39-39)
  • buf (68-69)
  • pos (77-77)
  • pos (83-83)
components/core/src/clp/StringReader.cpp (4)
  • try_seek_from_begin (41-48)
  • try_seek_from_begin (41-41)
  • try_read (18-39)
  • try_read (18-18)
components/core/src/clp/BufferReader.cpp (4)
  • try_seek_from_begin (83-89)
  • try_seek_from_begin (83-83)
  • try_read (60-81)
  • try_read (60-61)
components/core/src/clp/FileReader.cpp (4)
  • try_seek_from_begin (53-60)
  • try_seek_from_begin (53-53)
  • try_read (34-51)
  • try_read (34-34)
components/core/src/clp/math_utils.hpp (2)
  • int_round_up_to_multiple (13-18)
  • int_round_up_to_multiple (13-13)
components/core/src/clp/BufferedFileReader.hpp (2)
components/core/src/clp/BufferedFileReader.cpp (1)
  • BufferedFileReader (20-43)
components/core/src/clp/BufferReader.hpp (5)
  • nodiscard (20-22)
  • nodiscard (33-33)
  • nodiscard (99-101)
  • pos (77-77)
  • pos (83-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (8)
components/core/src/clp/BufferedFileReader.hpp (6)

83-85: LGTM: delegating ctor uses default buffer size

The delegating constructor is clean and explicit.


86-88: LGTM: trivial virtual dtor

Defaulted and marked override; good.


98-101: LGTM: failure semantics reference the right helper

Returning the same error as refill_reader_buffer keeps docs precise.


172-173: LGTM: ties delimiter read failures to refill path

Accurately describes the failure source.


186-188: LGTM: refill helper clearly states propagated error source

The contract aligns with wrapping ReaderInterface.


205-206: LGTM: buffer end computed from BufferReader size

Using get_buffer_size() keeps this in sync with rebinds.

components/core/src/clp/BufferedFileReader.cpp (2)

241-243: Good: buffer reader reset points at newly appended data

Re‑initialising BufferReader with size and starting pos at next_buffer_pos correctly exposes only the newly refilled region.

If ReaderInterface ever returns Success with num_bytes_read==0 for a positive request, the read loop could spin. Please confirm the contract guarantees EndOfFile in that case.


248-257: LGTM: buffer compaction is correct

Using get_pos()/get_buffer_size() to compute remaining, erasing the consumed prefix, updating m_buffer_begin_pos, and re‑seeding BufferReader is sound.

Comment thread components/core/src/clp/BufferedFileReader.cpp Outdated
Comment thread components/core/src/clp/BufferedFileReader.hpp Outdated
Comment thread components/core/src/clp/BufferedFileReader.hpp Outdated
Comment thread components/core/src/clp/BufferedFileReader.hpp Outdated
Comment on lines 197 to 202
* @param file_pos
* @return \p file_pos relative to the beginning of the buffer
* @return file_pos relative to the beginning of the buffer
*/
[[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
return file_pos - m_buffer_begin_pos;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Guard against underflow in get_buffer_relative_pos

If file_pos < m_buffer_begin_pos, subtraction underflows. Add a debug assert and document the precondition.

-    [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
-        return file_pos - m_buffer_begin_pos;
-    }
+    [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
+        // Precondition: file_pos >= m_buffer_begin_pos
+        assert(file_pos >= m_buffer_begin_pos);
+        return file_pos - m_buffer_begin_pos;
+    }

Add the include near the other headers:

 #include <cstddef>
+#include <cassert>
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around lines 197 to 202,
get_buffer_relative_pos can underflow when file_pos < m_buffer_begin_pos; add a
debug assertion of the precondition (e.g., assert(file_pos >=
m_buffer_begin_pos)) and update the comment to document the precondition that
file_pos must be >= m_buffer_begin_pos, and add #include <cassert> alongside the
other headers near the top of the file.

Comment on lines 208 to 209
auto update_pos(size_t pos) -> void;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Document update_pos responsibilities

Please document whether update_pos also advances m_highest_read_pos and any invariants (e.g., pos >= m_pos).

Proposed doc stub:

-    auto update_pos(size_t pos) -> void;
+    // Advances m_pos to pos; if pos exceeds m_highest_read_pos, updates it as well.
+    // Precondition: pos >= m_buffer_begin_pos and pos <= get_buffer_end_pos().
+    auto update_pos(size_t pos) -> void;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto update_pos(size_t pos) -> void;
// Advances m_pos to pos; if pos exceeds m_highest_read_pos, updates it as well.
// Precondition: pos >= m_buffer_begin_pos and pos <= get_buffer_end_pos().
auto update_pos(size_t pos) -> void;
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around lines 208-209, add a
doc comment for update_pos that states its responsibilities: specify whether it
advances m_highest_read_pos (yes/no), the precondition invariant (e.g., pos >=
m_pos), any postconditions (new m_pos == pos and when m_highest_read_pos is
updated to max(old, pos)), side effects (no IO, no allocation), and
thread-safety/exception behavior; ensure the comment is concise, uses the same
doxygen/style convention as surrounding methods, and include a short example or
note if callers must validate pos before calling.

Comment on lines 214 to 216
size_t m_pos{0};
std::shared_ptr<ReaderInterface> m_reader;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Centralize BufferReader rebind after buffer reallocations

Since BufferReader holds a raw pointer into m_buffer, any resize that changes capacity must rebind it. Consider a small helper to prevent drift.

 private:
+    // Rebinds m_buffer_reader to the current m_buffer data and size.
+    inline void rebind_buffer_reader(size_t size) {
+        m_buffer_reader = BufferReader{m_buffer.data(), size};
+    }

Then use rebind_buffer_reader(...) in places that resize m_buffer (e.g., after drop_content_before_current_pos()).

Also applies to: 220-221

🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around lines 214-216 (and also
220-221), BufferReader holds a raw pointer into m_buffer so any reallocation of
m_buffer can invalidate that pointer; add a small private helper method
rebind_buffer_reader() that resets the BufferReader to point at m_buffer.data()
+ m_pos (and any other needed offsets) and call this helper immediately after
every place that can change m_buffer capacity/resize (e.g., after
drop_content_before_current_pos() and any other buffer-resizing code paths) to
ensure the raw pointer is updated and prevent pointer drift.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/core/src/clp/BufferedFileReader.cpp (1)

214-223: Over-read in checkpoint mode: num_bytes_to_read computed as absolute position.

This can trigger unnecessary large reads and buffer growth. Compute delta from buffer_end_pos.

-        num_bytes_to_read = int_round_up_to_multiple(
-                buffer_end_pos + num_bytes_to_refill,
-                m_base_buffer_size
-        );
+        auto const target_end = int_round_up_to_multiple(
+                buffer_end_pos + num_bytes_to_refill,
+                m_base_buffer_size
+        );
+        num_bytes_to_read = target_end - buffer_end_pos;
♻️ Duplicate comments (5)
components/core/tests/test-BufferedFileReader.cpp (4)

1-7: Add missing headers for std::max and memcmp.

Compilation relies on transitive includes; add the explicit headers.

 #include <array>
+#include <algorithm>   // std::max
 #include <random>
+#include <cstring>     // std::memcmp

29-31: Prefer std::vector over heap-allocated std::array.

Simpler, fewer indirections; contiguous storage preserved.

-    auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>();
-    auto& test_data = *test_data_uniq_ptr;
+    std::vector<char> test_data(test_data_size);
@@
-    auto read_buf_uniq_ptr = make_unique<std::array<char, test_data_size>>();
-    auto& read_buf = *read_buf_uniq_ptr;
+    std::vector<char> read_buf(test_data_size);
@@
-    auto test_data_uniq_ptr = make_unique<std::array<char, test_data_size>>();
-    auto& test_data = *test_data_uniq_ptr;
+    std::vector<char> test_data(test_data_size);

Also applies to: 42-44, 274-276


35-35: Use unique temp paths to avoid collisions in parallel runs.

Hardcoded filenames are flaky in CI.

-    string const test_file_path{"BufferedFileReader.test"};
+    auto const tmp_dir = boost::filesystem::temp_directory_path();
+    string const test_file_path =
+        (tmp_dir / boost::filesystem::unique_path("BufferedFileReader-%%%%%%%%.test")).string();
@@
-    string const test_file_path{"BufferedFileReader.delimiter.test"};
+    auto const tmp_dir = boost::filesystem::temp_directory_path();
+    string const test_file_path =
+        (tmp_dir / boost::filesystem::unique_path("BufferedFileReaderDelimiter-%%%%%%%%.test")).string();

Also applies to: 282-282


267-271: Seed RNG deterministically for reproducible tests.

Avoid non‑deterministic failures.

-    std::random_device rd;
-    std::mt19937 gen(rd());
+    // Fixed seed for reproducibility
+    std::mt19937 gen(0xC0FFEEu);
components/core/src/clp/BufferedFileReader.hpp (1)

145-147: Resolve conflicting @return lines.

This method can fail from BufferReader seek or underlying ReaderInterface operations; document unambiguously.

-     * @return Same as BufferReader::try_seek_from_begin if it fails
-     * @return Same as ReaderInterface::try_seek_from_begin if it fails
+     * @return Same as BufferReader::try_seek_from_begin; may also return the error from
+     * ReaderInterface::try_seek_from_begin/refill when advancing the underlying reader.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bf6cff and b01f086.

📒 Files selected for processing (3)
  • components/core/src/clp/BufferedFileReader.cpp (7 hunks)
  • components/core/src/clp/BufferedFileReader.hpp (8 hunks)
  • components/core/tests/test-BufferedFileReader.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/test-BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
  • components/core/src/clp/BufferedFileReader.cpp
🧠 Learnings (5)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/tests/test-BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
  • components/core/src/clp/BufferedFileReader.cpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/tests/test-BufferedFileReader.cpp
  • components/core/src/clp/BufferedFileReader.hpp
  • components/core/src/clp/BufferedFileReader.cpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Applied to files:

  • components/core/tests/test-BufferedFileReader.cpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/tests/test-BufferedFileReader.cpp
🧬 Code graph analysis (3)
components/core/tests/test-BufferedFileReader.cpp (3)
components/core/src/clp/BufferedFileReader.cpp (1)
  • BufferedFileReader (20-43)
components/core/src/clp/BufferedFileReader.hpp (6)
  • BufferedFileReader (78-81)
  • BufferedFileReader (83-84)
  • BufferedFileReader (83-83)
  • BufferedFileReader (87-87)
  • BufferedFileReader (90-90)
  • BufferedFileReader (91-91)
components/core/src/clp/FileReader.cpp (2)
  • FileReader (15-23)
  • FileReader (25-32)
components/core/src/clp/BufferedFileReader.hpp (2)
components/core/src/clp/BufferedFileReader.cpp (1)
  • BufferedFileReader (20-43)
components/core/src/clp/BufferReader.hpp (5)
  • nodiscard (20-22)
  • nodiscard (33-33)
  • nodiscard (99-101)
  • pos (77-77)
  • pos (83-83)
components/core/src/clp/BufferedFileReader.cpp (4)
components/core/src/clp/BufferedFileReader.hpp (13)
  • BufferedFileReader (78-81)
  • BufferedFileReader (83-84)
  • BufferedFileReader (83-83)
  • BufferedFileReader (87-87)
  • BufferedFileReader (90-90)
  • BufferedFileReader (91-91)
  • OperationFailed (43-49)
  • OperationFailed (51-58)
  • buf (110-110)
  • buf (161-162)
  • pos (133-133)
  • pos (149-149)
  • pos (213-213)
components/core/src/clp/BufferReader.hpp (5)
  • OperationFailed (16-17)
  • buf (39-39)
  • buf (68-69)
  • pos (77-77)
  • pos (83-83)
components/core/src/clp/BufferReader.cpp (4)
  • try_seek_from_begin (83-89)
  • try_seek_from_begin (83-83)
  • try_read (60-81)
  • try_read (60-61)
components/core/src/clp/math_utils.hpp (2)
  • int_round_up_to_multiple (13-18)
  • int_round_up_to_multiple (13-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-14, true)

Comment on lines 45 to 49
auto BufferedFileReader::try_refill_buffer_if_empty() -> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
if (m_buffer_reader->get_buffer_size() > 0) {
if (m_buffer_reader.get_buffer_size() != m_buffer_reader.get_pos()) {
return ErrorCode_Success;
}
return refill_reader_buffer(m_base_buffer_size);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Refill predicate reads clearer as “empty buffer” check.

Current condition is correct; minor readability tweak optional.

-    if (m_buffer_reader.get_buffer_size() != m_buffer_reader.get_pos()) {
+    // If there is unread data, no refill is needed
+    if (m_buffer_reader.get_pos() != m_buffer_reader.get_buffer_size()) {
         return ErrorCode_Success;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto BufferedFileReader::try_refill_buffer_if_empty() -> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
if (m_buffer_reader->get_buffer_size() > 0) {
if (m_buffer_reader.get_buffer_size() != m_buffer_reader.get_pos()) {
return ErrorCode_Success;
}
return refill_reader_buffer(m_base_buffer_size);
auto BufferedFileReader::try_refill_buffer_if_empty() -> ErrorCode {
// If there is unread data, no refill is needed
if (m_buffer_reader.get_pos() != m_buffer_reader.get_buffer_size()) {
return ErrorCode_Success;
}
return refill_reader_buffer(m_base_buffer_size);
}
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.cpp around lines 45 to 49, the
refill predicate is correct but reads less clearly; change the buffer-empty
check to a clearer expression (e.g., if (m_buffer_reader.get_pos() <
m_buffer_reader.get_buffer_size()) return ErrorCode_Success; or explicitly check
for empty with if (m_buffer_reader.get_pos() >=
m_buffer_reader.get_buffer_size()) then call refill_reader_buffer), keeping the
existing behavior but making the intent (“buffer empty” vs “not empty”) more
readable.

Comment on lines 130 to 136
if (nullptr == buf) {
return ErrorCode_BadParam;
}
num_bytes_read = 0;
if (num_bytes_to_read == 0) {
return ErrorCode_Success;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Align null-buffer handling with zero-byte reads.

Permit buf == nullptr when num_bytes_to_read == 0 to match BufferReader semantics and avoid surprising BadParam.

-    if (nullptr == buf) {
-        return ErrorCode_BadParam;
-    }
-    num_bytes_read = 0;
-    if (num_bytes_to_read == 0) {
+    num_bytes_read = 0;
+    if (0 == num_bytes_to_read) {
         return ErrorCode_Success;
     }
+    if (nullptr == buf) {
+        return ErrorCode_BadParam;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (nullptr == buf) {
return ErrorCode_BadParam;
}
num_bytes_read = 0;
if (num_bytes_to_read == 0) {
return ErrorCode_Success;
}
num_bytes_read = 0;
if (0 == num_bytes_to_read) {
return ErrorCode_Success;
}
if (nullptr == buf) {
return ErrorCode_BadParam;
}
🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.cpp around lines 130 to 136, the
code rejects a null buf before checking for a zero-length read; change the logic
so that zero-byte reads return ErrorCode_Success (and set num_bytes_read = 0)
even if buf is nullptr—either move the num_bytes_to_read == 0 check before the
buf null check or explicitly allow buf == nullptr when num_bytes_to_read == 0,
keeping the existing behavior for non-zero reads.

#define CLP_BUFFEREDFILEREADER_HPP

#include <cstdio>
#include <cstddef>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Guard get_buffer_relative_pos against underflow (debug).

Assert the precondition to catch misuse in debug builds.

-#include <cstddef>
+#include <cstddef>
+#include <cassert>
@@
-    [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
-        return file_pos - m_buffer_begin_pos;
-    }
+    [[nodiscard]] auto get_buffer_relative_pos(size_t file_pos) const -> size_t {
+        assert(file_pos >= m_buffer_begin_pos);
+        return file_pos - m_buffer_begin_pos;
+    }

Also applies to: 200-206

🤖 Prompt for AI Agents
In components/core/src/clp/BufferedFileReader.hpp around line 4 and also apply
to lines ~200-206, add a debug-only precondition assert in
get_buffer_relative_pos to guard against unsigned underflow when computing the
relative position; specifically assert that the absolute position (or pointer)
is not before the buffer start (or that pos >= buffer_start) before performing
the subtraction, so misuse is caught in debug builds and the subtraction cannot
underflow.

Comment on lines 300 to 304
// Validate a clearing a checkpoint without any reading wouldn't change the beginning offset
buffered_file_reader.clear_checkpoint();
REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());

// Validate that a FileReader and a BufferedFileReader return the same strings (split by

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Also test set+clear checkpoint with no reads.

Current check only clears without a prior set; add set_checkpoint() to cover the regression discussed in earlier reviews.

-    // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset
-    buffered_file_reader.clear_checkpoint();
+    // Validate setting then clearing a checkpoint without any reading keeps the beginning offset
+    buffered_file_reader.set_checkpoint();
+    buffered_file_reader.clear_checkpoint();
     REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Validate a clearing a checkpoint without any reading wouldn't change the beginning offset
buffered_file_reader.clear_checkpoint();
REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());
// Validate that a FileReader and a BufferedFileReader return the same strings (split by
// Validate setting then clearing a checkpoint without any reading keeps the beginning offset
buffered_file_reader.set_checkpoint();
buffered_file_reader.clear_checkpoint();
REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());
// Validate that a FileReader and a BufferedFileReader return the same strings (split by
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedFileReader.cpp around lines 300 to 304,
the test only calls clear_checkpoint() without first calling set_checkpoint();
update the test to call buffered_file_reader.set_checkpoint() immediately before
buffered_file_reader.clear_checkpoint() so it validates the set+clear checkpoint
behavior when no reads have occurred, and keep the subsequent REQUIRE asserting
reader_begin_offset == buffered_file_reader.get_pos().

@LinZhihao-723 LinZhihao-723 self-requested a review September 17, 2025 18:21

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a large refactoring on the latest commit, which includes:

  • Renamed it to BufferedReader, since the source is no longer a file.
  • Updated the docstring to reflect the latest changes.
    • Use source reader to refer to the underlying reader.
    • Polish some docstrings.
    • Update the docstring to match our latest docstring guideline.
  • Renamed some variables to add in_src to indicate that the position is relative to the source reader.

Notice that I didn't do any logical refactoring on the existing implementation. All unit tests passed locally, but I can't conclude all edge cases are covered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
components/core/src/clp/clp/FileCompressor.hpp (1)

100-110: Doc nit: clarify wording (“archive” vs. “archive_writer”).

“Tries to compress the given file as if it were a generic archive_writer” is confusing. It’s compressing input “as an archive”, not “as an archive_writer”.

- * Tries to compress the given file as if it were a generic archive_writer
+ * Tries to compress the given file as if it were a generic archive
components/core/tests/test-BufferedReader.cpp (2)

61-63: Fix outdated comment.

-        // Read a large chunk of data, so BufferedFileReader will refill the
+        // Read a large chunk of data, so BufferedReader will refill the

180-184: Use try_seek_from_begin consistently (and assert result).

Direct seek_from_begin here bypasses the test’s error checks.

-        reader.seek_from_begin(seek_pos_2);
+        REQUIRE(ErrorCode_Success == reader.try_seek_from_begin(seek_pos_2));
♻️ Duplicate comments (1)
components/core/src/clp/clp/FileCompressor.cpp (1)

124-124: Known issue: Handle exceptions from std::filesystem::canonical

As discussed in previous reviews, std::filesystem::canonical can throw exceptions if the file doesn't exist or there are permission issues. This is tracked in Issue #553.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b01f086 and 2fa7e03.

📒 Files selected for processing (7)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/BufferedReader.cpp (1 hunks)
  • components/core/src/clp/BufferedReader.hpp (1 hunks)
  • components/core/src/clp/clp/CMakeLists.txt (1 hunks)
  • components/core/src/clp/clp/FileCompressor.cpp (7 hunks)
  • components/core/src/clp/clp/FileCompressor.hpp (4 hunks)
  • components/core/tests/test-BufferedReader.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/BufferedReader.hpp
  • components/core/tests/test-BufferedReader.cpp
  • components/core/src/clp/BufferedReader.cpp
  • components/core/src/clp/clp/FileCompressor.hpp
  • components/core/src/clp/clp/FileCompressor.cpp
🧠 Learnings (14)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/CMakeLists.txt
  • components/core/src/clp/BufferedReader.hpp
  • components/core/tests/test-BufferedReader.cpp
  • components/core/src/clp/BufferedReader.cpp
  • components/core/src/clp/clp/FileCompressor.hpp
  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • components/core/src/clp/clp/CMakeLists.txt
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/CMakeLists.txt
  • components/core/src/clp/BufferedReader.hpp
  • components/core/tests/test-BufferedReader.cpp
  • components/core/src/clp/BufferedReader.cpp
📚 Learning: 2024-12-10T16:56:33.545Z
Learnt from: gibber9809
PR: y-scope/clp#624
File: components/core/src/clp/BoundedReader.hpp:25-27
Timestamp: 2024-12-10T16:56:33.545Z
Learning: In the CLP project, within constructors like `BoundedReader` in `components/core/src/clp/BoundedReader.hpp`, it's acceptable to call methods like `get_pos()` that may throw exceptions without handling potential errors via `try_get_pos()`, since the constructor itself handles errors by throwing.

Applied to files:

  • components/core/src/clp/BufferedReader.cpp
📚 Learning: 2024-10-24T14:25:17.978Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.hpp
  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
🧬 Code graph analysis (5)
components/core/src/clp/BufferedReader.hpp (1)
components/core/src/clp/BufferedReader.cpp (1)
  • BufferedReader (20-43)
components/core/tests/test-BufferedReader.cpp (2)
components/core/src/clp/BufferedReader.cpp (1)
  • BufferedReader (20-43)
components/core/src/clp/BufferedReader.hpp (6)
  • BufferedReader (72-75)
  • BufferedReader (77-78)
  • BufferedReader (77-77)
  • BufferedReader (81-81)
  • BufferedReader (84-84)
  • BufferedReader (85-85)
components/core/src/clp/BufferedReader.cpp (1)
components/core/src/clp/BufferedReader.hpp (15)
  • BufferedReader (72-75)
  • BufferedReader (77-78)
  • BufferedReader (77-77)
  • BufferedReader (81-81)
  • BufferedReader (84-84)
  • BufferedReader (85-85)
  • OperationFailed (37-43)
  • OperationFailed (45-52)
  • buf (104-104)
  • buf (156-157)
  • pos (127-127)
  • pos (143-143)
  • pos (208-208)
  • num_bytes_to_refill (184-184)
  • delim (171-172)
components/core/src/clp/clp/FileCompressor.hpp (4)
components/core/src/clp/streaming_compression/zstd/Decompressor.hpp (1)
  • reader (83-83)
components/core/src/clp/streaming_compression/Decompressor.hpp (1)
  • reader (54-54)
components/core/src/clp/streaming_compression/passthrough/Decompressor.hpp (1)
  • reader (77-77)
components/core/src/clp/LibarchiveReader.hpp (1)
  • reader (47-47)
components/core/src/clp/clp/FileCompressor.cpp (2)
components/core/src/clp/clp/FileCompressor.hpp (6)
  • target_data_size_of_dicts (43-50)
  • target_data_size_of_dicts (68-77)
  • target_data_size_of_dicts (79-87)
  • target_data_size_of_dicts (89-97)
  • target_data_size_of_dicts (110-118)
  • target_data_size_of_dicts (131-139)
components/core/src/clp/LibarchiveReader.hpp (1)
  • reader (47-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (10)
components/core/src/clp/clp/FileCompressor.hpp (1)

58-77: New parse_and_encode wrapper: API shape is fine.

Signature and docs align with the ReaderInterface refactor and void return. No issues from an API standpoint.

components/core/src/clp/BufferedReader.cpp (5)

233-239: Consider validating buffer allocation before read

The code attempts to read into &m_buffer[next_buffer_pos] without explicitly checking if the resize operations (lines 219 or implicit vector capacity) were successful. While std::vector::resize() will throw std::bad_alloc on failure, it's worth noting this implicit dependency.

The implementation correctly handles buffer management with appropriate resizing before reads.


138-139: Prefer explicit condition check per coding guidelines

Per the coding guidelines, prefer false == <expression> rather than using the negation operator.

-    while (false == dst_view.empty()) {
+    while (false == dst_view.empty()) {

Note: This line already follows the guideline correctly. No change needed.


166-168: Prefer explicit condition check per coding guidelines

Per the coding guidelines, prefer false == <expression> rather than using the negation operator.

-    if (false == append) {
+    if (false == append) {

Note: This line already follows the guideline correctly. No change needed.


33-34: Prefer explicit condition check per coding guidelines

Per the coding guidelines, prefer false == <expression> rather than using the negation operator.

-    if (base_buffer_size % cMinBufferSize != 0) {
+    if (false == (base_buffer_size % cMinBufferSize == 0)) {

Likely an incorrect or invalid review comment.


67-69: Unnecessary empty return check

The empty check before return is redundant since returning from a void function with no value is valid regardless.

 auto BufferedReader::clear_checkpoint() -> void {
-    if (false == m_checkpoint_pos.has_value()) {
-        return;
-    }
-
-    auto error_code = try_seek_from_begin(m_highest_read_in_src_pos);
-    if (ErrorCode_Success != error_code) {
-        // Should never happen
-        throw OperationFailed(error_code, __FILENAME__, __LINE__);
+    if (m_checkpoint_pos.has_value()) {
+        auto error_code = try_seek_from_begin(m_highest_read_in_src_pos);
+        if (ErrorCode_Success != error_code) {
+            // Should never happen
+            throw OperationFailed(error_code, __FILENAME__, __LINE__);
+        }
+        drop_content_before_current_pos();
+        m_checkpoint_pos.reset();
     }
-    drop_content_before_current_pos();
-    m_checkpoint_pos.reset();
 }
⛔ Skipped due to learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
components/core/src/clp/clp/FileCompressor.cpp (4)

129-129: LGTM! Clean migration to BufferedReader

The refactoring to use BufferedReader with FileReader wrapped in a smart pointer is well-implemented and aligns with the PR objectives of supporting ReaderInterface-based sources.


189-220: Well-structured delegation pattern

The new parse_and_encode wrapper method cleanly delegates to the appropriate encoding strategy based on the use_heuristic flag. This design improves maintainability and supports the ReaderInterface abstraction.


393-402: Excellent use of unified interface

The migration to use the new parse_and_encode wrapper with m_libarchive_file_reader demonstrates good code reuse and maintains consistency across different reader types.


311-311: Prefer consistent variable declaration style

Consider moving the variable declaration closer to its first use (line 313) for better code locality.

-    string filename_if_compressed;
     if (file_boost_path.has_stem()) {
-        filename_if_compressed = file_boost_path.stem().string();
+        string filename_if_compressed = file_boost_path.stem().string();
     } else {
-        filename_if_compressed = file_boost_path.filename().string();
+        string filename_if_compressed = file_boost_path.filename().string();
     }
+    auto error_code = m_libarchive_reader.try_open(file_reader, filename_if_compressed);

Wait, the variable is used after the if-else block. The current placement is correct.

Comment thread components/core/CMakeLists.txt
Comment thread components/core/src/clp/BufferedReader.cpp
Comment thread components/core/src/clp/BufferedReader.hpp
Comment thread components/core/src/clp/BufferedReader.hpp
Comment thread components/core/src/clp/clp/CMakeLists.txt
Comment thread components/core/tests/test-BufferedReader.cpp
Comment thread components/core/tests/test-BufferedReader.cpp Outdated
Comment thread components/core/tests/test-BufferedReader.cpp
Comment thread components/core/tests/test-BufferedReader.cpp
Comment on lines +301 to +307
buffered_file_reader.clear_checkpoint();
REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());

// Validate that a FileReader and a BufferedFileReader return the same strings (split by
// delimiters)
clp::ErrorCode error_code{ErrorCode_Success};
auto delimiter = (char)('a' + (std::rand() % (cNumAlphabets)));
ErrorCode error_code{ErrorCode_Success};
auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Comment/name consistency: refer to BufferedReader.

-    // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset
+    // Validate that clearing a checkpoint without any reading doesn't change the beginning offset
@@
-    // Validate that a FileReader and a BufferedFileReader return the same strings (split by
+    // Validate that a FileReader and a BufferedReader return the same strings (split by
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
buffered_file_reader.clear_checkpoint();
REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());
// Validate that a FileReader and a BufferedFileReader return the same strings (split by
// delimiters)
clp::ErrorCode error_code{ErrorCode_Success};
auto delimiter = (char)('a' + (std::rand() % (cNumAlphabets)));
ErrorCode error_code{ErrorCode_Success};
auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen));
// Validate that clearing a checkpoint without any reading doesn't change the beginning offset
buffered_file_reader.clear_checkpoint();
REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());
// Validate that a FileReader and a BufferedReader return the same strings (split by
// delimiters)
ErrorCode error_code{ErrorCode_Success};
auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen));
🤖 Prompt for AI Agents
In components/core/tests/test-BufferedReader.cpp around lines 301 to 307, the
test uses wording/identifiers referencing "BufferedFileReader" while the
file/class is BufferedReader; update the comment and any variable identifiers
for consistency: change the comment text to mention BufferedReader and rename
variables (and related setup/usage) from buffered_file_reader to buffered_reader
(and reader_begin_offset accordingly) or vice versa so names match the actual
class under test, and update any includes or factory calls if necessary to use
the correct BufferedReader symbol.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
components/core/tests/test-BufferedReader.cpp (1)

61-63: Comment naming: refer to BufferedReader.

-        // Read a large chunk of data, so BufferedFileReader will refill the
+        // Read a large chunk of data, so BufferedReader will refill the
components/core/src/clp/clp/FileCompressor.hpp (3)

34-49: Doc missing for new parameter use_heuristic.

Please add @param use_heuristic to the compress_file comment for completeness.


109-117: Add param detail for file_reader in archive path.

State that file_reader is positioned at the start of the container and will be read as the archive source.

  * @param archive_writer
- * @param file_reader
+ * @param file_reader Reader positioned at the start of the container; consumed but not closed.
  * @param use_heuristic

4-16: Avoid relying on transitive includes; include ReaderInterface.hpp explicitly.

This header declares functions taking ReaderInterface& but does not include it directly. Add the include (or forward-declare and qualify as needed) to reduce coupling on LibarchiveReader.hpp.

 #include "../ir/LogEventDeserializer.hpp"
 #include "../LibarchiveFileReader.hpp"
 #include "../LibarchiveReader.hpp"
 #include "../MessageParser.hpp"
 #include "../ParsedMessage.hpp"
 #include "../streaming_archive/writer/Archive.hpp"
 #include "FileToCompress.hpp"
+#include "../ReaderInterface.hpp"
components/core/src/clp/clp/FileCompressor.cpp (1)

321-327: Consider logging error_code from try_open for diagnosability.

Including the specific ErrorCode can ease triage.

-    if (ErrorCode_Success != error_code) {
-        SPDLOG_ERROR(
-                "Cannot compress {} - failed to open with libarchive.",
-                file_to_compress.get_path().c_str()
-        );
+    if (ErrorCode_Success != error_code) {
+        SPDLOG_ERROR(
+            "Cannot compress {} - failed to open with libarchive, error={}",
+            file_to_compress.get_path().c_str(),
+            error_code
+        );
♻️ Duplicate comments (11)
components/core/tests/test-BufferedReader.cpp (10)

1-2: Add missing standard headers for algorithms, C string ops, and smart pointers.

Required for std::max, memcmp, and std::make_unique/std::make_shared.

Apply:

 #include <array>
-#include <random>
+#include <random>
+#include <algorithm>  // std::max
+#include <cstring>    // std::memcmp
+#include <memory>     // std::make_unique, std::make_shared

26-26: Rename test tag to match the new class.

-TEST_CASE("Test reading data", "[BufferedFileReader]") {
+TEST_CASE("Test reading data", "[BufferedReader]") {

28-33: std::array size must be compile‑time constant. Make test_data_size constexpr (and use unsigned literal).

-    size_t const test_data_size = 4L * 1024 * 1024 + 1;  // 4MB + 1
+    constexpr size_t test_data_size = 4ULL * 1024 * 1024 + 1;  // 4 MiB + 1

35-35: Align temporary filename with the new class name (clarity).

-    string const test_file_path{"BufferedFileReader.test"};
+    string const test_file_path{"BufferedReader.test"};

44-46: Construct BufferedReader with a shared_ptr explicitly (matches API and avoids subtle conversions).

-    size_t const base_buffer_size = BufferedReader::cMinBufferSize << 4;
-    BufferedReader reader{make_unique<FileDescriptorReader>(test_file_path), base_buffer_size};
+    size_t const base_buffer_size = BufferedReader::cMinBufferSize << 4;
+    BufferedReader reader{std::make_shared<FileDescriptorReader>(test_file_path), base_buffer_size};

266-266: Rename second test tag to “[BufferedReader]”.

-TEST_CASE("Test delimiter", "[BufferedFileReader]") {
+TEST_CASE("Test delimiter", "[BufferedReader]") {

273-279: Make size constexpr here as well (same compile‑time constant issue).

-    size_t const test_data_size = 1L * 1024 * 1024 + 1;  // 1MB
+    constexpr size_t test_data_size = 1ULL * 1024 * 1024 + 1;  // 1 MiB + 1

281-283: Rename delimiter test filename for consistency.

-    string const test_file_path{"BufferedFileReader.delimiter.test"};
+    string const test_file_path{"BufferedReader.delimiter.test"};

288-299: Use shared_ptr for reader construction, and align variable names/comments with BufferedReader.

-    // Instantiate BufferedFileReader and the reference FileReader from a non-zero pos
-    auto fd_reader = make_unique<FileDescriptorReader>(test_file_path);
-    fd_reader->seek_from_begin(reader_begin_offset);
-    BufferedReader buffered_file_reader{std::move(fd_reader)};
-    string test_string;
+    // Instantiate BufferedReader and the reference FileReader from a non-zero position
+    auto fd_reader = std::make_shared<FileDescriptorReader>(test_file_path);
+    fd_reader->seek_from_begin(reader_begin_offset);
+    BufferedReader buffered_reader{fd_reader};
+    string test_string;
@@
-    FileReader ref_file_reader{test_file_path};
+    FileReader ref_file_reader{test_file_path};

300-307: Grammar/name consistency and follow‑through on rename.

-    // Validate a clearing a checkpoint without any reading wouldn't change the beginning offset
-    buffered_file_reader.clear_checkpoint();
-    REQUIRE(reader_begin_offset == buffered_file_reader.get_pos());
+    // Validate that clearing a checkpoint without any reading doesn't change the beginning offset
+    buffered_reader.clear_checkpoint();
+    REQUIRE(reader_begin_offset == buffered_reader.get_pos());
@@
-    // Validate that a FileReader and a BufferedFileReader return the same strings (split by
+    // Validate that a FileReader and a BufferedReader return the same strings (split by
@@
-    auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen));
+    auto delimiter = static_cast<char>(uniformly_distributed_alphabet(gen));

Also update subsequent uses:

-        auto const error_code2
-                = buffered_file_reader.try_read_to_delimiter(delimiter, true, false, test_string);
+        auto const error_code2
+                = buffered_reader.try_read_to_delimiter(delimiter, true, false, test_string);
components/core/src/clp/clp/FileCompressor.cpp (1)

125-126: std::filesystem::canonical may throw; tracked in Issue #553.

No action in this PR per prior discussion, just flagging that this still may throw on missing/permission errors.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa7e03 and 4f4f5a3.

📒 Files selected for processing (3)
  • components/core/src/clp/clp/FileCompressor.cpp (7 hunks)
  • components/core/src/clp/clp/FileCompressor.hpp (3 hunks)
  • components/core/tests/test-BufferedReader.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/test-BufferedReader.cpp
  • components/core/src/clp/clp/FileCompressor.cpp
  • components/core/src/clp/clp/FileCompressor.hpp
🧠 Learnings (12)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/tests/test-BufferedReader.cpp
  • components/core/src/clp/clp/FileCompressor.cpp
  • components/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/tests/test-BufferedReader.cpp
  • components/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/tests/test-BufferedReader.cpp
📚 Learning: 2024-12-05T16:32:21.507Z
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.cpp
  • components/core/src/clp/clp/FileCompressor.hpp
📚 Learning: 2024-10-24T14:25:17.978Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.

Applied to files:

  • components/core/src/clp/clp/FileCompressor.hpp
🧬 Code graph analysis (3)
components/core/tests/test-BufferedReader.cpp (2)
components/core/src/clp/BufferedReader.hpp (6)
  • BufferedReader (72-75)
  • BufferedReader (77-78)
  • BufferedReader (77-77)
  • BufferedReader (81-81)
  • BufferedReader (84-84)
  • BufferedReader (85-85)
components/core/src/clp/BufferedReader.cpp (1)
  • BufferedReader (20-43)
components/core/src/clp/clp/FileCompressor.cpp (1)
components/core/src/clp/LibarchiveReader.hpp (1)
  • reader (47-47)
components/core/src/clp/clp/FileCompressor.hpp (4)
components/core/src/clp/streaming_compression/zstd/Decompressor.hpp (1)
  • reader (83-83)
components/core/src/clp/streaming_compression/Decompressor.hpp (1)
  • reader (54-54)
components/core/src/clp/streaming_compression/passthrough/Decompressor.hpp (1)
  • reader (77-77)
components/core/src/clp/LibarchiveReader.hpp (1)
  • reader (47-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: package-image
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/core/src/clp/clp/FileCompressor.cpp (4)

157-166: Exception propagation bypasses profiler stop; confirm intended behaviour.

If parse_and_encode throws, stop_continuous_measurement won’t run. If that’s undesired, consider a small scope guard or catching here and setting succeeded = false.

-    if (is_utf8_encoded({utf8_validation_buf, utf8_validation_buf_len})) {
-        parse_and_encode( ... );
-    } else {
+    if (is_utf8_encoded({utf8_validation_buf, utf8_validation_buf_len})) {
+        try {
+            parse_and_encode( ... );
+        } catch (std::exception const& e) {
+            SPDLOG_ERROR("Failed to parse/encode {}: {}", file_to_compress.get_path(), e.what());
+            succeeded = false;
+        }
+    } else {

190-221: Wrapper looks good; consistent routing by use_heuristic.

The dispatcher cleanly centralizes the path choice. No concerns.


395-403: LGTM: archive member path uses the same parse_and_encode wrapper.

Consistent with the non-archive path; good reuse.


5-8: No unused <iostream> or unique_ptr> includes/usings to remove
<iostream> and its std::cout/std::endl usages appear later in the file, and std::unique_ptr/std::make_unique are in use, so these includes and using-declarations must remain.

Comment thread components/core/src/clp/clp/FileCompressor.cpp
Comment thread components/core/src/clp/clp/FileCompressor.hpp

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/core/src/clp/BufferedReader.hpp (2)

28-30: Doc fix: don't claim "page size" when enforcing a fixed 4 KiB multiple.

The comment states "multiple of the page size" but the implementation enforces a fixed 4 KiB multiple (cMinBufferSize).

Apply this diff to fix the documentation:

- * NOTE: This class restricts the buffer size to a multiple of the page size and we avoid reading
- * anything less than a page to avoid multiple page faults.
+ * NOTE: This class restricts the buffer size to a multiple of cMinBufferSize (4 KiB) and avoids
+ * reading less than that to reduce page faults.

86-87: Use canonical signatures for deleted assignment operators.

Using operator=(BufferedReader const&) instead of the typical by-value parameter for copy assignment is preferable for clarity.

The current code already uses the canonical signature operator=(BufferedReader const&) for the copy assignment operator, which is good. The move assignment operator on line 87 is also correctly deleted. No changes needed here.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 737e4a4 and fbbd8c7.

📒 Files selected for processing (1)
  • components/core/src/clp/BufferedReader.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/BufferedReader.hpp
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/BufferedReader.hpp
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/src/clp/BufferedReader.hpp
🧬 Code graph analysis (1)
components/core/src/clp/BufferedReader.hpp (1)
components/core/src/clp/BufferedReader.cpp (1)
  • BufferedReader (20-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: package-image
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (3)
components/core/src/clp/BufferedReader.hpp (3)

95-95: Well-designed error handling delegation.

Good practice forwarding refill_reader_buffer's return values for consistent error propagation.


127-127: Clear and accurate documentation.

The method documentation clearly states it returns the position in the source reader, which aligns with the implementation.


156-157: Documentation correctly references BufferReader::try_read.

Based on the retrieved learnings, the documentation appropriately references BufferReader::try_read since BufferedReader::try_read explicitly calls it.

Comment on lines +195 to +197
[[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t {
return in_src_pos - m_buffer_begin_in_src_pos;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Ensure overflow safety in position calculation.

The subtraction in_src_pos - m_buffer_begin_in_src_pos could underflow if in_src_pos is less than m_buffer_begin_in_src_pos.

Consider adding a debug assertion to catch programming errors:

 [[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t {
+    // Ensure the position is within the buffered range
+    assert(in_src_pos >= m_buffer_begin_in_src_pos);
     return in_src_pos - m_buffer_begin_in_src_pos;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t {
return in_src_pos - m_buffer_begin_in_src_pos;
}
[[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t {
// Ensure the position is within the buffered range
assert(in_src_pos >= m_buffer_begin_in_src_pos);
return in_src_pos - m_buffer_begin_in_src_pos;
}
🤖 Prompt for AI Agents
components/core/src/clp/BufferedReader.hpp around lines 195-197: the subtraction
return in_src_pos - m_buffer_begin_in_src_pos can underflow if in_src_pos <
m_buffer_begin_in_src_pos; add a debug-time assertion (e.g.,
assert/instrumentation/Expects) that in_src_pos >= m_buffer_begin_in_src_pos to
catch programming errors, then perform the subtraction only after the check;
also consider a clear fail-fast behavior for release builds (throw
std::out_of_range or clamp) if you want to avoid silent underflow in production.

@gibber9809 gibber9809 changed the title refactor(core): Make BufferedFileReader wrap a ReaderInterface instead of a FileReader. refactor(core): Make BufferedFileReader wrap a ReaderInterface instead of a FileReader; rename BufferedFileReader to BufferedReader. Sep 18, 2025

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the recent edits these changes LGTM. Directly edited the PR title as well.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
components/core/src/clp/clp/CMakeLists.txt (1)

4-5: BufferedReader source swap looks good; verify no stale BufferedFileReader refs remain.

Change is correct. Please re-check repo for lingering BufferedFileReader sources/includes/tests to avoid duplicate/unused files.

#!/bin/bash
# Find any remaining BufferedFileReader references (code, CMake, tests, lint)
rg -n --hidden --no-ignore -S '\bBufferedFileReader\b|test-BufferedFileReader\.cpp' || true
fd -HI 'BufferedFileReader.*' || true
components/core/CMakeLists.txt (1)

444-446: Unit test/source rename to BufferedReader is aligned; ensure legacy refs are removed project‑wide.

Looks consistent with the refactor. Please confirm no leftover BufferedFileReader files/references (incl. taskfiles/lint.yaml, glt).

#!/bin/bash
# Repo-wide verification for stale BufferedFileReader artifacts
rg -n --hidden --no-ignore -S '\bBufferedFileReader\b|test-BufferedFileReader\.cpp' || true
fd -HI 'BufferedFileReader.*|test-BufferedFileReader.cpp' || true

Also applies to: 692-692

components/core/src/clp/BufferedReader.hpp (2)

28-29: Doc fix: “page size” → cMinBufferSize (4 KiB).

Update to reflect the actual constraint.

- * NOTE: This class restricts the buffer size to a multiple of the page size and we avoid reading
- * anything less than a page to avoid multiple page faults.
+ * NOTE: This class restricts the buffer size to a multiple of cMinBufferSize (4 KiB) and avoids
+ * reading less than that to reduce page faults.

195-197: Guard against underflow in position math.

Add a debug assertion before subtraction.

 [[nodiscard]] auto get_in_buffer_pos(size_t in_src_pos) const -> size_t {
+    assert(in_src_pos >= m_buffer_begin_in_src_pos);
     return in_src_pos - m_buffer_begin_in_src_pos;
 }

Also add the header at the top of this file:

#include <cassert>
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbbd8c7 and b542f17.

📒 Files selected for processing (3)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/BufferedReader.hpp (1 hunks)
  • components/core/src/clp/clp/CMakeLists.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/BufferedReader.hpp
🧠 Learnings (5)
📓 Common learnings
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/src/clp/BufferedReader.hpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • components/core/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
📚 Learning: 2024-10-24T14:46:00.664Z
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Applied to files:

  • components/core/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/src/clp/BufferedReader.hpp
🧬 Code graph analysis (1)
components/core/src/clp/BufferedReader.hpp (1)
components/core/src/clp/BufferedReader.cpp (1)
  • BufferedReader (20-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
🔇 Additional comments (1)
components/core/src/clp/BufferedReader.hpp (1)

84-88: Good: canonical deletion of copy/move assignment operators.

Signatures are explicit and standard.

Comment thread components/core/src/clp/BufferedReader.hpp
@LinZhihao-723 LinZhihao-723 changed the title refactor(core): Make BufferedFileReader wrap a ReaderInterface instead of a FileReader; rename BufferedFileReader to BufferedReader. refactor(core): Make BufferedFileReader wrap a ReaderInterface instead of a FileReader; Rename BufferedFileReader to BufferedReader. Sep 18, 2025

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated title directly. Otherwise lgtm.

@LinZhihao-723 LinZhihao-723 merged commit 1f2a728 into y-scope:main Sep 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants