clp-core: Add custom headers support for NetworkReader#564
clp-core: Add custom headers support for NetworkReader#564anlowee wants to merge 42 commits intoy-scope:mainfrom anlowee:xiaochong-add-user-customized-header
Conversation
WalkthroughThe changes introduce a new parameter, Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant NetworkReader
participant DownloaderThread
participant CurlDownloadHandler
User->>NetworkReader: Create instance with http_header_kv_pairs
NetworkReader->>DownloaderThread: Pass http_header_kv_pairs
DownloaderThread->>CurlDownloadHandler: Initialize with http_header_kv_pairs
CurlDownloadHandler->>CurlDownloadHandler: Append valid headers
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/CurlDownloadHandler.hpp (1)
67-68: Documentation needed for custom_headers parameterThe constructor's documentation block should be updated to include details about the
custom_headersparameter, including its purpose and any restrictions on header types.Add documentation like this:
* @param overall_timeout Maximum time that the transfer may take. Note that this includes * `connection_timeout`. Doc: https://curl.se/libcurl/c/CURLOPT_TIMEOUT.html + * @param custom_headers Map of custom HTTP headers to be included in the requestcomponents/core/src/clp/NetworkReader.hpp (2)
107-108: Add parameter documentation for custom_headersThe constructor's documentation block should be updated to include the new
custom_headersparameter. This helps maintain API documentation completeness.Add this to the existing parameter documentation:
* @param buffer_pool_size The required number of buffers in the buffer pool. * @param buffer_size The size of each buffer in the buffer pool. +* @param custom_headers Map of custom HTTP headers to be included in the request.
248-252: Add parameter documentation for custom_headersThe DownloaderThread constructor's documentation block should be updated to include the new parameter.
Add this to the existing parameter documentation:
* @param reader * @param offset Index of the byte at which to start the download. * @param disable_caching Whether to disable caching. + * @param custom_headers Map of custom HTTP headers to be included in the request.Also applies to: 265-265
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
- components/core/src/clp/CurlDownloadHandler.hpp (2 hunks)
- components/core/src/clp/NetworkReader.cpp (3 hunks)
- components/core/src/clp/NetworkReader.hpp (4 hunks)
🔇 Additional comments (7)
components/core/src/clp/CurlDownloadHandler.cpp (2)
5-5: LGTM: Required include for std::mapThe addition of the map header is necessary for the custom headers parameter.
23-24: 🛠️ Refactor suggestionConsider adding parameter validation
While the implementation accepts custom headers, it might be beneficial to validate the map's contents before processing.
Let's check if there are any existing parameter validations in the codebase:
Consider adding validation:
std::chrono::seconds overall_timeout, std::map<std::string, std::string> const& custom_headers ) : m_error_msg_buf{std::move(error_msg_buf)} { + // Validate custom headers + for (auto const& [key, value] : custom_headers) { + if (key.empty() || value.empty()) { + throw std::invalid_argument("Empty key or value in custom headers"); + } + }components/core/src/clp/CurlDownloadHandler.hpp (1)
7-7: LGTM: Include statement properly addedThe
<map>header is correctly included and properly ordered alphabetically with other standard headers.components/core/src/clp/NetworkReader.cpp (2)
121-122: 🛠️ Refactor suggestionConsider adding header validation and documentation.
While the implementation correctly handles custom headers, consider:
- Adding validation for header names and values to prevent injection of malicious or invalid headers
- Documenting the expected format and any restrictions for header names/values in the constructor's documentation
Let's check if there are any existing header validation patterns in the codebase:
Consider adding validation like this:
void validate_headers(std::map<std::string, std::string> const& headers) { for (auto const& [name, value] : headers) { if (name.empty() || value.empty()) { throw std::invalid_argument("Header name and value cannot be empty"); } // Add more validation as needed } }Also applies to: 134-134
219-220: 🛠️ Refactor suggestionConsider enhancing error handling for header-related failures.
The current implementation might benefit from specific error handling for header-related failures during CURL operations. Consider:
- Adding specific error codes or exceptions for header-related failures
- Logging header-related errors separately for better debugging
Let's check existing error handling patterns:
Consider extending error handling like this:
try { CurlDownloadHandler curl_handler{/*...*/}; auto const ret_code{curl_handler.perform()}; if (CURLE_UNSUPPORTED_PROTOCOL == ret_code) { // Log specific header-related error logger.error("Failed to set custom headers: {}", m_reader.m_curl_error_msg_buf); } // ... rest of the code }components/core/src/clp/NetworkReader.hpp (2)
9-9: LGTM: Header inclusion is properly placedThe addition of
<map>header is necessary for the custom headers functionality and follows the file's include ordering convention.
248-252: LGTM: Constructor parameter formatting improves readabilityThe multi-line parameter formatting enhances code readability while maintaining consistent style.
Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com> Co-authored-by: Kirk Rodrigues <2454684+kirkrodrigues@users.noreply.github.com> Co-authored-by: wraymo <raywangv@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
components/core/tests/test-NetworkReader.cpp (2)
197-202: Refactor magic number and string concatenationConsider these improvements:
- Define NR_HEADERS as a named constant
- Use string formatting or a more efficient string building method
- const int NR_HEADERS = 10; - for (int i = 0; i < NR_HEADERS; i++) { - std::string key = "Unit-Test-Key" + std::to_string(i); - std::string value = "Unit-Test-Value" + std::to_string(i); + static constexpr int kNumTestHeaders = 10; + for (int i = 0; i < kNumTestHeaders; i++) { + std::string key = std::format("Unit-Test-Key{}", i); + std::string value = std::format("Unit-Test-Value{}", i);
208-214: Consider using raw string literals for special charactersUsing raw string literals would improve readability when testing headers with special characters.
- custom_headers["A\nNewline"] = "xx"; - custom_headers["Legal-Name2"] = "control-char\x01"; + custom_headers[R"(A\nNewline)"] = "xx"; + custom_headers["Legal-Name2"] = R"(control-char\x01)";components/core/src/clp/NetworkReader.hpp (1)
Line range hint
98-109: Consider documenting security implications of custom headersWhile the implementation correctly uses const reference for the custom_headers parameter, consider adding documentation about:
- Security implications of storing authentication tokens in headers
- Best practices for handling sensitive header data
- Lifetime management of sensitive header content
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
- components/core/src/clp/CurlDownloadHandler.hpp (3 hunks)
- components/core/src/clp/NetworkReader.hpp (5 hunks)
- components/core/tests/test-NetworkReader.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/CurlDownloadHandler.hpp
🔇 Additional comments (8)
components/core/src/clp/CurlDownloadHandler.cpp (3)
5-8: LGTM: Required headers added appropriatelyThe new includes are properly ordered and necessary for the custom headers implementation.
25-26: LGTM: Constructor parameter properly addedThe custom_headers parameter is appropriately added as a const reference, which is efficient for passing map data.
63-67: LGTM: Case-insensitive header filtering implementedThe implementation properly handles case-insensitive comparison of reserved headers.
components/core/tests/test-NetworkReader.cpp (4)
5-5: LGTM: Appropriate header inclusionThe addition of the map header is necessary and correctly placed with other STL includes.
195-197: Consider mocking external HTTP serviceThe test depends on httpbin.org, which could cause test failures if the service is unavailable. Consider:
- Using a local mock server for testing
- Adding retry logic or proper error handling
- Documenting the external dependency
203-207: LGTM: Good coverage of protected headersExcellent test coverage ensuring that critical headers (Range, Cache-Control, Pragma) cannot be overridden by custom headers.
233-241: LGTM: Comprehensive validation of header restrictionsGood coverage ensuring that:
- Protected headers are not included
- Invalid header names are rejected
- Headers with special characters are handled correctly
components/core/src/clp/NetworkReader.hpp (1)
Line range hint
248-267: Verify thread safety and memory handling of custom headersThe implementation looks good, but consider the following aspects:
- Thread Safety: The current implementation correctly handles thread safety by storing a copy of the headers map.
- Memory Management: Consider clearing sensitive header data (like authentication tokens) after the download is complete.
Let's verify the usage of these headers:
✅ Verification successful
Let me verify how the custom headers are used in the codebase, particularly in the NetworkReader class.
Custom headers implementation is secure and well-managed
The implementation correctly handles custom headers with proper security considerations:
- Headers are passed by const reference and stored as a copy in the thread for thread safety
- Reserved headers (Range, Cache-Control, Pragma) are protected from being overridden
- Headers are validated and sanitized in CurlDownloadHandler before use
- Headers are scoped to the download thread's lifetime and automatically cleaned up
No security or memory management issues found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for potential sensitive header handling in the codebase rg -i "authorization|bearer|token|secret|password" --type cppLength of output: 119815
Script:
#!/bin/bash # Search for custom_headers usage in NetworkReader and related files rg -A 5 -B 5 "custom_headers" --type cppLength of output: 15079
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
components/core/src/clp/CurlDownloadHandler.cpp (1)
26-26: Include the new parameter in the function documentationEnsure that the documentation for
CurlDownloadHandleris updated to reflect the addition of thecustom_headersparameter, so that users are aware of the new functionality.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
- components/core/tests/test-NetworkReader.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/tests/test-NetworkReader.cpp
🔇 Additional comments (1)
components/core/src/clp/CurlDownloadHandler.cpp (1)
77-82:⚠️ Potential issueOptimize regex usage and update patterns to comply with RFC 7230
The regex patterns for header names and values are being compiled in each iteration of the loop. To improve performance, declare the regex objects as
static constoutside the loop to avoid recompilation. Additionally, update theheader_name_patternto comply with RFC 7230, which specifies that header field names must start with a letter and can contain letters, digits, and hyphens (-), but not underscores (_).Apply this diff to address the concerns:
+ // RFC 7230 compliant header name and value patterns + static const std::regex header_name_pattern("^[A-Za-z][A-Za-z0-9-]*$"); + static const std::regex header_value_pattern("^[\\x20-\\x7E]+$"); for (const auto& [key, value] : custom_headers) { // Convert to lowercase for case-insensitive comparison std::string lower_key = key; std::transform(lower_key.begin(), lower_key.end(), lower_key.begin(), ::tolower); if (reserved_headers.end() == reserved_headers.find(lower_key)) { - // Filter out illegal header names and header values by regex - // Can contain alphanumeric characters (A-Z, a-z, 0-9), hyphens (`-`), and underscores - // (`_`) - std::regex header_name_pattern("^[A-Za-z0-9_-]+$"); - // Must consist of printable ASCII characters (values between 0x20 and 0x7E) - std::regex header_value_pattern("^[\\x20-\\x7E]*$"); if (std::regex_match(key, header_name_pattern) && std::regex_match(value, header_value_pattern)) { m_http_headers.append(key + ": " + value); } } }
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
- components/core/src/clp/NetworkReader.cpp (3 hunks)
- components/core/tests/test-NetworkReader.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- components/core/src/clp/NetworkReader.cpp
- components/core/tests/test-NetworkReader.cpp
🔇 Additional comments (4)
components/core/src/clp/CurlDownloadHandler.cpp (4)
5-8: Include directives updated appropriately for custom headers supportThe addition of
<map>and<regex>is necessary for handling custom headers and performing validations.
25-26: Constructor signature correctly extended withcustom_headersIncluding
std::map<std::string, std::string> const& custom_headersenhances the class's ability to accept and process custom headers.
63-67: Reserved headers defined appropriatelyDefining
kReservedHeadersas a local constant improves readability, and the reserved headers are correctly specified to prevent overriding critical HTTP headers.
71-71: Proper character conversion instd::tolowerCasting characters to
unsigned charin the lambda function ensures correct behaviour and avoids undefined behaviour with negativecharvalues.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
components/core/src/clp/CurlDownloadHandler.cpp (1)
76-81: Consider logging invalid headers for debugging.When headers are rejected due to invalid format, it would be helpful to log this information for debugging purposes.
Consider adding debug logging:
if (std::regex_match(key, header_name_pattern) && std::regex_match(value, header_value_pattern)) { m_http_headers.append(key + ": " + value); + } else { + // Log invalid header format if debug logging is enabled + CLOG(DEBUG, "network") << "Rejected invalid header: " << key; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
- components/core/src/clp/CurlDownloadHandler.hpp (3 hunks)
- components/core/src/clp/NetworkReader.cpp (3 hunks)
- components/core/src/clp/NetworkReader.hpp (5 hunks)
- components/core/tests/test-NetworkReader.cpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- components/core/src/clp/CurlDownloadHandler.hpp
- components/core/src/clp/NetworkReader.cpp
- components/core/src/clp/NetworkReader.hpp
- components/core/tests/test-NetworkReader.cpp
🔇 Additional comments (2)
components/core/src/clp/CurlDownloadHandler.cpp (2)
5-5: LGTM! New includes are properly organized.The added includes for map, regex, and set are necessary for the custom headers implementation and follow the project's include ordering convention.
Also applies to: 7-8
25-26: LGTM! Constructor parameter addition is well-designed.The custom_headers parameter is appropriately added as a const reference, which is efficient for passing map containers.
| if (std::regex_match(key, header_name_pattern) | ||
| && std::regex_match(value, header_value_pattern)) | ||
| { | ||
| m_http_headers.append(key + ": " + value); |
There was a problem hiding this comment.
Add separator when appending multiple headers.
When appending multiple headers, ensure they are properly separated. The curl_slist_append function expects each header to be terminated correctly.
Apply this diff:
- m_http_headers.append(key + ": " + value);
+ m_http_headers.append(key + ": " + value + "\r\n");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| m_http_headers.append(key + ": " + value); | |
| m_http_headers.append(key + ": " + value + "\r\n"); |
LinZhihao-723
left a comment
There was a problem hiding this comment.
Commented on things we need to fix, and also with two questions for Kirk.
Let's fix these comments first :)
Also we should try to run clang-tidy before asking for the next round of review. We can talk offline to help you set up clang-tidy
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
components/core/tests/test-NetworkReader.cpp (3)
105-120: Consider enhancing error validation in test_illegal_headerThe function effectively tests illegal headers, but could be more robust by:
- Explicitly checking for specific error conditions rather than just empty content
- Adding validation for the HTTP response code
Consider this enhancement:
auto test_illegal_header(std::string const& header_name, std::string const& header_value) -> bool { std::unordered_map<std::string, std::string> illegal_custom_headers; illegal_custom_headers.emplace(header_name, header_value); clp::NetworkReader illegal_reader{ "https://httpbin.org/headers", 0, false, clp::CurlDownloadHandler::cDefaultOverallTimeout, clp::CurlDownloadHandler::cDefaultConnectionTimeout, clp::NetworkReader::cDefaultBufferPoolSize, clp::NetworkReader::cDefaultBufferSize, illegal_custom_headers }; - auto const content = get_content(illegal_reader); - return content.empty(); + try { + auto const content = get_content(illegal_reader); + return content.empty(); + } catch (const clp::NetworkReader::OperationFailed& e) { + // Expected failure for illegal headers + return true; + } }
224-241: Fix typo and maintain type consistency
- The variable name "reulgar_reader" contains a typo
- The loop counter should use size_t for consistency with the setup loop
Apply these changes:
- clp::NetworkReader reulgar_reader{ + clp::NetworkReader regular_reader{ "https://httpbin.org/headers", 0, false, clp::CurlDownloadHandler::cDefaultOverallTimeout, clp::CurlDownloadHandler::cDefaultConnectionTimeout, clp::NetworkReader::cDefaultBufferPoolSize, clp::NetworkReader::cDefaultBufferSize, regular_custom_headers }; - auto const content{nlohmann::json::parse(get_content(reulgar_reader))}; + auto const content{nlohmann::json::parse(get_content(regular_reader))}; auto const& headers{content.at(0).at("headers")}; - REQUIRE(assert_curl_error_code(CURLE_OK, reulgar_reader)); - for (int i = 0; i < cNumRegularTestHeaders; i++) { + REQUIRE(assert_curl_error_code(CURLE_OK, regular_reader)); + for (size_t i{0}; i < cNumRegularTestHeaders; i++) {
242-253: LGTM! Comprehensive invalid header testingThe test cases effectively cover:
- Reserved headers that shouldn't be overridden
- Invalid header names with various special characters
- Invalid header values with control characters
Consider grouping similar test cases using Catch2's SECTION for better organization:
SECTION("Reserved headers cannot be overridden") { REQUIRE(test_illegal_header("Range", "bytes=100-")); REQUIRE(test_illegal_header("Cache-Control", "no-cache")); REQUIRE(test_illegal_header("Pragma", "no-cache")); } SECTION("Invalid header names are rejected") { REQUIRE(test_illegal_header("A Space", "xx")); REQUIRE(test_illegal_header("A\nNewline", "xx")); REQUIRE(test_illegal_header("An@At", "xx")); REQUIRE(test_illegal_header("-Start-with-Non-Alphanumeric", "xx")); } SECTION("Invalid header values are rejected") { REQUIRE(test_illegal_header("Legal-Name1", "newline\n")); REQUIRE(test_illegal_header("Legal-Name2", "control-char\x01")); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
- components/core/src/clp/CurlDownloadHandler.hpp (3 hunks)
- components/core/src/clp/NetworkReader.cpp (3 hunks)
- components/core/src/clp/NetworkReader.hpp (5 hunks)
- components/core/tests/test-NetworkReader.cpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- components/core/src/clp/CurlDownloadHandler.hpp
- components/core/src/clp/NetworkReader.hpp
🔇 Additional comments (6)
components/core/src/clp/CurlDownloadHandler.cpp (2)
Line range hint
3-27: LGTM: Constructor signature and includes are well-structuredThe new parameter follows the established patterns and the includes are properly organized.
99-105:⚠️ Potential issueImprove error message for reserved header usage
The error message should indicate which reserved header was attempted to be used.
Apply this diff:
- "curl_download_handler_init failed." + fmt::format("Cannot override reserved header: '{}'", key)Likely invalid or redundant comment.
components/core/tests/test-NetworkReader.cpp (1)
213-223: LGTM! Well-structured test setupThe test case effectively sets up custom headers using proper formatting and type-safe iterations. The use of httpbin.org as a test endpoint is a good choice for header validation.
components/core/src/clp/NetworkReader.cpp (3)
121-122: LGTM: Constructor parameter addition is well-designedThe new
custom_headersparameter is appropriately implemented as an optional const reference to an unordered map, which is memory efficient and maintains ABI compatibility.
134-135: LGTM: Thread initialization properly handles custom headersThe initialization correctly forwards the custom headers to the DownloaderThread while maintaining exception safety through std::make_unique.
220-221: Verify thread safety and lifetime of custom headersWhile the implementation looks correct, please ensure:
- The custom headers map's lifetime extends beyond the download operation
- Thread safety is maintained when accessing the headers
Consider storing a copy of the headers in the DownloaderThread to prevent potential lifetime issues. Example implementation:
- m_custom_headers + m_custom_headers_copy // Store a copy in DownloaderThreadRun this verification script:
✅ Verification successful
Based on the search results, I can now provide a final response:
Thread safety and lifetime concerns are properly addressed
The implementation is correct as is, because:
m_custom_headersis stored as a member variable inDownloaderThreadand initialized through the constructor, making it a thread-local copy- The headers are passed by const reference to
CurlDownloadHandler, ensuring thread safety- The lifetime is managed appropriately since each thread maintains its own copy of the headers
The code follows a proper thread-safe design pattern where:
- Headers are passed as
std::optional<std::unordered_map>through the constructor- Each thread instance maintains its own copy via
m_custom_headersmember- Access is read-only during download operations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for potential thread safety issues with custom headers access rg -A 3 "custom_headers" --type cpp # Verify if there are any other places accessing the headers that might need synchronization ast-grep --pattern 'm_custom_headers'Length of output: 6862
| throw CurlOperationFailed( | ||
| ErrorCode_Failure, | ||
| __FILE__, | ||
| __LINE__, | ||
| CURLE_ABORTED_BY_CALLBACK, | ||
| "curl_download_handler_init failed." | ||
| ); | ||
| } | ||
| } else { | ||
| throw CurlOperationFailed( | ||
| ErrorCode_Failure, | ||
| __FILE__, | ||
| __LINE__, | ||
| CURLE_ABORTED_BY_CALLBACK, | ||
| "curl_download_handler_init failed." | ||
| ); |
There was a problem hiding this comment.
Review CURLE_ABORTED_BY_CALLBACK usage
CURLE_ABORTED_BY_CALLBACK might not be the most appropriate error code for header validation failures. Consider using CURLE_BAD_FUNCTION_ARGUMENT instead.
Apply this diff to both error throws:
- CURLE_ABORTED_BY_CALLBACK,
+ CURLE_BAD_FUNCTION_ARGUMENT,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_ABORTED_BY_CALLBACK, | |
| "curl_download_handler_init failed." | |
| ); | |
| } | |
| } else { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_ABORTED_BY_CALLBACK, | |
| "curl_download_handler_init failed." | |
| ); | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| "curl_download_handler_init failed." | |
| ); | |
| } | |
| } else { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| "curl_download_handler_init failed." | |
| ); |
| throw CurlOperationFailed( | ||
| ErrorCode_Failure, | ||
| __FILE__, | ||
| __LINE__, | ||
| CURLE_ABORTED_BY_CALLBACK, | ||
| "curl_download_handler_init failed." | ||
| ); |
There was a problem hiding this comment.
Enhance error message for invalid header format
The current error message "curl_download_handler_init failed" doesn't provide enough context about what specifically failed. Consider including details about the invalid header.
Apply this diff:
- "curl_download_handler_init failed."
+ fmt::format("Invalid header format - name: '{}', value: '{}'", key, value)Committable suggestion was skipped due to low confidence.
| std::regex const header_name_pattern("^(?![!#$%&'*+.^_`|~-])[!#$%&'*+.^_`|~0-9a-zA-Z-]+$"); | ||
| // Must consist of printable ASCII characters (values between 0x20 and 0x7E) | ||
| std::regex const header_value_pattern("^[\\x20-\\x7E]*$"); |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Simplify header name validation pattern
The current pattern ^(?![!#$%&'*+.^_|~-])[!#$%&'*+.^_|~0-9a-zA-Z-]+$ is complex and might be hard to maintain. Consider simplifying it while maintaining RFC 7230 compliance.
Apply this diff:
- std::regex const header_name_pattern("^(?![!#$%&'*+.^_`|~-])[!#$%&'*+.^_`|~0-9a-zA-Z-]+$");
+ // RFC 7230: token = 1*tchar
+ // tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
+ // "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
+ std::regex const header_name_pattern("^[a-zA-Z][!#$%&'*+.^_`|~0-9a-zA-Z-]*$");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| std::regex const header_name_pattern("^(?![!#$%&'*+.^_`|~-])[!#$%&'*+.^_`|~0-9a-zA-Z-]+$"); | |
| // Must consist of printable ASCII characters (values between 0x20 and 0x7E) | |
| std::regex const header_value_pattern("^[\\x20-\\x7E]*$"); | |
| // RFC 7230: token = 1*tchar | |
| // tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / | |
| // "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA | |
| std::regex const header_name_pattern("^[a-zA-Z][!#$%&'*+.^_`|~0-9a-zA-Z-]*$"); | |
| // Must consist of printable ASCII characters (values between 0x20 and 0x7E) | |
| std::regex const header_value_pattern("^[\\x20-\\x7E]*$"); |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (1)
components/core/tests/test-NetworkReader.cpp (1)
104-119: Add function documentationPlease add a documentation block explaining:
- Purpose of the function
- Parameters and their constraints
- Return value meaning
- Example usage
Add this documentation block:
+/** + * Tests if a header with potentially illegal name or value is properly rejected. + * @param header_name The header name to test + * @param header_value The header value to test + * @return true if the header was properly rejected (CURLE_BAD_FUNCTION_ARGUMENT and empty content) + */ auto test_illegal_header(std::string const& header_name, std::string const& header_value) -> bool {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
- components/core/tests/test-NetworkReader.cpp (3 hunks)
🔇 Additional comments (5)
components/core/src/clp/CurlDownloadHandler.cpp (3)
3-3: LGTM: Required includes added correctlyThe new includes are properly ordered and necessary for the implementation.
Also applies to: 7-7, 13-13
25-26: LGTM: Parameter signature is well-designedThe parameter type choice (const reference to optional unordered_map) is appropriate for passing custom headers efficiently.
67-69: 🛠️ Refactor suggestionSimplify header name pattern and add pattern documentation
The current regex pattern is complex and could be simplified while maintaining RFC 7230 compliance.
Apply this diff:
- std::regex const header_name_pattern("^(?![!#$%&'*+.^_`|~-])[!#$%&'*+.^_`|~0-9a-zA-Z-]+$"); + // RFC 7230: token must start with a letter followed by allowed characters + std::regex const header_name_pattern("^[a-zA-Z][!#$%&'*+.^_`|~0-9a-zA-Z-]*$");Likely invalid or redundant comment.
components/core/tests/test-NetworkReader.cpp (2)
9-9: LGTM: Required includes added appropriatelyThe new includes support custom headers (unordered_map), string formatting (fmt), and JSON response parsing (nlohmann/json).
Also applies to: 14-15
212-254: LGTM: Comprehensive test coverageThe test case effectively covers:
- Regular custom headers functionality
- Protection against overriding reserved headers
- Validation of header names and values
- Error handling for illegal headers
Let's verify we're testing all reserved headers:
✅ Verification successful
Let me analyze the search results to verify the reserved headers testing coverage.
From the codebase search, I found that the test is already comprehensively covering all the reserved headers that are used in the NetworkReader implementation. Specifically:
The test checks the three reserved headers that are internally set by NetworkReader/CurlDownloadHandler:
- "Range" (for offset)
- "Cache-Control" (for disable_cache)
- "Pragma" (for disable_cache)
Looking at CurlDownloadHandler.cpp implementation, these are indeed the only headers that are reserved and automatically set:
if (0 != offset) { std::string const range{"Range: bytes=" + std::to_string(offset) + "-"}; m_http_headers.append(range); } if (disable_caching) { m_http_headers.append("Cache-Control: no-cache"); m_http_headers.append("Pragma: no-cache"); }The test coverage for reserved headers is complete and accurate
The test correctly verifies that these headers cannot be overridden through custom headers, which is the key security concern being tested here.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for other reserved headers in the codebase that we might need to test rg -g '!*.{md,txt}' -i 'header|curl.*set.*opt' -A 3 -B 3Length of output: 91532
| throw CurlOperationFailed( | ||
| ErrorCode_Failure, | ||
| __FILE__, | ||
| __LINE__, | ||
| CURLE_BAD_FUNCTION_ARGUMENT, | ||
| "curl_download_handler_init failed." | ||
| ); |
There was a problem hiding this comment.
Enhance error message for invalid header format
The current error message doesn't provide enough context about what specifically failed.
Apply this diff:
- "curl_download_handler_init failed."
+ fmt::format(
+ "Invalid header format - name: '{}', value: '{}'",
+ key,
+ value)Committable suggestion was skipped due to low confidence.
| if (std::regex_match(key, header_name_pattern) | ||
| && std::regex_match(value, header_value_pattern)) | ||
| { | ||
| m_http_headers.append(fmt::format("{}: {}", key, value)); |
There was a problem hiding this comment.
Add CRLF validation for header values
As per libcurl documentation, headers must not be CRLF-terminated as libcurl adds CRLF itself.
Apply this diff:
- m_http_headers.append(fmt::format("{}: {}", key, value));
+ // Ensure header value doesn't end with CRLF
+ if (value.ends_with("\r\n")) {
+ throw CurlOperationFailed(
+ ErrorCode_Failure,
+ __FILE__,
+ __LINE__,
+ CURLE_BAD_FUNCTION_ARGUMENT,
+ fmt::format(
+ "Header value must not end with CRLF - key: '{}'",
+ key));
+ }
+ m_http_headers.append(fmt::format("{}: {}", key, value));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| m_http_headers.append(fmt::format("{}: {}", key, value)); | |
| // Ensure header value doesn't end with CRLF | |
| if (value.ends_with("\r\n")) { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| fmt::format( | |
| "Header value must not end with CRLF - key: '{}'", | |
| key)); | |
| } | |
| m_http_headers.append(fmt::format("{}: {}", key, value)); |
| throw CurlOperationFailed( | ||
| ErrorCode_Failure, | ||
| __FILE__, | ||
| __LINE__, | ||
| CURLE_BAD_FUNCTION_ARGUMENT, | ||
| "curl_download_handler_init failed." | ||
| ); |
There was a problem hiding this comment.
Improve error message for reserved header violation
The error message should indicate which reserved header was attempted to be set.
Apply this diff:
- "curl_download_handler_init failed."
+ fmt::format(
+ "Cannot override reserved header: '{}'",
+ key)Committable suggestion was skipped due to low confidence.
| for (int i = 0; i < cNumRegularTestHeaders; i++) { | ||
| REQUIRE(( | ||
| fmt::format("Unit-Test-Value{}", i) == headers.at(fmt::format("Unit-Test-Key{}", i)) | ||
| )); | ||
| } |
There was a problem hiding this comment.
Use consistent integer types
The loop counter uses int while the earlier loop uses size_t. For consistency with the previous loop and to follow best practices for container indices, use size_t.
- for (int i = 0; i < cNumRegularTestHeaders; i++) {
+ for (size_t i{0}; i < cNumRegularTestHeaders; i++) {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for (int i = 0; i < cNumRegularTestHeaders; i++) { | |
| REQUIRE(( | |
| fmt::format("Unit-Test-Value{}", i) == headers.at(fmt::format("Unit-Test-Key{}", i)) | |
| )); | |
| } | |
| for (size_t i{0}; i < cNumRegularTestHeaders; i++) { | |
| REQUIRE(( | |
| fmt::format("Unit-Test-Value{}", i) == headers.at(fmt::format("Unit-Test-Key{}", i)) | |
| )); | |
| } |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
components/core/src/clp/CurlDownloadHandler.cpp (1)
94-99: Enhance error messages for better debuggingThe current error messages could be more specific about the validation failure reason.
Consider these improvements:
- "curl_download_handler_init failed due to illegal header: {}: " - "{}.", + "Invalid header format - name: '{}', value: '{}' must match " + "RFC 7230 requirements",- "curl_download_handler_init failed due to reserved header is being " - "modified: {}: {}.", + "Cannot override reserved header '{}' with value '{}'",Also applies to: 108-113
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- components/core/src/clp/CurlDownloadHandler.cpp (3 hunks)
🔇 Additional comments (5)
components/core/src/clp/CurlDownloadHandler.cpp (5)
3-14: LGTM: Include statements and constructor signature changes are well-implementedThe new includes and constructor signature changes are appropriate for the custom headers feature. The use of std::optional for the custom_headers parameter allows for backward compatibility, and passing by const reference is efficient for the map type.
Also applies to: 24-27
64-65: LGTM: Reserved headers declaration is well-implementedThe use of constexpr array with string_view for reserved headers is efficient and compile-time evaluated.
87-87:⚠️ Potential issueAdd CRLF validation for header values
As per libcurl documentation, headers must not be CRLF-terminated since libcurl adds CRLF itself.
Add validation before appending the header:
- m_http_headers.append(fmt::format("{}: {}", key, value)); + if (value.ends_with("\r\n")) { + throw CurlOperationFailed( + ErrorCode_Failure, + __FILE__, + __LINE__, + CURLE_BAD_FUNCTION_ARGUMENT, + fmt::format("Header value must not be CRLF-terminated - key: '{}'", key)); + } + m_http_headers.append(fmt::format("{}: {}", key, value));Likely invalid or redundant comment.
67-69: 🛠️ Refactor suggestionSimplify the header name validation pattern
The current pattern is overly complex and could be simplified while maintaining RFC 7230 compliance.
Consider this simpler pattern:
- std::regex const header_name_pattern("^(?![!#$%&'*+.^_`|~-])[!#$%&'*+.^_`|~0-9a-zA-Z-]+$"); + // RFC 7230: token must start with a letter followed by allowed characters + std::regex const header_name_pattern("^[a-zA-Z][!#$%&'*+.^_`|~0-9a-zA-Z-]*$");Likely invalid or redundant comment.
80-82: 🛠️ Refactor suggestionSimplify the reserved header check
The current implementation using std::find could be simplified using a range-based for loop.
Consider this more readable approach:
- if (cReservedHeaders.end() - == std::find(cReservedHeaders.begin(), cReservedHeaders.end(), lower_key)) + bool is_reserved = false; + for (auto const reserved : cReservedHeaders) { + if (lower_key == reserved) { + is_reserved = true; + break; + } + } + if (!is_reserved)Likely invalid or redundant comment.
…s-14 (ARM) runners. (#566)
…owee/clp into xiaochong-add-user-customized-header
| size_t buffer_pool_size, | ||
| size_t buffer_size | ||
| size_t buffer_size, | ||
| std::optional<std::unordered_map<std::string, std::string>> http_header_kv_pairs |
There was a problem hiding this comment.
Missing includes for std::string and std::unordered_map. Can u check if clang-tidy alerts on these warnings?
| constexpr std::array<std::string_view, 3> cReservedHeaders | ||
| = {"range", "cache-control", "pragma"}; | ||
| // RFC 7230 token pattern: one or more tchars | ||
| std::regex const header_name_pattern("^(?![!#$%&'*+.^_`|~-])[!#$%&'*+.^_`|~0-9a-zA-Z-]+$"); |
| // RFC 7230 token pattern: one or more tchars | ||
| std::regex const header_name_pattern("^(?![!#$%&'*+.^_`|~-])[!#$%&'*+.^_`|~0-9a-zA-Z-]+$"); | ||
| // Must consist of printable ASCII characters (values between 0x20 and 0x7E) | ||
| std::regex const header_value_pattern("^[\\x20-\\x7E]*$"); |
| } | ||
| if (http_header_kv_pairs.has_value()) { | ||
| constexpr std::array<std::string_view, 3> cReservedHeaders | ||
| = {"range", "cache-control", "pragma"}; |
There was a problem hiding this comment.
Do we need to store and match the keys in their lower case?
There was a problem hiding this comment.
I think so? Because header names are case-insensitive
There was a problem hiding this comment.
Sorry my bad. You're right, I didn't notice that the header field (key) is case-insensitive defined by HTTP protocol.
Then how about:
- Let's keep the current way of detecting reserved keys, transforming the key into lower-case
- Let's define three static constexpr string_view variables for reserved keys in lower-case form. Use these variables to replace these hardcoded lines: , ,
- Let's switch back to a
unordered_set<std::string_view>for lower-case reserved keys, and construct this set in every constructor call. For lower-cased keys transformed in step 1, let's callcontainsto ensure they don't appear in the unordered set.
Benefits:
- Let's be consistent of using lower-cases keys
- The unordered_map search will be faster then array search, especially when user don't given any reserved keys
- The overhead of constructing
unordered_setwithstring_viewin runtime should be acceptable
| ) | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Instead of if (valid) {do_sth;} else {error_handling;}, we prefer to use if (not valid) {error_handling;} do_sth;.
This reduce the level of indentations and thus make the code much cleaner.
I think this function can be rewritten in this suggested way
| // The following headers are determined by offset and disable_cache, which should not be | ||
| // overrided by custom headers. | ||
| REQUIRE(test_illegal_header("Range", "bytes=100-")); | ||
| REQUIRE(test_illegal_header("Cache-Control", "no-cache")); | ||
| REQUIRE(test_illegal_header("Pragma", "no-cache")); | ||
| // The following headers contain illegal header values, the requests should be rejected. | ||
| REQUIRE(test_illegal_header("Legal-Name1", "CRLF\n")); | ||
| REQUIRE(test_illegal_header("Legal-Name2", "CRLF\r")); | ||
| REQUIRE(test_illegal_header("Legal-Name3", "CRLF\n\r")); | ||
| REQUIRE(test_illegal_header("Legal-Name4", "CRLF\r\n")); |
There was a problem hiding this comment.
as said above, let's move this into a separate test-case with GENERATE to replace test_illegal_header
| auto content{nlohmann::json::parse(get_content(regular_reader))}; | ||
| if (content.is_array() && 1 == content.size()) { | ||
| content = content.at(0); | ||
| } |
There was a problem hiding this comment.
| auto content{nlohmann::json::parse(get_content(regular_reader))}; | |
| if (content.is_array() && 1 == content.size()) { | |
| content = content.at(0); | |
| } | |
| auto const content =nlohmann::json::parse(get_content(regular_reader)); |
Have you tried this?
| } | ||
| auto const& headers{content.at("headers")}; | ||
| REQUIRE(assert_curl_error_code(CURLE_OK, regular_reader)); | ||
| for (int i = 0; i < cNumRegularTestHeaders; i++) { |
There was a problem hiding this comment.
If applying the comment above, this should be replaced.
But as I commented in the last round of review, we shouldn't use int unless the number can be negative; we should use {} for typed variable initialization; and we should use ++i in general loop variable increment
| // We use httpbin (https://httpbin.org/) to test the custom headers. This request will return a | ||
| // JSON object that contains the custom headers. We check if the headers are in the response. |
There was a problem hiding this comment.
| // We use httpbin (https://httpbin.org/) to test the custom headers. This request will return a | |
| // JSON object that contains the custom headers. We check if the headers are in the response. | |
| // We use httpbin (https://httpbin.org/) to test user-specified headers. | |
| // On success, it is supposed to respond all the user-specified headers as key-value pairs in | |
| // JSON form. |
| constexpr int cNumRegularTestHeaders{10}; | ||
| for (size_t i{0}; i < cNumRegularTestHeaders; i++) { |
There was a problem hiding this comment.
| constexpr int cNumRegularTestHeaders{10}; | |
| for (size_t i{0}; i < cNumRegularTestHeaders; i++) { | |
| constexpr size_t cNumRegularTestHeaders{10}; | |
| for (size_t i{0}; i < cNumRegularTestHeaders; ++i) { |
Please ensure all the comments are applied from previous reviews
#564 (comment)
| REQUIRE((clp::ErrorCode_Failure == reader.try_get_pos(pos))); | ||
| } | ||
|
|
||
| TEST_CASE("network_reader_with_http_header_kv_pairs", "[NetworkReader]") { |
There was a problem hiding this comment.
| TEST_CASE("network_reader_with_http_header_kv_pairs", "[NetworkReader]") { | |
| // NOLINTNEXTLINE(readability-function-cognitive-complexity) | |
| TEST_CASE("network_reader_with_http_header_kv_pairs", "[NetworkReader]") { |
We can safely silent this warning since it's mostly caused by catch2 macros
| std::unordered_map<std::string, std::string> regular_http_header_kv_pairs; | ||
| // We use httpbin (https://httpbin.org/) to test the custom headers. This request will return a | ||
| // JSON object that contains the custom headers. We check if the headers are in the response. | ||
| constexpr int cNumRegularTestHeaders{10}; |
There was a problem hiding this comment.
How about: cNumHttpHeaderKeyValuePairs
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Outside diff range and nitpick comments (22)
.github/workflows/clp-core-build-macos.yaml (1)
41-43: Excellent implementation of the matrix strategy, eh!The matrix configuration properly supports both macOS 13 and 14, which improves the testing coverage. Consider adding a comment explaining the rationale for these specific versions for future maintenance.
Add a comment above the matrix:
strategy: matrix: + # Testing on the two latest macOS versions to ensure compatibility runner: ["macos-13", "macos-14"]components/core/src/clp_s/ColumnWriter.cpp (1)
77-77: Remove unnecessary semicolon, ehThere's a stray semicolon that should be removed.
- ;components/core/src/clp/CurlDownloadHandler.cpp (1)
3-3: Remove unused includes.The includes for
<array>and<regex>appear to be unused in the current implementation.-#include <array> #include <chrono> #include <cstddef> #include <memory> -#include <regex> #include <string>Also applies to: 7-7
components/core/src/clp_s/ArchiveWriter.hpp (3)
24-24: Add documentation for the new option parameter.Please add documentation to clarify the purpose, units, and any constraints for
min_table_size. This will help users understand how to properly configure this option.struct ArchiveWriterOption { boost::uuids::uuid id; std::string archives_dir; int compression_level; bool print_archive_stats; + /// Minimum size (in bytes) for packed tables before compression + /// @note Must be greater than 0 size_t min_table_size; };
36-43: Add documentation for the StreamMetadata struct.The struct would benefit from documentation explaining its purpose and the significance of each member.
+ /** + * Metadata for tracking stream location and size within the archive + */ struct StreamMetadata { StreamMetadata(uint64_t file_offset, uint64_t uncompressed_size) : file_offset(file_offset), uncompressed_size(uncompressed_size) {} + /// Offset of the stream within the archive file uint64_t file_offset{}; + /// Size of the stream before compression uint64_t uncompressed_size{}; };
190-190: Add documentation and consider adding a getter method.The member variable would benefit from documentation. Also, consider adding a getter method for consistency with other size-related members like
get_data_size().+ /// Minimum size threshold for packed tables before compression size_t m_min_table_size{}; + /** + * @return Minimum size threshold for packed tables + */ + [[nodiscard]] size_t get_min_table_size() const { return m_min_table_size; }components/core/src/clp_s/ArchiveReader.hpp (2)
183-193: Consider enhancing the documentation with thread safety notesThe method documentation is comprehensive regarding functionality and buffer reuse. However, since buffer reuse is involved, it would be helpful to document any thread safety considerations.
Consider adding a note about thread safety, for example:
* save memory allocations, but can only be used when tables are read one at a time. + * Note: Buffer reuse is not thread-safe. Ensure proper synchronization when using this feature + * in a multi-threaded context. * @param stream_id
214-216: Remove redundant initializationThe explicit initialization of
m_stream_buffer_sizeto0ULLis redundant as this is the default value for size_t class members.std::shared_ptr<char[]> m_stream_buffer{}; - size_t m_stream_buffer_size{0ULL}; + size_t m_stream_buffer_size{}; size_t m_cur_stream_id{0ULL};components/core/src/clp_s/CommandLineArguments.hpp (1)
180-180: LGTM! Consider adding more detailed documentation.The member variable is well-placed and properly initialized. The comment indicating "1 MB" is helpful.
Consider adding a brief comment explaining why 1 MB was chosen as the default minimum table size, perhaps in a doxygen-style format like other members in this class.
components/core/tests/test-NetworkReader.cpp (2)
196-223: LGTM: Well-structured test for regular headersThe test case effectively validates the custom headers functionality using httpbin.org's echo service. The implementation includes proper error checking and validation of the response.
Consider adding a brief comment explaining the expected format of httpbin.org's response for better maintainability:
// We use httpbin (https://httpbin.org/) to test the user-specified headers. On success, it is - // supposed to respond all the user-specified headers as key-value pairs in JSON form. + // supposed to respond with a JSON object containing a "headers" field that echoes back all + // request headers as key-value pairs. Example response: + // { + // "headers": { + // "Unit-Test-Key0": "Unit-Test-Value0", + // "Unit-Test-Key1": "Unit-Test-Value1" + // } + // }
225-254: Clean up commented code and improve test readabilityThe test case effectively validates illegal headers using parameterized testing. However, there are a few improvements that could be made:
- Remove the commented out code on line 236
- Consider extracting the lambda to improve readability
Here's a suggested refactor:
TEST_CASE("network_reader_with_illegal_http_header_kv_pairs", "[NetworkReader]") { auto [header_name, header_value] = GENERATE( // The following headers are determined by offset and disable_cache, which should not be // overriden by custom headers. std::make_pair("Range", "bytes=100-"), std::make_pair("RAnGe", "bytes=100-"), std::make_pair("Cache-Control", "no-cache"), std::make_pair("Pragma", "no-cache"), // The CRLF-terminated headers should be rejected. std::make_pair("Legal-Name", "CRLF\r\n") ); - // REQUIRE(test_illegal_header(header_name, header_value)); + auto test_illegal_header = [](std::string const& name, std::string const& value) { + std::unordered_map<std::string, std::string> illegal_http_header_kv_pairs; + illegal_http_header_kv_pairs.emplace(name, value); + clp::NetworkReader illegal_reader{ + "https://httpbin.org/headers", + 0, + false, + clp::CurlDownloadHandler::cDefaultOverallTimeout, + clp::CurlDownloadHandler::cDefaultConnectionTimeout, + clp::NetworkReader::cDefaultBufferPoolSize, + clp::NetworkReader::cDefaultBufferSize, + illegal_http_header_kv_pairs + }; + auto const content = get_content(illegal_reader); + return assert_curl_error_code(CURLE_BAD_FUNCTION_ARGUMENT, illegal_reader) + && content.empty(); + }; - REQUIRE([header_name, header_value]() { - std::unordered_map<std::string, std::string> illegal_http_header_kv_pairs; - illegal_http_header_kv_pairs.emplace(header_name, header_value); - clp::NetworkReader illegal_reader{ - "https://httpbin.org/headers", - 0, - false, - clp::CurlDownloadHandler::cDefaultOverallTimeout, - clp::CurlDownloadHandler::cDefaultConnectionTimeout, - clp::NetworkReader::cDefaultBufferPoolSize, - clp::NetworkReader::cDefaultBufferSize, - illegal_http_header_kv_pairs - }; - auto const content = get_content(illegal_reader); - return assert_curl_error_code(CURLE_BAD_FUNCTION_ARGUMENT, illegal_reader) - && content.empty(); - }()); + REQUIRE(test_illegal_header(header_name, header_value)); }components/core/src/clp_s/CommandLineArguments.cpp (1)
163-167: Consider enhancing the help text and adding value validation.The implementation looks good, but could benefit from these improvements:
- The help text could be more specific about what constitutes a "packed table" and how this option relates to other size parameters.
- Consider adding validation to ensure the value is within reasonable bounds.
Here's a suggested improvement:
"min-table-size", po::value<size_t>(&m_minimum_table_size)->value_name("MIN_TABLE_SIZE")-> default_value(m_minimum_table_size), - "Minimum size (B) for a packed table before it gets compressed." + "Minimum size (B) for a packed table before it gets compressed. This affects the compression efficiency by controlling the minimum amount of data that must be accumulated before applying compression."Also, consider adding validation after the options are parsed:
if (m_minimum_table_size < 1024) { // Example threshold throw std::invalid_argument("min-table-size must be at least 1KB"); }components/core/src/clp_s/PackedStreamReader.hpp (2)
75-77: Ensurestream_idis within bounds inget_uncompressed_stream_sizeThe method
get_uncompressed_stream_sizeusesm_stream_metadata.at(stream_id), which will throw an exception ifstream_idis out of bounds. Consider adding a clearer error message or explicit bounds checking to improve usability and error handling.
13-18: Enhance class documentation for clarityThe class description can be rephrased for better readability and to clearly convey the usage constraints.
Suggested rephrased documentation:
- * PackedStreamReader ensures that the tables section of an archive is read safely. Any attempt to - * read the tables section without loading the tables metadata, and any attempt to read tables - * section out of order will throw. As well, any incorrect usage of this class (e.g. closing without - * opening) will throw. + * PackedStreamReader ensures the safe reading of the tables section in an archive. Any attempt to + * read the tables section without first loading the tables metadata, or attempts to read the tables + * section out of order, will result in an exception. Additionally, incorrect usage of this class + * (e.g., closing without opening) will throw an exception.components/core/src/clp_s/PackedStreamReader.cpp (5)
87-89: Question the necessity of enforcing strictstream_idordering.The current implementation throws an error if
stream_idis not strictly greater thanm_prev_stream_id, which enforces sequential reading of streams in increasing order. Unless there is a specific requirement for this constraint, consider allowing random access to streams by removing this check. This would make theread_streamfunction more flexible for users who may need to read streams in any order.
5-41: Add unit tests forPackedStreamReader::read_metadata.To ensure the correct functionality and robust error handling of the
read_metadatamethod, please consider adding unit tests. These tests should cover various scenarios, including successful metadata reading, handling of corrupted data, and state transitions.
43-55: Add unit tests forPackedStreamReader::open_packed_streams.Adding unit tests for the
open_packed_streamsmethod will help verify that the class correctly handles file opening, state transitions, and error conditions such as attempting to open non-existent files.
57-70: Add unit tests forPackedStreamReader::close.Unit tests for the
closemethod will ensure that resources are properly released, the internal state is reset correctly, and that the method behaves as expected when called from different states.
72-116: Add unit tests forPackedStreamReader::read_streamto ensure correctness.To validate the functionality and error handling of the
read_streammethod, consider adding unit tests that cover:
- Reading streams in order and out of order (if allowed after addressing the sequential reading constraint).
- Handling invalid
stream_idvalues (e.g., out of bounds).- Scenarios where the buffer size is smaller than
uncompressed_size, testing buffer reallocation.- Error handling when seeking fails or when reading decompressed data fails.
components/core/src/clp_s/ArchiveWriter.cpp (1)
145-179: Standardize terminology and correct typographical errors in comments.To enhance clarity and maintain consistency in the documentation, consider the following improvements:
- Use "64-bit stream ID" instead of "64 bit stream id" for proper hyphenation and capitalization.
- Replace "id" with "ID" consistently throughout the comments.
- In line 150~, rephrase for clarity: "In the first half of the metadata, we identify the number of streams and the offset into the file where each compression stream can be found."
components/core/src/clp_s/ArchiveReader.cpp (2)
51-53: Provide a more descriptive error message for unsupported schemas.Throwing
ErrorCode::ErrorCodeUnsupportedwhennum_separate_column_schemasis non-zero might not give enough context to the user. Consider including additional information in the error message to clarify why this operation is unsupported.
Line range hint
132-151: Handle potential null pointers fromread_stream.The
read_schema_tablemethod retrievesstream_bufferfromread_streambut does not check if it'snullptrbefore use. This could lead to dereferencing a null pointer ifread_streamfails. Include a check to ensurestream_bufferis valid.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (28)
.github/workflows/clp-core-build-macos.yaml(4 hunks).github/workflows/clp-core-build.yaml(2 hunks)components/core/src/clp/CurlDownloadHandler.cpp(3 hunks)components/core/src/clp/NetworkReader.cpp(3 hunks)components/core/src/clp/NetworkReader.hpp(5 hunks)components/core/src/clp_s/ArchiveReader.cpp(7 hunks)components/core/src/clp_s/ArchiveReader.hpp(4 hunks)components/core/src/clp_s/ArchiveWriter.cpp(3 hunks)components/core/src/clp_s/ArchiveWriter.hpp(3 hunks)components/core/src/clp_s/CMakeLists.txt(1 hunks)components/core/src/clp_s/ColumnWriter.cpp(2 hunks)components/core/src/clp_s/ColumnWriter.hpp(7 hunks)components/core/src/clp_s/CommandLineArguments.cpp(1 hunks)components/core/src/clp_s/CommandLineArguments.hpp(2 hunks)components/core/src/clp_s/JsonParser.cpp(1 hunks)components/core/src/clp_s/JsonParser.hpp(1 hunks)components/core/src/clp_s/PackedStreamReader.cpp(1 hunks)components/core/src/clp_s/PackedStreamReader.hpp(1 hunks)components/core/src/clp_s/SchemaReader.cpp(1 hunks)components/core/src/clp_s/SchemaReader.hpp(4 hunks)components/core/src/clp_s/SchemaWriter.cpp(1 hunks)components/core/src/clp_s/SchemaWriter.hpp(1 hunks)components/core/src/clp_s/clp-s.cpp(1 hunks)components/core/src/clp_s/search/Output.cpp(1 hunks)components/core/tests/test-NetworkReader.cpp(2 hunks)deps-tasks.yml(1 hunks)docs/src/dev-guide/components-core/index.md(2 hunks)docs/src/dev-guide/components-core/macos-deps-install.md(2 hunks)
✅ Files skipped from review due to trivial changes (3)
- .github/workflows/clp-core-build.yaml
- docs/src/dev-guide/components-core/index.md
- docs/src/dev-guide/components-core/macos-deps-install.md
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/NetworkReader.hpp
🧰 Additional context used
🪛 cppcheck
components/core/src/clp_s/ArchiveWriter.cpp
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
🔇 Additional comments (37)
components/core/src/clp_s/SchemaWriter.cpp (4)
7-7: Size tracking addition looks good, eh!
The addition of header size to total uncompressed size is correctly placed before storing the column writer.
12-13: Good use of uniform initialization, eh!
The switch to uniform initialization with curly braces is more consistent with modern C++ practices and prevents narrowing conversions.
15-16: Size tracking logic is now more straightforward, eh!
The simplified approach of directly adding return values from add_value() improves code readability. The total size is properly accumulated for both content types and correctly updates m_total_uncompressed_size.
Also applies to: 20-21, 25-25
29-32: Method signature change aligns with refactoring, eh!
The change from size_t to void return type is consistent with the broader refactoring effort seen in related classes.
Let's verify the consistency of this change across related classes:
✅ Verification successful
The previous ast-grep pattern might have failed due to syntax. Let's try a broader search to find store method implementations and their return types across the codebase.
Return type change is consistent across the codebase, eh!
The search results confirm that all store methods using ZstdCompressor are consistently declared with void return type across the codebase:
- Base class
ColumnWriterdeclares virtualvoid store(ZstdCompressor&) - All derived column writers (
Int64ColumnWriter,FloatColumnWriter, etc.) implementvoid store(ZstdCompressor&) SchemaWriteralso usesvoid store(ZstdCompressor&)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for store method signatures in related classes
ast-grep --pattern 'void store(ZstdCompressor& $_) {
$$$
}'
Length of output: 63
Script:
#!/bin/bash
# Search for store method implementations with ZstdCompressor parameter
rg -U "store.*ZstdCompressor" -A 3
# Also search for any return type declarations with store and ZstdCompressor
rg -U "(\w+)\s+store.*ZstdCompressor.*\{" --multiline
Length of output: 5139
components/core/src/clp_s/SchemaWriter.hpp (2)
49-51: LGTM: Well-structured size tracking implementation
The addition of get_total_uncompressed_size() method and its corresponding member variable is well-implemented:
- Clear documentation explaining the purpose
- Proper initialization of the member variable
- Consistent type usage between getter and member
Also applies to: 55-55
44-44: Verify impact of removing size_t return value from store()
The change from size_t to void return type is a breaking change. Please ensure all callers of this method have been updated accordingly.
Let's verify the usage:
✅ Verification successful
Let me search for more context about the usage of this method.
Let me check if there are any direct calls to this method from other files.
Let me fix the search pattern and try again.
The store() method's return type change is safe to proceed
Based on the codebase analysis, the change from size_t to void return type is safe because:
- The only direct calls to
SchemaWriter::store()are inSchemaWriter.cppitself, where the return value isn't used - The method is called through
ArchiveWriter::store_tables(), but the actual size tracking is now handled by the newm_total_uncompressed_sizemember variable and its getter - All other
store()calls in the codebase are unrelated to this class (they belong to different components like NetworkReader)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for calls to SchemaWriter::store to verify they don't depend on the return value
rg -A 2 "SchemaWriter.*store\(" --type cpp
Length of output: 300
Script:
#!/bin/bash
# Search for any references to SchemaWriter and store
rg -B 2 -A 2 "SchemaWriter.*store" --type cpp
# Look for the implementation of store method
ast-grep --pattern 'void SchemaWriter::store($$$) { $$$ }'
Length of output: 453
Script:
#!/bin/bash
# Search for any direct usage of SchemaWriter's store method
rg -B 3 -A 3 "->store\(|\.store\(" --type cpp
# Also check if SchemaWriter is used in any test files
fd -e cpp -e hpp test | xargs rg "SchemaWriter"
Length of output: 500
Script:
#!/bin/bash
# Search for store method calls with proper escaping
rg -B 3 -A 3 "\->store\(|\\.store\(" --type cpp
# Look for any references to SchemaWriter in the entire codebase
rg -B 2 -A 2 "SchemaWriter" --type cpp
Length of output: 9849
.github/workflows/clp-core-build-macos.yaml (2)
11-11: Verify the directory structure changes for macOS scripts
The path changes from macos-12 to macos suggest a restructuring of the installation scripts directory. Let's verify that the corresponding directory changes have been made.
Also applies to: 24-24
59-59: Verify the installation script location and contents
The installation script path has been updated to use the new directory structure. Let's verify the script's existence and contents.
✅ Verification successful
Let me gather more information about the script's contents to ensure it has the expected functionality.
Installation script verified and properly located
The macOS installation script exists at the specified path and contains the expected package installation commands using Homebrew. The script:
- Is executable and properly located in the new directory structure
- Contains essential setup commands for installing required dependencies
- Follows the same pattern as other platform-specific installation scripts
- Includes proper error handling with
set -eandset -u
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the installation script exists and contains expected content
echo "Checking installation script..."
SCRIPT_PATH="components/core/tools/scripts/lib_install/macos/install-all.sh"
# Check if the script exists and is executable
if [ ! -x "$SCRIPT_PATH" ]; then
echo "Error: Installation script not found or not executable"
exit 1
fi
# Check if the script contains essential setup commands
rg -l "brew" "$SCRIPT_PATH"
Length of output: 368
Script:
#!/bin/bash
# Check the contents of the installation script
echo "=== Installation Script Contents ==="
cat components/core/tools/scripts/lib_install/macos/install-all.sh
# Check if there are any other related installation scripts
echo -e "\n=== Related Installation Scripts ==="
fd install-all.sh
Length of output: 840
components/core/src/clp_s/JsonParser.hpp (1)
35-35: Verify usage and add documentation for min_table_size.
The new min_table_size member lacks documentation explaining its purpose and constraints. Additionally, let's verify its usage across the codebase.
Consider adding documentation to explain:
- The purpose of this configuration option
- Any constraints on valid values
- The default value if not specified
✅ Verification successful
Let me gather more information about the usage and potential documentation by searching for command line arguments and related configuration.
Add documentation for min_table_size member
Based on the command line argument documentation and usage found in the codebase, this member represents the minimum size in bytes for a packed table before it gets compressed. The default value is 1 MB (1024*1024 bytes).
Add the following documentation above the member:
/// Minimum size in bytes for a packed table before compression.
/// Default: 1 MB (1024*1024 bytes)
size_t min_table_size;🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for usage of min_table_size to understand its purpose and verify documentation
# Test 1: Find all references to min_table_size
echo "Searching for min_table_size usage..."
rg "min_table_size" -A 3 -B 3
# Test 2: Look for related documentation or comments
echo "Searching for related documentation..."
rg -i "table.?size" --type cpp --type hpp -A 3 -B 3
Length of output: 4521
Script:
#!/bin/bash
# Look for command line argument handling and documentation
rg "get_minimum_table_size|minimum.?table.?size" -A 5 -B 5
# Look for any related configuration or default values
rg "minimum.?table|min.?table" -A 5 -B 5 --type cpp --type hpp
# Search for any related comments or documentation in header files
fd -e hpp -e h | xargs grep -A 5 -B 5 -i "table.*size"
Length of output: 17203
components/core/src/clp_s/ColumnWriter.cpp (6)
4-6: Changes to Int64ColumnWriter look good, eh!
The size calculation is accurate for int64_t values, and the memory handling in the store method is safe.
Also applies to: 9-12
14-16: FloatColumnWriter changes are consistent with the new pattern
The size calculation and storage implementation align well with the overall design.
Also applies to: 19-22
24-26: BooleanColumnWriter changes maintain type safety and efficiency
The implementation properly handles boolean to uint8_t conversion and storage.
Also applies to: 29-32
Line range hint 34-47: ClpStringColumnWriter changes handle dictionary encoding correctly
The size calculation properly accounts for both the dictionary ID and encoded variable sizes.
Also applies to: 50-57
59-64: VariableStringColumnWriter changes are straightforward and correct
The size calculation accurately reflects the dictionary ID storage requirements.
Also applies to: 67-70
72-76: DateStringColumnWriter changes properly handle timestamp pairs
The implementation correctly manages both timestamp and encoding components, with accurate size calculations.
Also applies to: 80-85
components/core/src/clp/CurlDownloadHandler.cpp (2)
26-27: LGTM!
The new parameter is appropriately placed and typed as an optional const reference.
64-68: LGTM!
The header formatting is clean and consistently uses the defined constants.
components/core/src/clp_s/ColumnWriter.hpp (3)
30-32: Method signature changes improve API clarity
The changes to add_value and store methods enhance the interface by:
- Returning the size of encoded data directly from
add_value - Simplifying the
storemethod by removing unnecessary size return
Also applies to: 38-39
40-47: Well-documented header size calculation method
The new get_total_header_size method is well-documented and provides a clear contract for derived classes. The documentation effectively explains how this value relates to the total size calculation.
62-62: Verify header size implementation requirements
While the method signature updates are consistent across all derived classes, only ClpStringColumnWriter overrides get_total_header_size. Please verify if other column writers should also override this method based on their header data requirements.
Also applies to: 64-64, 79-79, 81-81, 96-96, 98-98, 120-124, 176-176, 178-178, 194-194, 196-196
✅ Verification successful
Let me gather more information about the column writer implementations to verify if they need header size overrides.
Let me try a different approach to find column writer implementations.
Let me try one more search pattern to find column writer implementations.
Only ClpStringColumnWriter requires header size override
Based on the code analysis, only ClpStringColumnWriter needs to override get_total_header_size() because it maintains additional header data for log dictionary entries. Other column writers (Int64ColumnWriter, FloatColumnWriter, BooleanColumnWriter, VariableStringColumnWriter, and DateStringColumnWriter) store simple arrays of values without any header metadata, correctly inheriting the base implementation that returns 0.
The implementation is consistent with the design where:
ClpStringColumnWriterneeds header space for dictionary metadata (sizeof(size_t))- Other column writers don't require header space as they only store raw values
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for header-related data members or operations in column writer implementations
rg -A 5 "header|Header" components/core/src/clp_s/
Length of output: 5601
Script:
#!/bin/bash
# Search for all column writer class definitions and their implementations
ast-grep --pattern 'class $_ColumnWriter : public ColumnWriter {
$$$
}'
Length of output: 75
Script:
#!/bin/bash
# Search for column writer class definitions using ripgrep
rg -A 10 "class \w+ColumnWriter\s*:\s*public\s*ColumnWriter" components/core/src/clp_s/
Length of output: 89
Script:
#!/bin/bash
# Search for column writer class definitions in the ColumnWriter.hpp file
cat components/core/src/clp_s/ColumnWriter.hpp
Length of output: 5705
components/core/src/clp_s/ArchiveWriter.hpp (1)
45-61: Add documentation and consider input validation.
The struct requires documentation to explain the relationships between streams, schemas, and messages. Additionally, consider adding validation for schema_id.
+ /**
+ * Metadata for tracking schema information within a stream
+ */
struct SchemaMetadata {
SchemaMetadata(
uint64_t stream_id,
uint64_t stream_offset,
int32_t schema_id,
uint64_t num_messages
)
: stream_id(stream_id),
stream_offset(stream_offset),
schema_id(schema_id),
num_messages(num_messages) {
+ // Validate schema_id
+ if (schema_id < 0) {
+ throw std::invalid_argument("schema_id must be non-negative");
+ }
}
+ /// Unique identifier for the stream
uint64_t stream_id{};
+ /// Offset within the stream where this schema begins
uint64_t stream_offset{};
+ /// Identifier for the schema definition
int32_t schema_id{};
+ /// Number of messages using this schema in the stream
uint64_t num_messages{};
};Let's verify the schema_id validation requirement:
components/core/src/clp_s/CMakeLists.txt (1)
68-69: LGTM: PackedStreamReader integration
The addition of PackedStreamReader.cpp and PackedStreamReader.hpp to CLP_S_SOURCES is properly integrated into the build system.
components/core/src/clp_s/ArchiveReader.hpp (3)
13-13: LGTM: Include statement properly placed
The include statement for PackedStreamReader is correctly placed following the include ordering convention.
205-205: LGTM: Consistent naming convention
The rename from m_id_to_table_metadata to m_id_to_schema_metadata maintains consistency with the schema-focused naming convention throughout the class.
96-100: Verify the impact of method rename
The rename from read_table to read_schema_table better reflects the method's purpose, but we should verify all call sites have been updated.
✅ Verification successful
Method rename has been properly implemented
The search results confirm that:
- No instances of the old method name
read_tableremain in the codebase - The new method name
read_schema_tableis consistently used across all references:- Method declaration in
ArchiveReader.hpp - Method implementation in
ArchiveReader.cpp - Method calls in
Output.cppand internally inArchiveReader.cpp
- Method declaration in
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining calls to the old method name
rg "->read_table\(" --type cpp
# Search for calls to the new method name to confirm updates
rg "->read_schema_table\(" --type cpp
Length of output: 882
Script:
#!/bin/bash
# Search for any remaining calls to the old method name, escaping the arrow operator
rg "read_table\(" --type cpp -A 2
# Search for calls to the new method name to confirm updates
rg "read_schema_table\(" --type cpp -A 2
Length of output: 1179
components/core/src/clp_s/CommandLineArguments.hpp (1)
111-112: LGTM! The getter method follows the class conventions.
The method signature and implementation are consistent with other getter methods in the class.
components/core/tests/test-NetworkReader.cpp (1)
9-16: LGTM: New includes are properly organized
The new includes are correctly placed in the standard library section and are necessary for the added functionality.
components/core/src/clp/NetworkReader.cpp (3)
134-139: LGTM: Correct initialization of DownloaderThread
The initialization sequence and parameter passing are logically correct.
224-225: LGTM: Proper error handling and initialization
The CurlDownloadHandler initialization is correctly implemented within the try-catch block, ensuring proper error handling for CURL operations.
134-139: Verify thread safety with custom headers
The implementation appears thread-safe, but let's verify there are no potential race conditions with the custom headers handling.
Also applies to: 224-225
✅ Verification successful
Let me gather more information about the thread safety implementation.
Let me check one more aspect of the implementation to verify thread safety.
Custom headers handling is thread-safe
The implementation is thread-safe because:
- HTTP headers are moved (
std::move) into theDownloaderThreadconstructor, creating a separate copy - The
m_http_header_kv_pairsmember is accessed only within the thread's context - No concurrent access patterns were found in the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any concurrent access to http_header_kv_pairs
rg -n "http_header_kv_pairs" --type cpp
# Search for mutex-protected access patterns
ast-grep --pattern 'std::unique_lock<std::mutex> $_ { $_ }'
Length of output: 2948
Script:
#!/bin/bash
# Look for mutex declarations and usage in NetworkReader
rg -n "mutex" --type cpp -A 3 components/core/src/clp/NetworkReader
# Check DownloaderThread implementation for thread safety
ast-grep --pattern 'class DownloaderThread {
$$$
}'
# Look for any shared state access
rg -n "m_http_header_kv_pairs" --type cpp -A 2
Length of output: 962
Script:
#!/bin/bash
# Check NetworkReader class implementation for thread synchronization
ast-grep --pattern 'class NetworkReader {
$$$
}'
# Look for any member variable access patterns
rg -n "std::move" --type cpp -A 2 components/core/src/clp/NetworkReader.cpp
# Check for any concurrent access patterns in DownloaderThread
rg -n "DownloaderThread::" --type cpp -A 5
Length of output: 784
components/core/src/clp_s/JsonParser.cpp (1)
34-34: LGTM: Archive options initialization is properly implemented.
The initialization of min_table_size is correctly placed within the archive options setup and follows the established pattern for option initialization.
components/core/src/clp_s/search/Output.cpp (1)
Line range hint 87-91: LGTM! Method rename aligns with schema-centric data handling.
The change from read_table to read_schema_table accurately reflects the method's purpose of reading schema-specific table data.
Let's verify that all callers have been updated consistently:
✅ Verification successful
Method rename has been consistently applied across the codebase
The verification confirms that:
- The old method name
read_tableis not present anywhere in the codebase - The new method
read_schema_tableis properly defined inArchiveReader.hppand implemented inArchiveReader.cpp - All calls to this method, including the one in
Output.cppand internal usage inArchiveReader.cpp, use the new name
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for any remaining calls to the old method name
# Expect: No occurrences of the old method name
# Search for old method name
rg -l 'read_table\(' --type cpp
# Search for new method name for context
rg -l 'read_schema_table\(' --type cpp
Length of output: 206
Script:
#!/bin/bash
# Let's examine the actual method implementations and calls
rg 'read_table\(' --type cpp -A 2
rg 'read_schema_table\(' --type cpp -A 2
Length of output: 1179
components/core/src/clp_s/SchemaReader.hpp (2)
52-56: Ensuring Consistent Use of Fixed-Width Integer Types
Changing uncompressed_size from size_t to uint64_t ensures consistent use of fixed-width integer types within SchemaMetadata. This enhances portability and predictability across different platforms.
139-144: Ensure All Calls to load Are Updated to the New Signature
The load method signature has been updated to accept std::shared_ptr<char[]> stream_buffer, size_t offset, and size_t uncompressed_size. Please verify that all invocations of load have been updated to match this new signature to prevent compilation errors or unintended behaviour.
To confirm, run the following script to locate all calls to load:
components/core/src/clp_s/ArchiveWriter.cpp (1)
190-194:
Potential exception in sorting comparator may violate noexcept.
The lambda comparator used in std::sort may throw an exception if get_total_uncompressed_size() throws, potentially violating the noexcept guarantee of std::sort. Ensure that get_total_uncompressed_size() is noexcept or handle exceptions appropriately to prevent undefined behaviour.
Run the following script to verify whether get_total_uncompressed_size() is declared noexcept:
🧰 Tools
🪛 cppcheck
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
components/core/src/clp_s/ArchiveReader.cpp (1)
83-85:
Validate stream offsets to prevent out-of-bounds access.
The check if (stream_offset > m_stream_reader.get_uncompressed_stream_size(stream_id)) assumes that stream_offset should not exceed the stream size. Ensure that stream_offset is always within valid bounds to prevent potential buffer overruns.
components/core/src/clp_s/SchemaReader.cpp (1)
40-46: Changes to load method appear correct
The modifications to the load method signature and implementation are appropriate. The use of std::shared_ptr<char[]> for stream_buffer, along with offset and uncompressed_size, enhances buffer management. Assigning m_stream_buffer = stream_buffer and initializing BufferViewReader with m_stream_buffer.get() + offset correctly handles the buffer without introducing ownership issues.
| constexpr std::string_view cRangeHeaderName{"range"}; | ||
| constexpr std::string_view cCacheControlHeaderName{"cache-control"}; | ||
| constexpr std::string_view cPragmaHeaderName{"pragma"}; | ||
| std::unordered_set<std::string_view> reserved_headers; | ||
| reserved_headers.insert(cRangeHeaderName); | ||
| reserved_headers.insert(cCacheControlHeaderName); | ||
| reserved_headers.insert(cPragmaHeaderName); |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider using constexpr array for reserved headers.
While using unordered_set provides O(1) lookup, constructing it in the constructor adds runtime overhead. Since we have a small, fixed set of reserved headers, a constexpr array with a simple linear search would be more efficient.
constexpr std::string_view cRangeHeaderName{"range"};
constexpr std::string_view cCacheControlHeaderName{"cache-control"};
constexpr std::string_view cPragmaHeaderName{"pragma"};
-std::unordered_set<std::string_view> reserved_headers;
-reserved_headers.insert(cRangeHeaderName);
-reserved_headers.insert(cCacheControlHeaderName);
-reserved_headers.insert(cPragmaHeaderName);
+constexpr std::array<std::string_view, 3> cReservedHeaders{
+ cRangeHeaderName,
+ cCacheControlHeaderName,
+ cPragmaHeaderName
+};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| constexpr std::string_view cRangeHeaderName{"range"}; | |
| constexpr std::string_view cCacheControlHeaderName{"cache-control"}; | |
| constexpr std::string_view cPragmaHeaderName{"pragma"}; | |
| std::unordered_set<std::string_view> reserved_headers; | |
| reserved_headers.insert(cRangeHeaderName); | |
| reserved_headers.insert(cCacheControlHeaderName); | |
| reserved_headers.insert(cPragmaHeaderName); | |
| constexpr std::string_view cRangeHeaderName{"range"}; | |
| constexpr std::string_view cCacheControlHeaderName{"cache-control"}; | |
| constexpr std::string_view cPragmaHeaderName{"pragma"}; | |
| constexpr std::array<std::string_view, 3> cReservedHeaders{ | |
| cRangeHeaderName, | |
| cCacheControlHeaderName, | |
| cPragmaHeaderName | |
| }; |
| if (http_header_kv_pairs.has_value()) { | ||
| for (auto const& [key, value] : http_header_kv_pairs.value()) { | ||
| // Convert to lowercase for case-insensitive comparison | ||
| std::string lower_key = key; | ||
| std::transform( | ||
| lower_key.begin(), | ||
| lower_key.end(), | ||
| lower_key.begin(), | ||
| [](unsigned char c) { return std::tolower(c); } | ||
| ); | ||
| if (reserved_headers.contains(lower_key) || value.ends_with("\r\n")) { | ||
| throw CurlOperationFailed( | ||
| ErrorCode_Failure, | ||
| __FILE__, | ||
| __LINE__, | ||
| CURLE_BAD_FUNCTION_ARGUMENT, | ||
| fmt::format( | ||
| "curl_download_handler_init failed due to " | ||
| "header: {}: " | ||
| "{}.", | ||
| key, | ||
| value | ||
| ) | ||
| ); | ||
| } else { | ||
| m_http_headers.append(fmt::format("{}: {}", key, value)); | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Improve code structure and error messages.
- The current if-else structure increases nesting depth unnecessarily.
- The error message could be more specific about whether the failure was due to a reserved header or CRLF termination.
if (http_header_kv_pairs.has_value()) {
for (auto const& [key, value] : http_header_kv_pairs.value()) {
// Convert to lowercase for case-insensitive comparison
std::string lower_key = key;
std::transform(
lower_key.begin(),
lower_key.end(),
lower_key.begin(),
[](unsigned char c) { return std::tolower(c); }
);
- if (reserved_headers.contains(lower_key) || value.ends_with("\r\n")) {
+ // Check for reserved headers
+ if (reserved_headers.contains(lower_key)) {
throw CurlOperationFailed(
ErrorCode_Failure,
__FILE__,
__LINE__,
CURLE_BAD_FUNCTION_ARGUMENT,
fmt::format(
- "curl_download_handler_init failed due to "
- "header: {}: "
- "{}.",
- key,
- value
+ "Cannot override reserved header: '{}'",
+ key
)
);
- } else {
- m_http_headers.append(fmt::format("{}: {}", key, value));
}
+
+ // Check for CRLF termination
+ if (value.ends_with("\r\n")) {
+ throw CurlOperationFailed(
+ ErrorCode_Failure,
+ __FILE__,
+ __LINE__,
+ CURLE_BAD_FUNCTION_ARGUMENT,
+ fmt::format(
+ "Header value must not be CRLF-terminated: '{}: {}'",
+ key,
+ value
+ )
+ );
+ }
+
+ m_http_headers.append(fmt::format("{}: {}", key, value));
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (http_header_kv_pairs.has_value()) { | |
| for (auto const& [key, value] : http_header_kv_pairs.value()) { | |
| // Convert to lowercase for case-insensitive comparison | |
| std::string lower_key = key; | |
| std::transform( | |
| lower_key.begin(), | |
| lower_key.end(), | |
| lower_key.begin(), | |
| [](unsigned char c) { return std::tolower(c); } | |
| ); | |
| if (reserved_headers.contains(lower_key) || value.ends_with("\r\n")) { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| fmt::format( | |
| "curl_download_handler_init failed due to " | |
| "header: {}: " | |
| "{}.", | |
| key, | |
| value | |
| ) | |
| ); | |
| } else { | |
| m_http_headers.append(fmt::format("{}: {}", key, value)); | |
| } | |
| } | |
| if (http_header_kv_pairs.has_value()) { | |
| for (auto const& [key, value] : http_header_kv_pairs.value()) { | |
| // Convert to lowercase for case-insensitive comparison | |
| std::string lower_key = key; | |
| std::transform( | |
| lower_key.begin(), | |
| lower_key.end(), | |
| lower_key.begin(), | |
| [](unsigned char c) { return std::tolower(c); } | |
| ); | |
| // Check for reserved headers | |
| if (reserved_headers.contains(lower_key)) { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| fmt::format( | |
| "Cannot override reserved header: '{}'", | |
| key | |
| ) | |
| ); | |
| } | |
| // Check for CRLF termination | |
| if (value.ends_with("\r\n")) { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| fmt::format( | |
| "Header value must not be CRLF-terminated: '{}: {}'", | |
| key, | |
| value | |
| ) | |
| ); | |
| } | |
| m_http_headers.append(fmt::format("{}: {}", key, value)); | |
| } |
| size_t buffer_size, | ||
| std::optional<std::unordered_map<std::string, std::string>> http_header_kv_pairs |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider passing large parameter by value for move semantics
The http_header_kv_pairs parameter could be more efficiently handled by passing it by value and moving it into the member variable, rather than passing by const reference. This would enable move semantics and avoid potential lifetime issues.
- std::optional<std::unordered_map<std::string, std::string>> http_header_kv_pairs
+ std::optional<std::unordered_map<std::string, std::string>> http_header_kv_pairs = std::nulloptCommittable suggestion was skipped due to low confidence.
| SRC_NAME: "yscope-log-viewer-4c69bc11dbe8a5d87b5fbfb0e43a2f2a06f04866" | ||
| SRC_URL: "https://github.com/y-scope/yscope-log-viewer/archive/4c69bc1.zip" |
There was a problem hiding this comment.
💡 Codebase verification
Update dependency version to one that supports custom headers
Based on the commit history and PR information, the current version (4c69bc1) is not related to custom headers support. This version only adds a warning about decoder format string syntax changes. There are no recent commits or PRs related to custom headers functionality.
- The current commit (4c69bc1) focuses on decoder format string warnings
- No evidence of custom headers support in recent commits or PRs
- Consider updating to a version that includes the required custom headers functionality
🔗 Analysis chain
Verify the yscope-log-viewer dependency update.
The dependency is being updated to a new commit (4c69bc1). Let's verify if this is the correct version that supports the custom headers functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the commit exists and check its relationship to custom headers support
# Test 1: Verify the commit exists and get its details
echo "Verifying commit details..."
gh api repos/y-scope/yscope-log-viewer/commits/4c69bc11dbe8a5d87b5fbfb0e43a2f2a06f04866 --jq '.commit.message'
# Test 2: Check if this commit is related to custom headers
echo "Checking commit history for custom headers changes..."
gh api repos/y-scope/yscope-log-viewer/commits/4c69bc11dbe8a5d87b5fbfb0e43a2f2a06f04866/pulls --jq '.[].title'
Length of output: 570
Script:
#!/bin/bash
# Let's check the commit history around this version to see if there are any custom headers related changes
echo "Checking recent commits..."
gh api repos/y-scope/yscope-log-viewer/commits --per-page 10 --jq '.[].commit.message'
# Also check open and recently closed PRs for custom headers
echo "Checking recent PRs..."
gh api repos/y-scope/yscope-log-viewer/pulls --state all --per-page 10 --jq '.[].title'
Length of output: 3031
Script:
#!/bin/bash
# Let's try a different approach using simpler GitHub API queries
echo "Checking recent commits..."
gh api /repos/y-scope/yscope-log-viewer/commits?per_page=10 --jq '.[].commit.message'
echo "Checking recent PRs..."
gh api /repos/y-scope/yscope-log-viewer/pulls?state=all --jq '.[].title'
Length of output: 4077
| /** | ||
| * Decompresses a stream with a given stream_id and returns it. This function must be called | ||
| * strictly in ascending stream_id order. If this function is called twice for the same stream | ||
| * or if a stream with lower id is requested after a stream with higher id then an error is | ||
| * thrown. | ||
| * | ||
| * Note: the buffer and buffer size are returned by reference. This is to support the use case | ||
| * where the caller wants to re-use the same buffer for multiple streams to avoid allocations | ||
| * when they already have a sufficiently large buffer. If no buffer is provided or the provided | ||
| * buffer is too small calling read_stream will create a buffer exactly as large as the stream | ||
| * being decompressed. | ||
| * | ||
| * @param stream_id | ||
| * @param buf a shared ptr to the buffer where the stream will be read. The buffer gets resized | ||
| * if it is too small to contain the requested stream. | ||
| * @param buf_size the size of the underlying buffer owned by buf -- passed and updated by | ||
| * reference | ||
| */ | ||
| void read_stream(size_t stream_id, std::shared_ptr<char[]>& buf, size_t& buf_size); |
There was a problem hiding this comment.
Initialize m_prev_stream_id appropriately to avoid comparison issues
The member m_prev_stream_id is initialized to 0ULL (line 92), which could cause the first call to read_stream with stream_id = 0 to fail the strict ordering check. Since read_stream requires stream_id to be strictly greater than m_prev_stream_id, initializing m_prev_stream_id to std::numeric_limits<size_t>::max() would prevent this issue.
Apply this diff to fix the initialization and include the necessary header:
+#include <limits>
...
- size_t m_prev_stream_id{0ULL};
+ size_t m_prev_stream_id{std::numeric_limits<size_t>::max()};Also applies to: 92-92
|
|
||
| m_tables_file_reader.open(archive_path_str + constants::cArchiveTablesFile); | ||
| m_table_metadata_file_reader.open(archive_path_str + constants::cArchiveTableMetadataFile); | ||
| m_stream_reader.open_packed_streams(archive_path_str + constants::cArchiveTablesFile); |
There was a problem hiding this comment.
Add error handling for open_packed_streams.
The call to m_stream_reader.open_packed_streams(...) lacks error handling for potential failures, such as missing files or read errors. Consider checking the return value or wrapping the call in a try-catch block to handle exceptions appropriately.
| m_stream_reader.read_metadata(m_table_metadata_decompressor); | ||
|
|
There was a problem hiding this comment.
Check for errors after reading metadata.
After invoking m_stream_reader.read_metadata(m_table_metadata_decompressor);, there is no verification to ensure the metadata was read successfully. It's advisable to handle any potential errors that might occur during this operation.
| bool prev_metadata_initialized{false}; | ||
| SchemaReader::SchemaMetadata prev_metadata{}; | ||
| int32_t prev_schema_id{}; |
There was a problem hiding this comment.
Ensure variables are properly initialised before use.
The variables prev_metadata_initialized, prev_metadata, and prev_schema_id are used within the loop but may not be properly initialised in all code paths. Verify that they are correctly set before the loop to prevent undefined behaviour.
| if (prev_metadata_initialized) { | ||
| uint64_t uncompressed_size{0}; | ||
| if (stream_id != prev_metadata.stream_id) { | ||
| uncompressed_size | ||
| = m_stream_reader.get_uncompressed_stream_size(prev_metadata.stream_id) | ||
| - prev_metadata.stream_offset; | ||
| } else { | ||
| uncompressed_size = stream_offset - prev_metadata.stream_offset; | ||
| } | ||
| prev_metadata.uncompressed_size = uncompressed_size; | ||
| m_id_to_schema_metadata[prev_schema_id] = prev_metadata; | ||
| } else { |
There was a problem hiding this comment.
Guard against underflow in uncompressed size calculation.
In the calculation of uncompressed_size, subtracting prev_metadata.stream_offset from stream_offset could result in a negative value if stream_offset is less than prev_metadata.stream_offset. Since uncompressed_size is an unsigned integer, this would lead to underflow. Add a check to ensure stream_offset >= prev_metadata.stream_offset.
| std::shared_ptr<char[]> ArchiveReader::read_stream(size_t stream_id, bool reuse_buffer) { | ||
| if (nullptr != m_stream_buffer && m_cur_stream_id == stream_id) { | ||
| return m_stream_buffer; | ||
| } | ||
|
|
||
| if (false == reuse_buffer) { | ||
| m_stream_buffer.reset(); | ||
| m_stream_buffer_size = 0; | ||
| } | ||
|
|
||
| m_stream_reader.read_stream(stream_id, m_stream_buffer, m_stream_buffer_size); | ||
| m_cur_stream_id = stream_id; | ||
| return m_stream_buffer; | ||
| } |
There was a problem hiding this comment.
Address thread safety for shared resources in read_stream.
The method read_stream modifies shared members m_stream_buffer, m_cur_stream_id, and m_stream_buffer_size. If ArchiveReader is accessed by multiple threads, this could cause race conditions. Ensure that access to shared resources is properly synchronised or document that the class is not thread-safe.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
components/core/tests/test-NetworkReader.cpp (2)
196-223: LGTM! The test case effectively validates custom header functionality.The test comprehensively verifies that custom headers are correctly transmitted and received.
Consider these improvements:
- The constant name
cNumHttpHeaderKeyValuePairscould be more descriptive, e.g.,cNumTestHeaders- Consider adding a brief comment explaining why 10 headers were chosen as the test size
236-236: Remove commented out code.The line
// REQUIRE(test_illegal_header(header_name, header_value));should be removed as it appears to be obsolete.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
components/core/src/clp/CurlDownloadHandler.cpp(3 hunks)components/core/src/clp/CurlDownloadHandler.hpp(3 hunks)components/core/src/clp/NetworkReader.cpp(4 hunks)components/core/src/clp/NetworkReader.hpp(5 hunks)components/core/tests/test-NetworkReader.cpp(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- components/core/src/clp/CurlDownloadHandler.hpp
- components/core/src/clp/NetworkReader.cpp
- components/core/src/clp/NetworkReader.hpp
🔇 Additional comments (4)
components/core/src/clp/CurlDownloadHandler.cpp (3)
3-18: LGTM! All necessary includes are present and properly ordered.
The new includes support the header processing functionality and follow the project's include ordering convention.
31-32: LGTM! Parameter type is well-chosen.
The use of std::optional<std::unordered_map<std::string, std::string>> const& is appropriate as it:
- Allows optional header specification
- Prevents unnecessary copying through const reference
- Uses string type for header names and values
61-67: 🛠️ Refactor suggestion
Consider using a constexpr array for reserved headers.
While the current implementation works, using an unordered_set introduces unnecessary runtime overhead. Since we have a small, fixed set of reserved headers, a constexpr array would be more efficient.
constexpr std::string_view cRangeHeaderName{"range"};
constexpr std::string_view cCacheControlHeaderName{"cache-control"};
constexpr std::string_view cPragmaHeaderName{"pragma"};
-std::unordered_set<std::string_view> reserved_headers;
-reserved_headers.insert(cRangeHeaderName);
-reserved_headers.insert(cCacheControlHeaderName);
-reserved_headers.insert(cPragmaHeaderName);
+constexpr std::array<std::string_view, 3> cReservedHeaders{
+ cRangeHeaderName,
+ cCacheControlHeaderName,
+ cPragmaHeaderName
+};Likely invalid or redundant comment.
components/core/tests/test-NetworkReader.cpp (1)
9-16: LGTM! The new includes are appropriate for the added functionality.
The added headers support the new test cases by providing necessary data structures and formatting capabilities.
| if (http_header_kv_pairs.has_value()) { | ||
| for (auto const& [key, value] : http_header_kv_pairs.value()) { | ||
| // Convert to lowercase for case-insensitive comparison | ||
| std::string lower_key = key; | ||
| std::transform( | ||
| lower_key.begin(), | ||
| lower_key.end(), | ||
| lower_key.begin(), | ||
| [](unsigned char c) { return std::tolower(c); } | ||
| ); | ||
| if (reserved_headers.contains(lower_key) || value.ends_with("\r\n")) { | ||
| throw CurlOperationFailed( | ||
| ErrorCode_Failure, | ||
| __FILE__, | ||
| __LINE__, | ||
| CURLE_BAD_FUNCTION_ARGUMENT, | ||
| fmt::format( | ||
| "curl_download_handler_init failed due to " | ||
| "header: {}: " | ||
| "{}.", | ||
| key, | ||
| value | ||
| ) | ||
| ); | ||
| } | ||
| m_http_headers.append(fmt::format("{}: {}", key, value)); | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Improve code structure and error messages.
The current implementation can be improved by:
- Separating validation checks for better clarity
- Providing more specific error messages
- Reducing nesting depth
if (http_header_kv_pairs.has_value()) {
for (auto const& [key, value] : http_header_kv_pairs.value()) {
std::string lower_key = key;
std::transform(
lower_key.begin(),
lower_key.end(),
lower_key.begin(),
[](unsigned char c) { return std::tolower(c); }
);
- if (reserved_headers.contains(lower_key) || value.ends_with("\r\n")) {
+ // Check for reserved headers
+ if (reserved_headers.contains(lower_key)) {
throw CurlOperationFailed(
ErrorCode_Failure,
__FILE__,
__LINE__,
CURLE_BAD_FUNCTION_ARGUMENT,
fmt::format(
- "curl_download_handler_init failed due to "
- "header: {}: "
- "{}.",
- key,
- value
+ "Cannot override reserved header: '{}'",
+ key
)
);
- }
- m_http_headers.append(fmt::format("{}: {}", key, value));
+ }
+
+ // Check for CRLF termination
+ if (value.ends_with("\r\n")) {
+ throw CurlOperationFailed(
+ ErrorCode_Failure,
+ __FILE__,
+ __LINE__,
+ CURLE_BAD_FUNCTION_ARGUMENT,
+ fmt::format(
+ "Header value must not be CRLF-terminated: '{}: {}'",
+ key,
+ value
+ )
+ );
+ }
+
+ m_http_headers.append(fmt::format("{}: {}", key, value));
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (http_header_kv_pairs.has_value()) { | |
| for (auto const& [key, value] : http_header_kv_pairs.value()) { | |
| // Convert to lowercase for case-insensitive comparison | |
| std::string lower_key = key; | |
| std::transform( | |
| lower_key.begin(), | |
| lower_key.end(), | |
| lower_key.begin(), | |
| [](unsigned char c) { return std::tolower(c); } | |
| ); | |
| if (reserved_headers.contains(lower_key) || value.ends_with("\r\n")) { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| fmt::format( | |
| "curl_download_handler_init failed due to " | |
| "header: {}: " | |
| "{}.", | |
| key, | |
| value | |
| ) | |
| ); | |
| } | |
| m_http_headers.append(fmt::format("{}: {}", key, value)); | |
| } | |
| if (http_header_kv_pairs.has_value()) { | |
| for (auto const& [key, value] : http_header_kv_pairs.value()) { | |
| std::string lower_key = key; | |
| std::transform( | |
| lower_key.begin(), | |
| lower_key.end(), | |
| lower_key.begin(), | |
| [](unsigned char c) { return std::tolower(c); } | |
| ); | |
| // Check for reserved headers | |
| if (reserved_headers.contains(lower_key)) { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| fmt::format( | |
| "Cannot override reserved header: '{}'", | |
| key | |
| ) | |
| ); | |
| } | |
| // Check for CRLF termination | |
| if (value.ends_with("\r\n")) { | |
| throw CurlOperationFailed( | |
| ErrorCode_Failure, | |
| __FILE__, | |
| __LINE__, | |
| CURLE_BAD_FUNCTION_ARGUMENT, | |
| fmt::format( | |
| "Header value must not be CRLF-terminated: '{}: {}'", | |
| key, | |
| value | |
| ) | |
| ); | |
| } | |
| m_http_headers.append(fmt::format("{}: {}", key, value)); | |
| } |
| REQUIRE([header_name, header_value]() { | ||
| std::unordered_map<std::string, std::string> illegal_http_header_kv_pairs; | ||
| illegal_http_header_kv_pairs.emplace(header_name, header_value); | ||
| clp::NetworkReader illegal_reader{ | ||
| "https://httpbin.org/headers", | ||
| 0, | ||
| false, | ||
| clp::CurlDownloadHandler::cDefaultOverallTimeout, | ||
| clp::CurlDownloadHandler::cDefaultConnectionTimeout, | ||
| clp::NetworkReader::cDefaultBufferPoolSize, | ||
| clp::NetworkReader::cDefaultBufferSize, | ||
| illegal_http_header_kv_pairs | ||
| }; | ||
| auto const content = get_content(illegal_reader); | ||
| return assert_curl_error_code(CURLE_BAD_FUNCTION_ARGUMENT, illegal_reader) | ||
| && content.empty(); | ||
| }()); |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider simplifying the lambda function.
The lambda could be more concise by removing the explicit return statement and letting the expression flow naturally.
- REQUIRE([header_name, header_value]() {
- std::unordered_map<std::string, std::string> illegal_http_header_kv_pairs;
- illegal_http_header_kv_pairs.emplace(header_name, header_value);
- clp::NetworkReader illegal_reader{
- "https://httpbin.org/headers",
- 0,
- false,
- clp::CurlDownloadHandler::cDefaultOverallTimeout,
- clp::CurlDownloadHandler::cDefaultConnectionTimeout,
- clp::NetworkReader::cDefaultBufferPoolSize,
- clp::NetworkReader::cDefaultBufferSize,
- illegal_http_header_kv_pairs
- };
- auto const content = get_content(illegal_reader);
- return assert_curl_error_code(CURLE_BAD_FUNCTION_ARGUMENT, illegal_reader)
- && content.empty();
- }());
+ std::unordered_map<std::string, std::string> illegal_http_header_kv_pairs{{header_name, header_value}};
+ clp::NetworkReader illegal_reader{
+ "https://httpbin.org/headers",
+ 0,
+ false,
+ clp::CurlDownloadHandler::cDefaultOverallTimeout,
+ clp::CurlDownloadHandler::cDefaultConnectionTimeout,
+ clp::NetworkReader::cDefaultBufferPoolSize,
+ clp::NetworkReader::cDefaultBufferSize,
+ illegal_http_header_kv_pairs
+ };
+ auto const content = get_content(illegal_reader);
+ REQUIRE(assert_curl_error_code(CURLE_BAD_FUNCTION_ARGUMENT, illegal_reader));
+ REQUIRE(content.empty());📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| REQUIRE([header_name, header_value]() { | |
| std::unordered_map<std::string, std::string> illegal_http_header_kv_pairs; | |
| illegal_http_header_kv_pairs.emplace(header_name, header_value); | |
| clp::NetworkReader illegal_reader{ | |
| "https://httpbin.org/headers", | |
| 0, | |
| false, | |
| clp::CurlDownloadHandler::cDefaultOverallTimeout, | |
| clp::CurlDownloadHandler::cDefaultConnectionTimeout, | |
| clp::NetworkReader::cDefaultBufferPoolSize, | |
| clp::NetworkReader::cDefaultBufferSize, | |
| illegal_http_header_kv_pairs | |
| }; | |
| auto const content = get_content(illegal_reader); | |
| return assert_curl_error_code(CURLE_BAD_FUNCTION_ARGUMENT, illegal_reader) | |
| && content.empty(); | |
| }()); | |
| std::unordered_map<std::string, std::string> illegal_http_header_kv_pairs{{header_name, header_value}}; | |
| clp::NetworkReader illegal_reader{ | |
| "https://httpbin.org/headers", | |
| 0, | |
| false, | |
| clp::CurlDownloadHandler::cDefaultOverallTimeout, | |
| clp::CurlDownloadHandler::cDefaultConnectionTimeout, | |
| clp::NetworkReader::cDefaultBufferPoolSize, | |
| clp::NetworkReader::cDefaultBufferSize, | |
| illegal_http_header_kv_pairs | |
| }; | |
| auto const content = get_content(illegal_reader); | |
| REQUIRE(assert_curl_error_code(CURLE_BAD_FUNCTION_ARGUMENT, illegal_reader)); | |
| REQUIRE(content.empty()); |
Description
Modified constructors of
NetworkReader,DownloaderThreadandCurlDownloadHandlerto support custom headers passed by users.Validation performed
Added the
Authorizationheader with aBearertoken to fetch a file from remote, validating the file's size.Summary by CodeRabbit
New Features
CurlDownloadHandlerandNetworkReaderconstructors to accept custom HTTP headers, allowing for more flexible HTTP requests.CurlDownloadHandlerandNetworkReader.PackedStreamReaderclass to manage reading packed streams and their metadata.Tests
NetworkReaderto validate the handling of custom HTTP headers, ensuring expected headers are included and illegal ones are excluded.Documentation