refactor: introduce expected pattern for error handling in master service#562
refactor: introduce expected pattern for error handling in master service#562xiaguan merged 10 commits intokvcache-ai:mainfrom
Conversation
fa9d130 to
a4a9a49
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR modernizes error handling across the master service and client layers by adopting tl::expected<T, ErrorCode>, updating method signatures, RPC adapters, and test suites to use has_value()/.error().
- Core RPC and service methods now return
tl::expectedinstead ofErrorCode. - Test code is updated to assert on
.has_value()and extract.error()where appropriate. - Introduced
execute_rpchelper inrpc_helper.hand enhanced logging inscoped_vlog_timer.h.
Reviewed Changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/stress_workload_test.cpp | Updated mount/unmount calls to use .has_value() checks |
| tests/client_integration_test.cpp | Refactored client calls to expected pattern; minor typo introduced |
| tests/master_service_test.cpp | Converted EXPECT_EQ on ErrorCode to EXPECT_TRUE/.error() assertions |
| include/rpc_helper.h | Added generic execute_rpc helper for RPC logging and metrics |
| include/utils/scoped_vlog_timer.h | Enhanced ScopedVLogTimer with LogResponseExpected for tl::expected |
ykwd
left a comment
There was a problem hiding this comment.
Great work! Just a few suggestions in the comments.
| // Special handling for bool to log the value, similar to the original | ||
| // implementation of ExistKey. | ||
| if constexpr (std::is_same_v<ResultType, bool>) { | ||
| timer.LogResponse("success=true, exist=", result.value()); |
There was a problem hiding this comment.
- Using success=true or false is a little bit ambiguous. Also, the "exist=" may only be suitable if the returned value denotes whether something exists or not. How about timer.LogResponse("status=success, return=", result.value());
There was a problem hiding this comment.
Good suggestion. Resolved.
| if constexpr (std::is_same_v<ResultType, bool>) { | ||
| timer.LogResponse("success=true, exist=", result.value()); | ||
| } else { | ||
| timer.LogResponse("success=true"); |
There was a problem hiding this comment.
timer.LogResponse("status=success“);
There was a problem hiding this comment.
Good suggestion. Resolved.
| } else { | ||
| inc_fail_metric(); | ||
| timer.LogResponse("success=false, error=", toString(result.error())); | ||
| } |
There was a problem hiding this comment.
timer.LogResponse("status=fail, error=", toString(result.error()));
| auto execute_rpc(std::string_view rpc_name, RpcCallable&& rpc_call, | ||
| LogRequestCallable&& log_request, | ||
| std::function<void()> inc_req_metric, | ||
| std::function<void()> inc_fail_metric) { |
There was a problem hiding this comment.
Just like the rpc_call and log_request, why don't we also use template for in_req_metric and inc_fail_metric? The template can prevent the overhead of the std::function
| } | ||
| std::vector<tl::expected<std::vector<Replica::Descriptor>, ErrorCode>> | ||
| results; | ||
| results.reserve(keys.size()); |
There was a problem hiding this comment.
Will we record batchGet and batchPut metrics?
…vice - Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern - Improve error handling clarity by separating success values from error codes - Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode> - Modify RPC service interfaces to support expected pattern - Update all related tests to handle new expected return types - Add necessary includes for ylt/util/expected.hpp This change makes error handling more explicit and type-safe: - Success cases can be accessed via .value() - Error cases can be accessed via .error() - Eliminates ambiguity between success and error states Future work: - Extend expected pattern to RPC response types - Enhance error code system for more comprehensive error handling
…Code - Updated MasterService methods to replace ylt::expected with tl::expected for better error handling. - Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types. - Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected. - Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type. - Improved overall error reporting in tests to provide clearer feedback on operation failures.
- Enhanced client.h and client.cpp with new functionality - Removed obsolete master.proto file - Updated master_client.cpp and transfer_task.cpp - Improved integration and stress tests
- Replace BatchObjectInfo with vector<vector<Replica::Descriptor>> - Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>> - Fix BatchQuery API usage to work with new expected pattern - Convert unordered_map to vector format for BatchPut parameter compatibility
Resolved merge conflicts in multiple files: - mooncake-integration/store/store_py.cpp: Updated replica handling to support both memory and disk replicas - mooncake-store/include/client.h: Unified API signatures for Get and BatchPut methods - mooncake-store/include/rpc_service.h: Added response structures and fixed Ping method - mooncake-store/src/client.cpp: Adopted main branch implementation with storage backend support - mooncake-store/src/master_client.cpp: Used main branch version - mooncake-store/src/transfer_task.cpp: Used main branch version - mooncake-store/tests/*: Used main branch test implementations Key changes integrated: - Enhanced storage backend functionality - Ascend transport support - Improved error handling and batch operations - Updated test infrastructure
…vice (kvcache-ai#562) * refactor: introduce expected pattern for error handling in master service - Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern - Improve error handling clarity by separating success values from error codes - Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode> - Modify RPC service interfaces to support expected pattern - Update all related tests to handle new expected return types - Add necessary includes for ylt/util/expected.hpp This change makes error handling more explicit and type-safe: - Success cases can be accessed via .value() - Error cases can be accessed via .error() - Eliminates ambiguity between success and error states Future work: - Extend expected pattern to RPC response types - Enhance error code system for more comprehensive error handling * Refactor error handling to use expected<T,E> pattern instead of ErrorCode - Updated MasterService methods to replace ylt::expected with tl::expected for better error handling. - Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types. - Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected. - Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type. - Improved overall error reporting in tests to provide clearer feedback on operation failures. * fix test compile * refactor: update client implementation and remove master.proto - Enhanced client.h and client.cpp with new functionality - Removed obsolete master.proto file - Updated master_client.cpp and transfer_task.cpp - Improved integration and stress tests * fix: update Python integration to work with new batch API - Replace BatchObjectInfo with vector<vector<Replica::Descriptor>> - Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>> - Fix BatchQuery API usage to work with new expected pattern - Convert unordered_map to vector format for BatchPut parameter compatibility * refine master log and metric * fix(ci): should alloc first * merge main * fix tests
…vice (kvcache-ai#562) * refactor: introduce expected pattern for error handling in master service - Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern - Improve error handling clarity by separating success values from error codes - Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode> - Modify RPC service interfaces to support expected pattern - Update all related tests to handle new expected return types - Add necessary includes for ylt/util/expected.hpp This change makes error handling more explicit and type-safe: - Success cases can be accessed via .value() - Error cases can be accessed via .error() - Eliminates ambiguity between success and error states Future work: - Extend expected pattern to RPC response types - Enhance error code system for more comprehensive error handling * Refactor error handling to use expected<T,E> pattern instead of ErrorCode - Updated MasterService methods to replace ylt::expected with tl::expected for better error handling. - Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types. - Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected. - Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type. - Improved overall error reporting in tests to provide clearer feedback on operation failures. * fix test compile * refactor: update client implementation and remove master.proto - Enhanced client.h and client.cpp with new functionality - Removed obsolete master.proto file - Updated master_client.cpp and transfer_task.cpp - Improved integration and stress tests * fix: update Python integration to work with new batch API - Replace BatchObjectInfo with vector<vector<Replica::Descriptor>> - Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>> - Fix BatchQuery API usage to work with new expected pattern - Convert unordered_map to vector format for BatchPut parameter compatibility * refine master log and metric * fix(ci): should alloc first * merge main * fix tests
…vice (kvcache-ai#562) * refactor: introduce expected pattern for error handling in master service - Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern - Improve error handling clarity by separating success values from error codes - Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode> - Modify RPC service interfaces to support expected pattern - Update all related tests to handle new expected return types - Add necessary includes for ylt/util/expected.hpp This change makes error handling more explicit and type-safe: - Success cases can be accessed via .value() - Error cases can be accessed via .error() - Eliminates ambiguity between success and error states Future work: - Extend expected pattern to RPC response types - Enhance error code system for more comprehensive error handling * Refactor error handling to use expected<T,E> pattern instead of ErrorCode - Updated MasterService methods to replace ylt::expected with tl::expected for better error handling. - Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types. - Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected. - Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type. - Improved overall error reporting in tests to provide clearer feedback on operation failures. * fix test compile * refactor: update client implementation and remove master.proto - Enhanced client.h and client.cpp with new functionality - Removed obsolete master.proto file - Updated master_client.cpp and transfer_task.cpp - Improved integration and stress tests * fix: update Python integration to work with new batch API - Replace BatchObjectInfo with vector<vector<Replica::Descriptor>> - Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>> - Fix BatchQuery API usage to work with new expected pattern - Convert unordered_map to vector format for BatchPut parameter compatibility * refine master log and metric * fix(ci): should alloc first * merge main * fix tests
Overview
This PR introduces the
tl::expectedpattern for error handling in the master service, replacing traditionalErrorCodereturn types with a more type-safe and explicit approach.Changes
Core Improvements
ErrorCodereturn types withtl::expected<T, ErrorCode>pattern.value().error()Modified Components
expected<void, ErrorCode>orexpected<T, ErrorCode>ylt/util/expected.hppFiles Changed
mooncake-store/include/master_service.h- Updated method signaturesmooncake-store/include/rpc_service.h- RPC interface adaptationsmooncake-store/include/types.h- Minor formatting improvementsmooncake-store/src/master_service.cpp- Implementation updatesmooncake-store/tests/- Comprehensive test updatesBenefits
Future Work
This PR lays the groundwork for:
Testing
All existing tests have been updated and continue to pass, ensuring backward compatibility and correctness of the new error handling approach.
Note: This change maintains API compatibility while improving error handling semantics. The expected pattern provides a modern, type-safe approach to error handling that will benefit future development and maintenance.