Skip to content

refactor: introduce expected pattern for error handling in master service#562

Merged
xiaguan merged 10 commits intokvcache-ai:mainfrom
xiaguan:master_expected
Jul 3, 2025
Merged

refactor: introduce expected pattern for error handling in master service#562
xiaguan merged 10 commits intokvcache-ai:mainfrom
xiaguan:master_expected

Conversation

@xiaguan
Copy link
Copy Markdown
Collaborator

@xiaguan xiaguan commented Jun 26, 2025

Overview

This PR introduces the tl::expected pattern for error handling in the master service, replacing traditional ErrorCode return types with a more type-safe and explicit approach.

Changes

Core Improvements

  • Type-safe error handling: Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern
  • Clearer API semantics: Success values and error codes are now explicitly separated
  • Better developer experience:
    • Success cases: access via .value()
    • Error cases: access via .error()
    • Eliminates ambiguity between success and error states

Modified Components

  • MasterService methods: Updated to return expected<void, ErrorCode> or expected<T, ErrorCode>
  • RPC service interfaces: Modified to support the expected pattern
  • Test suites: Updated all related tests to handle new expected return types
  • Dependencies: Added necessary includes for ylt/util/expected.hpp

Files Changed

  • mooncake-store/include/master_service.h - Updated method signatures
  • mooncake-store/include/rpc_service.h - RPC interface adaptations
  • mooncake-store/include/types.h - Minor formatting improvements
  • mooncake-store/src/master_service.cpp - Implementation updates
  • mooncake-store/tests/ - Comprehensive test updates

Benefits

  1. Type Safety: Compile-time guarantees about error handling
  2. Explicit Error Handling: No more ambiguous return codes
  3. Better Maintainability: Clear separation of success and error paths
  4. Future-Ready: Foundation for extending expected pattern to RPC responses

Future Work

This PR lays the groundwork for:

  • Extending expected pattern to RPC response types
  • Enhancing the error code system for more comprehensive error handling
  • Further improving type safety across the codebase

Testing

All existing tests have been updated and continue to pass, ensuring backward compatibility and correctness of the new error handling approach.


Note: This change maintains API compatibility while improving error handling semantics. The expected pattern provides a modern, type-safe approach to error handling that will benefit future development and maintenance.

@xiaguan xiaguan force-pushed the master_expected branch 4 times, most recently from fa9d130 to a4a9a49 Compare June 30, 2025 06:17
@xiaguan xiaguan requested review from Copilot and ykwd and removed request for ykwd June 30, 2025 06:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR modernizes error handling across the master service and client layers by adopting tl::expected<T, ErrorCode>, updating method signatures, RPC adapters, and test suites to use has_value()/.error().

  • Core RPC and service methods now return tl::expected instead of ErrorCode.
  • Test code is updated to assert on .has_value() and extract .error() where appropriate.
  • Introduced execute_rpc helper in rpc_helper.h and enhanced logging in scoped_vlog_timer.h.

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/stress_workload_test.cpp Updated mount/unmount calls to use .has_value() checks
tests/client_integration_test.cpp Refactored client calls to expected pattern; minor typo introduced
tests/master_service_test.cpp Converted EXPECT_EQ on ErrorCode to EXPECT_TRUE/.error() assertions
include/rpc_helper.h Added generic execute_rpc helper for RPC logging and metrics
include/utils/scoped_vlog_timer.h Enhanced ScopedVLogTimer with LogResponseExpected for tl::expected

Comment thread mooncake-store/tests/stress_workload_test.cpp Outdated
Comment thread mooncake-store/tests/stress_workload_test.cpp Outdated
Comment thread mooncake-store/tests/client_integration_test.cpp Outdated
Copy link
Copy Markdown
Collaborator

@ykwd ykwd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Just a few suggestions in the comments.

Comment thread mooncake-store/include/rpc_helper.h Outdated
// Special handling for bool to log the value, similar to the original
// implementation of ExistKey.
if constexpr (std::is_same_v<ResultType, bool>) {
timer.LogResponse("success=true, exist=", result.value());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Using success=true or false is a little bit ambiguous. Also, the "exist=" may only be suitable if the returned value denotes whether something exists or not. How about timer.LogResponse("status=success, return=", result.value());

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Resolved.

Comment thread mooncake-store/include/rpc_helper.h Outdated
if constexpr (std::is_same_v<ResultType, bool>) {
timer.LogResponse("success=true, exist=", result.value());
} else {
timer.LogResponse("success=true");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timer.LogResponse("status=success“);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Resolved.

} else {
inc_fail_metric();
timer.LogResponse("success=false, error=", toString(result.error()));
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timer.LogResponse("status=fail, error=", toString(result.error()));

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

Comment thread mooncake-store/include/rpc_helper.h Outdated
auto execute_rpc(std::string_view rpc_name, RpcCallable&& rpc_call,
LogRequestCallable&& log_request,
std::function<void()> inc_req_metric,
std::function<void()> inc_fail_metric) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like the rpc_call and log_request, why don't we also use template for in_req_metric and inc_fail_metric? The template can prevent the overhead of the std::function

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sovled

}
std::vector<tl::expected<std::vector<Replica::Descriptor>, ErrorCode>>
results;
results.reserve(keys.size());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we record batchGet and batchPut metrics?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

xiaguan added 6 commits June 30, 2025 19:52
…vice

- Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern
- Improve error handling clarity by separating success values from error codes
- Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode>
- Modify RPC service interfaces to support expected pattern
- Update all related tests to handle new expected return types
- Add necessary includes for ylt/util/expected.hpp

This change makes error handling more explicit and type-safe:
- Success cases can be accessed via .value()
- Error cases can be accessed via .error()
- Eliminates ambiguity between success and error states

Future work:
- Extend expected pattern to RPC response types
- Enhance error code system for more comprehensive error handling
…Code

- Updated MasterService methods to replace ylt::expected with tl::expected for better error handling.
- Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types.
- Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected.
- Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type.
- Improved overall error reporting in tests to provide clearer feedback on operation failures.
- Enhanced client.h and client.cpp with new functionality
- Removed obsolete master.proto file
- Updated master_client.cpp and transfer_task.cpp
- Improved integration and stress tests
- Replace BatchObjectInfo with vector<vector<Replica::Descriptor>>
- Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>>
- Fix BatchQuery API usage to work with new expected pattern
- Convert unordered_map to vector format for BatchPut parameter compatibility
xiaguan added 3 commits July 3, 2025 10:57
Resolved merge conflicts in multiple files:
- mooncake-integration/store/store_py.cpp: Updated replica handling to support both memory and disk replicas
- mooncake-store/include/client.h: Unified API signatures for Get and BatchPut methods
- mooncake-store/include/rpc_service.h: Added response structures and fixed Ping method
- mooncake-store/src/client.cpp: Adopted main branch implementation with storage backend support
- mooncake-store/src/master_client.cpp: Used main branch version
- mooncake-store/src/transfer_task.cpp: Used main branch version
- mooncake-store/tests/*: Used main branch test implementations

Key changes integrated:
- Enhanced storage backend functionality
- Ascend transport support
- Improved error handling and batch operations
- Updated test infrastructure
@xiaguan xiaguan merged commit bde2fca into kvcache-ai:main Jul 3, 2025
10 checks passed
201341 pushed a commit to 201341/Mooncake that referenced this pull request Jul 22, 2025
…vice (kvcache-ai#562)

* refactor: introduce expected pattern for error handling in master service

- Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern
- Improve error handling clarity by separating success values from error codes
- Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode>
- Modify RPC service interfaces to support expected pattern
- Update all related tests to handle new expected return types
- Add necessary includes for ylt/util/expected.hpp

This change makes error handling more explicit and type-safe:
- Success cases can be accessed via .value()
- Error cases can be accessed via .error()
- Eliminates ambiguity between success and error states

Future work:
- Extend expected pattern to RPC response types
- Enhance error code system for more comprehensive error handling

* Refactor error handling to use expected<T,E> pattern instead of ErrorCode

- Updated MasterService methods to replace ylt::expected with tl::expected for better error handling.
- Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types.
- Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected.
- Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type.
- Improved overall error reporting in tests to provide clearer feedback on operation failures.

* fix test compile

* refactor: update client implementation and remove master.proto

- Enhanced client.h and client.cpp with new functionality
- Removed obsolete master.proto file
- Updated master_client.cpp and transfer_task.cpp
- Improved integration and stress tests

* fix: update Python integration to work with new batch API

- Replace BatchObjectInfo with vector<vector<Replica::Descriptor>>
- Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>>
- Fix BatchQuery API usage to work with new expected pattern
- Convert unordered_map to vector format for BatchPut parameter compatibility

* refine master log and metric

* fix(ci): should alloc first

* merge main

* fix tests
wanyue-wy pushed a commit to wanyue-wy/Mooncake that referenced this pull request Dec 14, 2025
…vice (kvcache-ai#562)

* refactor: introduce expected pattern for error handling in master service

- Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern
- Improve error handling clarity by separating success values from error codes
- Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode>
- Modify RPC service interfaces to support expected pattern
- Update all related tests to handle new expected return types
- Add necessary includes for ylt/util/expected.hpp

This change makes error handling more explicit and type-safe:
- Success cases can be accessed via .value()
- Error cases can be accessed via .error()
- Eliminates ambiguity between success and error states

Future work:
- Extend expected pattern to RPC response types
- Enhance error code system for more comprehensive error handling

* Refactor error handling to use expected<T,E> pattern instead of ErrorCode

- Updated MasterService methods to replace ylt::expected with tl::expected for better error handling.
- Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types.
- Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected.
- Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type.
- Improved overall error reporting in tests to provide clearer feedback on operation failures.

* fix test compile

* refactor: update client implementation and remove master.proto

- Enhanced client.h and client.cpp with new functionality
- Removed obsolete master.proto file
- Updated master_client.cpp and transfer_task.cpp
- Improved integration and stress tests

* fix: update Python integration to work with new batch API

- Replace BatchObjectInfo with vector<vector<Replica::Descriptor>>
- Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>>
- Fix BatchQuery API usage to work with new expected pattern
- Convert unordered_map to vector format for BatchPut parameter compatibility

* refine master log and metric

* fix(ci): should alloc first

* merge main

* fix tests
JasonZhang517 pushed a commit to JasonZhang517/Mooncake that referenced this pull request Feb 9, 2026
…vice (kvcache-ai#562)

* refactor: introduce expected pattern for error handling in master service

- Replace ErrorCode return types with tl::expected<T, ErrorCode> pattern
- Improve error handling clarity by separating success values from error codes
- Update MasterService methods to return expected<void, ErrorCode> or expected<T, ErrorCode>
- Modify RPC service interfaces to support expected pattern
- Update all related tests to handle new expected return types
- Add necessary includes for ylt/util/expected.hpp

This change makes error handling more explicit and type-safe:
- Success cases can be accessed via .value()
- Error cases can be accessed via .error()
- Eliminates ambiguity between success and error states

Future work:
- Extend expected pattern to RPC response types
- Enhance error code system for more comprehensive error handling

* Refactor error handling to use expected<T,E> pattern instead of ErrorCode

- Updated MasterService methods to replace ylt::expected with tl::expected for better error handling.
- Modified BatchGetReplicaList, BatchPutStart, BatchPutEnd, and other methods to return tl::expected types.
- Enhanced ClientIntegrationTest to handle expected results from Put, Get, Remove, and other operations using tl::expected.
- Adjusted error handling in clientctl and master_metrics_test to utilize the new expected type.
- Improved overall error reporting in tests to provide clearer feedback on operation failures.

* fix test compile

* refactor: update client implementation and remove master.proto

- Enhanced client.h and client.cpp with new functionality
- Removed obsolete master.proto file
- Updated master_client.cpp and transfer_task.cpp
- Improved integration and stress tests

* fix: update Python integration to work with new batch API

- Replace BatchObjectInfo with vector<vector<Replica::Descriptor>>
- Update BatchPut to handle new return type vector<tl::expected<void, ErrorCode>>
- Fix BatchQuery API usage to work with new expected pattern
- Convert unordered_map to vector format for BatchPut parameter compatibility

* refine master log and metric

* fix(ci): should alloc first

* merge main

* fix tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants