ffi: Add class for key-value pair log events.#507
Conversation
| if (false == value.has_value()) { | ||
| // Empty value | ||
| if (SchemaTreeNode::Type::Obj != type) { | ||
| return std::errc::protocol_error; | ||
| } |
There was a problem hiding this comment.
I kinda feel like this can be moved into the value type validation method.
There was a problem hiding this comment.
prefer to leave it outside for two reasons:
- empty isn't a
Valuetype - This simplifies the helper function so it doesn't have to check
has_value()and we don't need to symbols inside the helper (one for thestd::optional<Value>and the over for the reference of the underlyingValue)
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
haiqi96
left a comment
There was a problem hiding this comment.
Just quickly skimmed through the class and haven't read the unit tests yet.
| * `node_id_value_pairs`. A node is considered a leaf if none of its descendants appear in the | ||
| * `node_id_value_pairs`. | ||
| */ | ||
| [[nodiscard]] auto is_leaf_node( |
There was a problem hiding this comment.
Disclaimer: I only have a rough understanding on how schema tree works.
This function is bit confusing for me.
"A node is considered a leaf if none of its descendants appear in the node_id_value_pairs." <- This doesn't seem to be the normal definition for a leaf node. Is this expected?
In addition, "the sub schema tree defined by node_id_value_pairs." Is there any guarantee that nodes in the node_id_value_pairs will form a subtree? Just looking at the argument without thinking about how it is generated, it is possible node_id_value_pairs contains a list of unconnected nodes.
And lastly a highlevel comment is that I feel this function should belong to the Schema tree class.
There was a problem hiding this comment.
This is a validation method: if an object type node appears in the key value pairs, it must be a leaf node. The way we store key value pairs is to store the schema (all as leaf nodes), so the path from the root to the leaves are implicitly indicated. In the merged schema tree, usually an object type node is a non leaf node. But in a key value pairs' schema, this node can be a leaf node as long as its value is {} or null. Our goal is to validate that if such a node appears, all its descendants in the merged schema tree can't appear in the schema of the given key value pair log event. it is possible node_id_value_pairs contains a list of unconnected nodes isn't true since everything will be connected to the root if they are valid leaf nodes. This function is used to validate pairs using schema tree's information. I think we should split it from a schema tree's method.
| auto const& node{schema_tree.get_node(node_id)}; | ||
| auto const node_type{node.get_type()}; | ||
| if (false == value.has_value()) { | ||
| // Value is an empty object (`{}`, which is not the same as `null`) |
There was a problem hiding this comment.
I assume this comment tries to explain why the line doesn't use value.is_null(); in a way similar to line 68?
I don't think I follow the comment though. Can you clarify a bit?
And we can combine the two if statements?
There was a problem hiding this comment.
{}andnullare two different types of values.- I'd prefer to handle them separately. Combining these two statements will make it very hard to read
There was a problem hiding this comment.
I still don't get when there will be {} and when there will be null (and why we only check for {} but not null in this function)
Is it because null is already handled in the validation? If so, I think you should leave this information in the comment.
There was a problem hiding this comment.
{} is considered as a special case, where null is a valid primitive value (like an int or a string). These are user-input values.
| m_utc_offset{utc_offset} {} | ||
|
|
||
| // Variables | ||
| std::shared_ptr<SchemaTree> m_schema_tree; |
There was a problem hiding this comment.
Techinically it is possible to modify the original schema tree using this pointer in the class, right?
Should we consider using std::shared_ptr<SchemaTree const> ?
| KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs | ||
| ) -> std::optional<std::errc> { | ||
| std::optional<std::errc> ret_val; | ||
| std::unordered_map<SchemaTreeNode::id_t, std::unordered_set<std::string_view>> |
There was a problem hiding this comment.
Let's move this into the try to reduce its scope.
| * @param node_id_value_pairs | ||
| * @return `std::nullopt` if the inputs are valid, or an error code indicating the failure: | ||
| * - std::errc::operation_not_permitted if a node ID doesn't represent a valid node in the | ||
| * schema tree, or a non-leaf node ID is paired with a value. |
There was a problem hiding this comment.
A non-leaf node paired with a value actually returns protocol_not_supported. That said, I think it should return operation_not_permitted since if it was permitted, node_id_value_pairs wouldn't be a tree.
| } | ||
| } | ||
|
|
||
| auto clone_value(Value const& value) -> Value { |
There was a problem hiding this comment.
Hm, I missed this in the previous PRs but it seems like we should just have copy constructors for Value and EncodedTextAst. It feels like an overoptimization to force people to never use copy construction unless they write a method manually like this.
There was a problem hiding this comment.
We can add a copy constructor for EncodedTextAst, however, I'd prefer to not have a copy constructor for Value to enforce move semantics. How about making this clone method as a member of Value, so that people can explicitly copy a value when they really need
There was a problem hiding this comment.
What's bad about copying a Value? Is it not similar to copying a string?
There was a problem hiding this comment.
Copying string and EncodedTextAst can be expensive in deserialization. With copy disabled, we are more confident that there's no copy overhead in deserialization.
There was a problem hiding this comment.
Adding move constructor for EncodedTextAst and Value
| } | ||
|
|
||
| { | ||
| // Empty invalid_node_id_value_pairs |
| } | ||
|
|
||
| // NOLINTNEXTLINE(readability-function-cognitive-complexity) | ||
| TEST_CASE("ffi_KeyValuePairLogEvent_create", "[ffi]") { |
There was a problem hiding this comment.
You can reduce the amount of code for this test by using SECTION and even nested SECTION macros. Recall that CMake runs every SECTION from the beginning of the test, so the schema tree, node_id_value_pairs would be reset for each outer SECTION.
E.g., instead of cloning valid_node_id_value_pairs, you could just create it outside a SECTION and use it inside different SECTIONs.
There was a problem hiding this comment.
I actually thought about using SECTION but I didn't. Two reasons:
- I don't want to rerun everything from the beginning
- I want to test that the same schema tree is used across different log events (which have different life cycles)
There was a problem hiding this comment.
- But by using sections, you'd only run the things outside the section(s) from the beginning, right?
- Testing the same instance of the schema tree across all of the tests seems a bit extra since it should be const in the class anyway (and if it wasn't const, you're still relying on finding a bug in the implementation with a set of test cases that isn't exhaustive). Imo, having less code in the test is more valuable than testing that corner case.
From my reading, you could do something like:
- Create schema tree
- Section (optional): Test empty id-value pairs
- Section (optional): Test mismatched types
- Section (optional): Test valid id-value pairs
- Section: Test duplicate key conflicts
- Section: Test implicit key conflicts
- Section (optional): Test out of bound node ID
That would allow you to at least get rid of the clone_node_id_value_pairs function (and make the code a bit easier to read).
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
| auto operator=(Value const&) -> Value& = delete; | ||
|
|
||
| // Default move constructor and assignment operator | ||
| // Default copy/move constructor and assignment operator |
There was a problem hiding this comment.
Since everything except the constructor is default, we don't need to say they're default explicitly, right?
| auto operator=(EncodedTextAst const&) -> EncodedTextAst& = delete; | ||
|
|
||
| // Default move constructor and assignment operator | ||
| // Default copy/move constructor and assignment operator |
| REQUIRE_FALSE(result.has_error()); | ||
|
|
||
| SECTION("Test duplicated key conflict on node #3") { | ||
| auto conflict_pairs{valid_node_id_value_pairs}; |
There was a problem hiding this comment.
I thought the idea of using nested sections (besides breaking the code up) is that you wouldn't need to duplicate valid_node_id_value_pairs since it would start with a fresh state in each section, no? (You'd obviously call it something other than valid_node_id_value_pairs, but still.
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Description
This PR introduces a new class
KeyValuePairLogEvent. This class represents a key-value pair log event serialized by the key-value pair IR format. It consists of the following members:Value, where the id refers to the schema tree node id.An instance must be created using the factory function. The factory function will validate whether the value's type matches the corresponding key. The input of the factory function should come from the deserialization methods, which will be used in the coming PRs.
Validation performed