Skip to content

Use TCP to send AUX#2

Merged
maning00 merged 3 commits intov0.5.5.post2.mori-iofrom
debug
Nov 27, 2025
Merged

Use TCP to send AUX#2
maning00 merged 3 commits intov0.5.5.post2.mori-iofrom
debug

Conversation

@maning00
Copy link
Copy Markdown
Owner

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @maning00, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the auxiliary data transfer mechanism within the system, transitioning from RDMA to TCP for improved flexibility or compatibility. It introduces a dedicated codec for efficient data handling and updates the core KV manager to support the new TCP-based communication flow for auxiliary data.

Highlights

  • Protocol Change for AUX Data: The mechanism for transferring auxiliary data (AUX) has been switched from RDMA to TCP, indicating a shift in the underlying communication protocol for this data type.
  • New Data Codec for Serialization: A new AuxDataCodec class has been introduced to handle the serialization and deserialization of auxiliary data buffers, specifically designed for efficient TCP transmission.
  • MoriKVManager Updates for TCP Integration: The MoriKVManager has been significantly updated to integrate TCP-based sending and receiving of AUX data, including a new _handle_aux_data method for processing incoming TCP messages and refactoring send_aux to use the new TCP logic.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@maning00 maning00 merged commit 544c10e into v0.5.5.post2.mori-io Nov 27, 2025
1 check failed
@maning00 maning00 deleted the debug branch November 27, 2025 06:15
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the auxiliary data transfer to use TCP via ZMQ instead of RDMA. This involves adding an AuxDataCodec for data serialization and updating the MoriKVManager to handle TCP-based communication for auxiliary data. The overall logic seems sound. I've provided a high-severity comment regarding a potential memory corruption vulnerability due to a missing size check, and a medium-severity comment to improve maintainability by replacing magic numbers with named constants.

Comment on lines +163 to +168
dst_aux_ptr = kv_args.aux_data_ptrs[buffer_index]
item_len = kv_args.aux_item_lens[buffer_index]
dst_addr = dst_aux_ptr + item_len * aux_index
buffer = (ctypes.c_byte * len(data)).from_address(dst_addr)
buffer[:] = data
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation does not validate the size of the received data against the allocated buffer size (item_len). If len(data) is greater than item_len, this could lead to a buffer overflow, writing past the intended memory region for the given aux_index. This can cause memory corruption.

To prevent this, a check should be added to ensure the received data size does not exceed the allocated item length.

Suggested change
dst_aux_ptr = kv_args.aux_data_ptrs[buffer_index]
item_len = kv_args.aux_item_lens[buffer_index]
dst_addr = dst_aux_ptr + item_len * aux_index
buffer = (ctypes.c_byte * len(data)).from_address(dst_addr)
buffer[:] = data
return
dst_aux_ptr = kv_args.aux_data_ptrs[buffer_index]
item_len = kv_args.aux_item_lens[buffer_index]
if len(data) > item_len:
logger.error(
f"Received aux data size ({len(data)}) is larger than "
f"the allocated item length ({item_len})."
)
return
dst_addr = dst_aux_ptr + item_len * aux_index
buffer = (ctypes.c_byte * len(data)).from_address(dst_addr)
buffer[:] = data

Comment on lines +650 to +654
room = int(msg[1].decode("ascii"))
buffer_index = int(msg[2].decode("ascii"))
aux_index = int(msg[3].decode("ascii"))
data_length = struct.unpack(">I", msg[4])[0]
data = msg[5]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The indices for accessing parts of the msg list are hardcoded as magic numbers (e.g., msg[1], msg[2]). This makes the code harder to read and maintain. If the message format changes, these indices would need to be updated in multiple places, which is error-prone.

It's recommended to define these indices as constants within the MoriKVManager class. This would make the code more self-documenting and easier to refactor.

For example, you could add these constants to MoriKVManager (around line 172):

class MoriKVManager(CommonKVManager):
    AUX_DATA_HEADER = b"AUX_DATA"
    _AUX_MSG_IDX_ROOM = 1
    _AUX_MSG_IDX_BUFFER_IDX = 2
    _AUX_MSG_IDX_AUX_IDX = 3
    _AUX_MSG_IDX_LEN = 4
    _AUX_MSG_IDX_DATA = 5

And then use them here and in send_aux_data_to_endpoint.

Suggested change
room = int(msg[1].decode("ascii"))
buffer_index = int(msg[2].decode("ascii"))
aux_index = int(msg[3].decode("ascii"))
data_length = struct.unpack(">I", msg[4])[0]
data = msg[5]
room = int(msg[self._AUX_MSG_IDX_ROOM].decode("ascii"))
buffer_index = int(msg[self._AUX_MSG_IDX_BUFFER_IDX].decode("ascii"))
aux_index = int(msg[self._AUX_MSG_IDX_AUX_IDX].decode("ascii"))
data_length = struct.unpack(">I", msg[self._AUX_MSG_IDX_LEN])[0]
data = msg[self._AUX_MSG_IDX_DATA]

maning00 added a commit that referenced this pull request Dec 3, 2025
* add disable notif

* send aux with tcp

* remove unused log

---------

Co-authored-by: cwortman-amd <cwortman@amd.com>
maning00 added a commit that referenced this pull request Dec 3, 2025
* add disable notif

* send aux with tcp

* remove unused log

---------

Co-authored-by: cwortman-amd <cwortman@amd.com>
maning00 added a commit that referenced this pull request Dec 23, 2025
* add disable notif

* send aux with tcp

* remove unused log

---------

Co-authored-by: cwortman-amd <cwortman@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants