Conversation
*suppot post wqe and poll cq *use new abi *successful build, pending debug
NOTE: bnxt dont use __be*, no need for conversion. TODO: modify lkey & rkey from uint64_t to uint32_t
successfully implemented the following ops for bnxt IBGDA: write, write inline, atomic CAS and FA, poll CQ and ringDB
NOTE: mlx5 recv ops need write dbr
bnxt recv ops need ring doorbell(write header into dbr)
* TODO: confirm the latency value for multiple ring doorbell
NOTE: When multiple rapid writes to the doorbell will
trigger cqe coaleasec and only return the subsequent cqe.
use tools/bnxt_disable_cq_coal.sh to disable this func.
* still not sure how bnxt cqe behaves, so use the serial quiet for now
* poll cqe hang WA * fix internode segFault * reduce memory overuse * add ShmemQuietThread in dispatch --------- Co-authored-by: jhchouuu <jiahzhou@amd.com>
There was a problem hiding this comment.
Pull Request Overview
This pull request adds support for Broadcom BNXT RDMA adapters and modifies the SHMEM API to improve completion queue efficiency. The implementation unifies IBGDA operations between MLX5 and BNXT devices while introducing CQE signaling control options.
- Adds comprehensive BNXT RDMA support with device primitives, QP containers, and endpoint management
- Modifies SHMEM API to return one CQE per batch of WQE operations for improved efficiency
- Unifies IBGDA API design between BNXT and MLX5 with shared operation interfaces
Reviewed Changes
Copilot reviewed 38 out of 39 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| CMakeLists.txt | Adds USE_BNXT build option and library detection logic |
| src/application/transport/rdma/providers/bnxt/ | Complete BNXT provider implementation with device primitives and host/device APIs |
| include/mori/core/transport/rdma/providers/bnxt/ | BNXT headers including HSI definitions and device primitive interfaces |
| src/application/transport/rdma/rdma.cpp | Integrates BNXT device factory and vendor ID support |
| include/mori/shmem/shmem_ibgda_kernels.hpp | Extensive SHMEM kernel modifications for provider dispatch and BNXT support |
| tests/cpp/io/test_engine.cpp | Fixes printf format specifier for 64-bit values |
Comments suppressed due to low confidence (1)
src/io/rdma/backend_impl.cpp:1
- The assert statement uses a string literal instead of a boolean expression. This will always evaluate to true and never trigger an assertion failure. Should be
assert(false && \"message\")or use a proper conditional.
// Copyright © Advanced Micro Devices, Inc. All rights reserved.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| RdmaContext::RdmaContext(RdmaBackendType backendType) : backendType(backendType) { | ||
| deviceList = ibv_get_device_list(nullptr); | ||
| deviceList = ibv_get_device_list(&nums_device); | ||
| Initialize(); | ||
| } |
There was a problem hiding this comment.
The variable nums_device is used but not declared in this scope. It appears to be a member variable that should be declared in the class definition.
|
|
||
| // TODO: write a better version | ||
| static __device__ __host__ void DumpWqe(void* wqeBaseAddr, uint32_t idx) { | ||
| // TODO: write a better verison |
There was a problem hiding this comment.
Typo in comment: 'verison' should be 'version'.
| // TODO: write a better verison | |
| // TODO: write a better version |
| constexpr int sendWqeSize = | ||
| sizeof(struct bnxt_re_bsqe) + sizeof(struct bnxt_re_send) + sizeof(struct bnxt_re_sge); | ||
| constexpr int slotsNum = CeilDiv(sendWqeSize, BNXT_RE_SLOT_SIZE); | ||
|
|
||
| int psnCnt = (bytes == 0) ? 1 : (bytes + mtuSize - 1) / mtuSize; |
There was a problem hiding this comment.
The sendWqeSize calculation is duplicated across multiple functions. Consider extracting this to a common constant or helper function to avoid code duplication.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Amazing PR! I've left several comments but overall, the code LGTM. Thanks for the hard work @jhchouuu |
Broadcom BNXT IBGDA Support
BNXT IBGDA functions are now supported.
Use
USE_BNXT=ONto enable BNXT features.Shmem API implementation modification
Shmem API now returns one CQE per batch of WQE.
IBGDA API Unification
BNXT and MLX5 share the same basic operation API.
Added
cqeSignaloption for CQE signaling control.