
[Bug] parseMessageMetadata error when broker entry metadata enable with high loading #22601

@semistone

Description


Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

  • 3.2.2
  • 3.1.2

Minimal reproduce step

Publish events at about 6k QPS and 100 Mbit/s, with message metadata, using BatcherBuilder.KEY_BASED batching, and send messages from highly concurrent/parallel producer processes.
The problem only occurs for near-real-time consumers (almost zero backlog).
A sketch of the producer setup is shown below.
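
For reference, here is a minimal sketch of the kind of producer setup we use, not our actual code. The service URL, topic name, and key distribution are placeholders, and it assumes broker entry metadata is enabled on the broker side (brokerEntryMetadataInterceptors in broker.conf). In reality many producer processes run this loop in parallel.

import org.apache.pulsar.client.api.BatcherBuilder;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class KeyBasedProducerSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL; the real deployment publishes ~6k QPS / 100 Mbit/s in total.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Key-based batching, as in the report.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/stress-topic")
                .enableBatching(true)
                .batcherBuilder(BatcherBuilder.KEY_BASED)
                .create();

        for (int i = 0; i < 1_000_000; i++) {
            producer.newMessage()
                    .key("key-" + (i % 1024))            // placeholder key distribution
                    .value(("event-" + i).getBytes())
                    .sendAsync();                        // async sends for high concurrency
        }

        producer.flush();
        producer.close();
        client.close();
    }
}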

What did you expect to see?

No lost events.

What did you see instead?

The broker logs the error
Failed to peek sticky key from the message metadata

It looks like a thread-safety issue, because it happens randomly: out of 1M events it only happens a few times, but the consumer loses those few events.

Anything else?

The error is similar to
#10967, but I think it is a different issue.

The data in BookKeeper is correct:
I can download the entry from BookKeeper and parse it successfully,
or consume the same event a few minutes later and it is consumed successfully.
However, all subscriptions get the same error on the same event when consuming in real time (zero backlog).

I have traced the source code. The failure happens in
PersistentDispatcherMultipleConsumers.readEntriesComplete -> AbstractBaseDispatcher.filterEntriesForConsumer
-> Commands.peekAndCopyMessageMetadata
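
Commands.peekAndCopyMessageMetadata has to recognize the entry layout by a 2-byte magic at the current readerIndex (optionally the broker entry metadata header first, then the checksum and message metadata). While debugging I check the magic before peeking; the helper below is only my own sketch, and the constant values reflect my reading of Commands.java, so treat them as assumptions.

import io.netty.buffer.ByteBuf;

// Hypothetical debug helper: peek the 2-byte magic at the current readerIndex
// without consuming it. 0x0e01 is magicCrc32c (checksum + message metadata),
// 0x0e02 is magicBrokerEntryMetadata (values taken from Commands.java).
static boolean startsWithKnownMagic(ByteBuf metadataAndPayload) {
    if (metadataAndPayload.readableBytes() < 2) {
        return false;
    }
    short magic = metadataAndPayload.getShort(metadataAndPayload.readerIndex());
    return magic == (short) 0x0e01 || magic == (short) 0x0e02;
}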

I also printed the ByteBuf contents, and the data clearly differs from what is stored in BookKeeper.

In a normal event, the hex dump usually starts with 010e (magicCrc32c):

0000000      010e    9529    5fbc    0000    0a03    3a0a    6e69    7267

In one of the broken events, the ByteBuf starts with about 48 bytes of garbage, then continues with the normal data:

0000000      0000    a610    0000    0000    0200    7239    0000    0000 <== garbage starts here
0000020      0200    1339    0000    0000    ea17    a8b0    8b8e    fa5e
0000040      2af0    2675    f645    1623    d17e    dc34    526d    ef44 <== garbage ends here
0000060      010e    9529    5fbc    0000    0a03    3a0a    6e69    7267 <== normal data resumes here

This is just one example; sometimes the first few bytes are correct and the corruption starts a few bytes later.

I am still trying to debug when and how the ByteBuf returns incorrect data, and why it only happens during stress testing. It is still not easy to reproduce using the perf tool, but we can 100% reproduce it in our producer code.
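
As a debugging aid, I plan to dump the buffer state whenever the peek fails, so the reference count and raw bytes can be compared with the entry read back from BookKeeper. This is only a sketch of instrumentation I would add around the dispatcher, not existing Pulsar code.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of debug instrumentation: log the suspect entry's reference count,
// reader index, and contents for comparison with what is stored in BookKeeper.
class EntryDebug {
    private static final Logger log = LoggerFactory.getLogger(EntryDebug.class);

    static void dumpSuspectEntry(long ledgerId, long entryId, ByteBuf metadataAndPayload) {
        log.warn("Suspect entry {}:{} refCnt={} readerIndex={} readableBytes={}\n{}",
                ledgerId, entryId,
                metadataAndPayload.refCnt(),
                metadataAndPayload.readerIndex(),
                metadataAndPayload.readableBytes(),
                ByteBufUtil.prettyHexDump(metadataAndPayload));
    }
}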

Does anyone have an idea what could be causing this issue, or which library or class might have a problem? Any suggestions on how to debug it, or what specific information I should print to help identify the root cause, would also be appreciated. Thank you.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
