From @marclop's findings:
As the modelindexer processes the batches it receives, it compresses each event by default before storing it in the bulk indexer cache. This compression happens while the activeMu lock is held, which stalls the entire processing pipeline until a single event has been compressed. If we defer the compression until later, or better yet remove the activeMu lock to eliminate its contention, processing can progress much faster.
Since the modelindexer's activeMu lock appears to be the offending component causing a significant bottleneck and reduced throughput, we benchmarked a modified version of APM Server in which the activeMu lock is eliminated and a new channel inside the modelindexer decouples cache writing from the HTTP request lifecycle. Additionally, because the BulkRequestItems are now sent to a queue before they are compressed, the compression and flushing of the bulk indexers can be scaled by increasing the number of "active" consumers reading from the queue. See for more details on the design.
Implement a solution in which event compression does not cause lock contention, so that APM Server can make better use of the available CPU resources.