Stop runner from hanging indefinitely within ubuntu docker images [elastic/beats#29681]#441
Conversation
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
🤖 GitHub commentsTo re-run your PR in the CI, just comment with:
|
aafdb49 to
df88c62
Compare
9e284c2 to
5c3eb68
Compare
vigneshshanmugam
left a comment
There was a problem hiding this comment.
Discussed offline with @lucasfcosta, We are going to do following things
- Check if disable-gpu can work on both images.
- if yes -> do we see trace numbers differentiate with and without the flag.
- Do this only for headless mode.
a133c5d to
605e5b9
Compare
|
@vigneshshanmugam I have now:
Here are the results I've obtained for GPU enabled and disabled settings: with-gpu-enabled.json with-gpu-disabled.json The experience summary in a table:
|
__tests__/core/gatherer.test.ts
Outdated
| (wsEndpoint ? it : it.skip)( | ||
| 'does not the disable-gpu flag to start browser when running headful', | ||
| async () => { | ||
| if (!wsEndpoint) return; |
There was a problem hiding this comment.
Having this should be enough to skip the tests right? Do we need it or it.skip?
There was a problem hiding this comment.
True, it should be enough, it can be removed.
vigneshshanmugam
left a comment
There was a problem hiding this comment.
Thanks for the detailed testing on the images and the detailed information 🎉
605e5b9 to
6cfcecf
Compare
|
As mentioned by @vigneshshanmugam we'll drop further analysis on these given there's not much variation between runs and that they don't make much sense in headless mode:
Will merge as soon as CI passes and release. |
|
@vigneshshanmugam just before we push this forward, I did a rerun with enabled/disabled settings and I got bigger differences this time for 10 runs, it may indicate my last run didn't actually rebuild the The difference is still small, but I wouldn't say it's insignificant, but I wanted to confirm with you that you're still okay with releasing. (CC @andrewvc) Results are in MS when applicable (therefore not for Below you can see the testing procedure to obtain the results above Testing procedure
Here's the journey being ran by const { step, journey } = require('@elastic/synthetics');
journey('my cool journey', ({ page }) => {
step('visit website', async () => {
await page.goto('https://www.blueyard.com/');
});
}); |
|
@vigneshshanmugam @andrewvc and I have discussed the differences in performance in the comment above. ⭐ We've decided to release the package as is. We've also reached the following conclusions:
|
|
As a side note, I've just ran the same procedure outlined in this comment to benchmark journeys with and without GPU acceleration within a Docker container. For that, I've used Heartbeat's container with a locally built version of the runner. The results I got this time demonstrate that there's no significant difference between the 10 averaged runs for journeys with and without GPU acceleration. |



Summary
This PR contributes to elastic/beats#29681 as it makes the
syntheticsrunner work with theubuntuimages we will use.After this PR, the Heartbeat container should be able to execute the
syntheticsrunner.The problem in detail
The Heartbeat container wasn't able to run
syntheticsto completion, as the runner would hang indefinitely.It couldn't do so because it didn't have the necessary graphics library (
mesagl) available to make hardware acceleration work (more specifically, it seems Chromium usesosmesafor off-screen rendering) andnewPagenot to hang if called too soon.When that library wasn't available, and hardware acceleration was turned on, the creation of a new page would hang indefinitely if we immediately tried to create a new page. Just adding an
await new Promise(resolve => setTimeout(resolve, 5000))statement after the browser launch did make journeys work given it causednewPageto return (instead of hanging).Therefore, I have disabled hardware acceleration by launching Chromium with the
--disable-gpuflag, which makes it work fine in our Ubuntu image. I decided to use that flag rather than changing the base image because that's the exact fix that was applied in Playwright's master branch.How to test this PR
cwdmust be thex-pack/heartbeatfolder) usingenv PLATFORMS="+all linux/amd64" mage packageentrypointwith a script similar to the one belowsyntheticsmodule and rundist/cli.jsornpm linkit, and then runheartbeat, which will cause it to use the locally linked module.The process is the same for
elastic-agent, just make sure you change the image name toelastic-agent-complete.How I debugged it (in case it's helpful for others in a next opportunity):
dist/cli.jswithnode(whenever you change a file, make sure tonpm run build)9229port through the docker arg-p 9229:9229so that you can connect to the Node debugger inchrome://inspectfromlocalhostwhen runningnode --inspect-brk=0.0.0.0 dist/cli.js my_synthetic_path(this is how I figured out that just waiting would not make the process hang, because as I stepped through lines that was enough time for the browser to init itself)