multiprocessing can influence the results
Hi,
I'm running the esmini Python interface on Linux via Python multiprocessing, as in your examples. But sometimes I get the same results for different simulations. It does not happen every time, only occasionally. Are you using global variables, environment variables, or temporary files anywhere in the code that could be overwritten by a parallel esmini run? It's hard to say how to debug this issue, because it doesn't happen every time. In total I'm doing 8 parallel runs on 8 CPUs, 18000 simulations, running for about 8 hours.
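Roughly, the parallel setup looks like this (a simplified sketch; `simulate_one` is only a placeholder for the actual esmini call via the Python interface, and the file names and folder are just examples):

```python
import multiprocessing as mp

def simulate_one(args):
    """Placeholder worker: in the real script this calls the esmini
    Python interface (as in the Hello World examples) for one scenario."""
    run_id, xosc_path = args
    # Each run writes to its own dedicated output files.
    log_path = f"results/run{run_id}.log"
    dat_path = f"results/run{run_id}.dat"
    # ... the esmini simulation of xosc_path would run here ...
    return run_id, log_path, dat_path

def run_batch(scenarios, processes=8):
    """Distribute all scenarios over a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(simulate_one, list(enumerate(scenarios)))

if __name__ == "__main__":
    results = run_batch([f"scenario_{i}.xosc" for i in range(16)])
```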
I suspect this may not be easy to solve; I'm publishing it here mainly so that such effects are kept in mind when using esmini in parallel mode. When it happens, I see it in my postprocessing: each postprocessing result, like collision or not, is a dot in the plot, and if the dots form blocks instead of smooth curves, I know that the same results were produced for different scenarios and I must simply discard all results and restart. After the first restart, at the latest after the second, I get the right results, but that takes at least a further 8-16 hours (obviously, I'm not watching the results 24/7).
If anyone has a similar experience, please contribute; maybe at some point in the future it can be solved, or at least a reliable (non-random) test/demonstration will exist.
Kind regards,
Thaddäus
Thanks for posting this issue! This is very important since esmini should definitely work for parallel runs.
First, let's figure out a few more details:
- Does each run have its own dedicated log and dat file? E.g. --log_path run102.log --record run102.dat
- Do you run with a fixed timestep, e.g. --fixed_timestep 0.05?
Then, do you run with or without graphics? If with graphics, you could try the --SingleThreaded flag to force all graphics tasks to run in sequence with the application.
With the above settings I'd expect identical results between runs. Otherwise we'll investigate.
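For clarity, the per-run separation could be built up like this (a sketch only; the flags are the ones mentioned above, and the file-name pattern is just an example):

```python
def esmini_args(run_id, xosc_path, dt=0.05):
    """Build an esmini argument list with dedicated output files per run,
    so that no two parallel runs write to the same log or dat file."""
    return [
        "--osc", xosc_path,                # the scenario to run
        "--fixed_timestep", str(dt),       # deterministic stepping
        "--log_path", f"run{run_id}.log",  # dedicated log file
        "--record", f"run{run_id}.dat",    # dedicated recording
    ]
```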
Hi,
- yes, all log files and dat files are separate, with different names and directories
- yes, fixed_timestep is used
- running on Linux (CentOS 7) with a minimalistic esmini build without any graphics
Here are the plots showing the results. Both runs are identical; the esmini results are used for the background colors (I know the grey background color is hard to see, but if you look closely, you can see the blocks in the first plot and the smooth curve in the second). I only restarted with the same number of parallel processes, nothing else changed. Meanwhile I'm a hundred percent sure that this is related to esmini and that the false info is already included in the dat files. I already had one issue in the same direction in the past, where the dat file did not contain the expected data, if you remember, and I could not reproduce the error/false dat file again.

Thanks for the input. I forgot to ask which version of esmini you run. Either check a log file, or look in the esmini root folder; there should be a file named version.txt.
Hi, it's a bit older, due to the needed ALKS reference driver model and the Linux build. Here is the last git commit I used as a starting point (I only modified the reference driver model a bit; it will be committed soon with the newest updates): https://github.com/esmini/esmini/commit/5a0110bc0812edc291f30d9294c7b3dcbcc82e7e
I know a lot has changed in esmini over the last 2 months (but I don't expect any changes related to parallel runs that could solve this issue). For the next weeks it won't be easy for me to test parallel runs with a newer version, or to create a minimalistic parallel-run example demonstrating this issue, especially since it happens randomly. I don't know if it can be reproduced that easily by a minimal example; it would have to detect very similar results and restart automatically, either until all results are different or until the error has not occurred within a maximum number of restarts, e.g. trying to reproduce the error 100 times. I think I will experiment with it next year, whenever I find time. I'm also interested in solving it: the process should run automatically, and hours of simulation time should not be lost to restarts if that can somehow be prevented.
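The automatic detection I have in mind could look roughly like this (a sketch; it only checks whether any two runs produced byte-identical result files, which for genuinely different scenarios should never happen):

```python
import hashlib
from collections import Counter

def fingerprints(dat_blobs):
    """Hash each result (raw bytes of one dat file) to a short digest."""
    return [hashlib.sha256(blob).hexdigest() for blob in dat_blobs]

def has_suspicious_duplicates(dat_blobs):
    """True if two or more runs produced byte-identical results,
    i.e. the batch should be discarded and restarted."""
    counts = Counter(fingerprints(dat_blobs))
    return any(n > 1 for n in counts.values())
```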
Thanks, it's perfectly OK with an older version. I agree that probably no recent changes are related to this issue. But I'll start with this commit anyway.
Just a control question: how is esmini launched? Directly from the command line, from a script, or via the scenariogeneration framework?
The reason behind the question is this fix in scenariogeneration. Before the fix it did not respect the fixed timestep setting, so esmini chose the delta time based on the system time passed since the last frame, which in some cases can lead to different end results, especially for scenarios involving conditions that are "close calls".
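The effect is easy to illustrate with a toy example (not esmini code): the step size determines exactly when a threshold condition fires, so a frame-time-dependent delta time can flip a close call from run to run.

```python
def time_to_reach(distance, speed, dt):
    """Step position forward with a fixed timestep and return the first
    simulation time at which the distance threshold is crossed."""
    pos, t = 0.0, 0.0
    while pos < distance:
        pos += speed * dt
        t += dt
    return t

# Same motion, different timesteps -> different crossing times:
# time_to_reach(10.0, 3.0, 0.05) triggers at ~3.35 s,
# time_to_reach(10.0, 3.0, 0.1) at ~3.4 s.
```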
Hi,
the error occurs so rarely (it occurred once after more than 1000 simulations and around 10 restarts of the script) that it is in principle impossible to reproduce it fast enough for there to be any chance of debugging. I've written a minimalistic example script that starts the multiprocessing several times in a row, creates the needed OpenSCENARIO files, and simulates all of them on 8 cores in parallel. That the error can happen is easier to see in this picture from the minimalistic example

compared to the regular one

Also definitely interesting is the difference in ego velocity of 2 kph, even though we are always talking about the same OpenSCENARIO files; I don't understand what is going on here.
Whoever has a lot of time and enjoys debugging, or is interested in parallel computing with Python and esmini (a somewhat older version; for the newest version it surely needs some updates), can ask me for the code here (I don't want to publish it).
Kind regards,
Thaddäus Menzel
Interesting. I hope someone will show interest.
Meanwhile, could you briefly explain how esmini is launched? Is it from scenariogeneration, from a bash script, or similar?
Hi,
you got a mail with the example code. But in principle, for everyone: esmini is started several times via the Python interface, as shown in the esmini Hello World examples. One additional piece of info: in the past I've also started this multiprocessing via Eclipse/PyDev, so it could also be an interaction of Eclipse with multiprocessing, or of all three tools in such a toolchain. However, I've used Eclipse/PyDev with multiprocessing several times in the past and never seen such behaviour before, and I've also very often started the esmini sims directly via the shell (for expected better performance). I'm pretty sure (not 100%, but ~90%) that this effect also appeared outside of Eclipse.
Kind regards,
Thaddäus