Pre-compile all python code and remove original .py files.#476
Pre-compile all python code and remove original .py files.#476purple4reina merged 10 commits intomainfrom
Conversation
ErrorsWe can see what is lost because we've removed the Chances are errors are not going to be caused by our code and thus would just be a distraction to helping the user find the real origin of their error. Also, if they really want to see the line of code, it's all open sourced. RawPretty |
Dockerfile
Outdated
| RUN rm ./python/lib/$runtime/site-packages/ddtrace/internal/datadog/profiling/ddup/_ddup.*.so | ||
| RUN rm ./python/lib/$runtime/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.*.so | ||
|
|
||
| RUN PYTHONNODEBUGRANGES=1 python -m compileall -o 2 -b ./python/lib/$runtime/site-packages |
There was a problem hiding this comment.
Could we add comments just to remind what's happening here?
duncanista
left a comment
There was a problem hiding this comment.
LGTM – would like to see this in self monitoring for a while before we release, it might be amazing to see this changes go there!
8a73394 to
df7bad0
Compare
| MAX_LAYER_COMPRESSED_SIZE_KB=$(expr 4 \* 1024) | ||
| MAX_LAYER_UNCOMPRESSED_SIZE_KB=$(expr 13 \* 1024) | ||
| MAX_LAYER_COMPRESSED_SIZE_KB=$(expr 5 \* 1024) | ||
| MAX_LAYER_UNCOMPRESSED_SIZE_KB=$(expr 12 \* 1024) |
There was a problem hiding this comment.
Bytecode is harder to compress (hence increase from 4->5) but uncompressed is much smaller (hence decrease from 13->12)
There was a problem hiding this comment.
Sounds good! Could we update this changes in serverless-tools once this is merged? Thanks!
583f37e to
582e612
Compare
60a64f5 to
43947e9
Compare




What does this PR do?
Ships python layer with only pre-built bytecode binaries and none of the original
*.pyfiles.Motivation
For python 3.12, reduces cold start by >24%. Also reduces memory overhead and runtime duration by small but statistically significant amounts.
Similar overhead improvements are seen with python 3.9.
Additional Notes
Impact of prebuilding bytecode binaries
When the python interpreter starts up, it always first looks for any
*.pycfiles and if found, loads those instead of the*.pyfiles. If it is loading any*.pyfiles, it will first compile those into bytecode and save to disk so that the next call is faster. See flow chart:Note that these
*.pycfiles are not actually "compiled" code, because python is still an interpreted language. They are also not platform dependent, meaning we can build these bytecode files anywhere and expect them to work everywhere.These bytecode files do not change the running/execution of the python code. The code will still run exactly the same and at the same speed. What changes instead is the time it takes to load the instructions, prior to them being executed.
Impact of removing
*.pyfilesIn addition to pre-compiling all
*.pycfiles, this PR also removes all*.pyfiles. As can be seen from flow chart and discussion above, this will make no change to the execution of the code.What is potentially lost is the ability for the interpreter to add proper line numbers and code samples to tracebacks. TODO: confirm this and add example here.
Impact of compiling at "optimization level" 2
In addition to pre-compiling bytecode files, this PR also builds those files using "optimization level" 2 by including
-o 2in the commandline args.From the python docs
Assert statements. On April 18, 2024 I attended the python guild meeting and discussed this change with the team. They said that ddtrace should not require execution of
assertstatements in order to run properly. If there is a failure due to the remove of theassertstatements, then this is considered a bug and should be filed with them directly. I ran several of the ddtrace unittest suites usingpython -OO -m pytestand found that the same tests failed using this as did when I ran the tests usingpython -m pytest.__debug__. The-o 2flag also removes all code conditional on__debug__. I did a grep of ddtrace and found no instances of this variable in the code base. I also did a grep for it in all repos I currently have checked out on my laptop and did not find any instances which would affect us.Docstrings. It is safe to remove the docstrings. The only time the customer would care to use them is in a python shell where they can run
help(my_func)which will then print the docstring to the screen. This is clearly not necessary for the running of any lambda function. Removing them also reduces the size of the layer binary by a noticeable amount.Impact of compiling with env var
PYTHONNODEBUGRANGES=1This env var is only available in python 3.11 and newer. See python docs:
Adding this env var leads to a noticeable reduction in cold start time (approx 0.5-1%) for python 3.12. I believe what is lost by including this is the column information displayed in any traceback. As these are not required, the trade off here (debug info for cold start improvement) I believe is worth it.
Testing Guidelines
Types of Changes
Check all that apply
https://datadoghq.atlassian.net/browse/SVLS-4713