Minimal python requirements.txt
Is it possible to provide a minimal set of Python requirements (requirements.txt) for the python container that is sufficient to run on databricks-runtime with support for the necessary features (e.g. cell magics)?
I note that this was available prior to commit 832dc53, where a condensed list of Python requirements was provided. I believe it was valid for 14.3 LTS. However, this changed with the 15.4 LTS release.
https://github.com/databricks/containers/blob/c302a4cf389da19d1530a559ffa3a2d2c3258a36/ubuntu/python/requirements.txt#L1-L15
A similar list is still available for the GPU container. Is it still valid? I suppose I should update the versions to those in 15.4 LTS and expect it to work fine?
https://github.com/databricks/containers/blob/9a76062ee8c55325a63302c43c9e336f596e45a6/ubuntu/gpu/cuda-11.8/venv/requirements.txt#L1-L15
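If the condensed list is reused, one way to carry the 15.4 LTS versions over is to look each package up in the full 15.4 LTS requirements.txt and pin it to the version listed there. Below is a minimal Python sketch of that idea. It assumes the full file uses plain `name==version` lines. The helpers `normalize`, `parse_pins` and `pin_minimal` are hypothetical names, not part of the databricks/containers repo. Matching versions alone does not guarantee the runtime works.

```python
import re


def normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 does (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_comment(line: str) -> str:
    """Remove a trailing '# ...' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def parse_pins(full_requirements: str) -> dict[str, str]:
    """Map normalized names to versions for every 'name==version' line."""
    pins: dict[str, str] = {}
    for raw in full_requirements.splitlines():
        line = strip_comment(raw)
        if "==" not in line:
            continue
        name_part, version = line.split("==", 1)
        name = name_part.split("[", 1)[0].strip()  # drop extras like [connect]
        pins[normalize(name)] = version.split(";", 1)[0].strip()  # drop env markers
    return pins


def pin_minimal(minimal_requirements: str, full_requirements: str) -> list[str]:
    """Pin each minimal requirement to the version listed in the full file.

    Lines that already carry a specifier, marker or URL, and pip options
    (lines starting with '-'), are kept unchanged. Packages missing from
    the full file stay unpinned.
    """
    pins = parse_pins(full_requirements)
    pinned: list[str] = []
    for raw in minimal_requirements.splitlines():
        line = strip_comment(raw)
        if not line:
            continue  # blank or comment-only line
        if line.startswith("-") or any(op in line for op in "=<>!~;@"):
            pinned.append(line)
            continue
        name = line.split("[", 1)[0].strip()
        version = pins.get(normalize(name))
        pinned.append(f"{line}=={version}" if version else line)
    return pinned
```

For example, you could read the minimal list and the full ubuntu/python/requirements.txt into two strings, then write `"\n".join(pin_minimal(minimal, full))` to a new file. That gives a pinned minimal requirements.txt, with extras such as `[jupyter]` preserved.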
In comparison, the requirements.txt in the python container is the full list replicating databricks-runtime 15.4 LTS.
https://github.com/databricks/containers/blob/9a76062ee8c55325a63302c43c9e336f596e45a6/ubuntu/python/requirements.txt#L1-L134
To reiterate my main point: is there a minimal requirements.txt for the python container? Will you also provide the same for lsp-requirements.txt? It also seems to have expanded quite a bit in the 15.4 LTS release. This would help me build a small container image for my team, as the current one is >10GB uncompressed.
Based on my trial and error, this is the minimal requirements.txt needed to use Databricks containers:
black[jupyter] # notebook formatting
databricks-sdk
ipykernel
matplotlib
pyspark[connect]
pyccolo
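To confirm that a slimmed-down image actually contains this set, you can query the installed distributions with the standard-library `importlib.metadata` module. This is a sketch, not an official check. `MINIMAL` is a hypothetical constant holding the list above with extras removed, since extras are not part of the installed distribution name. A package being installed says nothing about whether features such as cell magics work.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Hypothetical constant: the minimal list above, without extras
# ([jupyter], [connect]) because installed names do not include them.
MINIMAL = ["black", "databricks-sdk", "ipykernel", "matplotlib", "pyspark", "pyccolo"]


def installed_versions(names: list[str]) -> dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if missing."""
    result: dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(MINIMAL).items():
        print(f"{name:16} {ver or 'MISSING'}")
```

Run as the last step of an image build, this prints each package with its version, or MISSING when it is absent. To fail the build instead, turn any None into a non-zero exit code.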
The following are installed implicitly, as they are dependencies of the libraries above.
grpcio
grpcio-status
ipython
jedi
numpy
pandas
pyarrow
six
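To see which installed package pulls in each of these implicit dependencies, you can walk every installed distribution's declared requirements with `importlib.metadata`. Below is a sketch. `IMPLICIT` and `required_by` are hypothetical names. `Distribution.requires` also returns requirements gated behind extras (for example `; extra == "connect"`). So a parent listed here may declare the dependency only for an extra that was never installed.

```python
import re
from importlib.metadata import distributions

# Hypothetical constant: the implicitly installed packages listed above.
IMPLICIT = ["grpcio", "grpcio-status", "ipython", "jedi", "numpy", "pandas", "pyarrow", "six"]

# Leading distribution name of a requirement string such as
# "numpy>=1.15; extra == 'connect'".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def required_by(targets: list[str]) -> dict[str, list[str]]:
    """Map each target to the installed distributions that declare it."""
    wanted = {normalize(t): t for t in targets}
    result: dict[str, list[str]] = {t: [] for t in targets}
    for dist in distributions():
        parent = dist.metadata.get("Name") or "<unknown>"
        for req in dist.requires or []:
            match = _NAME.match(req)
            if match and normalize(match.group(1)) in wanted:
                result[wanted[normalize(match.group(1))]].append(parent)
    return {t: sorted(set(parents)) for t, parents in result.items()}


if __name__ == "__main__":
    for dep, parents in required_by(IMPLICIT).items():
        print(f"{dep:14} <- {', '.join(parents) or '(nothing installed declares it)'}")
```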
The following do not seem to be required:
jinja2
python-lsp-jsonrpc
lsp-requirements.txt does not seem to be required at all for now.
Note that black[jupyter] is required for notebook formatting.
I have created minimal container images for Databricks Container Service in this repo. https://github.com/yxtay/databricks-container