Minimal python requirements.txt

Open yxtay opened this issue 1 year ago • 3 comments

Is it possible to provide a minimal set of python requirements.txt for the python container required to run on databricks-runtime with support for necessary functions (i.e. cell magics)?

I note that this was available prior to commit 832dc53, where a condensed list of python requirements was provided. I believe that list was valid for 14.3 LTS. However, it was changed for the 15.4 LTS release.

https://github.com/databricks/containers/blob/c302a4cf389da19d1530a559ffa3a2d2c3258a36/ubuntu/python/requirements.txt#L1-L15

A similar list is currently still available for the GPU container. Is it still valid? If I update the versions to the ones in 15.4 LTS, should I expect it to work fine?

https://github.com/databricks/containers/blob/9a76062ee8c55325a63302c43c9e336f596e45a6/ubuntu/gpu/cuda-11.8/venv/requirements.txt#L1-L15

In comparison, the requirements.txt in the python container is the full list replicating databricks-runtime 15.4 LTS.

https://github.com/databricks/containers/blob/9a76062ee8c55325a63302c43c9e336f596e45a6/ubuntu/python/requirements.txt#L1-L134

To reiterate my main point, is there a minimal requirements.txt for the python container? Will you also provide the same for lsp-requirements.txt? It also seems to have expanded quite a bit for the 15.4 LTS release. This would help me build a small container image for my team, as the current one is >10GB uncompressed.

yxtay avatar Dec 11 '24 04:12 yxtay

Based on my trial and error, this is the minimal requirements.txt needed to use the Databricks containers.

black[jupyter] # notebook formatting
databricks-sdk
ipykernel
matplotlib
pyspark[connect]
pyccolo

The following are implicitly installed as they are required by other libraries above.

grpcio
grpcio-status
ipython
jedi
numpy
pandas
pyarrow
six
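
One way to double-check which packages are only pulled in transitively is to walk the installed dependency metadata. The following is a rough sketch, not something from the databricks/containers repo: the helper names `requirement_name`, `installed_requires` and `transitive_dependencies` are hypothetical, and it deliberately ignores extras and environment markers, so it over-reports rather than under-reports.

```python
import re
from importlib import metadata


def requirement_name(spec: str) -> str:
    """Return the normalized distribution name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    if match is None:
        raise ValueError(f"cannot parse requirement: {spec!r}")
    # Normalize the way PyPI does: runs of -, _ and . become a single dash.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def installed_requires(name: str) -> list:
    """Look up the declared requirements of an installed distribution."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_dependencies(top_level, requires=installed_requires) -> set:
    """Collect every distribution reachable from the top-level requirements.

    Assumption: extras and environment markers are ignored, so optional
    dependencies are included even if they would not actually be installed.
    """
    roots = {requirement_name(spec) for spec in top_level}
    seen = set()
    stack = list(roots)
    while stack:
        for spec in requires(stack.pop()):
            dep = requirement_name(spec)
            if dep not in seen and dep not in roots:
                seen.add(dep)
                stack.append(dep)
    return seen
```

Run inside the built image, `transitive_dependencies(["black[jupyter]", "databricks-sdk", "ipykernel", "matplotlib", "pyspark[connect]", "pyccolo"])` should return a set that can be compared against the implicitly installed packages listed above.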

The following do not seem to be required.

jinja2
python-lsp-jsonrpc

lsp-requirements.txt does not seem to be required at all for now.

yxtay avatar Dec 12 '24 03:12 yxtay

black[jupyter] is required for notebook formatting

black[jupyter] # notebook formatting
databricks-sdk
ipykernel
matplotlib
pyspark[connect]
pyccolo
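
A quick smoke check for an image built from this list could look like the sketch below. It only confirms that each package can be located by the import system, not that notebook features such as cell magics actually work on a cluster. The import names in `REQUIRED_MODULES` and the helper `missing_modules` are my own assumptions for illustration.

```python
from importlib.util import find_spec

# Assumed import names for the packages in the list above.
REQUIRED_MODULES = [
    "black",
    "databricks.sdk",
    "ipykernel",
    "matplotlib",
    "pyspark",
    "pyccolo",
]


def missing_modules(module_names):
    """Return the module names that the import system cannot locate."""
    missing = []
    for name in module_names:
        try:
            # For dotted names, find_spec imports the parent package first,
            # which raises ModuleNotFoundError if the parent is absent.
            found = find_spec(name) is not None
        except ImportError:
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` inside the container should return an empty list if everything in the minimal requirements.txt was installed.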

yxtay avatar Dec 16 '24 06:12 yxtay

I have created minimal container images for Databricks Container Services in this repo: https://github.com/yxtay/databricks-container

yxtay avatar May 30 '25 14:05 yxtay