spark

Quick reference

Maintained by:
Apache Spark
Where to get help:
Apache Spark™ community

Supported tags and respective `Dockerfile` links

Quick reference (cont.)

Where to file issues:
https://issues.apache.org/jira/browse/SPARK
Supported architectures: (more info)
amd64, arm64v8
Published image artifact details:
repo-info repo's repos/spark/ directory (history)
(image metadata, transfer size, etc)
Image updates:
official-images repo's library/spark label
official-images repo's library/spark file (history)
Source of this description:
docs repo's spark/ directory (history)

What is Apache Spark™?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

docker run -it spark /opt/spark/bin/spark-shell

Try the following command, which should return 1,000,000,000:

scala> spark.range(1000 * 1000 * 1000).count()

Interactive Python Shell

The easiest way to start using PySpark is through the Python shell:

docker run -it spark:python3 /opt/spark/bin/pyspark

Then run the following command, which should also return 1,000,000,000:

>>> spark.range(1000 * 1000 * 1000).count()

Interactive R Shell

The easiest way to start using R on Spark is through the R shell:

docker run -it spark:r /opt/spark/bin/sparkR
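All three shell commands above follow the same pattern: an image tag plus a launcher script under `/opt/spark/bin`. The sketch below assumes Python 3 and a Docker CLI on the `PATH`. It builds those same `docker run` commands as argument lists, which avoids shell-quoting problems if you script them. The `SHELLS` mapping and the `shell_command` / `start_shell` helpers are hypothetical illustrations, not something shipped with the image.

```python
from __future__ import annotations

import subprocess

# Hypothetical mapping: shell name -> (image tag, launcher script), taken from the commands above.
SHELLS = {
    "scala": ("spark", "/opt/spark/bin/spark-shell"),
    "python": ("spark:python3", "/opt/spark/bin/pyspark"),
    "r": ("spark:r", "/opt/spark/bin/sparkR"),
}


def shell_command(language: str) -> list[str]:
    """Return the `docker run -it <image> <launcher>` argv for one interactive shell."""
    try:
        image, launcher = SHELLS[language]
    except KeyError:
        raise ValueError(
            f"unknown shell {language!r}; expected one of {sorted(SHELLS)}"
        ) from None
    # -it keeps STDIN open and allocates a TTY so the REPL is usable.
    return ["docker", "run", "-it", image, launcher]


def start_shell(language: str) -> int:
    """Start the shell attached to this terminal; requires Docker to be installed."""
    return subprocess.run(shell_command(language), check=False).returncode


if __name__ == "__main__":
    # prints: docker run -it spark:python3 /opt/spark/bin/pyspark
    print(" ".join(shell_command("python")))
```

Passing a list to `subprocess.run` (rather than one string with `shell=True`) means no shell parses the arguments, so image tags or paths never need escaping.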

Running Spark on Kubernetes

See the Spark documentation on running Spark on Kubernetes: https://spark.apache.org/docs/latest/running-on-kubernetes.html
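The linked guide submits applications with `spark-submit`, pointing `--master` at a Kubernetes API server (`k8s://...`) and selecting the container image with `spark.kubernetes.container.image`. The sketch below assumes Python 3 and a `spark-submit` binary on the client machine's `PATH`. The hypothetical `k8s_submit_command` helper assembles a cluster-mode submission of the bundled SparkPi example. The API server address and the examples JAR path in the usage block are placeholders you must replace, because the JAR file name depends on the Spark and Scala versions inside your image. The `local://` scheme tells Spark that the JAR is already present in the container image.

```python
from __future__ import annotations


def k8s_submit_command(
    api_server: str,
    image: str,
    app_jar: str,
    main_class: str = "org.apache.spark.examples.SparkPi",
    name: str = "spark-pi",
    executors: int = 2,
    spark_submit: str = "spark-submit",  # assumption: on PATH; pass a full path otherwise
) -> list[str]:
    """Build a cluster-mode spark-submit argv for Spark on Kubernetes (hypothetical helper)."""
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return [
        spark_submit,
        "--master", f"k8s://{api_server}",  # Kubernetes API server URL
        "--deploy-mode", "cluster",         # the driver also runs in a pod
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,                            # local:// = path inside the image
    ]


if __name__ == "__main__":
    cmd = k8s_submit_command(
        api_server="https://<k8s-apiserver-host>:<port>",  # placeholder
        image="spark",
        app_jar="local:///opt/spark/examples/jars/<spark-examples-jar>",  # placeholder
    )
    print(" ".join(cmd))
```

Keeping each flag and its value as separate list items mirrors how `spark-submit` expects them on the command line and lets the result go straight to `subprocess.run`.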

Configuration and environment variables

For the environment variables these images support, see https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable

License

Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.

Licensed under the Apache License, Version 2.0.

As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

Some additional license information which was able to be auto-detected might be found in the repo-info repository's spark/ directory.

As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.
