Q01. How will you create a Spark context?
A01.
    from pyspark.sql import SparkSession

    # Build (or reuse) a SparkSession; the SparkContext is available from it.
    spark = (
        SparkSession.builder
        .appName("my spark job")
        .master("local[*]")
        # Extra JARs resolved from Maven: AWS SDK, Hadoop S3 connector, MySQL JDBC driver
        .config("spark.jars.packages", "com.amazonaws:aws-java-sdk:1.11.297,org.apache.hadoop:hadoop-aws:2.8.3,mysql:mysql-connector-java:5.1.46")
        .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", 2)
        .config("spark.speculation", "false")
        .getOrCreate()
    )
    sc = spark.sparkContext
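The builder settings above are easier to review when kept as data rather than as a long chain. The following is a minimal sketch of that idea, not part of the original answer: the names `maven_coordinates`, `SPARK_CONF`, and `build_session` are hypothetical, and it assumes PySpark is installed wherever `build_session` is actually called.

```python
def maven_coordinates(*coords):
    """Join Maven coordinates into the comma-separated string that
    spark.jars.packages expects (each one as groupId:artifactId:version)."""
    for coord in coords:
        parts = coord.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected groupId:artifactId:version, got {coord!r}")
    return ",".join(coords)


# Same settings as in the answer above, collected in one dictionary.
SPARK_CONF = {
    "spark.jars.packages": maven_coordinates(
        "com.amazonaws:aws-java-sdk:1.11.297",
        "org.apache.hadoop:hadoop-aws:2.8.3",
        "mysql:mysql-connector-java:5.1.46",
    ),
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
    "spark.speculation": "false",
}


def build_session(app_name, master, conf):
    """Hypothetical helper: apply every entry of conf to a session builder."""
    # Imported lazily so this module still loads where PySpark is absent.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master(master)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With PySpark available, `spark = build_session("my spark job", "local[*]", SPARK_CONF)` should give a session equivalent to the one in the answer, and `spark.sparkContext` returns the underlying context.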
Q02. How will you create a DataFrame by reading a file from an AWS S3 bucket?
A02.
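    # "csv" is the built-in source in Spark 2.x and later; the older
    # "com.databricks.spark.csv" name comes from the separate spark-csv package for Spark 1.x.
    csvFileAsDataframe = (
        spark.read.format("csv")
        .option("header", "false")      # the file has no header row
        .option("inferSchema", "true")  # let Spark guess the column types
        .load("s3://my-bucket/some-path/input-file.csv")
    )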
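The read above can be wrapped in a small helper so the bucket, key, and reader options are passed in rather than hard-coded. This is only a sketch under stated assumptions: `s3_uri`, `read_csv`, and `CSV_OPTIONS` are hypothetical names, and `spark` is assumed to be an existing SparkSession. The URI scheme is a parameter because it depends on the deployment: `s3://` matches the original answer, while clusters using the `hadoop-aws` connector directly often use `s3a://` instead.

```python
# Reader options from the answer above: no header row, infer column types.
CSV_OPTIONS = {"header": "false", "inferSchema": "true"}


def s3_uri(bucket, key, scheme="s3"):
    """Build an S3 URI such as s3://my-bucket/some-path/input-file.csv."""
    if scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"{scheme}://{bucket}/{key.lstrip('/')}"


def read_csv(spark, path, options=None):
    """Hypothetical helper: read a CSV file into a DataFrame.

    `spark` is an existing SparkSession; `options` falls back to CSV_OPTIONS.
    """
    reader = spark.read.format("csv")
    for name, value in (options or CSV_OPTIONS).items():
        reader = reader.option(name, value)
    return reader.load(path)
```

A call would look like `df = read_csv(spark, s3_uri("my-bucket", "some-path/input-file.csv"))`. Because the header option is off, Spark names the columns `_c0`, `_c1`, and so on.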
Q03. How will you create a Dataframe by reading a table in a…