"Hey, we need some kind of a REST API over all our data lakes to let analysts and other integrations query records on demand. Can we please get this done?" That was the use case that needed a solution. If you've had any experience with data lakes, you know that they can be … Continue reading Architecture for a data lake REST API using Delta Lake, Fugue & Spark
Tag: spark
AWS Glue: Continuation for job JobBookmark does not exist
This will be a quick post, but I couldn't find much on this error, so I figured I'd post it for others.
{"service":"AWSGlue","statusCode":400,"errorCode":"EntityNotFoundException","requestId":"xxxxx","errorMessage":"Continuation for job JobBookmark for accountId=xxxxx, jobName=myjob, runId=jr_xxxxx does not exist. not found","type":"AwsServiceError"}
I was recently working on a PySpark job in AWS Glue and was attempting to use the Job Bookmarks feature, which lets … Continue reading AWS Glue: Continuation for job JobBookmark does not exist
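If you script around failed Glue runs, it can help to recognize this particular payload automatically. Below is a minimal Python sketch that does this and pulls out the job identifiers. It assumes the error body arrives as a JSON string shaped like the one above. `parse_bookmark_error` is a hypothetical helper, not part of boto3 or the Glue API, and it only detects the error; it does not fix the underlying bookmark problem.

```python
import json
import re
from typing import Dict, Optional


def parse_bookmark_error(error_body: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper: if `error_body` is the Glue
    'Continuation for job JobBookmark ... does not exist' error,
    return its key=value fields (accountId, jobName, runId); else None."""
    try:
        payload = json.loads(error_body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    message = payload.get("errorMessage", "")
    is_bookmark_error = (
        payload.get("service") == "AWSGlue"
        and payload.get("errorCode") == "EntityNotFoundException"
        and "Continuation for job JobBookmark" in message
        and "does not exist" in message
    )
    if not is_bookmark_error:
        return None
    # Fields appear as "key=value", separated by commas or spaces.
    return dict(re.findall(r"(\w+)=([^,\s]+)", message))
```

Returning the parsed identifiers rather than a plain boolean makes it easy to log which job and run hit the error.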
Fix: HDP, YARN, Spark “check your cluster UI to ensure that workers are registered and have sufficient resources”
Are you trying to submit a Spark job over YARN on an HDP Hadoop cluster and running into errors like the ones below? If so, just add the following 2 lines to your [spark-home]/conf/spark-defaults.conf file:
ERRORS
The errors you will see below stem from a root issue that occurs on a Spark Executor node where … Continue reading Fix: HDP, YARN, Spark “check your cluster UI to ensure that workers are registered and have sufficient resources”
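Because spark-defaults.conf is just one key and value per line, separated by whitespace, with `#` marking comments, adding entries like the two lines mentioned above can be scripted. Below is a minimal Python sketch that assumes that layout. `set_spark_defaults` and `read_spark_defaults` are hypothetical helpers, and any property names you pass in should come from the post itself; nothing here reproduces the actual fix.

```python
from pathlib import Path
from typing import Dict


def read_spark_defaults(conf_path: str) -> Dict[str, str]:
    """Parse spark-defaults.conf into a dict, skipping blanks and comments."""
    result: Dict[str, str] = {}
    for line in Path(conf_path).read_text().splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)  # key, then the rest as the value
        result[parts[0]] = parts[1] if len(parts) > 1 else ""
    return result


def set_spark_defaults(conf_path: str, settings: Dict[str, str]) -> None:
    """Add or update entries in spark-defaults.conf.

    Lines whose key is in `settings` are rewritten in place (every
    occurrence); keys not already present are appended at the end.
    Comments and blank lines are preserved as-is.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    seen = set()
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        key = stripped.split(None, 1)[0]
        if key in settings:
            out.append(f"{key} {settings[key]}")
            seen.add(key)
        else:
            out.append(line)
    for key, value in settings.items():
        if key not in seen:
            out.append(f"{key} {value}")
    path.write_text("\n".join(out) + "\n")
```

For example, `set_spark_defaults("[spark-home]/conf/spark-defaults.conf", {"<property>": "<value>"})`, with the placeholders replaced by the two lines from the post. Rewriting existing keys in place, rather than always appending, means running the script again won't pile up duplicate entries.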