
feat(glue-alpha): include extra jars parameter in pyspark jobs #33238

Merged
mergify[bot] merged 3 commits into aws:main from gontzalm:include-extra-jars on Feb 19, 2025

Conversation

@gontzalm
Contributor

Issue # (if applicable)

Closes #33225.

Reason for this change

PySpark jobs with extra JAR dependencies cannot be defined with the new L2 constructs introduced in v2.177.0.

Description of changes

Add the extraJars parameter in the PySpark job L2 constructs.

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK feature-request A feature should be added or improved. p2 labels Jan 30, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team January 30, 2025 14:59
@gontzalm
Contributor Author

Exemption Request: no changes in README or integration tests needed.

@aws-cdk-automation aws-cdk-automation added pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback. pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. labels Jan 30, 2025
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

(This review is outdated)

@natalie-white-aws
Contributor

One of the authors of the new L2 here. We discussed this during the RFC and implementation phases as a potential anti-pattern. Can you share why you need extra JARs for a Python job?

@gontzalm
Contributor Author

Hi Natalie, we need the spark-xml package to read XML files in Spark v3 (as you probably know, this package will be included in Spark v4). The package must be provided via the extraJars parameter because AWS Glue does not support installing packages via --conf spark.jars.packages=<maven coordinates>.
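
To illustrate what this amounts to at the Glue level, here is a minimal sketch, assuming the JARs end up in Glue's `--extra-jars` job argument as a comma-separated list of S3 paths. The helper name `buildExtraJarsArgument` and the bucket and key are hypothetical placeholders; this is not the aws-glue-alpha implementation.

```typescript
// Hypothetical helper for illustration only; not part of aws-glue-alpha.
// Assumption: Glue's `--extra-jars` job argument takes comma-separated S3 paths.
function buildExtraJarsArgument(jarS3Uris: string[]): Record<string, string> {
  // With no JARs there is nothing to pass, so no argument is emitted.
  if (jarS3Uris.length === 0) {
    return {};
  }
  // Reject anything that is not an S3 URI before building the argument.
  for (const uri of jarS3Uris) {
    if (!uri.startsWith("s3://")) {
      throw new Error(`Expected an S3 URI, got: ${uri}`);
    }
  }
  return { "--extra-jars": jarS3Uris.join(",") };
}

// Placeholder location for a spark-xml JAR uploaded to S3.
const sparkXmlArgs = buildExtraJarsArgument(["s3://my-bucket/jars/spark-xml.jar"]);
console.log(sparkXmlArgs);
```

The resulting map is the kind of value that would be merged into the job's default arguments.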

@natalie-white-aws
Contributor

Thanks for the extra clarification. Let me get with the Glue service team; it sounds like this may be more of a Glue feature request than something we should work around in the L2 construct. Stay tuned.

@humanzz
Contributor

humanzz commented Feb 9, 2025

+1 to this. Some libraries that provide additional Spark capabilities require a JAR, even when Spark is used via Python (PySpark).

Here's a ChatGPT-generated list of examples: https://chatgpt.com/share/67a8e12d-ccd8-800e-a641-75e58db91d7b

@natalie-white-aws
Contributor

natalie-white-aws commented Feb 9, 2025

We had some internal discussions and (in addition to the data here) decided this is a valid use case, but the parameter should be added to all 3 PySpark job types.

@GavinZZ
Member

GavinZZ commented Feb 14, 2025

@gontzalm Would you be able to add this change to all 3 pyspark job types?

@humanzz
Contributor

humanzz commented Feb 18, 2025

I'm not on the CDK team, but following the discussion I started in #33356, I would also suggest supporting an extraJarsFirst prop that sets --user-jars-first, since --extra-jars and --user-jars-first tend to go together.
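
A rough sketch of how the two settings could be combined, assuming `--extra-jars` takes a comma-separated list of S3 paths and `--user-jars-first` is set to `"true"` to put the user's JARs first on the classpath. The `JarOptions` interface and `toJarJobArguments` helper are hypothetical and only illustrate the suggestion; they are not existing aws-glue-alpha API.

```typescript
// Hypothetical option shape; `extraJarsFirst` is only the prop name suggested above.
interface JarOptions {
  extraJars?: string[];     // S3 URIs of the JAR files
  extraJarsFirst?: boolean; // prioritize the user's JARs on the classpath
}

// Hypothetical helper for illustration only; not part of aws-glue-alpha.
function toJarJobArguments(options: JarOptions): Record<string, string> {
  const jobArguments: Record<string, string> = {};
  if (options.extraJars && options.extraJars.length > 0) {
    jobArguments["--extra-jars"] = options.extraJars.join(",");
  }
  // The flag is only emitted when explicitly enabled.
  if (options.extraJarsFirst) {
    jobArguments["--user-jars-first"] = "true";
  }
  return jobArguments;
}

// Placeholder bucket and keys.
const jarJobArguments = toJarJobArguments({
  extraJars: ["s3://my-bucket/jars/spark-xml.jar", "s3://my-bucket/jars/other.jar"],
  extraJarsFirst: true,
});
console.log(jarJobArguments);
```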

GavinZZ previously approved these changes Feb 19, 2025
Member

@GavinZZ GavinZZ left a comment

LGTM, thank you!

@GavinZZ
Member

GavinZZ commented Feb 19, 2025

@humanzz I think that's a good point, but I don't think it's worth blocking the merge of this PR. If anyone is interested in contributing the extraJarsFirst feature, feel free to tag me for a review!

@GavinZZ GavinZZ added pr-linter/exempt-integ-test The PR linter will not require integ test changes pr-linter/exempt-readme The PR linter will not require README changes labels Feb 19, 2025
@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Feb 19, 2025
@aws-cdk-automation aws-cdk-automation dismissed their stale review February 19, 2025 19:14

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@mergify
Contributor

mergify bot commented Feb 19, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@codecov

codecov bot commented Feb 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.16%. Comparing base (6f1aa80) to head (2bbebe1).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #33238   +/-   ##
=======================================
  Coverage   82.16%   82.16%           
=======================================
  Files         119      119           
  Lines        6857     6857           
  Branches     1157     1157           
=======================================
  Hits         5634     5634           
  Misses       1120     1120           
  Partials      103      103           
Flag Coverage Δ
suite.unit 82.16% <ø> (ø)

Flags with carried forward coverage won't be shown.

Components Coverage Δ
packages/aws-cdk ∅ <ø> (∅)
packages/aws-cdk-lib/core 82.16% <ø> (ø)

@mergify
Contributor

mergify bot commented Feb 19, 2025

This pull request has been removed from the queue for the following reason: pull request branch update failed.

The pull request can't be updated

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

@aaythapa
Contributor

@mergify update

@mergify
Contributor

mergify bot commented Feb 19, 2025

update

❌ Mergify doesn't have permission to update

Details

For security reasons, Mergify can't update this pull request. Try updating locally.
GitHub response: refusing to allow a GitHub App to create or update workflow .github/workflows/codecov.yml without workflows permission

@mergify mergify bot dismissed GavinZZ’s stale review February 19, 2025 21:42

Pull request has been modified.

@mergify
Contributor

mergify bot commented Feb 19, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 2bbebe1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Contributor

mergify bot commented Feb 19, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit be3bce3 into aws:main Feb 19, 2025
20 checks passed
@github-actions
Contributor

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 19, 2025

Labels

beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK feature-request A feature should be added or improved. p2 pr-linter/exempt-integ-test The PR linter will not require integ test changes pr-linter/exempt-readme The PR linter will not require README changes pr-linter/exemption-requested The contributor has requested an exemption to the PR Linter feedback.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

aws-glue-alpha: extra_jars parameter in PySparkEtlJob

6 participants