Skip to content

Conversation

@AngersZhuuuu
Copy link
Contributor

What changes were proposed in this pull request?

When target path is empty, we can directly rename stagingDir to targetPath avoid to rename path one by one

Why are the changes needed?

Optimize file commit logic

Does this PR introduce any user-facing change?

User can set spark.sql.source.stagingDir to enable direct rename to targetPath instead of partition path one by one when targetPath is empty for dynamicPartitionOverWrite

How was this patch tested?

Added UT

@AngersZhuuuu
Copy link
Contributor Author

ping @cloud-fan

@SparkQA
Copy link

SparkQA commented Aug 23, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47201/

@SparkQA
Copy link

SparkQA commented Aug 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47202/

@SparkQA
Copy link

SparkQA commented Aug 23, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47202/

@SparkQA
Copy link

SparkQA commented Aug 23, 2021

Test build #142699 has finished for PR 33811 at commit 2f36b1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 23, 2021

Test build #142700 has finished for PR 33811 at commit 1ed15ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

also ping @maropu @dongjoon-hyun @viirya

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @AngersZhuuuu .

.createOptional

val FILE_STAGING_DIR =
buildConf("spark.sql.source.stagingDir")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks orthogonal what your PR aims. Could you spin off this new configuration as a new PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me do this first.

throw new IOException(s"Failed to rename $stagingPartPath to $finalPartPath when " +
val targetPath = new Path(path)
val pathExisted = fs.exists(targetPath)
if (!pathExisted || fs.listStatus(targetPath).isEmpty) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test coverage for both fs.exist and fs.listStatus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have

.saveAsTable("t")
checkAnswer(sql("SELECT * FROM t"), df)
checkAnswer(sql("SELECT * FROM t WHERE p1 = 2"), Row(1, 2) :: Nil)
checkAnswer(sql("SELECT * FROM t WHERE p1 = 4"), Row(3, 4) :: Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test case fail before your PR? IIUC, this doesn't provide a test coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test case fail before your PR? IIUC, this doesn't provide a test coverage.

Just to show in new way, it can write data as before?

@SparkQA
Copy link

SparkQA commented Aug 25, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47246/

@SparkQA
Copy link

SparkQA commented Aug 25, 2021

Test build #142746 has finished for PR 33811 at commit 892b557.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

github-actions bot commented Dec 4, 2021

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 4, 2021
@github-actions github-actions bot closed this Dec 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants