Skip to content

[VL] Add uniffle integration#3767

Merged
jackylee-ch merged 4 commits intoapache:mainfrom
summaryzb:main
Apr 11, 2024
Merged

[VL] Add uniffle integration#3767
jackylee-ch merged 4 commits intoapache:mainfrom
summaryzb:main

Conversation

@summaryzb
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Add uniffle integration

How was this patch tested?

Integration test with uniffle manually
When uniffle 0.8.0 release in maven, add test in gluten action flow

@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

2 similar comments
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

// rename CelebornClient RssClient
std::shared_ptr<CelebornClient> celebornClient =
std::make_shared<CelebornClient>(vm, partitionPusher, unifflePushPartitionDataMethod);
partitionWriterCreator = std::make_shared<CelebornPartitionWriterCreator>(std::move(celebornClient));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you explain a bit why Celeborn's code is reused here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CelebornClient wraps in the partitionWriterCreator mainly do the work related to take out partition data, when evictPartitions is called. That will be all the same with different rss framework

@zhztheplayer
Copy link
Copy Markdown
Member

Integration test with uniffle manually

gluten-celeborn has a GHA action job here. Is it possible for gluten-uniffle to add a similar one? Another option is to add some UTs, at least cover the newly added code.

@summaryzb
Copy link
Copy Markdown
Contributor Author

I'll look into the GHA action, add one in this patch later

@rhh777
Copy link
Copy Markdown

rhh777 commented Jan 10, 2024

@summaryzb The org.apache.spark.shuffle.UniffleColumnarBatchSerializer class is missing in the commit?

@zhztheplayer
Copy link
Copy Markdown
Member

UniffleColumnarBatchSerializer

Is it possible to reuse ColumnarBatchSerializer somehow ?

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@summaryzb
Copy link
Copy Markdown
Contributor Author

summaryzb commented Jan 11, 2024

Is it possible to reuse ColumnarBatchSerializer somehow ?

Yes, it's almost the same. Change it to reuse ColumnarBatchSerializer

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

2 similar comments
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@zhouyuan
Copy link
Copy Markdown
Member

@PZD-CHINA @philo-he
we may need to also add Uniffle installation in docker, similar as Celeborn
https://github.com/oap-project/gluten/blob/main/tools/gluten-te/ubuntu/dockerfile-buildenv#L118

@philo-he
Copy link
Copy Markdown
Member

@PZD-CHINA @philo-he we may need to also add Uniffle installation in docker, similar as Celeborn https://github.com/oap-project/gluten/blob/main/tools/gluten-te/ubuntu/dockerfile-buildenv#L118

@summaryzb, suppose uniffle-0.8.0 should be pre-installed in CI docker. Please let us know if there is any other dependency required. Thanks!

@weiting-chen weiting-chen mentioned this pull request Feb 21, 2024
21 tasks
@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 3, 2024

Run Gluten Clickhouse CI

@summaryzb summaryzb force-pushed the main branch 2 times, most recently from a1a5e24 to e5310aa Compare April 3, 2024 09:20
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 3, 2024

Run Gluten Clickhouse CI

1 similar comment
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 3, 2024

Run Gluten Clickhouse CI

@summaryzb
Copy link
Copy Markdown
Contributor Author

@philo-he PTAL

@summaryzb
Copy link
Copy Markdown
Contributor Author

Uniffle run tpcds with only one worker may encounter some stability issues occasionally,so this integration test only run with tpch

@summaryzb
Copy link
Copy Markdown
Contributor Author

@jackylee-ch PTAL

Copy link
Copy Markdown
Member

@philo-he philo-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work! Please also respond to Hongze's comments. Thanks!
#3767 (comment)

// rename CelebornClient RssClient
std::shared_ptr<CelebornClient> uniffleClient =
std::make_shared<CelebornClient>(vm, partitionPusher, unifflePushPartitionDataMethod);
partitionWriter = std::make_unique<CelebornPartitionWriter>(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the above comment, better to use a common name if they indeed can be shared by both celeborn & uniffle.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a common name relate to all the logic of rss in jni layer, but not limit to this PartitionWriter, maybe it's better to resolve this in a separate pr

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@summaryzb, ok to me. Please create an issue to track this. Thanks!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow this

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2024

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2024

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2024

Run Gluten Clickhouse CI

@summaryzb
Copy link
Copy Markdown
Contributor Author

@philo-he PTAL

Copy link
Copy Markdown
Member

@philo-he philo-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhztheplayer, do you have any other comment?

sparkConf.getSizeAsBytes(
RssSparkConfig.RSS_WRITER_BUFFER_SIZE.key(),
RssSparkConfig.RSS_WRITER_BUFFER_SIZE.defaultValue().get());
compressionCodec = GlutenShuffleUtils.getCompressionCodec(sparkConf);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note celeborn honors Spark's config for SHUFFLE_COMPRESS. Should uniffle do the same check? See e182d65.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Copy Markdown
Member

@zhztheplayer, do you have any other comment?

I'm good if we have tests covering the feature. Please proceed if it's ready to merge.

Thanks @summaryzb for this work!

Copy link
Copy Markdown
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have a follow-up issue #5335 to solve the conflicting code, there is no problem on my side. Thanks for your work.@summaryzb

@jackylee-ch
Copy link
Copy Markdown
Contributor

This pr is ready to be merged on my side. Would leave some time to see if there are any more suggestions.

@jackylee-ch jackylee-ch merged commit b9408f9 into apache:main Apr 11, 2024
Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@summaryzb Thank you for working on uniffle integration and I just left some late minor review comments.

run: |
export MAVEN_HOME=/usr/lib/maven && \
export PATH=${PATH}:${MAVEN_HOME}/bin && \
export export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk && \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

export export -> export

tar xzf apache-uniffle-0.8.0-incubating-bin.tar.gz -C /opt/ && mv /opt/rss-0.8.0-hadoop2.8 /opt/uniffle && \
wget -nv https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz && \
tar xzf hadoop-2.8.5.tar.gz -C /opt/
rm -f /opt/uniffle/jars/server/shuffle-server-0.8.0-SNAPSHOT.jar
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to remove shuffle-server-0.8.0.jar instead of shuffle-server-0.8.0-SNAPSHOT.jar, was this a typo or something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants