
[SPARK-17875] [BUILD] Remove unneeded direct dependence on Netty 3.x#15436

Closed
srowen wants to merge 1 commit into apache:master from srowen:SPARK-17875


Conversation

@srowen
Member

@srowen srowen commented Oct 11, 2016

What changes were proposed in this pull request?

Remove unneeded direct dependency on Netty 3.x. I left the dependencyManagement entry because some Hadoop libs still use an older version of Netty 3, and I thought it would be weird if the transitive version we reference went backwards. (Note too that Flume declares a direct separate dependency in test scope on Netty 3.4.x)
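For illustration, a dependencyManagement entry of the kind described would have roughly this shape. This is a hedged sketch, not copied from Spark's pom.xml; the 3.x version shown is an assumption, and only pins the version resolved for transitive consumers without declaring a direct dependency.

```xml
<!-- Sketch of a dependencyManagement pin for the transitive Netty 3.
     Version number here is illustrative, not Spark's actual value. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
      <version>3.9.9.Final</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

An entry like this only takes effect if some module (here, a Hadoop lib) pulls the artifact in transitively; removing the direct `<dependency>` entry elsewhere is what this PR does.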

How was this patch tested?

Existing tests

Member Author

@srowen srowen left a comment


On NOTICE (outdated):
Member Author


The license changes are an overdue update: replacing the Netty 3.x license text with the Netty 4.x version.

Member Author


I can't quite figure this out: Hadoop 2.2 and 2.6-2.7 do transitively depend on Netty 3.6.x. 2.3 and 2.4 do not. shrug

Member


I think netty 3 is used by hadoop-nfs: https://issues.apache.org/jira/browse/HADOOP-12415

However, I don't know why the patch for HADOOP-12415 also added netty 3 to hadoop-hdfs...
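If one did want to cut off the Netty 3 arriving through Hadoop rather than keep the managed version, a Maven exclusion of roughly this shape would do it. A sketch only: the artifact chosen (`hadoop-client`) and the use of `${hadoop.version}` are assumptions for illustration, not Spark's actual pom entries.

```xml
<!-- Sketch: exclude transitive Netty 3 (io.netty:netty) coming in via
     Hadoop modules such as hadoop-nfs / hadoop-hdfs. Coordinates of the
     enclosing dependency are illustrative. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The PR deliberately does not do this, since Hadoop code paths (e.g. hadoop-nfs per HADOOP-12415) may still need the classes at runtime.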

@SparkQA

SparkQA commented Oct 11, 2016

Test build #66752 has finished for PR 15436 at commit a5c5c31.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Oct 12, 2016

retest this please

@SparkQA

SparkQA commented Oct 12, 2016

Test build #66779 has finished for PR 15436 at commit a5c5c31.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 12, 2016

Test build #3326 has finished for PR 15436 at commit a5c5c31.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 12, 2016

Test build #66813 has finished for PR 15436 at commit 84bdf1b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 12, 2016

Test build #66826 has finished for PR 15436 at commit ecc241e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@EncodePanda

My question would be: what about updating the Netty 4.x version as well? Right now it's 4.0.29.Final if I recall correctly, but we could update it to 4.1.3.Final.

@srowen
Member Author

srowen commented Oct 13, 2016

@rabbitonweb We're on 4.0.41 already. 4.1 won't work; see https://issues.apache.org/jira/browse/SPARK-17379

@srowen
Member Author

srowen commented Oct 13, 2016

@zsxwing I'm getting failures like

ERROR
test_flume_polling_multiple_hosts (pyspark.streaming.tests.FlumePollingStreamTests) ... Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/streaming/tests.py", line 1367, in _testMultipleTimes
    f()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/streaming/tests.py", line 1387, in _testFlumePollingMultipleHosts
    port = self._utils.startSingleSink()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o24677.startSingleSink.
: java.lang.NoClassDefFoundError: org/jboss/netty/channel/ChannelPipelineFactory
    at org.apache.spark.streaming.flume.sink.SparkSink.start(SparkSink.scala:90)
    at org.apache.spark.streaming.flume.PollingFlumeTestUtils.startSingleSink(PollingFlumeTestUtils.scala:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav

I wonder if you encountered this, and whether that's why you had to add Netty 3 back into the overall assembly? I'm still debugging why it isn't finding this class in the flume assembly.
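A sketch of the kind of check that helps when a class goes missing at runtime: list the assembly jar's contents and grep for the class the JVM failed to load. The real assembly jar isn't available here, so the listing below is simulated with printf; in practice `jar tf <path-to-flume-assembly-jar>` (path is an assumption) would produce it.

```shell
# Simulated jar listing; replace the printf with
#   jar tf external/flume-assembly/target/<assembly>.jar   (path assumed)
listing=$(printf '%s\n' \
  'org/apache/spark/streaming/flume/sink/SparkSink.class' \
  'org/jboss/netty/channel/ChannelPipelineFactory.class')
# Count occurrences of the class the JVM reported as missing;
# 0 would confirm the class was dropped from the assembly.
echo "$listing" | grep -c 'org/jboss/netty/channel/ChannelPipelineFactory'
```

A zero count would point at the shading/assembly step; a nonzero count (as in this simulation) would instead point at the classpath the tests run with.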

@srowen
Member Author

srowen commented Oct 13, 2016

Well, I'm sure the flume assembly has the classes in question here, and I'm sure it's being added to --jars when pyspark is run for the tests. I'm still trying to figure out what's wrong here.

@SparkQA

SparkQA commented Oct 15, 2016

Test build #67015 has finished for PR 15436 at commit f49f6a6.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2016

Test build #3344 has finished for PR 15436 at commit f49f6a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2016

Test build #3345 has finished for PR 15436 at commit f49f6a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2016

Test build #3346 has started for PR 15436 at commit f49f6a6.

@SparkQA

SparkQA commented Oct 17, 2016

Test build #3355 has started for PR 15436 at commit f49f6a6.

@SparkQA

SparkQA commented Oct 18, 2016

Test build #3360 has started for PR 15436 at commit f49f6a6.

@SparkQA

SparkQA commented Oct 19, 2016

Test build #3362 has finished for PR 15436 at commit f49f6a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 19, 2016

Test build #3363 has started for PR 15436 at commit f49f6a6.

@srowen
Member Author

srowen commented Oct 20, 2016

Well, heck. This all works fine except with Python 3.x tests. I should note that it works for me on Python 2 and 3 on Ubuntu 16. I don't know why yet. Still debugging.

@SparkQA

SparkQA commented Oct 20, 2016

Test build #67255 has finished for PR 15436 at commit ad88597.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 21, 2016

Test build #67337 has finished for PR 15436 at commit d4bb9e4.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Oct 24, 2016

@zsxwing I may have to shelve this; I can't figure this out. It passes locally with Python2/3 but always times out with Python3 on Jenkins. No idea ...

@zsxwing
Member

zsxwing commented Oct 24, 2016

@srowen agreed. Since Hadoop still depends on Netty 3, it hurts little if we still keep it.

@srowen srowen closed this Oct 25, 2016
@srowen srowen deleted the SPARK-17875 branch October 25, 2016 10:41
dongjoon-hyun pushed a commit that referenced this pull request Aug 22, 2019
### What changes were proposed in this pull request?

Spark uses Netty 4 directly, but also includes Netty 3 only because transitive dependencies do. The dependencies (Hadoop HDFS, Zookeeper, Avro) don't seem to need this dependency as used in Spark. I think we can forcibly remove it to slim down the dependencies.

Previous attempts were blocked by its usage in Flume, but that dependency has gone away.
#15436

### Why are the changes needed?

Mostly to reduce the transitive dependency size and complexity a little bit and avoid triggering spurious security alerts on Netty 3.x usage.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests

Closes #25544 from srowen/SPARK-17875.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>