Build: iceberg core is built against a later version of hadoop libs than spark 3.x releases #15100

@steveloughran

Feature Request / Improvement

#14125 set the version of the Hadoop libraries Iceberg is built with to 3.4.2.
However, Spark 3.4.4 and 3.5.8 are built with Hadoop 3.3.5.
There's a risk of accidentally using APIs, enum values, overridden methods and the like that exist only in the newer Hadoop release, such that
Iceberg doesn't actually run properly with these Spark versions. And while the Iceberg tests should find problems, there are inevitably code paths that don't get tested, especially those related to
different deployment scenarios.

That's independent of the challenge of using new APIs (#12055) in the code.

It would be much safer to pin the Hadoop version in builds to that of the oldest supported
Spark release. Developers who want to adopt newer Hadoop features will have to use reflection.
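
To illustrate the reflection approach, here is a minimal sketch of probing for a method at runtime and falling back when the deployed library lacks it. The issue does not name a specific Hadoop API, so this uses `java.lang.String` as a stand-in target; `ReflectiveCall`, `findMethod` and `invokeOrDefault` are hypothetical names, not part of Iceberg or Hadoop.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper: names are illustrative, not an Iceberg or Hadoop API.
public final class ReflectiveCall {
  private ReflectiveCall() {}

  /** Looks up a public method; returns empty if the runtime class does not have it. */
  public static Optional<Method> findMethod(Class<?> type, String name, Class<?>... paramTypes) {
    try {
      return Optional.of(type.getMethod(name, paramTypes));
    } catch (NoSuchMethodException e) {
      // The library on the classpath is older than the one compiled against.
      return Optional.empty();
    }
  }

  /** Invokes the method if it was found, otherwise returns the fallback value. */
  public static Object invokeOrDefault(
      Optional<Method> method, Object target, Object fallback, Object... args) {
    if (!method.isPresent()) {
      return fallback;
    }
    try {
      return method.get().invoke(target, args);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException("Cannot access " + method.get(), e);
    } catch (InvocationTargetException e) {
      // Surface the exception thrown by the invoked method itself.
      throw new IllegalStateException("Call failed: " + method.get(), e.getCause());
    }
  }

  public static void main(String[] args) {
    // Stand-in for "a method that exists in every supported release".
    Optional<Method> present = findMethod(String.class, "isEmpty");
    // Stand-in for "a method that only exists in a newer release".
    Optional<Method> missing = findMethod(String.class, "noSuchMethod");

    System.out.println(invokeOrDefault(present, "", false));          // prints "true"
    System.out.println(invokeOrDefault(missing, "", "fallback"));     // prints "fallback"
  }
}
```

With a pattern along these lines, the build can compile against the oldest supported Hadoop release while still using a newer method when the runtime happens to provide it. Doing the lookup once and caching the `Optional<Method>` avoids repeated reflection cost on hot paths.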

Easy to fix

  • Downgrade hadoop3 version in gradle/libs.versions.toml
  • Add a comment saying "must be in sync with oldest supported spark release"

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time


Labels: improvement (PR that improves existing functionality)
