Skip to content

GH-39680: [Java] enable half float support on Java module#39681

Merged
lidavidm merged 11 commits intoapache:mainfrom
davisusanibar:GH-39680-float16
Jan 31, 2024
Merged

GH-39680: [Java] enable half float support on Java module#39681
lidavidm merged 11 commits intoapache:mainfrom
davisusanibar:GH-39680-float16

Conversation

@davisusanibar
Copy link
Copy Markdown
Contributor

@davisusanibar davisusanibar commented Jan 18, 2024

Rationale for this change

  • To enable half float support on Java module.

What changes are included in this PR?

  • Add initial Float16 type support
  • Unit test
  • Integration test
  • Documentation

Are these changes tested?

Yes.

Are there any user-facing changes?

No

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #39680 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 18, 2024
* where the value will be read from
* @return 4 byte float value
*/
public short getFloat16(long index) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that setFloat16 takes a float, by symmetry this method should probably return a float too.
@danepitkin WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to float will involve a conversion to a 32 bit type. I think it would be nice to provide set/get operations on short as the core APIs and set/get operations on float as helper APIs. Thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, getShort and setShort already seems provided?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, good point.

I agree it makes sense to change this to float.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

// Positive infinity of type half-precision float.
private static final short POSITIVE_INFINITY = (short) 0x7c00;
// A Not-a-Number representation of a half-precision float.
private static final short NaN = (short) 0x7e00;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this and POSITIVE_INFINTY be public for app users to make use of?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

public class UnionReader extends AbstractFieldReader {

private BaseReader[] readers = new BaseReader[45];
private BaseReader[] readers = new BaseReader[46];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good if this constant was given a name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@davisusanibar
Copy link
Copy Markdown
Contributor Author

@github-actions crossbow submit java-jars

@github-actions
Copy link
Copy Markdown

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/7650222562

@davisusanibar
Copy link
Copy Markdown
Contributor Author

In order to test Float16 data types in Arrow Java Dataset module, apache/arrow-testing#99 is required.

@davisusanibar davisusanibar marked this pull request as ready for review January 25, 2024 05:38
Copy link
Copy Markdown
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think I'd favor using short as the 'default' float16 type and having float be the 'special'. Else the 'default' case has ambiguity when going from float -> float16.

Comment on lines +270 to +293
/*
The purpose of this method is to refresh the data in alltypes-java.parquet as required:
https://github.com/apache/arrow-testing/blob/master/data/parquet/alltypes-java.parquet
*/
public static void main(String[] args) throws Exception {
TestAllTypes test = new TestAllTypes();
try (BufferAllocator allocator = new RootAllocator();
VectorSchemaRoot root = test.generateAllTypesVector(allocator)) {
byte[] featherData = test.serializeFile(root);
try (SeekableByteChannel channel = new ByteArrayReadableSeekableByteChannel(featherData);
ArrowStreamReader reader = new ArrowStreamReader(channel, allocator)) {
TMP.create();
final File writtenFolder = TMP.newFolder();
final String writtenParquet = writtenFolder.toURI().toString();
DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
writtenParquet);
Objects.requireNonNull(writtenFolder.listFiles());
Files.move(writtenFolder.listFiles()[0].toPath(),
Paths.get(writtenFolder.toPath().toString(), "alltypes-java.parquet"));
System.out.println("The file data/parquet/alltypes-java.parquet should be updated with this new data: " +
writtenFolder.listFiles()[0].toURI());
}
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor this so that it doesn't duplicate the test itself? Also, technically this isn't really necessary; you can run the test in an IDE (and set a breakpoint so you can copy the temporary file)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will clean and delete this section.

<configuration>
<enableAssertions>false</enableAssertions>
<systemPropertyVariables>
<arrow.test.dataRoot>${project.basedir}/../../testing/data</arrow.test.dataRoot>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this actually used? I don't see any new tests referencing it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redundant part has already been added before, let me remove it

* where the value will be written
* @param value value to write
*/
public void setFloat16(long index, float value) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, these methods should work via shorts and not floats, since there is no conversion ambiguity with shorts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

}
}

@Test /* Float2Vector */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted

) {
buf.setFloat16(0, +32.875f);
assertEquals((short) 0x501c, Float16.toFloat16(+32.875f));
assertEquals(Float16.toFloat((short) 0x501c), buf.getFloat16(0), 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should just use short in the first place.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jan 25, 2024
lidavidm pushed a commit to apache/arrow-testing that referenced this pull request Jan 25, 2024
…tAllTypes (#99)

To be able to test Float16 data types into Arrow Java Dataset module.

To fix dataset errors at: apache/arrow#39681
@lidavidm
Copy link
Copy Markdown
Member

I've merged the testing PR.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 26, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 29, 2024

/**
* Lifted from Apache Parquet MR project:
* https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/schema/Float16.java
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, use the URL that contains the commit hash, not 'main'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jan 30, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 30, 2024
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Jan 30, 2024
@davisusanibar
Copy link
Copy Markdown
Contributor Author

Just for the record, I will work on these items in another PR:

  • Testing the C data interface with Float16 data
  • Add support for VectorVisitor classes (compare/validate)
  • Add support for Float16 to the arrow-algorithm module

If there are any other items that need to be reviewed, please let me know.

cc @lidavidm @danepitkin @vibhatha

@lidavidm
Copy link
Copy Markdown
Member

Do we have IPC integration tests for float16? I don't obviously see any

@lidavidm lidavidm merged commit 2a87693 into apache:main Jan 31, 2024
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Jan 31, 2024
@pitrou
Copy link
Copy Markdown
Member

pitrou commented Jan 31, 2024

Do we have IPC integration tests for float16? I don't obviously see any

If not, then perhaps it's worth opening a GH issue?

@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2a87693.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

@lidavidm
Copy link
Copy Markdown
Member

lidavidm commented Feb 1, 2024

Do we have IPC integration tests for float16? I don't obviously see any

@davisusanibar can you follow up here?

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…he#39681)

### Rationale for this change

- To enable half float support on Java module.

### What changes are included in this PR?

- [x] Add initial Float16 type support
- [x] Unit test
- [x] Integration test
- [x] Documentation

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No
* Closes: apache#39680

Authored-by: david dali susanibar arce <davi.sarces@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@vibhatha
Copy link
Copy Markdown
Contributor

@github-actions crossbow submit -g java

@github-actions
Copy link
Copy Markdown

Revision: 19869ad

Submitted crossbow builds: ursacomputing/crossbow @ actions-f0517a91b4

Task Status
java-jars GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@vibhatha
Copy link
Copy Markdown
Contributor

@github-actions crossbow submit java-jars

@github-actions
Copy link
Copy Markdown

Revision: 19869ad

Submitted crossbow builds: ursacomputing/crossbow @ actions-62925ab8ab

Task Status
java-jars GitHub Actions

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…he#39681)

### Rationale for this change

- To enable half float support on Java module.

### What changes are included in this PR?

- [x] Add initial Float16 type support
- [x] Unit test
- [x] Integration test
- [x] Documentation

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No
* Closes: apache#39680

Authored-by: david dali susanibar arce <davi.sarces@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Java] Add Float16 type support

6 participants