Skip to content

GH-37553: [Java] Allow FlightInfo#Schema to be nullable for long-running queries#37528

Merged
lidavidm merged 1 commit intoapache:mainfrom
tdcmeehan:schema
Sep 7, 2023
Merged

GH-37553: [Java] Allow FlightInfo#Schema to be nullable for long-running queries#37528
lidavidm merged 1 commit intoapache:mainfrom
tdcmeehan:schema

Conversation

@tdcmeehan
Copy link
Copy Markdown
Contributor

@tdcmeehan tdcmeehan commented Sep 1, 2023

With #36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

We can modify our client code to allow this field to be optional. This is already the case for the Go code.

This changes the Java client code to allow the Schema to be null. A new getSchemaOptional method returns Optional<Schema>, which is a backwards compatible change. The existing method is deprecated, but will still return an empty schema if the schema is not present on wire (as it used to before).

Rationale for this change

With #36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

CC: @lidavidm

What changes are included in this PR?

This changes the Java client code to allow the Schema to be null. getSchema is now deprecated and a new getSchemaOptional returns Optional<Schema>, which is a backwards compatible change.

Are these changes tested?

Existing tests ensure serialization and deserialization continue to work.

Are there any user-facing changes?

The getSchema methods are now deprecated in favor of getSchemaOptional.

@tdcmeehan tdcmeehan requested a review from lidavidm as a code owner September 1, 2023 20:25
@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 1, 2023

⚠️ GitHub issue apache/arrow-java#167 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-37527: [Java] Allow FlightInfo#Schema to be nullable for long-runn… GH-37527: [Java] Allow FlightInfo#Schema to be nullable for long-running queries Sep 1, 2023
@kou
Copy link
Copy Markdown
Member

kou commented Sep 1, 2023

It seems that the current implementation already accepts the empty schema message. And an empty schema is used for the case.

schema = pbFlightInfo.getSchema().size() > 0 ?
MessageSerializer.deserializeSchema(
new ReadChannel(Channels.newChannel(new ByteBufferBackedInputStream(schemaBuf))))
: new Schema(ImmutableList.of());

How about passing an empty instead of accepting null schema when we can't create a schema for long-running queries?
What is the difference of the above explicit empty schema approach and your suggested implicit empty schema approach? Easy to write?

@lidavidm
Copy link
Copy Markdown
Member

lidavidm commented Sep 1, 2023

I think there are two reasons:

  • Go already allows this, and we should make sure it's exposed and handled consistently across implementations
  • No schema is more informative than an empty schema (which I doubt would ever be returned by any database, but is still technically a valid schema)

Protobuf doesn't differentiate a null and an absent schema string, but an empty schema should serialize to a non-null string, so we can still tell the difference, too.

@tdcmeehan
Copy link
Copy Markdown
Contributor Author

My 0.02 cents: as a user of Arrow, it's not obvious that I should represent a missing schema as a new Schema with a 0-length list of fields.

I tried to browse the codebase for conventions I could learn from, and saw PollInfo as a source of inspiration: nullable fields are simply passed in null references, and they're exposed externally as Optional absent. Please let me know if I misjudged this project's conventions!

@kou
Copy link
Copy Markdown
Member

kou commented Sep 3, 2023

Thanks. I understand.

Could you create a new sub-issue of apache/arrow-java#167 (like #36620 and #36643) only for this (update the Java implementation)?
I think that we also need to update our documents to clarify FlightInfo::schema may be null. We can defer the task by creating a sub-issue for this.
(Do we need to update the C++ implementation too?)

@lidavidm
Copy link
Copy Markdown
Member

lidavidm commented Sep 4, 2023

(Do we need to update the C++ implementation too?)

Yes, the C++ implementation has the same (mistaken) assumption.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Sep 4, 2023
@tdcmeehan
Copy link
Copy Markdown
Contributor Author

tdcmeehan commented Sep 4, 2023

Thanks. I understand.

Could you create a new sub-issue of GH-37527 (like GH-36620 and GH-36643) only for this (update the Java implementation)? I think that we also need to update our documents to clarify FlightInfo::schema may be null. We can defer the task by creating a sub-issue for this. (Do we need to update the C++ implementation too?)

I created #37554 for the docs update, and #37553 specifically for the Java update, and they are part of a task list in apache/arrow-java#167. I would like some help on the C++ issue. :)

@tdcmeehan tdcmeehan changed the title GH-37527: [Java] Allow FlightInfo#Schema to be nullable for long-running queries GH-37553: [Java] Allow FlightInfo#Schema to be nullable for long-running queries Sep 4, 2023
@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 4, 2023

⚠️ GitHub issue #37553 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 5, 2023
@lidavidm
Copy link
Copy Markdown
Member

lidavidm commented Sep 5, 2023

I can likely help for C++, will just take me a while to be able to get around to it

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 5, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 6, 2023
Copy link
Copy Markdown
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FlightInfo#equals still needs updating

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 6, 2023
@github-actions github-actions bot removed the awaiting changes Awaiting changes label Sep 6, 2023
@github-actions github-actions bot added the awaiting change review Awaiting change review label Sep 6, 2023
@tdcmeehan tdcmeehan force-pushed the schema branch 4 times, most recently from 2cfa4d2 to 567ee6c Compare September 6, 2023 23:51
Copy link
Copy Markdown
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just a couple final nits.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 7, 2023
…g-running queries

With apache#36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time has elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

We can modify our client code to allow this field to be optional. This is already the case for the Go code.

This changes the Java client code to allow the Schema to be null.  `getSchema` methods now return `Optional<Schema>`, which is a backwards incompatible change.
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 7, 2023
Copy link
Copy Markdown
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Sep 7, 2023
@lidavidm
Copy link
Copy Markdown
Member

lidavidm commented Sep 7, 2023

(I tweaked the PR description to reflect the changes made.)

@lidavidm lidavidm merged commit 48cc2ab into apache:main Sep 7, 2023
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Sep 7, 2023
@tdcmeehan tdcmeehan deleted the schema branch September 7, 2023 14:55
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 48cc2ab.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

@tdcmeehan
Copy link
Copy Markdown
Contributor Author

Thank you for merging and advice @lidavidm and @kou

@lidavidm
Copy link
Copy Markdown
Member

lidavidm commented Sep 8, 2023

No problem. Feel free to reach out with questions about Flight, since I see there's a discussion going on for Presto

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…g-running queries (apache#37528)

With apache#36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

We can modify our client code to allow this field to be optional. This is already the case for the Go code.

This changes the Java client code to allow the Schema to be null.  A new `getSchemaOptional` method returns `Optional<Schema>`, which is a backwards compatible change.  The existing method is deprecated, but will still return an empty schema if the schema is not present on wire (as it used to before).

### Rationale for this change

With apache#36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

CC: `@ lidavidm`

### What changes are included in this PR?

This changes the Java client code to allow the Schema to be null.  `getSchema` is now deprecated and a new `getSchemaOptional` returns `Optional<Schema>`, which is a backwards compatible change.

### Are these changes tested?

Existing tests ensure serialization and deserialization continue to work.

### Are there any user-facing changes?

The `getSchema` methods are now deprecated in favor of `getSchemaOptional`.

* Closes: apache#37553

Authored-by: Tim Meehan <tim@timdmeehan.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…g-running queries (apache#37528)

With apache#36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

We can modify our client code to allow this field to be optional. This is already the case for the Go code.

This changes the Java client code to allow the Schema to be null.  A new `getSchemaOptional` method returns `Optional<Schema>`, which is a backwards compatible change.  The existing method is deprecated, but will still return an empty schema if the schema is not present on wire (as it used to before).

### Rationale for this change

With apache#36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

CC: `@ lidavidm`

### What changes are included in this PR?

This changes the Java client code to allow the Schema to be null.  `getSchema` is now deprecated and a new `getSchemaOptional` returns `Optional<Schema>`, which is a backwards compatible change.

### Are these changes tested?

Existing tests ensure serialization and deserialization continue to work.

### Are there any user-facing changes?

The `getSchema` methods are now deprecated in favor of `getSchemaOptional`.

* Closes: apache#37553

Authored-by: Tim Meehan <tim@timdmeehan.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…g-running queries (apache#37528)

With apache#36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

We can modify our client code to allow this field to be optional. This is already the case for the Go code.

This changes the Java client code to allow the Schema to be null.  A new `getSchemaOptional` method returns `Optional<Schema>`, which is a backwards compatible change.  The existing method is deprecated, but will still return an empty schema if the schema is not present on wire (as it used to before).

### Rationale for this change

With apache#36155, implementations of Flight RPC may not return quickly via a newly added pollFlightInfo function. Sometimes, the system implementing this function may not know the output schema for some time--for example, after a lengthy queue time as elapsed, or after planning.

In proto3, fields may not be present, and it's a coding convention to require them 1. To support upcoming client integration work for pollFlightInfo, the schema field can be made optional so that it's not a requirement to populate the FlightInfo's schema on the first pollFlightInfo request.

CC: `@ lidavidm`

### What changes are included in this PR?

This changes the Java client code to allow the Schema to be null.  `getSchema` is now deprecated and a new `getSchemaOptional` returns `Optional<Schema>`, which is a backwards compatible change.

### Are these changes tested?

Existing tests ensure serialization and deserialization continue to work.

### Are there any user-facing changes?

The `getSchema` methods are now deprecated in favor of `getSchemaOptional`.

* Closes: apache#37553

Authored-by: Tim Meehan <tim@timdmeehan.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Java][FlightRPC] Allow FlightInfo#Schema to be nullable for long-running queries

3 participants