Skip to content

Conversation

@pcmoritz
Copy link
Contributor

@pcmoritz pcmoritz commented Mar 9, 2019

No description provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be backwards incompatible, because the type is encoded with an enum which is just an integer (https://google.github.io/flatbuffers/md__schemas.html), you probably need to put this at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document that this is type optional for implementations (unless we can get consensus on the mailing list if we want it to be globally supported)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I fixed it!

@pcmoritz
Copy link
Contributor Author

This is ready to review in case there is more feedback!

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a comment on the jira asking for both c++ and Java implementations before adding a type to schema.fbs might want to discuss this on the mailing list (sorry if I missed the Java side)

@pcmoritz
Copy link
Contributor Author

Thanks for pointing me to that. Yeah, there is currently no Java implementation for this I'm afraid (and I don't know the Java codebase well enough to implement it). If anybody has time to do it that would be great!

@emkornfield
Copy link
Contributor

If someone else doesn't speak up I can put it on my to do list, behind getting my currently open PRs checked in (unless this is blocking something, in which case I can reprioritize)

@pcmoritz
Copy link
Contributor Author

Thanks, that would be amazing! I'm not really blocked and will just squash this PR for now and keep working on top of it.

Let me know if I can help you with your currently open PRs!

@emkornfield
Copy link
Contributor

Actually, if you want to trade #3644 is missing the C++ side of the implementation if you can pick that up, I can prioritize the work on the java side for this. For now I will just clone this PR and start working on top of it, but if there is a good way to collaborate (on a different PR a using branches was mentioned, I don't know the exact mechanics) let me know.

@wesm
Copy link
Member

wesm commented Mar 11, 2019

I have funding if you know any Java contractors who might be interested in doing work-for-hire on this project to help with work beyond what the handful of us volunteers are able to do

@maartenbreddels
Copy link
Contributor

I started working together with an 'ex'-JVM contractor today actally, maybe we can help out here.

@emkornfield
Copy link
Contributor

@maartenbreddels @pcmoritz I'm going to pick up the java side of things (unless you've already started). My aim is to have a PR out sometime towards the end of the week.

@emkornfield
Copy link
Contributor

emkornfield commented Mar 13, 2019

A proof of concept java version (see caveats in commit message): 03956ca @pcmoritz let me know if you want to try to write the integration tests or can I can try to do it later in the week. Also if someone can point me to java benchmarks (and any other tests that should be run), so I can try to assess regressions and upstream API breakages I would appreciate it.

@maartenbreddels
Copy link
Contributor

My aim is to have a PR out sometime towards the end of the week.

👍

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Mar 14, 2019

@emkornfield Thanks, this is great! I got started on the C++ side of DurationInterval and plan to have something by the end of the week. Unfortunately I can't commit to more at the moment :(

@emkornfield
Copy link
Contributor

emkornfield commented Mar 14, 2019

Actually if you want to share what you have I wouldn't mind doing it myself for durationinterval I want to get more familiar with the code. I think we should discuss use cases for LargeList on the ML since I think it is limited by other parts of the arrow spec but I might be misunderstanding something

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Mar 14, 2019

I haven't gotten too far yet: master...pcmoritz:interval-cpp

Most of the code will be pretty similar to this PR (in terms of what needs to be added, etc) :)

Feel free to continue/make your own branch!

@emkornfield
Copy link
Contributor

Thanks. Also I commented on the JIRA, but my confusion over utility was based on the memory layout document to indicates arrays should still only be 32-bit but Message.fbs was updated quite a while ago to indicate 64-bit offsets.


// List with 64-bit offsets. This type is optional and currently
// only supported by the C++ implementation.
table LargeList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsaintjacques had a suggestion which I liked and I think is worth discussing to parameterize List instead of creating a new type. I know @wesm suggested he prefers this as a separate type like you have here, but I'm not sure what the rationale is.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the principle this must be validated through the ML discussion. But here are some comments on the implementation.

LIST,

/// A list of some logical data type, using 64-bit offsets
LARGE_LIST,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting this before the end of the enum will break the ABI. I'm not sure we care. @xhochy Comments ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We sadly break the ABI quite a lot, I've given up a bit on keeping it stable. I'll look into this in 3-6 months again.


std::string ToString() const override;

std::string name() const override { return "large_list"; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps "list64".

///
/// List data is nested data where each value is a variable number of
/// child items. Lists can be recursively nested, for example
/// list(list(int32)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"large_list(large_list(int32))".

Status Visit(const BinaryArray& array) override { return VisitBinary(array); }

Status Visit(const ListArray& array) override {
template <typename ListArrayType, typename OffsetType>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the offset type should be available as ListType::offset_type and LargeListType::offset_type.

template <typename T>
inline typename std::enable_if<std::is_base_of<ListArray, T>::value, Status>::type
inline typename std::enable_if<std::is_base_of<ListArray, T>::value ||
std::is_base_of<LargeListArray, T>::value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a BaseListArray base class would be useful (and perhaps also a BaseListType).
Also enable_if_list<T> in type_traits.h may deserve updating.

ASSERT_EQ("counts", type.value_field()->name());
}

TEST_F(TestLargeListArray, TestBuilderPreserveFieleName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to copy/paste all ListArray tests instead of using some kind of parametrization?

/// represent multiple different logical types. If no logical type is provided
/// at construction time, the class defaults to List<T> where t is taken from the
/// value_builder/values that the object is constructed with.
class ARROW_EXPORT LargeListBuilder : public ArrayBuilder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of copy/pasting the ListBuilder implementation, how about factoring it into a templated base class that both concrete implementations can inherit from?

@emkornfield
Copy link
Contributor

@pcmoritz were you going to take this up again?

@kszucs kszucs force-pushed the master branch 2 times, most recently from ed180da to 85fe336 Compare July 22, 2019 19:29
@pitrou
Copy link
Member

pitrou commented Jul 30, 2019

Recommending to close this in favour of #4969

@emkornfield
Copy link
Contributor

Closing in favor of #4969 which is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants