Skip to content

Conversation

@BryanCutler
Copy link
Member

This removes non-nullable vectors that are no longer part of the vector class hierarchy and renames Nullable*Vector classes to remove the Nullable prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly different API with the new BitVector

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's now a nullable BitVector, can't get values where a null is set

Copy link
Member Author

@BryanCutler BryanCutler Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value checked here seem wrong before, am I missing something? There are 4 values set out of 1024 total, so 1020 null values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is old (non-nullable vector) checks how many values are "0" rather than how many values are "set", so when you set two values to "0", they are still counted in NullCount.

I think the new behavior makes more sense, although this could be backward incompatible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would make sense, but sort of a misnomer imo. Should we include this change in the release notes somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is an misnomer. I think this is fine as long as it doesn't break Dremio. cc @siddharthteotia.

Copy link
Contributor

@icexelloss icexelloss Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For migration purpose, should me provide an alternative method to provide the old behavior? Something like getNullAndUnsetCount(). Seems like a weird API though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this value seemed wrong before

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't check that a null entry has any type of value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again this seemed wrong before... since 4 values are set earlier, then all 4 should be unset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, slightly different API, need to check range manually

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider add setRangeToOne back in BitVector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? it is checking only the data buffer capacity i believe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is checking DataBuffer and the check after reAlloc is checking OffsetBuffer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checking the offset buffer was a typo on my part, this should be checking the data buffer capacity - fixed now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what the test is checking for?

@BryanCutler
Copy link
Member Author

@siddharthteotia @icexelloss please take a look when you have a chance. There are still some things I was not sure of so I marked this as a WIP:

  1. MapVector still has a non-nullable vector class in it's hierarchy. Maybe I missed some previous discussion from the refactoring, but I couldn't just remove the non-nullable class so I renamed them to be consistent with the other vectors

  2. Holders have both nullable and non-nullable, not sure if there is still a use for that so I left it for now... maybe after ARROW-1833 we can fix this up

  3. There are still references to NullableMapWriter/Reader, that got a little messy trying to remove but I can try again later

cc @jacques-n @wesm

@BryanCutler
Copy link
Member Author

Given the scope of this, I would be nice to get it and resolve the above issues later as long as this doesn't break anything - looks like I already need to do a rebase..

@BryanCutler
Copy link
Member Author

@icexelloss I didn't see you were already working on #1330, which is also a sizable change, before I started on this. I'm not sure which one would be easier to merge first, any thoughts?

@icexelloss
Copy link
Contributor

@BryanCutler There might be some conflicts but I think we should be fine.

@icexelloss
Copy link
Contributor

If this is merged first. I would just delete different files, shouldn't be hard. If #1330 get merged first, you need to rename NullableTimestampVector -> TimestampVector.

@BryanCutler BryanCutler force-pushed the java-nullable-vector-rename-ARROW-1710 branch from 685cb46 to e96c11f Compare November 21, 2017 21:36
@icexelloss
Copy link
Contributor

@BryanCutler Good work! High level looks good. I will try to look more closely tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is NonNullableMapVector something new? Can you explain the change of class hierarchy here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, the original MapVector has now become NonNullableMapVector and NullableMapVector (subclass of of MapVector) has now become MapVector. @BryanCutler can confirm.

I would prefer to not change the naming here since there are essentially two classes -- NullableMapVector and MapVector where the former one is a MapVector with a validity buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly different from changing the name of NullableInt to Int since we are going to remove the latter one.

For the MapVector case, I don't think so we are removing anything and instead keeping around both -- base class MapVector and subclass NullableMapVector. So probably no need to change name here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the above is correct and what I was referencing in (1) from #1341 (comment). Maybe I missed the discussion, so I thought we would be removing the NonNullableMapVector and would combine the 2 classes. Why do we need to keep both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not "MinorType"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think this must have been auto-corrected, let me check on it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the other vector classes also use Types.MinorType. I agree, it would be better to just say MinorType

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler I think this is missed. Can you please fix this?

@icexelloss
Copy link
Contributor

@BryanCutler High level looks good to me. Two major comments:

  • Do we want to port the missing function (getNullCount() and setRangeToOne) to the new bit vector?
  • Can we remove NonNullableMapVector altogether? (git rid of the MapVector extends NonNullableMapVector inherence and roll them into a single class )

Happy thanksgiving!

@BryanCutler
Copy link
Member Author

Do we want to port the missing function (getNullCount() and setRangeToOne) to the new bit vector?
The new vector has getNullCount() just that it only counts nulls, maybe the equivalent would be getZeroCount() but I'm not sure how useful that really is. I think I saw there was also a setRangeToOne and I'm ok with adding those.

Can we remove NonNullableMapVector altogether? (git rid of the MapVector extends NonNullableMapVector inherence and roll them into a single class )

I was thinking this is what we were going to do, maybe as a followup though, so +1 for me. One thing I'm not sure of is ullability useful for this vector? For example, Spark StructType doesn't have a nullable param, it's up to the child type definitions. But maybe it's different in the Arrow sense, I'll have to think about that..

Happy Thanksgiving to you all too! You guys work too hard, I don't want to see any PRs coming through for a couple days :)

@icexelloss
Copy link
Contributor

icexelloss commented Nov 22, 2017

I was thinking this is what we were going to do, maybe as a followup though, so +1 for me. One thing I'm not sure of is ullability useful for this vector? For example, Spark StructType doesn't have a nullable param, it's up to the child type definitions. But maybe it's different in the Arrow sense, I'll have to think about that..

I think in Spark, nullable ability is part of StructField not DataType, so a nullable struct column in Spark would be:

StructField("struct", StructType(Seq(StructField("a", IntergerType) ...)), nullable=true)

@wesm
Copy link
Member

wesm commented Nov 22, 2017

Needs rebase

@BryanCutler
Copy link
Member Author

Oh yeah, you're right about Spark struct, duh! So should I change the naming of MapVector back to NullableMapVector <- MapVector then and we can discuss combing to 1 class in another JIRA?

@BryanCutler BryanCutler force-pushed the java-nullable-vector-rename-ARROW-1710 branch from b7a6930 to ff2120d Compare November 23, 2017 05:32
@icexelloss
Copy link
Contributor

I am still not sure why do we need both nullable and non-nullable map vectors. @siddharthteotia can you elaborate? If this is the only thing blocking the PR. Maybe we can merge this and address it later.

@BryanCutler BryanCutler changed the title [WIP] ARROW-1710: [Java] Remove Non-Nullable Vectors ARROW-1710: [Java] Remove Non-Nullable Vectors Nov 27, 2017
@BryanCutler
Copy link
Member Author

I removed the WIP tag because I think this is ok to merge unless @siddharthteotia objects to the changes of MapVector. Otherwise I can make a follow up to merge the 2 map vector classes.

@wesm
Copy link
Member

wesm commented Nov 27, 2017

This build is passing, @jacques-n @siddharthteotia ?

@icexelloss
Copy link
Contributor

LGTM except for changing Types.MinorType -> MinorType

@BryanCutler
Copy link
Member Author

LGTM except for changing Types.MinorType -> MinorType

This wasn't added by me and it's the same for the other vector classes, maybe change all in a followup?

@icexelloss
Copy link
Contributor

This wasn't added by me and it's the same for the other vector classes, maybe change all in a followup?

I see. Yeah this is fine.

@siddharthteotia
Copy link
Contributor

What are we doing with NonNullableMapVector, MapVector?

(1) Either we leave MapVector and NullableMapVector as is where the latter is a subclass. In this case, there are no code changes needed in these two modules.

OR

(2) We can combine them and just have a MapVector with a validity buffer.

I don't think there is a need to introduce a new class NonNullableMapVector in the hierarchy.

My assumption with this patch was all nullables will become non-nullables and we will get rid of FixedValueVectors.java template -- this is for scalars.

The only complex vector that has 2 different versions is MapVector and I feel we can handle them in either of the two ways mentioned above.

@BryanCutler
Copy link
Member Author

@siddharthteotia , I will do (2) in a followup and remove NonNullableMapVector (soon, probably tomorrow). The only reason I kept NonNullableMapVector here is because there were a couple usages of it that might need a closer look and I didn't want to further complicate this PR.

@wesm
Copy link
Member

wesm commented Nov 28, 2017

Merging tomorrow morning if no objections

@wesm
Copy link
Member

wesm commented Nov 28, 2017

+1. Is there a JIRA already for the follow up work?

@wesm wesm closed this in 689bbd7 Nov 28, 2017
@BryanCutler
Copy link
Member Author

Thanks all! I will make the follow up JIRAs now

@BryanCutler
Copy link
Member Author

Ok I made these for followup:

wesm pushed a commit that referenced this pull request Dec 1, 2017
This removes non-nullable vectors that are no longer part of the vector class hierarchy and renames Nullable*Vector classes to remove the Nullable prefix.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #1341 from BryanCutler/java-nullable-vector-rename-ARROW-1710 and squashes the following commits:

7d930dc [Bryan Cutler] fixed realloc test
ff2120d [Bryan Cutler] clean up test
374dfcc [Bryan Cutler] properly rename BitVector file
6b7a85e [Bryan Cutler] remove old BitVector.java before rebase
089f7fc [Bryan Cutler] some minor cleanup
4e580d9 [Bryan Cutler] removed legacy BitVector
74f771f [Bryan Cutler] fixed remaining tests
8c5dfef [Bryan Cutler] fix naming in support classes
6e498e5 [Bryan Cutler] removed nullable prefix
dfed444 [Bryan Cutler] removed non-nullable vectors
wesm pushed a commit that referenced this pull request Dec 4, 2017
Changes to MapVector were not part of the intended goal for ARROW-1710 and should be restored to `NullableMapVector` -> `MapVector` -> `AbstractMapVector` as existed prior to #1341.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #1389 from BryanCutler/java-restore-MapVector-class-ARROW-1885 and squashes the following commits:

9ff503a [Bryan Cutler] restore MapVector class names
3d818f0 [Bryan Cutler] renamed NonNullableMapVector to MapVector
ced1705 [Bryan Cutler] renamed MapVector to NullableMapVector
@BryanCutler BryanCutler deleted the java-nullable-vector-rename-ARROW-1710 branch November 19, 2018 05:49
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
This removes non-nullable vectors that are no longer part of the vector class hierarchy and renames Nullable*Vector classes to remove the Nullable prefix.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#1341 from BryanCutler/java-nullable-vector-rename-ARROW-1710 and squashes the following commits:

7d930dc [Bryan Cutler] fixed realloc test
ff2120d [Bryan Cutler] clean up test
374dfcc [Bryan Cutler] properly rename BitVector file
6b7a85e [Bryan Cutler] remove old BitVector.java before rebase
089f7fc [Bryan Cutler] some minor cleanup
4e580d9 [Bryan Cutler] removed legacy BitVector
74f771f [Bryan Cutler] fixed remaining tests
8c5dfef [Bryan Cutler] fix naming in support classes
6e498e5 [Bryan Cutler] removed nullable prefix
dfed444 [Bryan Cutler] removed non-nullable vectors
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Changes to MapVector were not part of the intended goal for ARROW-1710 and should be restored to `NullableMapVector` -> `MapVector` -> `AbstractMapVector` as existed prior to apache#1341.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#1389 from BryanCutler/java-restore-MapVector-class-ARROW-1885 and squashes the following commits:

9ff503a [Bryan Cutler] restore MapVector class names
3d818f0 [Bryan Cutler] renamed NonNullableMapVector to MapVector
ced1705 [Bryan Cutler] renamed MapVector to NullableMapVector
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants