Skip to content

Conversation

@JacekPliszka
Copy link
Contributor

@JacekPliszka JacekPliszka commented Feb 14, 2020

Minimal implementation of casts:

  • from Decimal128 to Decimal128 - first it rescales to the output scale then assigns
  • from Decimal128 to Int64 - it first truncates the fractional part, then casts onto Int64

@github-actions
Copy link

@kou
Copy link
Member

kou commented Feb 15, 2020

Thanks for your first contribution.

Could you update the pull request description to describe this change?
We use the pull request description as commit message.

@JacekPliszka JacekPliszka changed the title ARROW-3329 [C++] Added casts Decimal128 to Decimal128 and Decimal128 … ARROW-3329 [C++] Added casts Decimal128 to Decimal128 and Int64 Feb 15, 2020
@JacekPliszka
Copy link
Contributor Author

Thanks for your first contribution.

Could you update the pull request description to describe this change?
We use the pull request description as commit message.

Could you tell me how to do it? I've pasted the description into the first comment.

But I can not see any way to add description to the PR itself. Normally I work with gitlab where there is button for it - can not find it here.

@kou
Copy link
Member

kou commented Feb 17, 2020

You did update the description ("the first comment" is the description of this pull request). It's enough.
Thanks.

Someone will review this. Please wait for a while.

@pitrou pitrou changed the title ARROW-3329 [C++] Added casts Decimal128 to Decimal128 and Int64 ARROW-3329: [C++] Added casts Decimal128 to Decimal128 and Int64 Feb 17, 2020
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, and thanks for doing this. Here are some comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't log errors but rather save them on the context. This is how it's done in another kernel:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L295-L298

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected - hopefully correctly.

Copy link
Member

@pitrou pitrou Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If options.allow_int_overflow is false, you should also check that the result value doesn't overflow out_type. You should be able to use Decimal::ToInteger for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected - hopefully correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: should save the error on the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Int64? I would expect all signed integer types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added other ints though they are not really needed as there is cast from int64 to them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Int32 Int16 Int8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this include will worsen compile times. You should be able to add a forward declaration to type_fwd.h if needed, instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on the above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to type_fwd.h

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect more test cases here, especially some examples where conversion fails because of overflow, but also examples including nulls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more tests: nulls, overflow, truncation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark here: I would expect more tricky conversion cases, failures (overflow) and nulls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more tests: nulls, truncation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't add this if this isn't used anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is used now

@pitrou
Copy link
Member

pitrou commented Feb 19, 2020

@JacekPliszka Did you forget to push your changes?

@JacekPliszka
Copy link
Contributor Author

@JacekPliszka Did you forget to push your changes?

No. Handling overflow and truncation requires more work and it would take more time at my C++ skill level than I have.

@nealrichardson
Copy link
Member

What about casting the other way (integer to decimal)? I was recently looking at adding some tests for decimal types in R but couldn't figure out how to make a decimal array to begin with.

@JacekPliszka
Copy link
Contributor Author

What about casting the other way (integer to decimal)? I was recently looking at adding some tests for decimal types in R but couldn't figure out how to make a decimal array to begin with.

I am not planning to do it - the scope is already larger than I had planned.
My goal is to have Decimal128 to Int64 (not finished) and Decimal128 to Decimal128 (finished [though may be incorrect], not yet pushed).

@pitrou
Copy link
Member

pitrou commented Feb 24, 2020

No. Handling overflow and truncation requires more work and it would take more time at my C++ skill level than I have.

Well, in this case, someone else will have to do it, but they may not make this a priority :-)

@nealrichardson
Copy link
Member

@pitrou can we merge this and make a followup Jira for the outstanding questions?

@pitrou
Copy link
Member

pitrou commented Mar 11, 2020

Why should we merge a deficient PR? There is no proper error handling here.

@wesm
Copy link
Member

wesm commented Mar 11, 2020

Yes if @JacekPliszka isn't able to complete it we should wait for someone to pick up the changes and add the necessary error handling etc.

@jacek-pliszka
Copy link

I will try to find some time during weekend. I believe have decimal to decimal done with options and error handling. decimal to int still needs some work.

@jacek-pliszka
Copy link

OK, I did decimal to int but still need some time for nulls handling in decimal to decimal - probably another day.

I have a question though about loop optimization - currently I move the loops inside ifs but maybe I can assume that compilers will do it for me - will they?

@pitrou
Copy link
Member

pitrou commented Mar 16, 2020

I have a question though about loop optimization - currently I move the loops inside ifs but maybe I can assume that compilers will do it for me - will they?

Ideally, they will. But optimization is always based on heuristics and you never know what the compiler will decide. So in some cases it makes sense to duplicate loops by hand.

@JacekPliszka
Copy link
Contributor Author

JacekPliszka commented Mar 17, 2020

OK, tests should be all green soon so please review.

I am not happy with all solutions I've applied there but I believe it is more or less correct.
If something needs corrections - let me know - haven't programmed in C++ in years.

Notes;

  • looks like handling negative scales is missing in several places
  • probably decimal scaling operations could be somehow optimized and vectorized voiding the need for separate function calls and reducing the code size

@pitrou
Copy link
Member

pitrou commented Mar 17, 2020

Thank you @JacekPliszka. I'm taking a look now.

@pitrou
Copy link
Member

pitrou commented Mar 17, 2020

So, everything was good functionally, thank you :-) I just pushed a number of simplifications in the code. Will merge if CI is green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants