
Fix many type mismatches in the CUDA version of calc_digamma and calc_trigamma #25791

Closed
xuhdev wants to merge 6 commits into pytorch:master from xuhdev:cuda-gamma

Conversation

xuhdev (Collaborator) commented Sep 6, 2019

- There are some missing casts. (A sketch of the kind of fix appears after this list.)
- Functions like ::log and ::sin may always invoke the double version on the host. For
  example, compiling the following code:

  ```c++
  #include <cmath>

  float log_float(float f) {
      return ::logf(f);
  }

  double log_double(double f) {
      return ::log(f);
  }

  float log_float2(float f) {
      return ::log(f);
  }

  float log_float3(float f) {
      return std::log(f);
  }
  ```

  with `g++ -c -O3` produces:

  ```asm
  log_float(float):
          jmp     logf
  log_double(double):
          jmp     log
  log_float2(float):
          subq    $8, %rsp
          cvtss2sd        %xmm0, %xmm0
          call    log
          addq    $8, %rsp
          cvtsd2ss        %xmm0, %xmm0
          ret
  log_float3(float):
          jmp     logf
  ```

  Note that log_float2 delegates the call to the double version of log
  (surrounded by casts), while log_float3 correctly delegates the call to
  logf. See https://godbolt.org/z/KsRWwW
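
  For readers skimming the diff, here is a minimal, hedged sketch of the kind of change described above; the helper name poly_log_example, the use of the Euler-Mascheroni constant, and main are illustrative only and are not the PyTorch code being patched:

  ```c++
  // Hedged sketch, not the PR's code: illustrates both classes of fixes,
  // i.e. using std:: math overloads and casting double literals to scalar_t.
  #include <cmath>

  template <typename scalar_t>
  scalar_t poly_log_example(scalar_t x) {
    // Before (problematic when scalar_t is float):
    //   return ::log(x) + 0.5772156649015329 * x;
    // ::log may resolve to the double-only C function, and the bare double
    // literal promotes the whole expression to double.
    //
    // After: std::log selects the float overload for float arguments, and
    // the literal is cast to scalar_t, so the arithmetic stays in scalar_t.
    return std::log(x) + static_cast<scalar_t>(0.5772156649015329) * x;
  }

  int main() {
    float  f = poly_log_example(2.0f);  // resolves to logf on typical hosts
    double d = poly_log_example(2.0);   // resolves to log
    (void)f;
    (void)d;
    return 0;
  }
  ```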

@xuhdev xuhdev requested a review from ifedan September 6, 2019 18:52
@pytorchbot pytorchbot added module: cuda Related to torch.cuda, and CUDA support in general module: operators labels Sep 6, 2019
@xuhdev xuhdev changed the title Fix many type mismatches in the CUDA version of calc_digamma and calc… Fix many type mismatches in the CUDA version of calc_digamma and calc_trigamma Sep 6, 2019
xuhdev (Collaborator, Author) commented Sep 8, 2019

@pytorchbot rebase this please

xuhdev (Collaborator, Author) commented Sep 9, 2019

@pytorchbot rebase this please

xuhdev (Collaborator, Author) commented Sep 13, 2019

@pytorchbot rebase this please

```diff
 }

-bool x_is_integer = x == ::floor(x);
+bool x_is_integer = x == std::floor(x);
```

Contributor

I'm a little confused. Does gcc compile this, or does nvcc do it? The code ultimately gets run on device, right?

xuhdev (Collaborator, Author) replied Sep 13, 2019

The function is defined as __host__ __device__, which means it is compiled twice: once for the host and once for the device.

xuhdev (Collaborator, Author) replied Sep 13, 2019

@zou3519 The main point of this kind of change in this PR is that the CUDA documentation is silent on this issue (given that std::floor exists), and that assuming ::floor is the same as std::floor is inconsistent with standard C++. I think it would be best to be more compliant here to avoid a potential issue; at the very least, it doesn't hurt.

Contributor

Okay, to make sure I understand what's going on: where is floor in ::floor coming from?

xuhdev (Collaborator, Author)

It can come from a C definition (as declared in a C header file, which is solely defined for double). The standard C++ library is silent on ::floor, while it defines std::floor. Does this answer your question?

Contributor

That does, thanks for the clarification
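
As a standalone illustration of the behavior discussed in this thread (a hedged sketch, not PyTorch code): whether ::floor offers a float overload is implementation-dependent, while std::floor is guaranteed by <cmath> to provide one.

```c++
// Hedged illustration: if ::floor resolves to the C function, it takes and
// returns double, so a float argument is promoted; std::floor always has a
// float overload and stays in float.
#include <cmath>
#include <cstdio>

int main() {
  float x = 2.7f;
  double via_global = ::floor(x);     // may go through double
  float  via_std    = std::floor(x);  // float overload guaranteed by <cmath>
  std::printf("%f %f\n", via_global, static_cast<double>(via_std));
  return 0;
}
```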

xuhdev (Collaborator, Author) commented Sep 16, 2019

@pytorchbot rebase this please

xuhdev (Collaborator, Author) commented Sep 17, 2019

@pytorchbot rebase this please

xuhdev (Collaborator, Author) commented Sep 17, 2019

@pytorchbot merge this please

The CI failure is likely unrelated, because this PR does not change any GPU code, and the change to the CPU code is very local.

@pytorchbot pytorchbot added the merge-this-please Was marked for merge with @pytorchbot merge this please label Sep 17, 2019
facebook-github-bot (Contributor) left a comment

@izdeby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@izdeby merged this pull request in 13b5446.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 18, 2019
Fix many type mismatches in the CUDA version of calc_digamma and calc_trigamma (#25791)

Pull Request resolved: pytorch/pytorch#25791

Differential Revision: D17452312

Pulled By: izdeby

fbshipit-source-id: 6276a011a373cd7cb144f9ecd84116aa206e7d1b
mruberry (Collaborator) commented:

Unlanding this with #26444.

The affected code does run on the GPU and likely broke the ROCm builds.

std:: vs. :: is an important distinction on GPUs. See some comments in https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Distributions.h#L7.
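
For context, here is a hedged sketch of one common way to keep the host and device compilation passes consistent; this is illustrative only and not necessarily how the referenced Distributions.h code is written. __CUDA_ARCH__ (and __HIP_DEVICE_COMPILE__ for ROCm) is only defined during the device pass, and compat_floor is a hypothetical name.

```c++
// Hedged sketch: dispatch between device math and std:: math per pass.
// Requires nvcc or hipcc to compile because of __host__ __device__.
#include <cmath>

template <typename scalar_t>
__host__ __device__ scalar_t compat_floor(scalar_t x) {
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
  // Device pass: the CUDA/HIP headers provide overloaded ::floor for
  // float and double in device code.
  return ::floor(x);
#else
  // Host pass: std::floor has the proper float/double overloads.
  return std::floor(x);
#endif
}
```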

xuhdev (Collaborator, Author) commented Sep 19, 2019

Oops, sorry about this; it looks like I made the merge comment on the wrong thread. Yes, it is causing the ROCm failure. I'll look into this, since ROCm works with some std:: functions but not all of them.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

merge-this-please, Merged, module: cuda, open source
