more numerically stable cosine_similarity #31378
nikitaved wants to merge 9 commits into pytorch:master from …
Conversation
Can we get a comment explaining the strategy here? It would also be good to compare it with the old implementation in that comment.
Shall I touch upon how division by zero is handled?
So you are avoiding the overflow problems of the original implementation, but you are exposing yourself to underflow problems with yours (when normalizing, individual components may underflow and turn to 0). Do we have a feeling for why one sort of tradeoff is better than the other?
Underflow is not a problem: floats are at their best when values are small (more significant bits in the mantissa), and their precision deteriorates as their absolute value increases.
It is also one of the reasons why you normalize, say, images, before feeding them to a network... All these inner products inside can cause trouble otherwise.
Ah, sorry, I assume that this function is used with floating point types. Would that be a correct assumption?
Yes, this function is used with floating point types. But you are right, usually underflow is less of a problem than overflow. It's possible to create an adversarial example, though, where instead of getting a cosine similarity close to 1 you'd get a significantly smaller value.
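To make the component-underflow scenario above concrete, here is a minimal plain-C++ sketch (my own toy values, not the ATen kernel): a tiny component of a normalized vector can indeed flush to zero, but in this case its contribution to the cosine is negligible at float precision, in line with the point that underflow is usually benign.

```cpp
#include <cstdio>

int main() {
    // Illustrative values: mathematically ||x|| is dominated by the large
    // component, so we take the norm to be 1e30 directly.
    float big = 1e30f, small = 1e-16f;
    float norm = 1e30f;

    // The normalized small component is ~1e-46, which is below the smallest
    // subnormal float (~1.4e-45) and therefore rounds to exactly 0.
    float normalized_small = small / norm;
    std::printf("normalized small component: %g\n", normalized_small);  // prints 0

    // The large component normalizes to 1 and carries essentially all of the
    // inner product, so losing the ~1e-46 term does not change the cosine
    // at float precision.
    std::printf("normalized big component: %g\n", big / norm);  // prints 1
    return 0;
}
```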
@nikitaved can you add regression tests for the cases that used to give …? Also, it would be good to expand on this comment in the PR description:
The change in behavior is nicely explained in the comment, but it looks to me like the old test should still pass as well. Best never to change tests in PRs where you also change behavior; that's typically a sign something is wrong. So I'd suggest changing that back, and only adding new tests.
@rgommers, ok, I will add the tests. The old test will not pass. In order to show that I need to write down the derivative and point out the difference, so it becomes rather "technical". Shall I at least write the explanation here?
Okay, just the explanation here will be useful.
The change from max(foo, eps) to foo + eps is to enforce that the normalized vectors have norms < 1?
The goal being that floating point errors in the dot product won't push us > 1?
How big should eps be to make sure this is always true?
It is to enforce that we do not divide by zero, and yes, as a side effect, it makes the norms less than 1. I can't say I am 100% sure that this will work for any epsilon now that I think about it, but it does work for the standard epsilon (1e-8). Anyway, this implementation should tolerate a larger set of epsilon values compared to the older algorithm, because the vectors in the inner product live in a unit ball and the mantissa of floating-point values is utilized much better.
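To see what the two evaluation orders do in practice, here is a small self-contained float harness (plain C++, not the ATen code; the vector length, value range, seed, and eps are my own arbitrary choices). It feeds identical vectors through both orders, so the exact cosine is 1; with the dot-first order the maximum can creep slightly above 1.0f (the failure mode reported in #29442), while whether the normalize-first order can also do so is exactly the question raised just above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Toy float comparison of "inner product first, then divide by the norms"
// versus "normalize first, then inner product". Identical inputs are used,
// so the mathematically exact cosine similarity is 1 in every trial.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

int main() {
    const float eps = 1e-8f;
    std::mt19937 gen(0);
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    float max_old = 0.0f, max_new = 0.0f;
    for (int trial = 0; trial < 100000; ++trial) {
        std::vector<float> x(16);
        for (float& v : x) v = dist(gen);
        const std::vector<float>& y = x;  // identical vectors

        // Old order: divide the raw inner product by the product of norms.
        float nx = std::sqrt(dot(x, x));
        float ny = std::sqrt(dot(y, y));
        float old_cos = dot(x, y) / std::max(nx * ny, eps);

        // New order: normalize each vector, then take the inner product of
        // entries that all lie in [-1, 1].
        std::vector<float> xh = x, yh = y;
        for (float& v : xh) v /= std::max(nx, eps);
        for (float& v : yh) v /= std::max(ny, eps);
        float new_cos = dot(xh, yh);

        max_old = std::max(max_old, old_cos);
        max_new = std::max(max_new, new_cos);
    }
    std::printf("largest old-order cosine: %.9f\n", max_old);
    std::printf("largest new-order cosine: %.9f\n", max_new);
    return 0;
}
```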
But the difference with the max is that if the norm is small (like 2*eps), you will make a much bigger error here. Is that something we should worry about?
Well, that is a philosophical dilemma of how to handle zeroes. I personally do not like using max, because it makes the function potentially non-differentiable... Are you ok with non-differentiability? Also, I did play with max, and had some issues with autograd not being able to process backward...
You can avoid this problem by doing this change in a non-differentiable manner using NoGradGuard around the max operations. That way, this won't change the gradient computation.
Could you please explain why then clamping did not introduce any issues in the older implementation?
The current implementation does not have it.
And the other PR should have the NoGradGuard as well to avoid this issue. There is a comment in it asking to do so.
I am prepared to merge this, but @nikitaved I'd like to know if you were planning to incorporate Alban's suggestion (use max instead of adding epsilon).
@albanD, seems like I am doing something wrong. This causes an error:

```cpp
auto x1_norm = at::norm(x1, 2, dim, true);
auto x2_norm = at::norm(x2, 2, dim, true);
{
    at::NoGradGuard guard;
    x1_norm.clamp_max_(std::sqrt(eps));
    x2_norm.clamp_max_(std::sqrt(eps));
}
```
The problem is that you cannot change the result of the norm operation in-place, because as you can see in the gradient formula the value of the output is needed to compute the gradients.
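For the 2-norm it is easy to see why the forward output is reused in the backward; a quick derivation (my own addition, not from the PR):

$$
\lVert x\rVert_2 = \Big(\sum_j x_j^2\Big)^{1/2}
\quad\Longrightarrow\quad
\frac{\partial \lVert x\rVert_2}{\partial x_i} = \frac{x_i}{\lVert x\rVert_2},
\qquad\text{i.e.}\qquad
\nabla_x \lVert x\rVert_2 = \frac{x}{\lVert x\rVert_2}.
$$

The backward pass divides by the saved forward output, so autograd rejects an in-place clamp on that output, which is presumably the error hit in the snippet above.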
Right, this would do what you want?

```cpp
auto x1_pow = at::pow(x1, 2).sum(dim, true);
auto x2_pow = at::pow(x2, 2).sum(dim, true);
{
    at::NoGradGuard guard;
    x1_pow.clamp_min_(eps);
    x2_pow.clamp_min_(eps);
}
auto x1_norm = x1_pow.sqrt_();
auto x2_norm = x2_pow.sqrt_();
```
Sorry, it is the sum that has to be clamped. I used …
The guard won't change anything wrt inplace checks. If it is problematic inside, it would be problematic without the guard as well. Right, I edited the code above.
@albanD, sorry for the confusion. It is exactly the question I already asked you previously. The current implementation uses unguarded clamps. From your answer I understood that it is better to guard it. So, guarded or unguarded, which is better and why?
@ezyang, I updated the code following the suggestions of @albanD. The example from #29442 now works without any clamping (note that the input vector to BCELoss has a negative value), but I decided to add it just in case. Still, the test checking division by zero fails (because of …).
@albanD, it is off-topic, but could you recommend something to read about the autograd in pytorch?
You can check the current autograd note in the docs, which gives some info.
@ezyang, it is possible to keep the current testing approach, but for that I need the following logic implemented efficiently: … Is there a way to do that without a loop over the batch dimension?
Thank you, let's merge it then!
@pytorchbot merge this please
Hey @nikitaved.
Summary:
**Previous behavior**: compute the inner product, then normalize.
**This patch**: first normalize, then compute the inner product. This should be more numerically stable because it avoids losing precision in the inner product for inputs with large norms. By design it ensures that the cosine similarity is within `[-1.0, +1.0]`, so it should fix [#29442](#29442).
P.S. I had to change tests because this implementation handles division by 0 differently.
This PR computes cosine similarity as follows: <x/max(eps, ||x||), y/max(eps, ||y||)>.
Let f(x,y) = <x,y>/(||x|| * ||y||), then df/dx = y/(||x|| * ||y||) - (||y||/||x|| * <x,y> * x)/(||x|| * ||y||)^2.
The changed test checks division by zero in the backward when x = 0 and y != 0. For this case the non-zero part of the gradient is just y / (||x|| * ||y||). The previous test evaluates y/(||x|| * ||y||) to y / eps, and this PR to 1/eps * y/||y||.
Pull Request resolved: #31378
Approved by: https://github.com/ezyang, https://github.com/albanD
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/9e137ee583c4fdb2dd3aa0c425dc9c289454cbf2
Reviewed By: seemethere
Differential Revision: D35850003
fbshipit-source-id: 5de5563377ad3fa8acc62d0d974b241ee5bc0af1
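Typesetting the gradient from the summary with the intermediate quotient-rule step spelled out (same content as above, just in display form):

$$
f(x, y) = \frac{\langle x, y\rangle}{\lVert x\rVert\,\lVert y\rVert},
\qquad
\frac{\partial f}{\partial x}
= \frac{y}{\lVert x\rVert\,\lVert y\rVert}
- \frac{\langle x, y\rangle\, x}{\lVert x\rVert^{3}\,\lVert y\rVert}
= \frac{y}{\lVert x\rVert\,\lVert y\rVert}
- \frac{\lVert y\rVert}{\lVert x\rVert}\cdot\frac{\langle x, y\rangle\, x}{\big(\lVert x\rVert\,\lVert y\rVert\big)^{2}}.
$$

At x = 0 and y != 0 the second term vanishes, so the division-by-zero test only exercises the y/(||x|| * ||y||) term: substituting the clamped norm gives y/eps with the old implementation and (1/eps) * y/||y|| with this PR, as stated in the summary.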
This implementation, albeit more stable, does not compute the cosine similarity as per the docs. If the norm of both x_1 and x_2 is less than eps, here we are dividing by eps^2, rather than by eps, as the docs declare. That being said, I like this implementation much better, numerically speaking, than the previous one, so I'm more in favour of fixing the docs than fixing the code really.
Updating the doc sounds better than adding another max(eps, denom) to push eps^2 to eps. |
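In symbols, the mismatch being discussed (the documented formula is reconstructed here from the comment above, so treat it as an assumption rather than a quote of the docstring):

$$
\text{docs:}\;\; \frac{\langle x_1, x_2\rangle}{\max\!\big(\lVert x_1\rVert\,\lVert x_2\rVert,\ \epsilon\big)}
\qquad\text{vs.}\qquad
\text{this PR:}\;\; \Big\langle \frac{x_1}{\max(\lVert x_1\rVert,\ \epsilon)},\ \frac{x_2}{\max(\lVert x_2\rVert,\ \epsilon)} \Big\rangle,
$$

so when both norms fall below eps the implemented expression effectively divides by eps^2 while the documented one divides by eps, which is the discrepancy that #104772 below resolves by updating the docs.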
…he gradients (#104771)
There was a regression in #31378 which was reported in #104564. This PR should keep the efficiency and memory usage of the original implementation, while keeping the stability of the latter. This solution was already discussed in #31378, but it was discarded because it modified the output of `vector_norm` in-place. The only magic ingredient that was missing for that solution to work was to add a `clone()` after calling `vector_norm`. I hope this PR takes less time to land than #104564.
Fixes #104564
Pull Request resolved: #104771
Approved by: https://github.com/albanD
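A minimal ATen-style sketch of the pattern this commit message describes (clone the norm, then clamp the clone in place under a NoGradGuard). It reuses the calls from the snippets earlier in the thread and is a sketch under those assumptions, not the actual code of #104771:

```cpp
#include <ATen/ATen.h>

// Sketch only: the real kernel may use at::linalg_vector_norm and differ in
// option and broadcasting handling.
at::Tensor cosine_similarity_sketch(
    const at::Tensor& x1, const at::Tensor& x2, int64_t dim, double eps) {
  // clone() gives fresh tensors, so the in-place clamp below no longer touches
  // the outputs that norm saved for its own backward pass.
  auto x1_norm = at::norm(x1, 2, dim, /*keepdim=*/true).clone();
  auto x2_norm = at::norm(x2, 2, dim, /*keepdim=*/true).clone();
  {
    // Hide the clamp from autograd so it does not alter the gradient formula.
    at::NoGradGuard guard;
    x1_norm.clamp_min_(eps);
    x2_norm.clamp_min_(eps);
  }
  // Normalize first, then take the inner product: the stable evaluation order
  // this whole thread is about.
  return ((x1 / x1_norm) * (x2 / x2_norm)).sum(dim);
}
```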
The behaviour of `cosine_similarity` was subtly changed in #31378, but the docs were not updated.
Pull Request resolved: #104772
Approved by: https://github.com/albanD, https://github.com/svekars