Skip to content

Sparse benchmarking utils#48397

Closed
aocsa wants to merge 5 commits intopytorch:masterfrom
Quansight:aocsa/sparse_bench
Closed

Sparse benchmarking utils#48397
aocsa wants to merge 5 commits intopytorch:masterfrom
Quansight:aocsa/sparse_bench

Conversation

@aocsa
Copy link
Copy Markdown
Contributor

@aocsa aocsa commented Nov 23, 2020

This is a benchmarking tooling to work with sparse tensors. To implement this, we extended PR benchmarking util #38338 for sparse tensors. In order to extend the proposed utility library the FuzzedTensor class was extended by creating the new FuzzedSparseTensor class. In addition two new operator classes were added, the UnaryOpSparseFuzzer and BinaryOpSparseFuzzer.

The class FuzzedSparseTensor adds new input parameters to the constructor:

  1. sparse_dim: The number of sparse dimensions in a sparse tensor.
  2. nnz: Number of non-zero elements in the sparse tensor.
  3. density: The density of the sparse tensor.
  4. coalesced: As we know the sparse tensor format permits coalesced/uncoalesced sparse tensors.

and removes probability_contiguous, max_allocation_bytes, roll_parameter, tensor_constructor as they are dense-tensors related parameters.

In addition, I've extended the torch.utils.benchmark.examples to work with the new classes FuzzedSparseTensor, UnaryOpSparseFuzzer and BinaryOpSparseFuzzer.

Hopefully, this tooling and these examples will help to make other benchmarks in other PRs. Looking forward to your thoughts and feedback. cc @robieta, @mruberry, @ngimel

@dr-ci
Copy link
Copy Markdown

dr-ci Bot commented Nov 23, 2020

💊 CI failures summary and remediations

As of commit ed680bd (more details on the Dr. CI page):



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Jan 14 07:24:16 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 07:24:16                        ~~~~ <--- HERE
Jan 14 07:24:16 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 07:24:16 
Jan 14 07:24:16 
Jan 14 07:24:16 ======================================================================
Jan 14 07:24:16 ERROR [0.204s]: test_where_and_typing (__main__.TestTEFuser)
Jan 14 07:24:16 ----------------------------------------------------------------------
Jan 14 07:24:16 Traceback (most recent call last):
Jan 14 07:24:16   File "test_jit_fuser_te.py", line 1142, in test_where_and_typing
Jan 14 07:24:16     x = torch.randn(4, 4, dtype=torch.double, device=device)
Jan 14 07:24:16 RuntimeError: CUDA error: an illegal memory access was encountered
Jan 14 07:24:16 
Jan 14 07:24:16 ======================================================================
Jan 14 07:24:16 ERROR [0.189s]: test_zero_element_tensors_cuda (__main__.TestTEFuser)
Jan 14 07:24:16 ----------------------------------------------------------------------
Jan 14 07:24:16 Traceback (most recent call last):
Jan 14 07:24:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 888, in wrapper
Jan 14 07:24:16     method(*args, **kwargs)
Jan 14 07:24:16   File "test_jit_fuser_te.py", line 178, in test_zero_element_tensors_cuda
Jan 14 07:24:16     self._test_zero_element_tensors(device="cuda")
Jan 14 07:24:16   File "test_jit_fuser_te.py", line 174, in _test_zero_element_tensors

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 23, 2020

Codecov Report

Merging #48397 (e4e153e) into master (19f4c51) will decrease coverage by 0.07%.
The diff coverage is 4.92%.

@@            Coverage Diff             @@
##           master   #48397      +/-   ##
==========================================
- Coverage   80.77%   80.69%   -0.08%     
==========================================
  Files        1867     1860       -7     
  Lines      201583   200202    -1381     
==========================================
- Hits       162819   161558    -1261     
+ Misses      38764    38644     -120     

@aocsa aocsa force-pushed the aocsa/sparse_bench branch 2 times, most recently from fb8bb7a to 6093e93 Compare December 10, 2020 23:00
@aocsa aocsa requested review from mruberry, ngimel and robieta December 10, 2020 23:37
@aocsa aocsa changed the title [WIP] Sparse benchmarking utils Sparse benchmarking utils Dec 11, 2020
@zou3519 zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 11, 2020
@rgommers rgommers added the module: sparse Related to torch.sparse label Dec 14, 2020
Comment thread torch/utils/benchmark/utils/sparse_fuzzer.py Outdated
Copy link
Copy Markdown
Collaborator

@ngimel ngimel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like reusable parts of this PR are BinaryOpSparseFuzzer and UnaryOpSparseFuzzer, compare.py and fuzzer.py don't really add context compared to their non-sparse counterparts. For those, you probably want to specify densities and not nnzs though.
If you want to use fuzzers to benchmark matmuls, you probably will need to create matmul sparse fuzzer. In the near term, since we probably are not going to focus on benchmarking unary and binary ops, working on the infrastructure for matmuls is more valuable.

Comment thread torch/utils/benchmark/op_fuzzers/sparse_binary.py Outdated
Comment thread torch/utils/benchmark/utils/sparse_fuzzer.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like compare.py and fuzzer.py are not adding a lot to the existing dense examples of the same name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuzzer.py is just an and example about how to use the new FuzzedSparseTensor. In the other hand compare.py is an example of how to bench matmul operation over sparse matrices. A more complete benchmark reference for matmul operations was submitted previously in a different PR https://github.com/pytorch/pytorch/blob/master/benchmarks/sparse/matmul_dlmc_bench.py. This benchmark uses a popular and real sparse matrix dataset https://github.com/google-research/google-research/tree/master/sgk.

Comment thread torch/utils/benchmark/examples/sparse/op_benchmark.py Outdated
@aocsa aocsa force-pushed the aocsa/sparse_bench branch from a867f4d to 978125b Compare December 19, 2020 03:26
@aocsa aocsa self-assigned this Jan 4, 2021
@aocsa aocsa force-pushed the aocsa/sparse_bench branch from 045d079 to ed680bd Compare January 14, 2021 04:35
@aocsa
Copy link
Copy Markdown
Contributor Author

aocsa commented Jan 14, 2021

I addressed review comments.

I feel like reusable parts of this PR are BinaryOpSparseFuzzer and UnaryOpSparseFuzzer, compare.py and fuzzer.py don't really add context compared to their non-sparse counterparts. For those, you probably want to specify densities and not nnzs though.

If you want to use fuzzers to benchmark matmuls, you probably will need to create matmul sparse fuzzer. In the near term, since we probably are not going to focus on benchmarking unary and binary ops, working on the infrastructure for matmuls is more valuable.

Regarding this request, I previously worked doing sparse matmul benchmarks using a real sparse matrix dataset. Would you think necessary to use a synthetic dataset implementing a matmul sparse fuzzer instead of using a real dataset?

@ngimel, @mruberry - looking forward your comments.

@mruberry
Copy link
Copy Markdown
Collaborator

I addressed review comments.

I feel like reusable parts of this PR are BinaryOpSparseFuzzer and UnaryOpSparseFuzzer, compare.py and fuzzer.py don't really add context compared to their non-sparse counterparts. For those, you probably want to specify densities and not nnzs though.

If you want to use fuzzers to benchmark matmuls, you probably will need to create matmul sparse fuzzer. In the near term, since we probably are not going to focus on benchmarking unary and binary ops, working on the infrastructure for matmuls is more valuable.

Regarding this request, I previously worked doing sparse matmul benchmarks using a real sparse matrix dataset. Would you think necessary to use a synthetic dataset implementing a matmul sparse fuzzer instead of using a real dataset?

@ngimel, @mruberry - looking forward your comments.

A real dataset would be great!

@mruberry mruberry self-requested a review January 21, 2021 23:22
Copy link
Copy Markdown
Collaborator

@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks @aocsa!

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@mruberry merged this pull request in 002d978.

facebook-github-bot pushed a commit that referenced this pull request Mar 14, 2021
Summary:
Minor PR following up the previous PR about sparse benchmarking utils #48397

Fixes #44634:  Performance benchmarks for matrix-matrix and matrix-vector ops (dense-sparse, sparse-sparse, and compare to dense-dense)

I ran  all benchmarks on an 2xRTX8000 machine with AMD 2970WX 24-cores for `DLMC/magnitude_pruning` dataset with different sparsity levels.

 ---
<details><summary> forward tests (expand for details).
</summary>

- `sparse@sparse`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                           |   0.5   |   0.7   |   0.8   |   0.9   |  0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense    |  108.1  |  100.5  |  101.3  |  108.4  |  98.4  |  187.4
      torch:sparse@sparse  |  659.1  |  368.8  |  156.5  |   53.3  |  26.8  |   14.9
      scipy:sparse@sparse  |  565.1  |  233.9  |  130.2  |   23.1  |  21.6  |   15.2

Times are in milliseconds (ms).

[----------------------------------- cuda:matmul-forward -----------------------------------]
                           |    0.5    |    0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: ----------------------------------------------------------------------------------
      torch:dense@dense    |   2243.5  |   4392.5  |  4419.8  |  2272.3  |  4433.9  |  8920.1
      torch:sparse@sparse  |  21369.2  |  11877.6  |  7339.2  |  1787.2  |  1335.1  |   845.7

Times are in microseconds (us).

```
- `sparse@dense`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  105.8  |  103.8  |  103.0  |  104.4  |  104.4  |  197.0
      torch:sparse@dense  |  119.9  |  102.4  |   84.0  |   19.7  |   16.8  |   11.6
      scipy:sparse@dense  |  906.5  |  799.6  |  697.8  |  182.2  |  165.5  |  135.4

Times are in milliseconds (ms).

[------------------------- cuda:matmul-forward --------------------------]
                          |  0.5  |  0.7  |  0.8  |  0.9  |  0.95  |  0.98
1 threads: ---------------------------------------------------------------
      torch:dense@dense   |  2.2  |  4.4  |  4.4  |  2.3  |  4.5   |  2.3
      torch:sparse@dense  |  5.7  |  6.6  |  4.5  |  1.4  |  1.4   |  1.3

Times are in milliseconds (ms).

```
- `sparse@vector`
```
[----------------------------------- cpu:matmul-forward ----------------------------------]
                           |    0.5    |   0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: --------------------------------------------------------------------------------
      torch:dense@vector   |    510.6  |   505.8  |   759.6  |   782.1  |   682.4  |  764.6
      torch:sparse@vector  |  10122.8  |  6241.1  |  7935.6  |  2076.3  |  1049.5  |  826.3
      scipy:sparse@vector  |   1756.7  |  1033.9  |   678.2  |   343.5  |   168.5  |   65.4

Times are in microseconds (us).

[-------------------------------- cuda:matmul-forward --------------------------------]
                           |   0.5    |   0.7    |   0.8   |   0.9   |   0.95  |   0.98
1 threads: ----------------------------------------------------------------------------
      torch:dense@vector   |    36.1  |    21.5  |   21.6  |   21.5  |   21.6  |   21.5
      torch:sparse@vector  |  1099.2  |  1289.4  |  775.7  |  327.1  |  285.4  |  274.0

Times are in microseconds (us).

```
</details>

 ---
<details><summary> backward tests (expand for details).
</summary>

- `sparse@sparse`
```
[--------------------------------- cpu:matmul-backward ---------------------------------]
                           |   0.5    |   0.7    |   0.8    |   0.9    |   0.95  |   0.98
1 threads: ------------------------------------------------------------------------------
      torch:dense@dense    |   246.1  |   315.0  |   306.9  |   168.6  |  290.6  |  146.9
      torch:sparse@sparse  |  6417.5  |  4393.7  |  3012.7  |  1029.4  |  908.0  |  650.7

Times are in microseconds (us).

[----------------------------- cuda:matmul-backward -----------------------------]
                           |   0.5   |   0.7   |   0.8   |  0.9   |  0.95  |  0.98
1 threads: -----------------------------------------------------------------------
      torch:dense@dense    |    6.7  |   13.3  |   13.3  |   6.9  |  13.5  |   6.9
      torch:sparse@sparse  |  143.7  |  143.4  |  119.6  |  29.5  |  29.1  |  10.9

Times are in microseconds (us).

```
- `sparse@dense`
```
 [------------------------------ cpu:matmul-backward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  185.9  |  304.8  |  305.8  |  169.9  |  308.7  |  168.4
      torch:sparse@dense  |  407.9  |  345.8  |  274.6  |  114.2  |  163.6  |  230.5

Times are in milliseconds (ms).

[--------------------------- cuda:matmul-backward --------------------------]
                          |  0.5   |  0.7   |  0.8   |  0.9  |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
      torch:dense@dense   |   6.7  |  13.3  |  13.3  |  6.9  |  13.4  |   6.9
      torch:sparse@dense  |  16.7  |  19.0  |  15.1  |  6.3  |   8.2  |  12.7

Times are in milliseconds (ms).

```
</details>

Kindly review this  PR. cc mruberry, ngimel

Pull Request resolved: #51647

Reviewed By: albanD

Differential Revision: D27007809

Pulled By: mruberry

fbshipit-source-id: 8c1922cb3280027ca5e3eef31bfa20500c548cfd
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
Minor PR following up the previous PR about sparse benchmarking utils pytorch#48397

Fixes pytorch#44634:  Performance benchmarks for matrix-matrix and matrix-vector ops (dense-sparse, sparse-sparse, and compare to dense-dense)

I ran  all benchmarks on an 2xRTX8000 machine with AMD 2970WX 24-cores for `DLMC/magnitude_pruning` dataset with different sparsity levels.

 ---
<details><summary> forward tests (expand for details).
</summary>

- `sparse@sparse`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                           |   0.5   |   0.7   |   0.8   |   0.9   |  0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense    |  108.1  |  100.5  |  101.3  |  108.4  |  98.4  |  187.4
      torch:sparse@sparse  |  659.1  |  368.8  |  156.5  |   53.3  |  26.8  |   14.9
      scipy:sparse@sparse  |  565.1  |  233.9  |  130.2  |   23.1  |  21.6  |   15.2

Times are in milliseconds (ms).

[----------------------------------- cuda:matmul-forward -----------------------------------]
                           |    0.5    |    0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: ----------------------------------------------------------------------------------
      torch:dense@dense    |   2243.5  |   4392.5  |  4419.8  |  2272.3  |  4433.9  |  8920.1
      torch:sparse@sparse  |  21369.2  |  11877.6  |  7339.2  |  1787.2  |  1335.1  |   845.7

Times are in microseconds (us).

```
- `sparse@dense`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  105.8  |  103.8  |  103.0  |  104.4  |  104.4  |  197.0
      torch:sparse@dense  |  119.9  |  102.4  |   84.0  |   19.7  |   16.8  |   11.6
      scipy:sparse@dense  |  906.5  |  799.6  |  697.8  |  182.2  |  165.5  |  135.4

Times are in milliseconds (ms).

[------------------------- cuda:matmul-forward --------------------------]
                          |  0.5  |  0.7  |  0.8  |  0.9  |  0.95  |  0.98
1 threads: ---------------------------------------------------------------
      torch:dense@dense   |  2.2  |  4.4  |  4.4  |  2.3  |  4.5   |  2.3
      torch:sparse@dense  |  5.7  |  6.6  |  4.5  |  1.4  |  1.4   |  1.3

Times are in milliseconds (ms).

```
- `sparse@vector`
```
[----------------------------------- cpu:matmul-forward ----------------------------------]
                           |    0.5    |   0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: --------------------------------------------------------------------------------
      torch:dense@vector   |    510.6  |   505.8  |   759.6  |   782.1  |   682.4  |  764.6
      torch:sparse@vector  |  10122.8  |  6241.1  |  7935.6  |  2076.3  |  1049.5  |  826.3
      scipy:sparse@vector  |   1756.7  |  1033.9  |   678.2  |   343.5  |   168.5  |   65.4

Times are in microseconds (us).

[-------------------------------- cuda:matmul-forward --------------------------------]
                           |   0.5    |   0.7    |   0.8   |   0.9   |   0.95  |   0.98
1 threads: ----------------------------------------------------------------------------
      torch:dense@vector   |    36.1  |    21.5  |   21.6  |   21.5  |   21.6  |   21.5
      torch:sparse@vector  |  1099.2  |  1289.4  |  775.7  |  327.1  |  285.4  |  274.0

Times are in microseconds (us).

```
</details>

 ---
<details><summary> backward tests (expand for details).
</summary>

- `sparse@sparse`
```
[--------------------------------- cpu:matmul-backward ---------------------------------]
                           |   0.5    |   0.7    |   0.8    |   0.9    |   0.95  |   0.98
1 threads: ------------------------------------------------------------------------------
      torch:dense@dense    |   246.1  |   315.0  |   306.9  |   168.6  |  290.6  |  146.9
      torch:sparse@sparse  |  6417.5  |  4393.7  |  3012.7  |  1029.4  |  908.0  |  650.7

Times are in microseconds (us).

[----------------------------- cuda:matmul-backward -----------------------------]
                           |   0.5   |   0.7   |   0.8   |  0.9   |  0.95  |  0.98
1 threads: -----------------------------------------------------------------------
      torch:dense@dense    |    6.7  |   13.3  |   13.3  |   6.9  |  13.5  |   6.9
      torch:sparse@sparse  |  143.7  |  143.4  |  119.6  |  29.5  |  29.1  |  10.9

Times are in microseconds (us).

```
- `sparse@dense`
```
 [------------------------------ cpu:matmul-backward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  185.9  |  304.8  |  305.8  |  169.9  |  308.7  |  168.4
      torch:sparse@dense  |  407.9  |  345.8  |  274.6  |  114.2  |  163.6  |  230.5

Times are in milliseconds (ms).

[--------------------------- cuda:matmul-backward --------------------------]
                          |  0.5   |  0.7   |  0.8   |  0.9  |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
      torch:dense@dense   |   6.7  |  13.3  |  13.3  |  6.9  |  13.4  |   6.9
      torch:sparse@dense  |  16.7  |  19.0  |  15.1  |  6.3  |   8.2  |  12.7

Times are in milliseconds (ms).

```
</details>

Kindly review this  PR. cc mruberry, ngimel

Pull Request resolved: pytorch#51647

Reviewed By: albanD

Differential Revision: D27007809

Pulled By: mruberry

fbshipit-source-id: 8c1922cb3280027ca5e3eef31bfa20500c548cfd
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
This is a benchmarking tooling to work with sparse tensors. To implement this, we extended PR `benchmarking util` [https://github.com/pytorch/pytorch/issues/38338](https://github.com/pytorch/pytorch/pull/38338) for sparse tensors.   In order to extend the proposed utility library the **FuzzedTensor** class was extended  by creating the new **FuzzedSparseTensor** class.  In addition two new operator classes were added, the `UnaryOpSparseFuzzer` and `BinaryOpSparseFuzzer`.

The class `FuzzedSparseTensor` adds new input parameters to the constructor:
1. `sparse_dim`: The number of sparse dimensions in a sparse tensor.
2. `nnz`:   Number of non-zero elements in the sparse tensor.
3. `density`: The density of the sparse tensor.
4. `coalesced`: As we know the sparse tensor format permits coalesced/uncoalesced sparse tensors.

and removes `probability_contiguous`, `max_allocation_bytes`, `roll_parameter`, `tensor_constructor` as they are dense-tensors related parameters.

In addition, I've extended the `torch.utils.benchmark.examples` to work with the new classes `FuzzedSparseTensor`, `UnaryOpSparseFuzzer` and `BinaryOpSparseFuzzer`.

Hopefully, this tooling and these examples will help to make other benchmarks in other PRs. Looking forward to your thoughts and feedback. cc robieta, mruberry,  ngimel

Pull Request resolved: pytorch#48397

Reviewed By: ejguan

Differential Revision: D26008137

Pulled By: mruberry

fbshipit-source-id: 2f37811c7c3eaa3494a0f2500e519267f2186dfb
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Minor PR following up the previous PR about sparse benchmarking utils pytorch#48397

Fixes pytorch#44634:  Performance benchmarks for matrix-matrix and matrix-vector ops (dense-sparse, sparse-sparse, and compare to dense-dense)

I ran  all benchmarks on an 2xRTX8000 machine with AMD 2970WX 24-cores for `DLMC/magnitude_pruning` dataset with different sparsity levels.

 ---
<details><summary> forward tests (expand for details).
</summary>

- `sparse@sparse`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                           |   0.5   |   0.7   |   0.8   |   0.9   |  0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense    |  108.1  |  100.5  |  101.3  |  108.4  |  98.4  |  187.4
      torch:sparse@sparse  |  659.1  |  368.8  |  156.5  |   53.3  |  26.8  |   14.9
      scipy:sparse@sparse  |  565.1  |  233.9  |  130.2  |   23.1  |  21.6  |   15.2

Times are in milliseconds (ms).

[----------------------------------- cuda:matmul-forward -----------------------------------]
                           |    0.5    |    0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: ----------------------------------------------------------------------------------
      torch:dense@dense    |   2243.5  |   4392.5  |  4419.8  |  2272.3  |  4433.9  |  8920.1
      torch:sparse@sparse  |  21369.2  |  11877.6  |  7339.2  |  1787.2  |  1335.1  |   845.7

Times are in microseconds (us).

```
- `sparse@dense`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  105.8  |  103.8  |  103.0  |  104.4  |  104.4  |  197.0
      torch:sparse@dense  |  119.9  |  102.4  |   84.0  |   19.7  |   16.8  |   11.6
      scipy:sparse@dense  |  906.5  |  799.6  |  697.8  |  182.2  |  165.5  |  135.4

Times are in milliseconds (ms).

[------------------------- cuda:matmul-forward --------------------------]
                          |  0.5  |  0.7  |  0.8  |  0.9  |  0.95  |  0.98
1 threads: ---------------------------------------------------------------
      torch:dense@dense   |  2.2  |  4.4  |  4.4  |  2.3  |  4.5   |  2.3
      torch:sparse@dense  |  5.7  |  6.6  |  4.5  |  1.4  |  1.4   |  1.3

Times are in milliseconds (ms).

```
- `sparse@vector`
```
[----------------------------------- cpu:matmul-forward ----------------------------------]
                           |    0.5    |   0.7    |   0.8    |   0.9    |   0.95   |   0.98
1 threads: --------------------------------------------------------------------------------
      torch:dense@vector   |    510.6  |   505.8  |   759.6  |   782.1  |   682.4  |  764.6
      torch:sparse@vector  |  10122.8  |  6241.1  |  7935.6  |  2076.3  |  1049.5  |  826.3
      scipy:sparse@vector  |   1756.7  |  1033.9  |   678.2  |   343.5  |   168.5  |   65.4

Times are in microseconds (us).

[-------------------------------- cuda:matmul-forward --------------------------------]
                           |   0.5    |   0.7    |   0.8   |   0.9   |   0.95  |   0.98
1 threads: ----------------------------------------------------------------------------
      torch:dense@vector   |    36.1  |    21.5  |   21.6  |   21.5  |   21.6  |   21.5
      torch:sparse@vector  |  1099.2  |  1289.4  |  775.7  |  327.1  |  285.4  |  274.0

Times are in microseconds (us).

```
</details>

 ---
<details><summary> backward tests (expand for details).
</summary>

- `sparse@sparse`
```
[--------------------------------- cpu:matmul-backward ---------------------------------]
                           |   0.5    |   0.7    |   0.8    |   0.9    |   0.95  |   0.98
1 threads: ------------------------------------------------------------------------------
      torch:dense@dense    |   246.1  |   315.0  |   306.9  |   168.6  |  290.6  |  146.9
      torch:sparse@sparse  |  6417.5  |  4393.7  |  3012.7  |  1029.4  |  908.0  |  650.7

Times are in microseconds (us).

[----------------------------- cuda:matmul-backward -----------------------------]
                           |   0.5   |   0.7   |   0.8   |  0.9   |  0.95  |  0.98
1 threads: -----------------------------------------------------------------------
      torch:dense@dense    |    6.7  |   13.3  |   13.3  |   6.9  |  13.5  |   6.9
      torch:sparse@sparse  |  143.7  |  143.4  |  119.6  |  29.5  |  29.1  |  10.9

Times are in microseconds (us).

```
- `sparse@dense`
```
 [------------------------------ cpu:matmul-backward -------------------------------]
                          |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -------------------------------------------------------------------------
      torch:dense@dense   |  185.9  |  304.8  |  305.8  |  169.9  |  308.7  |  168.4
      torch:sparse@dense  |  407.9  |  345.8  |  274.6  |  114.2  |  163.6  |  230.5

Times are in milliseconds (ms).

[--------------------------- cuda:matmul-backward --------------------------]
                          |  0.5   |  0.7   |  0.8   |  0.9  |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
      torch:dense@dense   |   6.7  |  13.3  |  13.3  |  6.9  |  13.4  |   6.9
      torch:sparse@dense  |  16.7  |  19.0  |  15.1  |  6.3  |   8.2  |  12.7

Times are in milliseconds (ms).

```
</details>

Kindly review this  PR. cc mruberry, ngimel

Pull Request resolved: pytorch#51647

Reviewed By: albanD

Differential Revision: D27007809

Pulled By: mruberry

fbshipit-source-id: 8c1922cb3280027ca5e3eef31bfa20500c548cfd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged module: sparse Related to torch.sparse open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants