matmul performance benchmarks #51647
Conversation
💊 CI failures summary and remediations

As of commit b000c5a (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚

🚧 2 fixed upstream failures: these were probably caused by upstream breakages that were already fixed. Please rebase on the
robieta left a comment:
At a high level, it seems like `spmm.py` and `spmv.py` could be merged by having a central definition of tasks:
```python
_TASKS = {
    "mm": (
        ("torch", "torch.mm(dx, dy)"),
        ("torch.sparse", "torch.sparse.mm(x, y)"),
        ("scipy", "scipy_coo_matmul(sx, sy)"),
    ),
    "mv": (
        ("torch", "torch.matmul(dx, vector)"),
        ("torch.sparse", "torch.matmul(x, vector)"),
        ("scipy", "scipy_coo_matmul(sx, vector)"),
    ),
    "backward": (
        ("torch", "torch_backward(dx, dy)"),
        ("torch.sparse", "sparse_torch_backward(x, y)"),
    ),
}
```
and some switching code at the start:
```python
tasks = _TASKS[operation]
if device != "cpu":
    # SciPy is CPU only.
    tasks = tuple(t for t in tasks if t[0] != "scipy")

loader = (
    load_spmv_dataset if operation == "mv" else
    functools.partial(load_spmm_dataset, spmm_type=spmm_type)
)
There's also a bit of logic for globals, but it doesn't seem insurmountable. WDYT?
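For reference, the dispatch pattern suggested above can be sketched end-to-end as a self-contained snippet. `select_tasks` is an illustrative wrapper, not the benchmark's actual API, and the statement strings are only executed once they are handed to `torch.utils.benchmark.Timer` with the appropriate `globals`:

```python
# Central task table: operation -> (backend label, statement string) pairs.
# The statement strings reference names (dx, dy, x, y, ...) that the
# benchmark would supply via Timer's `globals` argument.
_TASKS = {
    "mm": (
        ("torch", "torch.mm(dx, dy)"),
        ("torch.sparse", "torch.sparse.mm(x, y)"),
        ("scipy", "scipy_coo_matmul(sx, sy)"),
    ),
    "mv": (
        ("torch", "torch.matmul(dx, vector)"),
        ("torch.sparse", "torch.matmul(x, vector)"),
        ("scipy", "scipy_coo_matmul(sx, vector)"),
    ),
    "backward": (
        ("torch", "torch_backward(dx, dy)"),
        ("torch.sparse", "sparse_torch_backward(x, y)"),
    ),
}

def select_tasks(operation, device):
    """Return the (backend, stmt) pairs to benchmark for this op/device."""
    tasks = _TASKS[operation]
    if device != "cpu":
        # SciPy is CPU only, so drop it for CUDA runs.
        tasks = tuple(t for t in tasks if t[0] != "scipy")
    return tasks
```

With this in place, the per-script argument parsing and timing loop can be shared, and only the dataset loader differs between the `mm` and `mv` paths.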
Do you have results from running this benchmark?
Indeed, it was surmountable.
Codecov Report

```
@@            Coverage Diff             @@
##           master   #51647      +/-   ##
==========================================
+ Coverage   73.77%   77.35%   +3.57%
==========================================
  Files        1879     1879
  Lines      183186   183186
==========================================
+ Hits       135147   141701    +6554
+ Misses      48039    41485    -6554
```
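As a sanity check, the Codecov figures are internally consistent; a quick recomputation from the Lines/Hits numbers in the table (the dashboard appears to truncate percentages to two decimals rather than round):

```python
# Figures copied from the Codecov table above.
lines = 183186
hits_before, hits_after = 135147, 141701

def pct(x):
    # Truncate to two decimals, matching how the dashboard reports values.
    return int(x * 10000) / 100

cov_before = pct(hits_before / lines)                  # coverage on master
cov_after = pct(hits_after / lines)                    # coverage with this PR
delta = pct(hits_after / lines - hits_before / lines)  # reported +/- column

# Misses are the complement of hits, so their delta mirrors the hits delta.
assert hits_after - hits_before == 6554
```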
I addressed the feedback comments. Moreover, I ran all benchmarks on a 2xRTX8000 machine with an AMD 2970WX 24-core CPU.

forward tests (expand for details).

backward tests (expand for details).
Thanks @mruberry, the current implementation of sparse-sparse matmul has almost the same performance as SciPy. I forgot to consider the overhead of converting the I moved the main comment and benchmark results into the PR description.
Let's give @robieta a chance to review.
robieta left a comment:
I had a single nitpick, but overall looks great. Thanks!
I rebased this PR and I think it is ready to be merged.
facebook-github-bot left a comment:
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I'm trying to land this but the internal tools are struggling to reconcile these whitespace changes. Would you revert these changes and/or rebase this PR, @aocsa?
facebook-github-bot left a comment:
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Minor PR following up the previous PR about sparse benchmarking utils, pytorch#48397.

Fixes pytorch#44634: performance benchmarks for matrix-matrix and matrix-vector ops (dense-sparse, sparse-sparse, and compared to dense-dense).

I ran all benchmarks on a 2xRTX8000 machine with an AMD 2970WX 24-core CPU for the `DLMC/magnitude_pruning` dataset with different sparsity levels.

---

<details><summary> forward tests (expand for details). </summary>

- `sparse@sparse`

```
[------------------------------- cpu:matmul-forward -------------------------------]
                       |   0.5  |   0.7  |   0.8  |   0.9  |  0.95  |  0.98
1 threads: -------------------------------------------------------------------------
  torch:dense@dense    |  108.1 |  100.5 |  101.3 |  108.4 |   98.4 |  187.4
  torch:sparse@sparse  |  659.1 |  368.8 |  156.5 |   53.3 |   26.8 |   14.9
  scipy:sparse@sparse  |  565.1 |  233.9 |  130.2 |   23.1 |   21.6 |   15.2

Times are in milliseconds (ms).

[----------------------------------- cuda:matmul-forward -----------------------------------]
                       |    0.5  |    0.7  |    0.8  |    0.9  |   0.95  |   0.98
1 threads: ----------------------------------------------------------------------------------
  torch:dense@dense    |  2243.5 |  4392.5 |  4419.8 |  2272.3 |  4433.9 | 8920.1
  torch:sparse@sparse  | 21369.2 | 11877.6 |  7339.2 |  1787.2 |  1335.1 |  845.7

Times are in microseconds (us).
```

- `sparse@dense`

```
[------------------------------- cpu:matmul-forward -------------------------------]
                       |   0.5  |   0.7  |   0.8  |   0.9  |  0.95  |  0.98
1 threads: -------------------------------------------------------------------------
  torch:dense@dense    |  105.8 |  103.8 |  103.0 |  104.4 |  104.4 |  197.0
  torch:sparse@dense   |  119.9 |  102.4 |   84.0 |   19.7 |   16.8 |   11.6
  scipy:sparse@dense   |  906.5 |  799.6 |  697.8 |  182.2 |  165.5 |  135.4

Times are in milliseconds (ms).

[------------------------- cuda:matmul-forward --------------------------]
                       |  0.5  |  0.7  |  0.8  |  0.9  |  0.95 |  0.98
1 threads: ---------------------------------------------------------------
  torch:dense@dense    |  2.2  |  4.4  |  4.4  |  2.3  |  4.5  |  2.3
  torch:sparse@dense   |  5.7  |  6.6  |  4.5  |  1.4  |  1.4  |  1.3

Times are in milliseconds (ms).
```

- `sparse@vector`

```
[----------------------------------- cpu:matmul-forward ----------------------------------]
                       |    0.5  |   0.7  |   0.8  |   0.9  |   0.95 |  0.98
1 threads: --------------------------------------------------------------------------------
  torch:dense@vector   |   510.6 |  505.8 |  759.6 |  782.1 |  682.4 |  764.6
  torch:sparse@vector  | 10122.8 | 6241.1 | 7935.6 | 2076.3 | 1049.5 |  826.3
  scipy:sparse@vector  |  1756.7 | 1033.9 |  678.2 |  343.5 |  168.5 |   65.4

Times are in microseconds (us).

[-------------------------------- cuda:matmul-forward --------------------------------]
                       |    0.5  |    0.7  |   0.8  |   0.9  |  0.95  |  0.98
1 threads: ----------------------------------------------------------------------------
  torch:dense@vector   |    36.1 |   21.5 |   21.6 |   21.5 |   21.6 |   21.5
  torch:sparse@vector  |  1099.2 | 1289.4 |  775.7 |  327.1 |  285.4 |  274.0

Times are in microseconds (us).
```

</details>

---

<details><summary> backward tests (expand for details). </summary>

- `sparse@sparse`

```
[--------------------------------- cpu:matmul-backward ---------------------------------]
                       |    0.5  |    0.7  |   0.8  |   0.9  |  0.95  |  0.98
1 threads: ------------------------------------------------------------------------------
  torch:dense@dense    |   246.1 |   315.0 |  306.9 |  168.6 |  290.6 |  146.9
  torch:sparse@sparse  |  6417.5 |  4393.7 | 3012.7 | 1029.4 |  908.0 |  650.7

Times are in microseconds (us).

[----------------------------- cuda:matmul-backward -----------------------------]
                       |   0.5  |   0.7  |   0.8  |  0.9  |  0.95  |  0.98
1 threads: -----------------------------------------------------------------------
  torch:dense@dense    |    6.7 |   13.3 |   13.3 |   6.9 |   13.5 |   6.9
  torch:sparse@sparse  |  143.7 |  143.4 |  119.6 |  29.5 |   29.1 |  10.9

Times are in microseconds (us).
```

- `sparse@dense`

```
[------------------------------ cpu:matmul-backward -------------------------------]
                       |   0.5  |   0.7  |   0.8  |   0.9  |  0.95  |  0.98
1 threads: -------------------------------------------------------------------------
  torch:dense@dense    |  185.9 |  304.8 |  305.8 |  169.9 |  308.7 |  168.4
  torch:sparse@dense   |  407.9 |  345.8 |  274.6 |  114.2 |  163.6 |  230.5

Times are in milliseconds (ms).

[--------------------------- cuda:matmul-backward --------------------------]
                       |  0.5  |  0.7  |  0.8  |  0.9  |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
  torch:dense@dense    |   6.7 |  13.3 |  13.3 |   6.9 |  13.4 |   6.9
  torch:sparse@dense   |  16.7 |  19.0 |  15.1 |   6.3 |   8.2 |  12.7

Times are in milliseconds (ms).
```

</details>

Kindly review this PR. cc mruberry, ngimel

Pull Request resolved: pytorch#51647
Reviewed By: albanD
Differential Revision: D27007809
Pulled By: mruberry
fbshipit-source-id: 8c1922cb3280027ca5e3eef31bfa20500c548cfd
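To make the crossover point in these numbers explicit, here is a small illustrative computation (not part of the benchmark itself) over the CPU forward `sparse@sparse` figures, showing that the sparse kernel only overtakes the dense one at sparsity ≥ 0.9:

```python
# CPU forward matmul times (ms) for sparse@sparse, copied from the table above.
sparsity  = [0.5, 0.7, 0.8, 0.9, 0.95, 0.98]
dense_ms  = [108.1, 100.5, 101.3, 108.4, 98.4, 187.4]
sparse_ms = [659.1, 368.8, 156.5, 53.3, 26.8, 14.9]

# Ratio > 1.0 means torch:sparse@sparse beats torch:dense@dense.
speedup = {s: d / sp for s, d, sp in zip(sparsity, dense_ms, sparse_ms)}

# The lowest sparsity level at which the sparse kernel wins.
crossover = min(s for s, r in speedup.items() if r > 1.0)
```

The same pattern holds on CUDA and for the other shapes: at the 0.98 level the sparse path is roughly an order of magnitude faster, while below ~0.8 sparsity the dense kernels still win.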