Conversation
| size_t large = 0; | ||
| return THCudaMemGetInfo1(state, freeBytes, totalBytes, &large); | ||
| } | ||
|
|
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/lib/THC/THCGeneral.h.in
Outdated
| THC_API void THCudaHostRecord(THCState *state, void *ptr); | ||
|
|
||
| THC_API cudaError_t THCudaMemGetInfo(THCState *state, size_t* freeBytes, size_t* totalBytes); | ||
| THC_API cudaError_t THCudaMemGetInfo1(THCState *state, size_t* freeBytes, size_t* totalBytes, size_t* large); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/csrc/cudnn/Conv.cpp
Outdated
| CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM, | ||
| CUDNN_CONVOLUTION_FWD_ALGO_DIRECT, | ||
| CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD, | ||
| CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
apaszke
left a comment
There was a problem hiding this comment.
The PR looks good now. We only need one more change so we don't make the allocator cache unnecessarily large.
|
|
||
| CHECK(cudnnFindConvolutionForwardAlgorithmEx(handle, conv.idesc.desc, in, | ||
| conv.wdesc.desc, wght, conv.cdesc.desc, conv.odesc.desc, out, 1, &algoCount, | ||
| &perfResults, ws.data, ws.size)); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
Sorry, Adam. Removed your comment by mistake. I thought I remove mine. I thought it's not the right place to call emptyCache after |
|
Yeah sure. Everything is ok as long as the cache is going to be flushed only when benchmarks are run (this should only happen a few times during training). Actually I've just started wondering if it wouldn't be simpler to limit the max workspace size to |
|
You can't cudaMalloc largest_cached_free_block without first emptying cache, right? And in some unfortunate cases subsequent cudaFree'ing can interfere with nccl (though in some unfortunate cases that may happen also when caching allocator can't allocate requested block and starts cleaning up). Grrr, this is so hard to solve cleanly. |
|
Right, I meant to use the max and:
cudnnFind (no ex) used to call cudaFree too, and this part is already guarded by a free mutex. |
|
We discussed this earlier and I don't think these heuristics are a good idea. If you're worried that the caching allocator will unnecessarily hold on to large blocks, call |
|
@pytorchbot test this please |
|
@aromnvidia looks like you haven't tested the GPU builds after you pushed your latest commits? They actually fail with basic compiler errors. |
|
@pytorchbot test this please |
|
@pytorchbot add to whitelist |
|
this is now merged into master. |
…002d19 Summary: Previous import was 882c5283c54345d131e8fe5c859e4844dcf7ca8e Included changes: - **[f461f7a](onnx/onnx@f461f7a)**: Show the op's type and name when the shape inference is failed. (pytorch#1623) <Jerry> - **[ab8aaf9](onnx/onnx@ab8aaf9)**: Add scan test case (pytorch#1586) <G. Ramalingam> - **[c95357e](onnx/onnx@c95357e)**: link the tutorial (pytorch#1650) <Lu Fang> - **[d7e2420](onnx/onnx@d7e2420)**: Upgrade label encoder to support more input types (pytorch#1596) <Wei-Sheng Chin> - **[6425108](onnx/onnx@6425108)**: Add Doc about Adding New Operator into ONNX (pytorch#1647) <Lu Fang> - **[295889c](onnx/onnx@295889c)**: use an empty initializer to create map (pytorch#1643) <Lu Fang> - **[e38f3ec](onnx/onnx@e38f3ec)**: Remove redundant const (pytorch#1639) <daquexian> - **[ea694bf](onnx/onnx@ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (pytorch#1565) <Armen> - **[6db386e](onnx/onnx@6db386e)**: make output shape clear enough for Softmax family (pytorch#1634) <Lu Fang> - **[2b67c6e](onnx/onnx@2b67c6e)**: fix batchnorm doc (pytorch#1633) <Lu Fang> - **[c901784](onnx/onnx@c901784)**: remove inappropriate consts (pytorch#1632) <Lu Fang> - **[de82119](onnx/onnx@de82119)**: Shape inference fix for broadcast, concat and scan (pytorch#1594) <KeDengMS> - **[d7ffe3b](onnx/onnx@d7ffe3b)**: Update Optimizer Docs (pytorch#1607) <Armen> - **[d09d139](onnx/onnx@d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (pytorch#1466) <Yuta Okamoto> - **[eb4b7c2](onnx/onnx@eb4b7c2)**: allow variadic parameters of different types (pytorch#1615) <G. Ramalingam> - **[4166246](onnx/onnx@4166246)**: Fix onnxifi test (pytorch#1617) <Yinghai Lu> - **[6706a4d](onnx/onnx@6706a4d)**: Fix a bug in vector address access (pytorch#1598) <Raymond Yang> - **[ae39866](onnx/onnx@ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (pytorch#1610) <Spandan Tiwari> - **[45ba661](onnx/onnx@45ba661)**: Handle new types in the switch. (pytorch#1608) <Dmitri Smirnov> - **[14853b6](onnx/onnx@14853b6)**: Bump docker image version to 230 used in CircleCI (pytorch#1606) <bddppq> - **[e0993b8](onnx/onnx@e0993b8)**: [onnxifi] Make sure that backend handles run async. (pytorch#1599) <Roman Dzhabarov> - **[e6965cc](onnx/onnx@e6965cc)**: Introduce SparseTensor ML proto (pytorch#1554) <Dmitri Smirnov> - **[75b782f](onnx/onnx@75b782f)**: In driver test check the return status of onnxGetBackendIDs (pytorch#1597) <bddppq> - **[c05b364](onnx/onnx@c05b364)**: Make CI log less verbose (pytorch#1595) <bddppq> - **[fa568e4](onnx/onnx@fa568e4)**: Loop type shape inferencing (pytorch#1591) <Scott McKay> - **[937e64c](onnx/onnx@937e64c)**: add uint8 (pytorch#1590) <Lu Fang> - **[f86e951](onnx/onnx@f86e951)**: Add domain as an optional parameter for make_node function (pytorch#1588) <Young Kim> - **[ff45588](onnx/onnx@ff45588)**: Remove unreachable code in shape_inference.h (pytorch#1585) <Changming Sun> - **[f7dcad0](onnx/onnx@f7dcad0)**: Add several hyperbolic function ops. (pytorch#1499) <Sergii Dymchenko> - **[a60ac7d](onnx/onnx@a60ac7d)**: Add OneHot op to ONNX. (pytorch#1567) <Spandan Tiwari> - **[f6c3a7e](onnx/onnx@f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (pytorch#1583) <Roman Dzhabarov> - **[88d1784](onnx/onnx@88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (pytorch#1578) <Spandan Tiwari> - **[20041b7](onnx/onnx@20041b7)**: Add type shape inferencing for the If operator (pytorch#1571) <Scott McKay> - **[d6c4c75](onnx/onnx@d6c4c75)**: Add a virtual destructor to GraphInferencer (pytorch#1574) <Changming Sun> - **[a339598](onnx/onnx@a339598)**: fix ConvTranspose spec (pytorch#1566) <Wenhao Hu> Differential Revision: D13263831 fbshipit-source-id: 0c158dd12c45d704b6f37f63f3d74ed34ef2f534
…002d19 (#14568) Summary: Pull Request resolved: #14568 Previous import was 882c5283c54345d131e8fe5c859e4844dcf7ca8e Included changes: - **[f461f7a](onnx/onnx@f461f7a)**: Show the op's type and name when the shape inference is failed. (#1623) <Jerry> - **[ab8aaf9](onnx/onnx@ab8aaf9)**: Add scan test case (#1586) <G. Ramalingam> - **[c95357e](onnx/onnx@c95357e)**: link the tutorial (#1650) <Lu Fang> - **[d7e2420](onnx/onnx@d7e2420)**: Upgrade label encoder to support more input types (#1596) <Wei-Sheng Chin> - **[6425108](onnx/onnx@6425108)**: Add Doc about Adding New Operator into ONNX (#1647) <Lu Fang> - **[295889c](onnx/onnx@295889c)**: use an empty initializer to create map (#1643) <Lu Fang> - **[e38f3ec](onnx/onnx@e38f3ec)**: Remove redundant const (#1639) <daquexian> - **[ea694bf](onnx/onnx@ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (#1565) <Armen> - **[6db386e](onnx/onnx@6db386e)**: make output shape clear enough for Softmax family (#1634) <Lu Fang> - **[2b67c6e](onnx/onnx@2b67c6e)**: fix batchnorm doc (#1633) <Lu Fang> - **[c901784](onnx/onnx@c901784)**: remove inappropriate consts (#1632) <Lu Fang> - **[de82119](onnx/onnx@de82119)**: Shape inference fix for broadcast, concat and scan (#1594) <KeDengMS> - **[d7ffe3b](onnx/onnx@d7ffe3b)**: Update Optimizer Docs (#1607) <Armen> - **[d09d139](onnx/onnx@d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (#1466) <Yuta Okamoto> - **[eb4b7c2](onnx/onnx@eb4b7c2)**: allow variadic parameters of different types (#1615) <G. Ramalingam> - **[4166246](onnx/onnx@4166246)**: Fix onnxifi test (#1617) <Yinghai Lu> - **[6706a4d](onnx/onnx@6706a4d)**: Fix a bug in vector address access (#1598) <Raymond Yang> - **[ae39866](onnx/onnx@ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (#1610) <Spandan Tiwari> - **[45ba661](onnx/onnx@45ba661)**: Handle new types in the switch. (#1608) <Dmitri Smirnov> - **[14853b6](onnx/onnx@14853b6)**: Bump docker image version to 230 used in CircleCI (#1606) <bddppq> - **[e0993b8](onnx/onnx@e0993b8)**: [onnxifi] Make sure that backend handles run async. (#1599) <Roman Dzhabarov> - **[e6965cc](onnx/onnx@e6965cc)**: Introduce SparseTensor ML proto (#1554) <Dmitri Smirnov> - **[75b782f](onnx/onnx@75b782f)**: In driver test check the return status of onnxGetBackendIDs (#1597) <bddppq> - **[c05b364](onnx/onnx@c05b364)**: Make CI log less verbose (#1595) <bddppq> - **[fa568e4](onnx/onnx@fa568e4)**: Loop type shape inferencing (#1591) <Scott McKay> - **[937e64c](onnx/onnx@937e64c)**: add uint8 (#1590) <Lu Fang> - **[f86e951](onnx/onnx@f86e951)**: Add domain as an optional parameter for make_node function (#1588) <Young Kim> - **[ff45588](onnx/onnx@ff45588)**: Remove unreachable code in shape_inference.h (#1585) <Changming Sun> - **[f7dcad0](onnx/onnx@f7dcad0)**: Add several hyperbolic function ops. (#1499) <Sergii Dymchenko> - **[a60ac7d](onnx/onnx@a60ac7d)**: Add OneHot op to ONNX. (#1567) <Spandan Tiwari> - **[f6c3a7e](onnx/onnx@f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (#1583) <Roman Dzhabarov> - **[88d1784](onnx/onnx@88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (#1578) <Spandan Tiwari> - **[20041b7](onnx/onnx@20041b7)**: Add type shape inferencing for the If operator (#1571) <Scott McKay> - **[d6c4c75](onnx/onnx@d6c4c75)**: Add a virtual destructor to GraphInferencer (#1574) <Changming Sun> - **[a339598](onnx/onnx@a339598)**: fix ConvTranspose spec (#1566) <Wenhao Hu> Reviewed By: zrphercule Differential Revision: D13263831 fbshipit-source-id: a2ff22c6454e2430429e5a7d18d21661a7ffb0cb
Thank you guys again for your comments and recommendations on my prev. pull request.
Here is updated version of my implementation.