[Kernel] Optimize isfinite kernel #69596

HydrogenSulfate · 2024-11-21T11:59:22Z

PR Category

Performance Optimization

PR Types

Improvements

Description

Pcard-75624

The composite operator p_norm_grad uses the basic operator isfinite:

Paddle/paddle/fluid/prim/api/composite_backward/composite_backward_api.h

Lines 2077 to 2082 in 7ca7f2c

    
    auto _zero_tensor =
        full<T>(common::vectorize(x.dims()), 0.0, x.dtype(), x.place());
    auto finite_mask = isfinite<T>(x_grad_tmp);
    x_grad_tmp = where<T>(finite_mask, x_grad_tmp, _zero_tensor);
    x_grad_tmp = expand_out_grad * (x_grad_tmp);
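To make the masking step in this snippet concrete, here is a minimal CPU-only sketch in plain C++. It is not Paddle code: `MaskNonFiniteAndScale` and its `std::vector<float>` arguments are hypothetical stand-ins for the tensor ops `isfinite<T>`, `where<T>` and the element-wise multiply, and it assumes both inputs have the same length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the composite-op snippet: zero out non-finite
// gradient entries, then scale element-wise. Assumes scale.size() == grad.size().
std::vector<float> MaskNonFiniteAndScale(const std::vector<float>& grad,
                                         const std::vector<float>& scale) {
  std::vector<float> out(grad.size());
  for (std::size_t i = 0; i < grad.size(); ++i) {
    // Plays the role of isfinite<T>(x_grad_tmp): false for +/-inf and NaN.
    const bool finite = std::isfinite(grad[i]);
    // Plays the role of where<T>(finite_mask, x_grad_tmp, _zero_tensor).
    const float masked = finite ? grad[i] : 0.0f;
    // Plays the role of expand_out_grad * x_grad_tmp.
    out[i] = scale[i] * masked;
  }
  return out;
}
```

Because the mask is computed once per element with no cross-element dependency, an element-wise kernel is enough for this operation.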

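The isfinite kernel was implemented with the thrust library, whose API calls trigger cudaStreamSynchronize, which ends up increasing the time taken by every step (red circle in the figure below). The isfinite kernel has therefore been refactored, following the isclose kernel.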

Important

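The refactored code consists of the following four parts (taking isfinite as an example):

Generic template declaration
Partial specialization for integer types: integers can never be inf or nan, so no check is needed and true or false is assigned directly
Partial specialization for standard floating-point types: depending on the device type, the check function provided by cuda or std is called
Specialization for other custom floating-point types: depending on the device type, the check function provided by cuda or phi is called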
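The four-part layout above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the code merged in this PR: `IsFiniteSketch` and `Half16` are hypothetical names, the sketch is CPU-only (the device dispatch between cuda and std/phi functions is omitted), and `Half16` is assumed to hold raw IEEE 754 binary16 bits.

```cpp
#include <cmath>
#include <cstdint>
#include <type_traits>

// 1. Generic template declaration (intentionally left undefined).
template <typename T, typename Enable = void>
struct IsFiniteSketch;

// 2. Integer types: an integer is never inf or NaN, so the answer is constant.
template <typename T>
struct IsFiniteSketch<T, std::enable_if_t<std::is_integral<T>::value>> {
  bool operator()(const T&) const { return true; }
};

// 3. Standard floating-point types: defer to the std check
//    (a device build would call the CUDA equivalent instead).
template <typename T>
struct IsFiniteSketch<T, std::enable_if_t<std::is_floating_point<T>::value>> {
  bool operator()(const T& v) const { return std::isfinite(v); }
};

// 4. Custom floating-point type. Half16 is a hypothetical 16-bit float that
//    stores IEEE 754 binary16 bits; it is NOT a Paddle type.
struct Half16 {
  std::uint16_t bits;
};

template <>
struct IsFiniteSketch<Half16, void> {
  bool operator()(const Half16& v) const {
    // An all-ones exponent field (0x7C00) encodes inf or NaN.
    return (v.bits & 0x7C00u) != 0x7C00u;
  }
};
```

With this layout, `IsFiniteSketch<int>()(42)` resolves to the integer specialization at compile time and never inspects the value, while `IsFiniteSketch<float>()(x)` forwards to `std::isfinite`.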

After the fix, the average time (ns) dropped from 659408.8 to 34268.3, which is negligible, and the green blocks no longer appear in the timeline.

paddle-bot · 2024-11-21T11:59:26Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


Summary of this PR:

1. upload DPA-1 related code
2. merge a large amount of develop code
3. add all eager composite operators except `softmax_grad`, `p_norm_grad`, `split_grad`, and `concat_grad` to the composite operator blacklist (<https://github.com/deepmodeling/deepmd-kit/pull/4414/files#diff-e678abb052b278f8a479f8d13b839a9ec0effd9923478a850bc13758f918e1e9R134-R148>) to significantly improve model execution speed (reducing the time taken from 100% more than PyTorch to about 10% to 15% more).

related PR: lanpa/tensorboardX#728

### Training curve:

![training_curves_comparison_eager_opt](https://github.com/user-attachments/assets/3b71fc99-5abf-4353-a61a-38737d3c7f2c)

### Accuracy test(left: paddle, right: torch):

![image](https://github.com/user-attachments/assets/a42b4bfd-c0f8-4eb8-85eb-ff1adf981dbb)

Related optimization of Paddle framework:

- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Introduced several new classes for molecular descriptors, including `DescrptDPA1`, `DescrptBlockSeAtten`, and `LayerNorm`, enhancing the modeling capabilities for molecular simulations.
  - Added new JSON configuration files for model parameters and multitask models related to water simulations.
  - Implemented new test classes for validating the functionality of the `DPAtomicModel` and various descriptor classes.
  - Added new test classes for evaluating denoising models, including `TestDenoiseModelDPA1` and `TestDenoiseModelDPA2`.
  - Enhanced the `ModelWrapper` class to clarify the handling of model parameters and state management.
- **Bug Fixes**
  - Improved internal logic for handling model state saving and loading, ensuring consistency in outputs.
- **Documentation**
  - Enhanced type hints and return annotations across various classes and methods for better clarity.
- **Tests**
  - Expanded the testing framework with new test cases for denoising models and descriptor functionalities, ensuring robust validation of features.
  - Activated previously skipped tests for energy models, improving test coverage.
  - Enhanced multitask training tests with new configuration handling and test classes.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Support DPA-2 in paddle backend. This PR will be updated after #4414 is merged.

### Training curve:

![training_curves_comparison_dpa2](https://github.com/user-attachments/assets/29bdeffa-cf2d-4586-afcf-7df0569997c3)

### Accuracy test(left: paddle, right: torch):

![image](https://github.com/user-attachments/assets/5bff55f3-1c39-4b95-93f0-68783e794716)

Related optimization of Paddle framework:

- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

## Summary by CodeRabbit

- **New Features**
  - Introduced new classes for molecular descriptors: `DescrptDPA2`, `DescrptBlockRepformers`, `DescrptSeTTebd`, and `DescrptBlockSeTTebd`.
  - Added new functions for tensor operations and descriptor management, enhancing the capabilities of the module.
  - Updated JSON configurations for multitask models to refine selection criteria and data paths.
- **Bug Fixes**
  - Improved error handling and parameter validation across various descriptor classes.
- **Documentation**
  - Enhanced test coverage for new descriptor functionalities and configurations.
- **Tests**
  - Added new test classes to validate the functionality of `DescrptDPA2` and multitask training scenarios.
  - Expanded test capabilities for descriptor classes based on installed dependencies.
  - Updated existing tests to support new configurations and functionalities.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

optimize isfinite kernel

5c18e29

HydrogenSulfate added 4 commits November 21, 2024 20:01

remove no use code

a64dc95

Update isfinite_kernel_impl.h

f69457c

reformat code and adapt for windows VC compiler

2d2d649

Merge branch 'develop' into optimize_isfinite

04c7506

HydrogenSulfate force-pushed the optimize_isfinite branch from f69457c to 04c7506 November 22, 2024 04:04

HydrogenSulfate added 6 commits November 22, 2024 14:11

update for integer dtype

3bd7381

Merge branch 'optimize_isfinite' of https://github.com/HydrogenSulfat…

41d5d83

…e/Paddle into optimize_isfinite

add int16 and int8 for fixing for windows

0cd3248

update uint8_t

35c6ad4

simplify code

b8946a7

Merge branch 'develop' into optimize_isfinite

78057cc

zyfncg approved these changes Nov 25, 2024


HydrogenSulfate merged commit dc6bba9 into PaddlePaddle:develop Nov 25, 2024

HydrogenSulfate deleted the optimize_isfinite branch November 25, 2024 05:12

This was referenced Nov 25, 2024

pd: support dpa1 deepmodeling/deepmd-kit#4414

Merged

pd: support dpa2 deepmodeling/deepmd-kit#4418

Merged
