LieTensor workable in func.jacrev #262
Conversation
@hxu296 Could you please take a look at the PR and help reduce the code redundancy in the context manager? This is a workable solution so far, but it still contains duplicate code segments. Perhaps we could write a general wrapper/decorator that works on any torch function of this kind.
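A minimal sketch of what such a general wrapper could look like, assuming a hypothetical `swap_torch_ops` context manager and a `make_ltype_aware` wrapper factory (neither name is from the PR):

```python
from contextlib import contextmanager
import torch

@contextmanager
def swap_torch_ops(wrappers):
    # `wrappers` maps names of torch functions to wrapper factories,
    # e.g. {'stack': make_ltype_aware}. Each factory receives the
    # original function and returns the wrapped replacement.
    originals = {name: getattr(torch, name) for name in wrappers}
    try:
        for name, factory in wrappers.items():
            setattr(torch, name, factory(originals[name]))
        yield
    finally:
        # Always restore the original ops, even if the body raises.
        for name, fn in originals.items():
            setattr(torch, name, fn)
```

With something like this in place, each torch op to be made LieTensor-aware would need only a one-line registration instead of a duplicated swap/restore block.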
hxu296 left a comment:
Swapping the torch ops worked, nice! I added some comments to point out a few places that look odd to me, but overall this is a great step forward!
I will go ahead and address some of my own comments. After that, we can meet to discuss how to make it more general.
Looks good! I modified the retain_ltype context manager to restore the functions-to-wrap and added tests to assert that the functions are indeed restored. The next step is to make retain_ltype thread safe: the save, swap, and restore critical regions need to be protected with a semaphore, as we discussed.
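A minimal sketch of how those critical regions could be guarded, assuming hypothetical `_save_torch_ops`, `_swap_torch_ops`, and `_restore_torch_ops` helpers (a `threading.Lock` is used here as a binary semaphore):

```python
import threading
from contextlib import contextmanager

_swap_lock = threading.Lock()  # guards the save/swap/restore critical regions

@contextmanager
def retain_ltype():
    # Hold the lock across the whole region so concurrent threads never
    # observe the torch ops in a half-swapped state.
    with _swap_lock:
        saved = _save_torch_ops()       # hypothetical: snapshot the original ops
        _swap_torch_ops()               # hypothetical: install LieTensor-aware ops
        try:
            yield
        finally:
            _restore_torch_ops(saved)   # hypothetical: put the originals back
```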
Update: according to the PyTorch docs,

> DistributedDataParallel uses multiprocessing where a process is created for each GPU, while DataParallel uses multithreading. There are significant caveats to using CUDA models with multiprocessing; unless care is taken to meet the data handling requirements exactly, it is likely that your program will have incorrect or undefined behavior. It is recommended to use DistributedDataParallel, instead of DataParallel to do multi-GPU training, even if there is only a single node.

Using the recommended DistributedDataParallel won't cause race conditions, as it leverages multiprocessing, where each process has its own virtual memory. Thread safety is therefore not an issue for DistributedDataParallel.
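A small illustration of that reasoning, using torch.multiprocessing (the SWAPPED dict is a stand-in for the module-level op table, not code from the PR): each spawned worker gets its own copy of module-level state, so per-process op swapping cannot race across workers.

```python
import torch.multiprocessing as mp

SWAPPED = {"stack": None}  # module-level state; each process gets its own copy

def worker(rank):
    # Mutating the table here only affects this process's copy.
    SWAPPED["stack"] = rank
    print(f"rank {rank}: {SWAPPED}")

if __name__ == "__main__":
    # Two workers, analogous to one process per GPU under DDP.
    mp.spawn(worker, nprocs=2)
```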
pypose/func/jac.py (Outdated)

```diff
@@ -0,0 +1,58 @@
+import torch
+from typing import Callable, Union, Tuple, Optional
+from pypose.lietensor.lietensor import retain_ltype
```
Don't import pypose in pypose modules; only import pypose in examples and tests.
Suggestion:

```python
from .. import retain_ltype
```
pypose/__init__.py (Outdated)

```python
from .lietensor import Sim3_type, sim3_type, RxSO3_type, rxso3_type
from .lietensor import tensor, translation, rotation, scale, matrix, euler
from .lietensor import mat2SO3, mat2SE3, mat2Sim3, mat2RxSO3, from_matrix, matrix, euler2SO3, vec2skew
from .func import *
```
Suggestion:

```python
from .lietensor.lietensor import retain_ltype
from . import func
```
pypose/lietensor/lietensor.py (Outdated)

```diff
 from contextlib import contextmanager
 from .basics import vec2skew
-import collections, numbers, warnings
+import collections, numbers, warnings, importlib
 from .operation import broadcast_inputs
```
Use pyramid style; refer to point 12 in the guideline.
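Assuming "pyramid style" here means ordering import lines from shortest to longest (an assumption about the guideline, not a quote from it), the block above could be rearranged roughly as:

```python
from .basics import vec2skew
from contextlib import contextmanager
from .operation import broadcast_inputs
import collections, numbers, warnings, importlib
```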
tests/optim/test_jacobian.py (Outdated)

```python
}

with check_fn_equal(TO_BE_CHECKED):
    with retain_ltype():
```
Suggest using pp.retain_ltype().
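A minimal usage sketch of the suggested spelling, assuming retain_ltype is re-exported at the package top level as proposed above (the rotate function and the pp.randn_SO3 test value are illustrative, not taken from the PR):

```python
import torch
import pypose as pp

def rotate(x):
    # x is an SO3 LieTensor; Log maps it into the tangent space
    return x.Log()

pose = pp.randn_SO3()
with pp.retain_ltype():
    # Inside the context the swapped torch ops preserve the LieTensor type,
    # so jacrev can differentiate through Lie-group operations.
    J = torch.func.jacrev(rotate)(pose)
```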
Initial solution for making LieTensor workable in jacrev.