Conversation
- class BridgeTowerModelTester:
+ class BridgeTowerTextModelTester:
There is no BridgeTowerTextModelTest, however: we only use this tester class to create the text config and the text inputs.
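A minimal sketch of that pattern (illustrative names and values, not the exact tester code, and assuming BridgeTowerTextConfig is importable from the top-level transformers namespace): the class only produces a tiny text config and dummy text inputs for the composite model tests, and is never registered as a test case itself.

```python
import torch
from transformers import BridgeTowerTextConfig


class BridgeTowerTextModelTester:
    """Helper only: builds a tiny text config and dummy text inputs."""

    def __init__(self, parent, batch_size=1, seq_length=4, vocab_size=99):
        self.parent = parent
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.vocab_size = vocab_size

    def get_config(self):
        # small values keep the tiny test model cheap to build and run
        return BridgeTowerTextConfig(
            vocab_size=self.vocab_size,
            hidden_size=128,
            num_hidden_layers=2,
            num_attention_heads=4,
            intermediate_size=256,
        )

    def prepare_config_and_inputs(self):
        input_ids = torch.randint(0, self.vocab_size, (self.batch_size, self.seq_length))
        attention_mask = torch.ones_like(input_ids)
        return self.get_config(), input_ids, attention_mask
```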
class BridgeTowerImageModelTester:
Same as mentioned for the text model tester above: this class only creates the vision config and image inputs.
hidden_size=128,
num_hidden_layers=2,
num_attention_heads=4,
intermediate_size=256,
This model requires some attributes to be defined in the top config (BridgeTowerConfig).
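A minimal sketch of what that can look like in the model tester, assuming BridgeTowerConfig accepts these keyword arguments (the names are taken from the diff above) and that the sub-testers are stored as `text_model_tester` / `image_model_tester` (hypothetical attribute names):

```python
from transformers import BridgeTowerConfig


def get_config(self):
    # the text/vision sub-configs come from the two tester classes above;
    # the same small attribute values are duplicated on the top-level config
    return BridgeTowerConfig(
        text_config=self.text_model_tester.get_config().to_dict(),
        vision_config=self.image_model_tester.get_config().to_dict(),
        hidden_size=128,
        num_hidden_layers=2,
        num_attention_heads=4,
        intermediate_size=256,
    )
```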
has_attentions = False
@unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
def test_cpu_offload(self):
    pass

With the large version, this test passes.
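For reference, a hypothetical way to check offloading manually on the large model (not part of this PR; assumes the public `BridgeTower/bridgetower-large-itm-mlm` checkpoint and an environment with `accelerate` installed):

```python
from transformers import BridgeTowerModel

# device_map="auto" lets accelerate place weights on the GPU and
# offload whatever does not fit to CPU
model = BridgeTowerModel.from_pretrained(
    "BridgeTower/bridgetower-large-itm-mlm",
    device_map="auto",
)
```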
|
|
||
@unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
def test_disk_offload(self):
    pass
@unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
def test_model_parallelism(self):
    pass

With the large model, there is a device issue when running the forward pass. I tried to look into it, but constantly got GPU OOM, so I decided to update this test file. I will take a look at this test with a larger model (but not too large).
return config, inputs_dict

@slow
Remark: with a larger model (but not too large), we get FAILED tests/models/bridgetower/test_modeling_bridgetower.py::BridgeTowerModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! Better to check this separately. Here is the full log:

>       new_output = new_model(**inputs_dict_class)
tests/test_modeling_common.py:2616:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py:165: in new_forward
output = old_forward(*args, **kwargs)
src/transformers/models/bridgetower/modeling_bridgetower.py:1423: in forward
image_embeds = self.vision_model.visual.transformer.resblocks[i](image_embeds).type(
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = BridgeTowerResidualAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_feature...ar(in_features=2048, out_features=512, bias=True)
)
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
hidden_state = tensor([[[ 0.5531, 0.0555, -0.0248, ..., 0.2110, -0.0403, 0.0487]],
[[ 0.2963, -0.1709, 0.0074, ..., 0... [[ 0.3324, -0.0536, -0.0069, ..., 0.0911, -0.0565, -0.2751]]],
device='cuda:1', grad_fn=<ViewBackward0>)
attention_mask = None
def forward(self, hidden_state: torch.Tensor, attention_mask: torch.Tensor = None):
residual_state = hidden_state + self.attention(self.ln_1(hidden_state), attention_mask)
hidden_state = self.ln_2(residual_state)
for _, layer in self.mlp.items():
hidden_state = layer(hidden_state)
> hidden_state = residual_state + hidden_state
E RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
src/transformers/models/bridgetower/modeling_bridgetower.py:237: RuntimeError
================================================================================================== warnings summary ==================================================================================================
../usr/local/lib/python3.8/dist-packages/detectron2/data/transforms/transform.py:46
/usr/local/lib/python3.8/dist-packages/detectron2/data/transforms/transform.py:46: DeprecationWarning: LINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use BILINEAR or Resampling.BILINEAR instead.
def __init__(self, src_rect, output_size, interp=Image.LINEAR, fill=0):
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================== short test summary info ===============================================================================================
FAILED tests/models/bridgetower/test_modeling_bridgetower.py::BridgeTowerModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
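Judging from the traceback, the add at modeling_bridgetower.py:237 combines `residual_state` (left on cuda:0) with the MLP output (placed on cuda:1 by the accelerate hooks). A hedged sketch of one possible fix, not a change made in this PR: align the devices before the residual add. The submodule names (`attention`, `ln_1`, `ln_2`, `mlp`) are taken from the traceback above.

```python
import torch

# Inside BridgeTowerResidualAttention (sketch of the method shown in the log):
def forward(self, hidden_state: torch.Tensor, attention_mask: torch.Tensor = None):
    residual_state = hidden_state + self.attention(self.ln_1(hidden_state), attention_mask)
    hidden_state = self.ln_2(residual_state)
    for _, layer in self.mlp.items():
        hidden_state = layer(hidden_state)
    # the MLP may live on a different GPU under naive model parallelism, so move
    # the residual to the device of the MLP output before adding
    hidden_state = residual_state.to(hidden_state.device) + hidden_state
    return hidden_state
```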
* update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
What does this PR do?

Update BridgeTowerModelTester to use small values for config.
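To verify the change locally, the updated test file can be run directly (path taken from the logs above); a minimal sketch:

```python
import pytest

# equivalent to: pytest tests/models/bridgetower/test_modeling_bridgetower.py -v
pytest.main(["tests/models/bridgetower/test_modeling_bridgetower.py", "-v"])
```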