It's confusing that you can't select the media part of the Media & Text block, even though you can select the blocks on the other side. As a result, you don't get the controls and interactions that a "normal" image or video block would offer. In other words, I would expect the media side to behave like InnerBlocks of the Media & Text block.
To Reproduce
Steps to reproduce the behavior:
- Select a Media & Text block with an image on one side and some text block on the other side
- Try tapping on the image part
- Notice nothing happens, the Media & Text block ("parent") remains selected
- Try selecting text on the other side
- Notice that this works as you'd expect it to (as a "child" of the Media & Text block)
Example
