20 Comments
Sherman:

While I definitely failed to read your initial example right on the first pass, I think the particular example chosen isn't ambiguous.

The `Returns` section of the docstring strictly implies that any input `b` with `ndim >= 2` will be interpreted as the (M, K) case.

From there, my (2nd attempt) instincts said to let K=1 and unsqueeze-squeeze, which was correct.
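That trick can be made concrete with a small NumPy sketch (shapes here are my own illustration, not from the original post): to solve a batch of systems with one right-hand-side vector each, force the K=1 matrix case and then drop the added axis.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 3))  # batch of two 3x3 systems
b = rng.standard_normal((2, 3))     # one RHS vector per system

# b has ndim >= 2, so solve() would read it as a single (M, K)
# matrix, which doesn't match a's shape. Unsqueeze to make the
# K=1 case explicit, then squeeze the extra axis back off.
x = np.linalg.solve(a, b[..., None]).squeeze(-1)  # shape (2, 3)

# Each system was solved independently.
for ai, bi, xi in zip(a, b, x):
    assert np.allclose(ai @ xi, bi)
```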

·

In general, I do agree that tensor methods can be ill-formed.

Contractually, it would be nice if all ops had

* a single unbatched implementation with fixed ndims for all inputs,

* clear descriptions of what broadcasting/batching semantics are supported for the op.

Or, if you could remake everything from scratch, you would gate all batching behavior behind a `vmap()` wrapper that simply replaces each op with its batched variant, making the intended behavior of the code obvious... but this gets yucky when you need to flatten/split a batch dim for reasons.
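A toy sketch of what that gating might look like (the `vmap` wrapper and `solve_one` here are hypothetical illustrations, not any real library's API): the unbatched function has fixed ndims, and all batching lives in the wrapper.

```python
import numpy as np

def vmap(fn):
    # Toy sketch only: apply an unbatched function over a shared
    # leading batch axis of every argument. Real implementations
    # (e.g. jax.vmap) rewrite ops rather than looping.
    def batched(*args):
        return np.stack([fn(*parts) for parts in zip(*args)])
    return batched

def solve_one(a, b):
    # Unbatched contract: a is (M, M), b is (M,). No ambiguity.
    return np.linalg.solve(a, b)

batched_solve = vmap(solve_one)

A = np.stack([np.eye(3) * 2.0, np.eye(3) * 2.0])  # (2, 3, 3)
B = np.ones((2, 3))                               # (2, 3)
X = batched_solve(A, B)                           # (2, 3)
```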

·

I don't think broadcasting can be ditched. As you describe, it's good in "simple" cases, and it's quite easy to see how ugly code would become in a no-broadcast world. I think most cases of bad ambiguity go away if your broadcasting system is repeat-only rather than also ndim-modifying.
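To illustrate the distinction (my example, under the assumption that "repeat-only" means axes are never silently prepended): NumPy today does both ndim modification and repetition, while a repeat-only system would demand the axis be added explicitly.

```python
import numpy as np

a = np.ones((2, 3))
b = np.arange(3.0)         # shape (3,)

# NumPy's broadcasting silently prepends an axis: b becomes
# (1, 3), then repeats along the new axis to (2, 3).
implicit = a + b

# In a repeat-only system, only the second step would happen
# automatically; the axis must be spelled out:
explicit = a + b[None, :]  # b is (1, 3); only repetition remains

assert np.array_equal(implicit, explicit)
```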

·

I agree arrays-in-indices ("advanced" indexing) was a mistake. It is more confusing than a simple `.take`/`.index_select`. The nd cases are even worse.
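A small NumPy sketch of the comparison (my example): the same row selection spelled both ways, plus one of the nd surprises.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
idx = np.array([2, 0])

# "Advanced" indexing: an array in the index selects rows, with
# result-shape rules that get subtle once index arrays are mixed.
rows = x[idx]                    # shape (2, 4)

# The same selection, spelled explicitly:
rows2 = np.take(x, idx, axis=0)
assert np.array_equal(rows, rows2)

# The nd case: two index arrays are paired elementwise, so this
# picks x[0, 0] and x[1, 1] -- not a 2x2 block.
diag = x[np.array([0, 1]), np.array([0, 1])]
```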

·

I'm eagerly looking forward to the “better” NumPy :)

dynomight:

You mean the np.linalg.solve documentation? Yeah, I don't think it's actually ambiguous, just very hard to understand!