ENH: Add ndmax parameter to np.array to control recursion depth #29569
Conversation
mhvk
left a comment
@kibitzing - thanks for the PR. I think this is rather nice, especially that you found that all it really takes is exposing existing functionality!
However, to me at least there is one annoyance: it would seem to me that there is no reason not to support np.array(..., ndmax=0) as meaning that the user wants an array with ndim=0 (i.e., a scalar array).
Unfortunately, this clashes with the numpy C API, which has PyArray_FromAny and friends interpret max_depth=0 as arbitrary (it is actually not documented as such, but since it is used like that throughout the code base, it surely is used outside of numpy as well).
But I think we can make it work by adjusting PyArray_FromAny to change 0 to NPY_MAXDIMS and then have PyArray_FromAny_int not do any checks. See in-line comments.
What do you think?
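To make the intent concrete, here is a hedged Python-level sketch of the semantics being proposed (assuming the parameter lands as ndmax and that ndmax=0 is accepted as discussed; this is not necessarily the behavior of the PR as it currently stands):

```python
import numpy as np

# Proposed meaning of ndmax=0: stop dimension discovery immediately, so
# the whole input becomes the single element of a 0-d (scalar) array.
a = np.array([[1, 2], [3, 4]], dtype=object, ndmax=0)
print(a.ndim)    # expected: 0
print(a.item())  # expected: [[1, 2], [3, 4]]
```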
```c
op = args[0];
}

if (ndmax > NPY_MAXDIMS || ndmax <= 0) {
```
So here I'd also check ndmax < 0 rather than <= 0.
Thank you for the suggestion; I've changed it as suggested.
numpy/_core/src/multiarray/ctors.c (outdated)

```c
int ndim = 0;
npy_intp dims[NPY_MAXDIMS];

if (max_depth == 0 || max_depth > NPY_MAXDIMS) {
```
To support actually using max_depth = 0 as a real option, the check here would need to be removed.
But that means adjusting code that calls it. The main user is PyArray_FromAny (in ctors.c as well), which would then need this if statement (indeed, perhaps the whole statement should be moved there; it is not unreasonable to assume that for internal calls the values are known to be good).
One other user is PyArray_CheckFromAny_int (also in ctors.c), though logically there too it is perhaps PyArray_CheckFromAny that should be adjusted.
The final user seems to be scalartypes.c.src -- there one could just replace the 0 with NPY_MAXDIMS.
Thank you for the clear guidance.
As you suggested, I've refactored the logic so PyArray_FromAny_int can treat max_depth=0 as a request for a 0-D array.
The responsibility for handling the "no limit" case (by converting 0 to NPY_MAXDIMS) has been moved to its callers:
- The check was added to PyArray_FromAny and PyArray_CheckFromAny.
- Call sites in scalartypes.c.src and array_converter_new were updated to use NPY_MAXDIMS explicitly.
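For readers following along, a rough Python-level analogue of this division of responsibility (an illustrative sketch only, not the actual C code; the function names simply mirror the C functions discussed above):

```python
NPY_MAXDIMS = 64  # compile-time limit in current NumPy; hard-coded here only for the sketch

def from_any(op, max_depth):
    # Analogue of PyArray_FromAny / PyArray_CheckFromAny: keep the legacy
    # C-API convention that max_depth == 0 (or an oversized value) means
    # "no limit" by mapping it to NPY_MAXDIMS before calling the worker.
    if max_depth == 0 or max_depth > NPY_MAXDIMS:
        max_depth = NPY_MAXDIMS
    return from_any_int(op, max_depth)

def from_any_int(op, max_depth):
    # Analogue of PyArray_FromAny_int: trusts its callers, so a
    # max_depth of 0 here genuinely requests a 0-d array (this is
    # what the Python-level ndmax=0 maps to).
    ...
```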
Hello @mhvk, I also considered supporting ndmax=0. However, the detailed guidance you've provided makes the path forward very clear, so I'm happy to get to work on implementing it.
mhvk
left a comment
@kibitzing - this looks great. I was about to approve, but then I realized we should make sure this is actually documented. So, could you add a description in _add_newdocs.py (search for 'array', around l.806)? You'll need a .. versionadded:: directive in there; see examples in that file.
Furthermore, I personally feel this is worth a little what's-new entry - there are many who have run into this issue! (We've got some code in astropy that we can now simplify!) For this you need to add a fragment in doc/release/upcoming_changes (see the README.rst in that directory).
Since that means another CI run anyway, I also put two absolutely nitpicky comments in-line, which you might as well address too... (but feel free to disagree and ignore them).
Let me also ping @mattip and @rkern, in case they want to have a look as well (though in the end this is so much a case of just exposing what was effectively in place already that I'm quite happy to just get it in myself too).
```c
if (ndmax > NPY_MAXDIMS || ndmax < 0) {
    if (ndmax > NPY_MAXDIMS) {
        PyErr_Format(PyExc_ValueError, "ndmax must be <= NPY_MAXDIMS (=%d)", NPY_MAXDIMS);
```
Again, I wouldn't ask if there wasn't another reason to push a further commit, but to have error path code interrupt the flow as little as possible, I'd tend to have just one PyErr_Format, with (e.g.) "must have 0 <= ndmax <= NPY_MAXDIMS (=%d)".
Thank you for the feedback!
Updated to use a single PyErr_Format with the combined bounds check message.
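For illustration, the user-visible effect at the Python level (assuming the final message follows the suggestion above; the exact wording and the NPY_MAXDIMS value may differ):

```python
import numpy as np

try:
    np.array([1, 2, 3], ndmax=-1)
except ValueError as exc:
    # Expected to be along the lines of:
    # "ndmax must have 0 <= ndmax <= NPY_MAXDIMS (=64)"
    print(exc)
```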
Hello @mhvk,
Please let me know if there's anything else you'd like me to adjust!
mhvk
left a comment
Super, this now looks great! Let's get it in. Thanks very much for solving a long-standing annoyance in an elegant way!
Hello @mhvk,

Indeed, a nice collaboration!
This follows numpy#29569, and also fills in the missing parameter defaults, towards numpy#28428.
@mhvk Yes, that patch resolves both SciPy release branch test failures on x86_64 Linux--I just checked. That said, it looks like Ralf agrees this is a pretty minor matter, so I'm just going to skip those tests for now--we're just exchanging one error class for another as far as the user is concerned.
Although on Windows it looks like sometimes the error doesn't happen at all... yuck. Anyway, I'm going to suppress the test downstream for now, I think.
…y#29569)
* ENH: add ndmax parameter to np.array
* ENH: validate ndmax argument is positive
* TST: add ndmax tests for array creation
* ENH: allow np.array with ndmax=0 to create 0-D array
* MNT: simplify ndmax validation and error message
* MNT: improve consistency in error message formatting
* DOC: add comment explaining legacy behavior of max_depth=0
* TST: update tests to reflect new ndmax validation and error message
* DOC: add documentation and examples for np.array ndmax parameter
* DOC: add release note for numpy.array ndmax parameter
Implementation of #29499
Description
This PR introduces a new parameter, ndmax, to the np.array function. This parameter allows users to explicitly limit the maximum number of dimensions that NumPy will create when converting nested sequences (like lists of lists) into an array.

To ensure backward compatibility, the default behavior remains unchanged. When ndmax is not specified, the function falls back to its current heuristic: it creates as many dimensions as possible from nested sequences, up to the compile-time limit defined by NPY_MAXDIMS.

Motivation and Context
As discussed in issue #29499, the current behavior of np.array can lead to unexpected results when dtype=object is specified. For example:
- A ragged (non-uniform) nested list becomes a 1-D array of list objects.
- A uniformly nested list becomes a multi-dimensional ndarray, even with dtype=object.
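A short demonstration of the two cases (the concrete lists are illustrative, not taken from the original report):

```python
import numpy as np

# Ragged nesting: discovery stops early, giving a 1-D object array
# whose elements are the original inner lists.
ragged = np.array([[1, 2], [3]], dtype=object)
print(ragged.shape)   # (2,)

# Uniform nesting: the heuristic builds a full 2-D array,
# even though dtype=object was requested.
uniform = np.array([[1, 2], [3, 4]], dtype=object)
print(uniform.shape)  # (2, 2)
```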
This inconsistency arises because, as @rkern explained, NumPy has a strong heuristic that assumes uniformly nested sequences are intended to be multi-dimensional arrays. While this is often the desired outcome, it overrides the user's explicit intent in cases where a 1-D array of list objects is needed, regardless of their shape. This can cause subtle bugs, particularly in data processing pipelines for machine learning.

The existing workarounds, such as pre-allocating with np.empty and then assigning, or using np.fromiter, are effective but less intuitive than a direct parameter. The addition of ndmax provides a clear and explicit way to control this behavior directly within the most commonly used array creation function.
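For reference, a sketch of those pre-existing workarounds (illustrative code, not taken from the PR; np.fromiter accepts dtype=object since NumPy 1.23):

```python
import numpy as np

data = [[1, 2], [3, 4]]

# Workaround 1: pre-allocate an object array and fill it element by element.
out = np.empty(len(data), dtype=object)
for i, row in enumerate(data):
    out[i] = row
print(out.shape)   # (2,)

# Workaround 2: build the 1-D object array with np.fromiter.
out2 = np.fromiter(data, dtype=object, count=len(data))
print(out2.shape)  # (2,)
```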
How to use ndmax

The ndmax parameter provides fine-grained control over the array creation process. Passing ndmax=1 gives the desired outcome for the motivating case: it aligns the behavior for uniform lists with that of non-uniform lists when the user's intent is to create an array of objects.
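A minimal usage sketch, assuming a NumPy build that includes this PR (values are illustrative):

```python
import numpy as np

data = [[1, 2], [3, 4]]

# Default behavior is unchanged: the heuristic builds a 2-D array.
print(np.array(data, dtype=object).shape)           # (2, 2)

# With ndmax=1, dimension discovery stops after one level, yielding a
# 1-D object array whose elements are the inner lists.
print(np.array(data, dtype=object, ndmax=1).shape)  # expected: (2,)
```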
Implementation Details
The implementation was made straightforward by leveraging the pre-existing max_depth parameter in the core C-API function, PyArray_FromAny_int. The new Python-level ndmax argument is passed directly to this internal parameter.

This approach ensures that the new feature is built upon NumPy's established and tested array creation logic, minimizing risk and maintaining internal consistency.