ENH: Faster array padding #11126

@lagru

Dear devs,

As suggested in #11033 (comment), the current implementation of numpy.pad copies data more often than necessary. Most pad modes currently use numpy.concatenate under the hood to build the new array, and this has to happen twice for each padded axis. I think it would be faster to pre-allocate the returned array once with the correct final shape and then just set the appropriate edge values.
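To make the cost concrete, here is a simplified sketch of concatenate-based zero padding. It is only an illustration of the pattern described above, not NumPy's actual code, and the name `pad_zeros_by_concatenate` is made up for this example.

```python
import numpy as np


def pad_zeros_by_concatenate(arr, pad_amt):
    """Hypothetical illustration: zero-pad by concatenating twice per axis."""
    for axis, (left, right) in enumerate(pad_amt):
        # Shapes of the blocks to attach before and after along this axis.
        before_shape = list(arr.shape)
        before_shape[axis] = left
        after_shape = list(arr.shape)
        after_shape[axis] = right

        # Each concatenate allocates a new array and copies all data
        # accumulated so far, so the original values are copied repeatedly.
        arr = np.concatenate(
            (np.zeros(before_shape, dtype=arr.dtype), arr), axis=axis
        )
        arr = np.concatenate(
            (arr, np.zeros(after_shape, dtype=arr.dtype)), axis=axis
        )
    return arr


a = np.arange(6).reshape(2, 3)
padded = pad_zeros_by_concatenate(a, [(1, 2), (0, 1)])
```

For an N-dimensional array this performs 2N allocations and full copies, which is the overhead a single pre-allocation would avoid.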

Here is a first draft of a function that pre-allocates an array with the padded shape and leaves the content of the padded areas undefined.

import numpy as np


def _pad_empty(arr, pad_amt):
    """Pad array with undefined values.

    Parameters
    ----------
    arr : ndarray
        Array to grow.
    pad_amt : sequence of tuple[int, int]
        Pad width on both sides for each dimension in `arr`.

    Returns
    -------
    padded : ndarray
        Larger array with undefined values in padded areas.
    """
    # Allocate grown array
    new_shape = tuple(s + sum(p) for s, p in zip(arr.shape, pad_amt))
    padded = np.empty(new_shape, dtype=arr.dtype)

    # Copy old array into correct space
    old_area = tuple(
        slice(None if left == 0 else left, None if right == 0 else -right)
        for left, right in pad_amt
    )
    padded[old_area] = arr

    return padded

These undefined pad areas could then be filled by simple value assignment, e.g. with new helpers such as _set_const_after, _set_mean_before, and so on. I expect this to be significantly faster, and I have already (roughly) tested the idea with the suggested function _fast_pad in scikit-image/scikit-image#3022.
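As a rough sketch of what filling by assignment could look like for constant padding: the function name `pad_constant_sketch` and its `value` parameter are hypothetical and only serve this example, and the result is checked against the existing numpy.pad.

```python
import numpy as np


def pad_constant_sketch(arr, pad_amt, value=0):
    """Hypothetical helper: constant padding with a single pre-allocation."""
    # Allocate the final array once; its contents are undefined for now.
    new_shape = tuple(
        s + left + right for s, (left, right) in zip(arr.shape, pad_amt)
    )
    padded = np.empty(new_shape, dtype=arr.dtype)

    # Copy the original data into the interior region.
    interior = tuple(
        slice(left, size - right)
        for size, (left, right) in zip(new_shape, pad_amt)
    )
    padded[interior] = arr

    # Fill the pad areas axis by axis with plain value assignment.
    for axis, (left, right) in enumerate(pad_amt):
        before = [slice(None)] * arr.ndim
        before[axis] = slice(0, left)
        padded[tuple(before)] = value

        after = [slice(None)] * arr.ndim
        after[axis] = slice(new_shape[axis] - right, None)
        padded[tuple(after)] = value
    return padded


a = np.arange(6).reshape(2, 3)
result = pad_constant_sketch(a, [(1, 2), (0, 1)], value=9)
expected = np.pad(a, [(1, 2), (0, 1)], mode="constant", constant_values=9)
assert np.array_equal(result, expected)
```

The original data is copied exactly once here; other modes (mean, edge, ...) would only differ in what gets assigned to the `before` and `after` regions.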

If you like this idea, I'd be happy to make a PR that addresses this after #11012 is resolved one way or another. I'm looking forward to your feedback.