Addition of _repr_jpeg_ causes both JPEG and PNG format images to be attached to Jupyter cells #7259

@smason

I've had a go at improving the interaction between Jupyter and Images. As I commented in #7135, that change causes the image to be saved in both JPEG and PNG format, with both files attached to the invoking cell.

This behavior seems redundant and, given that no resizing is done, can result in very large .ipynb files if the user isn't careful when working with high-resolution images. I've had a look at how to avoid this, and the nicest way I could see was to add an Image._repr_mimebundle_ method that would do the "right thing" in more circumstances.

Specifically, it would know whether the user has called display_jpeg or display_png and could return the appropriate encoding. If neither format is specifically requested it could return something appropriate for the data.
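
One way to picture that selection, as a minimal sketch: the helper below (select_mimetypes, a hypothetical name that is not part of Pillow or IPython) filters the formats an image could be encoded to by the include and exclude arguments. It assumes both arguments are either None or collections of MIME type strings, which is how IPython's formatter documentation describes them.

```python
def select_mimetypes(available, include=None, exclude=None):
    """Return the MIME types from `available` that the caller asked for.

    Hypothetical helper: assumes `include` and `exclude` are either None
    or iterables of MIME type strings.
    """
    selected = list(available)
    if include is not None:
        wanted = set(include)
        selected = [m for m in selected if m in wanted]
    if exclude is not None:
        unwanted = set(exclude)
        selected = [m for m in selected if m not in unwanted]
    return selected


available = ["image/png", "image/jpeg"]
print(select_mimetypes(available))                          # no filter applied
print(select_mimetypes(available, include=["image/jpeg"]))  # display_jpeg-style request
print(select_mimetypes(available, exclude={"image/png"}))   # everything except PNG
```

A bundle method built this way would only need to encode the formats that survive the filter.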

I've had a go at implementing this and it currently looks like:

THRESHOLD_SIZE = 1200

MODE_MAP = {
    'La': 'LA',
    'LAB': 'RGB',
    'HSV': 'RGB',
    'RGBX': 'RGB',
    'RGBa': 'RGBA',
}

VALID_JPEG_MODES = {'L', 'RGB', 'YCbCr', 'CMYK'}
VALID_PNG_MODES = {'1', 'P', 'L', 'RGB', 'RGBA', 'LA', 'PA'}

def _repr_mimebundle_(self, include=None, exclude=None, **kwargs):
    # let IPython interpret its ambiguous include/exclude flags
    if include is not None or exclude is not None or kwargs:
        return None

    # massage the image into something that has a reasonable
    # chance of being viewed by the user

    # handle the special snowflake modes
    if self.mode in {'I', 'F'}:
        # linearly transform extrema to fit in [0, 255]
        # this should have a similar result as Image.histogram
        lo, hi = self.getextrema()
        scale = 256 / (hi - lo) if lo != hi else 1
        image = self.point(lambda e: (e - lo) * scale).convert('L')
    elif self.mode == 'I;16':
        # linearly transform max down to 255
        image = self.point(lambda e: e / 256).convert('L')
    else:
        image = self

    # shrink large images so they don't take too long to transfer/render
    factor = max(image.size) // THRESHOLD_SIZE
    if factor > 1:
        image = image.reduce(factor)

    # process remaining modes into things supported by writers
    if image.mode in MODE_MAP:
        image = image.convert(MODE_MAP[image.mode])

    jpeg = image._repr_jpeg_() if image.mode in VALID_JPEG_MODES else None
    png = image._repr_png_() if image.mode in VALID_PNG_MODES else None

    # prefer lossless format if it's not significantly larger
    if jpeg and png:
        # 1.125 and 2**18 used as they have nice binary representations
        if len(png) < len(jpeg) * 1.125 + 2**18:
            jpeg = None
        else:
            png = None

    return {
        'image/jpeg': jpeg,
        'image/png': png,
    }
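
The two size heuristics in that function can be checked in isolation. The sketch below reimplements only the arithmetic, with no Pillow dependency; reduce_factor and prefer_png are hypothetical names introduced for this illustration.

```python
THRESHOLD_SIZE = 1200


def reduce_factor(size):
    """Integer factor the proposal would pass to Image.reduce (1 = unchanged)."""
    return max(1, max(size) // THRESHOLD_SIZE)


def prefer_png(png_len, jpeg_len):
    """True when the PNG is kept, i.e. it is not significantly larger."""
    return png_len < jpeg_len * 1.125 + 2**18


for size in [(1000, 800), (2399, 1600), (2400, 1600), (6000, 4000)]:
    print(size, "-> factor", reduce_factor(size))

# 2**18 bytes is 256 KiB, so any PNG below that size is always kept
print(prefer_png(png_len=200_000, jpeg_len=20_000))
print(prefer_png(png_len=3_000_000, jpeg_len=400_000))
```

Because of the integer division, the factor only exceeds 1 once the longest side reaches 2 * THRESHOLD_SIZE, so an image just under 2400 pixels would still be sent at full resolution.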

This can be tested with something like:

import numpy as np
from IPython import display
from PIL import Image

size = 100, 100
arr = np.linspace(0, 10, np.prod(size), endpoint=False, dtype=np.int32)
image = Image.fromarray(arr.reshape(size))
display.display(_repr_mimebundle_(image), raw=True)
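
For that test array the pixel values run from 0 to 9, so the mode 'I' branch applies a scale of 256 / 9. The arithmetic can be traced on plain numbers without Pillow. The sketch below uses hypothetical helper names and assumes that the later conversion to 'L' clamps values into [0, 255]; that clamping is an assumption about the converter, not something stated in the code above.

```python
def scale_extrema(values):
    """Mirror the proposal's linear map for modes 'I' and 'F' on plain numbers."""
    lo, hi = min(values), max(values)
    scale = 256 / (hi - lo) if lo != hi else 1
    return [(v - lo) * scale for v in values]


def clamp_to_byte(value):
    """Assumed behaviour of the later convert('L'): clamp into [0, 255]."""
    return min(255.0, max(0.0, value))


pixels = list(range(10))  # the distinct values in the test array above
scaled = scale_extrema(pixels)
print(scaled[0], scaled[-1])      # lowest and highest value before clamping
print(clamp_to_byte(scaled[-1]))  # highest value after the assumed clamp
```

The top value lands at about 256 rather than 255, so the result relies on that clamping; a scale of 255 / (hi - lo) would map the extrema exactly onto [0, 255].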

I could turn this into a pull request if this seems like the right way to go.

Note that the "if include is not None or exclude is not None" check at the top means that explicit calls like:

display.display_jpeg(image)

would cause Jupyter to end up calling Image._repr_jpeg_ directly, bypassing this code. If called with a large image, this would cause a full-resolution JPEG to be returned. I'm not sure whether this is the right thing, or even what the "correct" way of handling these arguments should be.

I've only seen exclude=None in my testing, so I'm going by the description in IPython/core/formatters.py and by what graphviz does.
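
As a rough illustration of that fallback, the toy dispatcher below imitates the behaviour described here: it asks the object for a bundle first, then calls the per-format _repr_*_ methods for any type the bundle did not answer for. It is a simplified model written for this example and an assumption about the dispatch order, not IPython's actual formatter code; FakeImage, PER_FORMAT and toy_format are hypothetical names.

```python
class FakeImage:
    """Stand-in for an image; returns labelled bytes instead of real encodings."""

    def _repr_mimebundle_(self, include=None, exclude=None, **kwargs):
        # same guard as the proposal: defer when a format is requested explicitly
        if include is not None or exclude is not None or kwargs:
            return None
        return {"image/jpeg": None, "image/png": b"downscaled png"}

    def _repr_jpeg_(self):
        return b"full-res jpeg"

    def _repr_png_(self):
        return b"full-res png"


PER_FORMAT = {"image/jpeg": "_repr_jpeg_", "image/png": "_repr_png_"}


def toy_format(obj, include=None, exclude=None):
    """Toy model of the dispatch: bundle first, per-format methods as fallback."""
    result = dict(obj._repr_mimebundle_(include=include, exclude=exclude) or {})
    for mimetype, method in PER_FORMAT.items():
        if mimetype in result:
            continue  # the bundle already answered for this type
        if include is not None and mimetype not in include:
            continue
        if exclude is not None and mimetype in exclude:
            continue
        result[mimetype] = getattr(obj, method)()
    return result


image = FakeImage()
print(toy_format(image))                          # plain display: bundle is used
print(toy_format(image, include=["image/jpeg"]))  # display_jpeg-style: guard bypassed
```

In this model the explicit request skips the downscaling path entirely, which is the full-resolution case raised above.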
