GIF's n_frames is (about 60x) slower than it should be  #6105

@FirefoxMetzger

While idling around on SO and answering a question I noticed that n_frames is surprisingly slow for GIF:

# setup code based on ImageIO 
# because it's short and we can avoid measuring disk IO
import imageio.v3 as iio
from PIL import Image
import io

img = iio.imread("imageio:newtonscradle.gif", index=None)
img_bytes = iio.imwrite("<bytes>", img, format="GIF")

The desired way to do things:

%%timeit

with Image.open(io.BytesIO(img_bytes)) as file:
    n_frames = file.n_frames

Timing for this approach:

13.2 ms ± 94.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is far longer than it should take to walk a series of block headers. So I checked and, indeed, n_frames parses the pixel data, which (in Pillow 9) involves a conversion to RGB(A); that conversion is what makes it slow. Here is an alternative approach that only reads the block headers and is, more or less, a drop-in replacement:

%%timeit
# based on SO: https://stackoverflow.com/a/7506880/6753182

with Image.open(io.BytesIO(img_bytes)) as file:
    def skip_color_table(flags):
        # bit 7 set: a color table with 2 ** ((flags & 7) + 1) RGB entries follows
        if flags & 0x80:
            file.fp.seek(3 << ((flags & 7) + 1), 1)

    current_fp = file.fp.tell()  # remember the position so we can restore it
    total_frames = file.tell()  # start counting from the current frame

    # seek past the pixel data sub-blocks of the current frame
    buffer_start = file.tile[0][2]
    file.fp.seek(buffer_start)
    while True:
        size = file.fp.read(1)
        if size and size[0]:
            file.fp.seek(size[0], 1)
        else:
            break

    # count the remaining blocks
    while True:
        block = file.fp.read(1)
        if block == b';':  # trailer: end of the GIF stream
            break
        if block == b'!':  # extension block: skip the label byte
            file.fp.seek(1, 1)
        elif block == b',':  # image descriptor: one more frame
            total_frames += 1
            file.fp.seek(8, 1)  # left, top, width, height
            skip_color_table(ord(file.fp.read(1)))
            file.fp.seek(1, 1)  # LZW minimum code size
        else:
            raise RuntimeError("unknown block type")

        # skip to next block instead of loading pixels
        while True:
            size = file.fp.read(1)
            if not size or not size[0]:  # terminator (or EOF)
                break
            file.fp.seek(size[0], 1)

    file.fp.seek(current_fp)

The timings:

211 µs ± 529 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 

So if we only check block headers we can be about 60x faster than we are now. If desired, and there is somebody willing to review, I can submit a PR that adds this some time next week :)
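
For reference, the same header-only walk can be sketched without Pillow, working directly on the raw bytes. This is a minimal sketch and not the proposed patch: `count_gif_frames` is a hypothetical helper name, it assumes a GIF87a/GIF89a byte string, and it raises `ValueError` on truncated or malformed input instead of guessing.

```python
import io


def count_gif_frames(data: bytes) -> int:
    """Count image descriptors in a GIF byte string without decoding pixels."""
    fp = io.BytesIO(data)

    def read_byte() -> int:
        byte = fp.read(1)
        if not byte:
            raise ValueError("unexpected end of GIF data")
        return byte[0]

    def skip_color_table(flags: int) -> None:
        # Bit 7 set: a table of 2 ** ((flags & 7) + 1) RGB triples follows.
        if flags & 0x80:
            fp.seek(3 << ((flags & 7) + 1), 1)

    def skip_sub_blocks() -> None:
        # Data sub-blocks are length-prefixed; a zero length terminates them.
        while True:
            size = read_byte()
            if size == 0:
                return
            fp.seek(size, 1)

    if fp.read(6) not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    screen = fp.read(7)  # logical screen descriptor
    if len(screen) != 7:
        raise ValueError("unexpected end of GIF data")
    skip_color_table(screen[4])  # global color table, if present

    frames = 0
    while True:
        block = read_byte()
        if block == 0x3B:  # ';' trailer
            return frames
        if block == 0x21:  # '!' extension: skip the label byte
            fp.seek(1, 1)
        elif block == 0x2C:  # ',' image descriptor
            frames += 1
            fp.seek(8, 1)  # left, top, width, height
            skip_color_table(read_byte())  # local color table, if present
            fp.seek(1, 1)  # LZW minimum code size
        else:
            raise ValueError("unknown block type")
        skip_sub_blocks()
```

With the setup code above, `count_gif_frames(img_bytes)` should report the same count as `file.n_frames` for well-formed files. This variant has not been timed here.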

(tests were done on Windows 11 with an AMD Ryzen 7 5800X 8-Core Processor and 2x Kingston KHX3200C16D4/32GX 32GB)
