Skip to content

🐛 Fix: Memory leak in image processing endpoint#1513

Merged
cpuhrsch merged 3 commits into
pytorch:mainfrom
dongxiaolong:main
Jan 8, 2025
Merged

🐛 Fix: Memory leak in image processing endpoint#1513
cpuhrsch merged 3 commits into
pytorch:mainfrom
dongxiaolong:main

Conversation

@dongxiaolong

@dongxiaolong dongxiaolong commented Jan 7, 2025

Copy link
Copy Markdown
Contributor

@cpuhrsch

🐛 Fix: Memory leak causing process termination under high load

Issue Description

When processing large batches of images through the /upload endpoint in production, we encountered two critical issues:

  1. Memory Leak Warning:

RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory.

  1. Process Termination:
    Under high load with multiple concurrent requests, the server process would eventually terminate due to memory exhaustion caused by unclosed matplotlib figures accumulating in memory.

Root Cause

The /upload endpoint creates new matplotlib figures for each request but doesn't properly clean them up, leading to:

  • Memory leaks as figures accumulate
  • Resource exhaustion under high concurrency
  • Eventual process termination in production environments

Changes Made

  • Added proper figure cleanup using try/finally block
  • Fixed the incorrect usage of context manager with plt.figure()
  • Ensures matplotlib figures are properly closed after generating the image response, even when exceptions occur

Before

@app.post("/upload")
async def upload_image(image: UploadFile = File(...)):
    # ... other code ...
    plt.figure(figsize=(image_tensor.shape[1]/100., image_tensor.shape[0]/100.), dpi=100)
    plt.imshow(image_tensor)
    # ... figure generation ...
    return StreamingResponse(buf, media_type="image/png")

After

@app.post("/upload")
async def upload_image(image: UploadFile = File(...)):
    # Convert uploaded file to image tensor
    image_tensor = file_bytes_to_image_tensor(bytearray(await image.read()))
    response_future = asyncio.Future()
    await request_queue.put((image_tensor, response_future))
    masks = await response_future
    
    # Create figure and ensure it's closed after use
    fig = plt.figure(figsize=(image_tensor.shape[1]/100., image_tensor.shape[0]/100.), dpi=100)
    try:
        # Display the original image
        plt.imshow(image_tensor)
        # Overlay annotations (masks) on the image
        show_anns(masks, rle_to_mask)
        # Remove axis for cleaner visualization
        plt.axis('off')
        # Adjust layout to remove padding
        plt.tight_layout()
        
        # Create a buffer to store the image
        buf = BytesIO()
        # Save the figure to buffer in PNG format
        plt.savefig(buf, format='png')
        # Reset buffer position to start
        buf.seek(0)
        
        return StreamingResponse(buf, media_type="image/png")
    finally:
        # Ensure figure is closed even if an error occurs
        plt.close(fig)

Testing Done

  • Fixed the AttributeError: __enter__ error when using with statement
  • Verified that the endpoint still returns correct image responses
  • Confirmed that the memory leak warning no longer appears
  • Load tested with high concurrency (100+ requests/minute) to verify:
    • No memory leaks under sustained load
    • Process remains stable
    • All figures are properly cleaned up

Production Impact

This fix addresses a critical stability issue where image processing services would terminate under load due to memory exhaustion, requiring manual restarts and potentially causing service interruptions.

Let me know if you'd like me to make any adjustments to this PR or if you need additional testing information.

@pytorch-bot

pytorch-bot Bot commented Jan 7, 2025

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1513

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 25c346d with merge base 270a90f (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

Copy link
Copy Markdown
Contributor

Hi @dongxiaolong!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot

Copy link
Copy Markdown
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 7, 2025
@cpuhrsch

cpuhrsch commented Jan 7, 2025

Copy link
Copy Markdown
Contributor

Oh right, wouldn't this however be fixed without at try/except statement as well and just using by adding plt.close(fig)? I fear this change wouldn't allow any errors to be raised.

@cpuhrsch cpuhrsch self-requested a review January 7, 2025 08:10
@dongxiaolong

Copy link
Copy Markdown
Contributor Author

Oh right, wouldn't this however be fixed without at try/except statement as well and just using by adding plt.close(fig)? I fear this change wouldn't allow any errors to be raised.

Thanks for the feedback! You're right. I've simplified the code to just use plt.close(fig). The new version is cleaner and more straightforward.

@cpuhrsch cpuhrsch added the topic: bug fix Use this tag for PRs that fix bugs label Jan 7, 2025
@cpuhrsch cpuhrsch merged commit 2c3d44c into pytorch:main Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: bug fix Use this tag for PRs that fix bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants