@frank-dong-ms
Contributor

Description

Part of #21448.
This change is intended to reduce CPU memory usage during model load for inference.
It adds the session option save_prepacked_constant_initializers. With save_prepacked_constant_initializers turned on:

  1. Optimize the model with an inference session; the prepacked external initializers are saved into the data file.
  2. Load the optimized model and the external data file with the prepacked initializers; no prepacking is needed.
  3. Run inference with the optimized model and data file.
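
The three steps above could look roughly like the following Python sketch. The `SessionOptions` calls and the external-initializers file-name key are existing ONNX Runtime APIs, but the config key string used for save_prepacked_constant_initializers is an assumption (this description does not give the exact string), and the helper names and file paths are hypothetical.

```python
def build_prepack_save_options(optimized_model_path, external_data_name):
    """Collect the settings for step 1 as plain data so they are easy to inspect."""
    return {
        "optimized_model_filepath": optimized_model_path,
        "config_entries": {
            # Existing ORT key: write initializers of the optimized model to this file.
            "session.optimized_model_external_initializers_file_name": external_data_name,
            # ASSUMPTION: the key string for the option added in this PR is not
            # stated in the description; check the session options config keys header.
            "session.save_prepacked_constant_initializers": "1",
        },
    }


def optimize_with_prepack(model_path, settings):
    """Step 1: create a session so the optimized model and data file are written."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    so = ort.SessionOptions()
    so.optimized_model_filepath = settings["optimized_model_filepath"]
    for key, value in settings["config_entries"].items():
        so.add_session_config_entry(key, value)
    # Constructing the session runs graph optimization and saves the result.
    ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])


def load_and_run(optimized_model_path, feeds):
    """Steps 2 and 3: load the optimized model (plus its data file) and run it."""
    import onnxruntime as ort

    session = ort.InferenceSession(
        optimized_model_path, providers=["CPUExecutionProvider"]
    )
    return session.run(None, feeds)
```

A caller would build the settings once, for example `build_prepack_save_options("model_opt.onnx", "model_opt.onnx.data")`, pass them to `optimize_with_prepack`, and afterwards use only `load_and_run` on the optimized file. Whether the option must also be set when loading is not stated here, so the sketch loads with default options.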

Tested with model Phi-3-mini-instruct-onnx.

With ORT 1.12.0:
[image]

With this change:
[image]

Peak memory usage dropped from 5.438 GB to 2.726 GB.
This change takes advantage of the fact that ORT loads external initializers with mmap on CPU. Prepacking allocates extra memory on the heap, so omitting the prepack step saves that memory (roughly the same size as the external initializers).
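
As a quick check on the reported numbers, the drop can be computed directly. This is only arithmetic on the two peak values quoted in this description; the function name is hypothetical.

```python
def peak_memory_drop(before_gb, after_gb):
    """Return (absolute drop in GB, relative drop in percent)."""
    saved = before_gb - after_gb
    return saved, 100.0 * saved / before_gb


# Peak values reported in this PR description.
saved_gb, saved_pct = peak_memory_drop(5.438, 2.726)
print(f"saved {saved_gb:.3f} GB ({saved_pct:.1f}% of the original peak)")
```

The result is a drop of about 2.712 GB, close to half of the original peak.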

Next step:
Update all CPU kernels that implement the PrePack method and test them properly. This will be done in the next PR.

Motivation and Context

import os

import numpy as np
import onnx

Check notice

Code scanning / CodeQL

Module is imported with 'import' and 'import from'

Module 'onnx' is imported with both 'import' and 'import from'. Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.
@yuslepukhin yuslepukhin requested a review from tianleiwu October 23, 2024 16:59
yuslepukhin previously approved these changes Oct 23, 2024
Contributor

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

@frank-dong-ms frank-dong-ms dismissed github-actions[bot]’s stale review October 24, 2024 05:04

Can't re-request review, so dismissing to unblock.

Member

@yuslepukhin yuslepukhin left a comment


:shipit:

@frank-dong-ms frank-dong-ms merged commit c5b6be0 into main Oct 25, 2024
@frank-dong-ms frank-dong-ms deleted the frdong/prepack_1 branch October 25, 2024 05:24

3 participants