.Net How to free GPU memory after each inference #1131
Our current design keeps the OrtAllocator CUDA allocator alive until you exit, so the CUDA memory pool will not shrink to zero until that point. We could potentially add a way to release this allocator once no objects are allocated from it.
At present this causes problems: on devices with smaller GPU memory it is not possible to run inference efficiently multiple times, and inference gets slower and slower as GPU memory usage approaches 100%.
The memory shouldn't be growing every time; that might be a bug. Marked this as an enhancement & bug to track.
I look forward to the next update, although it may take a while.
Is this a C# issue, or an issue with ort-genai in general across all the APIs?
I am using Phi-3.5-mini-cuda-fp16 with an NVIDIA GPU (24 GB memory).
When I load the model, 8490 MiB of GPU memory is in use.
After an inference of about 3K tokens, GPU memory usage rises to 10580 MiB.
If I continue the conversation afterwards, GPU memory keeps rising.
If I am not having a conversation, the memory does not decrease, even if I leave it idle for an hour.
I don't know if this is a bug; this behavior seems to have existed since 0.4, and the same goes for 0.5.2.
Or did I miss something?
This is my code. I did not forget to release any object; of course, the Model object is not released because we need to reuse it.
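For context, the disposal pattern under discussion looks roughly like the sketch below: the `Model` and `Tokenizer` are kept alive for reuse, while every per-inference object is disposed as soon as the call finishes. This is a minimal sketch, not the reporter's actual code; the method names (`SetInputSequences`, `SetSearchOption`, `GenerateNextToken`, `GetSequence`) are assumptions based on the 0.4/0.5-era `Microsoft.ML.OnnxRuntimeGenAI` C# API and should be checked against your installed version.

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Long-lived objects: kept for the life of the application, since
// reloading the model on every request would be far too slow.
// (Per the maintainer comment above, ORT's CUDA allocator pool itself
// is only released when these are destroyed at exit.)
using var model = new Model("path/to/phi-3.5-mini-cuda-fp16"); // hypothetical path
using var tokenizer = new Tokenizer(model);

string RunInference(string prompt)
{
    // Per-inference objects: all IDisposable, all released via `using`
    // so their device buffers return to the CUDA pool after each call.
    using var sequences = tokenizer.Encode(prompt);
    using var generatorParams = new GeneratorParams(model);
    generatorParams.SetSearchOption("max_length", 2048);
    generatorParams.SetInputSequences(sequences);

    using var generator = new Generator(model, generatorParams);
    while (!generator.IsDone())
    {
        generator.GenerateNextToken();
    }

    // Decode the full generated sequence for the first (only) beam.
    return tokenizer.Decode(generator.GetSequence(0));
}
```

Even with this pattern, the reported behavior is that pool usage still grows across calls, which is what the enhancement/bug label above is tracking.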