Add QNN EP HTP shared memory allocator #23136

edgchen1 · 2024-12-18T01:08:30Z

Description

Adds a QNN EP HTP shared memory allocator.

The HTP shared memory allocator (HtpSharedMemoryAllocator) calls the rpcmem shared library (libcdsprpc.so/dll) to allocate and free memory that can be shared between HTP and CPU.

The allocator can be enabled by setting QNN EP option enable_htp_shared_memory_allocator to 1. QNNExecutionProvider::CreatePreferredAllocators() will then return an instance of HtpSharedMemoryAllocator.
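To illustrate how the option might be consumed, here is a minimal sketch that assumes QNN EP provider options arrive as string key/value pairs. Only the key enable_htp_shared_memory_allocator and the value 1 come from the description. The helper IsHtpSharedMemoryAllocatorEnabled and its handling of other values are hypothetical and are not the PR's code.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical helper (not from the PR): decides whether the HTP shared memory
// allocator should be created, given QNN EP provider options as string pairs.
bool IsHtpSharedMemoryAllocatorEnabled(
    const std::unordered_map<std::string, std::string>& provider_options) {
  static const std::string kOptionKey = "enable_htp_shared_memory_allocator";

  const auto it = provider_options.find(kOptionKey);
  if (it == provider_options.end() || it->second == "0") {
    return false;  // not requested: keep the default allocator
  }
  if (it->second == "1") {
    return true;  // CreatePreferredAllocators() would return the HTP allocator
  }
  // Reject other values rather than silently ignoring a typo.
  throw std::invalid_argument("Invalid value for " + kOptionKey + ": " + it->second);
}
```

Rejecting unrecognized values is a choice made for this sketch; treating them as "off" would also be defensible.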

For each QNN context, we also need to register and unregister memory handles in order to use the HTP shared memory. This memory handle management is added to QnnBackendManager, which also manages the QNN context handles.
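The per-context handle bookkeeping described above can be sketched as a cache keyed by context and buffer address. This is an illustration under assumptions, not the PR's QnnBackendManager code: ContextHandle, MemHandle, MemHandleRegistry and the register/unregister callbacks are hypothetical stand-ins for the real QNN handle types and registration calls.

```cpp
#include <cstddef>
#include <functional>
#include <map>

// Hypothetical stand-ins; the real QNN SDK handle types differ.
using ContextHandle = const void*;
using MemHandle = int;

class MemHandleRegistry {
 public:
  using RegisterFn = std::function<MemHandle(ContextHandle, void*)>;
  using UnregisterFn = std::function<void(ContextHandle, MemHandle)>;

  // Returns the cached handle for (context, address), registering the buffer
  // with that context the first time it is seen.
  MemHandle GetOrRegister(ContextHandle context, void* address, const RegisterFn& register_fn) {
    auto& per_context = handles_[context];
    if (auto it = per_context.find(address); it != per_context.end()) {
      return it->second;
    }
    const MemHandle handle = register_fn(context, address);
    per_context.emplace(address, handle);
    return handle;
  }

  // Unregisters every handle owned by `context`, e.g. before the context is
  // freed. Returns the number of handles released.
  std::size_t UnregisterAll(ContextHandle context, const UnregisterFn& unregister_fn) {
    auto it = handles_.find(context);
    if (it == handles_.end()) {
      return 0;
    }
    const std::size_t count = it->second.size();
    for (const auto& entry : it->second) {
      unregister_fn(context, entry.second);
    }
    handles_.erase(it);
    return count;
  }

 private:
  // std::map's default std::less gives a total order even for unrelated pointers.
  std::map<ContextHandle, std::map<void*, MemHandle>> handles_;
};
```

Passing the registration call in as a callback keeps the sketch independent of the QNN SDK. The GetOrRegister name mirrors one of the PR's commit messages, but the behavior shown here is assumed.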

For more information about using HTP shared memory with QNN, see: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_shared_buffer_tutorial.html#shared-buffer-tutorial

Limitations:

HTP shared memory usage is only supported for graph inputs and outputs. Intermediate values are not supported.
Each allocation is backed by its own shared memory buffer. The allocator cannot place multiple allocations in a single shared memory buffer.

Motivation and Context

Improve performance by using HTP shared memory to avoid overhead from copying data between CPU and NPU.


github-actions

You can commit the suggested changes from lintrunner.

onnxruntime/core/providers/qnn/qnn_allocator.cc

onnxruntime/core/providers/qnn/builder/qnn_backend_manager.h

edgchen1 · 2024-12-19T02:02:01Z

onnxruntime/core/providers/qnn/builder/qnn_utils.cc

@@ -63,6 +65,12 @@ size_t GetElementSizeByType(ONNXTensorElementDataType elem_type) {
  return pos->second;
 }

+size_t GetQnnTensorDataSize(gsl::span<const uint32_t> shape, Qnn_DataType_t element_type) {
+  ORT_ENFORCE(!shape.empty(), "Empty shape not allowed.");  // TODO can we just treat empty shape as a scalar?


This check is copied from the original implementation here:

onnxruntime/onnxruntime/core/providers/qnn/builder/qnn_model.cc

Line 281 in 31e6e10

ORT_RETURN_IF(dims.empty(), "Tensor dimensions is nullptr");

I'm not sure whether it's needed.
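On the TODO itself: if the size is computed as the product of the dimensions, an empty shape naturally yields one element, i.e. a scalar. The sketch below assumes that interpretation and takes the element size in bytes directly instead of a Qnn_DataType_t. TensorDataSizeInBytes is a hypothetical name, and whether QNN accepts rank-0 tensors in this code path is exactly the open question.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical variant of the size helper: an empty shape is treated as a
// scalar because the product over zero dimensions is 1.
std::size_t TensorDataSizeInBytes(const std::vector<uint32_t>& shape,
                                  std::size_t element_size_in_bytes) {
  constexpr std::size_t kMax = std::numeric_limits<std::size_t>::max();
  std::size_t num_elements = 1;
  for (const uint32_t dim : shape) {
    if (dim != 0 && num_elements > kMax / dim) {
      throw std::overflow_error("tensor element count overflows size_t");
    }
    num_elements *= dim;
  }
  if (element_size_in_bytes != 0 && num_elements > kMax / element_size_in_bytes) {
    throw std::overflow_error("tensor size in bytes overflows size_t");
  }
  return num_elements * element_size_in_bytes;
}
```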


edgchen1 · 2025-01-09T02:08:02Z

onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc

+          // - QNN context handle is still valid. This should be true as long as QNN contexts are not freed from
+          //   anywhere other than the destructor.


This should be true as long as QNN contexts are not freed from anywhere other than the destructor.

It seems brittle to depend on this.
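One way to make the clean-up path less dependent on object lifetimes is for the callback to hold only a weak_ptr to the owner of the contexts, which is the direction of a later commit in this PR ("use weak_ptrs in allocation clean up callback"). The sketch below is hypothetical: BackendManager, ReleaseMemHandle and MakeCleanUpFn are stand-ins. It only guards against the manager having been destroyed; it still assumes contexts are not freed while the manager is alive.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-in for the object that owns the QNN contexts.
struct BackendManager {
  int released_handles = 0;
  void ReleaseMemHandle(void* /*allocation_address*/) { ++released_handles; }
};

using AllocationCleanUpFn = std::function<void(void* allocation_address)>;

// The callback captures a weak_ptr, so the allocator does not keep the
// backend alive, and the callback does nothing once the backend is gone.
AllocationCleanUpFn MakeCleanUpFn(const std::shared_ptr<BackendManager>& backend) {
  std::weak_ptr<BackendManager> weak_backend = backend;
  return [weak_backend](void* allocation_address) {
    if (auto locked = weak_backend.lock()) {
      locked->ReleaseMemHandle(allocation_address);
    }
    // Otherwise the backend no longer exists and there is nothing to release here.
  };
}
```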

HectorSVC · 2025-01-09T06:39:14Z

onnxruntime/test/providers/qnn/qnn_basic_test.cc

@@ -1098,6 +1099,38 @@ TEST_F(QnnHTPBackendTests, EPOffloadsGraphIOQuantDequant) {
  }
 }

+TEST_F(QnnHTPBackendTests, UseHtpSharedMemoryAllocatorForInputs) {
+#if !defined(__ANDROID__) && !defined(_WIN32)


Qualcomm devices for Windows are Arm64-based, so you can check defined(__aarch64__) || defined(_M_ARM64).

This code is already within an #if that checks for those macros:

onnxruntime/onnxruntime/test/providers/qnn/qnn_basic_test.cc

Line 538 in 425023b

#if defined(__aarch64__) || defined(_M_ARM64) || defined(__linux__)

HectorSVC · 2025-01-09T06:47:10Z

onnxruntime/test/providers/qnn/qnn_basic_test.cc

@@ -1098,6 +1099,38 @@ TEST_F(QnnHTPBackendTests, EPOffloadsGraphIOQuantDequant) {
  }
 }

+TEST_F(QnnHTPBackendTests, UseHtpSharedMemoryAllocatorForInputs) {


We should also have some code that demonstrates how this feature is used from user code.
Here are some IOBinding examples for other EPs:

onnxruntime/onnxruntime/test/shared_lib/test_inference.cc

Line 2076 in 3b1a900

#if defined(USE_CUDA) || defined(USE_TENSORRT)

include/onnxruntime/core/framework/ortmemoryinfo.h

skottmckay · 2025-01-09T11:07:12Z

onnxruntime/core/providers/qnn/qnn_allocator.h

+
+  struct AllocationRecord {
+    SharedMemoryInfo shared_memory_info;
+    InlinedVector<AllocationCleanUpFn, 1> clean_up_fns;


Do we expect more than one cleanup func?

Yes, that's expected in some cases. For example, if the same shared memory is used from more than one QNN context, there will be a separate clean-up function for each QNN context.
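A simplified sketch of that bookkeeping, assuming one clean-up callback is appended each time another QNN context registers the same allocation, and all of them run when the allocation is freed. AllocationTracker and its methods are hypothetical, and std::vector stands in for InlinedVector.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using AllocationCleanUpFn = std::function<void(void* allocation_address)>;

// One record per allocation, with one clean-up callback per QNN context.
struct AllocationRecord {
  std::vector<AllocationCleanUpFn> clean_up_fns;
};

class AllocationTracker {
 public:
  void Track(void* address) { records_.emplace(address, AllocationRecord{}); }

  // Called when a QNN context registers this allocation.
  bool AddCleanUpFn(void* address, AllocationCleanUpFn fn) {
    auto it = records_.find(address);
    if (it == records_.end()) {
      return false;
    }
    it->second.clean_up_fns.push_back(std::move(fn));
    return true;
  }

  // Called on free: runs every context's clean-up before the record is dropped.
  void Release(void* address) {
    auto it = records_.find(address);
    if (it == records_.end()) {
      return;
    }
    for (auto& fn : it->second.clean_up_fns) {
      fn(address);
    }
    records_.erase(it);
  }

 private:
  std::unordered_map<void*, AllocationRecord> records_;
};
```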

skottmckay · 2025-01-09T11:11:54Z

onnxruntime/core/providers/qnn/qnn_allocator.cc

+    marker.fill('\0');
+    allocator_ptr = nullptr;


Should we limit the fill to debug builds? I'm not sure how many allocations QNN makes or whether there's any meaningful perf cost.
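A sketch of the suggestion, assuming the fill only serves as a debugging aid: the marker wipe is compiled out when NDEBUG is defined, while allocator_ptr is still reset in all builds. The field names follow the diff; the marker's type and size (std::array<char, 8>) and the function ClearHeaderOnFree are assumptions.

```cpp
#include <array>

// Assumed header layout; only the field names come from the diff.
struct AllocationHeader {
  std::array<char, 8> marker;
  void* allocator_ptr;
};

// Clears the header when the allocation is freed. The marker wipe is only
// compiled into debug builds (NDEBUG not defined).
void ClearHeaderOnFree(AllocationHeader& header) {
#ifndef NDEBUG
  header.marker.fill('\0');
#endif
  header.allocator_ptr = nullptr;
}
```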

skottmckay · 2025-01-09T11:23:15Z

onnxruntime/core/providers/qnn/qnn_allocator.cc

+
+namespace {
+
+struct AllocationHeader {


It would be great to add a comment describing the overall setup and how it uses this header.
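As an example of the overview such a comment could give, here is a sketch of a header-in-front-of-allocation scheme. It assumes the header sits at the start of each block and the caller's pointer begins AllocationOffsetFromStartOfHeader() bytes later. std::malloc and std::free stand in for the rpcmem allocation and free calls, and the marker value, AllocateWithHeader and FreeWithHeader are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <new>

// Assumed block layout:
//   [ AllocationHeader | padding | caller's allocation ... ]
//   ^ block start                ^ pointer returned to the caller
struct AllocationHeader {
  static constexpr char kMarker[8] = {'H', 'T', 'P', 'S', 'H', 'M', 'E', 'M'};
  char marker[8];
  void* allocator_ptr;
};

constexpr std::size_t kAllocationAlignment = alignof(std::max_align_t);

// Header size rounded up so the caller's pointer stays suitably aligned.
constexpr std::size_t AllocationOffsetFromStartOfHeader() {
  return (sizeof(AllocationHeader) + kAllocationAlignment - 1) / kAllocationAlignment *
         kAllocationAlignment;
}

void* AllocateWithHeader(std::size_t requested_size, void* allocator) {
  constexpr std::size_t offset = AllocationOffsetFromStartOfHeader();
  if (requested_size > std::numeric_limits<std::size_t>::max() - offset) {
    return nullptr;  // block size would overflow
  }
  void* block = std::malloc(offset + requested_size);  // stand-in for rpcmem allocation
  if (block == nullptr) {
    return nullptr;
  }
  auto* header = new (block) AllocationHeader{};
  std::memcpy(header->marker, AllocationHeader::kMarker, sizeof(header->marker));
  header->allocator_ptr = allocator;
  return static_cast<char*>(block) + offset;
}

AllocationHeader* HeaderFromAllocation(void* allocation) {
  return reinterpret_cast<AllocationHeader*>(static_cast<char*>(allocation) -
                                             AllocationOffsetFromStartOfHeader());
}

bool FreeWithHeader(void* allocation, void* allocator) {
  if (allocation == nullptr) {
    return true;  // freeing nullptr is a no-op
  }
  AllocationHeader* header = HeaderFromAllocation(allocation);
  if (std::memcmp(header->marker, AllocationHeader::kMarker, sizeof(header->marker)) != 0 ||
      header->allocator_ptr != allocator) {
    return false;  // header does not belong to this allocator
  }
  header->allocator_ptr = nullptr;
  std::memset(header->marker, 0, sizeof(header->marker));
  std::free(header);  // the header starts at the beginning of the block
  return true;
}
```

Keeping the header in front of the returned pointer lets the free path recover the owning allocator from the pointer alone, at the cost of a few bytes per allocation. The marker check only rejects live pointers whose header does not match; it cannot reliably detect a double free, because reading a freed header is itself undefined behavior.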

skottmckay · 2025-01-09T11:26:54Z

onnxruntime/core/providers/qnn/qnn_allocator.cc

+  const size_t allocation_offset = AllocationOffsetFromStartOfHeader();
+  const size_t shared_memory_block_size_in_bytes = allocation_offset + requested_size;
+
+  // rpcmem_alloc() has an int size parameter. make sure we don't overflow.


Can we use SafeInt?
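For comparison, the check SafeInt would perform can be written with only the standard library. This sketch assumes, per the comment in the diff, that rpcmem_alloc() takes its size as an int; SharedMemoryBlockSizeAsInt is a hypothetical name.

```cpp
#include <climits>
#include <cstddef>
#include <optional>

// Returns allocation_offset + requested_size as an int, or std::nullopt if the
// sum would overflow size_t or exceed INT_MAX.
std::optional<int> SharedMemoryBlockSizeAsInt(std::size_t allocation_offset,
                                              std::size_t requested_size) {
  constexpr std::size_t kMaxInt = static_cast<std::size_t>(INT_MAX);
  if (allocation_offset > kMaxInt || requested_size > kMaxInt - allocation_offset) {
    return std::nullopt;
  }
  return static_cast<int>(allocation_offset + requested_size);
}
```

SafeInt would typically report the overflow by throwing instead, which may fit ORT's error handling better than an empty std::optional.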

skottmckay · 2025-01-09T11:32:58Z

onnxruntime/core/providers/qnn/qnn_execution_provider.cc

-      htp_arch,
-      soc_model,
-      enable_htp_weight_sharing);
+  static const std::string QNN_HTP_SHARED_MEMORY_ALLOCATOR_ENABLED = "enable_htp_shared_memory_allocator";


Should this be more user-visible?

skottmckay · 2025-01-09T11:36:34Z

onnxruntime/core/providers/qnn/shared_context.h

+  SharedContext(const SharedContext&) = delete;
+  SharedContext& operator=(const SharedContext&) = delete;


ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE?
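For reference, a macro like this is expected to delete all four copy and move special members. The sketch below spells them out so it is self-contained; SharedContextSketch is a stand-in class, and the macro's exact expansion is not reproduced here.

```cpp
#include <type_traits>

class SharedContextSketch {
 public:
  SharedContextSketch() = default;

  SharedContextSketch(const SharedContextSketch&) = delete;
  SharedContextSketch& operator=(const SharedContextSketch&) = delete;
  SharedContextSketch(SharedContextSketch&&) = delete;
  SharedContextSketch& operator=(SharedContextSketch&&) = delete;
};

static_assert(!std::is_copy_constructible_v<SharedContextSketch>);
static_assert(!std::is_copy_assignable_v<SharedContextSketch>);
static_assert(!std::is_move_constructible_v<SharedContextSketch>);
static_assert(!std::is_move_assignable_v<SharedContextSketch>);
```

Deleting only the copy operations, as in the diff, already makes the class non-movable: a user-declared copy constructor suppresses the implicit move operations, so a move selects the deleted copy. The macro mainly makes that intent explicit.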

edgchen1 · 2025-01-09T18:29:00Z

onnxruntime/core/providers/qnn/shared_context.h

Note: moved the SharedContext class from qnn_execution_provider.h to its own file.

edgchen1 and others added 30 commits November 5, 2024 15:12

save work

110a3bc

save work

0ba3a2f

add logging for setting QNN tensor memory, update comment

8436b14

add option to enable HTP shared memory allocator to onnxruntime_perf_test

c9826f4

hack - try to cache mem handles in QnnModel

c07c35e

Remove duplicate include.

60dc837

hack, continued - move cache out to SharedContext

24e072f

Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem

e66cbef

move mem handle registration to allocator

8c515da

hook up some test code

18e2780

Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem

09ddce5

rename to RpcMemAllocator to HtpSharedMemoryAllocator

a65bb71

Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem

bfb135e

remove onnx protobuf dependency from allocator.h, add shared provider declarations and definitions for IAllocator::TensorAlloc().

f179a0d

remove unused CPUAllocator::TensorAlloc declaration

7645ef4

Check for nullptr when trying to free

1043732

move mem handle management to QNN backend manager

022f4bc

remove IAllocator::TensorAlloc()

c527dee

document IAllocator::Free

e4f72b3

remove IAllocator__TensorAlloc

39ff901

Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem

1bed5a4

fix android build warning

d70db84

remove shared mem handles from shared context

45ef883

remove allocation clean up callback removal, use weak_ptrs in allocation clean up callback

d2e7b3c

some clean up

c892c18

more clean up

b295eef

add helper to get qnn error message

13f5e30

use make_shared for QnnBackendManager

d5eace1

add test to qnn_basic_test.cc, document allocator parameter.

bacbcdc

Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem

30cd9ed

edgchen1 added 2 commits December 17, 2024 17:02

rename variables

b29ab61

revert changes to onnxruntime/test/providers/qnn/max_min_op_test.cc

67a54b8

github-actions bot reviewed Dec 18, 2024


onnxruntime/core/providers/qnn/qnn_allocator.cc (outdated, resolved)

github-advanced-security bot found potential problems Dec 18, 2024


onnxruntime/core/providers/qnn/qnn_allocator.cc (fixed)

jywu-msft requested a review from HectorSVC December 18, 2024 23:23

edgchen1 added 3 commits December 18, 2024 17:33

fix formatting

c0569e2

skip test if not android and not windows

dd45c84

update comment

959d8df

edgchen1 commented Dec 19, 2024


onnxruntime/core/providers/qnn/builder/qnn_backend_manager.h (outdated, resolved)

edgchen1 commented Dec 19, 2024


remove QnnBackendManager::ReleaseQnnContextMemHandles declaration, update comments

ab48516

edgchen1 requested review from skottmckay, baijumeswani, adrianlizarraga and jywu-msft December 19, 2024 02:38

edgchen1 added 5 commits January 6, 2025 10:21

add onnxruntime_c_api.h include to ortmemoryinfo.h

4a3f6c3

Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem

65ce4b1

rename GetQnnTensorDataSize to GetQnnTensorDataSizeInBytes

ff12541

add QnnBackendManager::Create function to ensure shared_ptr usage

5e6e103

make some QnnBackendManager member functions private, update comment

78e86cc

edgchen1 marked this pull request as ready for review January 6, 2025 23:14

edgchen1 changed the title from "[WIP] Add QNN EP HTP shared memory allocator" to "Add QNN EP HTP shared memory allocator" Jan 6, 2025

document GetOrRegister functions

e665a2b

HectorSVC added the ep:QNN label (issues related to QNN execution provider) Jan 7, 2025

add enable_htp_shared_memory_allocator to available_keys

425023b

edgchen1 commented Jan 9, 2025

HectorSVC reviewed Jan 9, 2025

skottmckay reviewed Jan 9, 2025

edgchen1 commented Jan 9, 2025
