Skip to content

Conversation

@annarev
Copy link
Contributor

@annarev annarev commented Mar 5, 2021

use_bfc_allocator parameter will decide whether to wrap pluggable device SubAllocator with BFC allocator or not.

I am thinking it can be used as follows:
See lines here:

PluggableDeviceBFCAllocator* device_bfc_allocator =

They should be replaced with something like

Allocator * device_allocator = nullptr;
if (use_bfc_allocator) {
  PluggableDeviceBFCAllocator* device_bfc_allocator =
        new PluggableDeviceBFCAllocator(
            sub_allocator, total_bytes, options,
            strings::StrCat("PluggableDevice_", tf_device_id.value(), "_bfc"));
  device_allocator = device_bfc_allocator;
} else {
  PluggableDeviceSimpleAllocator* simple_allocator = new PluggableDeviceSimpleAllocator(sub_allocator);
  device_allocator = simple_allocator;
}

PluggableDeviceSimpleAllocator should just be a minimal wrapper over the sub_allocator (currently not implemented as a part of this PR). This seems much simpler to me than what I was thinking of originally (sorry for the delay).

PTAL @penpornk @kulinseth, @jzhoulon

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Mar 5, 2021
@google-cla google-cla bot added the cla: yes label Mar 5, 2021
@penpornk penpornk self-requested a review March 5, 2021 21:55
@gbaned gbaned self-assigned this Mar 8, 2021
@jzhoulon
Copy link
Contributor

jzhoulon commented Mar 9, 2021

@annarev Thanks for the PR!
Do you mean the PluggableDeviceSimpleAllocator will directly call the stream_exec_->AllocateArray(), which calls SP_StreamExecutor::allocate(const SP_Device* device, uint64_t size, int64_t memory_space,SP_DeviceMemoryBase* mem ), and plugin authors decides whether allocate is build on top of a custom allocator? if it is, I think it is better to add an field such as void* custom_allocator in SP_Device? since the allocator should be per-device? Thanks

@kulinseth
Copy link
Contributor

Thanks @annarev for the PR. SimpleAllocator can be something straightforward calling into AllocateRaw and that would be fine for us. We had brief discussion of this during the design phase. To keep it simple we can leverage SubAllocator currently. After the pluggable device implementation merges we can use the use_bfc_allocator flag to pass this information in device_process_state..

There are few places in pluggable_device_process_state like AllocatorParts will require changes to accomodate the SimpleAllocator design, currently it assumes the Device memory allocator is BFC. I can upload a PR which I have tested locally works with our design. Please let me know.

@annarev
Copy link
Contributor Author

annarev commented Mar 10, 2021

@annarev Thanks for the PR!
Do you mean the PluggableDeviceSimpleAllocator will directly call the stream_exec_->AllocateArray(), which calls SP_StreamExecutor::allocate(const SP_Device* device, uint64_t size, int64_t memory_space,SP_DeviceMemoryBase* mem ), and plugin authors decides whether allocate is build on top of a custom allocator? if it is, I think it is better to add an field such as void* custom_allocator in SP_Device? since the allocator should be per-device? Thanks

@jzhoulon Sounds about right. use_bfc_allocator will decide whether to use a simple allocator or a BFC allocator. So, a plugin that wants to use their own custom allocation strategy and doesn't want to wrap with BFC, can set this to true. Do you mean add use_bfc_allocator parameter per device? I was just thinking having one setting for all device types is easier to pass through on TF end. Do we have a usecase for using or not using bfc allocator per device?

Thanks @annarev for the PR. SimpleAllocator can be something straightforward calling into AllocateRaw and that would be fine for us. We had brief discussion of this during the design phase. To keep it simple we can leverage SubAllocator currently. After the pluggable device implementation merges we can use the use_bfc_allocator flag to pass this information in device_process_state..

@kulinseth Yep that makes sense. I misread it back then as custom allocator wrapper itself being something customized by the plugin.

There are few places in pluggable_device_process_state like AllocatorParts will require changes to accomodate the SimpleAllocator design, currently it assumes the Device memory allocator is BFC. I can upload a PR which I have tested locally works with our design. Please let me know.

Sounds good, you can upload the PR and we can comment on it if anything needs changing.

@jzhoulon
Copy link
Contributor

@annarev Thanks for the PR!
Do you mean the PluggableDeviceSimpleAllocator will directly call the stream_exec_->AllocateArray(), which calls SP_StreamExecutor::allocate(const SP_Device* device, uint64_t size, int64_t memory_space,SP_DeviceMemoryBase* mem ), and plugin authors decides whether allocate is build on top of a custom allocator? if it is, I think it is better to add an field such as void* custom_allocator in SP_Device? since the allocator should be per-device? Thanks

@jzhoulon Sounds about right. use_bfc_allocator will decide whether to use a simple allocator or a BFC allocator. So, a plugin that wants to use their own custom allocation strategy and doesn't want to wrap with BFC, can set this to true. Do you mean add use_bfc_allocator parameter per device? I was just thinking having one setting for all device types is easier to pass through on TF end. Do we have a usecase for using or not using bfc allocator per device?

No, I mean an opaque handle (void* custom_allocator) in SP_Device. See the following code example:

void create_device (SP_Platform platform, SE_CreateDeviceParams params, status) {
 params->custom_allocator = new CustomAllocator()
}

void allocate(SP_Device device, unit64_t size, ...) {
  CustomAllocator* allocator = static_cast<CustomAllocator*>(device->custom_allocate);
  allocator->allocate_raw(size, ...)
} 

@annarev
Copy link
Contributor Author

annarev commented Mar 11, 2021

@annarev Thanks for the PR!
Do you mean the PluggableDeviceSimpleAllocator will directly call the stream_exec_->AllocateArray(), which calls SP_StreamExecutor::allocate(const SP_Device* device, uint64_t size, int64_t memory_space,SP_DeviceMemoryBase* mem ), and plugin authors decides whether allocate is build on top of a custom allocator? if it is, I think it is better to add an field such as void* custom_allocator in SP_Device? since the allocator should be per-device? Thanks

@jzhoulon Sounds about right. use_bfc_allocator will decide whether to use a simple allocator or a BFC allocator. So, a plugin that wants to use their own custom allocation strategy and doesn't want to wrap with BFC, can set this to true. Do you mean add use_bfc_allocator parameter per device? I was just thinking having one setting for all device types is easier to pass through on TF end. Do we have a usecase for using or not using bfc allocator per device?

No, I mean an opaque handle (void* custom_allocator) in SP_Device. See the following code example:

void create_device (SP_Platform platform, SE_CreateDeviceParams params, status) {
 params->custom_allocator = new CustomAllocator()
}

void allocate(SP_Device device, unit64_t size, ...) {
  CustomAllocator* allocator = static_cast<CustomAllocator*>(device->custom_allocate);
  allocator->allocate_raw(size, ...)
} 

I think setting a custom allocator makes things more complicated. I think we should do it only if we think of a usecase that cannot be covered by just implementing allocate/deallocate in the StreamExecutor API: https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/experimental/stream_executor/stream_executor.h;l=245.

@kulinseth
Copy link
Contributor

I think setting a custom allocator makes things more complicated. I think we should do it only if we think of a usecase that cannot be covered by just implementing allocate/deallocate in the StreamExecutor API: https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/experimental/stream_executor/stream_executor.h;l=245.

Agreed, I can queue up the PR after #45784 merges.

Copy link
Member

@penpornk penpornk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work. I'm approving this now in the interest of time. We can continue the discussion here.

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 12, 2021
@penpornk
Copy link
Member

Agreed, I can queue up the PR after #45784 merges.

That would be awesome. Thank you very much, Kulin!

@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 12, 2021
@copybara-service copybara-service bot merged commit 844d24f into tensorflow:master Mar 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes ready to pull PR ready for merge process size:S CL Change Size: Small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants