Conversation

@jvanstraten commented Jan 10, 2022:

Add memory-optimized logic to create immutable buffers filled with zeros, as used by MakeArrayOfNull, by reusing them at the MemoryPool level rather than only at the Array level. Includes an additional optimization for Linux that uses mmap to generate read-only zero buffers which, depending on the kernel implementation and the architecture, may cost no physical memory at all and/or perform the zeroing lazily.

Draft for now; tests still fail, but the general implementation is there.

@jvanstraten force-pushed the ARROW-7051-Improve-MakeArrayOfNull-to-support-creat branch from 08ad7da to 8d8c3c6 on January 19, 2022 14:51
@jvanstraten marked this pull request as ready for review on January 19, 2022 20:35
@lidavidm (Member) left a comment:

Sorry, looks like this slipped through the cracks. I took an initial look, but it also needs to be rebased now.

Member:

was this meant to be MakeMutableArrayOfNull? it sounds a little odd for a function to defer to itself

Contributor Author:

Yeah, it was. I was on the fence about whether MakeArrayOfNull() should be the immutable or mutable variant and refactored it once or twice, eventually choosing the former since most existing users of the function didn't need mutability. I guess that one slipped through the cracks. e878edb

Member:

ditto here - was this meant to be MakeMutableArrayFromScalar?

Contributor Author:

Also in e878edb

Comment on lines 163 to 164
Member:

nit, but most docstrings use \param syntax

Contributor Author:

Oops, I think my editor is configured to autocomplete using @param and I just didn't notice. a551146

Member:

Could use BytesForBits here:

// Return the number of bytes needed to fit the given number of bits
constexpr int64_t BytesForBits(int64_t bits) {
  // This formula avoids integer overflow on very large `bits`
  return (bits >> 3) + ((bits & 7) != 0);
}

Contributor Author:

Changed in 689a8af

Member:

Should the child also use NullArrayFactory?

Contributor Author:

Yes it should, good catch. In fact, it looks like I completely forgot to refactor this part of the code for the mutable vs. immutable split (with immutable as the implied version). Fixed in b30526c and then refactored in 21507bf.

Member:

I'm probably missing something, but why is the existing Buffer interface not sufficient for this?

Contributor Author:

I think you're right, actually. My reasoning for introducing it was that I needed something to model shared ownership of a region of zero-allocated (virtual) memory for a bunch of Buffers (ImmutableZerosPoolBuffer specifically, as created using MakeBufferOfZeros()), so some RAII object inside a shared_ptr would do the trick. I didn't even consider that that RAII object could itself be a Buffer implementation, with its implementation details hidden that way. It felt very wrong to define a public class for something this specific in such a prominent place to begin with, so I'm glad to get rid of it. The current ImmutableZerosPoolBuffer class could then be used much more generically for shared-ownership views of some buffer, unless that already exists, in which case I can just reuse that.

I don't think I'll get around to refactoring this today, but I'll take a closer look tomorrow (and then rebase after that).

Contributor Author:

Okay, ImmutableZeros now implements Buffer and has its implementation hidden.

As I looked at the Buffer interface a bit better I also figured a specialized memory manager was probably in order, since these buffers have different, managed allocate operations. With that, I think the interface for using immutable zero CPU buffers is now fully symmetric to the one for regular, mutable CPU buffers.

81aa078

@jvanstraten force-pushed the ARROW-7051-Improve-MakeArrayOfNull-to-support-creat branch from 21507bf to 81aa078 on February 10, 2022 13:34
@lidavidm (Member) left a comment:

CC @pitrou as well

return std::shared_ptr<Buffer>(std::move(buffer));
}

static Result<std::shared_ptr<Buffer>> CreateEmptyBuffer() { return AllocateBuffer(0); }
Member:

nit: AllocateBuffer(0, pool_)?


}

Status Visit(const NullType&) {
  out_->buffers.resize(1, nullptr);
Member:

Hmm, should this be out_->buffers.resize(0) since a NullArray has no buffers?

Contributor Author:

I just copied this from the original NullArrayFactory; don't ask me why, but ValidateLayout() fails when buffers is empty. I'm assuming there is code in Arrow that assumes the buffers[0] entry always exists and points to the validity buffer, even if a type doesn't actually have one, as this would also explain the nullptr entry before the actual buffer pointers for UnionType.


/// A memory manager that uses the immutable zeros interface of the given memory pool,
/// rather than the normal mutable buffer interface.
class ARROW_EXPORT CPUImmutableZerosMemoryManager : public MemoryManager {
Member:

Couldn't this be placed inside device.cc instead?

Member:

Additionally, I don't see is_mutable overridden. But if it's not overridden, is it useful to have?

Member:

We could also check Buffer::is_mutable in CopyBufferTo instead above.

Contributor Author:

> Couldn't this be placed inside device.cc instead?

Probably, since it's not used extensively or at all yet. But it seems odd to me to make CPUMemoryManager public (and it needs to be, there is code referring to it directly) while hiding its cousin. IMO it should be public for symmetry if nothing else.

> Additionally, I don't see is_mutable overridden.

Oops, I completely forgot to do that. 41ebefc

> [...] is it useful to have?

My logic here is that if a buffer is associated with a memory pool that generates immutable buffers whose data corresponds to some pattern (otherwise the buffers wouldn't be very useful in general), one should be able to assume that the contents of this buffer actually correspond to said pattern. If, however, views/copies of other buffers can be created within each CPU memory pool, which the previous implementation did (in fact it assumed there is only one CPU memory pool, as it always just used AllocateBuffer for the default CPU memory pool), that invariant does not hold. The is_mutable() flag was my way of solving this, or at least of catching as many application logic problems related to this as possible. It's certainly not foolproof: a user can just make their own memory pool that ignores the destination memory pool argument in CopyBufferTo and ViewBufferTo, not unlike what the default CPU memory pool was doing.

However, whether this logic is sound and/or whether this is desirable is certainly disputable. As is the name of the method.

> We could also check Buffer::is_mutable in CopyBufferTo instead above.

That would be too restrictive per the above reasoning. For example, MemoryManager::is_mutable() is/should be checked whenever a view is made of another buffer within the confines of that memory manager. If the memory manager makes immutable buffers, and is thus making some assertion about the contents of its buffers, making the view is denied. That's pessimistic, because the data might still conform to the pattern, but at that point the user is probably doing something they didn't intend to do regardless of what's actually in the buffer. On the other hand, if the memory pool can make mutable buffers, and thus does not make any assertions about buffer contents, it's perfectly fine to return a view of an existing buffer, regardless of that buffer's mutability.

#ifdef USE_MMAP_FOR_IMMUTABLE_ZEROS
  if (size > 0) {
    *out = static_cast<uint8_t*>(mmap(
        nullptr, size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0));
Member:

I'm not sure this is useful, because the underlying allocator probably already relies on mmap for large-ish allocations.

Contributor Author:

The difference is that, because of the flags passed to mmap, the kernel doesn't have to allocate (or clear) physical memory at all, as opposed to the allocate + memset that the alternative implementation would do to satisfy mutability. You could allocate several terabytes of virtual memory with this call if you'd want to, and it would cost you zero physical bytes (page table structures aside). At least, it works that way on my kernel.

Member:

Ah, you're right, we don't know that the allocator returned us zero-allocated memory, unfortunately.
(note that both jemalloc and mimalloc have zero-initializing allocation APIs, so we could use those)

Contributor Author:

In fact, Allocate() generally does not return zero-initialized memory, due to the poisoning mechanic. I would rather not dive into this rabbit hole as well right now, so I just added a TODO comment in the code. I can file a followup JIRA for it too, if you want.

Member:

@jvanstraten You're right. So let's keep it like this and just explain why it is in a comment.

Member:

Can you hide the entire mmap() handling behind a compatibility wrapper in arrow/util/io_util.{h,cc}, so that we can easily add Windows and macOS support?

@pitrou (Member) commented Feb 15, 2022:

Note that VirtualAlloc on Windows zero-initializes pages. We can fall back on calloc by default.

Contributor Author:

I think 3738819 does what you want in terms of refactoring? I left in the CMake option to disable the OS-specific stuff and use the regular memory pool allocator, so people can still opt to use that allocator (and its statistics tracking) in favor of calloc().

I don't have a Windows dev environment set up though, so I'd prefer to open a followup issue for VirtualAlloc instead, so someone else can pick it up if they want. Same for mac; I guess mmap will probably work, but I can't test it.


 public:
  Result<std::shared_ptr<Buffer>> GetImmutableZeros(int64_t size) override {
    // Thread-safely get the current largest buffer of zeros.
Member:

This algorithm seems quite complicated and costly (in terms of inter-thread synchronization).
Instead, how about maintaining lazily-allocated power-of-two-sized entries (perhaps starting with 4096 to avoid having a ton of tiny size classes)? Something like:

static constexpr int kMinPower2Size = 12;  // 4096 bytes
static constexpr int kNumAllocClasses = 64 - kMinPower2Size;
// Addresses of allocated zeros, for each size class
std::array<std::atomic<uintptr_t>, kNumAllocClasses> allocated_;
std::mutex allocation_mutex_;

Result<const uint8_t*> GetImmutableZeros(int64_t size) override {
  const auto size_class = std::max(0, bit_util::Log2(size) - kMinPower2Size);
  auto addr = allocated_[size_class].load();
  if (ARROW_PREDICT_TRUE(addr != 0)) {
    // Fast path: already allocated
    return reinterpret_cast<const uint8_t*>(addr);
  }
  // Slow path: allocate if not already done
  std::lock_guard<std::mutex> lock(allocation_mutex_);
  addr = allocated_[size_class].load();
  if (addr == 0) {
    const int64_t alloc_size = static_cast<int64_t>(1) << (size_class + kMinPower2Size);
    ARROW_ASSIGN_OR_RAISE(const uint8_t* data, AllocateImmutableZeros(alloc_size));
    addr = reinterpret_cast<uintptr_t>(data);
    allocated_[size_class] = addr;
  }
  return reinterpret_cast<const uint8_t*>(addr);
}

Contributor Author:

Compared to my algorithm:

  • + In terms of thread primitives, your fast path only involves a memory fence, whereas mine involves a mutex. I'm not sure if mine is even needed, though: it isn't if you can safely make a copy of a shared_ptr while another thread may be updating the contained pointer. That feels like a true statement, at least on x86, but I couldn't figure it out for sure from the C++ docs. If I can remove it, my fast path is just a shared_ptr copy (so, an atomic increment), a null check, and a size check, which I'm pretty sure is the fastest way to do it that still implements reference counting for deallocation.
  • + Your version doesn't allocate unnecessarily small buffers.
  • + Your version is more readable, especially given how unnecessarily cryptically I wrote the reallocation logic.
  • - Your version has no way to free buffers, so I would argue that it leaks memory. Granted, the leak is upper-bounded by a bit less than 2x the next larger power of two of the largest buffer allocated, so it won't grow without bound. By comparison, my version will release smaller buffers when they are no longer used, and will free its cache when ReleaseUnused() is called and there are no other users. I also considered a version where the cache is a weak_ptr, in which case ReleaseUnused() would not be needed, but decided against it mostly because ReleaseUnused() already existed.
  • - Nit, but your version will allocate small buffers regardless of whether a larger buffer is already available, whereas my version will return the largest buffer allocated thus far, and will automatically free previously allocated smaller buffers when all their users go out of scope.
  • - Also kind of a nit, but rounding up to power-of-two-sized buffers means that you might throw an out-of-memory error even when almost half of the requested memory isn't needed. My algorithm will back off and allocate only as much as is needed if the 2 * previous size allocation fails.

An inability to free something, especially if that something is large, feels like bad news to me, so I'm hesitant to just copy your version in and call it a day. But if nothing else, I'll add a lower bound for allocation size and try to rewrite the allocation algorithm to be less cryptic tomorrow.

Member:

> An inability to free something, especially if that something is large, feels like bad news to me, so I'm hesitant to just copy your version in and call it a day.

That is true. However, it shouldn't be a concern if we can ensure that the pages don't actually allocate physical memory (or almost none of it, such as /dev/zero).

@pitrou (Member) commented Feb 14, 2022:

> I'm not sure if it's even needed, though: it isn't if you can safely make a copy of a shared_ptr while another thread may be updating the contained pointer.

You can safely update the object pointed to (well, except that accessing that object from different threads then becomes racy). However, changing the pointer requires use of dedicated atomic access functions: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

I guess this means you should favour changing the pointer, since all accesses will be confined to this utility function.

@jvanstraten (Contributor Author) commented Feb 15, 2022:

> That is true. However, it shouldn't be a concern if we can ensure that the pages don't actually allocate physical memory (or almost none of it, such as /dev/zero).

I don't think we can in general, unfortunately. I would have no idea how to do it on something as ubiquitous as Windows (or whether it can be done there at all), and I'm sure there are more exotic operating systems and architectures that simply can't do it. Also, for 32-bit systems/builds (if Arrow supports those), virtual memory is in relatively short supply.

> However, changing the pointer requires use of dedicated atomic access functions: [...]

Ah, great, those functions are exactly what I had missed!

I improved my allocation algorithm accordingly here: e628688

ETA: and d89c297 (old habits die hard)

Contributor Author:

Unfortunately the atomic shared_ptr operations don't exist in libstdc++ 4.9 and older, as still used in CI via RTools 35. So I guess in that case my algorithm has to fall back to a mutex after all. f67f75c

Member:

Ah, sorry. You can just use arrow/util/atomic_shared_ptr.h :-)

Contributor Author:

Ah, yeah, that's certainly a lot cleaner. dee486a


~ImmutableZeros() override;

// Prevent copies and handle moves explicitly to avoid double free
Member:

It sounds hackish to have a separate ImmutableZeros class, IMHO.

Contributor Author:

I'm not sure why, or what you mean by that? It's analogous to PoolBuffer in that it uses RAII to free memory when it is no longer needed, except that two classes are needed to correctly model shared ownership: ImmutableZeros models the data, while ImmutableZerosPoolBuffer models a shared reference to that data (even in contexts where a unique_ptr or raw pointer/reference to a Buffer is needed). Writing it down like this, though, the names could use some refactoring, especially now that they both implement Buffer.

std::shared_ptr<MemoryManager> default_cpu_memory_manager();

/// A memory manager that uses the immutable zeros interface of the given memory pool,
/// rather than the normal mutable buffer interface.
Member:

Hmm, I honestly don't understand why it would be useful to implement this, while we're already exposing additional methods to MemoryPool.

Contributor Author:

In my mind, if a memory manager is associated with a buffer, it should be safe to assume that that memory manager actually is managing that buffer, i.e. at the very least it should be possible to create a buffer with the same allocation behavior by calling the associated memory manager's AllocateBuffer method. So, I ended up implementing this special memory manager in the end because that otherwise wouldn't be the case for these buffers.

In general, I'm too new to the project to have a good understanding of why there are so many layers of abstractions and methods for creating a buffer and doing memory management... but surely there are good reasons for them, so wouldn't not implementing them consistently lead to issues down the line?

I will yield to the idea that a lot of this complexity can be removed if never being able to safely free these buffers is acceptable, because then they'd just become like any other Buffer that has no ownership information associated with it. But IMO that's a slippery slope at best.

Member:

CPUImmutableZerosMemoryManager doesn't really respect the MemoryManager contract, since it only allocates immutable memory. So I don't think it's useful implementing it: you won't be able to pass a CPUImmutableZerosMemoryManager to arbitrary Arrow APIs and hope that it works.

Contributor Author:

Okay, that's fair... but don't you agree with my point that if a Buffer is associated with some memory manager, it should be possible to recreate it with that memory manager? Maybe the solution is just to not associate a memory manager with immutable zeros buffers, but doesn't that imply that the memory it points to is not owned by a memory pool? I'm generally confused about how to correctly implement this.

Member:

I guess the memory manager APIs are fairly new/unused and the contract hasn't been pinned down here. That said, I also don't see the issue with sticking to the regular CPU device/memory manager even for immutable buffers. (After all, regular buffers can also be immutable already.) And to me 'managing' means 'can read the data backed by the buffer', not necessarily 'will allocate you exactly the same buffer'. (For instance: PyBuffer is part of the CPU memory manager, since it can read the data in that buffer, even if it can't allocate you a new Python object.)

Contributor Author:

> And to me 'managing' means 'can read the data backed by the buffer', not necessarily 'will allocate you exactly the same buffer'.

But that's what Device represents according to its docstring? In contrast, you can have multiple memory managers per device, expressly intended to provide memory management primitives for some particular allocation method, such as for some particular memory pool. The allocation method for immutable zero buffers is fundamentally different, so I stand by my interpretation of those docstrings that immutable zero buffers warrant a new memory manager class. However, I certainly can't vouch for the correctness of those docstrings, or for the usefulness/applicability of this layer if defined as such.

Furthermore, suppose the intention of associating a MemoryManager with a buffer is only to specify which device can actually access it and how (which seems very sensible to me), rather than also allowing similar buffers to be allocated (which indeed does not seem very useful to me). In that case, the association shouldn't have been a MemoryManager but a Device: the only other information MemoryManager provides on top of its associated Device is how to allocate buffers, so the rest of its methods should IMO be moved to Device. If Device should be kept clean of memory-related stuff, there could be an intermediate layer called MemoryRegion or AddressSpace or something that decouples allocation strategies from the significance and accessibility of a pointer; in general a device can have multiple address spaces anyway (NUMA nodes on CPU, different memory banks on an accelerator, etc.). Whether MemoryManager is still relevant on top of MemoryPool at that point is disputable, but I guess the biggest difference is that MemoryPool operates on pointers to bytes, whereas MemoryManager yields buffers that clean themselves up via RAII.

Nevertheless, I feel like this discussion is getting to the point where it shouldn't be held in a GitHub thread on a mostly unrelated PR (what would be the right place? a JIRA issue?), and in general I don't really have the bandwidth or the use cases right now to dive further into this rabbit hole. So, for now, I've just reverted the additions I made relating to the MemoryManager stuff, and we (or other people) can delve into this if/when it becomes relevant. 976dcf6

Member:

You do have good points. These abstractions are rather new, or at least under-used, and could be refined.

Most APIs do indeed work with MemoryPool, not MemoryManager; MemoryPool has been there for quite a long time, and MemoryManager was only added relatively recently, to support non-CPU memory. The right place would be JIRA and/or the ML, I think. (Or just an ML post referencing this thread; GitHub is OK for discussion, we might just need more eyes.)

(commit message excerpt)
to:
 - use atomic_load/atomic_store for shared access to the cache,
   rather than using a mutex
 - allocate at least 4096 bytes for the first allocation
 - be generally more readable
// allocations).
static const int64_t kMinAllocSize = 4096;
int64_t alloc_size =
    std::max(size, current_buffer ? (current_buffer->size() * 2) : kMinAllocSize);
@pitrou (Member) commented Feb 15, 2022:

Perhaps bit_util::RoundUpToPowerOf2(size) rather than size?

Edit: perhaps that would be detrimental for large allocations. Nevermind.


// If we fail to do so, fall back to trying to allocate the requested size
// exactly as a last-ditch effort.
if (!result.ok() || data == nullptr) {
Member:

Hmm, why is it possible to get nullptr here?

Contributor Author:

You shouldn't be able to in the default implementation, but AllocateImmutableZeros is virtual, so someone could override it in a custom implementation. Still, I suppose I should have gotten rid of that: as a first pass I had it throw a proper error when data was set to null or left unmodified, then changed my mind and did a DCHECK instead, so the null check here no longer does anything useful. 3dda474

// Because we now have a copy in our thread, the use count will be 2 if
// nothing else is using it.
if (cache.use_count() <= 2) {
  atomic_store(&immutable_zeros_cache_, std::shared_ptr<ImmutableZeros>());
Member:

Is it a problem if immutable_zeros_cache_ was modified in the meantime? Probably not, just checking.

Contributor Author:

It's not entirely thread-safe: if other threads are using the cache while ReleaseUnused() is called, ReleaseUnused() may or may not flush the cache due to a data race on use_count. But it doesn't break anything in either case, and ReleaseUnused() is already documented to be best-effort.

}

void FreeImmutableZeros(uint8_t* buffer, int64_t size) override {
#ifdef USE_MMAP_FOR_IMMUTABLE_ZEROS
@pitrou (Member) commented Feb 15, 2022:

Same here: we can hide this behind a compatibility function.

Contributor Author:

3738819 (already done earlier, forgot to mention in this thread)

@jvanstraten force-pushed the ARROW-7051-Improve-MakeArrayOfNull-to-support-creat branch from 2f7f874 to 3738819 on February 15, 2022 19:21
@jvanstraten force-pushed the ARROW-7051-Improve-MakeArrayOfNull-to-support-creat branch from f67f75c to dee486a on February 16, 2022 11:39
@jvanstraten (Contributor Author) commented Feb 16, 2022:

@pitrou @lidavidm I think I've now resolved (or defended my choices for) everything that came up thus far, and CI is happy. Let me know if I missed anything and/or you think there's more that needs to be changed.

@jvanstraten (Contributor Author):

I kinda gave up on this a while ago, and by now the JIRA has been unassigned from me due to inactivity. So I should probably just close this.
