
Conversation


@westonpace westonpace commented Aug 20, 2021

This PR aims to introduce some structured concurrency utilities for working with futures. The async nursery provides a few utilities for managing object lifetimes so that we don't have to constantly capture shared_ptrs just to keep things alive.

A nursery is created with a lambda and everything that runs in that lambda runs "in the nursery". Any object (that has a reference to the nursery) can add a future to the nursery as a dependent task. The nursery will not finish until all of those futures are finished. This means that any objects created outside the nursery (e.g. typically things like options objects, request objects, etc.) will remain valid and can be captured by reference.

Objects that spawn callbacks which need references to this can extend AsyncCloseable and override DoClose (which returns a Future). These objects will not be deleted until the future returned by DoClose has been completed. In addition, objects extending AsyncCloseable can add futures as dependent tasks and the object will be kept alive until those futures complete. In order for this to work these objects must be created by the nursery using MakeSharedCloseable or MakeUniqueCloseable. Any object created in this way will keep the nursery alive until the object's DoClose and any of the object's dependent tasks have finished.

Objects that extend AsyncCloseable have an OnClosed future which will only be completed when the object is destroyed. This facilitates parent/child relationships. If a parent needs to stay alive until a child has completed all of its work (often because the child has callbacks referencing parent state or the parent needs to perform some final cleanup) then the parent can add the child's OnClosed future as a dependent task.

Cons:

  • It's a bit tricky to get it working with the pimpl pattern but NurseryPimpl helps here
  • There are a lot of "nursery-like" objects starting to pop up (IOContext, ExecContext, cancellation tokens, ...) and we may want to consolidate at some point

ToDo:

  • Fix up some of the APIs with some template hackery (e.g. merging RunInNursery and RunInNurserySt)


@westonpace
Member Author

@pitrou @bkietz @lidavidm curious to get your thoughts on the general idea. I think I still have at least one more good overhaul in store (hopefully getting rid of the base classes), but so far it has been very helpful in keeping the dataset writer logic clean.


@lidavidm lidavidm left a comment


As mentioned before, I'm supportive of the idea. It would be good to try to consolidate some of these abstractions (TaskGroup is another) and I think it would be good to study prior art more seriously if/when we build this out more (though perhaps we won't need anything more sophisticated).

TaskGroup is the closest to this abstraction IMO. There aren't that many uses of it in the codebase in the first place, either.

…ing for a future to complete and then deleting the future (which still has callbacks to run that haven't yet been captured
// Lazily create the future to save effort if we don't need it
if (!on_closed_.is_valid()) {
on_closed_ = Future<>::Make();
}
Member

I'm a bit worried about this being race-prone. So far it looks like it's only used in the one constructor, perhaps it could be inlined there?

Member Author

I think you're right about the danger. It is used elsewhere at the moment in the dataset writer PR (https://github.com/apache/arrow/pull/10955/files#diff-387ad04c2450a38044e667e07183b8265866cb3736d10acdce137c2b83737b16R345-R346). I use it to decrement the number of open writers when the file has finished writing. So I just changed it so we always create the future. It was a bit of a premature optimization anyway.

@westonpace
Member Author

I've cleaned up the todos and addressed PR comments.

@lidavidm lidavidm left a comment

Looks good to me.

@pitrou
Member

pitrou commented Aug 30, 2021

I'm trying to understand the ergonomics of this API.
If I understand correctly, nothing here actually keeps the tasks alive. What this relies upon is that the user stored the shared_ptr or unique_ptr returned by Nursery::MakeXXXCloseable somewhere so that lifetimes are handled correctly?
Is there a risk that these pointers may be kept alive too long (and delay the DoClose calls accordingly)? What is the recommended strategy for using this facility?

Did you try to use this in the codebase to check that it actually reduces the burden of managing the sequencing of destructor calls?

class ARROW_EXPORT Nursery {
public:
template <typename T, typename... Args>
typename std::enable_if<!std::is_array<T>::value, std::shared_ptr<T>>::type
Member

The enable_if doesn't seem useful here (especially as you have a static_assert below that would catch arrays)

public:
template <typename T, typename... Args>
typename std::enable_if<!std::is_array<T>::value, std::shared_ptr<T>>::type
MakeSharedCloseable(Args&&... args) {
Member

Can you add a docstring explaining what it does?

friend struct DestroyingDeleter;
};

class ARROW_EXPORT Nursery {
Member

Can you add a docstring explaining what this is/does?


template <typename T, typename... Args>
typename std::enable_if<!std::is_array<T>::value,
std::unique_ptr<T, DestroyingDeleter<T>>>::type
Member

Same remarks here.

on_closed_.MarkFinished(st);
}
nursery_->OnTaskFinished(st);
delete this;
Member

Ok, so this mandates that this object is heap-allocated using the default C++ allocator, right? Can you mention this somewhere in the docstring?


#pragma once

#include <list>
Member

This doesn't seem used?

finish_fut = DoClose();
}
} else {
// No dependent tasks were added
Member

Hmm... is it possible for dependent tasks to be added after this?

bool* destroyed_;
};

class EvictsChild : public AsyncCloseable {
Member

Add a comment explaining what this does/exercises?

child_future, final_future);
evicts_child->EvictChild();
// Owner no longer has reference to child here but it's kept alive by nursery
// because it isn't done
Member

I don't understand this comment, because by reading the source code, I get the impression that the nursery doesn't keep anything alive (it does not have a container of tasks or futures).

/// Subclasses should override this and perform any cleanup. Once the future returned
/// by this method finishes then this object is eligible for destruction and any
/// reference to `this` will be invalid
virtual Future<> DoClose() = 0;
Member

API ergonomics question: since this is the single point of customization, would it be easier if AsyncCloseable took a std::function<Future<>> close_func parameter, instead of having to write a subclass?

@westonpace
Member Author

westonpace commented Aug 31, 2021

If I understand correctly, nothing here actually keeps the tasks alive. What this relies upon is that the user stored the shared_ptr or unique_ptr returned by Nursery::MakeXXXCloseable somewhere so that lifetimes are handled correctly?

There are two concerns here: (1) keeping the object alive (the nursery does not do this, but the smart pointers do) and (2) not returning until all tasks have finished (the nursery does this because it blocks until every task has finished).

Is there a risk that these pointers may be kept alive too long (and delay the DoClose calls accordingly)?

Yes, the nursery could very much trigger deadlock if a task never finishes. I'll call back to this later.

What is the recommended strategy for using this facility?
Did you try to use this in the codebase to check that it actually reduces the burden of managing the sequencing of destructor calls?

See #11017


I also received some negative feedback offline from @bkietz, so today I tried to rewrite this in another light, using only the asynchronous smart pointers and an asynchronous task group. The result is here. However, it still doesn't quite solve the problem. So let me state, in clear terms, the problem I am trying to solve. I am open to any solution someone can come up with, but at the moment this PR is still my preferred solution.

Problem Statement

The use case here is from the perspective of a developer trying to write some code and they want to ensure the code they write is used safely. For example, let's consider the case of writing a file writer queue. A synchronous declaration might look something like this...

class FileWriterQueue {
public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options);
  ~FileWriterQueue();
  Status QueueBatch(std::shared_ptr<RecordBatch> batch);
  void Finish();
private:
  std::unique_ptr<FileWriter> writer_;
  const FileWriteOptions& options_;
  std::thread writer_thread_;
};

Now, let's pretend our implementation creates a dedicated writer thread and every call to QueueBatch adds the batch to a producer consumer queue that the writer thread drains. Our destructor would then look like this...

FileWriterQueue::~FileWriterQueue() {
  EnsureFinished();
  writer_thread_.join();
}

Now let's look at how this is used...

Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches, const FileWriteOptions& options) {
  std::unique_ptr<FileWriter> file_writer = OpenWriter();
  FileWriterQueue file_writer_queue(std::move(file_writer), options);
  for (const auto& batch : batches) {
    ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
  }
  file_writer_queue.Finish(); // I could also skip this and just rely on the EnsureFinished in the destructor
  return Status::OK();
}

We are ensured several things:

  1. The FileWriterQueue will not be deleted until the thread is joined
  2. The options remain valid until the thread is joined
  3. All work is completely done when WriteBatches returns

My goal is to allow those same guarantees to exist if the "finishing work" (the stuff in the destructor) is a future. For example, the asynchronous analogue of above might be...

class FileWriterQueue {
public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options);
  ~FileWriterQueue();
  Status QueueBatch(std::shared_ptr<RecordBatch> batch);
  Future<> Finish();
private:
  std::unique_ptr<FileWriter> writer_;
  const FileWriteOptions& options_;
};

Now our destructor can't do anything. It can't block on Finish() because that would defeat the purpose of being asynchronous. Someone naively using this class might do...

Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches, const FileWriteOptions& options) {
  std::unique_ptr<FileWriter> file_writer = OpenWriter();
  FileWriterQueue file_writer_queue(std::move(file_writer), options);
  for (const auto& batch : batches) {
    ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
  }
  return file_writer_queue.Finish().status();
}

This works ok until a call to QueueBatch returns an error status halfway through the write. Then the code will bail out and both file_writer_queue and options will go out of scope. This means that if there are any leftover captures still pending, they will segfault when executed.

With the nursery you can write...

Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches, const FileWriteOptions& options) {
  return util::RunInNursery([&] (Nursery* nursery) {
    std::unique_ptr<FileWriter> file_writer = OpenWriter();
    FileWriterQueue file_writer_queue(nursery, std::move(file_writer), options);
    for (const auto& batch : batches) {
      ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
    }
    return file_writer_queue.Finish().status();
  });
}

...and now you can be assured that an error occurring during QueueBatch will not cause a segmentation fault because the nursery will still block until any outstanding captures have resolved.

Is there a risk that these pointers may be kept alive too long (and delay the DoClose calls accordingly)?

Yes, if there is a bug, but, the risk is no greater than what you have with...

FileWriterQueue::~FileWriterQueue() {
  EnsureFinished();
  writer_thread.join(); // Could deadlock / delay for a long time if there is a bug
}

@westonpace
Member Author

Why can't FileWriteOptions be a shared_ptr?

It can, and if we force everything to be a shared_ptr then we just have to keep `this` alive, and an asynchronous smart pointer would be sufficient. However, there are still lots of things like ExecContext, Executor, IOContext, etc. which we often pass by pointer and which can go out of scope when the operation finishes.

Plus, it still seems disingenuous to return from a high-level function/operation when cleanup work is still outstanding.

@westonpace
Member Author

Ok, I took a week and did other things and, coming back, I've decided not to pursue this PR in this form. Instead I'll push the async smart ptrs + an async task group as its own thing. I appreciate all the feedback as it has helped guide me tremendously in coming up with the best solution.

@westonpace westonpace closed this Sep 4, 2021
westonpace added a commit that referenced this pull request Oct 4, 2021
…r logic

This ended up being a fairly comprehensive overhaul of the existing dataset writer mechanism.  ~~This PR relies on #10968 and will remain in draft until that is completed.~~

Breaking Changes:

 * The dataset writer no longer works with the synchronous scanner.  I don't think this would be a huge change but I think the current plan is to hopefully deprecate the synchronous scanner (ARROW-13338).  This required changes in Python/R/Ruby which will presumably be reverted when ARROW-13338 is done.
 * The default behavior is now to error if the output directory has any existing data.  This can be controlled with `existing_data_behavior` (see below)
 * Previously a single global counter was used for naming files.  This PR changes to a counter per directory.  So the following...
 ```
 /a1/b1/part-0.parquet
 /a1/b1/part-2.parquet
 /a1/b2/part-1.parquet
 ```
...would be impossible.  Instead you would receive...
```
/a1/b1/part-0.parquet
/a1/b1/part-1.parquet
/a1/b2/part-0.parquet
```
...this does not, however, mean that the resulting data files will be deterministic.  If the data in `/a1/b1/part-0.parquet` and `/a1/b1/part-1.parquet` originated from two different files being scanned in an unordered fashion then either part could represent either file.  A number of test cases in all implementations had to change as the expected paths for dataset writes changed.

* New features:
  * The dataset writer now works with the async scanner (ARROW-12803)
  * The dataset writer now respects backpressure (closes ARROW-2628?, related to but does not fully solve ARROW-13590 and ARROW-13611) and will stop pulling from the scanner when `max_rows_queued` (provided as an argument to `DatasetWriter`) is exceeded.  By default `max_rows_queued` is 64M.  This is not an "option" as I don't think it should be exposed to the user.  I think it would be offering too many knobs.  I think eventually we may want to wrap up all backpressure into a single configurable setting.
  * `FileSystemDatasetWriteOptions` now has a `max_rows_per_file` setting (ARROW-10439).
  * `FileSystemDatasetWriteOptions` now has a `max_open_files` setting (ARROW-12321) which prevents opening too many files.  Instead the writer will apply backpressure on the scanner while also closing the open file with the greatest # of rows already written (then resume writing once the file is closed).
  * `FileSystemDatasetWriteOptions` now has an `existing_data_behavior` setting (ARROW-12358, ARROW-7706) which controls what to do if there is data in the destination.

 * Deferred for future work:
   * Add the new options to the python/R APIs (ARROW-13703)
   * Limiting based on file size (ARROW-10439)
   * More fine grained error control (ARROW-14175)

Closes #10955 from westonpace/feature/ARROW-13542--c-compute-dataset-add-dataset-writenode-for

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021 (same commit message as above).
@westonpace westonpace deleted the feature/arrow-13680-async-nursery branch January 6, 2022 08:16