Skip to content

Conversation

@schmidt-sebastian
Copy link
Contributor

  • lastOperation is read from multiple threads
  • the rate limiter is not thread-safe

-lastOperation is read from multiple threads
- the rate limiter is not thread-safe
@schmidt-sebastian schmidt-sebastian requested a review from a team as a code owner February 6, 2021 00:31
@schmidt-sebastian schmidt-sebastian requested a review from a team February 6, 2021 00:31
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Feb 6, 2021
@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/java-firestore API. label Feb 6, 2021
@codecov
Copy link

codecov bot commented Feb 6, 2021

Codecov Report

Merging #529 (4473827) into master (3ed79ae) will increase coverage by 0.36%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #529      +/-   ##
============================================
+ Coverage     73.54%   73.90%   +0.36%     
- Complexity     1058     1098      +40     
============================================
  Files            67       67              
  Lines          5722     5837     +115     
  Branches        686      714      +28     
============================================
+ Hits           4208     4314     +106     
  Misses         1299     1299              
- Partials        215      224       +9     
Impacted Files Coverage Δ Complexity Δ
...in/java/com/google/cloud/firestore/BulkWriter.java 100.00% <100.00%> (ø) 45.00 <17.00> (+3.00)
...om/google/cloud/firestore/BulkWriterOperation.java 100.00% <100.00%> (ø) 5.00 <0.00> (-2.00)
...oogle/cloud/firestore/v1/FirestoreAdminClient.java 57.46% <0.00%> (-1.89%) 38.00% <0.00%> (+8.00%) ⬇️
.../com/google/cloud/firestore/CustomClassMapper.java 76.84% <0.00%> (-0.06%) 111.00% <0.00%> (ø%)
...va/com/google/cloud/firestore/BulkCommitBatch.java 100.00% <0.00%> (ø) 7.00% <0.00%> (ø%)
.../com/google/cloud/firestore/BulkWriterOptions.java 100.00% <0.00%> (ø) 2.00% <0.00%> (ø%)
...oogle/cloud/firestore/spi/v1/GrpcFirestoreRpc.java 0.00% <0.00%> (ø) 0.00% <0.00%> (ø%)
...restore/collection/ImmutableSortedMapIterator.java 0.00% <0.00%> (ø) 0.00% <0.00%> (ø%)
...rc/main/java/com/google/cloud/firestore/Watch.java 91.30% <0.00%> (+0.03%) 56.00% <0.00%> (ø%)
.../com/google/cloud/firestore/UserDataConverter.java 95.37% <0.00%> (+0.04%) 37.00% <0.00%> (ø%)
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ed79ae...b9a2f56. Read the comment docs.

@schmidt-sebastian
Copy link
Contributor Author

Hm, this is now super flaky.


@Test
public void retriesMaintainCorrectWriteResolutionOrdering() throws Exception {
public void retryResolvesBeforeFlush() throws Exception {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I will be able to deflake the original test as the order of whether doc1, doc2, doc1 or doc1, doc1, doc2 is enqueued is no longer deterministic. I don't think that's a problem though, but it breaks this test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is still reasonable though, so it should be fine.


@Test
public void retriesMaintainCorrectWriteResolutionOrdering() throws Exception {
public void retryResolvesBeforeFlush() throws Exception {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is still reasonable though, so it should be fine.

* time a new write is enqueued.
*/
private ApiFuture<Void> lastOperation = ApiFutures.immediateFuture(null);
private volatile ApiFuture<Void> lastOperation = ApiFutures.immediateFuture(null);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, did not know about volatile in Java.

If lastOperation is only accessed inside the synchronized call, why do we still need the volatile label? My stackoverflow hunting initially led me to think of volatile variables as a simpler/weaker form of synchronization than synchronized, which makes volatile redundant (link, another link). However, after reading another conflicting post that argues volatile is required for reading the variable, I'm now uncertain.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I realize we're reading lastOperation in flush(), which is out of the synchronized block, which then means that we need to use volatile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive by comment... Although I can't find the documentation to back this up, I'm 99% sure that you basically never need to declare a variable as volatile in Java. In my entire career I have never needed to use volatile, even in heavily multi-threaded code. I think that the only cases where you would need to use volatile would be (a) if you're implementing some sort of custom lock-free synchronization on the variable (e.g. busy waiting) or (b) if the variable's value could be changed asynchronously from native code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volatile ensures that the value that is read is never read from the CPU cache. Without this, flush() is not guaranteed to see the latest state of lastOperation if lastOperation was modified by another thread.

cc @wilhuff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This volatile is an indicator of broader problems within this implementation.

  • verifyNotClosed is called unlocked and accesses closed without locking
  • writesEnqueued is written to by executeWrite also without locking

In Java, volatile is expensive relative to the underlying atomic operations that CPUs can perform--it's like a half lock. If you made all of these values volatile it would actually be more expensive than just synchronizing. In Java synchronization is very cheap in the uncontended case, and this doesn't seem to be a class that should have to optimize for that.

My advice:

  • don't use volatile; just use synchronization consistently
  • acquire the lock nearer the API boundary so you're not acquiring it multiple times per private method used to implement the API call
  • name your internal methods to show you assume the lock is held (e.g. verifyNotClosedLocked)
  • avoid trying to be fancy with multithreading at all

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is to say, at a high level, the way to fix the problem with flush accessing lastOperation outside a synchronized block is to add a synchronized block. You should also fix the other unsynchonized accesses by adding synchronized blocks. Once you do, you'll notice you're synchronizing multiple times per API invocation, in which case you can hoist the synchronized blocks.

Also, see go/atomic-danger internally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the answer, very helpful. I will look at this a bit more tomorrow (and will likely add more synchronization).

To "defend" the current state of the PR: Both this.closed and this.writesEnqueued is only accessed from the user thread. bulkCommitBatch, lastOperation and rateLimiter are accessed both from the user thread and as a result of a BatchCommit response, in which case they are accessed from the BulkWriter thread.

Time to read go/atomic-danger

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this PR to use synchronization. PTAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow my understanding of Java volatile was very mis-guided. I just read https://developer.android.com/reference/java/util/concurrent/atomic/package-summary and the documents to which it links.

I learned that the following code could loop forever if this.done is not declared volatile:

while (!this.done) {
  Thread.sleep(100);
}

The compiler could choose to read this.done exactly once and re-use that cached value forever, even if another thread comes in and sets this.done to true. Declaring this.done as volatile would preclude this optimization and the thread would eventually see this.done set to true by another thread. I did not know this and have definitely coded some buggy busy waits in the past!

Copy link

@thebrianchen thebrianchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how moving the synchronized calls up and using the locked method naming makes the code easier to understand, thanks!

Also, why were the @GuardedBy annotations removed? I think having them makes it much easier to quickly see which variables are accessed under lock.

For example, I'm looking at some Android source code, and they use the Locked method naming convention along with @GuardedBy annotations for the instance variables that are accessed under lock.

private final ScheduledExecutorService bulkWriterExecutor;

/**
* Lock object for all mutable state in bulk writer. Bulk Writer state is accessed from the user

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Bulk Writer/BulkWriter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Used to track when writes are enqueued. The user handler executors cannot be changed after a
* write has been enqueued.
*/
private boolean writesEnqueued = false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all accesses to writes enqueued are under lock, should we add a @GuardedBy annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed them since this now applies to all state. Based on the Android link, I added them back. There is also some tooling that catches violations when GuardedBy object is accessed under lock (i.e. in Android Studio's build), so it is likely always better to add this annotations.

@thebrianchen thebrianchen removed their assignment Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: firestore Issues related to the googleapis/java-firestore API. cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants