DiskBasedCache fix with unit tests #52
Conversation
Update from original
See #13 for prior discussion.

Thanks for the rebase :) I'll get to this review soon - I want to give it a very careful look. At a high level, though, I feel much better about this approach than what we were looking at before.
jpd236 left a comment:
Overall - I feel pretty good about this approach, and really appreciate your thorough unit tests to ensure that nothing regresses. My comments are mostly nits, and I think there's definitely a path forward to getting this merged. Nice work!
    byte[] data = streamToBytes(cis, cis.bytesRemaining());
    return entry.toCacheEntry(data);
} finally {
    //noinspection ThrowFromFinallyBlock
This confused me at first, so it would be nice to add a clarifying comment explaining why this is a safe warning to suppress, e.g.:
// Any IOException thrown here is handled by the below catch block by design.
done.
BTW, this pattern is Style 2 at javapractices.com - Finally and catch
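For readers unfamiliar with the idiom, here is a minimal self-contained sketch of the finally-and-catch style under discussion. The `readFirstByte` helper is hypothetical (not Volley code); it shows how an exception thrown from the finally block is still routed to the surrounding catch.

```java
import java.io.IOException;
import java.io.InputStream;

public class FinallyAndCatch {
    // Hypothetical reader illustrating the pattern: close() runs in the
    // finally block, and any IOException it throws propagates to the same
    // catch block that handles read failures.
    static int readFirstByte(InputStream in) {
        try {
            try {
                return in.read();
            } finally {
                // Any IOException thrown here is handled by the catch below.
                //noinspection ThrowFromFinallyBlock
                in.close();
            }
        } catch (IOException e) {
            return -1; // single recovery point for read and close failures
        }
    }
}
```

This gives one recovery path for both the read and the close, at the cost of an inspection warning that is safe to suppress.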
CountingInputStream cis = new CountingInputStream(
        new BufferedInputStream(createInputStream(file)), (int) file.length());
try {
    CacheHeader found = CacheHeader.readHeader(cis);
nit: could you name the variable "entryHeader" or something more clearly indicative of what this variable represents?
        new BufferedInputStream(createInputStream(file)), (int) file.length());
try {
    CacheHeader found = CacheHeader.readHeader(cis);
    if (!key.equals(found.key)) {
nit: while this is safe, I prefer "!TextUtils.equals(key, found.key)" as a best practice in general because it offers 100% guarantee that the code won't crash even if key is null, whereas otherwise we have to look more closely at the lines above to make sure that this is a guaranteed impossibility.
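For illustration, a plain-JDK sketch of the null-safe comparison being suggested: `java.util.Objects.equals` behaves like Android's `TextUtils.equals` for this purpose (never throws even when either argument is null). The `keyMatches` helper name is hypothetical.

```java
import java.util.Objects;

public class NullSafeEquals {
    // JDK analog of Android's TextUtils.equals: guaranteed not to throw
    // NullPointerException even when either argument is null.
    static boolean keyMatches(String expected, String found) {
        return Objects.equals(expected, found);
    }
}
```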
if (!key.equals(found.key)) {
    // File was shared by two keys and now holds data for a different entry!
    VolleyLog.d("%s: key=%s, found=%s", file.getAbsolutePath(), key, found.key);
    removeEntry(key);
Should this be remove() or removeEntry()? If the former, maybe it would be cleaner to just do:
throw new IOException(String.format("%s: key=%s but file key = %s", file.getAbsolutePath(), key, found.key));
and let the catch block take care of the cleanup.
I added a comment to clarify.
    entry.size = cis.bytesRemaining();
    putEntry(entry.key, entry);
} finally {
    //noinspection ThrowFromFinallyBlock
}
} catch (IOException ignored) { }
CacheHeader entry = CacheHeader.readHeader(cis);
entry.size = cis.bytesRemaining();
It looks like the previous code considered the entry size equal to the file length (i.e. including the header), but the new code would only count the number of bytes remaining after reading the header.
The previous code was inconsistent. The entry.size was set to file.length() in initialize(), but was set to the data.length in put(), via the two-arg CacheHeader constructor. The new code consistently uses the data length.
Ah - good catch re: the inconsistency.
However, since the purpose of tracking the size is to limit the total size of the cache on disk, and since the cache header itself can contain a decent chunk of data (in the response headers), would it be feasible/advisable to consistently use the full size of the file instead?
import static org.mockito.Mockito.verify;

@RunWith(RobolectricTestRunner.class)
@Config(manifest="src/main/AndroidManifest.xml", sdk=21)
Volley does (currently) need to be compatible with SDK 8 and earlier. I'm open to relaxing that to something like API 14, but it'd be a topic for a separate discussion, and I'm worried that if we test against SDK 21 here, we may mask potential issues like calls to APIs that didn't exist back in SDK 8.
Is this necessary, and if so, why?
I changed to sdk=16, which is the minimum supported by Robolectric.
Note that Robolectric runs sdk=16 by default.
I added the sdk attribute here because I wanted to eliminate a Robolectric warning about not being able to find the manifest, and adding a manifest attribute seems to require an sdk attribute. (The other Robolectric tests still generate this warning.)
AFAIK none of the unit tests under the test path are testing sdk=8 compliance. For that, I recommend adding some tests under the androidTest path.
cache.put("kilobyte3", randomData(1024));

// Create DataInputStream that throws IOException
InputStream mockedInputStream = spy(InputStream.class);
do you actually need a spy here, or does a regular mock work?
I think spy is needed, or at the least it is an expedient way to inject only the failure that I want. I need the methods other than read() to function normally.
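For comparison, here is a self-contained, Mockito-free alternative for injecting exactly this failure: a hand-rolled FilterInputStream subclass that throws only from single-byte read() while delegating every other method to the real stream. The class name is illustrative, not from the patch.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hand-rolled alternative to a Mockito spy: overrides only read() to fail,
// while other InputStream methods (available, skip, close, ...) delegate
// to the wrapped stream via FilterInputStream.
class FailingReadInputStream extends FilterInputStream {
    FailingReadInputStream(InputStream delegate) {
        super(delegate);
    }

    @Override
    public int read() throws IOException {
        throw new IOException("injected read failure");
    }
}
```

A spy keeps the test shorter when many methods must stay real, but a small wrapper like this avoids the mocking dependency entirely.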
private Cache.Entry randomData(int length) {
    Cache.Entry entry = new Cache.Entry();
    byte[] data = new byte[length];
    new Random().nextBytes(data);
Random data in unit tests is typically an antipattern - it can lead to flakiness if one set of data causes the tests to fail but others work fine. And flakiness can lead to a lot of pain for things like continuous integration.
Assuming we don't expect the contents of the data to matter much, I'd just write fixed data here of some sort. If we did think the data contents were actually likely to impact code flows rather than just be read agnostically as bytes, it would potentially be reasonable to try a bunch of different combinations of data, but still from a fixed set. But since AFAIK we just read the raw bytes directly and all the interesting logic happens surrounding that, that feels like overkill here.
I made the random seed explicit and added a comment.
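A minimal sketch of the seeded approach (the helper mirrors the `randomData(int)` method under discussion; the seed value is arbitrary): with a fixed seed, every test run sees the same bytes, so there is no flakiness from data-dependent failures.

```java
import java.util.Random;

public class DeterministicTestData {
    // Hypothetical variant of randomData(int): explicit seed means the
    // "random" bytes are identical on every run, keeping CI deterministic.
    static byte[] randomData(int length) {
        byte[] data = new byte[length];
        new Random(42).nextBytes(data); // fixed seed avoids flaky tests
        return data;
    }
}
```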
rules.gradle (outdated)
testCompile "org.mockito:mockito-core:1.9.5"
testCompile "junit:junit:4.12"
testCompile "org.hamcrest:hamcrest-library:1.3"
testCompile "org.mockito:mockito-core:2.2.16"
Any objections to using 2.2.29 instead? It's the latest from the 2.2 branch (and is available in our internal prebuilt repositories, whereas 2.2.16 isn't, so it'll make our lives a bit easier when we pull in this patch).
Alternatively, I'd certainly be fine with leaving this at 1.9.5 and dealing with upgrading Mockito separately. Do you need specific new functionality from the upgraded deps?
@yilanl - feel free to take a look as well to see if you have any further concerns.
Thanks for the review! PTAL
I think the data size is arguably as good or better than the file size on disk. It gives the user a number they can work with, namely, the size of their own data. It also makes put() easier to implement, and is more easily tested.
PS - trim is performed before put, based on the size of the data. I think it's a bug that initialize wasn't using the data size as well.
I'd counter that the actual disk usage seems like what is more important to developers - it's what shows up in the platform settings and is attributed as usage to the app. It seems harder for a developer to understand or control what's going on if we just say "max usage, not counting some overhead that changes depending on the number of entries you have in the cache and how many headers the HTTP server happens to return in the response", when all they're really concerned about is saying "don't use more than this much space on disk". Could you elaborate on what makes that approach difficult?

I'm also okay deferring this to its own separate issue, as it's certainly unrelated to the overall goal of fixing the crashes here and adding initial tests. But I would only feel comfortable doing so if this wouldn't introduce a regression where the cache might suddenly grow larger on disk than it would have with older versions of Volley due to changes in accounting.
The put() implementation works exactly as it did before: pruneIfNeeded is called first in order to release enough space for the new entry to be written (data.length), and the new entry size is recorded as its data.length. There is no file written before prune is called, so there is no easy way to prune based on the file length of the new entry.

Previously, initialize() worked differently from put(). During initialize() the file length was used to record the entry size, so the recorded size of an entry would change based on whether the entry was initialized from a previously written file or put() in the current session. But initialize does not prune, so the total size of the cache on disk would not have changed: it would have been pruned according to the data size, not the file size. After initialize(), however, a subsequent put() might cause some previously cached entries to be removed, because the size of the cache after initialize() will appear larger than it did when those entries were put().

If this is really the behavior you want to replicate, then it is easy enough - a one-liner. However, trying to put() and prune based on the size of an entry that has not yet been written would be more difficult to achieve, and I would view this as an incompatible change.
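The ordering described above can be sketched with a simplified in-memory stand-in (not the real DiskBasedCache; names and eviction policy are illustrative): prune runs before the entry exists on disk, so it can only budget by data.length, and the recorded size matches that value.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of the size accounting under discussion. Real
// DiskBasedCache writes files and tracks CacheHeader objects; this
// stand-in only tracks sizes to show the prune-before-put ordering.
class SizeAccountingSketch {
    private final Map<String, Long> entrySizes = new LinkedHashMap<>();
    private long totalSize = 0;
    private final long maxCacheSize;

    SizeAccountingSketch(long maxCacheSize) {
        this.maxCacheSize = maxCacheSize;
    }

    void put(String key, byte[] data) {
        pruneIfNeeded(data.length);              // runs before any file exists
        entrySizes.put(key, (long) data.length); // size recorded as data length
        totalSize += data.length;
    }

    private void pruneIfNeeded(int neededSpace) {
        Iterator<Map.Entry<String, Long>> it = entrySizes.entrySet().iterator();
        while (totalSize + neededSpace > maxCacheSize && it.hasNext()) {
            totalSize -= it.next().getValue();   // evict oldest entries first
            it.remove();
        }
    }

    long totalSize() {
        return totalSize;
    }
}
```

Because pruneIfNeeded sees only data.length, recording the entry size as anything else (such as a later file length) would make the prune budget and the recorded sizes disagree.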
BTW, the Travis build failure is not related to my change and appears to be due to a flaky test in RequestQueueIntegrationTest: add_requestProcessedInCorrectOrder. All tests pass here: https://travis-ci.org/joebowbeer/volley/builds/254582764
Forked #57 to resolve the inconsistency in how we track sizes. In general, I prefer to keep changes as small and scoped to the bug they're trying to fix as possible - it makes it easier to understand where behavior changed and to revert broken changes as needed. This way we won't have to revert this CL or dig into it more deeply if someone reports a cache size growing unexpectedly large, for instance :)

Forked #56 for the flaky test. I retried the CI run until it passed (took two tries).

This change looks good to me. Since it's non-trivial, let me loop in the other Volley code reviewer and give them a chance to voice any concerns before I merge.
Thanks for your work and patience :)
Fixes Issue #12 without changing cache format
Changes:
DiskBasedCache fixed to avoid NegativeArraySizeException and OutOfMemoryError thrown from streamToBytes. Enhanced CountingInputStream nested class to maintain bytesRemaining (the file length initially), so that streamToBytes can validate its length parameter before attempting to allocate a new byte array.
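A minimal sketch of that mechanism (class and method names are assumed, not copied from the patch): the stream tracks bytesRemaining as bytes are consumed, so streamToBytes can reject an implausible length field before allocating an array of that size.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the enhanced counting stream: bytesRemaining starts at the
// file length and decreases as bytes are read, giving a hard upper bound
// on how large any subsequent allocation can legitimately be.
class CountingInputStreamSketch extends FilterInputStream {
    private long bytesRemaining;

    CountingInputStreamSketch(InputStream in, long length) {
        super(in);
        this.bytesRemaining = length;
    }

    @Override
    public int read() throws IOException {
        int result = super.read();
        if (result != -1) {
            bytesRemaining--;
        }
        return result;
    }

    long bytesRemaining() {
        return bytesRemaining;
    }

    // Validates length before allocating, guarding against the
    // NegativeArraySizeException / OutOfMemoryError described above.
    static byte[] streamToBytes(CountingInputStreamSketch cis, long length)
            throws IOException {
        if (length < 0 || length > cis.bytesRemaining()) {
            throw new IOException("streamToBytes length=" + length
                    + ", bytesRemaining=" + cis.bytesRemaining());
        }
        byte[] bytes = new byte[(int) length];
        for (int i = 0; i < length; i++) {
            int b = cis.read();
            if (b == -1) {
                throw new IOException("Unexpected end of stream");
            }
            bytes[i] = (byte) b;
        }
        return bytes;
    }
}
```

A corrupt header claiming a huge or negative length now produces an IOException, which the cache's existing catch blocks can treat as a bad entry, instead of a crash.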
DiskBasedCacheTest unit tests extended to 100% method coverage and 95% code coverage.
All tests passing: https://circleci.com/gh/joebowbeer/volley/24