DiskBasedCache fix with unit tests #52
Conversation
Update from original
See #13 for prior discussion.

Thanks for the rebase :) I'll get to this review soon - I want to give it a very careful look. At a high level, though, I feel much better about this approach than what we were looking at before.
jpd236 left a comment:
Overall - I feel pretty good about this approach, and really appreciate your thorough unit tests to ensure that nothing regresses. My comments are mostly nits, and I think there's definitely a path forward to getting this merged. Nice work!
    byte[] data = streamToBytes(cis, cis.bytesRemaining());
    return entry.toCacheEntry(data);
} finally {
    //noinspection ThrowFromFinallyBlock
This confused me at first, so it would be nice to add a clarifying comment explaining why this is a safe warning to suppress, e.g.:
// Any IOException thrown here is handled by the below catch block by design.
done.
BTW, this pattern is Style 2 at javapractices.com - Finally and catch
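For readers unfamiliar with the idiom, here is a minimal self-contained sketch of the finally-and-catch style under discussion. The `readFirstByte` helper is hypothetical (not Volley code); it shows how an exception thrown from the finally block is still routed to the surrounding catch.

```java
import java.io.IOException;
import java.io.InputStream;

public class FinallyAndCatch {
    // Hypothetical reader illustrating the pattern: close() runs in the
    // finally block, and any IOException it throws propagates to the same
    // catch block that handles read failures.
    static int readFirstByte(InputStream in) {
        try {
            try {
                return in.read();
            } finally {
                // Any IOException thrown here is handled by the catch below.
                //noinspection ThrowFromFinallyBlock
                in.close();
            }
        } catch (IOException e) {
            return -1; // single recovery point for read and close failures
        }
    }
}
```

This gives one recovery path for both the read and the close, at the cost of an inspection warning that is safe to suppress.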
CountingInputStream cis = new CountingInputStream(
        new BufferedInputStream(createInputStream(file)), (int) file.length());
try {
    CacheHeader found = CacheHeader.readHeader(cis);
nit: could you name the variable "entryHeader" or something more clearly indicative of what this variable represents?
        new BufferedInputStream(createInputStream(file)), (int) file.length());
try {
    CacheHeader found = CacheHeader.readHeader(cis);
    if (!key.equals(found.key)) {
nit: while this is safe, I prefer "!TextUtils.equals(key, found.key)" as a best practice in general because it offers 100% guarantee that the code won't crash even if key is null, whereas otherwise we have to look more closely at the lines above to make sure that this is a guaranteed impossibility.
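For illustration, a plain-JDK sketch of the null-safe comparison being suggested: `java.util.Objects.equals` behaves like Android's `TextUtils.equals` for this purpose (never throws even when either argument is null). The `keyMatches` helper name is hypothetical.

```java
import java.util.Objects;

public class NullSafeEquals {
    // JDK analog of Android's TextUtils.equals: guaranteed not to throw
    // NullPointerException even when either argument is null.
    static boolean keyMatches(String expected, String found) {
        return Objects.equals(expected, found);
    }
}
```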
if (!key.equals(found.key)) {
    // File was shared by two keys and now holds data for a different entry!
    VolleyLog.d("%s: key=%s, found=%s", file.getAbsolutePath(), key, found.key);
    removeEntry(key);
Should this be remove() or removeEntry()? If the former, maybe it would be cleaner to just do:
throw new IOException(String.format("%s: key=%s but file key = %s", file.getAbsolutePath(), key, found.key));
and let the catch block take care of the cleanup.
I added a comment to clarify.
    entry.size = cis.bytesRemaining();
    putEntry(entry.key, entry);
} finally {
    //noinspection ThrowFromFinallyBlock
}
} catch (IOException ignored) { }
CacheHeader entry = CacheHeader.readHeader(cis);
entry.size = cis.bytesRemaining();
It looks like the previous code considered the entry size equal to the file length (i.e. including the header), but the new code would only count the number of bytes remaining after reading the header.
The previous code was inconsistent. The entry.size was set to file.length() in initialize(), but was set to the data.length in put(), via the two-arg CacheHeader constructor. The new code consistently uses the data length.
Ah - good catch re: the inconsistency.
However, since the purpose of tracking the size is to limit the total size of the cache on disk, and since the cache header itself can contain a decent chunk of data (in the response headers), would it be feasible/advisable to consistently use the full size of the file instead?
import static org.mockito.Mockito.verify;

@RunWith(RobolectricTestRunner.class)
@Config(manifest="src/main/AndroidManifest.xml", sdk=21)
Volley does (currently) need to be compatible with SDK 8 and earlier. I'm open to relaxing that to something like API 14, but it'd be a topic for a separate discussion, and I'm worried that if we test against SDK 21 here, we may mask potential issues like calls to APIs that didn't exist back in SDK 8.
Is this necessary, and if so, why?
I changed to sdk=16, which is the minimum supported by Robolectric.
Note that Robolectric runs sdk=16 by default.
I added the sdk attribute here because I wanted to eliminate a Robolectric warning about not being able to find the manifest, and adding a manifest attribute seems to require an sdk attribute. (The other Robolectric tests still generate this warning.)
AFAIK none of the unit tests under the test path are testing sdk=8 compliance. For that, I recommend adding some tests under the androidTest path.
cache.put("kilobyte3", randomData(1024));

// Create DataInputStream that throws IOException
InputStream mockedInputStream = spy(InputStream.class);
do you actually need a spy here, or does a regular mock work?
I think spy is needed, or at the least it is an expedient way to inject only the failure that I want. I need the methods other than read() to function normally.
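For comparison, here is a self-contained, Mockito-free alternative for injecting exactly this failure: a hand-rolled FilterInputStream subclass that throws only from single-byte read() while delegating every other method to the real stream. The class name is illustrative, not from the patch.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hand-rolled alternative to a Mockito spy: overrides only read() to fail,
// while other InputStream methods (available, skip, close, ...) delegate
// to the wrapped stream via FilterInputStream.
class FailingReadInputStream extends FilterInputStream {
    FailingReadInputStream(InputStream delegate) {
        super(delegate);
    }

    @Override
    public int read() throws IOException {
        throw new IOException("injected read failure");
    }
}
```

A spy keeps the test shorter when many methods must stay real, but a small wrapper like this avoids the mocking dependency entirely.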
private Cache.Entry randomData(int length) {
    Cache.Entry entry = new Cache.Entry();
    byte[] data = new byte[length];
    new Random().nextBytes(data);
Random data in unit tests is typically an antipattern - it can lead to flakiness if one set of data causes the tests to fail but others work fine. And flakiness can lead to a lot of pain for things like continuous integration.
Assuming we don't expect the contents of the data to matter much, I'd just write fixed data here of some sort. If we did think the data contents were actually likely to impact code flows rather than just be read agnostically as bytes, it would potentially be reasonable to try a bunch of different combinations of data, but still from a fixed set. But since AFAIK we just read the raw bytes directly and all the interesting logic happens surrounding that, that feels like overkill here.
I made the random seed explicit and added a comment.
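A minimal sketch of the seeded approach (the helper mirrors the `randomData(int)` method under discussion; the seed value is arbitrary): with a fixed seed, every test run sees the same bytes, so there is no flakiness from data-dependent failures.

```java
import java.util.Random;

public class DeterministicTestData {
    // Hypothetical variant of randomData(int): explicit seed means the
    // "random" bytes are identical on every run, keeping CI deterministic.
    static byte[] randomData(int length) {
        byte[] data = new byte[length];
        new Random(42).nextBytes(data); // fixed seed avoids flaky tests
        return data;
    }
}
```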
rules.gradle (outdated)
testCompile "org.mockito:mockito-core:1.9.5"
testCompile "junit:junit:4.12"
testCompile "org.hamcrest:hamcrest-library:1.3"
testCompile "org.mockito:mockito-core:2.2.16"
Any objections to using 2.2.29 instead? It's the latest from the 2.2 branch (and is available in our internal prebuilt repositories, whereas 2.2.16 isn't, so it'll make our lives a bit easier when we pull in this patch).
Alternatively, I'd certainly be fine with leaving this at 1.9.5 and dealing with upgrading Mockito separately. Do you need specific new functionality from the upgraded deps?
@yilanl - feel free to take a look as well to see if you have any further concerns.
Thanks for the review! PTAL
I think the data size is arguably as good or better than the file size on disk. It gives the user a number they can work with, namely, the size of their own data. It also makes put() easier to implement, and is more easily tested.
PS - trim is performed before put, based on the size of the data. I think it's a bug that initialize wasn't using the data size as well.
I'd counter that the actual disk usage seems like what is more important to developers - it's what shows up in the platform settings and is attributed as usage to the app. It seems harder for a developer to understand or control what's going on if we just say "max usage, not counting some overhead that changes depending on the number of entries you have in the cache and how many headers the HTTP server happens to return in the response", when all they're really concerned about is saying "don't use more than this much space on disk". Could you elaborate on what makes that approach difficult?

I'm also okay deferring this to its own separate issue, as it's certainly unrelated to the overall goal of fixing the crashes here and adding initial tests. But I would only feel comfortable doing so if this wouldn't introduce a regression where the cache might suddenly grow larger on disk than it would have with older versions of Volley due to changes in accounting.
The put() implementation works exactly as it did before: pruneIfNeeded is called first in order to release enough space for the new entry to be written (data.length), and the new entry size is recorded as its data.length. There is no file written before prune is called, so there is no easy way to prune based on the file length of the new entry.

Previously, initialize() worked differently from put(). During initialize() the file length was used to record the entry size, so the recorded size of an entry would change based on whether the entry was initialized from a previously written file or put() in the current session. But initialize does not prune, so the total size of the cache on disk would not have changed: it would have been pruned according to the data size, not the file size. After initialize(), however, a subsequent put() might cause some previously cached entries to be removed, because the size of the cache after initialize() will appear larger than it did when those entries were put().

If this is really the behavior you want to replicate, then it is easy enough - a one-liner. However, trying to put() and prune based on the size of an entry that has not yet been written would be more difficult to achieve, and I would view this as an incompatible change.
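The ordering described above can be sketched with a simplified in-memory stand-in (not the real DiskBasedCache; names and eviction policy are illustrative): prune runs before the entry exists on disk, so it can only budget by data.length, and the recorded size matches that value.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of the size accounting under discussion. Real
// DiskBasedCache writes files and tracks CacheHeader objects; this
// stand-in only tracks sizes to show the prune-before-put ordering.
class SizeAccountingSketch {
    private final Map<String, Long> entrySizes = new LinkedHashMap<>();
    private long totalSize = 0;
    private final long maxCacheSize;

    SizeAccountingSketch(long maxCacheSize) {
        this.maxCacheSize = maxCacheSize;
    }

    void put(String key, byte[] data) {
        pruneIfNeeded(data.length);              // runs before any file exists
        entrySizes.put(key, (long) data.length); // size recorded as data length
        totalSize += data.length;
    }

    private void pruneIfNeeded(int neededSpace) {
        Iterator<Map.Entry<String, Long>> it = entrySizes.entrySet().iterator();
        while (totalSize + neededSpace > maxCacheSize && it.hasNext()) {
            totalSize -= it.next().getValue();   // evict oldest entries first
            it.remove();
        }
    }

    long totalSize() {
        return totalSize;
    }
}
```

Because pruneIfNeeded sees only data.length, recording the entry size as anything else (such as a later file length) would make the prune budget and the recorded sizes disagree.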
BTW, the Travis build failure is not related to my change and appears to be due to a flaky test in RequestQueueIntegrationTest: add_requestProcessedInCorrectOrder. All tests pass here: https://travis-ci.org/joebowbeer/volley/builds/254582764
Forked #57 to resolve the inconsistency in how we track sizes. In general, I prefer to keep changes as small and scoped to the bug they're trying to fix as possible - it makes it easier to understand where behavior changed and to revert broken changes as needed. This way we won't have to revert this CL or dig into it more deeply if someone reports a cache size growing unexpectedly large, for instance :)

Forked #56 for the flaky test. I retried the CI run until it passed (took two tries).

This change looks good to me. Since it's non-trivial, let me loop in the other Volley code reviewer and give them a chance to voice any concerns before I merge.
Thanks for your work and patience :)
Fixes Issue #12 without changing cache format
Changes:
DiskBasedCache fixed to avoid NegativeArraySizeException and OutOfMemoryError thrown from streamToBytes. Enhanced CountingInputStream nested class to maintain bytesRemaining (the file length initially), so that streamToBytes can validate its length parameter before attempting to allocate a new byte array.
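A minimal sketch of that mechanism (class and method names are assumed, not copied from the patch): the stream tracks bytesRemaining as bytes are consumed, so streamToBytes can reject an implausible length field before allocating an array of that size.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the enhanced counting stream: bytesRemaining starts at the
// file length and decreases as bytes are read, giving a hard upper bound
// on how large any subsequent allocation can legitimately be.
class CountingInputStreamSketch extends FilterInputStream {
    private long bytesRemaining;

    CountingInputStreamSketch(InputStream in, long length) {
        super(in);
        this.bytesRemaining = length;
    }

    @Override
    public int read() throws IOException {
        int result = super.read();
        if (result != -1) {
            bytesRemaining--;
        }
        return result;
    }

    long bytesRemaining() {
        return bytesRemaining;
    }

    // Validates length before allocating, guarding against the
    // NegativeArraySizeException / OutOfMemoryError described above.
    static byte[] streamToBytes(CountingInputStreamSketch cis, long length)
            throws IOException {
        if (length < 0 || length > cis.bytesRemaining()) {
            throw new IOException("streamToBytes length=" + length
                    + ", bytesRemaining=" + cis.bytesRemaining());
        }
        byte[] bytes = new byte[(int) length];
        for (int i = 0; i < length; i++) {
            int b = cis.read();
            if (b == -1) {
                throw new IOException("Unexpected end of stream");
            }
            bytes[i] = (byte) b;
        }
        return bytes;
    }
}
```

A corrupt header claiming a huge or negative length now produces an IOException, which the cache's existing catch blocks can treat as a bad entry, instead of a crash.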
DiskBasedCacheTest unit tests extended to 100% method coverage and 95% code coverage.
All tests passing: https://circleci.com/gh/joebowbeer/volley/24